Polars: rs-0.32.0 Release

Release date:
August 14, 2023
Previous version:
rs-0.31.1 (released July 15, 2023)
Magnitude:
30,821 Diff Delta
Contributors:
33 total committers
Commits:
256 commits in this release

Top Contributors in rs-0.32.0

ritchie46
stinodego
alexander-beedie
orlp
MarcoGorelli
c-peters
zundertj
magarick
cmdlineluser
SeanTroyUWO

Release Notes

πŸ† Highlights

  • common subexpression elimination (#9632)
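Common subexpression elimination (CSE) finds expressions that occur more than once in a query so they can be evaluated once and reused. The Rust sketch below illustrates only the detection step, on a hypothetical toy `Expr` enum (not Polars' real expression type). It counts identical subtrees in a `HashMap` keyed on the derived `Hash`, loosely in the spirit of the "use hash as CSE identifier" change (#10385). The actual Polars optimizer is considerably more involved.

```rust
use std::collections::HashMap;

// Hypothetical toy expression tree, for illustration only.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Col(String),
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Recursively counts how often every subtree occurs.
fn count_subtrees(e: &Expr, counts: &mut HashMap<Expr, usize>) {
    *counts.entry(e.clone()).or_insert(0) += 1;
    if let Expr::Add(a, b) | Expr::Mul(a, b) = e {
        count_subtrees(a, counts);
        count_subtrees(b, counts);
    }
}

/// Returns non-trivial subexpressions (not bare columns or literals) that
/// occur more than once across `exprs` and could be computed only once.
fn common_subexpressions(exprs: &[Expr]) -> Vec<Expr> {
    let mut counts = HashMap::new();
    for e in exprs {
        count_subtrees(e, &mut counts);
    }
    let mut out: Vec<Expr> = counts
        .into_iter()
        .filter(|(e, n)| *n > 1 && matches!(e, Expr::Add(..) | Expr::Mul(..)))
        .map(|(e, _)| e)
        .collect();
    // HashMap iteration order is arbitrary; sort for deterministic output.
    out.sort_by_key(|e| format!("{e:?}"));
    out
}
```

For `(a + b) * 2` and `(a + b) + 1`, only `a + b` is reported, since it is the one compound subtree shared by both.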

💥 Breaking changes

  • remove deprecated tz_localize, rename CastTimezone to ReplaceTimeZone (#10070)

⚠️ Deprecations

  • rename approx_unique to approx_n_unique (#10290)
  • remove/deprecate cache and its logic (#10066)
  • Add date_ranges/time_ranges expression functions (#10005)
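As we understand it, the new `date_ranges` function (#10005) produces one range per row, built from a start and an end column, rather than a single range. The sketch below models that per-row behaviour with plain `i32` day numbers instead of real dates. The helper name only mirrors the Polars function for readability and is hypothetical, as are the integer-day representation, the positive `step_days` requirement, and including both endpoints (which we believe matches the default).

```rust
/// Hypothetical per-row range builder: for each (start, end) pair of day
/// numbers, produce start, start + step, ... up to and including `end`.
/// Rows where start > end yield an empty list. If the slices differ in
/// length, `zip` stops at the shorter one.
fn date_ranges(starts: &[i32], ends: &[i32], step_days: i32) -> Vec<Vec<i32>> {
    assert!(step_days > 0, "step must be positive");
    starts
        .iter()
        .zip(ends)
        .map(|(&s, &e)| (s..=e).step_by(step_days as usize).collect::<Vec<i32>>())
        .collect()
}
```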

🚀 Performance improvements

  • pre-alloc int_ranges (#10399)
  • use hash as CSE Identifier (#10385)
  • re-use regex capture allocation (#10302) (#10335)
  • don't parallelize literal expressions (#10321)
  • fix O(n^2) in sorted check during append (#10241)
  • speedup mode on sorted data (#10084)
  • speedup boolean apply (#10073)
  • shrink alp/lp ~2.5x (#10039)
  • Remove fused arithmetic from expressions with literals (#10011)
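The notes give no detail on how mode was sped up for sorted data (#10084). One reason such a speedup is possible: in a sorted column equal values are adjacent, so the mode can be found with a single run-length scan instead of a hash-based count. The sketch below illustrates that idea in plain Rust. `mode_sorted` is a hypothetical helper and, unlike Polars' `mode` (which we believe can return several values on ties), it returns only the first longest run.

```rust
/// Mode of an already-sorted slice in one pass, without a hash table:
/// equal values form contiguous runs, so the longest run wins.
/// Ties go to the first (smallest) run; an empty slice yields None.
fn mode_sorted<T: PartialEq + Copy>(sorted: &[T]) -> Option<T> {
    let mut iter = sorted.iter().copied();
    let first = iter.next()?;
    let (mut best, mut best_len) = (first, 1usize);
    let (mut cur, mut cur_len) = (first, 1usize);
    for v in iter {
        if v == cur {
            cur_len += 1;
        } else {
            cur = v;
            cur_len = 1;
        }
        // Strict `>` keeps the earlier run on ties.
        if cur_len > best_len {
            best = cur;
            best_len = cur_len;
        }
    }
    Some(best)
}
```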

✨ Enhancements

  • quote style option for csv writer (#10422)
  • add "raise_if_empty" flag to read_excel, read_csv, scan_csv, and read_csv_batched (#10409)
  • be more permissive on predicate pushdown to left side of left join (#10442)
  • add use_earliest to to_datetime / strptime (#10426)
  • {any/all}_horizontal to expression architecture (#10412)
  • serialize flags (#10140)
  • allow unaligned pointers in arrow FFI (#10403)
  • add line_terminator option to write_csv (#10373)
  • Add is_local and to_local to categorical namespace (#10372)
  • cse for groupby.agg and reduced cse collisions (#10381)
  • re-use regex capture allocation (#10302) (#10335)
  • Add Series.cat.uses_lexical_ordering (#10325)
  • improve datetime parsing error message (#10332)
  • allow sequential runners in select/with_columns (#10322)
  • improve err msg parsing time, date, datetime (#10298)
  • Add str.extract_groups (#10179)
  • add extra build profiles (#10268)
  • Extend datetime expression function with time zone/time unit parameters (#10235)
  • add gcs to gcp cloud schema in polars-core::cloud (#10206) (#10207)
  • support writing duration type in json (#10112)
  • inline lit(Series).cast(..) to lit(Series.cast(..)) (#10092)
  • Move transpose naming to Rust (#10009)
  • cse in groupbys (#10062)
  • Add SQL CASE statement expressions (#10065)
  • Add date_ranges/time_ranges expression functions (#10005)
  • comm_subexpr_elim in streaming 'select/with_columns' (#10050)
  • common subexpression elimination (#9632)
  • Let qcut create evenly spaced probabilities (#9960)
  • sorted flag on singletons (#9933)
  • maintain sorted flag after partition_by (#9944)
  • keep sorted flag in streaming left join (#9932)
  • Add cloudpickle for serializing python UDFs (#9921)
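The new quote style option for the CSV writer (#10422) controls when fields are wrapped in quotes. The sketch below shows two such policies in plain Rust. The `QuoteStyle` enum, its variants, and the `write_field`/`write_row` helpers are hypothetical illustrations, not Polars' actual types or the exact set of options it exposes. Embedded quote characters are escaped by doubling them, the usual RFC 4180 convention.

```rust
/// Hypothetical quoting policies for a CSV writer.
#[derive(Clone, Copy, Debug)]
enum QuoteStyle {
    /// Quote only fields containing the separator, the quote char, or a line break.
    Necessary,
    /// Quote every field.
    Always,
}

/// Renders a single field according to `style`.
fn write_field(field: &str, sep: char, quote: char, style: QuoteStyle) -> String {
    let needs_quotes = match style {
        QuoteStyle::Always => true,
        QuoteStyle::Necessary => {
            field.contains(sep) || field.contains(quote) || field.contains(['\n', '\r'])
        }
    };
    if !needs_quotes {
        return field.to_string();
    }
    // Escape embedded quote characters by doubling them.
    let escaped = field.replace(quote, &format!("{quote}{quote}"));
    format!("{quote}{escaped}{quote}")
}

/// Renders one row (without a line terminator).
fn write_row(fields: &[&str], sep: char, quote: char, style: QuoteStyle) -> String {
    let sep_str = sep.to_string();
    fields
        .iter()
        .map(|f| write_field(f, sep, quote, style))
        .collect::<Vec<_>>()
        .join(sep_str.as_str())
}
```

With `Necessary`, the row `a`, `b,c`, `say "hi"` becomes `a,"b,c","say ""hi"""`. With `Always`, every field is quoted regardless of content.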

🐞 Bug fixes

  • Fix incorrect handling of VisitRecursion::Skip. (#10452)
  • fix negative decimal parsing (#10444)
  • ensure sorted_sink hash equals the default path (#10464)
  • fix sum agg (#10459)
  • ensure last aggregation deals with default chunk (#10453)
  • fix cse input schema (#10450)
  • fix list groupby of array dtype (#10408)
  • correct AnyValue::hash (#10391)
  • finalize cast in partitioned groupby (#10359)
  • fix out-of-bounds access in 'last' (#10329)
  • fix categorical lexical sort (#10318)
  • Fix join validation (#10257)
  • Set correct dtype for .extract_groups() (#10306)
  • clear window cache and run windows on proper runners (#10303)
  • fix sorted fast path in streaming groupby wrt nulls (#10289)
  • fix nan aggregation in groupby (#10287)
  • check dtypes of single-column 'by' parameter in asof-join (#10284)
  • fix pyo3 link errors on macos (#10256)
  • fix empty streaming parquet file (#10252)
  • fix logical columns of streaming multi-column sort (#10250)
  • fix date/datetime parsing for short inputs with exact=False (#10231)
  • correct agg_sum for ChunkedArray. (#10243)
  • don't panic in wildcard apply (#10240)
  • fix cse profile (#10239)
  • correct struct null counts (#10142)
  • no cse in groupby until fixed (#10216)
  • fix is_in on empty series (#10195)
  • fix cse windows (#10197)
  • block predicate pushdown is_in and null producing … (#10194)
  • prevent re-ordering of dict keys inside .apply (#10172)
  • initialize fixed null values (#10192)
  • ensure window functions run partitioned when cse is hit (#10170)
  • adjust for null values in str.replace fast path (#10132)
  • clear bit settings in list iteration (#10131)
  • use row-encoded for struct::is_sorted (#10129)
  • don't run file-caching in streaming mode (#10117)
  • Allow initialization of pl.Array in DataFrame using schema alone (#10100)
  • don't panic if masked out values are invalid in temporal kernels (#10114)
  • Fix struct get field by index out of bounds error. (#10097)
  • fix undefined behavior (UB) in simd-json (#10093)
  • fix invalid access when groupby rolling produces empty sets (#10109)
  • respect null_on_oob=False in list.take when pa… (#10105)
  • fix is_sorted for structs (#10099)
  • add file path to io error in scan_csv (#10076)
  • fix false positive in parquet stats evaluation (#10087)
  • fix error message from cast-timezone to replace-time-zone (#10089)
  • Address .col(regex).exclude() operations not executing. (#10025)
  • fix Boolean::isin(null values) (#10074)
  • predicate pushdown #10058 (#10071)
  • Fix weighted quantile for 0 weights (#10051)
  • fix incorrect state in projection pushdown with joins (#9987)
  • don't pass predicates referring to renamed literal… (#9965)
  • fix regression in regex expansion (#9952)
  • fix potential stack overflow in csv schema inference (#9950)
  • raise on unsupported transpose and object types (#9946)
  • Fix as-of join when by groups are interleaved (#9938)
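The notes do not describe the cause of the negative decimal parsing bug (#10444). A classic pitfall in hand-written decimal parsers is attaching the sign to the integer part: "-0.25" then parses its integer part as 0 and the sign is silently lost. The sketch below is a hypothetical `parse_decimal` helper, not Polars' parser. It avoids that pitfall by stripping the sign first and applying it to the scaled magnitude, returning `None` on malformed input, overflow, or more fractional digits than `scale` allows.

```rust
/// Parses a decimal string into an integer scaled by 10^scale,
/// e.g. "-0.25" at scale 2 -> -25.
fn parse_decimal(s: &str, scale: u32) -> Option<i128> {
    let s = s.trim();
    // Strip the sign before splitting, so "-0.25" keeps its sign even
    // though its integer part is zero.
    let (negative, digits) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    let (int_part, frac_part) = digits.split_once('.').unwrap_or((digits, ""));
    let all_digits = |p: &str| p.bytes().all(|b| b.is_ascii_digit());
    if (int_part.is_empty() && frac_part.is_empty())
        || !all_digits(int_part)
        || !all_digits(frac_part)
        || frac_part.len() > scale as usize
    {
        return None;
    }
    let int_val: i128 = if int_part.is_empty() { 0 } else { int_part.parse().ok()? };
    let frac_val: i128 = if frac_part.is_empty() { 0 } else { frac_part.parse().ok()? };
    // Right-pad the fraction to `scale` digits: "5" at scale 2 means 50.
    let frac_scaled = frac_val.checked_mul(10i128.checked_pow(scale - frac_part.len() as u32)?)?;
    let magnitude = int_val
        .checked_mul(10i128.checked_pow(scale)?)?
        .checked_add(frac_scaled)?;
    Some(if negative { -magnitude } else { magnitude })
}
```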

πŸ› οΈ Other improvements

  • fix and run polars-plan tests (#10465)
  • Simplify flag methods (#10429)
  • match_block_trailing_comma (#10414)
  • implement ChunkedArray::(try_)from_chunk_iter (#10395)
  • add test for #10401 (#10405)
  • Bump some dependencies (#10396)
  • Move dependency version info to workspace level (#10295)
  • patch reedline until fix released (#10382)
  • remove wasm-timer dependency (#10347)
  • write down invariants of ChunkedArray (#10334)
  • fix typo in lib.rs (#10313)
  • Exclude examples from workspace default (#10309)
  • Update CODEOWNERS (#10261)
  • avoid outputting docs of dependencies (#10292)
  • Do not keep history in gh-pages branch (#10282)
  • Use workspace package info / organize dependencies section (#10279)
  • fix dead links in Rust documentation (#10251)
  • Fix make pre-commit command (#10205)
  • Fix make integration-tests command (#10202)
  • Replace "question" issues with link to Stack Overflow (#10230)
  • Update dependabot config (#10222)
  • Fix LICENSE symlink for moved crates (#10150)
  • Re-organize folder structure for Rust crates (#10141)
  • update to rustc nightly-2023-07-27 (#10139)
  • temporarily turn off fail-fast so that ubuntu tests run (#10133)
  • Refactor when/then/otherwise internals (#9922)
  • move replace_time_zone to polars-ops (#10078)
  • remove unneeded branch (#10082)
  • remove deprecated tz_localize, rename CastTimezone to ReplaceTimeZone (#10070)
  • fix typo in contribution example (#10038)
  • correct example in API reference (#10032)
  • add developer contribution examples (#10013)
  • Update autolabeler again (#9984)
  • fix docs build and add to CI (#9904)
  • Minor makeover for Rust Makefile (#9874)

Thank you to all our contributors for making this release possible! @0xbe7a, @CanglongCl, @JulianCologne, @MarcoGorelli, @OndrejSlamecka, @OneRaynyDay, @SeanTroyUWO, @StefanBRas, @TLouf, @alexander-beedie, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @duvenagep, @eltociear, @fsimkovic, @ion-elgreco, @jonashaag, @lfn3, @magarick, @mcrumiller, @orlp, @potzenhotz, @rea1bacon, @reswqa, @rikkaka, @ritchie46, @stinodego, @thomasaarholt, @varunmittal91 and @zundertj