Polars: rs-0.25.0 Release

Release date: October 28, 2022
Previous version: rs-0.24.3 (released September 28, 2022)
Magnitude: 20,012 Diff Delta
Contributors: 19 total committers

183 Commits in this Release

Top Contributors in rs-0.25.0

ritchie46
alexander-beedie
ghuls
stinodego
slonik-az
matteosantama
owrior
zundertj
dannyvankooten
hpux735

Release Notes Published

The most notable change in this release is the start of out-of-core support in Polars, which lets us process datasets that are larger than RAM. It is currently supported for the parts of a query that read from CSV or Parquet files, and is limited to select, filter, and groupby operations. Many more operations will follow in upcoming releases.

See https://github.com/pola-rs/polars/pull/5139#issuecomment-1274687634, where we were able to process an 80GB dataset on a laptop with only 16GB of RAM.
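
To make the out-of-core idea concrete, here is a minimal sketch in plain Python (standard library only, not Polars code) of a filter followed by a groupby sum that only ever holds one batch of rows plus the per-group totals in memory. The function names, the `batch_size` parameter, and the sample data are all hypothetical and chosen for illustration; the actual streaming engine in #5139 is considerably more involved.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from an iterator of rows."""
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def streaming_filter_groupby_sum(lines, key, value, min_value, batch_size=2):
    """Keep rows with value >= min_value and sum value per key, batch by batch."""
    reader = csv.DictReader(lines)
    totals = defaultdict(int)  # only the aggregation state stays in memory
    for batch in iter_batches(reader, batch_size):
        for row in batch:
            v = int(row[value])
            if v >= min_value:  # filter is applied before aggregating
                totals[row[key]] += v
    return dict(totals)


# Hypothetical sample data standing in for a file that does not fit in RAM.
data = io.StringIO("city,sales\nA,10\nB,5\nA,7\nB,20\nC,1\n")
result = streaming_filter_groupby_sum(data, "city", "sales", min_value=5)
print(result)
```

The memory footprint depends on the batch size and the number of distinct groups rather than on the total number of rows, which is roughly why select, filter, and groupby are natural first candidates for this style of execution.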

Thanks to everyone who contributed to another release! 🙌

⚠️ Breaking changes

  • rename expand_at_index -> new_from_index (#5259)

🚀 Performance improvements

  • lower contention in out of core filter (#5311)
  • improve pivot performance by using faster series… (#5172)
  • improve streaming performance (~15%) (#5170)
  • don't block projection pushdown on unnest (#5123)
  • more conservative JIT sort settings (#5080)
  • sort and unsort join key if other side is sorted (#5069)
  • do not rechunk left joins (#5066)
  • Prune unneeded projections (#5032)
  • Improve predicate pushdown + with_columns (#5029)
  • Don't execute unused with_column expressions (#5026)

✨ Enhancements

  • shrink_type expression (#5351)
  • tz_localize expression (#5340)
  • accept expr in arr.get (#5337)
  • Implement forward strategy in groupby join_asof (#5335)
  • improve dynamic inference of struct types (#5297)
  • Add newline to Aggregate..FROM describe_optimization_plan (#5253)
  • date_range expression (#5267)
  • show expression where error originated if raised … (#5263)
  • improve error msg if window expressions length do… (#5262)
  • Add round for date and datetime (#5153)
  • new n_chars functionality for utf8 strings (#5252)
  • added new Config formatting option set_tbl_column_data_type_inline, fixed reading of env vars, improved interaction between formatting options (#5243)
  • make date_range timezone aware (#5234)
  • Rust functions for typed JsonPath implementation (#5140)
  • allow polars Config options to be serialised/shared, and more easily unset (#5219)
  • batched csv reader (#5212)
  • accept expressions in arr.slice (#5191)
  • is_sorted aggregation fast path for Utf8Chunked (#5184)
  • hybrid streaming query engine (#5139)
  • add binary dtype (#5122)
  • improve function expansion (#5110)
  • add struct arithmetics (#5107)
  • add cumfold/cumsum expression (#5103)
  • error on invalid asof join inputs (#5100)
  • small plan and profile chart improvements (#5067)
  • Initial implementation of histogram algorithm (#4752)
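
As a rough illustration of the cumfold/cumsum entry (#5103), the sketch below uses plain Python lists as stand-ins for columns. It assumes that a cumulative fold runs left to right across columns and keeps every intermediate result; `cumfold_columns` is a hypothetical helper written for this example, not the Polars API.

```python
from itertools import accumulate
from operator import add


def cumfold_columns(init, func, columns):
    """Left-fold func element-wise across columns, keeping each intermediate column."""
    steps = accumulate(
        columns,
        lambda acc, col: [func(a, c) for a, c in zip(acc, col)],
        initial=init,
    )
    next(steps)  # drop the initial accumulator, keep one result per input column
    return list(steps)


# Three hypothetical columns of equal length.
a = [1, 2, 3]
b = [10, 20, 30]
c = [100, 200, 300]

# With addition as the fold function this behaves like a horizontal cumulative sum.
running = cumfold_columns([0, 0, 0], add, [a, b, c])
print(running)
```

A plain fold would return only the last of these columns; the cumulative variant returns all of them, one per input column.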

🐞 Bug fixes

  • unnest only pushdown column if there are projections (#5360)
  • block is_null predicate in asof join (#5358)
  • ensure that no-projection is seen as select all in… (#5356)
  • resolve duplicated column names in pivot (#5349)
  • fix serde of expression (pickle) (#5333)
  • don't set auto-explode in apply_multiple (#5265)
  • export anonymousscan in lazy prelude (#5295)
  • fix explicit list + sort aggregation in groupby co… (#5317)
  • fix sort-merge dispatch of utf8 (#5315)
  • properly interpret FMT_MAX_ROWS - remove arbitrary minimum, fix Series formatting (#5281)
  • don't block non matching groups in binary expression (#5273)
  • fix logical type of nested take (#5271)
  • tag IntoSeries trait as unsafe (#5258)
  • include single null value in global cat builder (#5254)
  • include slice in sort fast path (#5247)
  • determine supertype of datetimes with timezones an… (#5240)
  • fix groupby dynamic truncate for > days resolution (#5235)
  • set timezone on groupby_dynamic boundaries (#5233)
  • fix incorrect duration dtype (#5226)
  • set string cache if lazy schema contains categorical (#5225)
  • fix pipeline dtypes (#5224)
  • fix asof_join schema (#5213)
  • fix single thread loop if schema length is off by 1 (#5210)
  • improve numeric stability of rolling_variance (#5207)
  • fix overflow in partitioned groupby mean of int32/… (#5204)
  • don't allow categorical append that is not under s… (#5195)
  • include offset in arr.get (#5193)
  • fix rolling_float in case closure returns None (#5180)
  • Implement missing extract conversion for Time datatype (#5161)
  • implement missing conversion to python time object (#5152)
  • microsecond noise on date >> time cast (add 00:00:00 fast-path) (#5149)
  • wrong operator mapped for LtEq (#5120)
  • unique include null (#5112)
  • don't recursively assign unions, as it stack overflows with > 5k files (#5098)
  • block projection pushdown on unnest (#5093)
  • projection_node always do projection locally if no… (#5090)
  • fix iso_year for Date dtype (#5074)
  • fix bug in unneeded projection pruning (#5071)
  • Improve printing controls of DataFrame and Series (#5047)
  • Double projections should be checked on input schema (#5058)
  • Apply flat overlapping row groups when possible (#5039)
  • Ensure all predicates use same key function when inserting… (#5034)
  • Only consider dt series equal if they have the same tz (#5025)
  • Special-case ewm_mean(alpha=1) (#5019)
  • Time zone conversion bug (NY -> UTC works, UTC -> NY doesn't) (#5014)
  • Fix timezone cast (#5016)
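
As general background for the rolling_variance entry (#5207), the sketch below shows why variance computations can be numerically fragile. It compares the textbook formula based on sums of squares with Welford's single-pass update, which avoids subtracting two very large numbers. This is a standalone illustration in plain Python with hypothetical function names; it is not the code from that PR, and the approach taken there may differ.

```python
def sample_variance_welford(values):
    """Single-pass sample variance (ddof=1) using Welford's update."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count < 2:
        return float("nan")
    return m2 / (count - 1)


def sample_variance_naive(values):
    """Textbook formula from sum(x^2) and sum(x); prone to cancellation."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total * total / n) / (n - 1)


# Small spread on top of a large offset: the hard case for the naive formula.
values = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
stable = sample_variance_welford(values)
naive = sample_variance_naive(values)
print("welford:", stable)
print("naive:  ", naive)
```

The exact sample variance of these four values is 30. The Welford update works with small deviations from the running mean, whereas the naive formula subtracts two quantities near 4e18, so its result can lose most of its significant digits.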

🛠️ Other improvements

  • update rustc to nightly-2022-10-24 (#5312)
  • update ahash and add nightly features of hashbrown (#5310)
  • Update comfy-table and memchr. (#5276)
  • rename expand_at_index -> new_from_index (#5259)
  • ensure streaming groupby take slice into account (#5178)
  • move polars-sql under polars folder (#5176)
  • remove aggregate pushdown optimization (#5173)
  • relax sync requirement on Executor trait impls (#5142)
  • Get rid of unnecessary check in SplitLines iterator (#5141)
  • Constant instead of literal (#5088)
  • Use release-drafter to draft releases with changelogs (#5033)
  • Fix docs by activating docfg feature (#5028)
  • Split up polars-lazy crate. (#5020)

Thank you to all our contributors for making this release possible! @AlecZorab, @YuRiTan, @alexander-beedie, @cjermain, @dannyvankooten, @dpatton-gr, @egorchakov, @ghuls, @hpux735, @matteosantama, @mcrumiller, @owrior, @ritchie46, @slonik-az, @sorhawell, @stinodego, @thatlittleboy, @universalmind303 and @zundertj