Polars: rs-0.26.0 Release

Release date: December 22, 2022
Previous version: rs-0.25.0 (released October 28, 2022)
Magnitude: 35,378 Diff Delta
Contributors: 27 total committers

245 Commits in this Release

Top Contributors in rs-0.26.0

ritchie46
stinodego
alexander-beedie
braaannigan
ghuls
chitralverma
universalmind303
zundertj
dannyvankooten
YuRiTan

Release Notes Published

⚠️ Breaking changes

  • remove Series::append_array (#5681)
  • iso weekday (#5598)
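
The "iso weekday" entry is terse. Assuming it means that weekday numbers now follow ISO 8601 (Monday = 1 through Sunday = 7), the sketch below shows how that differs from a zero-based Monday = 0 scheme. It uses only the standard library. The function names are hypothetical, and treating the zero-based scheme as the previous behaviour is an assumption, not something these notes state.

```rust
/// ISO 8601 weekday number for a day count relative to 1970-01-01.
/// 1970-01-01 was a Thursday, so day 0 maps to 4.
/// Numbering: Monday = 1, ..., Sunday = 7.
fn iso_weekday(days_since_epoch: i64) -> u32 {
    // Shift so that Monday maps to 0, then move into the 1..=7 range.
    // rem_euclid keeps the result non-negative for dates before 1970.
    ((days_since_epoch + 3).rem_euclid(7) + 1) as u32
}

/// Zero-based numbering for comparison: Monday = 0, ..., Sunday = 6.
/// (Assumed here to be the older convention.)
fn zero_based_weekday(days_since_epoch: i64) -> u32 {
    (days_since_epoch + 3).rem_euclid(7) as u32
}
```

Code that compares weekday numbers against hard-coded constants (for example, a weekend filter) is the kind of code that would need checking after such a change.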

🚀 Performance improvements

  • improve reducing window function performance ~33% (#5878)
  • improve performance reducing window functions with numeric output ~-14% (#5841)
  • set_sorted flag when creating from literal (#5728)
  • use sorted fast path in streaming groupby (#5727)
  • ensure fast_explode propagates (#5676)
  • fix quadratic time complexity of groupby in stream… (#5614)
  • Aggregate projection pushdown (#5556)
  • improve streaming primitive groupby (#5575)
  • vectorize integer vec-hash by using very simple, … (#5572)
  • specialized utf8 groupby in streaming (#5535)
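
Several entries above rely on knowing that data is already sorted. As a rough illustration of why that helps a groupby, the sketch below finds groups in sorted keys with a single linear scan and no hashing. It shows the general idea only and is not Polars's implementation; `sorted_group_slices` is a hypothetical name.

```rust
/// Given keys that are already sorted, return one (start, length) pair
/// per run of equal keys. Runs in O(n) and needs no hash table.
fn sorted_group_slices<T: PartialEq>(keys: &[T]) -> Vec<(usize, usize)> {
    let mut groups = Vec::new();
    let mut start = 0;
    // Walk one position past the end so the final run is flushed too.
    for i in 1..=keys.len() {
        if i == keys.len() || keys[i] != keys[start] {
            groups.push((start, i - start));
            start = i;
        }
    }
    groups
}
```

For the keys `[1, 1, 2, 3, 3, 3]` this yields the slices `(0, 2)`, `(2, 1)` and `(3, 3)`. If the input is not actually sorted, equal keys that are not adjacent end up in separate groups, which is why a fast path like this depends on a trustworthy sorted flag.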

✨ Enhancements

  • make get_any_value fallible (#5877)
  • directly push all operator result into sink, prev… (#5856)
  • add sink_parquet (#5480)
  • Support parsing more float string representations. (#5824)
  • implement mean aggregation for duration (#5807)
  • implement sensible boolean aggregates (#5806)
  • allow expression as quantile input (#5751)
  • accept expression in str.extract_all (#5742)
  • tz-aware strptime (#5736)
  • Add "fmt_no_tty" feature for formatting support without r… (#5725)
  • lazy diagonal concat. (#5647)
  • to_struct add upper_bound (#5714)
  • inversely scale chunk_size with thread count in s… (#5699)
  • add streaming minmax (#5693)
  • improve dynamic inference of anyvalues and structs (#5690)
  • support is_in for boolean dtype (#5682)
  • add a cache to strptime (#5628)
  • add nearest interpolation strategy (#5626)
  • make cast recursive (#5596)
  • add arg_min/arg_max for series of dtype boolean (#5592)
  • prefer streaming groupby if partitionable (#5580)
  • make map_alias fallible (#5532)
  • pl.min & pl.max accept wildcard similar to pl.sum (#5511)
  • add predicate pushdown to anonymous_scan (#5467)
  • make streaming work with multiple sinks in a sing… (#5474)
  • add streaming slice operation (#5466)
  • run partial streaming queries (#5464)
  • streaming left joins (#5456)
  • file statistics so we only (try to) keep smallest table in memory (#5454)
  • streaming inner joins. (#5400)
  • build_info() provides detailed information how polars was built (#5423)
  • add missing width property to LazyFrame (#5431)
  • allow regex and wildcard in groupby (#5425)
  • Streaming joins architecture and Cross join implementation. (#5339)
  • add support for am/pm notation in parse_dates read_csv (#5373)
  • add reduce/cumreduce expression as an easier fold (#5364)
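
The last entry describes reduce/cumreduce as "an easier fold". One way to read that: a fold needs an explicit initial accumulator, while a reduce starts from the first column, and the cumulative variant keeps every intermediate result. The sketch below illustrates this on plain vectors standing in for columns. It does not use the Polars API, and both function names are hypothetical.

```rust
/// Combine columns element-wise from left to right.
/// The first column serves as the initial accumulator, so no seed value
/// is needed. Returns None when there are no columns.
fn reduce_columns<F>(columns: &[Vec<i64>], f: F) -> Option<Vec<i64>>
where
    F: Fn(i64, i64) -> i64,
{
    columns.iter().cloned().reduce(|acc, col| {
        acc.iter().zip(col.iter()).map(|(&a, &b)| f(a, b)).collect()
    })
}

/// Same as `reduce_columns`, but keeps every intermediate accumulator.
fn cumreduce_columns<F>(columns: &[Vec<i64>], f: F) -> Vec<Vec<i64>>
where
    F: Fn(i64, i64) -> i64,
{
    let mut out: Vec<Vec<i64>> = Vec::with_capacity(columns.len());
    for col in columns {
        let next = match out.last() {
            // The first column is passed through unchanged.
            None => col.clone(),
            Some(acc) => acc.iter().zip(col).map(|(&a, &b)| f(a, b)).collect(),
        };
        out.push(next);
    }
    out
}
```

With the columns `[1, 2]`, `[10, 20]` and `[100, 200]` and addition as the function, the reduce gives `[111, 222]`, and the cumulative version gives `[1, 2]`, `[11, 22]`, `[111, 222]`.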

🐞 Bug fixes

  • fix lazy swapping rename (#5884)
  • improve equality consistency between types (#5873)
  • evaluate whole branch expression to determine if r… (#5864)
  • fix top_k on empty (#5865)
  • fix slice in streaming (#5854)
  • correct invalid type in struct anyvalue access (#5844)
  • don't set fast_explode if null values in list (#5838)
  • duration formatting (#5837)
  • respect fetch in union (#5836)
  • keep f32 dtype in fill_null by int (#5834)
  • err on epoch on time dtype (#5831)
  • fix panic in hmean (#5808)
  • asof join by logical groups (#5805)
  • fix parquet regression upstream in arrow2 (#5797)
  • Fix lazy cumsum and cumprod result types (#5792)
  • fix nested writer (#5777)
  • summation on empty series evaluates to Some(0) (#5773)
  • empty concat utf8 (#5768)
  • projection pushdown with union and asof join (#5763)
  • check null values in asof_join + groupby (#5756)
  • fix generic streaming groupby on logical types (#5752)
  • fix date_range on expressions (#5750)
  • fix dtypes in join_asof_by (#5746)
  • fix group order in binary aggregation (#5744)
  • implement min/max aggregation for utf8 in groupby (#5737)
  • fix all_null/sorted into_groups panic (#5733)
  • asof join 'by', 'forward' combination (#5720)
  • fix pivot on floating point indexes (#5704)
  • fix arange with column/literal input (#5703)
  • fix double projection that leads to uneven union d… (#5700)
  • Fix a bug in floating regex handling used in CSV type inference (#5695)
  • fix asof join schema (#5686)
  • fix owned arithmetic schema (#5685)
  • take glob into account in scan_csv 'with_schema_mo… (#5683)
  • fix boolean schema in agg_max/min (#5678)
  • fix boolean arg-max if all equal (#5680)
  • early error on duplicate names in streaming groupby (#5638)
  • fix streaming groupby aggregate types (#5636)
  • convert panic to err in concat_list (#5637)
  • fix dot diagram of single nodes (#5624)
  • fix dynamic struct inference (#5619)
  • keep dtype when eval on empty list (#5597)
  • fix ternary with list output on empty frame (#5595)
  • fix tz-awareness of truncate (#5591)
  • check chunks before doing chunked_id join optimiza… (#5589)
  • invert cast_time_zone conversion (#5587)
  • asof join ensure join column is not dropped when '… (#5585)
  • fix ub due to invalid dtype on splitting dfs (#5579)
  • fix projection pushdown in asof joins (#5542)
  • streaming hstack allow duplicates (#5538)
  • fix streaming empty join panic (#5534)
  • fix duplicate caches in cse and prevent quadratic … (#5528)
  • allow appending categoricals that are all null (#5526)
  • tz-aware strftime (#5525)
  • make 'truncate' tz-aware (#5522)
  • fix coalesce expression expansion (#5521)
  • fix nested aggregation in when then and window expr… (#5520)
  • fix sort_by expression if groups already aggregated (#5518)
  • fix bug in batched parquet reader that dropped dfs… (#5506)
  • fix bugs in skew and kurtosis (#5484)
  • compute correct offset for streaming join on multi… (#5479)
  • return error on invalid sortby expression (#5478)
  • add missing AnyValueBuffer specialisation for Duration dtype (#5436)
  • fix freeze/stall when writing more than 2^31 string values to parquet (#5366)
  • properly handle json with unclosed strings (#5427)
  • fix null poisoning in rank operation (#5417)
  • correct expr::diff dtype for temporal columns (#5416)
  • fix cse for nested caches (#5412)
  • don't set sorted flag in argsort (#5410)
  • explicit nan comparison in min/max agg (#5403)
  • Correct CSV row indexing (#5385)
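
The fix for summation on an empty series (#5773) reflects a general point: a sum has an identity element (zero), so an empty input still has a well-defined result, whereas a maximum does not. The sketch below illustrates this with `Option<i64>` standing in for nullable values. It is not Polars code, the function names are hypothetical, and treating an all-null input the same as an empty one is an assumption of this sketch.

```rust
/// Sum of the non-null values. An empty input (or one holding only
/// nulls, in this sketch) yields Some(0), because 0 is the identity
/// element for addition.
fn sum_skip_nulls(values: &[Option<i64>]) -> Option<i64> {
    Some(values.iter().flatten().sum())
}

/// Maximum of the non-null values. There is no identity element for
/// max, so an input without any values yields None.
fn max_skip_nulls(values: &[Option<i64>]) -> Option<i64> {
    values.iter().flatten().copied().max()
}
```

Callers that previously matched on a missing result to detect an empty sum would see a zero instead under this behaviour.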

🛠️ Other improvements

  • Update rustc and fix clippy (#5880)
  • update arrow (#5862)
  • move join dispatch to polars-ops (#5809)
  • Remove dbg statement from union (#5791)
  • Continue removing compilation warnings (#5778)
  • shrink anyvalue size (#5770)
  • update arrow (#5766)
  • change allow_streaming to streaming (#5747)
  • remove rev-map from ChunkedArray (#5721)
  • simplify fast projection by schema (#5716)
  • Reindent df! docs code (#5698)
  • remove Series::append_array (#5681)
  • Remove unused symbols and unneeded mut qualifier (#5672)
  • Include license files in Rust crates (#5675)
  • Use NaiveTime::from_hms_opt instead of NaiveTime::from_hms (#5664)
  • use xxhash3 for string types (#5617)
  • iso weekday (#5598)
  • Improve contributing guide (#5558)
  • streaming improvements (#5541)
  • Refer to DataFrame::unique instead of distinct (#5482)
  • don't panic if part of query cannot run strea… (#5458)
  • make generic join builder more dry (#5439)
  • use IdHash for streaming groupby generic (#5435)
  • fix freeze/stall when writing more than 2^31 string values to parquet (#5366)

Thank you to all our contributors for making this release possible! @AnatolyBuga, @CalOmnie, @Kuhlwein, @MarcoGorelli, @OneRaynyDay, @YuRiTan, @alexander-beedie, @andrewpollack, @ankane, @braaannigan, @chitralverma, @dannyvankooten, @ghais, @ghuls, @jjerphan, @matteosantama, @messense, @owrior, @pickfire, @ritchie46, @s1ck, @sa-, @slonik-az, @sorhawell, @stinodego, @universalmind303 and @zundertj