⚠️ Breaking changes
- remove Series::append_array (#5681)
- iso weekday (#5598)
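
For readers unsure what the weekday change implies: assuming the "iso weekday" entry refers to ISO 8601 numbering (Monday = 1 through Sunday = 7) rather than a zero-based scheme, the two conventions can be sketched with Python's standard library. This only illustrates the numbering and is not Polars code; `weekday_conventions` is a hypothetical helper.

```python
from datetime import date


def weekday_conventions(d: date) -> tuple[int, int]:
    """Return (zero_based, iso) weekday numbers for a date.

    zero_based: Monday = 0 ... Sunday = 6 (datetime.date.weekday)
    iso:        Monday = 1 ... Sunday = 7 (datetime.date.isoweekday)
    """
    return d.weekday(), d.isoweekday()


# 2022-12-25 was a Sunday.
zero_based, iso = weekday_conventions(date(2022, 12, 25))
print(zero_based, iso)  # 6 7
```

Code that indexes into a seven-element lookup table with the weekday value is the typical place where such a change needs attention.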
🚀 Performance improvements
- improve reducing window function performance ~33% (#5878)
- improve performance of reducing window functions with numeric output ~-14% (#5841)
- set_sorted flag when creating from literal (#5728)
- use sorted fast path in streaming groupby (#5727)
- ensure fast_explode propagates (#5676)
- fix quadratic time complexity of groupby in stream… (#5614)
- Aggregate projection pushdown (#5556)
- improve streaming primitive groupby (#5575)
- vectorize integer vec-hash by using very simple, … (#5572)
- specialized utf8 groupby in streaming (#5535)
✨ Enhancements
- make get_any_value fallible (#5877)
- directly push all operator result into sink, prev… (#5856)
- add sink_parquet (#5480)
- Support parsing more float string representations. (#5824)
- implement mean aggregation for duration (#5807)
- implement sensible boolean aggregates (#5806)
- allow expression as quantile input (#5751)
- accept expression in str.extract_all (#5742)
- tz-aware strptime (#5736)
- Add "fmt_no_tty" feature for formatting support without r… (#5725)
- lazy diagonal concat. (#5647)
- to_struct add upper_bound (#5714)
- inversely scale chunk_size with thread count in s… (#5699)
- add streaming minmax (#5693)
- improve dynamic inference of anyvalues and structs (#5690)
- support is_in for boolean dtype (#5682)
- add a cache to strptime (#5628)
- add nearest interpolation strategy (#5626)
- make cast recursive (#5596)
- add arg_min/arg_max for series of dtype boolean (#5592)
- prefer streaming groupby if partitionable (#5580)
- make map_alias fallible (#5532)
- pl.min & pl.max accept wildcard similar to pl.sum (#5511)
- add predicate pushdown to anonymous_scan (#5467)
- make streaming work with multiple sinks in a sing… (#5474)
- add streaming slice operation (#5466)
- run partial streaming queries (#5464)
- streaming left joins (#5456)
- file statistics so we only (try to) keep smallest table in memory (#5454)
- streaming inner joins. (#5400)
- build_info() provides detailed information on how polars was built (#5423)
- add missing width property to LazyFrame (#5431)
- allow regex and wildcard in groupby (#5425)
- Streaming joins architecture and Cross join implementation. (#5339)
- add support for am/pm notation in parse_dates read_csv (#5373)
- add reduce/cumreduce expression as an easier fold (#5364)
🐞 Bug fixes
- fix lazy swapping rename (#5884)
- improve equality consistency between types (#5873)
- evaluate whole branch expression to determine if r… (#5864)
- fix top_k on empty (#5865)
- fix slice in streaming (#5854)
- correct invalid type in struct anyvalue access (#5844)
- don't set fast_explode if null values in list (#5838)
- duration formatting (#5837)
- respect fetch in union (#5836)
- keep f32 dtype in fill_null by int (#5834)
- err on epoch on time dtype (#5831)
- fix panic in hmean (#5808)
- asof join by logical groups (#5805)
- fix parquet regression upstream in arrow2 (#5797)
- Fix lazy cumsum and cumprod result types (#5792)
- fix nested writer (#5777)
- fix(rust, python) Summation on empty series evaluates to Some(0) (#5773)
- empty concat utf8 (#5768)
- projection pushdown with union and asof join (#5763)
- check null values in asof_join + groupby (#5756)
- fix generic streaming groupby on logical types (#5752)
- fix date_range on expressions (#5750)
- fix dtypes in join_asof_by (#5746)
- fix group order in binary aggregation (#5744)
- implement min/max aggregation for utf8 in groupby (#5737)
- fix all_null/sorted into_groups panic (#5733)
- asof join 'by', 'forward' combination (#5720)
- fix pivot on floating point indexes (#5704)
- fix arange with column/literal input (#5703)
- fix double projection that leads to uneven union d… (#5700)
- Fix a bug in floating regex handling used in CSV type inference (#5695)
- fix asof join schema (#5686)
- fix owned arithmetic schema (#5685)
- take glob into account in scan_csv 'with_schema_mo… (#5683)
- fix boolean schema in agg_max/min (#5678)
- fix boolean arg-max if all equal (#5680)
- early error on duplicate names in streaming groupby (#5638)
- fix streaming groupby aggregate types (#5636)
- convert panic to err in concat_list (#5637)
- fix dot diagram of single nodes (#5624)
- fix dynamic struct inference (#5619)
- keep dtype when eval on empty list (#5597)
- fix ternary with list output on empty frame (#5595)
- fix tz-awareness of truncate (#5591)
- check chunks before doing chunked_id join optimiza… (#5589)
- invert cast_time_zone conversion (#5587)
- asof join ensure join column is not dropped when '… (#5585)
- fix ub due to invalid dtype on splitting dfs (#5579)
- fix(rust, python); fix projection pushdown in asof joins (#5542)
- streaming hstack allow duplicates (#5538)
- fix streaming empty join panic (#5534)
- fix duplicate caches in cse and prevent quadratic … (#5528)
- allow appending categoricals that are all null (#5526)
- tz-aware strftime (#5525)
- make 'truncate' tz-aware (#5522)
- fix coalesce expression expansion (#5521)
- fix nested aggregation in when then and window expr… (#5520)
- fix sort_by expression if groups already aggregated (#5518)
- fix bug in batched parquet reader that dropped dfs… (#5506)
- fix bugs in skew and kurtosis (#5484)
- compute correct offset for streaming join on multi… (#5479)
- return error on invalid sortby expression (#5478)
- add missing AnyValueBuffer specialisation for Duration dtype (#5436)
- fix freeze/stall when writing more than 2^31 string values to parquet (#5366)
- properly handle json with unclosed strings (#5427)
- fix null poisoning in rank operation (#5417)
- correct expr::diff dtype for temporal columns (#5416)
- fix cse for nested caches (#5412)
- don't set sorted flag in argsort (#5410)
- explicit nan comparison in min/max agg (#5403)
- Correct CSV row indexing (#5385)
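
As background for the "explicit nan comparison in min/max agg" entry above: a naive minimum built on `<` is order-dependent when NaN is present, because every comparison with NaN is false. The sketch below shows that general pitfall with Python's built-in `min` and one possible explicit policy (ignoring NaN). The policy and the helper name `nan_aware_min` are assumptions for illustration, not a description of what Polars does internally.

```python
import math


def nan_aware_min(values):
    """Minimum that skips NaN; returns NaN only if no other value exists."""
    non_nan = [v for v in values if not math.isnan(v)]
    return min(non_nan) if non_nan else math.nan


nan = math.nan

# Built-in min keeps its first element unless a later one compares smaller,
# so the result depends on where the NaN sits.
print(min([nan, 1.0]), min([1.0, nan]))

# With an explicit NaN check, the result no longer depends on order.
print(nan_aware_min([nan, 1.0]), nan_aware_min([1.0, nan]))
```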
🛠️ Other improvements
- Update rustc and fix clippy (#5880)
- update arrow (#5862)
- move join dispatch to polars-ops (#5809)
- Remove dbg statement from union (#5791)
- Continue removing compilation warnings (#5778)
- shrink anyvalue size (#5770)
- update arrow (#5766)
- chore(rust,python) Change allow_streaming to streaming (#5747)
- remove rev-map from ChunkedArray (#5721)
- simplify fast projection by schema (#5716)
- Reindent df! docs code (#5698)
- remove Series::append_array (#5681)
- Remove unused symbols and unneeded mut qualifier (#5672)
- Include license files in Rust crates (#5675)
- Use NaiveTime::from_hms_opt instead of NaiveTime::from_hms (#5664)
- use xxhash3 for string types (#5617)
- iso weekday (#5598)
- Improve contributing guide (#5558)
- streaming improvements (#5541)
- Refer to DataFrame::unique instead of distinct (#5482)
- don't panic if part of query cannot run strea… (#5458)
- make generic join builder more dry (#5439)
- use IdHash for streaming groupby generic (#5435)
- fix freeze/stall when writing more than 2^31 string values to parquet (#5366)
Thank you to all our contributors for making this release possible!
@AnatolyBuga, @CalOmnie, @Kuhlwein, @MarcoGorelli, @OneRaynyDay, @YuRiTan, @alexander-beedie, @andrewpollack, @ankane, @braaannigan, @chitralverma, @dannyvankooten, @ghais, @ghuls, @jjerphan, @matteosantama, @messense, @owrior, @pickfire, @ritchie46, @s1ck, @sa-, @slonik-az, @sorhawell, @stinodego, @universalmind303 and @zundertj