🚀 Performance improvements
- Address `group_by_dynamic` slowness in sparse data (#24916)
- Push filters to PyIceberg (#24910)
- Native `filter`/`drop_nulls`/`drop_nans` in group-by context (#24897)
- Implement `cumulative_eval` using the group-by engine (#24889)
- Prevent generation of copies of `DataFrame`s in `DslPlan` serialization (#24852)
- Implement native `null_count`, `any` and `all` group-by aggregations (#24859)
- Speed up `reverse` in group-by context (#24855)
- Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
- Don't check duplicates on streaming simple projection in release mode (#24830)
- Lower approx_n_unique to the streaming engine (#24821)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Use native reducer for `first`/`last` on Decimals, Categoricals and Enums (#24786)
- Implement indexed method for `BitMapIter::nth` (#24766)
- Pushdown slices on plans within unions (#24735)
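The `BitMapIter::nth` entry above (#24766) concerns jumping to the n-th set bit of a validity bitmap instead of iterating to it. As a rough illustration only (this is not Polars' Rust code), the sketch below assumes a bitmap stored as 64-bit words, least-significant bit first, and skips whole words by their popcount before locating the bit inside the target word. `nth_set_bit` and `nth_set_bit_in_bitmap` are hypothetical names.

```python
from typing import List, Optional

MASK_64 = (1 << 64) - 1


def nth_set_bit(word: int, n: int) -> Optional[int]:
    """Index of the n-th (0-based) set bit in a 64-bit word, or None."""
    word &= MASK_64  # treat the input as an unsigned 64-bit word
    for _ in range(n):
        if word == 0:
            return None
        word &= word - 1  # clear the lowest set bit
    if word == 0:
        return None
    return (word & -word).bit_length() - 1  # position of the lowest set bit


def nth_set_bit_in_bitmap(words: List[int], n: int) -> Optional[int]:
    """Hypothetical helper: index of the n-th set bit in a bitmap of 64-bit
    words (LSB first), skipping whole words by popcount."""
    for i, word in enumerate(words):
        count = bin(word & MASK_64).count("1")  # popcount of this word
        if n < count:
            return i * 64 + nth_set_bit(word, n)
        n -= count  # every set bit in this word precedes the target
    return None
```

For example, `nth_set_bit_in_bitmap([0b1, 0b110], 1)` skips the first word after one popcount and returns 65. Skipping a word costs one popcount rather than up to 64 single-bit steps.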
✨ Enhancements
- Add environment variable to roundtrip empty struct in Parquet (#24914)
- Fast-count for `scan_iceberg().select(len())` (#24602)
- Add `glob` parameter to `scan_ipc` (#24898)
- Prevent generation of copies of `DataFrame`s in `DslPlan` serialization (#24852)
- Add `list.agg` and `arr.agg` (#24790)
- Implement `{Expr,Series}.rolling_rank()` (#24776)
- Don't require PyArrow for `read_database_uri` if the ADBC engine version supports the PyCapsule interface (#24029)
- Make `Series` init consistent with `DataFrame` init for string values declared with a temporal dtype (#24785)
- Support MergeSorted in CSPE (#24805)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Recursively apply CSPE (#24798)
- Add streaming engine per-node metrics (#24788)
- Add `arr.eval` (#24472)
- Drop PyArrow requirement for non-batched usage of `read_database` with the ADBC engine and support `iter_batches` with the ADBC engine (#24180)
- Improve rolling_(sum|mean) accuracy (#24743)
- Add `separator` to `{Data,Lazy}Frame.unnest` (#24716)
- Add `union()` function for unordered concatenation (#24298)
- Add `name.replace` to the set of column rename options (#17942)
- Support `np.ndarray` -> `AnyValue` conversion (#24748)
- Allow duration strings with leading "+" (#24737)
- Drop now-unnecessary post-init `schema_overrides` cast on `DataFrame` load from list of dicts (#24739)
- Add support for UInt128 to pyo3-polars (#24731)
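Two entries above touch duration strings: #24737 accepts a leading "+", and #24771 speeds up duration/interval parsing. The sketch below shows one way such strings can be tokenized into `(count, unit)` pairs. It is not Polars' parser: `parse_duration` is a hypothetical helper, and its unit list is an assumption modelled on the units Polars documents for duration strings (`ns`, `us`, `ms`, `s`, `m`, `h`, `d`, `w`, `mo`, `q`, `y`, `i`).

```python
import re
from typing import List, Tuple

# Assumed unit vocabulary; multi-letter units ("ns", "us", "ms", "mo") are
# listed before the one-letter units they start with, so they win the match.
_UNITS = "ns|us|ms|mo|s|m|h|d|w|q|y|i"
_PART = re.compile(rf"(\d+)({_UNITS})")
_WHOLE = re.compile(rf"([+-]?)((?:\d+(?:{_UNITS}))+)")


def parse_duration(text: str) -> Tuple[int, List[Tuple[int, str]]]:
    """Hypothetical helper: split e.g. "+1d12h" into (sign, [(1, "d"), (12, "h")])."""
    match = _WHOLE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration string: {text!r}")
    sign = -1 if match.group(1) == "-" else 1  # a leading "+" is accepted and ignored
    parts = [(int(count), unit) for count, unit in _PART.findall(match.group(2))]
    return sign, parts
```

With this sketch, `"+1d12h"` and `"1d12h"` produce the same result, and `"1m30s"` splits into one minute and thirty seconds rather than being misread as milliseconds.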
🐞 Bug fixes
- Properly release the GIL for `read_parquet_metadata` (#24922)
- Broadcast `partition_by` columns in `over` expression (#24874)
- Clear index cache on stacked `df.filter` expressions (#24870)
- Fix 'explode' mapping strategy on scalar value (#24861)
- Fix repeated `with_row_index()` after `scan()` being silently ignored (#24866)
- Correctly return min and max for enums in groupby aggregation (#24808)
- Refactor `BinaryExpr` in `group_by` dispatch logic (#24548)
- Fix aggstate for `gather` (#24857)
- Keep scalars for length-preserving functions in `group_by` (#24819)
- Have `range` feature depend on `dtype-array` feature (#24853)
- Fix duplicate select panic (#24836)
- Inconsistency of list.sum() result type with None values (#24476)
- Division by zero in Expr.dt.truncate (#24832)
- Potential deadlock in __arrow_c_stream__ (#24831)
- Allow double aggregations in group-by contexts (#24823)
- Series.shrink_dtype for i128/u128 (#24833)
- Fix dtype in `EvalExpr` (#24650)
- Allow aggregations on `AggState::LiteralScalar` (#24820)
- Dispatch to `group_aware` for fallible expressions with masked-out elements (#24815)
- Fix error for `arr.sum()` on small integer Array dtypes containing nulls (#24478)
- Fix regression on `write_database()` to Snowflake due to unsupported string view type (#24622)
- Fix XOR not following Kleene logic when one side is unit-length (#24810)
- Make `Series` init consistent with `DataFrame` init for string values declared with a temporal dtype (#24785)
- Incorrect precision in Series.str.to_decimal (#24804)
- Use `overlapping` instead of `rolling` (#24787)
- Fix iterable on `dynamic_group_by` and `rolling` objects (#24740)
- Use Kahan summation for in-memory groupby sum/mean (#24774)
- Release GIL in PythonScan predicate evaluation (#24779)
- Type error in `bitmask::nth_set_bit_u64` (#24775)
- Add `Expr.sign` for Decimal datatype (#24717)
- Correct `str.replace` with missing pattern (#24768)
- Ensure `schema_overrides` is respected when loading iterable row data (#24721)
- Support `decimal_comma` on Decimal type in `write_csv` (#24718)
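#24774 switches the in-memory group-by sum/mean to Kahan summation, and #24743 targets the accuracy of `rolling_(sum|mean)`. The sketch below illustrates compensated (Kahan) summation in plain Python, plus a per-group mean that keeps one compensated accumulator per key. Both `kahan_sum` and `grouped_mean` are hypothetical helpers for illustration, not Polars internals.

```python
from typing import Dict, Hashable, Iterable, Tuple


def kahan_sum(values: Iterable[float]) -> float:
    """Compensated (Kahan) summation of a sequence of floats."""
    total = 0.0
    compensation = 0.0  # negated low-order bits lost by the previous addition
    for x in values:
        y = x - compensation  # re-inject the previously lost part
        t = total + y  # low-order bits of y may be rounded away here...
        compensation = (t - total) - y  # ...and this recovers them (negated)
        total = t
    return total


def grouped_mean(keys: Iterable[Hashable], values: Iterable[float]) -> Dict[Hashable, float]:
    """Hypothetical helper: per-key mean with one Kahan accumulator per group."""
    acc: Dict[Hashable, Tuple[float, float, int]] = {}  # key -> (total, compensation, count)
    for key, x in zip(keys, values):
        total, comp, n = acc.get(key, (0.0, 0.0, 0))
        y = x - comp
        t = total + y
        acc[key] = (t, (t - total) - y, n + 1)
    return {key: total / n for key, (total, _, n) in acc.items()}
```

For `[1.0] + [1e-16] * 10`, a plain running total stays at exactly 1.0 because each 1e-16 is smaller than half an ulp of 1.0, whereas `kahan_sum` carries the lost bits forward and returns a value just above 1.0.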
📖 Documentation
- Add partitioning examples for `sink_*` methods (#24918)
- Add more `{unique,value}_counts` examples (#24927)
- Indent the versionchanged (#24783)
- Relax fsspec wording (#24881)
- Add `pl.field` to the API docs (#24846)
- Fix duplicated article in SECURITY.md (#24762)
- Document output name determination in when/then/otherwise (#24746)
- Specify that precision=None becomes 38 for Decimal (#24742)
- Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
- Fix source mapping (#24736)
📦 Build system
- Update pyo3 and numpy crates to version 0.26 (#24760)
🛠️ Other improvements
- Re-use iterators in `set_` operations (#24850)
- Remove `GroupByPartitioned` and dispatch to streaming engine (#24903)
- Turn `element()` into `{A,}Expr::Element` (#24885)
- Pass `ScanOptions` to `new_from_ipc` (#24893)
- Update tests to be index type agnostic (#24891)
- Unset `Context` in Window expression (#24875)
- Fix failing delta test (#24867)
- Move `FunctionExpr` dispatch from plan to expr (#24839)
- Fix SQL test giving wrong error message (#24835)
- Consolidate dtype paths in `ApplyExpr` (#24825)
- Add `days_in_month` to documentation (#24822)
- Enable ruff D417 lint (#24814)
- Turn `pl.format` into proper elementwise expression (#24811)
- Fix remote benchmark by no longer saving builds (#24812)
- Refactor `ApplyExpr` in `group_by` context on multiple inputs (#24520)
- IR text plan graph generator (#24733)
- Temporarily pin pydantic to fix CI (#24797)
- Extend and rename `rolling` groups to `overlapping` (#24577)
- Refactor `DataType` proptest strategies (#24763)
- Add `union` to documentation (#24769)
Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @nameexhaustion, @orlp, @pavelzw, @ritchie46, @thomasjpfan and @williambdean