🏆 Highlights
- Stabilize decimal (#25020)
🚀 Performance improvements
- Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
- Lower unique to native group-by and speed up n_unique in group-by context (#24976)
- Better parallelize take{_slice,}_unchecked (#24980)
- Implement native skew and kurtosis in group-by context (#24961)
- Use native group-by aggregations for bitwise_* operations (#24935)
- Address group_by_dynamic slowness in sparse data (#24916)
- Push filters to PyIceberg (#24910)
- Native filter/drop_nulls/drop_nans in group-by context (#24897)
- Implement cumulative_eval using the group-by engine (#24889)
- Prevent generation of copies of DataFrames in DslPlan serialization (#24852)
- Implement native null_count, any and all group-by aggregations (#24859)
- Speed up reverse in group-by context (#24855)
- Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
- Don't check duplicates on streaming simple projection in release mode (#24830)
- Lower approx_n_unique to the streaming engine (#24821)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
- Implement indexed method for BitMapIter::nth (#24766)
- Pushdown slices on plans within unions (#24735)
✨ Enhancements
- Stabilize decimal (#25020)
- Support ewm_mean() in streaming engine (#25003)
- Improve row-count estimates (#24996)
- Remove filtered scan paths in IR when possible (#24974)
- Introduce remote Polars MCP server (#24977)
- Allow local scans on polars cloud (configurable) (#24962)
- Add Expr.item to strictly extract a single value from an expression (#24888)
- Add environment variable to roundtrip empty struct in Parquet (#24914)
- Fast-count for scan_iceberg().select(len()) (#24602)
- Add glob parameter to scan_ipc (#24898)
- Prevent generation of copies of DataFrames in DslPlan serialization (#24852)
- Add list.agg and arr.agg (#24790)
- Implement {Expr,Series}.rolling_rank() (#24776)
- Don't require PyArrow for read_database_uri if ADBC engine version supports PyCapsule interface (#24029)
- Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
- Support MergeSorted in CSPE (#24805)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Recursively apply CSPE (#24798)
- Add streaming engine per-node metrics (#24788)
- Add arr.eval (#24472)
- Drop PyArrow requirement for non-batched usage of read_database with the ADBC engine and support iter_batches with the ADBC engine (#24180)
- Improve rolling_(sum|mean) accuracy (#24743)
- Add separator to {Data,Lazy}Frame.unnest (#24716)
- Add union() function for unordered concatenation (#24298)
- Add name.replace to the set of column rename options (#17942)
- Support np.ndarray -> AnyValue conversion (#24748)
- Allow duration strings with leading "+" (#24737)
- Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
- Add support for UInt128 to pyo3-polars (#24731)
🐞 Bug fixes
- Re-enable CPU feature check before import (#25010)
- Implement read_excel workaround for fastexcel/calamine issue loading a column subset from a named table (#25012)
- Fix correctness of any(ignore_nulls) and out-of-bounds (OOB) access in all (#25005)
- Streaming any/all with ignore_nulls=False (#25008)
- Fix incorrect join_asof on a casted expression (#25006)
- Optimize memory on rolling groups in ApplyExpr (#24709)
- Fall back PyArrow scan to in-memory engine (#24991)
- Make Operator::swap_operands return correct operators for Plus, Minus, Multiply and Divide (#24997)
- Capitalize letters after numbers in to_titlecase (#24993)
- Preserve null values in pct_change (#24952)
- Raise length mismatch on over with sliced groups (#24887)
- Check duplicate name in transpose (#24956)
- Follow Kleene logic in any / all for group-by (#24940)
- Do not optimize cross join to iejoin if order maintaining (#24950)
- Fix typing of scan_parquet partially unknown (#24928)
- Properly release the GIL for read_parquet_metadata (#24922)
- Broadcast partition_by columns in over expression (#24874)
- Clear index cache on stacked df.filter expressions (#24870)
- Fix 'explode' mapping strategy on scalar value (#24861)
- Fix repeated with_row_index() after scan() silently ignored (#24866)
- Correctly return min and max for enums in groupby aggregation (#24808)
- Refactor BinaryExpr in group_by dispatch logic (#24548)
- Fix aggstate for gather (#24857)
- Keep scalars for length preserving functions in group_by (#24819)
- Have range feature depend on dtype-array feature (#24853)
- Fix duplicate select panic (#24836)
- Inconsistency of list.sum() result type with None values (#24476)
- Division by zero in Expr.dt.truncate (#24832)
- Potential deadlock in __arrow_c_stream__ (#24831)
- Allow double aggregations in group-by contexts (#24823)
- Series.shrink_dtype for i128/u128 (#24833)
- Fix dtype in EvalExpr (#24650)
- Allow aggregations on AggState::LiteralScalar (#24820)
- Dispatch to group_aware for fallible expressions with masked out elements (#24815)
- Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
- Fix regression on write_database() to Snowflake due to unsupported string view type (#24622)
- Fix XOR not following Kleene logic when one side is unit-length (#24810)
- Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
- Incorrect precision in Series.str.to_decimal (#24804)
- Use overlapping instead of rolling (#24787)
- Fix iterable on dynamic_group_by and rolling object (#24740)
- Use Kahan summation for in-memory groupby sum/mean (#24774)
- Release GIL in PythonScan predicate evaluation (#24779)
- Type error in bitmask::nth_set_bit_u64 (#24775)
- Add Expr.sign for Decimal datatype (#24717)
- Correct str.replace with missing pattern (#24768)
- Ensure schema_overrides is respected when loading iterable row data (#24721)
- Support decimal_comma on Decimal type in write_csv (#24718)
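
One of the fixes above switches in-memory group-by sum/mean to Kahan summation (#24774). As a rough illustration of why compensated summation matters, here is a minimal pure-Python sketch. It is not the Polars implementation; the function names `naive_sum` and `kahan_sum` and the sample values are assumptions chosen for this example only.

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation.

    A running compensation term carries the low-order bits that are
    lost when a small value is added to a large running total.
    """
    total = 0.0
    compensation = 0.0  # rounding error accumulated so far
    for x in values:
        y = x - compensation            # apply the pending correction
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


# Hypothetical data: two small values sandwiched between large ones.
values = [1e16, 1.0, 1.0, -1e16]
print(naive_sum(values))  # 0.0 (both 1.0 values are rounded away)
print(kahan_sum(values))  # 2.0
```

The naive loop loses both small addends because 1.0 is below the rounding granularity at 1e16, while the compensated loop carries the lost amount forward and recovers the exact result for this input.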
📖 Documentation
- Introduce remote Polars MCP server (#24977)
- Add {arr,list}.agg API references (#24970)
- Support LLM in docs (#24958)
- Update Cloud docs with correct fn argument order (#24939)
- Update name.replace examples (#24941)
- Add i128 and u128 features to user guide (#24938)
- Add partitioning examples for sink_* methods (#24918)
- Add more {unique,value}_counts examples (#24927)
- Indent the versionchanged (#24783)
- Relax fsspec wording (#24881)
- Add pl.field into the api docs (#24846)
- Fix duplicated article in SECURITY.md (#24762)
- Document output name determination in when/then/otherwise (#24746)
- Specify that precision=None becomes 38 for Decimal (#24742)
- Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
- Fix source mapping (#24736)
📦 Build system
- Ensure build_feature_flags.py is included in artifact (#25024)
- Update pyo3 and numpy crates to version 0.26 (#24760)
🛠️ Other improvements
- Fix benchmark ci (#25019)
- Fix non-deterministic test (#25009)
- Fix makefile arch detection (#25011)
- Make LazyFrame.set_sorted into a FunctionIR::Hint (#24981)
- Remove symbolic links (#24982)
- Deprecate Expr.agg_groups() and pl.groups() (#24919)
- Dispatch to no-op rayon thread-pool from streaming (#24957)
- Unpin pydantic (#24955)
- Ensure safety of scan fast-count IR lowering in streaming (#24953)
- Re-use iterators in set_ operations (#24850)
- Remove GroupByPartitioned and dispatch to streaming engine (#24903)
- Turn element() into {A,}Expr::Element (#24885)
- Pass ScanOptions to new_from_ipc (#24893)
- Update tests to be index type agnostic (#24891)
- Unset Context in Window expression (#24875)
- Fix failing delta test (#24867)
- Move FunctionExpr dispatch from plan to expr (#24839)
- Fix SQL test giving wrong error message (#24835)
- Consolidate dtype paths in ApplyExpr (#24825)
- Add days_in_month to documentation (#24822)
- Enable ruff D417 lint (#24814)
- Turn pl.format into proper elementwise expression (#24811)
- Fix remote benchmark by no longer saving builds (#24812)
- Refactor ApplyExpr in group_by context on multiple inputs (#24520)
- IR text plan graph generator (#24733)
- Temporarily pin pydantic to fix CI (#24797)
- Extend and rename rolling groups to overlapping (#24577)
- Refactor DataType proptest strategies (#24763)
- Add union to documentation (#24769)
Thank you to all our contributors for making this release possible!
@EndPositive, @EnricoMi, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @carnarez, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @mjanssen, @nameexhaustion, @orlp, @pavelzw, @r-brink, @ritchie46, @thomasjpfan and @williambdean