🏆 Highlights
- Add LazyFrame.{sink,collect}_batches (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Lazy gather for {forward,backward}_fill in group-by contexts (#25115)
- Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
- Skip filtering scan IR if no paths were filtered (#25037)
- Optimize ipc stream read performance (#24671)
- Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
- Lower unique to native group-by and speed up n_unique in group-by context (#24976)
- Better parallelize take{_slice,}_unchecked (#24980)
- Implement native skew and kurtosis in group-by context (#24961)
- Use native group-by aggregations for bitwise_* operations (#24935)
- Address group_by_dynamic slowness in sparse data (#24916)
- Native filter/drop_nulls/drop_nans in group-by context (#24897)
- Implement cumulative_eval using the group-by engine (#24889)
- Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
- Implement native null_count, any and all group-by aggregations (#24859)
- Speed up reverse in group-by context (#24855)
- Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
- Don't check duplicates on streaming simple projection in release mode (#24830)
- Lower approx_n_unique to the streaming engine (#24821)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
- Implement indexed method for BitMapIter::nth (#24766)
- Pushdown slices on plans within unions (#24735)
- Optimize gather_every(n=1) to slice (#24704)
- Lower null count to streaming engine (#24703)
- Native streaming gather_every (#24700)
- Pushdown filter with strptime if input is literal (#24694)
- Avoid copying expanded paths (#24669)
- Relax filter expr ordering (#24662)
- Remove unnecessary groups call in aggregated (#24651)
- Skip files in scan_iceberg with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
✨ Enhancements
- Improve error message on unsupported SQL subquery comparisons (#25135)
- Rewrite IR::Scan to IR::DataFrameScan in expand_datasets when applicable (#25106)
- Support ewm_var/std in streaming engine (#25109)
- Make DSL-hash skippable (#25140)
- Streaming {Expr,LazyFrame}.rolling (#25058)
- Set polars/<version> user-agent (#25112)
- Add BIT_NOT support to the SQL interface (#25094)
- Support BYTE_ARRAY backed Decimals in Parquet (#25076)
- Add allow_empty flag to item (#25048)
- Support ewm_mean() in streaming engine (#25003)
- Improve row-count estimates (#24996)
- Remove filtered scan paths in IR when possible (#24974)
- Introduce remote Polars MCP server (#24977)
- Allow local scans on polars cloud (configurable) (#24962)
- Add Expr.item to strictly extract a single value from an expression (#24888)
- Add environment variable to roundtrip empty struct in Parquet (#24914)
- Add glob parameter to scan_ipc (#24898)
- Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
- Add list.agg and arr.agg (#24790)
- Implement {Expr,Series}.rolling_rank() (#24776)
- Support MergeSorted in CSPE (#24805)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Recursively apply CSPE (#24798)
- Add streaming engine per-node metrics (#24788)
- Add arr.eval (#24472)
- Improve rolling_(sum|mean) accuracy (#24743)
- Add nth_set_bit_u64() with unit test (#24035)
- Add separator to {Data,Lazy}Frame.unnest (#24716)
- Add union() function for unordered concatenation (#24298)
- Add name.replace to the set of column rename options (#17942)
- Allow duration strings with leading "+" (#24737)
- Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
- Add support for UInt128 to pyo3-polars (#24731)
- Implement maintain_order for cross join (#24665)
- Add support to output dt.total_{}() duration values as fractionals (#24598)
- Support scanning from file:/path URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add LazyFrame.{sink,collect}_batches (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
🐞 Bug fixes
- Fix CSV select(len()) off by 1 with comment prefix (#25069)
- Fix incorrect reshape on sliced lists (#25139)
- Support "index" as column name in group_by iterator (#25138)
- DSL_SCHEMA_HASH should not be changed by line endings (#25123)
- Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
- Fix panic in dt.truncate for invalid duration strings (#25124)
- Don't trigger DeprecationWarning from SQL "IN" constraints that use subqueries (#25111)
- Return the correct string-case Expr reprs (#25101)
- Fix groups update on slices with different offsets (#25097)
- Fix handling Null dtype in ApplyExpr on group_by (#25077)
- Raise error for all/any on list instead of panic (#25018)
- Unique key names in streaming sort/top_k (#25082)
- The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
- Fix panic if scan predicate produces 0 length mask (#25089)
- Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
- Panic in group_by_dynamic with group_by and multiple chunks (#25075)
- Fix panic when using struct field as join key (#25059)
- Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
- Fix field metadata for nested categorical PyCapsule export (#25052)
- Block predicate pushdown when group_by key values are changed (#25032)
- Group-By aggregation problems caused by AmortSeries (#25043)
- Don't push down predicates past inserted cache nodes (#25042)
- Allow for negative time in group_by_dynamic iterator (#25041)
- Re-enable CPU feature check before import (#25010)
- Correctness any(ignore_nulls) and OOB in all (#25005)
- Streaming any/all with ignore_nulls=False (#25008)
- Fix incorrect join_asof on a casted expression (#25006)
- Optimize memory on rolling groups in ApplyExpr (#24709)
- Fallback Pyarrow scan to in-memory engine (#24991)
- Make Operator::swap_operands return correct operators for Plus, Minus, Multiply and Divide (#24997)
- Capitalize letters after numbers in to_titlecase (#24993)
- Preserve null values in pct_change (#24952)
- Raise length mismatch on over with sliced groups (#24887)
- Check duplicate name in transpose (#24956)
- Follow Kleene logic in any / all for group-by (#24940)
- Do not optimize cross join to iejoin if order maintaining (#24950)
- Broadcast partition_by columns in over expression (#24874)
- Clear index cache on stacked df.filter expressions (#24870)
- Fix 'explode' mapping strategy on scalar value (#24861)
- Fix repeated with_row_index() after scan() silently ignored (#24866)
- Correctly return min and max for enums in groupby aggregation (#24808)
- Refactor BinaryExpr in group_by dispatch logic (#24548)
- Fix aggstate for gather (#24857)
- Keep scalars for length preserving functions in group_by (#24819)
- Have range feature depend on dtype-array feature (#24853)
- Fix duplicate select panic (#24836)
- Inconsistency of list.sum() result type with None values (#24476)
- Division by zero in Expr.dt.truncate (#24832)
- Potential deadlock in __arrow_c_stream__ (#24831)
- Allow double aggregations in group-by contexts (#24823)
- Series.shrink_dtype for i128/u128 (#24833)
- Fix dtype in EvalExpr (#24650)
- Allow aggregations on AggState::LiteralScalar (#24820)
- Dispatch to group_aware for fallible expressions with masked out elements (#24815)
- Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
- Fix XOR not following Kleene logic when one side is unit-length (#24810)
- Incorrect precision in Series.str.to_decimal (#24804)
- Use overlapping instead of rolling (#24787)
- Fix iterable on dynamic_group_by and rolling object (#24740)
- Use Kahan summation for in-memory groupby sum/mean (#24774)
- Release GIL in PythonScan predicate evaluation (#24779)
- Type error in bitmask::nth_set_bit_u64 (#24775)
- Add Expr.sign for Decimal datatype (#24717)
- Correct str.replace with missing pattern (#24768)
- Support decimal_comma on Decimal type in write_csv (#24718)
- Parse Decimal with comma as decimal separator in CSV (#24685)
- Make Categories pickleable (#24691)
- Shift on array within list (#24678)
- Fix handling of AggregatedScalar in ApplyExpr single input (#24634)
- Support reading of mixed compressed/uncompressed IPC buffers (#24674)
- Overflow in slice-slice optimization (#24658)
- Package discovery for setuptools (#24656)
- Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction (#24590)
- Remove inclusion of polars dir in runtime sdist/wheel (#24654)
- Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
- Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from polars-stream/diff to polars-plan/abs (#24613)
- Newline escaping in streaming show_graph (#24612)
- Do not allow inferring (-1) the dimension on any Expr.reshape dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Have log() prioritize the leftmost dtype for its output dtype (#24581)
- CSV pl.len() was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make PlPath::join for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Escape backslashes in EscapeLabel to produce valid DOT labels (#24532)
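
For context on the Kahan summation entry above (#24774): the following is a minimal pure-Python sketch of the general technique, not the Polars implementation, which lives in the Rust engine. The function names `naive_sum` and `kahan_sum` are hypothetical and exist only for this illustration.

```python
def naive_sum(values):
    """Plain left-to-right float accumulation (no compensation)."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Kahan (compensated) summation.

    Tracks the low-order bits lost in each addition and feeds them
    back into the next one, which keeps rounding error from growing
    with the number of terms.
    """
    total = 0.0
    compensation = 0.0  # running estimate of the lost low-order bits
    for value in values:
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added; subtracting
        # `corrected` leaves the rounding error of this step.
        compensation = (new_total - total) - corrected
        total = new_total
    return total


# One large value followed by many values too small to register
# individually when added to it.
values = [1.0] + [1e-16] * 1_000_000

print(naive_sum(values))  # the small terms are lost entirely
print(kahan_sum(values))  # close to the exact result 1 + 1e-10
```

An explicit loop is used for the uncompensated version because the built-in `sum` in recent CPython releases already applies a compensated algorithm to floats.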
📖 Documentation
- Mention Narwhals in ecosystem page (#25100)
- Fix typo in public dataset URL (#25044)
- Introduce remote Polars MCP server (#24977)
- Update Cloud docs with correct fn argument order (#24939)
- Add i128 and u128 features to user guide (#24938)
- Relax fsspec wording (#24881)
- Fix duplicated article in SECURITY.md (#24762)
- Specify that precision=None becomes 38 for Decimal (#24742)
- Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
- Fix source mapping (#24736)
- Fix syntax error in data-types-and-structures.md (#24606)
📦 Build system
- Make building the docs on macOS more reliable (#25095)
- Ensure build_feature_flags.py is included in artifact (#25024)
- Python pre-release 1.34.0b5 (#24699)
- Use cargo-run to call dsl-schema script (#24607)
🛠️ Other improvements
- Support for named/anonymous aggregations (#25118)
- Silence unused mut warning (#25093)
- Remove old join projection pushdown logic (#25088)
- Disable recursive CSPE for now (#25085)
- Remove unused row-count (#25080)
- Add proptest strategies for Series logical types (#24849)
- Add stateful EwmCov kernel (#25065)
- Add IR for scan_lines (#25066)
- Change group length mismatch error to ShapeError (#25004)
- Move asof tolerance type coercion to IR conversion (#25033)
- Move EwmMeanState to polars-compute (#25034)
- Update toolchain (#25007)
- Fix benchmark ci (#25019)
- Fix non-deterministic test (#25009)
- Fix makefile arch detection (#25011)
- Make LazyFrame.set_sorted into a FunctionIR::Hint (#24981)
- Update row estimation and reader schema in filter_scan_ir (#24995)
- Insert casts for ewm_mean inputs in type coercion (#24992)
- Remove unused expr_eval (#24988)
- Remove symbolic links (#24982)
- Add stateful EwmMean kernel (#24972)
- Dispatch to no-op rayon thread-pool from streaming (#24957)
- Add function to filter IR::Scan based on indices (#24979)
- Organize code for opaque functions in a module (#24978)
- Move scan filter code to polars-mem-engine (#24959)
- Unpin pydantic (#24955)
- Ensure safety of scan fast-count IR lowering in streaming (#24953)
- Expose polars_compute from polars (#24556)
- Re-use iterators in set_ operations (#24850)
- Move order code to instance function (#24895)
- Visualization data generator for streaming physical plan (#24896)
- Remove GroupByPartitioned and dispatch to streaming engine (#24903)
- Improve IR visualization for IEJoin (#24902)
- Turn element() into {A,}Expr::Element (#24885)
- Pass ScanOptions to new_from_ipc (#24893)
- Update tests to be index type agnostic (#24891)
- Remove legacy order_sensitive code (#24894)
- Rename text_plan_graph to visualization_data (#24878)
- Use UnifiedScanArgs in new_from_ipc and remove LazyIpcReader (#24883)
- Document safety of CategoricalToArrowConverter (#24876)
- Unset Context in Window expression (#24875)
- Unify expression order resolution (#24723)
- Move FunctionExpr dispatch from plan to expr (#24839)
- Fix SQL test giving wrong error message (#24835)
- Consolidate dtype paths in ApplyExpr (#24825)
- Add days_in_month to documentation (#24822)
- Enable ruff D417 lint (#24814)
- Turn pl.format into proper elementwise expression (#24811)
- Fix remote benchmark by no longer saving builds (#24812)
- Expose function on IPC writer to write dictionary batches (#24802)
- Refactor ApplyExpr in group_by context on multiple inputs (#24520)
- IR text plan graph generator (#24733)
- Move Series to_arrow() logic to struct function (#24794)
- Temporarily pin pydantic to fix CI (#24797)
- Extend and rename rolling groups to overlapping (#24577)
- Refactor DataType proptest strategies (#24763)
- Add union to documentation (#24769)
- Cleaner whitespace skipping in CSV field parser (#24705)
- Remove duplicate maintain_order from CrossJoinOptions (#24725)
- Change function order flags to be less error prone (#24604)
- Remove {Upper,Lower}Bound expressions in IR (#24701)
- Fix Makefile uv pip option syntax (#24711)
- Add egg-info to gitignore (#24712)
- Restructure python project directories again (#24676)
- Use IR for polars-expr output field resolution (#24661)
- Add proptest strategies for Series physical types (#24549)
- Expose CloudScheme via polars::prelude (#24643)
- Remove dist/ from release python workflow (#24639)
- Escape sed ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove tokio-util dependency (#24617)
- Remove unused UnknownKind::Ufunc (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Genericize UnitVec for any T (#24597)
- Cleanup and prepare to_field for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add CloudScheme::FileNoHostname variant (#24535)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Move collapse_joins optimizer logic into predicate pushdown optimizer (#24495)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @EndPositive, @EnricoMi, @FBruzzesi, @JakubValtar, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Object905, @alexander-beedie, @alonsosilvaallende, @andreseje, @borchero, @carnarez, @cmdlineluser, @coastalwhite, @craigalodon, @dangotbanned, @deanm0000, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @itamarst, @jan-krueger, @jordanosborn, @kdn36, @lzcmian, @math-hiyoko, @mcrumiller, @mjanssen, @moizescbf, @nameexhaustion, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @thomasjpfan and @williambdean