💥 Breaking changes
- Remove old streaming engine (#23103)
🚀 Performance improvements
- Improve streaming groupby CSE (#23092)
- Move row index materialization in post-apply to occur after slicing (#22995)
- Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
- Don't go through row encoding for most types on `index_of` (#22903)
- Optimise low-level `null` scans and `arg_max` for bools (when chunked) (#22897)
- Optimize multiscan performance (#22886)
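The bool `arg_min`/`arg_max` change (#22907) rests on a simple observation: for a boolean column, `arg_max` is the position of the first `true` (or of the first `false` if there is no `true`), and `arg_min` is the mirror image, so a scan can stop at the first hit instead of comparing every value. The pure-Python sketch below illustrates that idea across a list of chunks. The function names are hypothetical, nulls are simply skipped, and this is not the Rust implementation.

```python
from typing import List, Optional

Chunk = List[Optional[bool]]  # None stands in for a null value


def first_idx(chunks: List[Chunk], target: bool) -> Optional[int]:
    """Global index of the first non-null value equal to `target`, or None."""
    offset = 0
    for chunk in chunks:
        for i, value in enumerate(chunk):
            if value is not None and value == target:
                return offset + i  # early exit: the rest is never scanned
        offset += len(chunk)  # translate chunk-local positions to global ones
    return None


def bool_arg_max(chunks: List[Chunk]) -> Optional[int]:
    # The max of a boolean column is True if any True exists, else False.
    idx = first_idx(chunks, True)
    return idx if idx is not None else first_idx(chunks, False)


def bool_arg_min(chunks: List[Chunk]) -> Optional[int]:
    # Symmetric: the min is False if any False exists, else True.
    idx = first_idx(chunks, False)
    return idx if idx is not None else first_idx(chunks, True)
```

For example, `bool_arg_max([[False, None], [False, True, True]])` stops at the first `True`, which sits at global index 3.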
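Deferring row-index materialization until after a slice (#22995) is valid because numbering all rows and then slicing gives the same indices as slicing first and numbering from `offset + start`; the second form simply avoids generating indices for rows that are discarded. A minimal pure-Python check of that equivalence, using hypothetical helper names:

```python
from typing import Any, List, Tuple


def index_then_slice(rows: List[Any], offset: int, start: int, length: int) -> List[Tuple[int, Any]]:
    # Number every row first, then keep only the requested window.
    indexed = [(offset + i, row) for i, row in enumerate(rows)]
    return indexed[start:start + length]


def slice_then_index(rows: List[Any], offset: int, start: int, length: int) -> List[Tuple[int, Any]]:
    # Slice first, then number only the surviving rows, shifted by `start`.
    window = rows[start:start + length]
    return [(offset + start + i, row) for i, row in enumerate(window)]
```

The sketch assumes a non-negative `start`; a slice counted from the end would first need resolving against the total row count.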
✨ Enhancements
- Native implementation for Iceberg positional deletes (#23091)
- Remove old streaming engine (#23103)
- Make match_chunks public (#23101)
- Implement StructFunction expressions in into_py (#23022)
- Basic implementation of `DataTypeExpr` in Rust DSL (#23049)
- Add `required: bool` to `ParquetFieldOverwrites` (#23013)
- Support serializing `name.map_fields` (#22997)
- Support serializing `Expr::RenameAlias` (#22988)
- Remove duplicate verbose logging from `FetchedCredentialsCache` (#22973)
- Add `keys` column in `finish_callback` (#22968)
- Add `extra_columns` parameter to `scan_parquet` (#22699)
- Add CORR function to polars SQL (#22690)
- Add per partition sort and finish callback to sinks (#22789)
- Add and test DataFrame equality functionality (#22865)
- Support descendingly-sorted values in search_sorted() (#22825)
- Derive DSL schema (#22866)
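For context on the Iceberg positional-delete work (#23091): in the Iceberg table format, a position delete file records rows to drop as `(file_path, pos)` pairs, where `pos` is the 0-based row position within that data file. The sketch below applies such deletes to an in-memory list of rows. It is a simplified illustration with hypothetical function names and example paths; it ignores the sequence-number rules that decide which delete files apply to which data files, and it does not reflect how Polars implements this natively.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def group_position_deletes(deletes: Iterable[Tuple[str, int]]) -> Dict[str, Set[int]]:
    # Collect deleted row positions per data file for O(1) membership checks.
    by_file: Dict[str, Set[int]] = defaultdict(set)
    for file_path, pos in deletes:
        by_file[file_path].add(pos)
    return by_file


def apply_position_deletes(file_path: str, rows: List[dict], by_file: Dict[str, Set[int]]) -> List[dict]:
    # Keep only rows whose 0-based position in this file was not deleted.
    deleted = by_file.get(file_path, set())
    return [row for pos, row in enumerate(rows) if pos not in deleted]
```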
🐞 Bug fixes
- Restrict custom `aggregate_function` in `pivot` to `pl.element()` (#23155)
- Don't leak `SourceToken` in in-memory sink linearize (#23201)
- Fix panic reading empty parquet with multiple boolean columns (#23159)
- Raise ComputeError instead of panicking in `truncate` when mixing month/week/day/sub-daily units (#23176)
- Materialize `list.eval` with unknown type (#23186)
- Only set sorting flag for 1st column with PQ SortingColumns (#23184)
- Typo in AExprBuilder (#23171)
- Null return from var/std on scalar column (#23158)
- Support Datetime broadcast in `list.concat` (#23137)
- Ensure projection pushdown maintains right table schema (#22603)
- Don't create i128 scalars if dtype-128 is not set (#23118)
- Add Null dtype support to arg_sort_by (#23107)
- Raise error by default on invalid CSV quotes (#22876)
- Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
- Fix hive partition pruning not filtering out `__HIVE_DEFAULT_PARTITION__` (#23074)
- Fix `AssertionError` when using `scan_delta()` on AWS with `storage_options` (#23076)
- Fix deadlock on `collect(background=True)` / `collect_concurrently()` (#23075)
- Incorrect null count in rolling_min/max (#23073)
- Preserve `file://` in LazyFrame node traverser (#23072)
- Respect column order in `register_io_source` schema (#23057)
- Incorrect output when using `sort` with `group_by` and `cum_sum` (#23001)
- Implement owned arithmetic for Int128 (#23055)
- Do not schema-match structs with different field counts (#23018)
- Fix confusing error message on duplicate row_index (#23043)
- Add `include_nulls` to `Agg::Count` CSE check (#23032)
- View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
- Fix incorrect `size_hint()` for `FlatIter` (#23010)
- Fix incorrect result selecting `pl.len()` from `scan_csv` with `skip_lines` (#22949)
- Allow for IO plugins with reordered columns in streaming (#22987)
- Method `str.zfill` was inconsistent with Python and pandas when the string contained a leading '+' (#22985)
- Integer underflow in `propagate_nulls` (#22986)
- Fix cum_min and cum_max not preserving inf or -inf values at series start (#22896)
- Setting `compat_level=0` for `sink_ipc` (#22960)
- Support arrow Decimal32 and Decimal64 types (#22954)
- Update arrow format (#22941)
- Fix filter pushdown to IO plugins (#22910)
- Improve numeric stability of rolling_mean<f32> (#22944)
- Allow subclasses in type equality checking (#22915)
- Return early in `pl.Expr.__array_ufunc__` when there is only a single input (#22913)
- Add inline implodes in type coercion (#22885)
- Correct `int_ranges` to raise error on invalid inputs (#22894)
- Set the sorted flag on Array after it is sorted (#22822)
- Don't silently overflow for temporal casts (#22901)
- Fix error using `write_csv` with `storage_options` (#22881)
- Schema resolution `.over(mapping_strategy="join")` with non-aggregations (#22875)
- Ensure rename behaves the same as select (#22852)
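Why only the first Parquet sorting column can be flagged as sorted (#23184): when rows are sorted by several columns, the order is lexicographic, so only the first column is guaranteed to be sorted on its own, while later columns are sorted only within runs of equal earlier keys. A small pure-Python illustration (`is_sorted` is a hypothetical helper):

```python
from typing import List, Sequence, Tuple


def is_sorted(values: Sequence[int]) -> bool:
    return all(a <= b for a, b in zip(values, values[1:]))


# Rows sorted by (a, b), as a multi-column sort would produce them.
rows: List[Tuple[int, int]] = sorted([(2, 1), (1, 5), (1, 3), (2, 0)])
col_a = [a for a, _ in rows]  # [1, 1, 2, 2]: sorted
col_b = [b for _, b in rows]  # [3, 5, 0, 1]: sorted only within each value of a
```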
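Raising on invalid CSV quotes by default (#22876) trades leniency for correctness. Polars' own reader options are not reproduced here; as an analogy, Python's standard `csv` module draws the same line with its `strict` flag: a lenient parser quietly accepts a stray character after a closing quote, while a strict one raises `csv.Error`.

```python
import csv
import io
from typing import List


def parse_csv(text: str, strict: bool) -> List[List[str]]:
    return list(csv.reader(io.StringIO(text), strict=strict))


malformed = 'a,"b"c\n'  # stray "c" right after the closing quote

print(parse_csv(malformed, strict=False))  # lenient: [['a', 'bc']]
try:
    parse_csv(malformed, strict=True)
except csv.Error as exc:
    print("rejected:", exc)  # strict: the malformed field is an error
```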
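Hive-style layouts write a null partition value as the directory name `key=__HIVE_DEFAULT_PARTITION__`, so a pruner that compares that sentinel as an ordinary string can keep partitions that a predicate should exclude (#23074). The sketch below, with hypothetical helper names, maps the sentinel to `None` and applies SQL-style null semantics, under which both `is_not_null` and `!=` drop null partitions. The exact predicates involved in the original bug are not stated in these notes.

```python
from typing import Dict, List, Optional

HIVE_NULL = "__HIVE_DEFAULT_PARTITION__"


def parse_hive_partitions(path: str) -> Dict[str, Optional[str]]:
    # Collect key=value directory segments; Hive's sentinel becomes None (null).
    parts: Dict[str, Optional[str]] = {}
    for segment in path.split("/")[:-1]:  # the last segment is the file name
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = None if value == HIVE_NULL else value
    return parts


def prune_not_null(paths: List[str], key: str) -> List[str]:
    return [p for p in paths if parse_hive_partitions(p).get(key) is not None]


def prune_not_equal(paths: List[str], key: str, value: str) -> List[str]:
    # `null != value` is null (not true), so null partitions are pruned too.
    kept = []
    for p in paths:
        v = parse_hive_partitions(p).get(key)
        if v is not None and v != value:
            kept.append(p)
    return kept
```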
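The `str.zfill` fix (#22985) aligns Polars with Python's built-in `str.zfill`, which inserts the zero padding after a leading `+` or `-` sign rather than before it. The function below is a hypothetical pure-Python re-implementation of that rule, cross-checked against the built-in; it documents Python's semantics, not Polars' code.

```python
def sign_aware_zfill(s: str, width: int) -> str:
    """Re-implementation of the padding rule used by Python's str.zfill."""
    if len(s) >= width:
        return s  # never truncates
    padding = "0" * (width - len(s))
    if s[:1] in ("+", "-"):
        # Keep the sign first and put the zeros between it and the rest.
        return s[0] + padding + s[1:]
    return padding + s


for sample in ["12", "-12", "+12", "abc", "+", ""]:
    assert sign_aware_zfill(sample, 5) == sample.zfill(5)
```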
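One plausible way for a cumulative min/max to lose a leading `inf` or `-inf` (the subject of #22896) is seeding the running value with the largest finite float instead of infinity or the first element; the notes do not say what the actual root cause was. The sketch contrasts that hypothetical pitfall with seeding from the first element, which is what `itertools.accumulate` does by default.

```python
import sys
from itertools import accumulate
from typing import List


def cum_max_finite_seed(values: List[float]) -> List[float]:
    # Hypothetical pitfall: seeding with the most negative *finite* float.
    out, acc = [], -sys.float_info.max
    for v in values:
        acc = max(acc, v)  # max(-float_info.max, -inf) keeps the finite seed
        out.append(acc)
    return out


def cum_max(values: List[float]) -> List[float]:
    # accumulate starts from the first element, so a leading -inf survives.
    return list(accumulate(values, max))


def cum_min(values: List[float]) -> List[float]:
    return list(accumulate(values, min))
```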
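To make "don't silently overflow for temporal casts" (#22901) concrete: Polars stores `Date` as days since the Unix epoch and `Datetime` as a 64-bit integer in the chosen time unit, so casting a date to nanoseconds multiplies by 86,400 × 10⁹, and dates more than roughly 292 years from 1970 no longer fit. The sketch shows a checked version of such a conversion that raises instead of wrapping; the function name and error type are assumptions, and which casts the fix covered is not specified here.

```python
I64_MIN, I64_MAX = -(2**63), 2**63 - 1
NS_PER_DAY = 86_400 * 10**9


def date_to_datetime_ns(days_since_epoch: int) -> int:
    """Convert days since 1970-01-01 to i64 nanoseconds, refusing to wrap."""
    ns = days_since_epoch * NS_PER_DAY  # exact: Python ints do not overflow
    if not I64_MIN <= ns <= I64_MAX:
        raise OverflowError(f"{days_since_epoch} days does not fit in an i64 nanosecond timestamp")
    return ns
```

With these bounds, 106,751 days still converts while 106,752 days raises.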
📖 Documentation
- Update when_then in user guide (#23245)
- Minor improvement to `cum_count` docstring example (#23099)
- Add missing entry for LazyFrame `__getitem__` (#22924)
📦 Build system
- Actually disable `ir_serde` by default (#23046)
- Add a feature flag for `serde_ignored` (#22957)
- Fix warnings, update DSL version and schema hash (#22953)
🛠️ Other improvements
- Update Rust Polars versions (#23229)
- Change flake to use venv (#23219)
- Add `default_alloc` feature to py-polars (#23202)
- Added more descriptive error message by replacing `FixedSizeList` with `Array` (#23168)
- Connect Python `assert_series_equal()` to Rust back-end (#23141)
- Refactor skip_batches to use AExprBuilder (#23147)
- Use `ir_serde` instead of `serde` for `IRFunctionExpr` (#23148)
- Separate `FunctionExpr` and `IRFunctionExpr` (#23140)
- Improve Series equality functionality and prepare for Python integration (#23136)
- Add PolarsPhysicalType and use it to dispatch into_series (#23080)
- Remove `AExpr::Alias` (#23070)
- Add components for Iceberg deletion file support (#23059)
- Feature gate `StructFunction::JsonEncode` (#23060)
- Propagate iceberg position delete information to IR (#23045)
- Add environment variable to get Parquet decoding metrics (#23052)
- Turn `pl.cumulative_eval` into its own `AExpr` (#22994)
- Add make test-streaming (#23044)
- Move scan parameter parsing for parquet to reusable function (#23019)
- Use a ref-counted `UniqueId` instead of `usize` for `cache_id` (#22984)
- Implement `Hash` and use `SpecialEq` for `RenameAliasFn` (#22989)
- Turn `list.eval` into an `AExpr` (#22911)
- Only check for unknown DSL fields if minor is higher (#22970)
- Don't enable `ir_serde` together with `serde` (#22969)
- Make dtype field on Logical non-optional (#22966)
- Add new (Frozen)Categories and CategoricalMapping (#22956)
- Add a CI check for DSL schema changes (#22898)
- Add schema parameters to `expr.meta` (#22906)
- Update rust toolchain in nix flake (#22905)
- Update toolchain (#22859)
Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @borchero, @bschoenmaeckers, @cmdlineluser, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @math-hiyoko, @mcrumiller, @mrkn, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst, @thomasfrederikhoeck and @zyctree