💥 Breaking changes
- Remove old streaming engine (#23103)
⚠️ Deprecations
- Deprecate allow_missing_columns in scan_parquet in favor of missing_columns (#22784)
🚀 Performance improvements
- Improve streaming groupby CSE (#23092)
- Move row index materialization in post-apply to occur after slicing (#22995)
- Add first_(true|false)_idx to BooleanChunked and use in bool arg_(min|max) (#22907)
- Don't go through row encoding for most types on index_of (#22903)
- Optimise low-level null scans and arg_max for bools (when chunked) (#22897)
- Optimize multiscan performance (#22886)
✨ Enhancements
- DataType expressions in Python (#23167)
- Native implementation for Iceberg positional deletes (#23091)
- Remove old streaming engine (#23103)
- Basic implementation of DataTypeExpr in Rust DSL (#23049)
- Add required: bool to ParquetFieldOverwrites (#23013)
- Support serializing name.map_fields (#22997)
- Support serializing Expr::RenameAlias (#22988)
- Remove duplicate verbose logging from FetchedCredentialsCache (#22973)
- Add keys column in finish_callback (#22968)
- Add extra_columns parameter to scan_parquet (#22699)
- Add CORR function to polars SQL (#22690)
- Add per partition sort and finish callback to sinks (#22789)
- Support descendingly-sorted values in search_sorted() (#22825)
- Derive DSL schema (#22866)
🐞 Bug fixes
- Remove axis in show_graph (#23218)
- Remove axis ticks in show_graph (#23210)
- Restrict custom aggregate_function in pivot to pl.element() (#23155)
- Don't leak SourceToken in in-memory sink linearize (#23201)
- Fix panic reading empty parquet with multiple boolean columns (#23159)
- Raise ComputeError instead of panicking in truncate when mixing month/week/day/sub-daily units (#23176)
- Materialize list.eval with unknown type (#23186)
- Only set sorting flag for 1st column with PQ SortingColumns (#23184)
- Typo in AExprBuilder (#23171)
- Null return from var/std on scalar column (#23158)
- Support Datetime broadcast in list.concat (#23137)
- Ensure projection pushdown maintains right table schema (#22603)
- Add Null dtype support to arg_sort_by (#23107)
- Raise error by default on invalid CSV quotes (#22876)
- Fix group_by mean and median returning all nulls for Decimal dtype (#23093)
- Fix hive partition pruning not filtering out __HIVE_DEFAULT_PARTITION__ (#23074)
- Fix AssertionError when using scan_delta() on AWS with storage_options (#23076)
- Fix deadlock on collect(background=True) / collect_concurrently() (#23075)
- Incorrect null count in rolling_min/max (#23073)
- Preserve file:// in LazyFrame node traverser (#23072)
- Respect column order in register_io_source schema (#23057)
- Don't call unnest for objects implementing __arrow_c_array__ (#23069)
- Incorrect output when using sort with group_by and cum_sum (#23001)
- Implement owned arithmetic for Int128 (#23055)
- Do not schema-match structs with different field counts (#23018)
- Fix confusing error message on duplicate row_index (#23043)
- Add include_nulls to Agg::Count CSE check (#23032)
- View buffer exceeding 2^32 - 1 bytes in concatenate_view (#23017)
- Fix incorrect result selecting pl.len() from scan_csv with skip_lines (#22949)
- Allow for IO plugins with reordered columns in streaming (#22987)
- Method str.zfill was inconsistent with Python and pandas when string contained leading '+' (#22985)
- Integer underflow in propagate_nulls (#22986)
- Setting compat_level=0 for sink_ipc (#22960)
- Narrow return type for DataType.is_, improve Pyright's type completeness from 69% to 95% (#22962)
- Support arrow Decimal32 and Decimal64 types (#22954)
- Guard against dictionaries being passed to projection keywords (#22928)
- Update arrow format (#22941)
- Fix filter pushdown to IO plugins (#22910)
- Improve numeric stability rolling_mean<f32> (#22944)
- Guard against invalid nested objects in 'map_elements' (#22932)
- Allow subclasses in type equality checking (#22915)
- Return early in pl.Expr.__array_ufunc__ when only single input (#22913)
- Add inline implodes in type coercion (#22885)
- Add {top, bottom}_k_by to Series (#22902)
- Correct int_ranges to raise error on invalid inputs (#22894)
- Don't silently overflow for temporal casts (#22901)
- Fix error using write_csv with storage_options (#22881)
- Schema resolution .over(mapping_strategy="join") with non-aggregations (#22875)
- Ensure rename behaves the same as select (#22852)
📖 Documentation
- Document aggregations that return identity when there are no non-null values, suggest workaround for those who want SQL-standard behaviour (#23143)
- Fix reference to non-existent Expr.replace_all in replace_strict docs (#23144)
- Fix typo on pandas comparison page (#23123)
- Minor improvement to cum_count docstring example (#23099)
- Add missing DataFrame.__setitem__ to API reference (#22938)
- Add missing entry for LazyFrame __getitem__ (#22924)
- Add missing top_k_by and bottom_k_by to Series reference (#22917)
📦 Build system
- Update pyo3 and numpy crates to version 0.25 (#22763)
- Actually disable ir_serde by default (#23046)
- Add a feature flag for serde_ignored (#22957)
- Fix warnings, update DSL version and schema hash (#22953)
🛠️ Other improvements
- Change flake to use venv (#23219)
- Add default_alloc feature to py-polars (#23202)
- Added more descriptive error message by replacing FixedSizeList with Array (#23168)
- Connect Python assert_series_equal() to Rust back-end (#23141)
- Refactor skip_batches to use AExprBuilder (#23147)
- Use ir_serde instead of serde for IRFunctionExpr (#23148)
- Separate FunctionExpr and IRFunctionExpr (#23140)
- Remove AExpr::Alias (#23070)
- Add components for Iceberg deletion file support (#23059)
- Feature gate StructFunction::JsonEncode (#23060)
- Propagate iceberg position delete information to IR (#23045)
- Add environment variable to get Parquet decoding metrics (#23052)
- Turn pl.cumulative_eval into its own AExpr (#22994)
- Add make test-streaming (#23044)
- Move scan parameter parsing for parquet to reusable function (#23019)
- Prepare deltalake 1.0 (#22931)
- Implement Hash and use SpecialEq for RenameAliasFn (#22989)
- Turn list.eval into an AExpr (#22911)
- Fix CI for latest pandas-stubs release (#22971)
- Add a CI check for DSL schema changes (#22898)
- Add schema parameters to expr.meta (#22906)
- Update rust toolchain in nix flake (#22905)
- Update toolchain (#22859)
Thank you to all our contributors for making this release possible!
@Athsus, @DahaoALG, @FabianWolff, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @SanjitBasker, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @florian-klein, @i1oveMyse1f, @ion-elgreco, @itamarst, @kdn36, @kutal10, @mcrumiller, @mroeschke, @nameexhaustion, @nikaltipar, @orlp, @paskhaver, @ritchie46, @stijnherfst and @thomasfrederikhoeck