💥 Breaking changes
- Remove, deprecate or change eager Exprs to be lazy compatible (#24027)
🚀 Performance improvements
- Native streaming int_range with len or count (#24280)
- Lower arg_unique natively to the streaming engine (#24279)
- Move unordering optimization to end (#24286)
- Do ordering simplification step after common sub-plan elimination (#24269)
- Always simplify order requirements in IR (#24192)
- Basic de-duplication of filter expressions (#24220)
- Cache the IR in pipe_with_schema (#24213)
- Lower arg_where natively to streaming engine (#24088)
- Lower Expr.shift to streaming engine (#24106)
- Lower order-preserving groupby to streaming engine (#24053)
✨ Enhancements
- Add CSE for custom io sources using pointer for hashing (#24297)
- Allow pl.Expr.log to take in an expression (#24226)
- Add caching to user credential providers (#23789)
- Expose mkdir parameter on write_parquet (#24239)
- Implement diff() in streaming engine (#24189)
- Enable Expr.diff(n) for negative n (#24200)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
- Drop PyArrow requirement for write_database with the ADBC engine (#24136)
- Add a deprecation warning for pl.Series.shift(Null) (#24114)
- Improve Debug formatting of DataType (#24056)
- Add LazyFrame.pipe_with_schema (#24075)
- Catch additional temporal attributes in BytecodeParser function analysis (#24076)
- Add cum_* as native streaming nodes (#23977)
- Add peak_{min,max} support for booleans (#24068)
- Add DataFrame.map_columns for eager evaluation (#23821)
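
As an illustration of the "Enable Expr.diff(n) for negative n" entry, here is a minimal pure-Python sketch of the semantics one might expect. It assumes that diff(n) behaves like `x - x.shift(n)`, so that a negative n compares each value with a later one. The `diff` helper below is hypothetical and is not the Polars implementation.

```python
from typing import Optional, Sequence


def diff(values: Sequence[float], n: int = 1) -> list[Optional[float]]:
    """Hypothetical helper. Assumed semantics: out[i] = values[i] - values[i - n],
    with None wherever index i - n falls outside the sequence."""
    size = len(values)
    out: list[Optional[float]] = []
    for i in range(size):
        j = i - n  # for negative n this points at a later element
        out.append(values[i] - values[j] if 0 <= j < size else None)
    return out


data = [10, 13, 19, 20]
forward = diff(data, 1)    # each value minus the previous one; leading None
backward = diff(data, -1)  # each value minus the next one; trailing None
print(forward)
print(backward)
```

Under this assumption, a positive n leaves the missing values at the start of the result and a negative n leaves them at the end.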
🐞 Bug fixes
- Invalid conversion from non-bit numpy bools (#24312)
- Make dt.epoch('s') serializable (#24302)
- Make Expr.rechunk serializable (#24303)
- Schema mismatch for 'log' operation (#24300)
- Incorrect first/last aggregate in streaming engine (#24289)
- Fix group offsets in sliced groups (#24274)
- Panic in inexact date(time) conversion (#24268)
- Keep DSL cache after serialization and deserialization (#24265)
- Sanitize and warn about eval usage (#24262)
- Correct incorrect default in from_pandas overload for include_index (#24258)
- Unique with keep="none" in new optimization pass (#24261)
- Correct size limits for Decimal cast (#24252)
- Unordered unions in check order observing pass (#24253)
- Fix dtype for slice on Literal in agg context (#24137)
- Fix incorrect filter(lit(True)) when scanning hive (#24237)
- In-memory group_by on 128-bit integers (#24242)
- Fix panic in gather inside groupby with invalid indices (#24182)
- Release the GIL in map_groups (#24225)
- Remove extra explode in LazyGroupBy.{head,tail} (#24221)
- Fix panic in polars cloud CSV scan (#24197)
- Fix panic when loading categorical columns from IO plugin (#24205)
- Fix credential provider did not auto-init on partition sinks (#24188)
- Fix engine type for concat_list on AggScalar implode (#24160)
- Rolling_mean handle centered weights with len(values) < window_size (#24158)
- Reading is_in predicate for Parquet plain strings (#24184)
- Support native DuckDB connection in read_database (#24177)
- Make PyCategories pickleable (#24170)
- Remove unused unsound function to_mutable_slice (#24173)
- PyO3 extension types giving compat_level errors (#24166)
- Allow non-elementwise by in top_k (#24164)
- Fix sort_by for group_by_dynamic context (#24152)
- Input-independent length aggregations in streaming (#24153)
- Release GIL when iterating df in to_arrow (#24151)
- Respect non-elementwise join_where conditions (#24135)
- Fix mismatched pytest test collection error (#24133)
- Resolve schema mismatch for div on Boolean (#24111)
- Fix from_repr parsing of negative durations (#24115)
- Make group_by/partition_by iterator keys tuple[Any, ...] to enable tuple-unpacking (#24113)
- Keep name when doing empty group-aware aggregation (#24098)
- Implode instead of reshape_list (#24078)
- Rolling mean with weights incorrect when min_samples < window_size (#23485)
- Allow merge_sorted for all types (#24077)
- Include datatypes in row_encode expression (#24074)
- Include UDF materialized type in serialization (#24073)
- Correct .rolling() output type for non-aggregations (#24072)
- Correct planner output schema for join_asof (#24071)
- Correct output for fold and reduce (#24069)
- Expr.meta.output_name for struct fields (#24064)
- Ensure upcast operations on pl.Date default to microsecond precision (#23981)
- Add peak_{min,max} support for booleans (#24068)
- Planner output type for mean with strange input type (#24052)
- Remove, deprecate or change eager Exprs to be lazy compatible (#24027)
📖 Documentation
- Fix few typos (#24305)
- Add missing reference to LazyFrame.pipe_with_schema() on the website (#24285)
- Automatically register doctest.ELLIPSIS so we don't have to add the inline directive each time (#24146)
- Update categorical comparison documentation in user guide (#24249)
- Add missing references for Series.rolling_*_by methods (#24254)
- Fix formatting of Series.value_counts examples (#24245)
- Add hint to use DataFrame/Series constructors in from_arrow docstring (#22942)
- Update GPU un/supported features (#24195)
- Add DataFrame.map_columns to API (#24128)
- Update multiple pages in the Polars Cloud user guide (#23661)
- Fix str.find_many() docstring example (#24092)
📦 Build system
- Re-enable macos-x86-64 (#24266)
- Drop binary support for macos_x86-64 (#24257)
🛠️ Other improvements
- Remove PDS-H code (#24301)
- Get ready for even more cloud tests (#24292)
- Add tests for slices with caches (#24288)
- Readd ordering tests (#24284)
- Fix Makefile venv path (#24251)
- Remove unnecessary parentheses (#24244)
- Make non-nested shift{,_and_fill} ops generic (#24224)
- Remove unused Wrap (#24214)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Automatically label a few more types of PR (#24147)
- Update toolchain (#24156)
- Add order_sensitive property for AExpr (#24116)
- Mark more tests as not possible on cloud (#24103)
- Turn AggExpr::Count from tuple to struct (#24096)
- Mark tests that may fail in cloud (#24067)
- Extend read database tests to capture more ADBC functionality (#24002)
- Make CI perf failures more lenient (#24066)
- Fix hive partition string encoding in CI by upgrading deltalake (#24018)
- Make tests with sinks run on cloud again (#24048)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @MarcoGorelli, @NeejWeej, @agossard, @alexander-beedie, @aparna2198, @borchero, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @etiennebacher, @gab23r, @henryharbeck, @jjurm, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @r-brink, @ritchie46, @stijnherfst, @vdrn and @wence-