Polars: py-1.33.0 Release

Release date: September 1, 2025
Previous version: py-1.33.0-beta.1 (released August 28, 2025)
Magnitude: 1,435 Diff Delta
Contributors: 8 total committers

28 Commits in this Release

Top Contributors in py-1.33.0

ritchie46
orlp
coastalwhite
eitsupi
alexander-beedie
gab23r
etiennebacher
mcrumiller

Release Notes Published

💥 Breaking changes

  • Remove, deprecate or change eager Exprs to be lazy compatible (#24027)

🚀 Performance improvements

  • Native streaming int_range with len or count (#24280)
  • Lower arg_unique natively to the streaming engine (#24279)
  • Move unordering optimization to end (#24286)
  • Do ordering simplification step after common sub-plan elimination (#24269)
  • Always simplify order requirements in IR (#24192)
  • Basic de-duplication of filter expressions (#24220)
  • Cache the IR in pipe_with_schema (#24213)
  • Lower arg_where natively to streaming engine (#24088)
  • Lower Expr.shift to streaming engine (#24106)
  • Lower order-preserving groupby to streaming engine (#24053)

✨ Enhancements

  • Add CSE for custom io sources using pointer for hashing (#24297)
  • Allow pl.Expr.log to take in an expression (#24226)
  • Add caching to user credential providers (#23789)
  • Expose mkdir parameter on write_parquet (#24239)
  • Implement diff() in streaming engine (#24189)
  • Enable Expr.diff(n) for negative n (#24200)
  • Allow upcasting null-typed columns to nested column types in scans (#24185)
  • Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
  • Drop PyArrow requirement for write_database with the ADBC engine (#24136)
  • Add a deprecation warning for pl.Series.shift(Null) (#24114)
  • Improve Debug formatting of DataType (#24056)
  • Add LazyFrame.pipe_with_schema (#24075)
  • Catch additional temporal attributes in BytecodeParser function analysis (#24076)
  • Add cum_* as native streaming nodes (#23977)
  • Add peak_{min,max} support for booleans (#24068)
  • Add DataFrame.map_columns for eager evaluation (#23821)

🐞 Bug fixes

  • Invalid conversion from non-bit numpy bools (#24312)
  • Make dt.epoch('s') serializable (#24302)
  • Make Expr.rechunk serializable (#24303)
  • Schema mismatch for 'log' operation (#24300)
  • Incorrect first/last aggregate in streaming engine (#24289)
  • Fix group offsets in sliced groups (#24274)
  • Panic in inexact date(time) conversion (#24268)
  • Keep DSL cache after serialization and deserialization (#24265)
  • Sanitize and warn about eval usage (#24262)
  • Correct incorrect default in from_pandas overload for include_index (#24258)
  • Unique with keep="none" in new optimization pass (#24261)
  • Correct size limits for Decimal cast (#24252)
  • Unordered unions in check order observing pass (#24253)
  • Fix dtype for slice on Literal in agg context (#24137)
  • Fix incorrect filter(lit(True)) when scanning hive (#24237)
  • In-memory group_by on 128-bit integers (#24242)
  • Fix panic in gather inside groupby with invalid indices (#24182)
  • Release the GIL in map_groups (#24225)
  • Remove extra explode in LazyGroupBy.{head,tail} (#24221)
  • Fix panic in polars cloud CSV scan (#24197)
  • Fix panic when loading categorical columns from IO plugin (#24205)
  • Fix credential provider not auto-initializing on partition sinks (#24188)
  • Fix engine type for concat_list on AggScalar implode (#24160)
  • Rolling_mean handle centered weights with len(values) < window_size (#24158)
  • Reading is_in predicate for Parquet plain strings (#24184)
  • Support native DuckDB connection in read_database (#24177)
  • Make PyCategories pickleable (#24170)
  • Remove unused unsound function to_mutable_slice (#24173)
  • PyO3 extension types giving compat_level errors (#24166)
  • Allow non-elementwise by in top_k (#24164)
  • Fix sort_by for group_by_dynamic context (#24152)
  • Input-independent length aggregations in streaming (#24153)
  • Release GIL when iterating df in to_arrow (#24151)
  • Respect non-elementwise join_where conditions (#24135)
  • Fix mismatched pytest test collection error (#24133)
  • Resolve schema mismatch for div on Boolean (#24111)
  • Fix from_repr parsing of negative durations (#24115)
  • Make group_by/partition_by iterator keys tuple[Any, ...] to enable tuple-unpacking (#24113)
  • Keep name when doing empty group-aware aggregation (#24098)
  • Implode instead of reshape_list (#24078)
  • Rolling mean with weights incorrect when min_samples < window_size (#23485)
  • Allow merge_sorted for all types (#24077)
  • Include datatypes in row_encode expression (#24074)
  • Include UDF materialized type in serialization (#24073)
  • Correct .rolling() output type for non-aggregations (#24072)
  • Correct planner output schema for join_asof (#24071)
  • Correct output for fold and reduce (#24069)
  • Expr.meta.output_name for struct fields (#24064)
  • Ensure upcast operations on pl.Date default to microsecond precision (#23981)
  • Add peak_{min,max} support for booleans (#24068)
  • Planner output type for mean with strange input type (#24052)
  • Remove, deprecate or change eager Exprs to be lazy compatible (#24027)

📖 Documentation

  • Fix a few typos (#24305)
  • Add missing reference to LazyFrame.pipe_with_schema() on the website (#24285)
  • Automatically register doctest.ELLIPSIS so we don't have to add the inline directive each time (#24146)
  • Update categorical comparison documentation in user guide (#24249)
  • Add missing references for Series.rolling_*_by methods (#24254)
  • Fix formatting of Series.value_counts examples (#24245)
  • Add hint to use DataFrame/Series constructors in from_arrow docstring (#22942)
  • Update GPU un/supported features (#24195)
  • Add DataFrame.map_columns to API (#24128)
  • Update multiple pages in the Polars Cloud user guide (#23661)
  • Fix str.find_many() docstring example (#24092)
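The `doctest.ELLIPSIS` item above is about enabling the flag globally instead of adding an inline `# doctest: +ELLIPSIS` directive to each example. The sketch below uses only the standard library to show what the flag changes. The `SAMPLE` docstring and the `run_sample` helper are hypothetical, and this is not how the Polars test suite itself is configured.

```python
import doctest

# A doctest whose expected output abbreviates the middle with "...".
SAMPLE = """
>>> print(list(range(20)))
[0, 1, ..., 19]
"""


def run_sample(optionflags: int) -> doctest.TestResults:
    """Run SAMPLE with the given option flags and return the results."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(SAMPLE, globs={}, name="sample", filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the counts are of interest here.
    return runner.run(test, out=lambda text: None)


with_flag = run_sample(doctest.ELLIPSIS)
without_flag = run_sample(0)
print(with_flag.failed, without_flag.failed)
```

With the flag set on the runner, the abbreviated output matches; without it, the same example fails unless it carries an inline directive.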

📦 Build system

  • Re-enable macos-x86-64 (#24266)
  • Drop binary support for macos_x86-64 (#24257)

πŸ› οΈ Other improvements

  • Remove PDS-H code (#24301)
  • Get ready for even more cloud tests (#24292)
  • Add tests for slices with caches (#24288)
  • Readd ordering tests (#24284)
  • Fix Makefile venv path (#24251)
  • Remove unnecessary parentheses (#24244)
  • Make non-nested shift{,_and_fill} ops generic (#24224)
  • Remove unused Wrap (#24214)
  • Allow upcasting null-typed columns to nested column types in scans (#24185)
  • Automatically label a few more types of PR (#24147)
  • Update toolchain (#24156)
  • Add order_sensitive property for AExpr (#24116)
  • Mark more tests as not possible on cloud (#24103)
  • Turn AggExpr::Count from tuple to struct (#24096)
  • Mark tests that may fail in cloud (#24067)
  • Extend read database tests to capture more ADBC functionality (#24002)
  • Make CI perf failures more lenient (#24066)
  • Fix hive partition string encoding in CI by upgrading deltalake (#24018)
  • Make tests with sinks run on cloud again (#24048)

Thank you to all our contributors for making this release possible! @Kevin-Patyk, @MarcoGorelli, @NeejWeej, @agossard, @alexander-beedie, @aparna2198, @borchero, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @etiennebacher, @gab23r, @henryharbeck, @jjurm, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @r-brink, @ritchie46, @stijnherfst, @vdrn and @wence-