Polars: rs-0.51.0 Release

Release date: September 16, 2025
Previous version: rs-0.50.0 (released July 30, 2025)
Magnitude: 31,090 Diff Delta
Contributors: 22 total committers

238 Commits in this Release

Top Contributors in rs-0.51.0

coastalwhite
orlp
nameexhaustion
kdn36
ritchie46
r-brink
cgevans
mcrumiller
alexander-beedie
JakubValtar

Release Notes Published

💥 Breaking changes

  • Remove, deprecate or change eager Exprs to be lazy compatible (#24027)

🚀 Performance improvements

  • Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
  • Allocate only for read items when reading Parquet with predicate (#24401)
  • Don't aggregate groups for strict cast if original len (#24381)
  • Allocate only for read items when reading Parquet with predicate (#24324)
  • Native streaming int_range with len or count (#24280)
  • Lower arg_unique natively to the streaming engine (#24279)
  • Move unordering optimization to end (#24286)
  • Do ordering simplification step after common sub-plan elimination (#24269)
  • Always simplify order requirements in IR (#24192)
  • Basic de-duplication of filter expressions (#24220)
  • Cache the IR in pipe_with_schema (#24213)
  • Lower arg_where natively to streaming engine (#24088)
  • Lower Expr.shift to streaming engine (#24106)
  • Lower order-preserving groupby to streaming engine (#24053)
  • Lower .sort(maintain_order=True).head() to streaming top_k (#24014)
  • Lower top-k to streaming engine (#23979)
  • Allow order pass through Filters and relax to row-separable instead of elementwise (#23969)
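
Several of the entries above move sort-then-head and top-k queries onto the streaming engine. As a rough illustration of why that helps, the sketch below keeps only the k best rows while reading the data one batch at a time, so memory stays bounded by k plus one batch. This is not Polars code: the function name `streaming_top_k`, the `(key, payload)` row layout, and the batch lists are all hypothetical, and the real engine's implementation may differ.

```python
import heapq
from itertools import chain
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # hypothetical row layout: (sort key, payload)


def streaming_top_k(batches: Iterable[List[Row]], k: int) -> List[Row]:
    """Return the k rows with the smallest keys, reading one batch at a time.

    Ties keep their arrival order, mimicking a stable sort followed by head(k).
    """
    best: List[Row] = []
    for batch in batches:
        # heapq.nsmallest is documented as equivalent to sorted(...)[:k],
        # so it is stable. Putting the current winners first keeps earlier
        # rows ahead of later rows with the same key.
        best = heapq.nsmallest(k, chain(best, batch), key=lambda row: row[0])
    return best


batches = [
    [(2, "a"), (1, "b")],
    [(1, "c"), (3, "d")],
    [(0, "e"), (2, "f")],
]
print(streaming_top_k(batches, 3))
```

The result matches what a full stable sort followed by taking the first three rows would give, but at no point does the function hold more than k rows plus the current batch.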

✨ Enhancements

  • Roundtrip BinaryOffset type through Parquet (#24344)
  • Add opt-in unstable functionality to load interval types as Struct (#24320)
  • Add user guide section on AWS role assumption (#24421)
  • Support unique / n_unique / arg_unique for array columns (#24406)
  • Support S3 virtual-hosted–style URI (#24405)
  • Remove explicit file create for local async writes (#24358)
  • Support Partitioning sinks in cloud (#24399)
  • User-friendly error message on empty path expansion (#24337)
  • Add Polars security policy (#24314)
  • Allow pl.Expr.log to take in an expression (#24226)
  • Implement diff() in streaming engine (#24189)
  • Enable Expr.diff(n) for negative n (#24200)
  • Allow upcasting null-typed columns to nested column types in scans (#24185)
  • Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
  • Add a deprecation warning for pl.Series.shift(Null) (#24114)
  • Improve Debug formatting of DataType (#24056)
  • Add cum_* as native streaming nodes (#23977)
  • Add peak_{min,max} support for booleans (#24068)
  • Add DataFrame.map_columns for eager evaluation (#23821)
  • Add native streaming for peak_{min,max} (#24039)
  • IR graph arrows, monospace font, box nodes (#24021)
  • Add DataTypeExpr.default_value (#23973)
  • Lower rle to a native streaming engine node (#23929)
  • Add support for Int128 to pyo3-polars (#23959)
  • Lower rle_id to a native streaming node (#23894)
  • Pass endpoint_url loaded from CredentialProviderAWS to scan/write_delta (#23812)
  • Dispatch scan_iceberg to native by default (#23912)
  • Lower unique_counts and value_counts to streaming engine (#23890)
  • Implement dt.days_in_month function (#23119)
  • Fix errors on native scan_iceberg (#23811)
  • Reinterpret binary data to fixed size numerical array (#22840)
  • Make rolling_map serializable (#23848)
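
One entry above enables Expr.diff(n) for negative n. The plain-Python sketch below shows the semantics one would expect, assuming diff(n) is defined as value[i] - value[i - n] with a missing result wherever the partner index falls outside the column. That definition is an assumption made for illustration, and the helper `diff` here is hypothetical, not the Polars implementation.

```python
from typing import List, Optional


def diff(values: List[Optional[int]], n: int = 1) -> List[Optional[int]]:
    """Shift-based difference: out[i] = values[i] - values[i - n].

    Positions whose partner index is out of range, or where either
    operand is missing, yield None.
    """
    out: List[Optional[int]] = []
    for i, current in enumerate(values):
        j = i - n  # for negative n the partner lies *after* position i
        if 0 <= j < len(values) and current is not None and values[j] is not None:
            out.append(current - values[j])
        else:
            out.append(None)
    return out


squares = [1, 4, 9, 16]
print(diff(squares, 1))   # backward difference, missing value at the start
print(diff(squares, -1))  # forward-looking difference, missing value at the end
```

Under this definition a negative n compares each element with a later one, so the missing values move from the start of the output to the end.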

🐞 Bug fixes

  • Fix AggState on all_literal in BinaryExpr (#24461)
  • Replace unsafe with collect (#24494)
  • Show IR sort options in explain (#24465)
  • Benchmark CI import (#24463)
  • Fix schema on ApplyExpr with single row literal in agg context (#24422)
  • Fix planner schema for dividing pl.Float32 by int (#24432)
  • Fix panic scanning from AWS legacy global endpoint URL (#24450)
  • Emit proper tuple for Log in expression nodes (#24426)
  • Do not propagate struct of nulls with null (#24420)
  • Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
  • Implement approx_n_unique for temporal dtypes and Null (#24417)
  • Correct sink_ipc overload for compression (#24398)
  • Enable all integer dtypes for by parameter in join_asof (#24384)
  • Fix group-by + filter aggregation performing subsequent operations on all data instead of only the filtered data (#24373)
  • Fix incorrect output ordering for row-separable exprs (#24354)
  • Fix Series.__arrow_c_stream__ for Decimal and other logical types (#24120)
  • Match output type to engine for Struct arithmetic (#23805)
  • Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
  • Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
  • Incorrect logic in negative streaming slice (#24326)
  • Do not error on non-list Sequence for columns parameter in read_excel (#23967)
  • Invalid conversion from non-bit numpy bools (#24312)
  • Make dt.epoch('s') serializable (#24302)
  • Make Expr.rechunk serializable (#24303)
  • Schema mismatch for 'log' operation (#24300)
  • Incorrect first/last aggregate in streaming engine (#24289)
  • Fix group offsets in sliced groups (#24274)
  • Panic in inexact date(time) conversion (#24268)
  • The index_of feature should not depend on the object feature (#24256)
  • Keep DSL cache after serialization and deserialization (#24265)
  • Sanitize and warn about eval usage (#24262)
  • Unique with keep="none" in new optimization pass (#24261)
  • Correct size limits for Decimal cast (#24252)
  • Unordered unions in check order observing pass (#24253)
  • Fix dtype for slice on Literal in agg context (#24137)
  • Fix incorrect filter(lit(True)) when scanning hive (#24237)
  • In-memory group_by on 128-bit integers (#24242)
  • Fix panic in gather inside groupby with invalid indices (#24182)
  • Release the GIL in map_groups (#24225)
  • Remove extra explode in LazyGroupBy.{head,tail} (#24221)
  • Fix panic in polars cloud CSV scan (#24197)
  • Fix panic when loading categorical columns from IO plugin (#24205)
  • Fix engine type for concat_list on AggScalar implode (#24160)
  • Rolling_mean handle centered weights with len(values) < window_size (#24158)
  • Reading is_in predicate for Parquet plain strings (#24184)
  • Make PyCategories pickleable (#24170)
  • Remove unused unsound function to_mutable_slice (#24173)
  • PyO3 extension types giving compat_level errors (#24166)
  • Allow non-elementwise by in top_k (#24164)
  • Fix sort_by for group_by_dynamic context (#24152)
  • Input-independent length aggregations in streaming (#24153)
  • Release GIL when iterating df in to_arrow (#24151)
  • Respect non-elementwise join_where conditions (#24135)
  • Resolve schema mismatch for div on Boolean (#24111)
  • Keep name when doing empty group-aware aggregation (#24098)
  • Implode instead of reshape_list (#24078)
  • Rolling mean with weights incorrect when min_samples < window_size (#23485)
  • Allow merge_sorted for all types (#24077)
  • Include datatypes in row_encode expression (#24074)
  • Include UDF materialized type in serialization (#24073)
  • Correct .rolling() output type for non-aggregations (#24072)
  • Correct planner output schema for join_asof (#24071)
  • Allow %B to work without specifying day (#24009)
  • Correct output for fold and reduce (#24069)
  • Expr.meta.output_name for struct fields (#24064)
  • Ensure upcast operations on pl.Date default to microsecond precision (#23981)
  • Add peak_{min,max} support for booleans (#24068)
  • Planner output type for mean with strange input type (#24052)
  • Remove, deprecate or change eager Exprs to be lazy compatible (#24027)
  • Scan of multiple sources with null datatype (#24065)
  • Categorical in nested data in row encoding (#24051)
  • Missing length update in builder for pl.Array repetition (#24055)
  • Race condition in global categories init (#24045)
  • Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044)
  • Error when using named functions (#24041)
  • Don't encode entire CategoricalMapping when going to Arrow (#24036)
  • Fix cast on arithmetic with lit (#23941)
  • Incorrect slice-slice pushdown (#24032)
  • Dedup common cache subplan in IR graph (#24028)
  • Allow join on Decimal in in-memory engine (#24026)
  • Fix datatypes for eval.list in aggregation context (#23911)
  • Allocator capsule fallback panic (#24022)
  • Accept another zlib "magic header" file signature (#24013)
  • Fix truediv dtypes so cast in list.eval is not dropped (#23936)
  • Don't reuse cached return_dtype for expanded map expressions (#24010)
  • Cache id is not a valid dot node id (#24005)
  • Align map_elements with and without return_dtype (#24007)
  • Fix column dtype lifetime for csv_write segfault on Categorical (#23986)
  • Allow serializing LazyGroupBy.map_groups (#23964)
  • Correct allocator name in PyCapsule (#23968)
  • Mismatched types for write function for windows (#23915)
  • Fix unpivot panic when index= column not found (#23958)
  • Fix assert_frame_equal with check_dtypes=False for all-null series with different types (#23943)
  • Return correct python package version (#23951)
  • Categorical namespace functions fail on Enum columns (#23925)
  • Properly set sumwise complete on filter for missing columns (#23877)
  • Restore Arrow-FFI-based Python<->Rust conversion in pyo3-polars (#23881)
  • Group By with filters (#23917)
  • Fix read_csv ignoring Decimal schema for header-only data (#23886)
  • Ensure collect() native Iceberg always scans latest when no snapshot_id is given (#23907)
  • Writing List(Array) columns to JSON without panic (#23875)
  • Fill Iceberg missing fields with partition values if present in metadata (#23900)
  • Create file for streaming sink even if unspawned (#23672)
  • Update cloud testing environment (#23908)
  • Parquet filtering on multiple RGs with literal predicate (#23903)
  • Incorrect datatype passed to libc::write (#23904)
  • Properly feature gate TZ_AWARE_RE usage (#23888)
  • Improve identification of "non group-key" aggregates in SQL GROUP BY queries (#23191)
  • Spawning tokio task outside reactor (#23884)
  • Correctly raise DuplicateError on asof_join with suffix="" (#23864)
  • Fix errors on native scan_iceberg (#23811)
  • Fix index out of bounds panic filtering parquet (#23850)
  • Fix error on empty range requests (#23844)
  • Fix handling of hive partitioning hive_start_idx parameter (#23843)
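
One fix above switches memory mapping from MAP_SHARED to MAP_PRIVATE. The sketch below only illustrates the general difference between the two flags using Python's standard `mmap` module on a Unix-like system: with a private (copy-on-write) mapping, writes through the mapping are not carried back to the underlying file. It is not taken from the Polars code base, and the helper name `private_map_leaves_file_untouched` is hypothetical.

```python
import mmap
import os
import tempfile
from typing import Tuple


def private_map_leaves_file_untouched() -> Tuple[bytes, bytes]:
    """Write through a MAP_PRIVATE mapping and report (mapped view, file on disk)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"polars")
        path = tmp.name
    try:
        with open(path, "r+b") as f:
            # MAP_PRIVATE creates a copy-on-write mapping (Unix only).
            view = mmap.mmap(
                f.fileno(),
                0,
                flags=mmap.MAP_PRIVATE,
                prot=mmap.PROT_READ | mmap.PROT_WRITE,
            )
            view[0:1] = b"P"       # modifies only this process's private copy
            seen_in_map = view[:]
            view.close()
        with open(path, "rb") as f:
            on_disk = f.read()     # the file itself is unchanged
    finally:
        os.remove(path)
    return seen_in_map, on_disk


print(private_map_leaves_file_untouched())
```

With MAP_SHARED, the same write would be carried through to the file; the private mapping keeps the modification local to the process.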

📖 Documentation

  • Rename avg_birthday -> avg_age in examples aggregation (#23726)
  • Update Polars Cloud user guide (#24366)
  • Update to Polars Cloud user guide (#24187)
  • Update distributed page (#24323)
  • Add Polars security policy (#24314)
  • Fix few typos (#24305)
  • Add missing reference to LazyFrame.pipe_with_schema() on the website (#24285)
  • Fix formatting of Series.value_counts examples (#24245)
  • Add DataFrame.map_columns to API (#24128)
  • Update multiple pages in the Polars Cloud user guide (#23661)
  • Improve StackOverflow links in contributing guide (#23895)
  • Fix pyo3 documentation page link (#23839)
  • Document the pureness requirements of udfs (#23787)

📦 Build system

  • Re-enable macos-x86-64 (#24266)
  • Drop binary support for macos_x86-64 (#24257)

πŸ› οΈ Other improvements

  • Use PlanCallback in name.map_* (#24484)
  • Replace unsafe with collect (#24494)
  • Move dataset expansion to end and refactor not to use stack optimizer (#24457)
  • Pin xlsxwriter to 3.2.5 or before (#24485)
  • Add methods to EnumUnitVec and shorten name (#24415)
  • Move CompressionUtils to polars-utils (#24430)
  • Update github template to dispatch to cloud client (#24416)
  • Bump c-api (#24412)
  • Add a regression test for #7631 (#24363)
  • Update cloud test InteractiveQuery to DirectQuery (#24287)
  • Mark some tests as slow (#24327)
  • Mark more tests as ready for cloud (#24315)
  • Remove unnecessary stable_features for AVX512 (#24321)
  • Remove PDS-H code (#24301)
  • Get ready for even more cloud tests (#24292)
  • Add tests for slices with caches (#24288)
  • Readd ordering tests (#24284)
  • Expand BitRepr to u8/u16 and use in in_memory group_by (#24248)
  • Fix Makefile venv path (#24251)
  • Remove unnecessary parentheses (#24244)
  • Remove some transmutes (#24246)
  • Wrap Py* data structures in polars-python in locks (#24209)
  • Make non-nested shift{,_and_fill} ops generic (#24224)
  • Remove unused Wrap (#24214)
  • Propagate some python feature flags (#24201)
  • Allow upcasting null-typed columns to nested column types in scans (#24185)
  • Automatically label a few more types of PR (#24147)
  • Update toolchain (#24156)
  • InMemoryJoin should be coloured as InMemoryFallback (#24154)
  • Fool-proof retrieve_error_msg (#24132)
  • Add order_sensitive property for AExpr (#24116)
  • Mark more tests as not possible on cloud (#24103)
  • Turn AggExpr::Count from tuple to struct (#24096)
  • Mark tests that may fail in cloud (#24067)
  • Make CI perf failures more lenient (#24066)
  • Fix hive partition string encoding in CI by upgrading deltalake (#24018)
  • Avoid unreachable if dtype feature is not enabled (#24062)
  • Make tests with sinks run on cloud again (#24048)
  • Update pyo3-polars versions (#24031)
  • Remove insert_error_function (#24023)
  • Remove cache hits, clean up in-mem prefill (#24019)
  • Use .venv instead of venv in pyo3-polars examples (#24024)
  • Fix test failing mypy (#24017)
  • Remove outdated comment (#23998)
  • Add a _plr.pyi to remove mypy issues (#23970)
  • Don't define CountStar as dyn OptimizationRule (#23976)
  • Rename atol and rtol to abs_tol and rel_tol (#23961)
  • Introduce Row{Encode,Decode} as FunctionExpr (#23933)
  • Dispatch through pl.map_batches and AnonymousColumnsUdf (#23867)
  • Ensure clippy and rustfmt run in CI when changing pyo3-polars (#23930)
  • Split column_selector.rs (#23921)
  • Fix pyo3-polars proc-macro re-exports (#23918)
  • Make GetBatchState polling functions unsafe (#23795)
  • Rewrite evaluate_on_groups for .gather / .get (#23700)
  • Remove Context from logical layer (#23863)
  • Add proptest strategy for Polars DataType schemas (#23854)
  • Move Python C API to python-polars (#23876)
  • Refactor directory structure of streaming multi-scan (#23865)
  • Add subphase and query task spawning to StreamingExecState (#23725)
  • Update Rust Polars versions (#23861)
  • Make polars-parquet optional (#23860)
  • Relax constraint on maximum Python version for numba (#23838)

Thank you to all our contributors for making this release possible! @Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @NeejWeej, @VictorAtIfInsurance, @agossard, @alexander-beedie, @aparna2198, @borchero, @c-peters, @camriddell, @cgevans, @cmdlineluser, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @etiennebacher, @gab23r, @gfvioli, @henryharbeck, @iishutov, @itamarst, @jarondl, @jimmmmmmmmmmmy, @jjurm, @joshuamarkovic, @juansolm, @kdn36, @kuril, @math-hiyoko, @mcrumiller, @mpasa, @mrkn, @mroeschke, @nameexhaustion, @nesb1, @orlp, @pka, @pomo-mondreganto, @r-brink, @rawhuul, @ritchie46, @stijnherfst, @vdrn and @wence-