Polars: py-1.32.0-beta.1 Release

Release date: July 26, 2025
Previous version: py-1.31.0 (released June 18, 2025)
Magnitude: 30,063 Diff Delta
Contributors: 24 total committers

223 Commits in this Release

Top Contributors in py-1.32.0-beta.1

coastalwhite
orlp
nameexhaustion
ritchie46
kdn36
JakubValtar
mcrumiller
stijnherfst
borchero
Kevin-Patyk

Release Notes Published

🏆 Highlights

  • Make Selector a concrete part of the DSL (#23351)
  • Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

  • Lower Expr.slice to streaming engine (#23683)
  • Elide bound check (#23653)
  • Preserve Column repr in ColumnTransform operations (#23648)
  • Lower any() and all() to streaming engine (#23640)
  • Lower row-separable functions in streaming engine (#23633)
  • Lower int_range(len()) to with_row_index (#23576)
  • Avoid double field resolution in with_columns (#23530)
  • Rolling quantile lower time complexity (#23443)
  • Use single-key optimization with Categorical (#23436)
  • Improve null-preserving identification for boolean functions (#23317)
  • Improve boolean bitwise aggregate performance (#23325)
  • Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
  • Re-write join types during filter pushdown (#23275)
  • Generate PQ ZSTD decompression context once (#23200)
  • Trigger cache/cse optimizations when multiplexing (#23274)
  • Cache FileInfo upon DSL -> IR conversion (#23263)
  • Push more filters past joins (#23240)
  • Optimize Bitmap::make_mut (#23138)

✨ Enhancements

  • Add Python-side caching for credentials and provider auto-initialization (#23736)
  • Expand on DataTypeExpr (#23249)
  • Lower row-separable functions in streaming engine (#23633)
  • Add scalar checks to range expressions (#23632)
  • Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
  • Implement mean function in arr namespace (#23486)
  • Implement vec_hash for List and Array (#23578)
  • Add unstable pl.row_index() expression (#23556)
  • Add Categories on the Python side (#23543)
  • Implement partitioned sinks for the in-memory engine (#23522)
  • Raise and Warn on UDFs without return_dtype set (#23353)
  • IR pruning (#23499)
  • Support min/max reducer for null dtype in streaming engine (#23465)
  • Implement streaming Categorical/Enum min/max (#23440)
  • Allow cast to Categorical inside list.eval (#23432)
  • Support pathlib.Path as source for read/scan_delta() (#23411)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Pass payload in ExprRegistry (#23412)
  • Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
  • Support row group skipping with filters when cast_options is given (#23356)
  • Execute bitwise reductions in streaming engine (#23321)
  • Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
  • Add dtype to str.to_integer() (#22239)
  • Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
  • Add is_close method (#23273)
  • Drop superfluous casts from optimized plan (#23269)
  • Add drop_nulls option to to_dummies (#23215)
  • Support comma as decimal separator for CSV write (#23238)
  • Don't format keys if they're empty in dot (#23247)
  • Improve arity simplification (#23242)
  • Allow expression input for length parameter in pad_start, pad_end, and zfill (#23182)

🐞 Bug fixes

  • Load _expiry_time from botocore Credentials in CredentialProviderAWS (#23753)
  • Fix credential refresh logic (#23730)
  • Fix to_datetime() fallible identification (#23735)
  • Correct output datatype for dt.with_time_unit (#23734)
  • Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
  • Allow DataType expressions with selectors (#23720)
  • Match output type to engine for interpolate on Decimal (#23706)
  • Remaining bugs in with_exprs_and_input and pruning (#23710)
  • Match output dtype to engine for cum_sum_horizontal (#23686)
  • Field names for pl.struct in group-by (#23703)
  • Fix output for str.extract_groups with empty string pattern (#23698)
  • Match output type to engine for rolling_map (#23702)
  • Moved passing DeltaTable._storage_options (#23673)
  • Fix incorrect join on single Int128 column for in-memory engine (#23694)
  • Match output field name to lhs for BusinessDaycount (#23679)
  • Correct the planner output datatype for strptime (#23676)
  • Sort and Scan with_exprs_and_input (#23675)
  • Revert to old behavior with name.keep (#23670)
  • Fix panic loading from arrow Map containing timestamps (#23662)
  • Selectors in self part of list.eval (#23668)
  • Fix output field dtype for ToInteger (#23664)
  • Allow decimal_comma with , separator in read_csv (#23657)
  • Fix handling of UTF-8 in write_csv to IO[str] (#23647)
  • Selectors in {Lazy,Data}Frame.filter (#23631)
  • Stop splitfields iterator at eol in simd branch (#23652)
  • Correct output datatype of dt.year and dt.mil (#23646)
  • Logic of broadcast_rhs in binary functions to correct list.set_intersection for list[str] columns (#23584)
  • Order-preserving equi-join didn't always flush final matches (#23639)
  • Fix ColumnNotFound error when joining on col().cast() (#23622)
  • Fix agg groups on when/then in group_by context (#23628)
  • Output type for sign (#23572)
  • Apply agg_fn on null values in pivot (#23586)
  • Remove nonsensical duration variance (#23621)
  • Don't panic when sinking nested categorical to Parquet (#23610)
  • Correctly set value count output field name (#23611)
  • Casting unused columns in to_torch (#23606)
  • Allow inferring of hours-only timezone offset (#23605)
  • Bug in Categorical <-> str compare with nulls (#23609)
  • Honor n=0 in all cases of str.replace (#23598)
  • Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
  • Relabel duplicate sequence IDs in distributor (#23593)
  • Round-trip Enum and Categorical metadata in plugins (#23588)
  • Fix incorrect join_asof with by followed by head/slice (#23585)
  • Change return typing of get_index_type() from DataType to PolarsIntegerType (#23558)
  • Allow writing nested Int128 data to Parquet (#23580)
  • Enum serialization assert (#23574)
  • Output type for peak_min / peak_max (#23573)
  • Make Scalar Categorical, Enum and Struct values serializable (#23565)
  • Preserve row order within partition when sinking parquet (#23462)
  • Prevent in-mem partition sink deadlock (#23562)
  • Update AWS cloud documentation (#23563)
  • Correctly handle null values when comparing structs (#23560)
  • Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
  • Make Expr.append serializable (#23515)
  • Float by float division dtype (#23529)
  • Division on empty DataFrame generating null row (#23516)
  • Partition sink copy_exprs and with_exprs_and_input (#23511)
  • Unreachable with pl.self_dtype (#23507)
  • Rolling median incorrect min_samples with nulls (#23481)
  • Make Int128 roundtrippable via Parquet (#23494)
  • Fix panic when common subplans contain IEJoins (#23487)
  • Properly handle non-finite floats in rolling_sum/mean (#23482)
  • Make read_csv_batched respect skip_rows and skip_lines (#23484)
  • Always use cloudpickle for the python objects in cloud plans (#23474)
  • Support string literals in index_of() on categoricals (#23458)
  • Don't panic for finish_callback with nested datatypes (#23464)
  • Pass DeltaTable._storage_options if no storage_options are provided (#23456)
  • Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
  • Fix var/moment dtypes (#23453)
  • Fix agg_groups dtype (#23450)
  • Fix incorrect _get_path_scheme (#23444)
  • Fix missing overload defaults in read_ods and tree_format (#23442)
  • Clear cached_schema when apply changes dtype (#23439)
  • Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
  • Null handling in full-null group_by_dynamic mean/sum (#23435)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Fix index calculation for nearest interpolation (#23418)
  • Overload for eager default in Schema.to_frame was False instead of True (#23413)
  • Fix read_excel overloads so that passing list[str] to sheet_name does not raise (#23388)
  • Removed special handling for bytes like objects in read_ndjson (#23361)
  • Parse parquet footer length into unsigned integer (#23357)
  • Fix incorrect results with group_by aggregation on empty groups (#23358)
  • Fix boolean min() in group_by aggregation (streaming) (#23344)
  • Respect data-model in map_elements (#23340)
  • Properly join URI paths in PlPath (#23350)
  • Ignore null values in bitwise aggregation on bools (#23324)
  • Fix panic filtering after left join (#23310)
  • Out-of-bounds index in hot hash table (#23311)
  • Fix scanning '?' from cloud with glob=False (#23304)
  • Fix filters on inserted columns did not remove rows (#23303)
  • Don't ignore return_dtype (#23309)
  • Raise error instead of return in Series class (#23301)
  • Use safe parsing for get_normal_components (#23284)
  • Fix output column names/order of streaming coalesced right-join (#23278)
  • Restore concat_arr inputs expansion (#23271)
  • Expose FieldsMapper (#23232)
  • Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)

📖 Documentation

  • Fix str.replace_many examples trigger deprecation warning (#23695)
  • Point the R Polars version on R-multiverse (#23660)
  • Update example for writing to cloud storage (#20265)
  • Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
  • Add docs of Expr.list.filter and Series.list.filter (#23589)
  • Add page about billing to Polars Cloud user guide (#23564)
  • Small user-guide improvement and fixes (#23549)
  • Correct note in from_pandas about data being cloned (#23552)
  • Fix a few typos in the "Streaming" section (#23536)
  • Update streaming page (#23535)
  • Update structure of Polars Cloud documentation (#23496)
  • Update example code in pandas migration guide (#23403)
  • Correct plugins user guide to reflect that teaching Expr.language is in a different section (#23377)
  • Add example of using OR in join_where (#23375)
  • Update when_then in user guide (#23245)

📦 Build system

  • Update all rand code (#23387)

🛠️ Other improvements

  • Add hashes json (#23758)
  • Add AExpr::is_expr_equal_to (#23740)
  • Fix rank test to respect maintain order (#23723)
  • IR inputs and exprs iterators (#23722)
  • Store more granular schema hashes to reduce merge conflicts (#23709)
  • Use UUID for UniqueID (#23704)
  • Remove scan id (#23697)
  • Propagate Iceberg physical ID schema to IR (#23671)
  • Remove unused and confusing match arm (#23691)
  • Remove unused ALLOW_GROUP_AWARE flag (#23690)
  • Remove unused evaluate_inline (#23687)
  • Remove unused field from AggregationContext (#23685)
  • Remove node_to_lp (#23678)
  • Fix cloud bytes scanning and read_* functions (#23642)
  • Group By maintain order on test (#23643)
  • Add maintain_order tests for streaming joins (#23577)
  • Continue on cloud testing (#23616)
  • Add pyo3-polars (#23571)
  • Remove _fetch (#23607)
  • Replace agg_list in AExpr::to_field with is_scalar_ae (#23582)
  • Mark select test case as write_disk (#23566)
  • Rolling order checking of test (#23568)
  • Multiple in-mem plans with reused cache #23561 (#23567)
  • Reduce warning in docs serve (#23534)
  • Remove left-behind print statement (#23533)
  • Make list.to_struct and arr.to_struct serializable (#23504)
  • Small conftest improvement (#23508)
  • Improve Categories error message (#23510)
  • Add test to ensure the global categories gets cleaned up (#23502)
  • Add more testing to group_by sorted test (#23500)
  • Pruning follow-up (#23501)
  • Make arg_min, arg_max, arg_sort and product into concrete DSL and IR constructs (#23493)
  • Simplify arena iterators (#23495)
  • Remove unnecessary may_fail_auto_streaming (#23477)
  • Remove StringCache from the test suite (#23473)
  • Make Selector a concrete part of the DSL (#23351)
  • Add streaming engine to code-coverage (#23441)
  • Remove hashbrown_nightly_hack (#23445)
  • Rework Categorical/Enum to use (Frozen)Categories (#23016)
  • Fix typing error from new pandas-stubs release (#23414)
  • Ignore Sort if 'by' is empty (#23320)
  • Rename from_buffer()/FromBuffer to reinterpret()/Reinterpret (#23362)
  • Only convert to ExprIR once in with_columns (#23352)
  • Update rust version in nix flake (#23347)
  • Update toolchain and fix clippy issues (#23334)
  • Improve cloud tests (#23312)
  • Casting from binview to primitives code moved from polars-ops to polars-compute (#23234)
  • Improve DSL source cache (#23282)
  • Add new PlPath that abstracts over PathBuf and URI (#23280)
  • Add may_fail_cloud mark for pytest (#23279)
  • Organize dsl_to_ir logic into modules (#23277)
  • Add flag for auto distributed testing (#23220)
  • Remove unused PyDataType (#23265)
  • Split FileScan in FileScanDsl and FileScanIR (#23260)
  • Update Rust Polars versions (#23239)
  • Connect Python assert_dataframe_equal() to Rust back-end (#23207)
  • Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)
  • Update Rust Polars versions (#23229)

Thank you to all our contributors for making this release possible! @Declow, @JakubValtar, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @TheLostLambda, @Washiil, @borchero, @c-peters, @cmdlineluser, @coastalwhite, @deanm0000, @eitsupi, @etiennebacher, @florian-klein, @gfvioli, @habaneraa, @itamarst, @kdn36, @ldhwaddell, @math-hiyoko, @mcrumiller, @mrkn, @nameexhaustion, @orlp, @othijssens, @r-brink, @ritchie46, @stijnherfst and @zyctree