Polars: rs-0.50.0 Release

Release date: July 30, 2025
Previous version: rs-0.49.1 (released June 19, 2025)
Magnitude: 33,846 Diff Delta
Contributors: 23 total committers

250 Commits in this Release

Top Contributors in rs-0.50.0

coastalwhite
nameexhaustion
orlp
ritchie46
kdn36
JakubValtar
Liyixin95
borchero
stijnherfst
math-hiyoko

Release Notes Published

🏆 Highlights

  • Make Selector a concrete part of the DSL (#23351)
  • Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

  • Lower Expr.slice to streaming engine (#23683)
  • Elide bound check (#23653)
  • Preserve Column repr in ColumnTransform operations (#23648)
  • Lower any() and all() to streaming engine (#23640)
  • Lower row-separable functions in streaming engine (#23633)
  • Lower int_range(len()) to with_row_index (#23576)
  • Avoid double field resolution in with_columns (#23530)
  • Rolling quantile lower time complexity (#23443)
  • Use single-key optimization with Categorical (#23436)
  • Improve null-preserving identification for boolean functions (#23317)
  • Improve boolean bitwise aggregate performance (#23325)
  • Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
  • Re-write join types during filter pushdown (#23275)
  • Generate PQ ZSTD decompression context once (#23200)
  • Trigger cache/cse optimizations when multiplexing (#23274)
  • Cache FileInfo upon DSL -> IR conversion (#23263)
  • Push more filters past joins (#23240)

✨ Enhancements

  • Expand on DataTypeExpr (#23249)
  • Lower row-separable functions in streaming engine (#23633)
  • Add scalar checks to range expressions (#23632)
  • Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
  • Implement mean function in arr namespace (#23486)
  • Implement vec_hash for List and Array (#23578)
  • Add unstable pl.row_index() expression (#23556)
  • Add Categories on the Python side (#23543)
  • Implement partitioned sinks for the in-memory engine (#23522)
  • Expose IRFunctionExpr::Rank in the python visitor (#23512)
  • Raise and Warn on UDF's without return_dtype set (#23353)
  • IR pruning (#23499)
  • Expose IRFunctionExpr::FillNullWithStrategy in the python visitor (#23479)
  • Support min/max reducer for null dtype in streaming engine (#23465)
  • Implement streaming Categorical/Enum min/max (#23440)
  • Allow cast to Categorical inside list.eval (#23432)
  • Support pathlib.Path as source for read/scan_delta() (#23411)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Pass payload in ExprRegistry (#23412)
  • Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
  • Support row group skipping with filters when cast_options is given (#23356)
  • Execute bitwise reductions in streaming engine (#23321)
  • Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
  • Add dtype to str.to_integer() (#22239)
  • Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
  • Add is_close method (#23273)
  • Drop superfluous casts from optimized plan (#23269)
  • Added drop_nulls option to to_dummies (#23215)
  • Support comma as decimal separator for CSV write (#23238)
  • Don't format keys if they're empty in dot (#23247)
  • Improve arity simplification (#23242)
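
As a rough illustration of what a closeness check such as the new is_close method (#23273) is for, the sketch below uses only Python's standard library. It assumes the usual tolerance rule |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol), which is the rule math.isclose implements. The helper name and its parameters are hypothetical and are not the Polars API.

```python
import math


def is_close_elementwise(left, right, *, abs_tol=0.0, rel_tol=1e-9):
    """Hypothetical helper: compare two equal-length lists value by value."""
    if len(left) != len(right):
        raise ValueError("inputs must have the same length")
    return [
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(left, right)
    ]


# 0.1 + 0.2 is not exactly 0.3 in floating point, but it is close to it.
result = is_close_elementwise([1.0, 0.1 + 0.2, 100.0], [1.0, 0.3, 101.0])
print(result)
```

An exact equality test would report the second pair as different, which is the usual reason to reach for a tolerance-based comparison.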

🐞 Bug fixes

  • Fix credential refresh logic (#23730)
  • Fix to_datetime() fallible identification (#23735)
  • Correct output datatype for dt.with_time_unit (#23734)
  • Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
  • Allow DataType expressions with selectors (#23720)
  • Match output type to engine for interpolate on Decimal (#23706)
  • Remaining bugs in with_exprs_and_input and pruning (#23710)
  • Match output dtype to engine for cum_sum_horizontal (#23686)
  • Field names for pl.struct in group-by (#23703)
  • Fix output for str.extract_groups with empty string pattern (#23698)
  • Match output type to engine for rolling_map (#23702)
  • Fix incorrect join on single Int128 column for in-memory engine (#23694)
  • Match output field name to lhs for BusinessDaycount (#23679)
  • Correct the planner output datatype for strptime (#23676)
  • Sort and Scan with_exprs_and_input (#23675)
  • Revert to old behavior with name.keep (#23670)
  • Fix panic loading from arrow Map containing timestamps (#23662)
  • Selectors in self part of list.eval (#23668)
  • Fix output field dtype for ToInteger (#23664)
  • Allow decimal_comma with , separator in read_csv (#23657)
  • Fix handling of UTF-8 in write_csv to IO[str] (#23647)
  • Selectors in {Lazy,Data}Frame.filter (#23631)
  • Stop splitfields iterator at eol in simd branch (#23652)
  • Correct output datatype of dt.year and dt.mil (#23646)
  • Logic of broadcast_rhs in binary functions to correct list.set_intersection for list[str] columns (#23584)
  • Order-preserving equi-join didn't always flush final matches (#23639)
  • Fix ColumnNotFound error when joining on col().cast() (#23622)
  • Fix agg groups on when/then in group_by context (#23628)
  • Output type for sign (#23572)
  • Apply agg_fn on null values in pivot (#23586)
  • Remove nonsensical duration variance (#23621)
  • Don't panic when sinking nested categorical to Parquet (#23610)
  • Correctly set value count output field name (#23611)
  • Casting unused columns in to_torch (#23606)
  • Allow inferring of hours-only timezone offset (#23605)
  • Bug in Categorical <-> str compare with nulls (#23609)
  • Honor n=0 in all cases of str.replace (#23598)
  • Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
  • Relabel duplicate sequence IDs in distributor (#23593)
  • Round-trip Enum and Categorical metadata in plugins (#23588)
  • Fix incorrect join_asof with by followed by head/slice (#23585)
  • Allow writing nested Int128 data to Parquet (#23580)
  • Enum serialization assert (#23574)
  • Output type for peak_min / peak_max (#23573)
  • Make Scalar Categorical, Enum and Struct values serializable (#23565)
  • Preserve row order within partition when sinking parquet (#23462)
  • Panic in create_multiple_physical_plans when branching from a single cache node (#23561)
  • Prevent in-mem partition sink deadlock (#23562)
  • Update AWS cloud documentation (#23563)
  • Correctly handle null values when comparing structs (#23560)
  • Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
  • Make Expr.append serializable (#23515)
  • Float by float division dtype (#23529)
  • Division on empty DataFrame generating null row (#23516)
  • Partition sink copy_exprs and with_exprs_and_input (#23511)
  • Unreachable with pl.self_dtype (#23507)
  • Rolling median incorrect min_samples with nulls (#23481)
  • Make Int128 roundtrippable via Parquet (#23494)
  • Fix panic when common subplans contain IEJoins (#23487)
  • Properly handle non-finite floats in rolling_sum/mean (#23482)
  • Make read_csv_batched respect skip_rows and skip_lines (#23484)
  • Always use cloudpickle for the python objects in cloud plans (#23474)
  • Support string literals in index_of() on categoricals (#23458)
  • Don't panic for finish_callback with nested datatypes (#23464)
  • Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
  • Fix var/moment dtypes (#23453)
  • Fix agg_groups dtype (#23450)
  • Clear cached_schema when apply changes dtype (#23439)
  • Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
  • Null handling in full-null group_by_dynamic mean/sum (#23435)
  • Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
  • Fix index calculation for nearest interpolation (#23418)
  • Fix compilation failure with --no-default-features and --features lazy,strings (#23384)
  • Parse parquet footer length into unsigned integer (#23357)
  • Fix incorrect results with group_by aggregation on empty groups (#23358)
  • Fix boolean min() in group_by aggregation (streaming) (#23344)
  • Respect data-model in map_elements (#23340)
  • Properly join URI paths in PlPath (#23350)
  • Ignore null values in bitwise aggregation on bools (#23324)
  • Fix panic filtering after left join (#23310)
  • Out-of-bounds index in hot hash table (#23311)
  • Fix scanning '?' from cloud with glob=False (#23304)
  • Fix filters on inserted columns did not remove rows (#23303)
  • Don't ignore return_dtype (#23309)
  • Use safe parsing for get_normal_components (#23284)
  • Fix output column names/order of streaming coalesced right-join (#23278)
  • Restore concat_arr inputs expansion (#23271)

📖 Documentation

  • Point the R Polars version on R-multiverse (#23660)
  • Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
  • Add page about billing to Polars Cloud user guide (#23564)
  • Small user-guide improvement and fixes (#23549)
  • Correct note in from_pandas about data being cloned (#23552)
  • Fix a few typos in the "Streaming" section (#23536)
  • Update streaming page (#23535)
  • Update structure of Polars Cloud documentation (#23496)
  • Update when_then in user guide (#23245)

📦 Build system

  • Update all rand code (#23387)
  • Bump up rand & rand_distr (#22619)

🛠️ Other improvements

  • Remove incorrect DeletionFilesList::slice (#23796)
  • Remove old schema file (#23798)
  • Remove Default for StreamingExecutionState (#23729)
  • Explicit match to smaller dtypes before cast to Int32 in asof join (#23776)
  • Expose PlPathRef via polars::prelude (#23754)
  • Add hashes json (#23758)
  • Add AExpr::is_expr_equal_to (#23740)
  • Fix rank test to respect maintain order (#23723)
  • IR inputs and exprs iterators (#23722)
  • Store more granular schema hashes to reduce merge conflicts (#23709)
  • Add assertions for unique ID (#23711)
  • Use RelaxedCell in multiscan (#23712)
  • Debug assert ColumnTransform cast is non-strict (#23717)
  • Use UUID for UniqueID (#23704)
  • Remove scan id (#23697)
  • Propagate Iceberg physical ID schema to IR (#23671)
  • Remove unused and confusing match arm (#23691)
  • Remove unused ALLOW_GROUP_AWARE flag (#23690)
  • Remove unused evaluate_inline (#23687)
  • Remove unused field from AggregationContext (#23685)
  • Remove node_to_lp (#23678)
  • Underscore prefix for get_backing_series/to_new_from_backing (#23659)
  • Make helper functions private for equality assertions and update test (#23650)
  • Use RelaxedCell for fully relaxed atomics (#23644)
  • Replace PlSmallStr::from_static("item") with LIST_VALUES_NAME (#23645)
  • Fix cloud bytes scanning and read_* functions (#23642)
  • Group By maintain order on test (#23643)
  • Add maintain_order tests for streaming joins (#23577)
  • Add logic to support struct field renames on arbitrary nesting levels (#23532)
  • Continue on cloud testing (#23616)
  • Add pyo3-polars (#23571)
  • Remove _fetch (#23607)
  • Replace agg_list in AExpr::to_field with is_scalar_ae (#23582)
  • Mark select test case as write_disk (#23566)
  • Rolling order checking of test (#23568)
  • Multiple in-mem plans with reused cache #23561 (#23567)
  • Reduce warning in docs serve (#23534)
  • Remove left-behind print statement (#23533)
  • Make list.to_struct and arr.to_struct serializable (#23504)
  • Small conftest improvement (#23508)
  • Improve Categories error message (#23510)
  • Add test to ensure the global categories gets cleaned up (#23502)
  • Add more testing to group_by sorted test (#23500)
  • Pruning follow-up (#23501)
  • Make arg_min, arg_max, arg_sort and product into concrete DSL and IR constructs (#23493)
  • Simplify arena iterators (#23495)
  • Remove unnecessary may_fail_auto_streaming (#23477)
  • Remove StringCache from the test suite (#23473)
  • Make Selector a concrete part of the DSL (#23351)
  • Add streaming engine to code-coverage (#23441)
  • Remove hashbrown_nightly_hack (#23445)
  • Move options out of RollingFunction (#23430)
  • Drop New from RowEncodingCategoricalContext (#23431)
  • Remove unneeded allocations when creating PlPath (#23417)
  • Rework Categorical/Enum to use (Frozen)Categories (#23016)
  • Do not depend on pyo3 without the python feature (#23420)
  • Ignore Sort if 'by' is empty (#23320)
  • Rename from_buffer()/FromBuffer to reinterpret()/Reinterpret (#23362)
  • Clean up ChunkFilter implementation (#23378)
  • Only convert to ExprIR once in with_columns (#23352)
  • Update rust version in nix flake (#23347)
  • Update toolchain and fix clippy issues (#23334)
  • Optimize equality comparisons and fix error handling (#23281)
  • Improve cloud tests (#23312)
  • Casting from binview to primitives code moved from polars-ops to polars-compute (#23234)
  • Improve DSL source cache (#23282)
  • Add new PlPath that abstracts over PathBuf and URI (#23280)
  • Add may_fail_cloud mark for pytest (#23279)
  • Organize dsl_to_ir logic into modules (#23277)
  • Add flag for auto distributed testing (#23220)
  • Remove unused PyDataType (#23265)
  • Split FileScan in FileScanDsl and FileScanIR (#23260)

Thank you to all our contributors for making this release possible! @Declow, @JakubValtar, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @TheLostLambda, @Washiil, @alexander-beedie, @borchero, @c-peters, @cmdlineluser, @coastalwhite, @deanm0000, @eitsupi, @etiennebacher, @florian-klein, @gfvioli, @habaneraa, @itamarst, @kdn36, @ldhwaddell, @math-hiyoko, @mpasa, @nameexhaustion, @orlp, @othijssens, @r-brink, @ritchie46 and @stijnherfst