🚀 Performance improvements
- Speed up SQL interface "ORDER BY" clauses (#26037)
- Add fast kernel for is_nan and use it for numpy NaN->null conversion (#26034)
- Optimize ArrayFromIter implementations for ObjectArray (#25712)
- New streaming NDJSON sink pipeline (#25948)
- New streaming CSV sink pipeline (#25900)
- Dispatch partitioned usage of sink_* functions to new-streaming by default (#25910)
- Replace ryu with faster zmij (#25885)
- Reduce memory usage for .item() count in grouped first/last (#25787)
- Skip schema inference if schema provided for scan_csv/ndjson (#25757)
- Add width-aware chunking to prevent degradation with wide data (#25764)
- Use new sink pipeline for write/sink_ipc (#25746)
- Reduce memory usage when scanning multiple parquet files in streaming (#25747)
- Don't call cluster_with_columns optimization if not needed (#25724)
✨ Enhancements
- Add new pl.PartitionBy API (#26004)
- ArrowStreamExportable and sink_delta (#25994)
- Release musl builds (#25894)
- Implement streaming decompression for CSV COUNT(*) fast path (#25988)
- Add nulls support for rolling_mean_by (#25917)
- Add lazy collect_all (#25991)
- Add streaming decompression for NDJSON schema inference (#25992)
- Improved handling of unqualified SQL JOIN columns that are ambiguous (#25761)
- Drop Python 3.9 support (#25984)
- Expose record batch size in {sink,write}_ipc (#25958)
- Add null_on_oob parameter to expr.get (#25957)
- Suggest correct timezone if timezone validation fails (#25937)
- Support streaming IPC scan from S3 object store (#25868)
- Implement streaming CSV schema inference (#25911)
- Support hashing of meta expressions (#25916)
- Improve SQLContext recognition of possible table objects in the Python globals (#25749)
- Add pl.Expr.(min|max)_by (#25905)
- Improve MemSlice Debug impl (#25913)
- Implement or fix json encode/decode for (U)Int128, Categorical, Enum, Decimal (#25896)
- Expand scatter to more dtypes (#25874)
- Implement streaming CSV decompression (#25842)
- Add Series sql method for API consistency (#25792)
- Mark Polars as safe for free-threading (#25677)
- Support Binary and Decimal in arg_(min|max) (#25839)
- Allow Decimal parsing in str.json_decode (#25797)
- Add shift support for Object data type (#25769)
- Add missing Series.arr.mean (#25774)
- Allow scientific notation when parsing Decimals (#25711)
🐞 Bug fixes
- Release GIL on collect_batches (#26033)
- Missing buffer update in String is_in Parquet pushdown (#26019)
- Make struct.with_fields data model coherent (#25610)
- Incorrect output order for order sensitive operations after join_asof (#25990)
- Use SeriesExport for pyo3-polars FFI (#26000)
- Add pl.Schema to type signature for DataFrame.cast (#25983)
- Don't write Parquet min/max statistics for i128 (#25986)
- Ensure chunk consistency in in-memory join (#25979)
- Fix varying block metadata length in IPC reader (#25975)
- Implement collect_batches properly in Rust (#25918)
- Fix panic on arithmetic with bools in list (#25898)
- Convert to index type with strict cast in some places (#25912)
- Empty dataframe in streaming non-strict hconcat (#25903)
- Infer large u64 in json as i128 (#25904)
- Set http client timeouts to 10 minutes (#25902)
- Correct lexicographic ordering for Parquet BYTE_ARRAY statistics (#25886)
- Raise error on duplicate group_by names in upsample() (#25811)
- Correctly export view buffer sizes nested in Extension types (#25853)
- Fix DataFrame.estimated_size not handling overlapping chunks correctly (#25775)
- Ensure Kahan sum does not introduce NaN from infinities (#25850)
- Trim excess bytes in parquet decode (#25829)
- Fix panic/deadlock sinking parquet with rows larger than 64MB estimated size (#25836)
- Fix quantile midpoint interpolation (#25824)
- Don't use cast when converting from physical in list.get (#25831)
- Invalid null count on int -> categorical cast (#25816)
- Update groups in list.eval (#25826)
- Use downcast before FFI conversion in PythonScan (#25815)
- Double-counting of row metrics (#25810)
- Cast nulls to expected type in streaming union node (#25802)
- Incorrect slice pushdown into map_groups (#25809)
- Fix panic writing parquet with single bool column (#25807)
- Fix upsample with group_by incorrectly introducing NULLs on group key columns (#25794)
- Panic in top_k pruning (#25798)
- Fix incorrect collect_schema for unpivot followed by join (#25782)
- Verify arr namespace is called from array column (#25650)
- Ensure LazyFrame.serialize() unchanged after collect_schema() (#25780)
- Function map_(rows|elements) with return_dtype = pl.Object (#25753)
- Fix incorrect cargo sub-feature (#25738)
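
The Kahan sum entry above (#25850) touches a general property of compensated summation: once the running total becomes infinite, the correction term evaluates `inf - inf` and turns into NaN, which then poisons the result. The following is a minimal pure-Python sketch of that effect, not the Polars implementation; the function names and the "reset the compensation when it is non-finite" guard are assumptions chosen for illustration.

```python
import math


def kahan_sum(values):
    """Textbook Kahan (compensated) summation, with no special-casing."""
    total = 0.0
    comp = 0.0  # running estimate of the lost low-order bits
    for x in values:
        y = x - comp
        t = total + y
        comp = (t - total) - y  # becomes inf - inf = NaN once t is infinite
        total = t
    return total


def kahan_sum_guarded(values):
    """Same algorithm, but drops a non-finite compensation term.

    Hypothetical guard for illustration; not taken from the Polars source.
    """
    total = 0.0
    comp = 0.0
    for x in values:
        y = x - comp
        t = total + y
        comp = (t - total) - y
        if not math.isfinite(comp):
            comp = 0.0  # an inf/NaN correction carries no useful information
        total = t
    return total


data = [1.0, math.inf, 1.0]
print(kahan_sum(data))          # nan
print(kahan_sum_guarded(data))  # inf, matching plain sum(data)
```

With the guard, a sum that is legitimately undefined (for example `inf + -inf`) still comes out as NaN, since the NaN then originates in the total itself and not in the compensation term.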
📖 Documentation
- Fix display of deprecation warning (#26010)
- Document null behaviour for rank (#25887)
- Add QUALIFY clause and SUBSTRING function to the SQL docs (#25779)
- Update mixed-offset datetime parsing example in user guide (#25915)
- Update bare-metal docs for mounted anonymous results (#25801)
- Fix credential parameter name in cloud-storage.py (#25788)
- Configuration options update (#25756)
🛠️ Other improvements
- Update rust compiler (#26017)
- Improve csv test coverage (#25980)
- Ramp up CSV read size (#25997)
- Mark lazy parameter to collect_all as unstable (#25999)
- Update ruff action and simplify version handling (#25940)
- Run python lint target as part of pre-commit (#25982)
- Disable HTTP timeout for receiving response body (#25970)
- Fix mypy lint (#25963)
- Add AI contribution policy (#25956)
- Fix failing scan delta S3 test (#25932)
- Improve MemSlice Debug impl (#25913)
- Remove and deprecate batched csv reader (#25884)
- Remove unused AnonymousScan functions (#25872)
- Filter DeprecationWarning from pyparsing indirectly through pyiceberg (#25854)
- Various small improvements (#25835)
- Clear venv with appropriate version of Python (#25851)
- Skip schema inference if schema provided for scan_csv/ndjson (#25757)
- Ensure proper async connection cleanup on DB test exit (#25766)
- Ensure we uninstall other Polars runtimes in CI (#25739)
- Make 'make requirements' more robust (#25693)
- Remove duplicate compression level types (#25723)
Thank you to all our contributors for making this release possible!
@AndreaBozzo, @EndPositive, @Kevin-Patyk, @MarcoGorelli, @Voultapher, @alexander-beedie, @anosrepenilno, @arlyon, @azimafroozeh, @carnarez, @dependabot[bot], @dsprenkels, @edizeqiri, @eitanf, @gab23r, @henryharbeck, @hutch3232, @ion-elgreco, @jqnatividad, @kdn36, @lun3x, @m1guelperez, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @sachinn854 and @yonikremer