Highlights
- improve join performance through radix partitioned join (#12270)
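
To illustrate the general idea behind a partitioned hash join, here is a minimal pure-Python sketch. It is not the Polars implementation: the function name `partitioned_hash_join`, the row layout of `(key, value)` tuples, and the use of `hash(key) % n_partitions` to pick a bucket are all assumptions made for illustration.

```python
from collections import defaultdict


def partitioned_hash_join(left, right, n_partitions=4):
    """Inner-join two lists of (key, value) rows.

    Hypothetical illustration only: rows are first split into partitions
    by key hash, then each pair of partitions is joined independently
    with a small hash table.
    """

    def partition(rows):
        # Rows with equal keys always land in the same partition.
        parts = [[] for _ in range(n_partitions)]
        for key, value in rows:
            parts[hash(key) % n_partitions].append((key, value))
        return parts

    out = []
    for left_part, right_part in zip(partition(left), partition(right)):
        # Build a hash table on the right-hand partition.
        table = defaultdict(list)
        for key, value in right_part:
            table[key].append(value)
        # Probe it with the left-hand partition.
        for key, left_value in left_part:
            for right_value in table.get(key, ()):
                out.append((key, left_value, right_value))
    return out


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = sorted(partitioned_hash_join(left, right))
```

Because matching keys can only meet inside the same partition, each partition pair can be processed on its own, which keeps the per-partition hash tables small and makes the work easy to spread over threads.
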
Breaking changes
- Rename cumulative functions cumsum -> cum_sum and similar (#12513)
- Rename take to gather (#12528)
- Add dedicated horizontal aggregation methods to DataFrame (#12492)
- Rename take_every to gather_every (#12531)
- Deprecate parse_int in favor of to_integer (#12464)
- plugins add version and context (#12433)
- Fix scan_csv error type (#12355)
- Rename write_csv parameter has_header to include_header (#12351)
- Rename is_signed to is_signed_integer (#12220)
- Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
- Rename ljust/rjust to pad_end/pad_start (#11975)
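
As a rough migration aid, the method renames listed above can be collected into a lookup table and used to flag old call sites. This is only a sketch: `RENAMES` and `find_renamed_calls` are hypothetical helpers, not part of Polars, and the regex simply looks for `.old_name(` so it may also flag same-named methods from other libraries.

```python
import re

# Old method name -> new method name, taken from the entries above.
RENAMES = {
    "cumsum": "cum_sum",
    "take": "gather",
    "take_every": "gather_every",
    "parse_int": "to_integer",
    "is_signed": "is_signed_integer",
    "ljust": "pad_end",
    "rjust": "pad_start",
}

# Match ".name(" for any old name; longer names are listed first.
_PATTERN = re.compile(
    r"\.(" + "|".join(sorted(RENAMES, key=len, reverse=True)) + r")\("
)


def find_renamed_calls(source):
    """Return (line_number, old_name, new_name) for each old-style call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, RENAMES[old]))
    return hits


example = "df.select(pl.col('a').cumsum())\ns.take([0])\ns.take_every(2)"
hits = find_renamed_calls(example)
```

Each hit is a suggestion to review by hand, not an automatic rewrite.
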
Performance improvements
- speed up cov/corr with SIMD + strength-reduction (~3x vs. 0.19.13, ~2x vs. numpy) (#12471)
- apply predicates and statistics of parquet files in streaming mode (#12439)
- use online algorithm for cov/corr (~2x) (#12412)
- indexvec in group-by (#12371)
- reduce allocations in hash join (#12368)
- change concurrency parameters (#12321)
- improve join performance through radix partitioned join (#12270)
- remove extra multiplication in hash_to_partition (#12233)
- allow non-power-of-two partitions (#12225)
- Reduce compute in error message for failed datetime parsing (#12147)
- improve parquet downloading (#12061)
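
For readers unfamiliar with the term, an "online" algorithm for cov/corr computes the result in a single pass while updating running statistics. The sketch below uses the well-known Welford-style co-moment update; it is an assumption that this resembles what the PR does, and `online_cov_corr` is a hypothetical name, not a Polars function.

```python
import math


def online_cov_corr(xs, ys):
    """Single-pass sample covariance and Pearson correlation.

    Hypothetical illustration using Welford-style running updates.
    """
    n = 0
    mean_x = mean_y = 0.0
    m2_x = m2_y = c_xy = 0.0  # running sums of squared / cross deviations
    for x, y in zip(xs, ys):
        n += 1
        dx = x - mean_x  # deviation from the OLD mean of x
        mean_x += dx / n
        dy = y - mean_y  # deviation from the OLD mean of y
        mean_y += dy / n
        m2_x += dx * (x - mean_x)
        m2_y += dy * (y - mean_y)
        c_xy += dx * (y - mean_y)  # old-mean x deviation times new-mean y deviation
    if n < 2:
        raise ValueError("need at least two observations")
    cov = c_xy / (n - 1)
    # Note: raises ZeroDivisionError if either input is constant.
    corr = c_xy / math.sqrt(m2_x * m2_y)
    return cov, corr


cov, corr = online_cov_corr([1, 2, 3, 4], [2, 4, 6, 8])
```

The single pass avoids computing the means first and then re-reading the data, and the running-deviation form is numerically more stable than accumulating raw sums of products.
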
Enhancements
- Add dedicated horizontal aggregation methods to DataFrame (#12492)
- support http scan_parquet (#12517)
- Add support for UTF-8 BOM option in write_csv and sink_csv (#12253)
- remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
- Allow comparison of two local categories with the same hash (#12503)
- more changes for versioned plugins (#12504)
- plugins add version and context (#12433)
- include i128 in more primitive functions (#12413)
- write rolling functions as private expressions. (#12379)
- Add round_sig_figs expression for rounding to significant figures (#11959)
- change concurrency parameters (#12321)
- deprecate _saturating in duration string language, make it the default (#12301)
- auto infer ambiguous for truncate and round (#12204)
- Rename is_signed to is_signed_integer (#12220)
- New Config options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
- allow non-aggregation predicate in ternary groupby (#12286)
- Add name= in .write_avro to set schema name (#12255)
- Add support for reading zstd compressed files (no-options) in read_csv (#12214)
- start prefetching all files immediately (#12201)
- Add .list.to_array expression (#12192)
- consolidate & improve all casting failure error messages (#12168)
- tunable concurrency (#12171)
- support reverse sort in streaming (#12169)
- Add .arr.to_list expression (#12136)
- add concurrency budget (#12117)
- Introduce ignore_nulls for str.concat (#12108)
- casting utf8 to temporal (#12072)
- Add supertype for List/Array (#12016)
- enable eq and neq for array dtype (#12020)
- Expressify n of shift (#12004)
- add dedicated name namespace for operations that affect expression names (#11973)
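
Rounding to significant figures, as added by the round_sig_figs expression listed above, can be sketched in plain Python as follows. This is only an illustration of the concept: `round_to_sig_figs` is a hypothetical helper, and its tie-breaking and edge-case behavior (it relies on Python's built-in `round`) may differ from the Polars expression.

```python
import math


def round_to_sig_figs(value, digits):
    """Round a float to `digits` significant figures (illustrative only)."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    if value == 0 or not math.isfinite(value):
        return value  # zero, inf and nan are returned unchanged
    # Position of the leading digit: 3 for 1234.5, -2 for 0.0123.
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - magnitude)


large = round_to_sig_figs(1234.5678, 2)
small = round_to_sig_figs(0.012345, 3)
```

Unlike rounding to a fixed number of decimals, the number of retained digits here adapts to the magnitude of each value.
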
Bug fixes
- fix incorrect ternary agg states (#12538)
- fix and improve ternary evaluation on groups (#12529)
- saturating sub in debug msg (#12525)
- fix panic when writing Decimal type to parquet (#12532)
- prefetch struct columns in async projection pushdown (#12514)
- rechunk cross join output in streaming (#12511)
- fix as_list logical types (#12507)
- fix streaming cross join on empty df (#12491)
- don't overflow when calculating date range over very long periods (#12479)
- Allow append/zip_with/extend on local categoricals (#12369)
- Do not panic if time is invalid (#12466)
- empty csv no-raise (#12434)
- Fix scan_csv error type (#12355)
- binary operations in aggregation context on literals (#12430)
- update groups state after binary aggregation (#12415)
- Remove extra \n when reading file-like object wi… (#12333)
- revert ternary special broadcast, ensure broadcast is always to max height (#12395)
- ensure first/last return null if empty (#12401)
- Do not cast lit if has same dtype (#12342)
- Fix index column name of rolling/dynamic group by (#12365)
- ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
- uint64 should be correctly extracted from python object (#12338)
- expr_output_name include literal (#12335)
- Fix Decimal dtype table repr (#12318)
- Fix behavior of month intervals in date_range (#12317)
- fix missing row_count when scanning empty csv (#12316)
- zip_with also broadcast mask (#12309)
- respect hive_partitioning flag when dealing with multiple files (#12315)
- parquet, add row_count to empty file materialization (#12310)
- fix download ranges in parquet (#12313)
- object store path derivation for local URL (#12308)
- don't move right endpoint of windows in rolling in default offset==-period case (#12267)
- Raise more informative error on invalid reshape input (#12288)
- incorrect super type for literals in nested binary exprs (#12238)
- Update null_count after arithmetic (#12280)
- fix ambiguous aggregation type (#12269)
- Consistently propagate nulls for numpy ufuncs (#12212)
- respect return_scalar of list scalars (#12251)
- potential overflow (#12206)
- always start a new thread if the thread is already blocking (#12202)
- with_row_count should block predicate push down for lazy csv (#12187)
- rechunk failed-list series before iterate (#12189)
- Raise if *_horizontal without inputs (#12106)
- fix incorrect desc sort behavior (#12141)
- take should block predicate pushdown (#12130)
- use null type when read from unknown row (#12128)
- boundary predicate to block all accumulated predicates in push down (#12105)
- make python schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
- fix panic when initializing Series with array of list dtype (#12148)
- Fix schema of arr.min/max (#12127)
- ensure filter predicate inputs exist in schema (#12089)
- str.concat on empty list (#12066)
- binary agg should group aware if literal not a scalar (#12043)
- Use Arrow schema for file readers (#12048)
- Error on duplicates in hive partitioning (#12040)
- display fmt for str split (#12039)
- sum_horizontal should not always cast to int (#12031)
- fix apply_to_inner's dtype (#12010)
- Fix padding for non-ASCII strings (#12008)
- inline parts of unstable unicode module for stable (#12003)
- fix dot visualization of anonymous scans (#12002)
- SQL table aliases (#11988)
Other improvements
- Rename cumulative functions cumsum -> cum_sum and similar (#12513)
- fix and improve ternary evaluation on groups (#12529)
- Rename take to gather (#12528)
- Add dedicated horizontal aggregation methods to DataFrame (#12492)
- Rename take_every to gather_every (#12531)
- Add polars-ds to list of community plugins (#12527)
- add schema test (#12523)
- remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
- add test for previous commit (#12510)
- Support Python 3.12 (#12094)
- Fix some typos (#12485)
- Deprecate parse_int in favor of to_integer (#12464)
- update rustc (#12468)
- rename the DataType in the polars-arrow crate to ArrowDataType for clarity, preventing conflation with our own/native DataType (#12459)
- Replace outdated dev dependency tempdir (#12462)
- move cov/corr to polars-ops (#12411)
- use unwrap_or_else and get_unchecked_release in rolling kernels (#12405)
- dprint/markdown link checker minor updates (#12409)
- replace as_u64 with dirty_hash (#12327)
- Fix ruff linting invocation (#12350)
- Rename write_csv parameter has_header to include_header (#12351)
- Build and verify Rust examples in docs (#12334)
- Fix some feature flags (#12325)
- Organize Cargo.toml (#12323)
- remove fxhash (#12322)
- Run rustfmt on doc examples (#12319)
- Consolidate "getting started" and "user guide" sections (#12246)
- deprecate _saturating in duration string language, make it the default (#12301)
- simplify expr checking in predicate push down (#12287)
- Replace dev dependency avro-rs with apache-avro (#12295)
- Run clippy on all targets (#12293)
- Add top-level make clippy, simplify Rust linting workflows (#12290)
- ensure we git-ignore ALL .venv dirs (#12289)
- incorrect super type for literals in nested binary exprs (#12238)
- remove unwrap from group_by (#12263)
- update object_store (#12006) (#12273)
- Remove recommended setting from IDE docs (#12275)
- Add feature flag for list.eval (#12254)
- factor out some shared code in truncate_impl (#12229)
- update Cargo.lock (#12226)
- Make all functions in string namespace non-anonymous (#12215)
- Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
- use enum for Ambiguous (#12193)
- Standardize project name formatting across docs (#12185)
- Update sqlparser to 0.39 (#12173)
- pin ring (#12176)
- Refactor FunctionExpr module (#12162)
- Fix tests for pyarrow 14 (#12170)
- Fix triggers for docs deployment (#12159)
- Make all functions in binary namespace non-anonymous (#12126)
- Consolidate contributing info (#12109)
- Fix typo in user-guide/expressions/plugins.md (#12115)
- Update CODEOWNERS (#12107)
- visualize plugin directory layout in user guide (#12092)
- Minor improvements to the docs website (#12084)
- reshape and repeat_by non-anonymous (#12064)
- upgrade zstd to 0.13 in polars-parquet (#12062)
- Direct CONTRIBUTING to the docs website (#12042)
- inline parquet2 (#12026)
- remove parquet logic from polars-arrow and consolidate logic in polars-parquet crate. (#12022)
- move abs to ops (#12005)
- Rename ljust/rjust to pad_end/pad_start (#11975)
- Disable type checking for dataframe_api_compat dependency (#11997)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @abstractqqq, @alexander-beedie, @braaannigan, @brayanjuls, @c-peters, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jerome3o, @jrycw, @mcrumiller, @messense, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego, @uchiiii, @universalmind303 and @wsyxbcl