🏆 Highlights
- improve join performance through radix partitioned join (#12270)
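The radix-partitioned join idea can be illustrated in a few lines. Both inputs are first scattered into partitions by the hash of their join key. A small hash table is then built and probed per partition, so each table stays small (and, in a native engine, cache-resident). This is a generic, single-threaded sketch in plain Python, not Polars' Rust implementation, and `hash_key`, `to_partition`, `partition`, and `radix_partitioned_join` are hypothetical names. The partition step uses multiply-shift range reduction, one common way to map a 64-bit hash onto any partition count (including non-power-of-two counts) without a modulo. Whether Polars' `hash_to_partition` does exactly this is an assumption, not something these notes state.

```python
from collections import defaultdict

MASK64 = (1 << 64) - 1


def hash_key(key) -> int:
    # Spread Python's built-in hash over 64 bits (illustrative, not a production hash).
    return (hash(key) * 0x9E3779B97F4A7C15) & MASK64


def to_partition(h: int, n_partitions: int) -> int:
    # Multiply-shift range reduction: maps a 64-bit hash into [0, n_partitions)
    # without a modulo, for any partition count, not only powers of two.
    return (h * n_partitions) >> 64


def partition(rows, key_index, n_partitions):
    # Scatter rows into partitions by the hash of their join key.
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[to_partition(hash_key(row[key_index]), n_partitions)].append(row)
    return parts


def radix_partitioned_join(left, right, left_key, right_key, n_partitions=4):
    """Inner join of two lists of tuples on the given key positions."""
    out = []
    left_parts = partition(left, left_key, n_partitions)
    right_parts = partition(right, right_key, n_partitions)
    # Equal keys hash equally, so matches can only occur within the same partition.
    for left_part, right_part in zip(left_parts, right_parts):
        # Build: a small per-partition hash table over the right side.
        table = defaultdict(list)
        for row in right_part:
            table[row[right_key]].append(row)
        # Probe: look up each left row in its own partition's table only.
        for row in left_part:
            for match in table.get(row[left_key], ()):
                out.append(row + match)
    return out


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(radix_partitioned_join(left, right, 0, 0)))
# [(2, 'b', 2, 'x'), (3, 'c', 3, 'y'), (3, 'c', 3, 'z')]
```

The output order depends on which partition each key lands in, hence the `sorted` call in the usage line.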
💥 Breaking changes
- Rename cumulative functions `cumsum` -> `cum_sum` and similar (#12513)
- Rename `take` to `gather` (#12528)
- Add dedicated horizontal aggregation methods to `DataFrame` (#12492)
- Rename `take_every` to `gather_every` (#12531)
- Deprecate `parse_int` in favor of `to_integer` (#12464)
- plugins: add version and context (#12433)
- Fix `scan_csv` error type (#12355)
- Rename `write_csv` parameter `has_header` to `include_header` (#12351)
- Rename `is_signed` to `is_signed_integer` (#12220)
- Rename `dt.seconds` to `dt.total_seconds` (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
- Rename `ljust`/`rjust` to `pad_end`/`pad_start` (#11975)
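Most of the renames above are mechanical, so a rough first pass over a codebase can be automated. The following is a hypothetical helper (`migrate` and `RENAMES` are not part of Polars) that rewrites method calls textually with regular expressions. It deliberately skips `parse_int` -> `to_integer`, because the arguments may not map one-to-one, and the `write_csv` `has_header` -> `include_header` change, because a blind rewrite would also hit `read_csv`'s `has_header`; handle those by hand. The `cum*` variants beyond `cumsum` are our assumption about what "and similar" covers.

```python
import re

# (pattern, replacement) pairs derived from the breaking changes above.
# cumprod/cummin/cummax/cumcount are ASSUMED to be part of "and similar".
RENAMES = [
    (r"\.cum(sum|prod|min|max|count)\(", r".cum_\1("),
    (r"\.take_every\(", ".gather_every("),
    (r"\.take\(", ".gather("),
    (r"\.is_signed\(", ".is_signed_integer("),
    (
        r"\.dt\.(days|hours|minutes|seconds|milliseconds|microseconds|nanoseconds)\(",
        r".dt.total_\1(",
    ),
    (r"\.str\.ljust\(", ".str.pad_end("),
    (r"\.str\.rjust\(", ".str.pad_start("),
]


def migrate(source: str) -> str:
    """Apply each textual rename in order and return the rewritten source."""
    for pattern, replacement in RENAMES:
        source = re.sub(pattern, replacement, source)
    return source


print(migrate("df.select(pl.col('a').cumsum(), pl.col('t').dt.seconds())"))
# df.select(pl.col('a').cum_sum(), pl.col('t').dt.total_seconds())
```

Review every diff it produces: a textual match such as `.take(` can also belong to a non-Polars object, for example a NumPy array.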
🚀 Performance improvements
- speed up cov/corr with SIMD + strength-reduction (~3x faster than 0.19.13, ~2x faster than numpy) (#12471)
- apply predicates and statistics of parquet files in streaming mode (#12439)
- use online algorithm for cov/corr (~2x faster) (#12412)
- indexvec in group-by (#12371)
- reduce allocations in hash join (#12368)
- change concurrency parameters (#12321)
- improve join performance through radix partitioned join (#12270)
- remove extra multiplication in hash_to_partition (#12233)
- allow non-power-of-two partitions (#12225)
- Reduce compute in error message for failed datetime parsing (#12147)
- improve parquet downloading (#12061)
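For the online cov/corr entry (#12412), a single-pass algorithm keeps running means and co-moments instead of computing the means first and then making a second pass over the data. A Welford-style version in plain Python is sketched below as an illustration of the general technique; it is not Polars' SIMD kernel, and `online_cov_corr` is a hypothetical name.

```python
import math


def online_cov_corr(xs, ys):
    """Single-pass sample covariance (ddof=1) and Pearson correlation.

    Welford-style updates of running means and co-moments avoid the
    cancellation-prone sum(x*y) - n*mean_x*mean_y formulation.
    """
    n = 0
    mean_x = mean_y = 0.0
    m2_x = m2_y = c_xy = 0.0  # running sums of squared and cross deviations
    for x, y in zip(xs, ys):
        n += 1
        dx = x - mean_x  # deviation from the previous mean of x
        dy = y - mean_y  # deviation from the previous mean of y
        mean_x += dx / n
        mean_y += dy / n
        # Pair one previous-mean deviation with one updated-mean deviation.
        c_xy += dx * (y - mean_y)
        m2_x += dx * (x - mean_x)
        m2_y += dy * (y - mean_y)
    if n < 2:
        raise ValueError("need at least two pairs")
    cov = c_xy / (n - 1)
    denom = math.sqrt(m2_x * m2_y)
    corr = c_xy / denom if denom > 0 else float("nan")
    return cov, corr


print(online_cov_corr([1, 2, 3, 4], [2, 4, 6, 8]))
# (3.3333333333333335, 1.0)
```

Because each input pair is consumed once, the same update also works on streamed or chunked data.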
✨ Enhancements
- Add dedicated horizontal aggregation methods to `DataFrame` (#12492)
- support `scan_parquet` over HTTP (#12517)
- Add support for UTF-8 BOM option in `write_csv` and `sink_csv` (#12253)
- remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
- Allow comparison of two local categories with the same hash (#12503)
- more changes for versioned plugins (#12504)
- plugins: add version and context (#12433)
- include i128 in more primitive functions (#12413)
- write rolling functions as private expressions. (#12379)
- Add `round_sig_figs` expression for rounding to significant figures (#11959)
- change concurrency parameters (#12321)
- deprecate the `_saturating` suffix in the duration string language and make saturating behavior the default (#12301)
- auto infer `ambiguous` for truncate and round (#12204)
- Rename `is_signed` to `is_signed_integer` (#12220)
- New `Config` options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
- allow non-aggregation predicate in ternary groupby (#12286)
- Add `name=` in `.write_avro` to set schema name (#12255)
- Add support for reading zstd compressed files (no-options) in read_csv (#12214)
- start prefetching all files immediately (#12201)
- Add `.list.to_array` expression (#12192)
- consolidate & improve all casting failure error messages (#12168)
- tunable concurrency (#12171)
- support reverse sort in streaming (#12169)
- Add `.arr.to_list` expression (#12136)
- add concurrency budget (#12117)
- Introduce ignore_nulls for str.concat (#12108)
- casting utf8 to temporal (#12072)
- Add supertype for `List`/`Array` (#12016)
- enable eq and neq for array dtype (#12020)
- Expressify n of shift (#12004)
- add dedicated `name` namespace for operations that affect expression names (#11973)
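Rounding to significant figures, as the new `round_sig_figs` expression does, amounts to rounding at a decimal position derived from the value's order of magnitude. The plain-Python sketch below only mirrors the name; it is not the Polars implementation. Tie-breaking follows Python's built-in `round()` (round-half-to-even), which may differ from Polars.

```python
import math


def round_sig_figs(value: float, digits: int) -> float:
    """Round `value` to `digits` significant figures (illustrative sketch)."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    if value == 0 or not math.isfinite(value):
        return value  # nothing to round for zero, inf, or NaN
    # Exponent of the leading significant digit: 1234.5 -> 3, 0.0012345 -> -3.
    magnitude = math.floor(math.log10(abs(value)))
    # Keep digits - 1 places after the leading digit; a negative ndigits rounds
    # to the left of the decimal point.
    return round(value, digits - 1 - magnitude)


print(round_sig_figs(1234.5, 2), round_sig_figs(0.0012345, 3))
# 1200.0 0.00123
```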
🐞 Bug fixes
- fix incorrect ternary agg states (#12538)
- fix and improve ternary evaluation on groups (#12529)
- saturating sub in debug msg (#12525)
- fix panic when writing `Decimal` type to parquet (#12532)
- prefetch struct columns in async projection pushdown (#12514)
- rechunk cross join output in streaming (#12511)
- fix as_list logical types (#12507)
- fix streaming cross join on empty df (#12491)
- don't overflow when calculating date range over very long periods (#12479)
- Allow append/zip_with/extend on local categoricals (#12369)
- Do not panic if time is invalid (#12466)
- empty csv no-raise (#12434)
- Fix `scan_csv` error type (#12355)
- binary operations in aggregation context on literals (#12430)
- update groups state after binary aggregation (#12415)
- Remove extra `\n` when reading file-like object wi… (#12333)
- revert ternary special broadcast, ensure broadcast is always to max height (#12395)
- ensure first/last return null if empty (#12401)
- Do not cast lit if has same dtype (#12342)
- Fix index column name of rolling/dynamic group by (#12365)
- ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
- uint64 should be correctly extracted from python object (#12338)
- expr_output_name include literal (#12335)
- Fix Decimal dtype table repr (#12318)
- Fix behavior of month intervals in `date_range` (#12317)
- scan empty csv missing row_count (#12316)
- zip_with also broadcast mask (#12309)
- respect hive_partitioning flag when dealing with multiple files (#12315)
- parquet, add row_count to empty file materialization (#12310)
- fix download ranges in parquet (#12313)
- object store path derivation for local URL (#12308)
- don't move right endpoint of windows in rolling in default `offset==-period` case (#12267)
- Raise more informative error on invalid `reshape` input (#12288)
- incorrect super type for literals in nested binary exprs (#12238)
- Update `null_count` after arithmetic (#12280)
- fix ambiguous aggregation type (#12269)
- Consistently propagate nulls for `numpy` ufuncs (#12212)
- respect return_scalar of list scalars (#12251)
- potential overflow (#12206)
- always start a new thread if the thread is already blocking (#12202)
- with_row_count should block predicate push down for lazy csv (#12187)
- rechunk failed-list series before iterate (#12189)
- Raise if *_horizontal without inputs (#12106)
- fix incorrect desc sort behavior (#12141)
- `take` should block predicate pushdown (#12130)
- use null type when read from unknown row (#12128)
- boundary predicate to block all accumulated predicates in push down (#12105)
- make python `schema_overrides` information available to the rust-side inference code when initialising from records/dicts (#12045)
- fix panic when initializing Series with array of list dtype (#12148)
- Fix schema of arr.min/max (#12127)
- ensure filter predicate inputs exist in schema (#12089)
- str.concat on empty list (#12066)
- binary agg should group aware if literal not a scalar (#12043)
- Use Arrow schema for file readers (#12048)
- Error on duplicates in hive partitioning (#12040)
- display fmt for str split (#12039)
- sum_horizontal should not always cast to int (#12031)
- fix apply_to_inner's dtype (#12010)
- Fix padding for non-ASCII strings (#12008)
- inline parts of unstable unicode module for stable (#12003)
- fix dot visualization of anonymous scans (#12002)
- SQL table aliases (#11988)
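Several entries touch calendar-month arithmetic: saturating becomes the default for month offsets (#12301), and month intervals in `date_range` were fixed (#12317). As an illustration of saturating semantics (a day that does not exist in the target month clamps to that month's last day), here is a plain-Python sketch; `add_months_saturating` and `monthly_range` are hypothetical names. Offsetting every point from the start, rather than from the previous point, is one way to keep a single clamp (January 31 to February 29) from shifting later months; we do not claim this is exactly how the Polars fix works.

```python
import calendar
from datetime import date


def add_months_saturating(d: date, months: int) -> date:
    """Shift `d` by whole calendar months, clamping the day to the target month's last day."""
    total = d.year * 12 + (d.month - 1) + months  # months counted from year 0
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


def monthly_range(start: date, periods: int) -> list:
    # Offset every point from `start` instead of chaining from the previous point,
    # so clamping Jan 31 to Feb 29 does not pull later months back to the 29th.
    return [add_months_saturating(start, k) for k in range(periods)]


print(monthly_range(date(2024, 1, 31), 4))
# [datetime.date(2024, 1, 31), datetime.date(2024, 2, 29), datetime.date(2024, 3, 31), datetime.date(2024, 4, 30)]
```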
🛠️ Other improvements
- Rename cumulative functions `cumsum` -> `cum_sum` and similar (#12513)
- fix and improve ternary evaluation on groups (#12529)
- Rename `take` to `gather` (#12528)
- Add dedicated horizontal aggregation methods to `DataFrame` (#12492)
- Rename `take_every` to `gather_every` (#12531)
- Add `polars-ds` to list of community plugins (#12527)
- add schema test (#12523)
- remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
- add test for previous commit (#12510)
- Support Python 3.12 (#12094)
- Fix some typos (#12485)
- Deprecate `parse_int` in favor of `to_integer` (#12464)
- update rustc (#12468)
- rename the `DataType` in the polars-arrow crate to `ArrowDataType` for clarity, preventing conflation with our own/native `DataType` (#12459)
- Replace outdated dev dependency `tempdir` (#12462)
- move cov/corr to polars-ops (#12411)
- use unwrap_or_else and get_unchecked_release in rolling kernels (#12405)
- dprint/markdown link checker minor updates (#12409)
- replace as_u64 with dirty_hash (#12327)
- Fix ruff linting invocation (#12350)
- Rename `write_csv` parameter `has_header` to `include_header` (#12351)
- Build and verify Rust examples in docs (#12334)
- Fix some feature flags (#12325)
- Organize Cargo.toml (#12323)
- remove fxhash (#12322)
- Run rustfmt on doc examples (#12319)
- Consolidate "getting started" and "user guide" sections (#12246)
- deprecate the `_saturating` suffix in the duration string language and make saturating behavior the default (#12301)
- simplify expr checking in predicate push down (#12287)
- Replace dev dependency `avro-rs` with `apache-avro` (#12295)
- Run `clippy` on all targets (#12293)
- Add top-level `make clippy`, simplify Rust linting workflows (#12290)
- ensure we git-ignore ALL `.venv` dirs (#12289)
- incorrect super type for literals in nested binary exprs (#12238)
- remove unwrap from group_by (#12263)
- update object_store (#12006) (#12273)
- Remove recommended setting from IDE docs (#12275)
- Add feature flag for `list.eval` (#12254)
- factor out some shared code in `truncate_impl` (#12229)
- update Cargo.lock (#12226)
- Make all functions in string namespace non-anonymous (#12215)
- Rename `dt.seconds` to `dt.total_seconds` (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
- use enum for Ambiguous (#12193)
- Standardize project name formatting across docs (#12185)
- Update `sqlparser` to 0.39 (#12173)
- pin ring (#12176)
- Refactor `FunctionExpr` module (#12162)
- Fix tests for pyarrow 14 (#12170)
- Fix triggers for docs deployment (#12159)
- Make all functions in binary namespace non-anonymous (#12126)
- Consolidate contributing info (#12109)
- Fix typo in user-guide/expressions/plugins.md (#12115)
- Update CODEOWNERS (#12107)
- visualize plugin directory layout in user guide (#12092)
- Minor improvements to the docs website (#12084)
- Make reshape and repeat_by non-anonymous (#12064)
- upgrade zstd to 0.13 in `polars-parquet` (#12062)
- Direct CONTRIBUTING to the docs website (#12042)
- inline parquet2 (#12026)
- remove parquet logic from `polars-arrow` and consolidate logic in `polars-parquet` crate. (#12022)
- move abs to ops (#12005)
- Rename `ljust`/`rjust` to `pad_end`/`pad_start` (#11975)
- Disable type checking for `dataframe_api_compat` dependency (#11997)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @abstractqqq, @alexander-beedie, @braaannigan, @brayanjuls, @c-peters, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jerome3o, @jrycw, @mcrumiller, @messense, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego, @uchiiii, @universalmind303 and @wsyxbcl