Highlights
- Add Extension types (#25322)
Enhancements
- Add SQL support for ROW_NUMBER, RANK, and DENSE_RANK functions (#25409)
- Add SQL support for named WINDOW references (#25400)
- Add BIT_NOT support to the SQL interface (#25094)
- Add LazyFrame.pivot (#25016)
- Add allow_empty flag to item (#25048)
- Add empty_as_null and keep_nulls flags to Expr.explode (#25289)
- Add empty_as_null and keep_nulls to {Lazy,Data}Frame.explode (#25369)
- Add having to group_by context (#23550)
- Add ignore_nulls to first / last (#25105)
- Add maintain_order to Expr.mode (#25377)
- Add quantile for missing temporals (#25464)
- Add leftmost option to str.replace_many / str.find_many / str.extract_many (#25398)
- Add strict parameter to pl.concat(how='horizontal') (#25452)
- Add support for Float16 dtype (#25185)
- Add unstable Schema.to_arrow (#25149)
- Allow Expr.rolling in aggregation contexts (#25258)
- Allow Expr.unique on List/Array with non-numeric types (#25285)
- Allow glimpse to return a DataFrame (#24803)
- Allow hash for all List dtypes (#25372)
- Allow implode and aggregation in aggregation context (#25357)
- Allow slice on scalar in aggregation context (#25358)
- Allow arbitrary Expressions in "subset" parameter of unique frame method (#25099)
- Allow arbitrary expressions as the Expr.rolling index_column (#25117)
- Allow bare .row on a single-row DataFrame, equivalent to .item on a single-element DataFrame (#25229)
- Allow elementwise Expr.over in aggregation context (#25402)
- Allow pl.Object in pivot value (#25533)
- Automatically Parquet dictionary encode floats (#25387)
- Display function of streaming physical plan map node (#25368)
- Documentation on Polars Cloud manifests (#25295)
- Expose and document pl.Categories (#25443)
- Expose fields for generating physical plan visualization data (#25562)
- Extend SQL UNNEST support to handle multiple array expressions (#25418)
- Improve SQL UNNEST behaviour (#22546)
- Improve error message on unsupported SQL subquery comparisons (#25135)
- Make DSL-hash skippable (#25140)
- Minor improvement for as_struct repr (#25529)
- Move GraphMetrics into StreamingQuery (#25310)
- Raise suitable error on non-integer "n" value for clear (#25266)
- Rewrite IR::Scan to IR::DataFrameScan in expand_datasets when applicable (#25106)
- Set polars/<version> user-agent (#25112)
- Streaming {Expr,LazyFrame}.rolling (#25058)
- Support BYTE_ARRAY backed Decimals in Parquet (#25076)
- Support ewm_var/std in streaming engine (#25109)
- Support unique_counts for all datatypes (#25379)
- Support additional forms of SQL "CREATE TABLE" statements (#25191)
- Support arbitrary expressions in SQL JOIN constraints (#25132)
- Support column-positional SQL "UNION" operations (#25183)
- Support decimals in search_sorted (#25450)
- Temporal quantile in rolling context (#25479)
- Use reference to Graph pipes when flushing metrics (#25442)
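
The three SQL ranking functions listed above (ROW_NUMBER, RANK, DENSE_RANK) differ only in how they number tied rows. The following is a minimal pure-Python sketch of the conventional SQL semantics, not Polars' implementation; the helper name `rank_rows` is hypothetical and exists only for illustration.

```python
def rank_rows(values):
    """Return (value, row_number, rank, dense_rank) tuples, ordered ascending.

    Hypothetical helper illustrating conventional SQL ranking semantics.
    """
    ordered = sorted(values)
    out = []
    rank = 0
    dense = 0
    prev = object()  # sentinel that never compares equal to a real value
    for row_number, v in enumerate(ordered, start=1):
        if v != prev:
            rank = row_number  # RANK jumps to the row position after a run of ties
            dense += 1         # DENSE_RANK grows by one per distinct value
            prev = v
        out.append((v, row_number, rank, dense))
    return out


rows = rank_rows([10, 20, 20, 30])
for row in rows:
    print(row)
```

For the tied value 20, ROW_NUMBER keeps counting (2, 3), RANK repeats 2 and then skips to 4, and DENSE_RANK repeats 2 and continues with 3.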
Performance improvements
- Add parquet prefiltering for string regexes (#25381)
- Add streaming native LazyFrame.group_by_dynamic (#25342)
- Add streaming sorted Group-By (#25013)
- Allow detecting plan sortedness in more cases (#25408)
- Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
- Enable predicate expressions on unsigned integers (#25416)
- Fast find start window in group_by_dynamic with large offset (#25376)
- Faster kernels for rle_lengths (#25448)
- Fuse positive slice into streaming LazyFrame.rolling (#25338)
- Lazy gather for {forward,backward}_fill in group-by contexts (#25115)
- Mark Expr.reshape((-1,)) as row separable (#25326)
- Mark output of more non-order-maintaining ops as unordered (#25419)
- Optimize ipc stream read performance (#24671)
- Reduce HuggingFace API calls (#25521)
- Return references from aexpr_to_leaf_names_iter (#25319)
- Skip filtering scan IR if no paths were filtered (#25037)
- Use bitmap instead of Vec<bool> in first/last with skip_nulls (#25318)
- Use fast path for agg_min/agg_max when nulls present (#25374)
- Use strong hash instead of traversal for CSPE equality (#25537)
Bug fixes
- Add .rolling_rank support for temporal types and pl.Boolean (#25509)
- Address issues with SQL OVER clause behaviour for window functions (#25249)
- Aggregation with drop_nulls on literal (#25356)
- Allow Null dtype values in scatter (#25245)
- Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
- Allow empty list in sort_by in list.eval context (#25481)
- Allow for negative time in group_by_dynamic iterator (#25041)
- Always respect return_dtype in map_elements and map_rows (#25504)
- AnyValue::to_physical for categoricals (#25341)
- Apply CSV dict overrides by name only (#25436)
- Block predicate pushdown when group_by key values are changed (#25032)
- Bugs in pl.from_repr with signed exponential floats and line wrapping (#25331)
- Correct drop_items for scalar input (#25351)
- Correct eq_missing for struct with nulls (#25363)
- Correct {first,last}_non_null if there are empty chunks (#25279)
- Correctly handle requested stops in streaming shift (#25239)
- Correctly prune projected columns in hints (#25250)
- DSL_SCHEMA_HASH should not be changed by line endings (#25123)
- Don't push down predicates past inserted cache nodes (#25042)
- Don't quietly allow unsupported SQL SELECT clauses (#25282)
- Don't trigger DeprecationWarning from SQL "IN" constraints that use subqueries (#25111)
- Enhanced column resolution/tracking through multi-way SQL joins (#25181)
- Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
- Ensure out-of-range integers and other edge case values don't give wrong results for index_of (#24369)
- Fix CSV select(len) off by 1 with comment prefix (#25069)
- Fix arr.{eval,agg} in aggregation context (#25390)
- Fix format_str in case of multiple chunks (#25162)
- Fix groups update on slices with different offsets (#25097)
- Fix assertion panic on group_by (#25179)
- Fix building polars-expr without timezones feature (#25254)
- Fix building polars-mem-engine with the async feature (#25300)
- Fix building polars-plan with features lazy,concat_str (but no strings) (#25306)
- Fix dictionary replacement error in write_ipc (#25497)
- Fix expr slice pushdown causing shape error on literals (#25485)
- Fix field metadata for nested categorical PyCapsule export (#25052)
- Fix group lengths check in sort_by with AggregatedScalar (#25503)
- Fix handling Null dtype in ApplyExpr on group_by (#25077)
- Fix incorrect .list.eval after slicing operations (#25540)
- Fix incorrect reshape on sliced lists (#25139)
- Fix length preserving check for eval expressions in streaming engine (#25294)
- Fix occurrence of exact matches of .join_asof(strategy="nearest", allow_exact_matches=False, ...) (#25506)
- Fix off-by-one bug in ColumnPredicates generation for inequalities operating on integer columns (#25412)
- Fix panic if scan predicate produces 0 length mask (#25089)
- Fix panic in dt.truncate for invalid duration strings (#25124)
- Fix panic in is_between support in streaming Parquet predicate push down (#25476)
- Fix panic when using struct field as join key (#25059)
- Fix serialization of lazyframes containing huge tables (#25190)
- Fix single-column CSV header duplication with leading empty lines (#25186)
- Fix small bug with PyExpr to PyObject conversion (#25265)
- Group-By aggregation problems caused by AmortSeries (#25043)
- Handle some unusual pl.col.<colname> edge-cases (#25153)
- Incorrect result in aggregated first/last with ignore_nulls (#25414)
- Incorrect results for aggregated {n_,}unique on bools (#25275)
- Invert drop_nans filtering in group-by context (#25146)
- Make str.json_decode output deterministic with lists (#25240)
- Mark {forward,backward}_fill as length_preserving (#25352)
- Minor improvement to internal is_pycapsule utility function (#25073)
- Nested dtypes in streaming first_non_null/last_non_null (#25375)
- Nested dtypes in streaming first/last (#25298)
- Panic exception when calling Expr.rolling in .over (#25283)
- Panic in group_by_dynamic with group_by and multiple chunks (#25075)
- Parquet is_in for mixed validity pages (#25313)
- Prevent panic when joining sorted LazyFrame with itself (#25453)
- Raise error for all/any on list instead of panic (#25018)
- Raise error on out-of-range dates in temporal operations (#25471)
- Remove Expr casts in pl.lit invocations (#25373)
- Resolve edge-case with SQL aggregates that have the same name as one of the "GROUP BY" keys (#25362)
- Return the correct string-case Expr reprs (#25101)
- Reverse on chunked struct (#25281)
- Revert pl.format behavior with nulls (#25370)
- Rolling mean/median for temporals (#25512)
- Run async DB queries with regular asyncio if not inside a running loop (#25268)
- SQL "NATURAL" joins should coalesce the key columns (#25353)
- Schema mismatch with list.agg, unique and scalar (#25348)
- Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
- Strict conversion AnyValue to Struct (#25536)
- Support "index" as column name in group_by iterator (#25138)
- Support AggregatedList in list.{eval,agg} context (#25385)
- The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
- Unique key names in streaming sort/top_k (#25082)
- Unique on literal in aggregation context (#25359)
- Use (i64, u64) for VisualizationData (offset, length) slices (#25203)
- Use Cargo.template.toml to prevent git dependencies from using template (#25392)
- Validate list.slice parameters are not lists (#25458)
- Wide-table join performance regression (#25222)
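
The unary "NOT" fix above (#25091) turns on the difference between logical and bitwise negation. The following is a minimal sketch of that difference using plain Python integers and booleans. It illustrates the general distinction only and makes no claim about how Polars evaluates either operator internally.

```python
flags = [True, False]

# Bitwise NOT flips every bit of the integer representation. For the
# integers 1 and 0 this gives -2 and -1, both of which are truthy.
bitwise = [~int(f) for f in flags]

# Logical NOT maps True -> False and False -> True, which is what a
# unary NOT conventionally means for boolean values in SQL.
logical = [not f for f in flags]

print(bitwise)
print(logical)
```

Applying a bitwise NOT where a logical one is intended can therefore turn both True and False into values that read as "true" when later interpreted as booleans.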
Documentation
- Add Extension and BaseExtension to doc index (#25444)
- Add LazyFrame.pivot to reference guide (#25482)
- Add having API references (#25428)
- Add docstring example showing str.slice taking Expression params (#25461)
- Add polars-on-premise documentation (#25431)
- Clarify bitwise behaviour of and_, or_, and not_ Expressions on integer columns (#25092)
- Correct link to datetime_range instead of date_range in resampling page (#25532)
- Deprecate Categorical functions for lexical ordering and local checks (#25514)
- Document schema parameter in meta methods (#25543)
- Explain aggregation & sorting of lists (#25260)
- Fix LanceDB URL (#25198)
- Fix incorrect 'bitwise' in any_horizontal/all_horizontal docstring (#25469)
- Fix link errors reported by markdown-link-check (#25314)
- Fix non-existent replace_all reference in replace docs (#25161)
- Fix source path (#25170)
- Fix typo in public dataset URL (#25044)
- Mention Narwhals in ecosystem page (#25100)
- Remove lzo from parquet write options (#25522)
- Update LazyFrame.collect_schema docstring (#25508)
- Update LazyFrame.remote signature (#25175)
- Update on-premise documentation (#25489)
- Update user guide for QueryProgress rename to QueryProfile (#25195)
Tests
- Add assert_sql_matches coverage for SQL "DISTINCT" and "DISTINCT ON" syntax (#25440)
- Add reliable test for pl.format on multiple chunks (#25164)
- Add test for unique with column subset (#25241)
- Better coverage for group_by aggregations (#25290)
- Test for group_by(...).having(...) (#25430)
CI
- Automatically label pull requests that change the DSL (#25177)
- Avoid relabelling changes-dsl on every commit (#25216)
- Print expected DSL schema hashes if mismatched (#25526)
- Skip existing files in pypi upload (#25576)
Build system
- Fix make fmt and make lint commands (#25200)
- Make building the docs on macOS more reliable (#25095)
Other improvements
- Add Final type-qualifier to module-level constants (#25556)
- Add proptest AnyValue strategies (#25510)
- Add proptest DataFrame strategy (#25446)
- Add proptest strategies for Series logical types (#24849)
- Add proptest strategies for Series nested types (#25220)
- Add some cleanup (#25445)
- Add toolchain file to runtimes for sdist (#25311)
- Enable more streaming tests (#25364)
- Fix --uv argument for benchmark-remote (#25513)
- Fix Decimal precision annotation (#25227)
- Fix feature gating TZ_AWARE_RE again (#25493)
- Fix template path in release-python workflow (#25565)
- Fix typo in CI release workflow (#25309)
- Make python docs build again (#25165)
- Remove Column::Partitioned (#25324)
- Remove debug file write from test suite (#25393)
- Remove unused import (#25365)
- Run maturin with --uv option (#25490)
- Silence unused mut warning (#25093)
- Skip rust integration tests for coverage in CI (#25558)
- Update markdown link checker (#25201)
- Update toolchain (#25007)
- Update versions (#25141)
- Upgrade to schemars 0.9.0 (#25158)
- Upgraded ruff and typos and made the necessary lint updates (#25196)
Refactoring
- Accept multiple files in pipe_with_schema (#25388)
- Add IR for scan_lines (#25066)
- Add ElementExpr for _eval expressions (#25199)
- Add asserts and tests for list.eval on multiple chunks with slicing (#25559)
- Add functions for scan_lines (#25136)
- Add oneshot channel to polars-stream (#25378)
- Add stateful EwmCov kernel (#25065)
- Change group length mismatch error to ShapeError (#25004)
- Clean up CSPE callsite (#25215)
- Directly take CloudScheme in parse_cloud_options (#25304)
- Disable recursive CSPE for now (#25085)
- Dispatch Series.set to zip_with_same_dtype (#25327)
- Fix unsoundness in ChunkedArray::{first, last} (#25449)
- Make pipe_with_schema work on Arced schema (#25155)
- Move EwmMeanState to polars-compute (#25034)
- Move asof tolerance type coercion to IR conversion (#25033)
- Move ewm variance code to polars-compute (#25188)
- Move supertype determination and casting to IR for date_range and related functions (#24084)
- Refactor dt_range functions (#25225)
- Refactor sink IR (#25308)
- Remove ClosableFile (#25330)
- Remove PyPartitioning (#25303)
- Remove aggregation context Context (#25424)
- Remove incorrect cast in reduce code (#25321)
- Remove lower_ir conversion from Scan to InMemorySource (#25150)
- Remove old join projection pushdown logic (#25088)
- Remove some dead argminmax impl code (#25501)
- Remove unused optimization_toggle (#25130)
- Remove unused row-count (#25080)
- Remove verbose prints on file opens (#25523)
- Rename URL_ENCODE_CHARSET to HIVE_ENCODE_CHARSET (#25554)
- Simplify sink parameter passing from Python (#25302)
- Support for named/anonymous aggregations (#25118)
- Take &dyn Any instead of Box<dyn Any> in python object converters (#25421)
- Take sync parameter in Writeable::close (#25475)
- Update partitioned sink IR (#25524)
- Use dedicated runtime packages from template (#25284)
Thank you to all our contributors for making this release possible!
@AndreaBozzo, @DannyStoll1, @EndPositive, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @MarcoGorelli, @TNieuwdorp, @alexander-beedie, @borchero, @c-peters, @cBournhonesque, @carnarez, @cmdlineluser, @coastalwhite, @cr7pt0gr4ph7, @davanstrien, @dsprenkels, @etiennebacher, @feliblo, @itamarst, @jannickj, @jetuk, @kdn36, @lun3x, @marinegor, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @vyasr, @wtn and more!