An upgrade guide is available on our website.
🏆 Highlights
- implementing sink_csv for LazyFrame (#10682)
- Support DataFrame init from queries against users' existing database connections (#10649)
- Rename groupby to group_by (#10656)
💥 Breaking changes
- return f64 for rank when method="average" (#10734)
- Update a lot of error types (#10637)
- Remove deprecated behavior from vertical aggregations (#10602)
- Read/write support for IPC streams in DataFrames (#10606)
- Change behavior of all - fix Kleene logic implementation for all/any (#10564)
- Improve consistency of parsing expression input (#9512)
- allow from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
- remove fixed_seed and add pl.set_random_seed (#10388)
- Make arange an alias for int_range (#9983)
- date_range/time_range no longer return a List type (#10526)
- Remove various functionalities deprecated before 0.18 (#10527)
- Improve some error types and messages (#10470)
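For readers unfamiliar with the term used in #10564, Kleene logic treats a null as "unknown" rather than as False. The sketch below is plain Python, not the polars implementation; the function names `kleene_all` and `kleene_any` are hypothetical and exist only for illustration. How polars exposes this behaviour (by default or behind a parameter) is described in the PR itself.

```python
from typing import Iterable, Optional


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND: any False decides the result; otherwise a null makes it unknown."""
    saw_null = False
    for v in values:
        if v is False:
            return False  # a single False settles the answer, nulls or not
        if v is None:
            saw_null = True
    return None if saw_null else True


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued OR: any True decides the result; otherwise a null makes it unknown."""
    saw_null = False
    for v in values:
        if v is True:
            return True  # a single True settles the answer, nulls or not
        if v is None:
            saw_null = True
    return None if saw_null else False


print(kleene_all([True, None]))   # None: the null could still be False
print(kleene_all([False, None]))  # False
print(kleene_any([True, None]))   # True
print(kleene_any([False, None]))  # None: the null could still be True
```

The point of the sketch is that a null only changes the result when the non-null values do not already decide it.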
⚠️ Deprecations
- Rename map to map_batches (#10801)
- Rename GroupBy.apply to map_groups (#10799)
- Rename DataFrame.apply to map_rows (#10797)
- Rename Series/Expr.rolling_apply to rolling_map (#10750)
- Rename Series/Expr.apply to map_elements (#10678)
- Rename groupby to group_by (#10656)
- Deprecate some parameters of cut/qcut (#10484)
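The renames above can be collected into a small lookup table when auditing a codebase. This is a minimal sketch in plain Python: the old and new names are copied from the list above, while `RENAMES` and `migration_hint` are hypothetical helpers, not part of polars.

```python
from typing import Optional

# Old name -> new name, copied from the deprecation list above.
# Keys are written as they appear in the release notes.
RENAMES = {
    "map": "map_batches",
    "GroupBy.apply": "map_groups",
    "DataFrame.apply": "map_rows",
    "Series.rolling_apply": "rolling_map",
    "Expr.rolling_apply": "rolling_map",
    "Series.apply": "map_elements",
    "Expr.apply": "map_elements",
    "groupby": "group_by",
}


def migration_hint(old: str) -> str:
    """Return a one-line hint for a deprecated name, or say that none is listed."""
    new: Optional[str] = RENAMES.get(old)
    if new is None:
        return f"{old}: no rename listed"
    return f"{old} -> {new}"


for name in ("DataFrame.apply", "Expr.apply", "groupby", "join"):
    print(migration_hint(name))
```

Note that `apply` maps to three different new names depending on the object it is called on, so a blind search-and-replace on the method name alone is not enough.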
🚀 Performance improvements
- parse time zones outside of downcast_iter() in replace_time_zone (#10713)
- use binary abstraction for atan2 (#10588)
- use binary abstraction in pow (#10562)
✨ Enhancements
- activate cse for group_by (again) (#10749)
- implementing sink_csv for LazyFrame (#10682)
- Supports series unique & arg_unique & n_unique for list (#10743)
- repeat_by should also support broadcasting of LHS (#10735)
- deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
- is_first also supports numeric list type. (#10727)
- improve slice pushdown in unions (#10723)
- Explicitly implement Protocol for interchange classes (#10688)
- Support min and max strategy for binary & str columns fill null (#10673)
- support broadcasting in list set operations (#10668)
- csv: add schema argument (#10665)
- Support DataFrame init from queries against users' existing database connections (#10649)
- add truncate_ragged_lines (#10660)
- supports cast to list (#10623)
- Update a lot of error types (#10637)
- preserve whitespace in notebook output (#10644)
- Remove deprecated behavior from vertical aggregations (#10602)
- support selector usage in write_excel arguments (#10589)
- Add LazyFrame.collect_async and pl.collect_all_async (#10616)
- Read/write support for IPC streams in DataFrames (#10606)
- propagate null in is_in and more generic array construction (#10614)
- Change behavior of all - fix Kleene logic implementation for all/any (#10564)
- frame-level cast support (#10504)
- Improve consistency of parsing expression input (#9512)
- Add failed column to cast exception (#10507)
- allow from_arrow to take a generator of RecordBatches, change error type to TypeError (#10529)
- Remove deprecated get_idx_type - use get_index_type instead (#10556)
- Make arange an alias for int_range (#9983)
- date_range/time_range no longer return a List type (#10526)
- Remove various functionalities deprecated before 0.18 (#10527)
- Improve some error types and messages (#10470)
- suggest str.to_datetime instead of apply and stdlib strptime (#10266)
🐞 Bug fixes
- get_single_leaf can't handle Expr::Count (#10790)
- support groupby literal in streaming (#10771)
- ORDER BY on unselected columns (#10752)
- Fix is_in cannot cast list type for float (#10769)
- whitespace CSS in Notebook HTML updated to use pre-wrap instead of pre (#10739)
- only preserve sortedness flag in replace_time_zone when safe (#10738)
- Error on value_counts on column named "counts" (#10737)
- return f64 for rank when method="average" (#10734)
- Keep min/max and arg_min/arg_max consistent. (#10716)
- use time zone from dtype to overwrite output time zone when initialising Series (#10689)
- Cast small int type when scan csv in streaming mode. (#10679)
- raise exception with invalid on arg type for join_asof (#10690)
- Reused input series in rolling_apply should not be orderly (#10694)
- re-sort buffer when update window swap the whole buffer (#10696)
- Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
- Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
- Correctly handle time zones in write_delta (#10633)
- fix apply for empty series in threading mode (#10651)
- respect 'ignore_errors=False' in csv parser (#10641)
- fix rename + projection pushdown (#10624)
- fix int/float downcast in is_in (#10620)
- Change behavior of all - fix Kleene logic implementation for all/any (#10564)
- Fix serialization for categorical chunked. (#10609)
- Take input_schema to create physical expr for Selection (#10571)
- Clear window cache after evaluate predication expr (#10505)
- Parsing regex col in Expr::Columns (#10551)
- sanitize column naming in boolean ops (#10531)
- Fix write_delta with schema in delta_write_options (#10541)
- remove fixed_seed and add pl.set_random_seed (#10388)
- respect pl.Config options relating to shape, column names, and types when rendering HTML (#10449)
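The rank fix listed above (#10734) returns f64 for method="average" because tied values share the mean of their positions, which is not always a whole number. The sketch below shows the general average-rank technique in plain Python; `average_rank` is a hypothetical helper and not the polars implementation.

```python
from typing import List, Sequence


def average_rank(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values all receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Mean of the 1-based positions i+1 .. j+1.
        avg = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


print(average_rank([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

The two tied values occupy positions 2 and 3, so each gets rank 2.5, which an integer dtype could not represent.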
🛠️ Other improvements
- update cargo.lock (#10800)
- Create .venv in repo root (#10789)
- refactored write_database unit tests to properly separate concerns (#10773)
- Fix some broken links / formatting (#10772)
- Document chained when-then behaviour more prominently (#10759)
- Fix test failing due to new adbc release (#10763)
- Unpin connectorx and bump other Python dependencies (#10753)
- add note to testing docs about module import (#10741)
- Clear GitHub Actions caches weekly (#10715)
- Update for new pyarrow 13.0.0 behavior (#10691)
- Fix minor issue with sink_parquet docs (#10669)
- Remove deprecate_renamed_methods util (#10537)
- add "see also" entries to ne/eq_missing and update related examples (#10667)
- fix potential memory leak from usage of inspect.currentframe (#10630)
- give more relevant example for polars.apply (#10631)
- Bump ruff and enable new setting (#10626)
- Add docstrings for Expr.meta namespace (#10617)
- Enforce up-to-date Cargo.lock (#10555)
- deprecate DataFrame.replace (#10600)
- ensure that make requirements fully refreshes unpinned packages/deps (#10591)
- fix out-of-date explain default parameter (#10566)
- Fix expr_dispatch decorator to work on methods with decorators (#10549)
- Fix link to source code (#10542)
- Add title to index page (#10539)
- Disable SIM108 lint (#10519)
- Keep versioned docs (#10500)
- switch to pyo3/maturin-action (#10503)
- Update URLs for dev documentation (#10495)
- Skip failing test (#10496)
- Add version switcher to API reference (#10488)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Object905, @OndrejSlamecka, @SeanTroyUWO, @VasanthakumarV, @alexander-beedie, @aminalaee, @braaannigan, @c-peters, @ion-elgreco, @lorepozo, @marki259, @mcrumiller, @messense, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @sdamashek, @stinodego, @svaningelgem, @titoeb, @trueb2, @washcycle and @zundertj