🏆 Highlights
- implementing sink_csv for LazyFrame (#10682)
💥 Breaking changes
- empty product returns identity (#10842)
- return f64 for rank when method="average" (#10734)
- Rename groupby to group_by (#10654)
- Read/write support for IPC streams in DataFrames (#10606)
- Change behavior of all - fix Kleene logic implementation for all/any (#10564)
- remove fixed_seed and add pl.set_random_seed (#10388)
- Make arange an alias for int_range (#9983)
- date_range/time_range no longer return a List type (#10526)
- Remove various functionalities deprecated before 0.18 (#10527)
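The Kleene logic named in #10564 can be illustrated with a small sketch. This is plain Python, not Polars code: the function names kleene_all and kleene_any are hypothetical, and None stands in for a null value. Under three-valued logic a null means "unknown", so it only affects the result when no definite value already decides it.

```python
from typing import Iterable, Optional


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND over the values; None means unknown (null)."""
    saw_null = False
    for v in values:
        if v is False:
            return False  # a single definite False decides the result
        if v is None:
            saw_null = True
    # No False seen: the result is unknown if any value was unknown.
    return None if saw_null else True


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued OR over the values; None means unknown (null)."""
    saw_null = False
    for v in values:
        if v is True:
            return True  # a single definite True decides the result
        if v is None:
            saw_null = True
    # No True seen: the result is unknown if any value was unknown.
    return None if saw_null else False
```

For example, kleene_all([True, None]) is unknown (None), while kleene_all([False, None]) is False because the False already settles the conjunction.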
⚠️ Deprecations
- Rename is_first/last to is_first/last_distinct (#11130)
- Rename count_match to count_matches (#11028)
- Rename strip to strip_chars (#10813)
- Add datetime_range expression function (#10213)
- Rename Series/Expr.rolling_apply to rolling_map (#10750)
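The rename of strip to strip_chars (#10813) sits alongside the new strip_prefix and strip_suffix functions (#10958). The difference between the two kinds of operation can be sketched with Python's standard string methods. This is an analogy, not Polars code: the helper functions below are hypothetical stand-ins, and the sketch assumes that strip_chars treats its argument as a set of characters while strip_prefix removes one exact leading string.

```python
def strip_chars(s: str, characters: str) -> str:
    """Remove any of the given characters from both ends (set semantics)."""
    return s.strip(characters)


def strip_prefix(s: str, prefix: str) -> str:
    """Remove the exact prefix once, if present; otherwise return s unchanged."""
    return s[len(prefix):] if s.startswith(prefix) else s


def strip_suffix(s: str, suffix: str) -> str:
    """Remove the exact suffix once, if present; otherwise return s unchanged."""
    return s[: len(s) - len(suffix)] if suffix and s.endswith(suffix) else s


# "xy" as a character set strips x and y in any order from both ends.
print(strip_chars("yxhelloyx", "xy"))
# "xy" as a prefix only matches that exact leading string.
print(strip_prefix("yxhelloyx", "xy"))
print(strip_prefix("xyhelloyx", "xy"))
```

The character-set form is order-insensitive, which is the behaviour the new name makes explicit.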
🚀 Performance improvements
- improve performance of fast projection (#10945)
- parse time zones outside of downcast_iter() in replace_time_zone (#10713)
- use binary abstraction for atan2 (#10588)
- use binary abstraction in pow (#10562)
✨ Enhancements
- Expressify str.split argument. (#11117)
- Expressify argument of binary contains (#11091)
- dt.offset_by supports broadcasting lhs (#11095)
- Expressify argument of binary starts_with and ends_with (#11076)
- json_extract supports extracting static and string values to list dtype (#11057)
- add quote_style="never" option for write_csv (#11015)
- add support for nextest (#11048)
- Add literal for str count_match (#10996)
- More dtypes support cast to list (#11025)
- ParquetCloudSink to allow streaming pipelines into remote ObjectStores (#10060)
- Add strip_prefix and strip_suffix to the string namespace (#10958)
- Add datetime_range expression function (#10213)
- add proper cache for Regex compilation (#10934)
- implementation of array_to_string (#10839)
- apply left side predicate pushdown also to right side if all predicate columns are also join columns (#10841)
- accept expr in str.count_match (#10900)
- accept expressions in .offset_by (#9967)
- implement drop as special case of select (#10885)
- Supports is_last operation (#10760)
- activate cse for group_by (again) (#10749)
- add pairwise float sum implementation (#10756)
- implementing sink_csv for LazyFrame (#10682)
- Supports series unique & arg_unique & n_unique for list (#10743)
- repeat_by should also support broadcasting of LHS (#10735)
- deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
- is_first also supports numeric list type. (#10727)
- improve slice pushdown in unions (#10723)
- Support min and max strategy for binary & str columns fill null (#10673)
- support broadcasting in list set operations (#10668)
- add truncate_ragged_lines (#10660)
- supports cast to list (#10623)
- Rename groupby to group_by (#10654)
- preserve whitespace in notebook output (#10644)
- Read/write support for IPC streams in DataFrames (#10606)
- improve binary (arity) generics (#10622)
- propagate null in is_in and more generic array construction (#10614)
- Change behavior of all - fix Kleene logic implementation for all/any (#10564)
- frame-level cast support (#10504)
- Add failed column to cast exception (#10507)
- Make arange an alias for int_range (#9983)
- date_range/time_range no longer return a List type (#10526)
- Remove various functionalities deprecated before 0.18 (#10527)
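The pairwise float sum mentioned in #10756 refers to a general summation technique. It can be sketched as follows in plain Python. This is not the Polars implementation: the function name and the block size of 8 are assumptions chosen for illustration. The idea is to split the input in half recursively and add the two partial sums, so rounding error grows roughly with the logarithm of the length instead of linearly.

```python
import math
from typing import Sequence


def pairwise_sum(xs: Sequence[float], block: int = 8) -> float:
    """Sum floats by recursive halving; small blocks are summed naively."""
    n = len(xs)
    if n <= block:
        total = 0.0
        for x in xs:
            total += x
        return total
    mid = n // 2
    # Each half is summed independently, then the two results are combined.
    return pairwise_sum(xs[:mid], block) + pairwise_sum(xs[mid:], block)


data = [0.1] * 1000
# Compare against math.fsum, which returns a correctly rounded sum.
print(abs(pairwise_sum(data) - math.fsum(data)))
```

The slicing here copies data and is kept only for readability; an index-based version would avoid the copies.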
🐞 Bug fixes
- Correct hash and fmt for struct expr (#11119)
- enforce sortedness of by argument in rolling_* functions (#11002)
- Filter on empty objectChunked should not throw error (#11073)
- ensure null_count statistics accounts for null array (#11070)
- toggle off cse if ext_context is used (#11051)
- Correct field dtype of string concat (#11055)
- pushed-down expr should be considered when evaluating ExternalContext (#11023)
- fix rolling_* functions when "by" has nanosecond resolution (#11005)
- Don't reuse member for Selector::Add (#11026)
- fix the construction of List<Null> (#10969)
- allow singular null in regex pattern (#10948)
- compute length of null array in explode (#10946)
- Allow exactly one value in start/end for int_range (#10914)
- count was falsely tagged as cse in group by (#10917)
- Retain original dtype when deserializing an empty list (#10893)
- CSE don't accept opaque functions (#10905)
- Make int_range(s) exclusive on the upper bound when step is negative (#10898)
- fix conversion from decimal to float (#10776)
- Add broadcasting for list comparisons (#10857)
- don't overflow length before checking limit (#10883)
- fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
- tag amortized iter unsafe and add safe alternatives (#10881)
- use pool in dataframe arithmetic (#10864)
- remove debug println! from datetime fn (#10862)
- repair polars_err string interpolation (#10863)
- make count_match docs and extract_all docs/impl consistent around zero matches (#10854)
- empty product returns identity (#10842)
- never panic if hash/equality doesn't hold in cse (#10836)
- Improve bound checks on temporal ranges (#10837)
- var/std behavior around few elements (#10828)
- Fix divide-by-zero error when reading an empty csv in streaming mode (#10819)
- fix equality of quantile aggregation node (#10816)
- Reading an only-header csv file in streaming mode should not panic (#10810)
- get_single_leaf can't handle Expr::Count (#10790)
- string to decimal parsing (#10712)
- support groupby literal in streaming (#10771)
- ORDER BY on unselected columns (#10752)
- Fix is_in cannot cast list type for float (#10769)
- fix unicode truncation in json parsing (#10761)
- Error message of list unique should not display inner type (#10748)
- create chunks_mut entry in vtable (#10745)
- Prevent panic on sample_n with replacement from empty df (#10731)
- only preserve sortedness flag in replace_time_zone when safe (#10738)
- Error on value_counts on column named "counts" (#10737)
- Build Series from empty Series vector (#10558)
- return f64 for rank when method="average" (#10734)
- Keep min/max and arg_min/arg_max consistent. (#10716)
- Fix bug when providing custom labels and opting for duplicates in qcut (#10686)
- Cast small int type when scan csv in streaming mode. (#10679)
- Reused input series in rolling_apply should not be orderly (#10694)
- re-sort buffer when update window swap the whole buffer (#10696)
- Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
- Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
- AllHorizontal format string (#10658)
- List<null> chunked builder should take care of series name (#10642)
- respect 'ignore_errors=False' in csv parser (#10641)
- fix rename + projection pushdown (#10624)
- fix int/float downcast in is_in (#10620)
- Change behavior of all - fix Kleene logic implementation for all/any (#10564)
- Fix serialization for categorical chunked. (#10609)
- join_asof missing tolerance implementation, address edge-cases (#10482)
- Take input_schema to create physical expr for Selection (#10571)
- fix serialization of empty lists (#10563)
- Clear window cache after evaluating predicate expr (#10505)
- Parsing regex col in Expr::Columns (#10551)
- sanitize column naming in boolean ops (#10531)
- fix build for wasm (#10536)
- remove fixed_seed and add pl.set_random_seed (#10388)
- fix build for wasm (#9502)
- rollback cse in groupby: python 0.18.15 (#10491)
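The change to return f64 for rank when method="average" (#10734) follows from how average ranking treats ties: tied values share the mean of the positions they occupy, and that mean can be fractional. A minimal sketch in plain Python, with a hypothetical function name and not taken from the Polars source:

```python
from typing import List, Sequence


def rank_average(values: Sequence[float]) -> List[float]:
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Mean of the 1-based positions i+1 .. j+1 occupied by the tied run.
        avg = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


print(rank_average([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

The two tied values occupy positions 2 and 3, so each receives 2.5, which an integer result type could not represent.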
🛠️ Other improvements
- Removed duplicated example (#11109)
- Add CODEOWNERS for docs folder (#11107)
- Refactor starts_with and ends_with for string (#11085)
- Integrate user guide (#11089)
- remove feature gate join/groupby in polars-core (#10965)
- Add Documentation issue type (#11042)
- complete intra-docs in api documentation (#11007)
- genericize take implementation (#10976)
- genericize PolarsDataType (#10952)
- enhance internal crates readme with reference to main crate (#10928)
- Add Duration method for checking full days (#10850)
- apply with_name in more places (#10899)
- never compare opaque functions (#10906)
- eliminate repetition in utf8 datetime functions (#10860)
- Fix issue templates for bug reports (#10896)
- remove LocalProjection (#10886)
- request verbose logging output of minimal reproducible examples (#10882)
- Reorganize range expression module (#10871)
- introduce with_name for Series/ChunkedArray (#10859)
- Further refactor temporal range functions (#10844)
- Refactor range related functions (#10830)
- Fix the non-compiling black-box function parts in the polars lazy cookbook (#10809)
- Fix some broken links / formatting (#10772)
- Improve docs for polars-lazy (#10729)
- update rustc nightly_2023-08-26 (#10467)
- default to rust native flate2 lib (#10733)
- Clear GitHub Actions caches weekly (#10715)
- move 'is_in' to polars-ops (#10645)
- Clean up schema calculation for date_range (#10653)
- remove unused apply functions and add fallible generic apply functions (#10621)
- Enforce up-to-date Cargo.lock (#10555)
- make binary chunkedarray functions DRY (#10607)
- bump MSRV to 1.65 (#10568)
- genericize chunk implementation (#10506)
- use ChunkArray::(try_)from_chunk_iter (#10497)
- add VSCode rust-analyzer settings (#10498)
- Update URLs for dev documentation (#10495)
- Update features for latest flate2 release (#10492)
Thank you to all our contributors for making this release possible!
@Barsik-sus, @I8dNLo, @JulianCologne, @KacpiW, @MarcoGorelli, @Object905, @OndrejSlamecka, @Qqwy, @SeanTroyUWO, @TNieuwdorp, @VasanthakumarV, @alexander-beedie, @aminalaee, @antoniocali, @braaannigan, @bvanelli, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @henrikig, @ion-elgreco, @jakob-keller, @jeroenjanssens, @jonashaag, @lorepozo, @marki259, @mcrumiller, @messense, @mrogowski11, @nameexhaustion, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @s-banach, @sdamashek, @stinodego, @svaningelgem, @thomasjpfan, @titoeb, @trueb2, @washcycle, @wdoppenberg and @zundertj