🏆 Highlights
- new implementation for `String`/`Binary` type (#13748)
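The new `String`/`Binary` implementation is built on the string-view layout (`BinaryView`/`Utf8View`), which many entries below call "binview". As we understand the Arrow columnar specification, each value is a fixed 16-byte view. Values of at most 12 bytes are stored inline. Longer values keep their length, a 4-byte prefix, a buffer index and an offset into a separate data buffer. The block below is a minimal pure-Python sketch of that layout. `make_view` and `read_view` are hypothetical helpers, not Polars or Arrow APIs, and a single unbounded data buffer is assumed for simplicity.

```python
import struct

INLINE_LIMIT = 12  # values up to 12 bytes fit inside the 16-byte view


def make_view(data: bytes, buffers: list) -> bytes:
    """Encode one value as a 16-byte view, appending long values to buffers."""
    if len(data) <= INLINE_LIMIT:
        # i32 length followed by the bytes themselves, zero-padded to 12 bytes
        return struct.pack("<i12s", len(data), data)
    if not buffers:
        buffers.append(bytearray())
    buf_index = len(buffers) - 1
    offset = len(buffers[buf_index])
    buffers[buf_index] += data
    # i32 length, 4-byte prefix, i32 buffer index, i32 offset into that buffer
    return struct.pack("<i4sii", len(data), data[:4], buf_index, offset)


def read_view(view: bytes, buffers: list) -> bytes:
    """Decode a 16-byte view back into the original bytes."""
    (length,) = struct.unpack_from("<i", view)
    if length <= INLINE_LIMIT:
        return view[4:4 + length]
    _, _prefix, buf_index, offset = struct.unpack("<i4sii", view)
    return bytes(buffers[buf_index][offset:offset + length])


buffers = []
short_view = make_view(b"polars", buffers)
long_view = make_view(b"a string longer than twelve bytes", buffers)
print(len(short_view), len(long_view), read_view(long_view, buffers))
```

The view keeps a 4-byte prefix inline, so a comparison can reject many non-equal values without reading the data buffer at all.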
💥 Breaking changes
- Remove `DatetimeChunked::convert_time_zone` (#14046)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (#14033)
- Rename `drop_columns` to `drop` (#13754)
- Rename `pl.count()` to `pl.len()` (#13719)
- Rename `row_count_name`/`row_count_offset` parameters in IO functions to `row_index_*` (#13563)
- Rename `with_row_count` to `with_row_index` (#13494)
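For Python code, the renames above are mostly mechanical. The block below sketches a textual migration, assuming plain regex substitution is acceptable for your codebase. `migrate` and `RENAMES` are hypothetical and do not ship with Polars. The Rust-side renames (`drop_columns`, `to_anyvalue`) and the removed `DatetimeChunked::convert_time_zone` are left out.

```python
import re

# Hypothetical rename table for the Python-facing breaking changes above.
RENAMES = [
    (re.compile(r"\bpl\.count\(\)"), "pl.len()"),
    (re.compile(r"\bwith_row_count\b"), "with_row_index"),
    (re.compile(r"\brow_count_name\b"), "row_index_name"),
    (re.compile(r"\brow_count_offset\b"), "row_index_offset"),
]


def migrate(source: str) -> str:
    """Apply each rename in order; purely textual, no parsing of Python."""
    for pattern, replacement in RENAMES:
        source = pattern.sub(replacement, source)
    return source


print(migrate("df.with_row_count().select(pl.count())"))
```

A textual rename does not guarantee identical output. To our understanding, `with_row_index` names its new column `index` by default, whereas `with_row_count` used `row_nr`, so code that refers to that column by name also needs updating.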
🚀 Performance improvements
- prune parquet row groups when `is_not_null` is used (#14260)
- use is_between to skip parquet row groups (#14244)
- Use a compression API that is designed for this use case (#11699) (#14194)
- Use `UnitVec` in polars-plan traversal (#14199)
- use `UnitVec` in streaming joins (#14197)
- improve `ChunkId` (#14175)
- improve iteration performance (#14126)
- elide unneeded work in window? (#14108)
- run window functions more in parallel (#14095)
- improve skip row group using statistics condition (#14056)
- improve string/binary reverse performance (#14016)
- optimize `DataFrame.describe` by presorting columns (#13822)
- elide redundant bounds checks (#13909)
- speedup boolean filter (#13905)
- speedup binview filter (#13902)
- improve binview filter (#13878)
- apply string view GC more conservatively (#13850)
- add optimized BinaryViewArray comparison kernels (#13839)
- lazy cache binview bytes len (#13830)
- fast-path for eager int_range (#13811)
- Optimize `arr.sum` for inner non-null bool (#13800)
- directly embed data ptr in Buffer (#13744)
- elide parallelism restriction on generic rolling expressions (#13662)
- ensure time groups are parallelized (#13660)
- do not eagerly compute bitcount (#13562)
- optimise SQL engine string concat (#13499)
- remove lifetime requirement from CategoricalChunkedBuilder (#13319)
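Several items above rely on Parquet row-group statistics: pruning on `is_not_null`, skipping with `is_between`, and the improved statistics condition. The block below is a minimal sketch of the idea. It assumes each row group exposes a min, a max, a null count and a row count for the filtered column, and that `is_between` uses inclusive bounds. `RowGroupStats` and the `can_skip_*` functions are hypothetical, not Polars internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowGroupStats:
    """Hypothetical column statistics for one Parquet row group."""
    min: Optional[int]
    max: Optional[int]
    null_count: int
    num_rows: int


def can_skip_is_not_null(stats: RowGroupStats) -> bool:
    # If every value is null, `is_not_null` is false for all rows.
    return stats.null_count == stats.num_rows


def can_skip_is_between(stats: RowGroupStats, lower: int, upper: int) -> bool:
    # An all-null group cannot produce a true predicate (nulls are filtered out).
    if stats.null_count == stats.num_rows:
        return True
    # Without min/max statistics the group has to be read.
    if stats.min is None or stats.max is None:
        return False
    # Otherwise skip only when [min, max] and [lower, upper] do not overlap.
    return stats.max < lower or stats.min > upper


groups = [
    RowGroupStats(min=0, max=9, null_count=0, num_rows=10),
    RowGroupStats(min=10, max=19, null_count=2, num_rows=10),
    RowGroupStats(min=None, max=None, null_count=10, num_rows=10),
]
to_read = [i for i, g in enumerate(groups) if not can_skip_is_between(g, 12, 15)]
print(to_read)
```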
✨ Enhancements
- add `u8`/`i8`/`u16`/`i16` parsers to CSV reader (#14241)
- Implement `list.gather_every` (#14253)
- Implement `prefix/suffix_fields` (#14251)
- Polish decimal arithmetic (#14172)
- Introduce `arr.to_struct` (#14202)
- Support mapping the field names of a struct (#14203)
- make `IdxVec` generic as `UnitVec` (#14196)
- add new arithmetic kernels (#14026)
- Support `unique` and `hash_rows` for `null` columns (#14111)
- Implement arithmetic operations for `Null` columns (#14107)
- Add strict/non-strict construction of Boolean/Binary series (#14073)
- Improve `Series::from_any_values` logic (#14052)
- Adapt extend_constant to function expr architecture and expressify it (#14058)
- add integer negation (#14049)
- `list` & `array` measures of dispersion (#13245)
- gc binview when writing ipc (#14035)
- When calling `convert_time_zone` on a time-zone-naive datetime, convert as if converting from UTC (#13960)
- DataFrame supports explode by array column (#13958)
- improve binary formatting (#13981)
- preserve Enum information when going to IPC (#13943)
- support kwargs in plugin 'field' functions and raise error on unsupported binview layout (#13944)
- support cast decimal to utf8 (#13829)
- add SQL support for `timestamp` precision modifier (#13936)
- support negative indexing and expressions for `LEFT`, `RIGHT` and `SUBSTR` SQL string funcs (#13888)
- Introduce `explode` for `ArrayNameSpace` (#13923)
- raise better error message for .dt.time on Date column (#13932)
- List set_operations supports float (#13920)
- Add `ignore_nulls` for `arr.join` (#13919)
- register 'set_sorted' as batch/elementwise (#13896)
- move Enum/Categorical categories to binview (#13882)
- Add `ignore_nulls` for `list.join` (#13701)
- Add `ignore_nulls` for `pl.concat_str` (#13877)
- fix parquet for binview (#13873)
- support mmap for binview in OOC (#13872)
- implement ffi for `binview` (#13871)
- Support zero fill null strategy for binary and string columns (#13869)
- Implement/fix unary minus operator `-pl.col(...)` (#13776)
- extend SQL `EXTRACT` with "century", "millennium", and "timezone" parts (#13634)
- fix binview ipc format (#13842)
- add SQL support for `numeric` and/or `decimal` types (#13739)
- improve panic message (#13836)
- Expressify `str.zfill` (#13790)
- new implementation for `String`/`Binary` type (#13748)
- Add `nulls_last` for `Series.sort` (#13794)
- Impl `count_matches` for array namespace (#13675)
- Add `nulls_last` for `list/array.sort` (#13795)
- Rename `drop_columns` to `drop` (#13754)
- convert fixed-offset timezones to respective Etc timezone from time zone database (#13738)
- Expressify `str.slice` (#13747)
- implement binview for polars-row (#13736)
- implement binview for polars-json (#13737)
- add architecture for polars-flavored IPC (#13734)
- implement binview comparison kernels (#13715)
- raise default frame/series repr height from 8 to 10 (#13699)
- write parquet ColumnOrder (#13672)
- Impl `contains` for ArrayNameSpace (#13638)
- improve `rolling()` expression formatting (#13657)
- Implement `is_between` in Rust (#11945)
- Expressify `pattern` of `str.extract` (#13607)
- Impl `join` for ArrayNameSpace (#13586)
- add SQL engine support for string cast to `json` (#13624)
- add SQL engine support for `EXTRACT` and `DATE_PART` (#13603)
- add `BinaryView` to `parquet` writer/reader (#13489)
- add SQL engine support for `POSITION` and `STRPOS` (#13585)
- `is_in` support for array dtype (#13559)
- add new `str.find` expression, returning the index of a regex pattern or literal substring (#13561)
- add SQL engine support for `LIKE` and `ILIKE` pattern matching (#13522)
- improve hive partition pruning (#13358) (#13426)
- don't rechunk by default in lazy scans (#13518)
- Add `cum_count` expression function (#13478)
- add SQL engine support for `IF` control flow function (#13491)
- add SQL engine support for `MOD` function (#13502)
- return datetime for datetime mean & median (#13417)
- add SQL engine support for `CONCAT_WS` string function (#13483)
- `BinaryView`/`Utf8View` IPC support (#13464)
- Implement wasm Pool::scope (#13476)
- add SQL engine support for `RIGHT` and `REVERSE` string functions (#13461)
- implement `BinaryView` and `Utf8View` in `polars-arrow` (#13243)
- add SQL engine support for variadic string `CONCAT` function (#13428)
- add support for AND in SQL join-clause context (#13242)
- Impl ordering ops for array namespace (#13414)
- add SQL engine support for `REPLACE` string function (#13431)
- add SQL engine support for `SIGN` function (#13429)
- add SQL engine support for `IFNULL` function (#13432)
- additional SQL support for `bytes`, `bit`, and `hex` literals (#13389)
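One entry above converts fixed-offset time zones to the matching `Etc` zone from the time zone database. The `Etc/GMT` names use the POSIX sign convention, which is inverted relative to ISO 8601 offsets, so `+01:00` corresponds to `Etc/GMT-1`. The block below is a minimal sketch of that mapping and is not Polars' implementation. It assumes that only whole-hour offsets within the range covered by the database are converted. It also maps a zero offset to `UTC`, mirroring the bug-fix entry on reading `+00:00` from Parquet/Arrow. `fixed_offset_to_etc` is hypothetical.

```python
import re
from typing import Optional


def fixed_offset_to_etc(offset: str) -> Optional[str]:
    """Map an ISO 8601 offset such as '+01:00' to an IANA Etc zone name."""
    match = re.fullmatch(r"([+-])(\d{2}):(\d{2})", offset)
    if match is None:
        return None
    sign, hours, minutes = match.group(1), int(match.group(2)), int(match.group(3))
    if minutes != 0:
        return None  # Etc/GMT zones exist only for whole-hour offsets
    if hours == 0:
        return "UTC"
    # The database defines Etc/GMT-14 .. Etc/GMT+12 only.
    if (sign == "+" and hours > 14) or (sign == "-" and hours > 12):
        return None
    # POSIX sign convention: east of Greenwich gets a minus sign.
    return f"Etc/GMT{'-' if sign == '+' else '+'}{hours}"


for tz in ["+01:00", "-05:00", "+05:30", "+00:00"]:
    print(tz, fixed_offset_to_etc(tz))
```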
🐞 Bug fixes
- deduplicate recursive growables (#14264)
- Fix `glimpse` overload signature (#14258)
- allow set operations on list of categoricals (#14110)
- `any/all_horizontal` with single input has incorrect type (#14256)
- load numpy array with np array values #14237 (#14238)
- Fix join validation for String types (#14229)
- make csv parser more robust to edge cases (#14210)
- Fix for
set_operations
of binary dtype (#14152)
- fix read_csv date/datetime inference and parsing (#14113)
- don't see files as hive partitions (#14128)
- allow eval on list of categoricals (#14132)
- add missing conditional compile flag for `StringFunction::Find` (#14129)
- Forbid casting from `Date` to `Time` and vice versa (#14127)
- preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (#14120)
- Implement `gt/lt` cmp for null dtype (#14119)
- ignore comments at beginning of csv if schema provided (#14115)
- fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot_table would do (#14048)
- some temporal conversion errors for datetimes earlier than `1970-01-01` (#14050)
- Preserve name when casting from categorical (#14085)
- fix cse bug when window function is nested (#14070)
- Fix `melt` panic when there are no value vars (#14057)
- `json_encode` should respect the logical type (#14063)
- improve skip row group using statistics condition (#14056)
- Raise for .dt.epoch and .dt.timestamp for Duration dtype (#13962)
- handle `SliceSink` with empty data (#14025)
- correct field type schema inference (using read_csv) (#14042)
- Map `AnyValue::Null` to datatype `Null` (#14045)
- Use int formatter for unsigned ints (#14043)
- quick fix for multiple chunks binary reverse (#14024)
- count matches on list categorical (#14021)
- `list.min/max` with empty and/or None elements (#14018)
- allow get access to list of categoricals (#14015)
- Fix casting from categorical to numeric (#13957)
- read_csv preserve whitespace and newlines (#13934)
- append decimal with different scale (#13977)
- Allow casting integer types to Enum (#13955)
- `arg_min/max` on categoricals should respect ordering (#13998)
- serialize decimal type (#13997)
- check input type for `arr/list.contains` (#13959)
- Allow dtype merge when inner dtype is enum (#13938)
- recurse less in streaming shared sinks (#13930)
- ensure order is preserved if streaming from different sources (#13922)
- Fix `is_not_null` for Struct columns (#13921)
- make 100 * pl.col(pl.Boolean).mean() work (#13725)
- allow extract of numeric from str AnyValue (#13865)
- single-element .dt.time() and .dt.date() should always preserve sortedness (#13808)
- prune empty chunks before set operations (#13898)
- treat null columns as zero in `sum_horizontal` (#13880)
- include null count in rolling window validity with `min_periods` (#13863)
- don't return NaN as free memory fraction (#13860)
- parquet hybrid RLE encoding did not always align to bit width (#13883)
- Add `ignore_nulls` for `list.join` (#13701)
- .dt.time() was panicking for datetimes prior to unix epoch (#13812)
- Correct err message of `check_map_output_len` (#13854)
- allow list creation of decimals (#13851)
- Implement `abs` for Decimal, error on Date/Time/Datetime (#13821)
- decompress the right number of rows when reading compressed CSVs (#13721)
- rolling nested groups deadlock (#13835)
- `gather_every` should work on agg context (#13810)
- When reading Parquet or Arrow, convert +00:00 timezone to UTC (#13816)
- Fix segfault of `is_in` (#13814)
- don't panic on full null qcut (#13815)
- do not read data for zero-length compressed buffer (#13791)
- Fix the non-null test of `transpose` (#13783)
- Raise error instead of panic when joining on wildcard/nth (#13742)
- `str.concat` correctly ignores a single null value (#13751)
- Selectors `by_name` and `by_dtype` should allow an empty list as input (#11024)
- Use `NonZeroUsize` for `batch_size` parameter in `write_csv/sink_csv/scan_ndjson` (#13726)
- error instead of panicking in sql if empty function (#13691)
- gather.get schema (#13679)
- ensure we hit proper cache in nested `rolling` expressions (#13666)
- Allow `av_buffer` to cast numeric records to temporal types (#13661)
- streaming cross join if swapped is hit (#13656)
- Make sure rolling key is projected when processing projections (#13622)
- fix schema inference for json (#13637)
- Empty series of AggregatedList should also have list dtype (#13620)
- fall back to the cast kernel if `inline_cast` of an AnyValue raises (#13595)
- `LazyFrame::join()` no longer ignores 3 `JoinArgs` parameters (#13570)
- fix reverse variable row decoding (#13587)
- Fix `scatter` for null values (#13578)
- Fix `cum_count` with regard to start value / null values (#13535)
- Fix precision/scale handling and invalid numbers in string-to-decimal conversions. (#13548)
- Treat Python `None` as null value for `Object` dtype (#13564)
- `Expr.replace` to single value did not replace NULLs (#13551)
- `AnyValue::StructOwned` panic when hashing (#13553)
- improve hive partition pruning (#13358) (#13426)
- fix projection pushdown for new outer join schema (#13527)
- ensure size-hint of TrueIdxIter is correct (#13508)
- correct 'outer_coalesce' logic in case of duplicate names (#13501)
- raise for out-of-range datetimes in to_datetime/strptime (#13403)
- Keep logical type when getting values from list (#13456)
- Handle duplicate/ambiguous inputs for `replace` (#13217)
- skip null/empty values if replace_lit_n_char (#13400)
- fix is_in operator when comparing string with global categoricals (#13412)
- use different generics for `shift_and_fill` parameters (#13379)
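The Parquet fix above concerns the RLE/bit-packing hybrid encoding. Per the Parquet specification, a bit-packed run always holds a multiple of 8 values, packed least-significant bit first. Each group of 8 values therefore occupies exactly `bit_width` bytes, and the run header is `(number_of_groups << 1) | 1` written as a ULEB128 varint. The block below is a minimal sketch of that rule. `bit_packed_run` and `uleb128` are hypothetical helpers, not Polars' encoder.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def bit_packed_run(values: list, bit_width: int) -> bytes:
    """Encode values as one bit-packed run: header followed by packed bytes."""
    padded = list(values) + [0] * (-len(values) % 8)  # pad to a multiple of 8
    acc = 0
    for i, value in enumerate(padded):
        if not 0 <= value < (1 << bit_width):
            raise ValueError(f"{value} does not fit in {bit_width} bits")
        acc |= value << (i * bit_width)  # LSB-first packing
    num_groups = len(padded) // 8
    payload = acc.to_bytes(num_groups * bit_width, "little")
    return uleb128((num_groups << 1) | 1) + payload


print(bit_packed_run(list(range(8)), 3).hex())
```

Because of the padding, the payload length is always a whole multiple of `bit_width` bytes, even when the number of values is not a multiple of 8. A reader uses the known value count to ignore the padded tail.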
📖 Documentation
- fix code block in user-guide/lazy/schemas (#14228)
- Fix typo in contributing guide (#14181)
- Small improvements to Ecosystem page (#14176)
- fix code blocks in user-guide/concepts/data-structures (#14146)
- Fix bullet point formatting in CI contributing guide (#14117)
- Remove outdated reference to horizontal concat feature (#14105)
- Replace alternatives page with more objective comparison (#13784)
- Improve structure of user guide (#13951)
- Improve structure of user guide (#13639)
- Introduce ecosystem page in user guide (#13903)
- Mention deltalake write support in README (#13890)
- Fix typo in deprecation message of `with_row_count` (#13793)
- Fix incorrect "coming from pandas" syntax (#13767)
- Improve streaming section of the user guide (#13750)
- fix linking to feature flags in user guide (#13644)
- Improve documentation on broadcasting (#13394)
- Add note about toolchain issue under native Windows (#13590)
- update SQL section of the README (#13529)
- update polars-business > polars-xdt link (#13509)
📦 Build system
- Enable feature nightly with optional sql feature (#14222)
- remove horizontal_concat feature (#13390)
🛠️ Other improvements
- make gather_chunked completely generic (#14195)
- Add `.cargo` directory to `.gitignore` (#14191)
- `take_chunked` to polars-ops (#14185)
- Enable `clippy` lint to warn on debug macros (#14178)
- Run `cargo update` (#14160)
- merge take kernels (#14137)
- improve From<Ca> -> Vec (#14123)
- hoist boolean -> string cast (#14122)
- Remove `DatetimeChunked::convert_time_zone` (#14046)
- More generic way to present an expression tree diagram (#14020)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (#14033)
- make Enums an actual datatype (#14011)
- update rustc (#13947)
- move `filter` to `polars-compute` (#13897)
- bump object_store to 0.9 (#13857)
- Make functions in `expr/general` non-anonymous (#13832)
- Fix doctests (#13831)
- Refactor Python release workflow (#13807)
- Make `pl.duration` non-anonymous (#13762)
- Rename `pl.count()` to `pl.len()` (#13719)
- Deprecate `dt.with_time_unit` in favor of `cast(pl.Int64).cast(pl.Datetime(time_unit, time_zone))` (#13667)
- Auto-add 'needs triage' label to bugs (#13671)
- make rolling index column visible to optimizer (#13658)
- Rename `lazy-regex` feature to `regex` to align `polars` with `polars-lazy` crate (#13647)
- Add `Documentation` / `Build system` sections to the changelog (#13594)
- Filter unhelpful messages in `make build` (#13579)
- Remove extra line break between checkboxes in GitHub bug report issues (#13576)
- Rename `row_count_name`/`row_count_offset` parameters in IO functions to `row_index_*` (#13563)
- Rename `with_row_count` to `with_row_index` (#13494)
- simplify parquet binary ordering function (#13488)
- don't panic if `ambiguous` is of the wrong type (#13388)
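The `dt.with_time_unit` deprecation above concerns a method that relabels the time unit without rescaling the stored integer. The suggested replacement makes that explicit by casting to `Int64` and then to `Datetime` with the new unit. The block below is a pure-Python sketch of what such a reinterpretation means for one raw value. `interpret` is a hypothetical helper that works on timezone-naive values only. Nanoseconds are truncated to microseconds because `datetime` cannot represent them.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def interpret(raw: int, time_unit: str) -> datetime:
    """Read a raw epoch integer under the given Datetime time unit."""
    if time_unit == "ms":
        micros = raw * 1_000
    elif time_unit == "us":
        micros = raw
    elif time_unit == "ns":
        micros = raw // 1_000  # datetime only has microsecond resolution
    else:
        raise ValueError(f"unknown time unit: {time_unit!r}")
    return EPOCH + timedelta(microseconds=micros)


raw = 86_400_000
print(interpret(raw, "ms"))  # same integer, read as milliseconds
print(interpret(raw, "us"))  # same integer, read as microseconds
```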
Thank you to all our contributors for making this release possible!
@29antonioac, @Bromeon, @ByteNybbler, @JulianCologne, @MarcNuebel, @MarcoGorelli, @NedJWestern, @ShivMunagala, @Vincenthays, @Wainberg, @aaarrti, @alexander-beedie, @apcamargo, @bchalk101, @braaannigan, @c-peters, @cgevans, @cmdlineluser, @collinprince, @deanm0000, @dependabot, @dependabot[bot], @dpinol, @edavisau, @eitsupi, @flisky, @grinya007, @hamishs, @henryharbeck, @ion-elgreco, @itamarst, @jacksonthall22, @jcrozum, @kstoneriv3, @langestefan, @lukemanley, @mcrumiller, @mkucijan, @nameexhaustion, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @s-banach, @shritesh, @stinodego, @taki-mekhalfa, @thomasaarholt, @tim-stephenson, @universalmind303, @valorien and @wjandrea