💥 Breaking changes
- Purge arrow-rs support (#19312)
🚀 Performance improvements
- Address inadvertent quadratic behaviour in expand_columns (#19469)
- Move rolling_corr/cov to an actual implementation on Series (#19466)
- Don't split par if cast to categorical (#19462)
- Improve var/cov/corr performance (#19381)
- Reduce memcopy in parquet (#19350)
- Optimize array and list gather (#19327)
- Add/fix unordered row decode, change unordered format (#19284)
- Fast decision for Parquet dictionary encoding (#19256)
- Make date_range / datetime_range ~10x faster for constant durations (#19216)
- Batch utf8-validation in csv, 18% / 25% on 1.9.0 (#19124)
- Use two-pass algorithm for csv to ensure correctness and SIMDize more, ~17% (#19088)
- Use List's TotalEqKernel (#18984)
- Improve rename performance for Lazy API (#18890)
- Collapse cross-joins to faster joins (#18633)
- Cache register plugin function (#18860)
✨ Enhancements
- Implement nested Parquet writing for High-Precision Decimals (#19476)
- Improve read_database typing (#19444)
- Add IPC sink in new streaming engine (#19431)
- Add escape_regex operation to the str namespace and as a global function (#19257)
- Add SQL support for bit_count and bitwise &, |, and xor operators (#19114)
- Add credential provider utility classes for AWS, GCP (#19297)
- Support decoding Float16 in Parquet (#19278)
- Experimental credential_provider argument for scan_parquet (#19271)
- Allow DeltaTable input to scan_delta and read_delta (#19229)
- Make FlightConsumer Send and support compressed data (#19262)
- New quantile interpolation method & QUANTILE_DISC function in SQL (#19139)
- Conserve Parquet SortingColumns for ints (#19251)
- Low level flight interface (#19239)
- Improved list arithmetic support (#19162)
- Expose LTS CPU in show_versions() (#19193)
- Check Python version when deserializing UDFs (#19175)
- Quantile function in SQL (#18047)
- Improve scalar strict message (#19117)
- Add Series::{first, last, approx_n_unique} (#19093)
- Allow for rolling_*_by to use index count as window (#19071)
- Delay deserialization of python function until physical plan (#19069)
- Add cum(_min/_max) for pl.Boolean (#19061)
- Bitwise operations / aggregations (#18994)
- Improved error message DSL -> IR resolving (#19032)
- Add strict param to eager/lazy frame "rename" (#19017)
- Support schema arg in read/scan_parquet() (#19013)
- Add allow_missing_columns option to read/scan_parquet (#18922)
- Use FFI to extract Series from different Polars binaries (#18964)
- Allow for zero-width fixed size lists (#18940)
- Improve scalar strict message (#18904)
- Support arithmetic between Series with dtype list (#17823)
- Relaxed schema alignment for parquet file list read (#18803)
- Always preserve sorted flag for .dt.date (#18692)
- Implement single inequality joins for join_where (#18727)
🐞 Bug fixes
- Include Array in to_physical (#19474)
- Don't panic in SQL temporal string check; raise suitable ColumnNotFound error (#19473)
- Properly raise on mean_horizontal with wrong dtypes (#19472)
- Make output dtype known for list.to_struct when fields are passed (#19439)
- Address inadvertent quadratic behaviour in expand_columns (#19469)
- Ensure sorted flag is unset after Int->String cast (#19470)
- Fix row_index of batched reader (#19465)
- Fix perfect groupby (#19461)
- Correct wildcard expansion for functions (#19449)
- Ensure struct eq/ne_missing also compares outer validity (#19443)
- Fix incorrect reverse on struct containing NULLs (#19446)
- Faulty escape_regex example (#19440)
- Capture groups should be ignored in replace when literal=True (#19413)
- Fix ColumnNotFound when using pl.element() inside list.eval (#19438)
- Update error message in csv parser to recommend schema_overrides instead of deprecated dtypes argument (#19416)
- Incorrect .join(..., how="left").head(N) if N <= left_df.height() and there are duplicate matches (#19422)
- Support Array type in more DataType methods (#19427)
- Bug in group_tuples_perfect, tail was not processed properly (#19417)
- Ensure that ASCII* table formats do not use the UTF8 ellipsis char when truncating rows/cols/values (#19404)
- Allow .get(null) in groupby context (#19401)
- Fix include_file_paths and with_row_index for streaming CSV scan (#19394)
- Flaky parametric parquet test (#19393)
- Raise on data mismatch in str.json_decode (#19347)
- Fix unsoundness in group_tuples_perfect (#19359)
- Ensure Python version matches version used to serialize credential provider (#19375)
- Capture groups should be ignored in replace_all when literal=True (#19366)
- Ignore Parquet is_{min,max}_value_exact when set to true (#19344)
- Projection pushdown was ignored by include_file_paths (#19341)
- Don't produce duplicate column names in Series.to_dummies (#19326)
- Use of HAVING outside of GROUP BY should raise a suitable SQLSyntaxError (#19320)
- Fix empty array gather (#19316)
- Merge categorical rev-map in unpivot (#19313)
- DataFrame descending sorting by single list element (#19233)
- Fix cse union schema (#19305)
- Correctly load Parquet statistics for f16 (#19296)
- Error on invalid query (#19303)
- Fix enum scalar output (#19301)
- Fix list gather invalid fast path (#19299)
- Fix quoting style of decimal csv output (#19298)
- Don't vertically parallelize literal select (#19295)
- Fix struct reshape fast path (#19294)
- Also split on forward slashes during hive path inference on Windows (#19282)
- Don't cse as_struct (#19280)
- Only apply string parsing to String dtype (#19222)
- Fix compilation error caused by missing use of JsonLineReader (#19244)
- Don't remember Parquet statistics if filtered (#19248)
- Do not check dtypes of non-projected columns for parquet (#19254)
- Parquet predicate pushdown for lit(_) != (#19246)
- Use all chunks in Series from arrow struct (#19218)
- Implement is_nested_null for Null Array (#19219)
- Fix struct literals (#19214)
- Plotting was not interacting well with Altair schema wrappers (#19213)
- Fix infer_schema for DataType::Null (#19201)
- Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
- Don't unwrap() expansion (#19196)
- Properly handle non-nullable nested Parquet (#19192)
- Fix invalid list collection in expression engine (#19191)
- Implement to_arrow functionality properly for Arrays (#19077)
- Fix incorrect (eq|ne)_missing on List/Array types (#19155)
- Properly broadcast Struct when then validity (#19148)
- Allow partial name overlap in join_where resolution (#19128)
- Fix floordiv / modulo with scalar 0 on LHS (#19143)
- Ensure aligned chunks in OOC sort (#19118)
- Recursively align when converting to ArrowArray (#19097)
- Raise on invalid shape of shape 1, empty combination (#19113)
- Use two-pass algorithm for csv to ensure correctness and SIMDize more, ~17% (#19088)
- Allow converting DatetimeOwned to ChunkedArray (#19094)
- Throw proper error for empty char params in scan_csv (#19100)
- Ensure parquet schema arg is propagated to IR (#19084)
- Only rewrite numeric ineq joins (#19083)
- Check validity of columns of keys/aggs in dsl->ir (#19082)
- Bitwise aggregations should ignore null values (#19067)
- Remove failing datetime subclass test (#19068)
- Fix ser/de PlSmallStr error (#19060)
- Remove failing temporal lit tests (#19056)
- Divide-by-zero in OOC sort (#19048)
- Ensure must_flush flag is not reset (#19046)
- Error node should be on top (#19045)
- Force nested struct missing equality (#19031)
- Fix invalid alias udf (#19021)
- Raise invalid predicate join_where (#19020)
- Fix nested flag of functions with multiple arguments (#19016)
- Fix projection pushdown bug in IEJOINS (#19015)
- Separate temporal tests (#19012)
- Return the truth values of ne_missing and eq_missing operations for struct instead of null (#18930)
- Fix struct broadcasting comparisons (#19003)
- Wrong result on when().then().otherwise() on struct when both results are broadcast (#19000)
- Improve literals for temporal subclasses (#18998)
- Ensure same fmt in Series/AnyValue to string cast (#18982)
- Return correct value for when().then().otherwise() on structs when using first()/last() (#18969)
- IPC don't write variadic_buffer_counts in blocks, but only dictionaries (#18980)
- Respect allow_threading in TernaryExpr (#18977)
- Make join test order-agnostic (#18975)
- Window function had incorrect output name on ExprIR (#18970)
- Fix lit().shrink_dtype() broadcasting (#18958)
- Parallel evaluation of cumulative_eval (#18959)
- Properly implement AnyValue::Binary into_py (#18960)
- Fix Expr.over with order_by not taking effect if group keys were sorted (#18947)
- Properly fetch type of full None List Series (#18916)
- Incorrect mode for sorted input (#18945)
- Properly choose inner physical type for Array (#18942)
- Disable very old date in timezone test for CI (#18935)
- Infer reshape dims when determining schema (#18923)
- Incorrect broadcasting on list-of-string set ops (#18918)
- Adding with_row_index() to previously collected lazy scan does not take effect (#18913)
- Properly zip struct validities (#18886)
- Ensure ListPrimitiveBuilder dtype invariant is asserted (#18889)
- Out-of-bounds gather in categorical->int cast (#18897)
- AnyValue Series from Categorical/Enum (#18893)
- Properly cast AnyValue string (#18888)
- Fix stack overflow in JSON inference (#18887)
- Use proper thread pool in cumulative_eval (#18885)
- Properly calculate duration units (#18869)
- Check values in strict cast Int to Time (#18854)
- Fix typo in DuplicateError error message (#18855)
- Properly merge live and dead columns in prefiltered (#18862)
- DataFrame plot was raising when some extra keywords were passed to encodings (e.g. x=alt.X(a, axis=alt.Axis(labelAngle=30))) (#18836)
- Respect strictness in list constructor (#18853)
- Properly broadcast array arithmetic (#18851)
- Throw error for comparison of unequal length series (#18816)
- Raise when parquet file has extra columns and no select() was done (#18843)
- Replace DynArgs with an enum containing all its variants (#18746)
- Return empty DF when input is empty json list (#18827)
- Handle AnyValue::Struct to prevent null returns (#18801)
- Struct filter by index (#18778)
- Proper dtype casting for struct embedded categoricals in chunked categoricals (#18815)
- Fix some error/assertion types (#18811)
- Remove panic in arr.to_struct (#18804)
- Allow empty sort by columns (#18774)
- Broadcast zip_with for structs (#18770)
- Dropped/shifted rows in parquet scan with streaming=True (#18766)
- Fix cum_max using exception text of cum_min for invalid dtype (#18780)
- Fix accidental raise on shape 1 (#18748)
📖 Documentation
- Fix docstrings for ATAN2 and ATAN2D SQL functions (#19351)
- Tiny correction post dask-expr (#19354)
- Remove ecosystem viz section since there is one in misc already (#18408)
- Fix typo in custom expressions docs (#19292)
- Add SQL docs for new QUANTILE_CONT and QUANTILE_DISC functions (#19272)
- Add marimo to ecosystem.md (#19250)
- Link to main website from banner (#19177)
- Fix example of as_struct (#19116)
- Clarify difference between bitwise/logical ops (#19180)
- Fix examples/read_csv rust example (#19185)
- Add non-equi joins to, and revise, joins docs page (#19127)
- Add Series.first,last,approx_n_unique to docs (#19146)
- Revise and improve 'Concepts' section (#19087)
- Fix example of lazy schema verification (#19059)
- Rewrite 'Getting started' page (#19028)
- Align dates in DataFrame example with Python (#18491)
- Fix is_not_nan description (#18985)
- Recommend targetDir for rust-analyzer (#18973)
- Typo for IntoDf trait (#18933)
- Fix broken user-guide API links (#18872)
- Fix minor rogue apostrophes (#18865)
- Fix link to issue tracker and code snippet format in GPU docs (#18850)
- Refactor docs directory hierarchy (#18773)
- Minor improvements to contributing guide (#18777)
📦 Build system
- Revert PyO3 version back to 0.21 (#19376)
- Bump Rust toolchain to nightly-2024-09-29 (#19006)
- Bump simd-json to 0.14 (#18999)
🛠️ Other improvements
- Undo conflicting fix (#19463)
- Remove code in examples folder in favor of the user guide (#19430)
- Add var/std support to new streaming engine (#19406)
- Add streaming groupby for reductions (#19291)
- Add AlignedBytes types (#19308)
- Don't pickle entire python function for object store keying (#19340)
- Add extra functionality to SharedStorage (#19361)
- Improve error message for Zero-Field Structs with Parquet (#19370)
- Add tracking of async task wait time statistics (#19373)
- Add capability to transmute to SharedStorage (#19353)
- Reduce memcopy in parquet (#19350)
- Purge arrow-rs support (#19312)
- Use size_of/align_of from prelude (#19311)
- Add/fix unordered row decode, change unordered format (#19284)
- Trim sliced-out memory from ListArrays in list arithmetic (#19276)
- Move from parquet-format-safe to polars-parquet-format (#19275)
- Purge unused code (#19267)
- Skip flaky test (#19242)
- Feature gate for list arithmetic (#19237)
- Add more tests for list arithmetic (#19225)
- Remove unused IPC async (#19223)
- Make get_list_builder infallible (#19217)
- Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
- Make expression output type known (#19195)
- Revert "feat(python): Raise an error when users try to use Polars API in a fork()-without-execve() child (#19149)" (#19188)
- Zero-Field Structs and DataFrame with Height Property (#19123)
- Reduction -> GroupedReduction for the new streaming engine (#19176)
- Make pl.repeat part of the IR (#19152)
- Workaround for rust-analyzer bug (#19134)
- Minor CSV bit twiddle nitpick (#19121)
- Use row encoding in asof join (#19125)
- Remove deprecated raw_entry_mut in StringCache (#19126)
- Clean remove_prefix since Python 3.9 is now the minimum Python (#19070)
- Eliminate some uses of deprecated raw_entry (#19102)
- Migrate to hashbrown 0.15 (#19091)
- Pin hashbrown to 0.14 until migrated (#19076)
- Add new streaming engine to CI (#19051)
- Add WithRowIndexNode to new-streaming engine (#19037)
- Add buffers to zip heads to reduce contention (#19036)
- Fix race condition in DistributorChannel (#19033)
- Mark schema arg in read/scan_parquet as unstable (#19018)
- Fix new-streaming test_lazy_parquet::test_row_index (#19019)
- Preserve scalar in more places (#18898)
- Mention allow_missing_columns in error message when column not found (parquet) (#18972)
- Fix new-streaming test_lazy_parquet::test_row_index (#18978)
- Disable CSE-specific test on new streaming engine (#18971)
- Add FixedSizeList equality broadcasting (#18967)
- Divide ChunkCompare into Eq and Ineq variants (#18963)
- Another set of new-stream test skip/fixes (#18952)
- Fix/skip variety of new-streaming tests, cont (#18928)
- Fix/skip variety of new-streaming tests (#18924)
- Minor new-streaming test fixes (#18891)
- Make with_column_unchecked take Column (#18863)
- Keep scalar in more places (#18775)
- Replace DynArgs with an enum containing all its variants (#18746)
- Fix new-streaming test_parquet::test_complex_types (#18829)
- Fix zero-length len (#18817)
- Add panic to unchecked DataFrame constructors in debug mode (#18807)
- Add missing implicit datetime alias in ExprIR (#18809)
- Fix topological sort in new streaming engine (#18806)
- Fix new-streaming parquet test_row_index_projection_pushdown_18463 (#18805)
- Remove short-lived / non-CPU bound task spawns on async executor in new-streaming (#18764)
- Fix parquet file metadata being dropped after first DSL->IR conversion (#18789)
- Remove extra hashmap construction in new-streaming parquet (#18792)
- Fix new-streaming parquet on empty parquet (#18763)
- Ensure fallback node gets correct length df even if no columns selected (#18772)
- Fix input independence tests in new-streaming engine (#18771)
- Make DataFrame a Vec of Column instead of Series (#18664)
- Run benchmark on PR labeled 'needs-bench' (#18737)
- Add pre-filtered decode to new parquet source (#18715)
- Allow polars to pass cargo check on windows (#18672)
- Remove unsafe in polars-json deserialize code (#18725)
Thank you to all our contributors for making this release possible!
@3ok, @Bidek56, @LukasFolwarczny, @Manishearth, @MarcoGorelli, @Plutone11011, @Rashik-raj, @adamreeve, @aleexharris, @alexander-beedie, @alonme, @balbok0, @barak1412, @beckernick, @benrutter, @bradfordlynch, @cmdlineluser, @coastalwhite, @corleyma, @corwinjoy, @deanm0000, @dependabot, @dependabot[bot], @dvillaveces, @edwinvehmaanpera, @eitsupi, @etrotta, @gab23r, @i64, @itamarst, @janscholten, @jbutterwick, @joelostblom, @kenkoooo, @kgv, @khalidmammadov, @laurentS, @max-muoto, @mcrumiller, @mscolnick, @nameexhaustion, @npielawski, @orlp, @pomo-mondreganto, @r-brink, @ritchie46, @rodrigogiraoserrao, @siddharth-vi, @squnit, @stinodego, @sunadase, @t-ded, @wakabame, @wence-, @wolfgang-noichl and @xhiroga