Polars: rs-0.35.0 Release

Release date: November 17, 2023
Previous version: rs-0.34.0 (released October 24, 2023)
Magnitude: 27,388 Diff Delta
Contributors: 31 total committers

237 Commits in this Release

Top Contributors in rs-0.35.0

stinodego
ritchie46
reswqa
nameexhaustion
alexander-beedie
MarcoGorelli
cmdlineluser
orlp
c-peters
moritzwilksch

Release Notes Published

🏆 Highlights

  • improve join performance through radix partitioned join (#12270)
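
To make the idea behind a radix partitioned join concrete, here is a minimal pure-Python sketch. It is not the Polars implementation: the function names, the tuple-based rows, and the `n_bits` parameter are all illustrative assumptions. Both inputs are split into buckets by the low bits of the key hash, so each bucket pair can be joined with a small hash table of its own.

```python
from collections import defaultdict


def radix_partition(rows, key_index, n_bits):
    """Split rows into 2**n_bits buckets using the low bits of the key hash."""
    mask = (1 << n_bits) - 1
    partitions = [[] for _ in range(1 << n_bits)]
    for row in rows:
        partitions[hash(row[key_index]) & mask].append(row)
    return partitions


def radix_partitioned_join(left, right, left_key=0, right_key=0, n_bits=2):
    """Inner join of two lists of tuples (hypothetical helper, illustration only)."""
    left_parts = radix_partition(left, left_key, n_bits)
    right_parts = radix_partition(right, right_key, n_bits)
    out = []
    # Equal keys hash to the same bucket index on both sides, so only
    # matching bucket pairs need to be compared.
    for lpart, rpart in zip(left_parts, right_parts):
        # Build phase: small hash table over one right-hand bucket.
        table = defaultdict(list)
        for rrow in rpart:
            table[rrow[right_key]].append(rrow)
        # Probe phase: look up every left-hand row of the same bucket.
        for lrow in lpart:
            for rrow in table.get(lrow[left_key], ()):
                out.append(lrow + rrow)
    return out
```

Because the buckets are independent, each pair could be handed to a separate worker, and each per-bucket hash table is smaller than one table over the whole input. The output order depends on the bucket layout, so callers should not rely on it.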

💥 Breaking changes

  • Rename cumulative functions cumsum -> cum_sum and similar (#12513)
  • Rename take to gather (#12528)
  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • Rename take_every to gather_every (#12531)
  • Deprecate parse_int in favor of to_integer (#12464)
  • plugins add version and context (#12433)
  • Fix scan_csv error type (#12355)
  • Rename write_csv parameter has_header to include_header (#12351)
  • Rename is_signed to is_signed_integer (#12220)
  • Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
  • Rename ljust/rjust to pad_end/pad_start (#11975)
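
When upgrading, it can help to scan a code base for the old names. The following is a rough sketch that uses only the Python standard library. The `RENAMES` table is copied from the list above, while `find_renamed_identifiers` is a hypothetical helper, not part of Polars. It matches bare identifiers, so it will report false positives (for example an unrelated `.take()` call, or `has_header` in a reader call where the parameter was not renamed). The `dt.seconds` family and the other cumulative functions are left out to keep the table short.

```python
import re

# Old name -> new name, taken from the breaking-changes list above.
RENAMES = {
    "cumsum": "cum_sum",
    "take": "gather",
    "take_every": "gather_every",
    "parse_int": "to_integer",
    "has_header": "include_header",  # only renamed for write_csv
    "is_signed": "is_signed_integer",
    "ljust": "pad_end",
    "rjust": "pad_start",
}

# Longest names first, and \b on both sides so "take" never matches
# inside "take_every".
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, RENAMES), key=len, reverse=True)) + r")\b"
)


def find_renamed_identifiers(source: str):
    """Return (line number, old name, suggested new name) for each candidate hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, RENAMES[old]))
    return hits
```

Each hit is only a candidate for review; the helper does not know whether the identifier belongs to a Polars object.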

🚀 Performance improvements

  • speed up cov/corr with SIMD + strength-reduction (~3x vs. 0.19.13, ~2x vs. numpy) (#12471)
  • apply predicates and statistics of parquet files in streaming mode (#12439)
  • use online algorithm for cov/corr ~2x (#12412)
  • indexvec in group-by (#12371)
  • reduce allocations in hash join (#12368)
  • change concurrency parameters (#12321)
  • improve join performance through radix partitioned join (#12270)
  • remove extra multiplication in hash_to_partition (#12233)
  • allow non-power-of-two partitions (#12225)
  • Reduce compute in error message for failed datetime parsing (#12147)
  • improve parquet downloading (#12061)
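
As background for the "online algorithm for cov/corr" entry, here is a minimal single-pass (Welford-style) covariance and correlation accumulator in plain Python. It illustrates the general technique only. The class name and its methods are assumptions made for this sketch, and the actual Polars kernels may differ in detail.

```python
import math


class OnlineCovariance:
    """Single-pass accumulator for covariance and Pearson correlation."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.c_xy = 0.0  # running sum of (x - mean_x) * (y - mean_y)
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean of x
        dy = y - self.mean_y  # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Each term pairs an old-mean deviation with a new-mean deviation.
        self.c_xy += dx * (y - self.mean_y)
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)

    def covariance(self, ddof=1):
        if self.n <= ddof:
            raise ValueError("not enough observations")
        return self.c_xy / (self.n - ddof)

    def correlation(self):
        # Undefined (division by zero) when either input has zero variance.
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)
```

The accumulator reads each pair once and never stores the inputs, which avoids the separate pass a two-pass formula needs to compute the means first.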

✨ Enhancements

  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • support http scan_parquet (#12517)
  • Add support for UTF-8 BOM option in write_csv and sink_csv (#12253)
  • remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
  • Allow comparison of two local categories with the same hash (#12503)
  • more changes for versioned plugins (#12504)
  • plugins add version and context (#12433)
  • include i128 in more primitive functions (#12413)
  • write rolling functions as private expressions. (#12379)
  • Add round_sig_figs expression for rounding to significant figures (#11959)
  • change concurrency parameters (#12321)
  • deprecate _saturating in duration string language, make it the default (#12301)
  • auto infer ambiguous for truncate and round (#12204)
  • Rename is_signed to is_signed_integer (#12220)
  • New Config options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
  • allow non-aggregation predicate in ternary groupby (#12286)
  • Add name= in .write_avro to set schema name (#12255)
  • Add support for reading zstd compressed files (no-options) in read_csv (#12214)
  • start prefetching all files immediately (#12201)
  • Add .list.to_array expression (#12192)
  • consolidate & improve all casting failure error messages (#12168)
  • tunable concurrency (#12171)
  • support reverse sort in streaming (#12169)
  • Add .arr.to_list expression (#12136)
  • add concurrency budget (#12117)
  • Introduce ignore_nulls for str.concat (#12108)
  • casting utf8 to temporal (#12072)
  • Add supertype for List/Array (#12016)
  • enable eq and neq for array dtype (#12020)
  • Expressify n of shift (#12004)
  • add dedicated name namespace for operations that affect expression names (#11973)
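
For readers unfamiliar with rounding to significant figures, which the new round_sig_figs expression provides, the scalar sketch below shows the arithmetic in plain Python. The function is a stand-in written for illustration, not the Polars kernel. It relies on Python's built-in `round`, which rounds ties to even, so tie-breaking may differ from what Polars does.

```python
import math


def round_sig_figs(value: float, digits: int) -> float:
    """Round value to the given number of significant figures (illustrative)."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    if value == 0 or not math.isfinite(value):
        return value  # zero, inf and nan have no leading digit to anchor on
    # Position of the leading digit: 5 for 123456.0, -3 for 0.00123456.
    magnitude = math.floor(math.log10(abs(value)))
    # A negative ndigits rounds to the left of the decimal point.
    return round(value, digits - 1 - magnitude)
```

The number of decimal places to keep is derived from the position of the leading digit, so the same call works for very large and very small magnitudes.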

🐞 Bug fixes

  • fix incorrect ternary agg states (#12538)
  • fix and improve ternary evaluation on groups (#12529)
  • saturating sub in debug msg (#12525)
  • fix panic when writing Decimal type to parquet (#12532)
  • prefetch struct columns in async projection pd (#12514)
  • rechunk cross join output in streaming (#12511)
  • fix as_list logical types (#12507)
  • fix streaming cross join on empty df (#12491)
  • don't overflow when calculating date range over very long periods (#12479)
  • Allow append/zip_with/extend on local categoricals (#12369)
  • Do not panic if time is invalid (#12466)
  • empty csv no-raise (#12434)
  • Fix scan_csv error type (#12355)
  • binary operations in aggregation context on literals (#12430)
  • update groups state after binary aggregation (#12415)
  • Remove extra \n when reading file-like object wi… (#12333)
  • revert ternary special broadcast, ensure broadcast is always to max height (#12395)
  • ensure first/last return null if empty (#12401)
  • Do not cast lit if has same dtype (#12342)
  • Fix index column name of rolling/dynamic group by (#12365)
  • ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
  • uint64 should be correctly extracted from python object (#12338)
  • expr_output_name include literal (#12335)
  • Fix Decimal dtype table repr (#12318)
  • Fix behavior of month intervals in date_range (#12317)
  • scan empty csv miss row_count (#12316)
  • zip_with also broadcast mask (#12309)
  • respect hive_partitioning flag when dealing with multiple files (#12315)
  • parquet, add row_count to empty file materialization (#12310)
  • fix download ranges in parquet (#12313)
  • object store path derivation for local URL (#12308)
  • don't move right endpoint of windows in rolling in default offset==-period case (#12267)
  • Raise more informative error on invalid reshape input (#12288)
  • incorrect super type for literals in nested binary exprs (#12238)
  • Update null_count after arithmetic (#12280)
  • fix ambiguous aggregation type (#12269)
  • Consistently propagate nulls for numpy ufuncs (#12212)
  • respect return_scalar of list scalars (#12251)
  • potential overflow (#12206)
  • always start a new thread if the thread is already blocking (#12202)
  • with_row_count should block predicate push down for lazy csv (#12187)
  • rechunk failed-list series before iterate (#12189)
  • Raise if *_horizontal without inputs (#12106)
  • fix incorrect desc sort behavior (#12141)
  • take should block predicate pushdown (#12130)
  • use null type when read from unknown row (#12128)
  • boundary predicate to block all accumulated predicates in push down (#12105)
  • make python schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
  • fix panic when initializing Series with array of list dtype (#12148)
  • Fix schema of arr.min/max (#12127)
  • ensure filter predicate inputs exist in schema (#12089)
  • str.concat on empty list (#12066)
  • binary agg should be group aware if literal not a scalar (#12043)
  • Use Arrow schema for file readers (#12048)
  • Error on duplicates in hive partitioning (#12040)
  • display fmt for str split (#12039)
  • sum_horizontal should not always cast to int (#12031)
  • fix apply_to_inner's dtype (#12010)
  • Fix padding for non-ASCII strings (#12008)
  • inline parts of unstable unicode module for stable (#12003)
  • fix dot visualization of anonymous scans (#12002)
  • SQL table aliases (#11988)

🛠️ Other improvements

  • Rename cumulative functions cumsum -> cum_sum and similar (#12513)
  • fix and improve ternary evaluation on groups (#12529)
  • Rename take to gather (#12528)
  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • Rename take_every to gather_every (#12531)
  • Add polars-ds to list of community plugins (#12527)
  • add schema test (#12523)
  • remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
  • add test for previous commit (#12510)
  • Support Python 3.12 (#12094)
  • Fix some typos (#12485)
  • Deprecate parse_int in favor of to_integer (#12464)
  • update rustc (#12468)
  • rename the DataType in the polars-arrow crate to ArrowDataType for clarity, preventing conflation with our own/native DataType (#12459)
  • Replace outdated dev dependency tempdir (#12462)
  • move cov/corr to polars-ops (#12411)
  • use unwrap_or_else and get_unchecked_release in rolling kernels (#12405)
  • dprint/markdown link checker minor updates (#12409)
  • replace as_u64 with dirty_hash (#12327)
  • Fix ruff linting invocation (#12350)
  • Rename write_csv parameter has_header to include_header (#12351)
  • Build and verify Rust examples in docs (#12334)
  • Fix some feature flags (#12325)
  • Organize Cargo.toml (#12323)
  • remove fxhash (#12322)
  • Run rustfmt on doc examples (#12319)
  • Consolidate "getting started" and "user guide" sections (#12246)
  • deprecate _saturating in duration string language, make it the default (#12301)
  • simplify expr checking in predicate push down (#12287)
  • Replace dev dependency avro-rs with apache-avro (#12295)
  • Run clippy on all targets (#12293)
  • Add top-level make clippy, simplify Rust linting workflows (#12290)
  • ensure we git-ignore ALL .venv dirs (#12289)
  • incorrect super type for literals in nested binary exprs (#12238)
  • remove unwrap from group_by (#12263)
  • update object_store (#12006) (#12273)
  • Remove recommended setting from IDE docs (#12275)
  • Add feature flag for list.eval (#12254)
  • factor out some shared code in truncate_impl (#12229)
  • update Cargo.lock (#12226)
  • Make all functions in string namespace non-anonymous (#12215)
  • Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
  • use enum for Ambiguous (#12193)
  • Standardize project name formatting across docs (#12185)
  • Update sqlparser to 0.39 (#12173)
  • pin ring (#12176)
  • Refactor FunctionExpr module (#12162)
  • Fix tests for pyarrow 14 (#12170)
  • Fix triggers for docs deployment (#12159)
  • Make all functions in binary namespace non-anonymous (#12126)
  • Consolidate contributing info (#12109)
  • Fix typo in user-guide/expressions/plugins.md (#12115)
  • Update CODEOWNERS (#12107)
  • visualize plugin directory layout in user guide (#12092)
  • Minor improvements to the docs website (#12084)
  • reshape and repeat_by non-anonymous (#12064)
  • upgrade zstd to 0.13 in polars-parquet (#12062)
  • Direct CONTRIBUTING to the docs website (#12042)
  • inline parquet2 (#12026)
  • remove parquet logic from polars-arrow and consolidate logic in polars-parquet crate. (#12022)
  • move abs to ops (#12005)
  • Rename ljust/rjust to pad_end/pad_start (#11975)
  • Disable type checking for dataframe_api_compat dependency (#11997)

Thank you to all our contributors for making this release possible! @JulianCologne, @MarcoGorelli, @Priyansh121096, @abstractqqq, @alexander-beedie, @braaannigan, @brayanjuls, @c-peters, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jerome3o, @jrycw, @mcrumiller, @messense, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego, @uchiiii, @universalmind303 and @wsyxbcl