Polars: rs-0.37.0 Release

Release date: January 26, 2024
Previous version: rs-0.36.2 (released January 2, 2024)
Magnitude: 15,303 Diff Delta
Contributors: 45 total committers

277 Commits in this Release

Top Contributors in rs-0.37.0

stinodego
alexander-beedie
ritchie46
reswqa
r-brink
c-peters
Wainberg
mcrumiller
petrosbar
flisky

Release Notes Published

🏆 Highlights

  • new implementation for String/Binary type. (#13748)

💥 Breaking changes

  • Remove DatetimeChunked::convert_time_zone (#14046)
  • Rename LiteralValue::to_anyvalue to LiteralValue::to_any_value (#14033)
  • Rename drop_columns to drop (#13754)
  • Rename pl.count() to pl.len() (#13719)
  • Rename row_count_name/row_count_offset parameters in IO functions to row_index_* (#13563)
  • Rename with_row_count to with_row_index (#13494)

🚀 Performance improvements

  • prune parquet row groups when is_not_null is used (#14260)
  • use is_between to skip parquet row groups (#14244)
  • Use a compression API that is designed for this use case (#11699) (#14194)
  • Use UnitVec in polars-plan traversal (#14199)
  • use UnitVec in streaming joins (#14197)
  • improve ChunkId (#14175)
  • improve iteration performance (#14126)
  • elide unneeded work in window? (#14108)
  • run window functions more in parallel (#14095)
  • improve skip row group using statistics condition (#14056)
  • improve string/binary reverse performance (#14016)
  • optimize DataFrame.describe by presorting columns (#13822)
  • elide redundant bound checks. (#13909)
  • speedup boolean filter (#13905)
  • speedup binview filter (#13902)
  • improve binview filter (#13878)
  • apply string view GC more conservatively (#13850)
  • add optimized BinaryViewArray comparison kernels (#13839)
  • lazy cache binview bytes len (#13830)
  • fast-path for eager int_range (#13811)
  • Optimize arr.sum for inner non-null bool (#13800)
  • directly embed data ptr in Buffer (#13744)
  • elide parallelism restriction on generic rolling expressions (#13662)
  • ensure time groups are parallelized (#13660)
  • do not eagerly compute bitcount (#13562)
  • optimise SQL engine string concat (#13499)
  • remove lifetime requirement from CategoricalChunkedBuilder (#13319)

✨ Enhancements

  • add u8/i8/u16/i16 parsers to CSV reader (#14241)
  • Implements list.gather_every (#14253)
  • Implements prefix/suffix_fields (#14251)
  • Polish decimal arithmetic (#14172)
  • Introduce arr.to_struct (#14202)
  • Supports map fields name of struct (#14203)
  • make IdxVec generic as UnitVec (#14196)
  • add new arithmetic kernels (#14026)
  • Supports unique and hash_rows for null column (#14111)
  • Implement arithmetic operations for Null columns (#14107)
  • Add strict/non-strict construction of Boolean/Binary series (#14073)
  • Improve Series::from_any_values logic (#14052)
  • Adapt extend_constant to function expr architecture and expressify it (#14058)
  • add integer negation (#14049)
  • list & array measures of dispersion (#13245)
  • gc binview when writing ipc (#14035)
  • When calling convert_time_zone on time-zone-naive datetime, convert as if converting from UTC (#13960)
  • DataFrame supports explode by array column (#13958)
  • improve binary formatting (#13981)
  • preserve Enum information when going to IPC (#13943)
  • support kwargs in plugin 'field' functions and raise error on unsupported binview layout (#13944)
  • support cast decimal to utf8 (#13829)
  • add SQL support for timestamp precision modifier (#13936)
  • support negative indexing and expressions for LEFT, RIGHT and SUBSTR SQL string funcs (#13888)
  • Introduce explode for ArrayNameSpace (#13923)
  • raise better error message for .dt.time on Date column (#13932)
  • List set_operations supports float (#13920)
  • Add ignore_nulls for arr.join (#13919)
  • register 'set_sorted' as batch/elementwise (#13896)
  • move Enum/Categorical categories to binview (#13882)
  • Add ignore_nulls for list.join (#13701)
  • Add ignore_nulls for pl.concat_str (#13877)
  • fix parquet for binview (#13873)
  • support mmap for binview in OOC (#13872)
  • implement ffi for binview (#13871)
  • Support zero fill null strategy for binary and string columns (#13869)
  • Implement/fix unary minus operator -pl.col(...) (#13776)
  • extend SQL EXTRACT with "century", "millennium", and "timezone" parts (#13634)
  • fix binview ipc format (#13842)
  • add SQL support for numeric and/or decimal types (#13739)
  • improve panic message (#13836)
  • Expressify str.zfill (#13790)
  • new implementation for String/Binary type. (#13748)
  • Add nulls_last for Series.sort (#13794)
  • Impl count_matches for array namespace (#13675)
  • Add nulls_last for list/array.sort (#13795)
  • Rename drop_columns to drop (#13754)
  • convert fixed-offset timezones to respective Etc timezone from time zone database (#13738)
  • Expressify str.slice (#13747)
  • implement binview for polars-row (#13736)
  • implement binview for polars-json (#13737)
  • add architecture for polars-flavored IPC (#13734)
  • implement binview comparison kernels (#13715)
  • raise default frame/series repr height from 8 to 10 (#13699)
  • write parquet ColumnOrder (#13672)
  • Impl contains for ArrayNameSpace (#13638)
  • improve rolling() expression formatting (#13657)
  • Implement is_between in Rust (#11945)
  • Expressify pattern of str.extract (#13607)
  • Impl join for ArrayNameSpace (#13586)
  • add SQL engine support for string cast to json (#13624)
  • add SQL engine support for EXTRACT and DATE_PART (#13603)
  • add BinaryView to parquet writer/reader. (#13489)
  • add SQL engine support for POSITION and STRPOS (#13585)
  • is_in support for array dtype (#13559)
  • add new str.find expression, returning the index of a regex pattern or literal substring (#13561)
  • add SQL engine support for LIKE and ILIKE pattern matching (#13522)
  • improve hive partition pruning (#13358) (#13426)
  • don't rechunk by default in lazy scans (#13518)
  • Add cum_count expression function (#13478)
  • add SQL engine support for IF control flow function (#13491)
  • add SQL engine support for MOD function (#13502)
  • return datetime for datetime mean & median (#13417)
  • add SQL engine support for CONCAT_WS string function (#13483)
  • BinaryView/Utf8View IPC support (#13464)
  • Implement wasm Pool::scope (#13476)
  • add SQL engine support for RIGHT and REVERSE string functions (#13461)
  • implement BinaryView and Utf8View in polars-arrow (#13243)
  • add SQL engine support for variadic string CONCAT function (#13428)
  • add support for AND in SQL join-clause context (#13242)
  • Impl ordering ops for array namespace (#13414)
  • add SQL engine support for REPLACE string function (#13431)
  • add SQL engine support for SIGN function (#13429)
  • add SQL engine support for IFNULL function (#13432)
  • additional SQL support for bytes, bit, and hex literals (#13389)

🐞 Bug fixes

  • deduplicate recursive growables (#14264)
  • Fix glimpse overload signature (#14258)
  • allow set operations on list of categoricals (#14110)
  • any/all_horizontal with single input has incorrect type (#14256)
  • load numpy array with np array values #14237 (#14238)
  • Fix join validation for String types (#14229)
  • make csv parser more robust to edge cases (#14210)
  • Fix for set_operations of binary dtype (#14152)
  • fix read_csv date/datetime inference and parsing (#14113)
  • don't see files as hive partitions (#14128)
  • allow eval on list of categoricals (#14132)
  • add missing conditional compile flag for StringFunction::Find (#14129)
  • Forbid casting from Date to Time and vice versa (#14127)
  • preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (#14120)
  • Implements gt/lt cmp for null dtype (#14119)
  • ignore comments at beginning of csv if schema provided (#14115)
  • fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot_table would do (#14048)
  • some temporal conversion errors for datetimes earlier than 1970-01-01 (#14050)
  • Preserve name when casting from categorical (#14085)
  • fix cse bug when window function is nested (#14070)
  • Fix melt panic when there are no value vars (#14057)
  • json_encode should respect the logical type (#14063)
  • improve skip row group using statistics condition (#14056)
  • Raise for .dt.epoch and .dt.timestamp for Duration dtype (#13962)
  • handle SliceSink with empty data (#14025)
  • correct field type schema inference (using read_csv) (#14042)
  • Map AnyValue::Null to datatype Null (#14045)
  • Use int formatter for unsigned ints (#14043)
  • quick fix for multiple chunks binary reverse (#14024)
  • count matches on list categorical (#14021)
  • list.min/max with empty and/or None elements (#14018)
  • allow get access to list of categoricals (#14015)
  • Fix casting from categorical to numeric (#13957)
  • read_csv preserve whitespace and newlines (#13934)
  • append decimal with different scale (#13977)
  • Allow casting integer types to Enum (#13955)
  • arg_min/max on categoricals should respect ordering (#13998)
  • serialize decimal type (#13997)
  • check input type for arr/list.contains (#13959)
  • Allow dtype merge when inner dtype is enum (#13938)
  • recurse less in streaming shared sinks (#13930)
  • ensure order is preserved if streaming from different sources (#13922)
  • Fix is_not_null for Struct columns (#13921)
  • make 100 * pl.col(pl.Boolean).mean() work (#13725)
  • allow extract of numeric from str AnyValue (#13865)
  • single-element .dt.time() and .dt.date() should always preserve sortedness (#13808)
  • prune empty chunks before set operations (#13898)
  • treat null columns as zero in sum_horizontal (#13880)
  • include null count in rolling window validity with min_periods (#13863)
  • don't return NaN as free memory fraction (#13860)
  • parquet hybrid RLE encoding did not always align to bit width (#13883)
  • Add ignore_nulls for list.join (#13701)
  • .dt.time() was panicking for datetimes prior to unix epoch (#13812)
  • Correct err message of check_map_output_len (#13854)
  • allow list creation of decimals (#13851)
  • Implement abs for Decimal, error on Date/Time/Datetime (#13821)
  • decompress the right number of rows when reading compressed CSVs (#13721)
  • rolling nested groups deadlock (#13835)
  • gather_every should work on agg context (#13810)
  • When reading Parquet or Arrow, convert +00:00 timezone to UTC (#13816)
  • Fix segfault of is_in (#13814)
  • don't panic on full null qcut (#13815)
  • do not read data for zero-length compressed buffer (#13791)
  • Fix the non-null test of transpose (#13783)
  • Raise error instead of panic when joining on wildcard/nth (#13742)
  • str.concat correctly ignores single null value (#13751)
  • Selectors by_name and by_dtype should allow empty list as input (#11024)
  • Use NonZeroUsize for batch_size parameter in write_csv/sink_csv/scan_ndjson (#13726)
  • error instead of panicking in sql if empty function (#13691)
  • gather.get schema (#13679)
  • ensure we hit proper cache in nested rolling expressions (#13666)
  • Allow av_buffer cast numeric record to temporal type (#13661)
  • streaming cross join if swapped is hit (#13656)
  • Make sure rolling key is projected when process projection (#13622)
  • fix schema inference for json (#13637)
  • Empty series of AggregatedList should also have list dtype (#13620)
  • fallback to cast kernel if inline_cast AnyValue raise (#13595)
  • LazyFrame::join() no longer ignores 3 JoinArgs parameters (#13570)
  • fix reverse variable row decoding (#13587)
  • Fix scatter for null values (#13578)
  • Fix cum_count with regards to start value / null values (#13535)
  • Fix precision/scale handling and invalid numbers in string-to-decimal conversions. (#13548)
  • Treat Python None as null value for Object dtype (#13564)
  • Expr.replace to single value did not replace NULLs (#13551)
  • AnyValue::StructOwned panic when hashing (#13553)
  • improve hive partition pruning (#13358) (#13426)
  • fix projection pushdown for new outer join schema (#13527)
  • ensure size-hint of TrueIdxIter is correct (#13508)
  • correct 'outer_coalesce' logic in case of duplicate names (#13501)
  • raise for out-of-range datetimes in to_datetime/strptime (#13403)
  • Keep logical type when getting values from list (#13456)
  • Handle duplicate/ambiguous inputs for replace (#13217)
  • skip null/empty values if replace_lit_n_char (#13400)
  • fix is_in operator when comparing string with global categoricals (#13412)
  • use different generics for shift_and_fill parameters (#13379)

📖 Documentation

  • fix code block in user-guide/lazy/schemas (#14228)
  • Fix typo in contributing guide (#14181)
  • Small improvements Ecosystem page (#14176)
  • fix code blocks in user-guide/concepts/data-structures (#14146)
  • Fix bullet point formatting in CI contributing guide (#14117)
  • Remove outdated reference to horizontal concat feature (#14105)
  • Replace alternatives page with more objective comparison (#13784)
  • Improve structure of user guide (#13951)
  • Improve structure of user guide (#13639)
  • Introduce ecosystem page in user guide (#13903)
  • Mention deltalake write support in README (#13890)
  • Fix typo in deprecation message of with_row_count (#13793)
  • Fix incorrect "coming from pandas" syntax (#13767)
  • Improve streaming section of the user guide (#13750)
  • fix linking to feature flags in user guide (#13644)
  • Improve documentation on broadcasting (#13394)
  • Add note about toolchain issue under native Windows (#13590)
  • update SQL section of the README (#13529)
  • update polars-business > polars-xdt link (#13509)

📦 Build system

  • Enable feature nightly with optional sql feature (#14222)
  • remove horizontal_concat feature (#13390)

🛠️ Other improvements

  • make gather_chunked completely generic (#14195)
  • Add .cargo directory to .gitignore (#14191)
  • take_chunked to polars-ops (#14185)
  • Enable clippy lint to warn on debug macros (#14178)
  • Run cargo update (#14160)
  • merge take kernels (#14137)
  • improve From<Ca> -> Vec (#14123)
  • hoist boolean -> string cast (#14122)
  • Remove DatetimeChunked::convert_time_zone (#14046)
  • More generic way to present an expression tree diagram (#14020)
  • Rename LiteralValue::to_anyvalue to LiteralValue::to_any_value (#14033)
  • make Enums an actual datatype (#14011)
  • update rustc (#13947)
  • move filter to polars-compute (#13897)
  • bump object_store to 0.9 (#13857)
  • Make functions in expr/general non-anonymous (#13832)
  • Fix doctests (#13831)
  • Refactor Python release workflow (#13807)
  • Make pl.duration non-anonymous (#13762)
  • Rename pl.count() to pl.len() (#13719)
  • Deprecate dt.with_time_unit in favor of cast(pl.Int64).cast(pl.Datetime(time_unit, time_zone)) (#13667)
  • Auto-add 'needs triage' label to bugs (#13671)
  • make rolling index column visible to optimizer (#13658)
  • Rename lazy-regex feature to regex to align polars with polars-lazy crate (#13647)
  • Add Documentation / Build system sections to the changelog (#13594)
  • Filter unhelpful messages in make build (#13579)
  • Remove extra line break between checkboxes in GitHub bug report issues (#13576)
  • Rename row_count_name/row_count_offset parameters in IO functions to row_index_* (#13563)
  • Rename with_row_count to with_row_index (#13494)
  • simplify parquet binary ordering function (#13488)
  • don't panic if ambiguous is of wrong type (#13388)

Thank you to all our contributors for making this release possible! @29antonioac, @Bromeon, @ByteNybbler, @JulianCologne, @MarcNuebel, @MarcoGorelli, @NedJWestern, @ShivMunagala, @Vincenthays, @Wainberg, @aaarrti, @alexander-beedie, @apcamargo, @bchalk101, @braaannigan, @c-peters, @cgevans, @cmdlineluser, @collinprince, @deanm0000, @dependabot, @dependabot[bot], @dpinol, @edavisau, @eitsupi, @flisky, @grinya007, @hamishs, @henryharbeck, @ion-elgreco, @itamarst, @jacksonthall22, @jcrozum, @kstoneriv3, @langestefan, @lukemanley, @mcrumiller, @mkucijan, @nameexhaustion, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @s-banach, @shritesh, @stinodego, @taki-mekhalfa, @thomasaarholt, @tim-stephenson, @universalmind303, @valorien and @wjandrea