Polars: rs-0.33.0 Release

Release date: September 17, 2023
Previous version: rs-0.32.0 (released August 14, 2023)
Magnitude: 47,752 Diff Delta
Contributors: 47 total committers

275 Commits in this Release

Top Contributors in rs-0.33.0

stinodego
ritchie46
orlp
alexander-beedie
reswqa
MarcoGorelli
aminalaee
Object905
svaningelgem
henrikig

Release Notes Published

🏆 Highlights

  • implementing sink_csv for LazyFrame (#10682)

💥 Breaking changes

  • empty product returns identity (#10842)
  • return f64 for rank when method="average" (#10734)
  • Rename groupby to group_by (#10654)
  • Read/write support for IPC streams in DataFrames (#10606)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)
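
The all/any entry above refers to Kleene (three-valued) logic, in which a null stands for "unknown". As a rough illustration only, the pure-Python sketch below shows the usual Kleene rules for aggregating booleans. The helper names kleene_all and kleene_any are hypothetical and are not part of Polars; the exact Polars behavior is defined by #10564, not by this sketch.

```python
from typing import Iterable, Optional


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND: any False wins, otherwise any None makes the result unknown."""
    saw_null = False
    for value in values:
        if value is False:
            return False  # a single False decides the result
        if value is None:
            saw_null = True  # remember that something was unknown
    return None if saw_null else True


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued OR: any True wins, otherwise any None makes the result unknown."""
    saw_null = False
    for value in values:
        if value is True:
            return True  # a single True decides the result
        if value is None:
            saw_null = True
    return None if saw_null else False
```

Under these rules, a null only changes the answer when the known values do not already decide it: all over [False, None] is still False, while all over [True, None] is unknown.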

⚠️ Deprecations

  • Rename is_first/last to is_first/last_distinct (#11130)
  • Rename count_match to count_matches (#11028)
  • Rename strip to strip_chars (#10813)
  • Add datetime_range expression function (#10213)
  • Rename Series/Expr.rolling_apply to rolling_map (#10750)
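
The renames listed above (together with the groupby rename under breaking changes) can be hunted down mechanically. The sketch below is a hypothetical helper, not a Polars tool: it scans source text for method calls that use the old names and reports the suggested replacement. The name find_deprecated_calls and the regex approach are assumptions made for illustration.

```python
import re

# Old -> new method names, taken from the breaking-change and deprecation lists.
RENAMES = {
    "groupby": "group_by",
    "count_match": "count_matches",
    "strip": "strip_chars",
    "rolling_apply": "rolling_map",
    "is_first": "is_first_distinct",
    "is_last": "is_last_distinct",
}

# Match `.old_name(` only; `.old_name_suffix(` and `.rstrip(` do not match.
PATTERN = re.compile(r"\.(" + "|".join(map(re.escape, RENAMES)) + r")\s*\(")


def find_deprecated_calls(source: str):
    """Return (line number, old name, new name) for each old-style call found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, RENAMES[old]))
    return hits
```

This is a text-level heuristic, so hits need manual review: for example, .strip( is also a built-in Python string method and will be reported even when no Polars expression is involved.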

🚀 Performance improvements

  • improve performance of fast projection (#10945)
  • parse time zones outside of downcast_iter() in replace_time_zone (#10713)
  • use binary abstraction for atan2 (#10588)
  • use binary abstraction in pow (#10562)

✨ Enhancements

  • Expressify str.split argument. (#11117)
  • Expressify argument of binary contains (#11091)
  • dt.offset_by supports broadcasting lhs (#11095)
  • Expressify argument of binary starts_with and ends_with (#11076)
  • json_extract supports extracting static and string values to list dtype (#11057)
  • add quote_style="never" option for write_csv (#11015)
  • add support for nextest (#11048)
  • Add literal for str count_match (#10996)
  • More dtypes support cast to list (#11025)
  • ParquetCloudSink to allow streaming pipelines into remote ObjectStores (#10060)
  • Add strip_prefix and strip_suffix to the string namespace (#10958)
  • Add datetime_range expression function (#10213)
  • add proper cache for Regex compilation (#10934)
  • implementation of array_to_string (#10839)
  • apply left side predicate pushdown also to right side if all predicate columns are also join columns (#10841)
  • accept expr in str.count_match (#10900)
  • accept expressions in .offset_by (#9967)
  • implement drop as special case of select (#10885)
  • Supports is_last operation (#10760)
  • activate cse for group_by (again) (#10749)
  • add pairwise float sum implementation (#10756)
  • implementing sink_csv for LazyFrame (#10682)
  • Supports series unique & arg_unique & n_unique for list (#10743)
  • repeat_by should also support broadcasting of LHS (#10735)
  • deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
  • is_first also supports numeric list type. (#10727)
  • improve slice pushdown in unions (#10723)
  • Support min and max strategy for binary & str columns fill null (#10673)
  • support broadcasting in list set operations (#10668)
  • add truncate_ragged_lines (#10660)
  • supports cast to list (#10623)
  • Rename groupby to group_by (#10654)
  • preserve whitespace in notebook output (#10644)
  • Read/write support for IPC streams in DataFrames (#10606)
  • improve binary (arity) generics (#10622)
  • propagate null in is_in and more generic array construction (#10614)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • frame-level cast support (#10504)
  • Add failed column to cast exception (#10507)
  • Make arange an alias for int_range (#9983)
  • date_range/time_range no longer return a List type (#10526)
  • Remove various functionalities deprecated before 0.18 (#10527)
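
The "pairwise float sum" entry (#10756) names a general summation technique: instead of accumulating left to right, the input is split in half recursively and the partial sums are added, so rounding error grows roughly with log n rather than n. The sketch below shows the technique in plain Python. The function name pairwise_sum and the block_size parameter are hypothetical, and the actual Polars implementation may differ substantially.

```python
from typing import Sequence


def pairwise_sum(values: Sequence[float], block_size: int = 8) -> float:
    """Sum floats by recursive halving; small blocks are summed naively."""
    n = len(values)
    if n <= block_size:
        # Base case: a short run is summed left to right.
        total = 0.0
        for value in values:
            total += value
        return total
    mid = n // 2
    # Recursive case: sum each half independently, then combine.
    left = pairwise_sum(values[:mid], block_size)
    right = pairwise_sum(values[mid:], block_size)
    return left + right
```

The base-case block keeps recursion overhead low; a larger block_size trades a little accuracy for speed.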

🐞 Bug fixes

  • Correct hash and fmt for struct expr (#11119)
  • enforce sortedness of by argument in rolling_* functions (#11002)
  • Filter on empty objectChunked should not throw error (#11073)
  • ensure null_count statistics accounts for null array (#11070)
  • toggle off cse if ext_context is used (#11051)
  • Correct field dtype of string concat (#11055)
  • pushed-down expr should be considered when evaluating ExternalContext (#11023)
  • fix rolling_* functions when "by" has nanosecond resolution (#11005)
  • Don't reuse member for Selector::Add (#11026)
  • fix the construction of List<Null> (#10969)
  • allow singular null in regex pattern (#10948)
  • compute length of null array in explode (#10946)
  • Allow exactly one value in start/end for int_range (#10914)
  • count was falsely tagged as cse in group by (#10917)
  • Retain original dtype when deserializing an empty list (#10893)
  • CSE don't accept opaque functions (#10905)
  • Make int_range(s) exclusive on the upper bound when step is negative (#10898)
  • fix conversion from decimal to float (#10776)
  • Add broadcasting for list comparisons (#10857)
  • don't overflow length before checking limit (#10883)
  • fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
  • tag amortized iter unsafe and add safe alternatives (#10881)
  • use pool in dataframe arithmetic (#10864)
  • remove debug println! from datetime fn (#10862)
  • repair polars_err string interpolation (#10863)
  • make count_match docs and extract_all docs/impl consistent around zero matches (#10854)
  • empty product returns identity (#10842)
  • never panic if hash/equality doesn't hold in cse (#10836)
  • Improve bound checks on temporal ranges (#10837)
  • var/std behavior around few elements (#10828)
  • Fix divide-by-zero error when reading an empty csv in streaming mode (#10819)
  • fix equality of quantile aggregation node (#10816)
  • Reading an only-header csv file in streaming mode should not panic (#10810)
  • get_single_leaf can't handle Expr::Count (#10790)
  • string to decimal parsing (#10712)
  • support groupby literal in streaming (#10771)
  • ORDER BY on unselected columns (#10752)
  • Fix is_in cannot cast list type for float (#10769)
  • fix unicode truncation in json parsing (#10761)
  • Error message of list unique should not display inner type (#10748)
  • create chunks_mut entry in vtable (#10745)
  • Prevent panic on sample_n with replacement from empty df (#10731)
  • only preserve sortedness flag in replace_time_zone when safe (#10738)
  • Error on value_counts on column named "counts" (#10737)
  • Build Series from empty Series vector (#10558)
  • return f64 for rank when method="average" (#10734)
  • Keep min/max and arg_min/arg_max consistent. (#10716)
  • Fix bug when providing custom labels and opting for duplicates in qcut (#10686)
  • Cast small int type when scan csv in streaming mode. (#10679)
  • Reused input series in rolling_apply should not be orderly (#10694)
  • re-sort buffer when update window swap the whole buffer (#10696)
  • Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
  • Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
  • AllHorizontal format string (#10658)
  • List<null> chunked builder should take care of series name (#10642)
  • respect 'ignore_errors=False' in csv parser (#10641)
  • fix rename + projection pushdown (#10624)
  • fix int/float downcast in is_in (#10620)
  • Change behavior of all - fix Kleene logic implementation for all/any (#10564)
  • Fix serialization for categorical chunked. (#10609)
  • join_asof missing tolerance implementation, address edge-cases (#10482)
  • Take input_schema to create physical expr for Selection (#10571)
  • fix serialization of empty lists (#10563)
  • Clear window cache after evaluate predication expr (#10505)
  • Parsing regex col in Expr::Columns (#10551)
  • sanitize column naming in boolean ops (#10531)
  • fix build for wasm (#10536)
  • remove fixed_seed and add pl.set_random_seed (#10388)
  • fix build for wasm (#9502)
  • rollback cse in groupby: python 0.18.15 (#10491)
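
One entry above makes rank return f64 when method="average". A plausible reason is that tied values share the mean of the positions they occupy, and that mean need not be an integer. The pure-Python sketch below illustrates average ranking in general; average_rank is a hypothetical helper and is not Polars code.

```python
from typing import List, Sequence


def average_rank(values: Sequence[float]) -> List[float]:
    """Rank values from 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold 1-based ranks i+1..j+1; take their mean.
        mean_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks
```

For the input [10, 20, 20, 30], the two tied values occupy ranks 2 and 3 and each receives 2.5, which an integer dtype cannot represent.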

🛠️ Other improvements

  • Removed duplicated example (#11109)
  • Add CODEOWNERS for docs folder (#11107)
  • Refactor starts_with and ends_with for string (#11085)
  • Integrate user guide (#11089)
  • remove feature gate join/groupby in polars-core (#10965)
  • Add Documentation issue type (#11042)
  • complete intra-docs in api documentation (#11007)
  • genericize take implementation (#10976)
  • genericize PolarsDataType (#10952)
  • enhance internal crates readme with reference to main crate (#10928)
  • Add Duration method for checking full days (#10850)
  • apply with_name in more places (#10899)
  • never compare opaque functions (#10906)
  • eliminate repetition in utf8 datetime functions (#10860)
  • Fix issue templates for bug reports (#10896)
  • remove LocalProjection (#10886)
  • request verbose logging output of minimal reproducible examples (#10882)
  • Reorganize range expression module (#10871)
  • introduce with_name for Series/ChunkedArray (#10859)
  • Further refactor temporal range functions (#10844)
  • Refactor range related functions (#10830)
  • Fix the non-compiling black-box function parts in the polars lazy cookbook (#10809)
  • Fix some broken links / formatting (#10772)
  • Improve docs for polars-lazy (#10729)
  • update rustc nightly_2023-08-26 (#10467)
  • default to rust native flate2 lib (#10733)
  • Clear GitHub Actions caches weekly (#10715)
  • move 'is_in' to polars-ops (#10645)
  • Clean up schema calculation for date_range (#10653)
  • remove unused apply functions and add fallible generic apply functions (#10621)
  • Enforce up-to-date Cargo.lock (#10555)
  • make binary chunkedarray functions DRY (#10607)
  • bump MSRV to 1.65 (#10568)
  • genericize chunk implementation (#10506)
  • use ChunkArray::(try_)from_chunk_iter (#10497)
  • add VSCode rust-analyzer settings (#10498)
  • Update URLs for dev documentation (#10495)
  • Update features for latest flate2 release (#10492)

Thank you to all our contributors for making this release possible! @Barsik-sus, @I8dNLo, @JulianCologne, @KacpiW, @MarcoGorelli, @Object905, @OndrejSlamecka, @Qqwy, @SeanTroyUWO, @TNieuwdorp, @VasanthakumarV, @alexander-beedie, @aminalaee, @antoniocali, @braaannigan, @bvanelli, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @henrikig, @ion-elgreco, @jakob-keller, @jeroenjanssens, @jonashaag, @lorepozo, @marki259, @mcrumiller, @messense, @mrogowski11, @nameexhaustion, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @s-banach, @sdamashek, @stinodego, @svaningelgem, @thomasjpfan, @titoeb, @trueb2, @washcycle, @wdoppenberg and @zundertj