Polars: rs-0.27.0 Release

Release date: February 10, 2023
Previous version: rs-0.26.0 (released December 22, 2022)
Magnitude: 29,959 Diff Delta
Contributors: 27 total committers

241 Commits in this Release

Top Contributors in rs-0.27.0

stinodego
ritchie46
alexander-beedie
MarcoGorelli
zundertj
ghuls
plaflamme
cojmeister
gab23r
universalmind303

Release Notes Published

🏆 Highlights

  • Formalize list aggregation difference between groupbys, selection and window functions (#6487)

⚠️ Breaking changes

  • error on string <-> date cmp (#6498)
  • Formalize list aggregation difference between groupbys, selection and window functions (#6487)
  • show where error messages originated (#6482)
  • str.strip with multiple chars (#5929)
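
For the str.strip change (#5929), the sketch below illustrates what stripping with several characters could look like. It is plain Python rather than Polars code, and it assumes the argument is treated as a set of characters trimmed from both ends, as Python's built-in str.strip does. The sample values are made up for illustration.

```python
# Plain-Python illustration (not the Polars API). Assumption: the argument
# is a set of characters, each of which is trimmed from both ends.
values = ["--hello__", "_-world-_", "plain"]

# Every leading/trailing "-" or "_" is removed; inner characters are kept.
stripped = [v.strip("-_") for v in values]

print(stripped)
```

Code that previously relied on a single-character argument is worth re-checking against the behaviour described in #5929.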

🚀 Performance improvements

  • update string replacement codepaths following new benchmarking (#6777)
  • improve dynamic groupby performance on sorted keys (#6599)
  • faster frame-init from list of dicts (when omitting fields), and ensure fields are read according to the declared schema (#6472)
  • Improve rechunk check (#6268)
  • reuse allocated scratches in ipc writer (#6287)
  • use dedicated writer thread for sink_parquet (#6285)
  • first check rev-map on categorical equality check (#6085)
  • ensure set_at_idx is O(1) (#5977)
  • use iterator instead of loop polars_io::csv::parser::skip_condition (#5157)

✨ Enhancements

  • accept separator for pivot and to_dummies (#6780)
  • rename 'tz' to 'time_zone' in convert_time_zone and replace_time_zone (#6784)
  • rename with_time_zone to convert_time_zone and cast_time_zone to replace_time_zone (#6768)
  • support timezone in csv writer (#6722)
  • implement series abstractions for Int128Type (#6679)
  • parse timezone from Datetime (#6766)
  • formally support duration division (#6758)
  • add argmin/max for utf8 data (#6746)
  • Support an ignore_nulls param for EWM calculations. (#5749) (#6742)
  • deprecate tz_localize (#6693)
  • guarantee schema-stable col(dtype) selection (#6674)
  • better-characterise NotFound exceptions (#6670)
  • disallow with_time_zone from/to tz-naive (#6659)
  • let cast_time_zone work on tz-naive and deprecate tz-localize (#6649)
  • implement fill_null for list data (#6635)
  • expression functions should be nullable (#6629)
  • add streamable udfs (#6614)
  • is_first for struct dtype (#6595)
  • add from_str_radix method to StringNameSpace for parsing strings in any base to i32 (#6570)
  • improve predicate pushdown (#6579)
  • raise error on invalid binary cmp (#6564)
  • let cast_time_zone accept None (#6539)
  • add utc parameter to strptime (#6496)
  • add meta 'has_multiple_outputs', 'is_regex_projec… (#6500)
  • error on string <-> date cmp (#6498)
  • show where error messages originated (#6482)
  • faster frame-init from list of dicts (when omitting fields), and ensure fields are read according to the declared schema (#6472)
  • allow expr in str.contains (#6443)
  • add float formatting option (#6432)
  • allow expressions as arguments in str.ends_with (#6361)
  • accept expr in str.starts_with (#6355)
  • add strict parameter to decoding expressions (#6342)
  • allow unordered struct creating from anyvalues (#6321)
  • parse abbrev month name (#6314)
  • add dt.combine for combining date and time components (#6121)
  • add sink_ipc (#6286)
  • ensure ooc sort works ooc with all-constant values (#6235)
  • The 1 billion row sort (#6156)
  • optionally treat missing UTF8 values as the empty string at CSV parse-time (#6203)
  • When moving error out of LogicalPlan, leave behind String with error message instead of None (#6199)
  • generalize the cloud storage builders (#5972)
  • Implement DataFrame.unique(keep="none") (#6169)
  • add arr.take expression (#6116)
  • allow extend_constant to work with date literals (#6114)
  • allow nested categorical cast (#6113)
  • add a rounded_corners modifier to pl.Config.set_tbl_formatting (#6108)
  • Get polars to compile to wasm target (#6050)
  • add search_sorted for arrays and utf8 dtype (#6083)
  • improve error message when writing nested data to… (#6040)
  • updated default table format from "UTF8_FULL" to "UTF8_FULL_CONDENSED" (#5967)
  • str.strip with multiple chars (#5929)
  • support glob in parquet object_storage (#5928)
  • read decimal as f64 (#5938)
  • improve query plan scan formatting (#5937)
  • allow all null cast (#5933)
  • truncate by calendar weeks (#5759)
  • merge sorted dataframes (#5817)
  • impl hex and base64 for binary (#5892)
  • streaming parquet from object_stores (#5871)
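
As a rough illustration of the DataFrame.unique(keep="none") entry (#6169), the sketch below models the semantics in plain Python instead of calling Polars. It assumes that keep="none" drops every row that has a duplicate, including the first occurrence. The helper name unique_keep_none and the sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows; each tuple stands in for one DataFrame row.
rows = [(1, "a"), (1, "a"), (2, "b"), (3, "c"), (3, "c")]


def unique_keep_none(rows):
    """Return only the rows that occur exactly once, in input order."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] == 1]


result = unique_keep_none(rows)
print(result)
```

Under that assumption only the row (2, "b") survives, because the other two rows each appear twice.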

🐞 Bug fixes

  • always rechunk if n_chunks > n_rows (#6786)
  • fix ndjson empty array parsing (#6785)
  • make some list expressions aware of groupby context (#6776)
  • use explicit drop function node (#6769)
  • don't set sorted flag if we reverse sort the left … (#6772)
  • handle edge-case with string-literal replacement when the replace value looks like a capture group (#6765)
  • respect skip_rows in glob parsing csv (#6754)
  • Improve error message in DataFrame constructor (#6715)
  • arrow map dtype conversion (#6732)
  • dedicated rename implementation. (#6688)
  • return correct display/repr names for NaN-related expressions (#6721)
  • strftime with time zone directive (#6673)
  • improve error message in date_range with invalid units (#6671)
  • remove uses of rayon global thread pool (#6682)
  • true-divide output type (#6665)
  • cast to and from fixed offsets (#6602)
  • raise error on string numeric arithmetic (#6601)
  • partially assert sortedness in groupby dynamic (#6593)
  • raise oob if negative index given to take (#6590)
  • fix predicate pushdown key check (#6577)
  • fix schema of apply with many inputs on empty df (#6571)
  • let lhs determine struct order in supertype (#6572)
  • validate utc, fmt, and tz-aware in strptime (#6550)
  • add strptime to filter boundary (#6560)
  • list eval all null array (#6545)
  • implement ser/de for BinaryChunked (#6543)
  • raise if tz_localize called on UTC-aware (#6526)
  • make concat_list group aware (#6527)
  • error on invalid expanding expression (#6521)
  • create from dicts directly as struct categorical (#6520)
  • fix oob in arr.get by expressions (#6519)
  • fix cse schema (#6518)
  • panic when max_len -1 is reached (#6494)
  • Formalize list aggregation difference between groupbys, selection and window functions (#6487)
  • validate tz in with_time_zone (#6417)
  • faster frame-init from list of dicts (when omitting fields), and ensure fields are read according to the declared schema (#6472)
  • use consistent floor division for floats/ints (#6460)
  • split semi/anti join optimization (#6459)
  • fix doc comment in ParallelStrategy (#6444)
  • fix projection pushdown on double semi join (#6440)
  • cumulative_eval ensure output dtype is respected (#6435)
  • auto-detect %+ as tz-aware (#6434)
  • correct error message in cast_time_zone (#6411)
  • only use float simd on specific alignment (#6427)
  • no early escape when window is equal to len in rolling_float (#6408)
  • raise error on invalid sort_by argument (#6382)
  • take offset into account with str.explode (#6384)
  • Return empty batch for pl.read_csv_batched().next_… (#6381)
  • implement ser/de for StructChunked (#6359)
  • series of empty structs (#6347)
  • don't cast nulls before trying normal cast (#6339)
  • expand all nested wildcards in functions (#6334)
  • fix groupby rolling by_key if groups are empty (#6333)
  • parse abbrev month name (#6314)
  • disallow alias in inline join expressions (#6312)
  • feature flag "get_sink" ipc (#6306)
  • block proj-pd and pred-pd on swapping rename (#6303)
  • convert nested dictionary with i64 keys (#6299)
  • fix panic dynamic_groupby on empty dataframe (#6294)
  • Parse negative dates with polars parser (#6256)
  • Add list inner dtype when printing Series (#6233)
  • fix when then otherwise with arity and aggregation… (#6224)
  • pass name to value counts in aggregation (#6221)
  • don't set fast_explode on list of structs (#6220)
  • explode of empty nullable list (#6190)
  • fix empty streaming joins (#6149)
  • fix streaming joins where the join order has been … (#6143)
  • write tz-aware datetimes to csv (#6135)
  • Print error message on mmap IPC file only in verbose mode (#6098)
  • fix invalid dtype in chunked array after struct cast (#6093)
  • don't run cse cache_states if no projections found (#6087)
  • Update read_csv error message (#6082)
  • propagate nulls in binary arithmetic/aggregation (#6076)
  • deal with unnest schema expansion in projection pd (#6063)
  • correct output dtype for cummin/cumsum/cummax (#6062)
  • block streaming on literal series/range (#6058)
  • ndjson struct inference (#6049)
  • deal with empty structs (#6039)
  • fix aggregation that filters out all data (#6036)
  • fix diff overflow (#6033)
  • keep column names in is_null/is_not_null (#6032)
  • keep name when sorting categorical in lexical order (#6029)
  • properly set null anyvalue if categorical is neste… (#6025)
  • make weekday tz-aware (#5989)
  • fix categorical in struct anyvalue issue (#5987)
  • fix invalid boolean simplification (#5976)
  • allow empty sort on any dtype (#5975)
  • properly deal with categoricals in streaming queries (#5974)
  • don't panic on ignored context (#5958)
  • don't allow named expression in arr.eval (#5957)
  • fix panic in join expressions (#5954)
  • block ordered predicates before explode (#5951)
  • adhere to schema in arr.eval of empty list (#5947)
  • fix arrow nested null conversion (#5946)
  • allow None in arr.slice length (#5934)
  • fix time to duration cast (#5932)
  • error on addition with datetime/time (#5931)
  • don't create categoricals in streaming (#5926)
  • object filter should keep single chunk (#5913)
  • csv, read escaped "" as missing (#5912)
  • fix pivot of signed integers (#5909)
  • fix latest oob in streaming conversion (#5902)
  • fix date + duration offsets outside of nanosecond datetime bounds (#5889)
  • adapt k to len in topk (#5888)
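
The "consistent floor division for floats/ints" fix (#6460) can be pictured with a small plain-Python sketch. It assumes that floor division here means rounding toward negative infinity for both integer and float inputs. The release note does not spell this out, and the sample pairs are made up.

```python
import math

# Hypothetical operand pairs: two integer pairs and two float pairs.
pairs = [(7, 2), (-7, 2), (7.0, 2.0), (-7.0, 2.0)]

# Floor division rounds toward negative infinity for ints and floats alike.
floored = [a // b for a, b in pairs]

# Truncating division rounds toward zero, so it differs for negative results.
truncated = [math.trunc(a / b) for a, b in pairs]

print(floored)
print(truncated)
```

The two approaches agree for positive quotients and differ by one for negative ones, which is where inconsistent handling of ints and floats tends to show up.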

🛠️ Other improvements

  • propagate error in date_range with invalid time zone (#6759)
  • update arrow to 0.16 (#6748)
  • remove unreachable path in write_anyvalue (#6727)
  • add groupby_dynamic to docs (#6725)
  • disallow chunked datetime with_time_zone on tz-naive, remove unnecessary with_time_zone (#6681)
  • update required Rust version from 1.58 to 1.62 (#6680)
  • add test for raising error in apply (#6664)
  • Minor documentation fix (#6657)
  • Add release flow info to contributing guide (#6480)
  • address todo and use regex in tz_aware check (#6479)
  • Address chrono deprecation warnings (#6478)
  • fix doc comment in ParallelStrategy (#6444)
  • move binary to polars-ops (#6401)
  • fix a typo in csv read example (#6389)
  • remove roundtrip to builder (#6383)
  • update rustc to 2023-01-19 (#6341)
  • run cse optimization only if joins and caches… (#6337)
  • update base64 requirement from 0.13 to 0.21 (#6249)
  • Remove benches and criterion dependency (#6273)
  • update chrono-tz requirement from 0.6 to 0.8 (#6255)
  • Enable Dependabot (#5036)
  • Add missing feature attributes for csv-file (#6229)
  • don't set aggregated flag on null propagated aggregation. (#6191)
  • Revert "Use auto_doc_cfg" (#6164)
  • remove vertical take (#6112)
  • add single threaded sort internally (#6103)
  • mark from_chunks as unsafe (#6094)
  • replace exact instances of Option/Result combinators (#6088)
  • ensure reverse indices exist in global string cache (#5970)
  • refactored describe (#5922)
  • don't decode into utf8 (#5910)
  • remove unused deps (#5903)

Thank you to all our contributors for making this release possible! @2-5, @AnatolyBuga, @ChayimFriedman2, @MarceColl, @MarcoGorelli, @MatveyF, @abalkin, @alexander-beedie, @c-peters, @cannero, @chitralverma, @cojmeister, @dannyvankooten, @dependabot, @dependabot[bot], @flowlight0, @gab23r, @gam-phon, @ghuls, @gitkwr, @huitseeker, @jgmartin, @jjerphan, @johngunerli, @josh, @jvanbuel, @n8henrie, @ozgrakkurt, @papparapa, @phaile2, @plaflamme, @rben01, @ritchie46, @romanovacca, @ropoctl, @sorhawell, @stinodego, @universalmind303, @winding-lines, @yuntai and @zundertj