Polars: py-1.7.0 Release

Release date:
September 11, 2024
Previous version:
py-1.6.0 (released August 29, 2024)
Magnitude:
27,000 Diff Delta
Contributors:
32 total committers
Commits:

431 Commits in this Release

Top Contributors in py-1.7.0

nameexhaustion
coastalwhite
orlp
ritchie46
adamreeve
stinodego
MarcoGorelli
dependabot-bot
alexander-beedie
barak1412

Release Notes Published

🏆 Highlights

  • Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
  • Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)

🚀 Performance improvements

  • Back arrow arrays with SharedStorage which can have non-refcounted static slices (#18666)
  • Don't traverse file list twice for extension validation (#18620)
  • Remove cloning of ColumnChunkMetadata (#18615)
  • Add upfront partitioning in ColumnChunkMetadata (#18584)
  • Enable Parquet parallel=prefiltered for auto (#18514)
  • Change PlSmallStr impl from Arc<str> to compact_str (#18508)
  • Added optimizer rules for is_null().all() and similar expressions to use null_count() (#18359)

✨ Enhancements

  • Update BytecodeParser for upcoming Python 3.13 (#18677)
  • Add tooltip by default to charts (#18625)
  • Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
  • Support shortcut eval of common boolean filters in SQL interface "WHERE" clause (#18571)
  • Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)
  • Make expressions containing Python UDFs serializable (#18135)

🐞 Bug fixes

  • Use IO[bytes] instead of BytesIO in DataFrame.write_parquet() (#18652)
  • Scalar checks (#18627)
  • Scanning hive partitioned files where hive columns are partially included in the file (#18626)
  • Enable "polars-json/timezones" feature from "polars-io" (#18635)
  • Use Buffer<T> in ObjectSeries, fixes a variety of offset bugs (#18637)
  • Properly slice validity mask on pl.Object series (#18631)
  • Raise if single argument form in replace/replace_strict is not a mapping (#18492)
  • Fix group first value after group-by slice (#18603)
  • Allow for date/datetime subclasses (e.g. pd.Timestamp, FreezeGun) in pl.lit (#18497)
  • Fix output type for list.eval in certain cases (#18570)
  • Fix map_elements for List return dtypes (#18567)
  • Check for duplicate column names in read_database cursor result, raising DuplicateError if found (#18548)
  • Do not remove double-sort if maintain_order=True (#18561)
  • Empty any_horizontal should be false, not true (#18545)
  • Fix type inference error in map_elements for List types (#18542)
  • Address incorrect align_frames result when the alignment column contains NULL values (#18521)
  • Fix advertised version in source builds (#18523)
  • Handle Parquet projection pushdown with only row index (#18520)
  • DataFrame write_database not passing down "engine_options" when using ADBC (#18451)
  • Properly raise on invalid selector expressions (#18511)
  • Wrong output column name in or and xor operations (#18512)
  • Normalize by default in Series.entropy like Expr.entropy does (#18493)
  • Various schema corrections (#18474)
  • Don't drop objects on empty buffers (#18469)
  • Expr.sign should preserve dtype (#18446)
  • Ensure assert_frame_not_equal and assert_series_not_equal raise on mismatched input types (#18402)
  • Fixed Worksheet definition in write_excel type annotations (#18452)

📖 Documentation

  • Update join_where docs to clarify behaviour (#18670)
  • Fix multiprocessing docs regarding fork method check (#18563)
  • Various docstring improvements to testing.assert_* functions (#18494)
  • Fix formula in ewm_mean_by (#18506)
  • Pre-compute plugin_path before defining plugin (#18503)
  • Add Expr.null_count to aggregations (#18459)

🛠️ Other improvements

  • Fix a bunch of tests for new-streaming (#18659)
  • Don't raise on multiple same names in ie_join (#18658)
  • Check predicates in join_where (#18648)
  • Change join_where semantics (#18640)
  • Add benchmark tests for join_where with inequalities (#18614)
  • Check number of binary comparisons in join_where predicates (#18608)
  • Raise on suffixed predicate in join_where (#18607)
  • Fix Python docs build (#18605)
  • Use streaming argument in test_parquet_slice_pushdown_non_zero_offset (#18529)
  • Fix delta test merge (#18601)
  • Alter/skip some tests for new streaming (#18574)
  • Add lower-bound pin for numba (#18555)
  • Temporarily pin NumPy in CI to address dependency resolving issue (#18544)
  • Change PlSmallStr impl from Arc<str> to compact_str (#18508)
  • Make expressions containing Python UDFs serializable (#18135)
  • Change naming to new benchmark setup (#18473)
  • Ensure physical arguments to np ufuncs are rechunked (#18471)
  • Remove a string allocation in Parquet (#18466)
  • Remove network call in hf docs (#18454)

Thank you to all our contributors for making this release possible! @0xbe7a, @MarcoGorelli, @WbaN314, @adamreeve, @alexander-beedie, @alonme, @barak1412, @coastalwhite, @dependabot, @dependabot[bot], @eitsupi, @henryharbeck, @ion-elgreco, @krasnobaev, @megaserg, @nameexhaustion, @ohanf, @orlp, @philss, @r-brink, @ritchie46, @skellys, @squnit, @stinodego, @wence- and @yarimiz