Polars: rs-0.47.0 Release

Release date: May 5, 2025

Release Notes Published

🏆 Highlights

  • Enable common subplan elimination across plans in collect_all (#21747)
  • Add lazy sinks (#21733)
  • Add PartitionByKey for new streaming sinks (#21689)
  • Enable new streaming memory sinks by default (#21589)

💥 Breaking changes

  • Make bottom interval closed in hist (#22090)
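
To illustrate what a closed bottom interval means for binning, here is a minimal pure-Python sketch. It assumes bins are right-closed, (a, b], and that only the lowest bin also includes its left edge. The helper name `hist_counts` is hypothetical and this is not the Polars implementation.

```python
from bisect import bisect_left


def hist_counts(values, edges):
    """Count values into bins given by sorted edges (at least two edges).

    Assumed semantics: every bin is right-closed, (edges[i], edges[i + 1]],
    except the lowest bin, which is closed on both sides.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # value lies outside every bin
        if v == edges[0]:
            counts[0] += 1  # the bottom interval includes its left edge
            continue
        # bisect_left finds the first edge >= v, so v is in (edges[i - 1], edges[i]]
        counts[bisect_left(edges, v) - 1] += 1
    return counts


print(hist_counts([0, 1, 1, 2, 3], [0, 1, 2, 3]))  # [3, 1, 1]
```

With an open bottom interval, the value 0 in this example would fall outside every bin and go uncounted.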

🚀 Performance improvements

  • Avoid alloc_zeroed in decompression (#22460)
  • Lower Expr.(n_)unique to group_by on streaming engine (#22420)
  • Chunk huge munmap calls (#22414)
  • Add single-key variants of streaming group_by (#22409)
  • Improve accumulate_dataframes_vertical performance (#22399)
  • Optimize rolling_quantile with varying window sizes (#22353)
  • Dedicated rolling_skew kernel (#22333)
  • Call large munmaps in a background thread (#22329)
  • New streaming group_by implementation (#22285)
  • Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
  • Turn on parallel=prefiltered by default for new streaming (#22190)
  • Add CSE to streaming groupby (#22196)
  • Speed-up new streaming predicate filtering (#22179)
  • Speedup new-streaming file row count (#22169)
  • Fix quadratic behavior when casting Enums (#22008)
  • Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
  • Fast path for empty inner join (#21965)
  • Add native semi/anti join in new streaming engine (#21937)
  • Cache regex compilation globally (#21929)
  • Use views for binary hash tables and add single-key binary variant (#21872)
  • Avoid rechunking in gather (#21876)
  • Switch ahash for foldhash (#21852)
  • Put THP behind feature flag (#21853)
  • Enable THP by default (#21829)
  • Improve join performance for expanding joins (#21821)
  • Use binary_search instead of contains in business-day functions (#21775)
  • Implement linear-time rolling_min/max (#21770)
  • Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
  • Enable common subplan elimination across plans in collect_all (#21747)
  • Allow elementwise functions in recursive lowering (#21653)
  • Add primitive single-key hashtable to new-streaming join (#21712)
  • Remove unnecessary black_boxes in Kahan summation (#21679)
  • Box large enum variants (#21657)
  • Improve join performance for new-streaming engine (#21620)
  • Pre-fill caches (#21646)
  • Optimize only a single cache input (#21644)
  • Collect parquet statistics in one contiguous buffer (#21632)
  • Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
  • Don't maintain order when maintain_order=False in new streaming sinks (#21586)
  • Pre-sort groups in group-by-dynamic (#21569)
  • Provide a fallback skip batch predicate for constant batches (#21477)
  • Parallelize the passing in new streaming multiscan (#21430)
  • Toggle projection pushdown for eager rolling (#21405)
  • Fix pathologic rolling + group-by performance and memory explosion (#21403)
  • Add sampling to new-streaming equi join to decide between build/probe side (#21197)
  • Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
  • Implement native Expr.count() on new-streaming (#21126)
  • Speed up list operations that use amortized_iter() (#20964)
  • Use Cow as output for rechunk and add rechunk_mut (#21116)
  • Reduce arrow slice mmap overhead (#21113)
  • Reduce conversion cost in chunked string gather (#21112)
  • Enable prefiltered by default for new streaming (#21109)
  • Enable parquet column expressions for streaming (#21101)
  • Deduplicate buffers again in stringview concat kernel (#21098)
  • Add dedicated concatenate kernels (#21080)
  • Rechunk only once during join probe gather (#21072)
  • Speed up from_pandas when converting frame with multi-index columns (#21063)
  • Change default memory prefetch to MADV_WILLNEED (#21056)
  • Remove cast to boolean after comparison in optimizer (#21022)
  • Split last rowgroup among all threads in new-streaming parquet reader (#21027)
  • Recombine into larger morsels in new-streaming join (#21008)
  • Improve list.min and list.max performance for logical types (#20972)
  • Ensure count query select minimal columns (#20923)
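
One item above mentions linear-time rolling_min/max (#21770). A common way to get linear time for a fixed window is a monotonic deque, sketched below in pure Python. Whether Polars uses this exact technique is an assumption, and the function name `rolling_max` is illustrative only.

```python
from collections import deque


def rolling_max(values, window):
    """Rolling maximum over a fixed window in O(n) total time.

    The deque holds indices whose values are strictly decreasing, so the
    front is always the index of the current window's maximum.
    Positions before the first full window yield None.
    """
    out = []
    candidates = deque()
    for i, v in enumerate(values):
        # Drop candidates that can never be the maximum again.
        while candidates and values[candidates[-1]] <= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front if it slid out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        out.append(values[candidates[0]] if i >= window - 1 else None)
    return out


print(rolling_max([1, 3, 2, 5, 4, 1], 3))  # [None, None, 3, 5, 5, 5]
```

Each index is pushed and popped at most once, which is where the linear bound comes from. A rolling minimum follows by flipping the comparison.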

✨ Enhancements

  • Support grouping by pl.Array (#22575)
  • Preserve exception type and traceback for errors raised from Python (#22561)
  • Use fixed-width font in streaming phys plan graph (#22540)
  • Highlight nodes in streaming phys plan graph (#22535)
  • Support BinaryOffset serde (#22528)
  • Show physical stage graph (#22491)
  • Add structure for dispatching iceberg to native scans (#22405)
  • Add SQL support for checking array values with IN and NOT IN expressions (#22487)
  • Add more IRBuilder utils (#22482)
  • Support DataFrame and Series init from torch Tensor objects (#22177)
  • Add RoundMode for Decimal and Float (#22248)
  • Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
  • Make streaming dispatch public (#22347)
  • Add rolling_kurtosis (#22335)
  • Support Cast in IO plugin predicates (#22317)
  • Add .sort(nulls_last=True) to booleans, categoricals and enums (#22300)
  • Add rolling min/max for temporals (#22271)
  • Support literal:list agg (#22249)
  • Support implode + agg (#22230)
  • Dispatch scans to new-streaming by default (#22153)
  • Improved expression autocomplete for IPython, Jupyter, and Marimo (#22221)
  • Expose FunctionIR::FastCount in the python visitor (#22195)
  • Add SPLIT_PART string function to the SQL interface (#22158)
  • Allow scalar expr in Expr.diff (#22142)
  • Support additional unsigned int aliases in the SQL interface (#22127)
  • Add STRING_TO_ARRAY function to the SQL interface (#22129)
  • Add dt.is_business_day (#21776)
  • Add support for Int128 parsing/recognition to the SQL interface (#22104)
  • Allow sinking to abstract python io and fs classes (#21987)
  • Add add_alp_optimize_exprs to IRBuilder (#22061)
  • Add cat.slice (#21971)
  • Support growing schema if line length increases during csv schema inference (#21979)
  • Replace thread unsafe GilOnceCell with Mutex (#21927)
  • Support modified dsl in file cache (#21907)
  • Add support for io-plugins in new-streaming (#21870)
  • Add PartitionParted (#21788)
  • Add DoubleEndedIterator for CatIter (#21816)
  • Minor improvements to EXPLAIN plan output (#21822)
  • Add polars_testing folder with relevant files and add_series_equal!() functionality (#21722)
  • Allow to use repeat_by with (nested) lists and structs (#21206)
  • Add support for rolling_(sum/min/max) for booleans through casting (#21748)
  • Support multi-column sort for all nested types and nested search-sorted (#21743)
  • Add lazy sinks (#21733)
  • Add PartitionByKey for new streaming sinks (#21689)
  • Fix replace flags (#21731)
  • Add mkdir flag to sinks (#21717)
  • Enable joins on list/array dtypes (#21687)
  • Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
  • Support all elementwise functions in IO plugin predicates (#21705)
  • Stabilize Enum datatype (#21686)
  • Support Polars int128 in from arrow (#21688)
  • Use FFI to read dataframe instead of transmute (#21673)
  • Enable new streaming memory sinks by default (#21589)
  • Cloud support for new-streaming scans and sinks (#21621)
  • Add len method to arr (#21618)
  • Closeable files on unix (#21588)
  • Add new PartitionMaxSize sink (#21573)
  • Implement unpack_dtypes() functionality with unit tests (#21574)
  • Support engine callback for LazyFrame.profile (#21534)
  • Dispatch new-streaming CSV negative slice to separate node (#21579)
  • Add NDJSON source to new streaming engine (#21562)
  • Add lossy decoding to read_csv for non-utf8 encodings (#21433)
  • Add 'nulls_equal' parameter to is_in (#21426)
  • Improve numeric stability rolling_{std, var, cov, corr} (#21528)
  • IR Serde cross-filter (#21488)
  • Support writing Time type in json (#21454)
  • Activate all optimizations in sinks (#21462)
  • Add AssertionError variant to PolarsError in polars-error (#21460)
  • Pass filter to inner readers in multiscan new streaming (#21436)
  • Implement i128 -> str cast (#21411)
  • Version DSL (#21383)
  • Make user facing binary formats mostly self describing (#21380)
  • Filter hive files using predicates in new streaming (#21372)
  • Add negative slicing to new streaming multiscan (#21219)
  • Publicize Expr DSL Function enums (#20421)
  • Implement sorted flags for struct series (#21290)
  • Support reading arrow Map type from Delta (#21330)
  • Add a dedicated remove method for DataFrame and LazyFrame (#21259)
  • Expose include_file_paths to python visitor (#21279)
  • Implement merge_sorted for struct (#21205)
  • Add positive slice for new streaming MultiScan (#21191)
  • Don't take in rewriting visitor (#21212)
  • Add SQL support for the DELETE statement (#21190)
  • Add row index to new streaming multiscan (#21169)
  • Improve DataFrame fmt in explain (#21158)
  • Add projection pushdown to new streaming multiscan (#21139)
  • Implement join on struct dtype (#21093)
  • Use unique temporary directory path per user and restrict permissions (#21125)
  • Enable new streaming multiscan for CSV (#21124)
  • Environment POLARS_MAX_CONCURRENT_SCANS in multiscan for new streaming (#21127)
  • Multi/Hive scans in new streaming engine (#21011)
  • Add linear_spaces (#20941)
  • Implement merge_sorted for binary (#21045)
  • Hold string cache in new streaming engine and fix row-encoding (#21039)
  • Support max/min method for Time dtype (#19815)
  • Implement a streaming merge sorted node (#20960)
  • Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
  • Add negative slice support to new-streaming engine (#21001)
  • Allow for more RG skipping by rewriting expr in planner (#20828)
  • Rename catalog schema to namespace (#20993)
  • Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
  • Improved support for KeyboardInterrupts (#20961)
  • Extract timezone info from python datetimes (#20822)
  • Add hint for POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY to error message (#20942)
  • Filter Parquet pages with ParquetColumnExpr (#20714)
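
The RoundMode entry above (#22248) concerns how ties are broken when rounding. As a rough illustration, the sketch below contrasts two common tie-breaking rules using Python's `decimal` module. That these two rules correspond to the variants Polars exposes is an assumption, and the helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_half_to_even(x: str, places: int = 0) -> Decimal:
    """Round ties to the nearest even digit (banker's rounding)."""
    return Decimal(x).quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_EVEN)


def round_half_away_from_zero(x: str, places: int = 0) -> Decimal:
    """Round ties away from zero (decimal's ROUND_HALF_UP behaves this way)."""
    return Decimal(x).quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


# Inputs are strings so that the tie is exact and not blurred by binary floats.
print(round_half_to_even("2.5"), round_half_away_from_zero("2.5"))      # 2 3
print(round_half_to_even("-2.5"), round_half_away_from_zero("-2.5"))    # -2 -3
print(round_half_to_even("0.125", 2), round_half_away_from_zero("0.125", 2))  # 0.12 0.13
```

The two rules differ only on exact ties, which is why the choice matters most for Decimal data where ties are representable exactly.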

🐞 Bug fixes

  • Resolve get() SchemaMismatch panic (#22350)
  • Panic in group_by_dynamic on single-row df with group_by (#22597)
  • Add new_streaming feature to polars crate (#22601)
  • Consistently use Unix epoch as origin for dt.truncate (except weekly buckets which start on Mondays) (#22592)
  • Fix interpolate on dtype Decimal (#22541)
  • CSV count rows skipped last line if file did not end with newline (#22577)
  • Make nested strict casting actually strict (#22497)
  • Make replace and replace_strict mapping use list literals (#22566)
  • Allow pivot on Time column (#22550)
  • Fix error when providing CSV schema with extra columns (#22544)
  • Panic on bitwise op between Series and Expr (#22527)
  • Multi-selector regex expansion (#22542)
  • Streaming outer join coalesce bug (#22530)
  • Bug in .unique() followed by .slice() (#22471)
  • Fix error reading parquet with datetimes written by pandas (#22524)
  • Fix schema_overrides not taking effect in NDJSON (#22521)
  • Fold flags and verify scalar correctness in apply (#22519)
  • Invalid values were triggering panics instead of returning null in dt.to_date / dt.to_datetime (#22500)
  • Incorrectly dropped sort after unique for some queries (#22489)
  • Fix incorrect ternary agg state with mixed columns and scalars (#22496)
  • Make replace and replace_strict properly elementwise (#22465)
  • Fix index out of bounds panic on parquet prefiltering (#22458)
  • Integer underflow when checking parquet UTF-8 (#22472)
  • Add implementation for array.get with idx overflow (#22449)
  • Deprecate str. collection functions with flat strings and mark as elementwise (#22461)
  • Deprecate flat list.gather and mark as elementwise (#22456)
  • Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
  • Reading of reencoded categorical in Parquet (#22436)
  • Last thread in parquet predicate filter oob (#22429)
  • Ensure at least one column projected for AnonymousScan (#22411)
  • Fix rust -> python -> rust for map_batches (#22407)
  • Fix chaining pl.lit(<list>) with .list.get(pl.col(...)) (#22367)
  • Panic when visualizing streaming physical plan with joins (#22404)
  • Fix incorrect filter after LazyFrame.rename().select() (#22380)
  • Fix select(len()) performance regression (#22363)
  • Don't leak state during prefill CSE cache (#22341)
  • Maintain float32 type in partitioned group-by (#22340)
  • Resolve streaming panic on multiple merge_sorted (#22205)
  • Fix ndjson nested types (#22325)
  • Fix nested datetypes in ndjson (#22321)
  • Check matching lengths for pl.corr (#22305)
  • Move type coercion for pl.duration to planner (#22304)
  • Check dtype to avoid panic with mixed types in min/max_horizontal (#21857)
  • Coalesce correct column for new streaming full join (#22301)
  • Don't collect NaN from Parquet Statistics (#22294)
  • Set revmap for empty AnyValue to Series (#22293)
  • Add an __all__ entry to internal type definition module (#22254)
  • Too-strict SQL UDF schema validation (#20202)
  • Datetime parser was incorrectly parsing 8-digit fractional seconds when format specified to expect 9 (#22180)
  • Expr.over() returning List type when DataFrame is empty (#22201)
  • Deprecate using is_in with 2 equal types and mark as elementwise (#22178)
  • Duplicate key column name in streaming group_by due to CSE (#22280)
  • Raise ColumnNotFoundError for missing columns in join_where (#22268)
  • Parquet filters for logical types and operations (#22253)
  • Ensure floating-point accuracy in hist (#22245)
  • Fix str.to_integer panics for certain inputs (#22243)
  • Check matching key datatypes for new streaming joins (#22247)
  • Incorrect length BinaryArray/ListBuilder (#22227)
  • Incorrect condition on empty inner join fast path (#22208)
  • Fallback predicate filter for min=max with is_in (#22213)
  • Don't panic for LruCachedFunc for size=0 (#22215)
  • Writing masked out list values to json (#22210)
  • Deadlock in streaming distributor (#22207)
  • Implode in agg (#22197)
  • Reduce GIL hold time for IO plugins in new-streaming (#22186)
  • Enhance predicate validation and cast safety in join_where (#22112)
  • Handle Parquet with compressed empty DataPage v2 (#22172)
  • Schema error during lowering (#22175)
  • Rewrite unroll of overlapping groups to mitigate out of range index panic (#22072)
  • Incorrect rounding for very large/small numbers (#22173)
  • Allow set input to list.set_* operations (#22163)
  • Deadlock in join due to rayon nested task-stealing (#22159)
  • Mark Expr.repeat_by as elementwise (#22068)
  • Fix csv serializer panic by supporting ScalarColumn in as_single_chunk (#22146)
  • Raise an error if a number doesn't have associated unit in duration strings (#22035)
  • Add i128 as supertype to boolean (#22138)
  • Add broadcasts and error messages for many elementwise operations (#22130)
  • Throw error for n=0 on list.gather_every (#22122)
  • Throw error for unsupported rolling operations (#22121)
  • Error on unequal length str.to_integer arguments (#22100)
  • Make bottom interval closed in hist (#22090)
  • Avoid panic with strptime for out-of-bounds dates (#21208)
  • Join revmaps for categoricals in merge_sorted (#21976)
  • Fix glob expansion matching extra files (#21991)
  • Ensure SQL dot-notation for nested column fields resolves correctly (#22109)
  • Parquet filter performance regression from multiscan dispatch (#22116)
  • Panic in GroupBySinkState::into_source with empty locals (#22095)
  • Without the new_streaming feature, Engine::Auto always selects the in-memory engine (#22074)
  • Fix to_integer panicking on invalid base (#22052)
  • Panic for unequal length ewm_mean_by args (#22093)
  • Add scalarity checks to pl.repeat (#22088)
  • Type check n parameter of pl.repeat (#22071)
  • Mark bitwise_{count,leading,trailing}_{ones,zeros} as elementwise (#22044)
  • Mark pl.*_ranges functions correctly as element-wise (#22059)
  • Correctly type check pl.arctan2 (#22060)
  • Mark pl.business_day_count as elementwise (#22055)
  • Mark str.to_decimal properly as non-elementwise (#22040)
  • Documented return type for bin.encode and bin.decode (#22022)
  • Output name for AExpr::Len (#22041)
  • Copy exprs from sink (#22029)
  • Revert #22017 and improve block(_in_place)_on doc comment (#22031)
  • Remove outdated depth warning (#22030)
  • Expression pl.concat was incorrectly marked as elementwise (#22019)
  • Use block_in_place_on to start streaming (#22017)
  • Panic on empty aggregation in streaming (#22016)
  • Error instead of panicking for invalid durations in dt.offset_by() and dt.round() (#21982)
  • Raise error instead of silently appending NULL in NDJSON parsing (#21953)
  • Ensure AV is static before pushing to row buffer (#21967)
  • Deadlock in new-streaming multiplexer (#21963)
  • Release GIL in collect_with_callback (#21941)
  • Panic in new RegexCache (#21935)
  • Type hint of cs.exclude() is SelectorType instead of Expr (#21892)
  • Allow pivot on empty frame for all integer index dtypes (#21890)
  • Null panic on decimal aggregate (#21873)
  • Join with categoricals on new-streaming engine (#21825)
  • Fix div 0 partitioned group-by (#21842)
  • Incorrect quote check in CSV parser (#21826)
  • Respect header separator in sink_csv (#21814)
  • Deprecation of streaming=False (#21813)
  • Fix collect_all type-coercion (#21810)
  • Memory leaks in SharedStorage (#21798)
  • Make None refer to uncompressed in sink_ipc (#21786)
  • Fix lazy schema for truediv ops involving List/Array dtypes (#21764)
  • Fix error due to race condition in file cache (#21753)
  • Clear NaNs due to zero-weight division in rolling var/std (#21761)
  • Allow init from BigQuery Arrow data containing ExtensionType cols with irrelevant metadata (#21492)
  • Disallow cast from boolean to categorical/enum (#21714)
  • Don't check sortedness in join_asof when 'by' groups supplied, but issue warning (#21724)
  • Incorrect multithread path taken for aggregations (#21727)
  • Disallow cast to empty Enum (#21715)
  • Fix list.mean and list.median returning Float64 for temporal types (#21144)
  • Incorrect (FixedSize)ListArrayBuilder gather implementation (#21716)
  • Always fallback in SkipBatchPredicate (#21711)
  • New streaming multiscan deadlock (#21694)
  • Ensure new-streaming join BuildState is correct even if never fed morsels (#21708)
  • IO plugin; support empty iterator (#21704)
  • Support nulls in multi-column sort (#21702)
  • Window function check length of groups state (#21697)
  • Support 128 sum reduction on new streaming (#21691)
  • IPC round-trip of list of empty view with non-empty bufferset (#21671)
  • Variance can never be negative (#21678)
  • Incorrect loop length in new-streaming group by (#21670)
  • Right join on multiple columns not coalescing left_on columns (#21669)
  • Casting Struct to String panics if n_chunks > 1 (#21656)
  • Fix Future attached to different loop error on read_database_uri (#21641)
  • Fix deadlock in cache + hconcat (#21640)
  • Properly handle phase transitions in row-wise sinks (#21600)
  • Enable new streaming memory sinks by default (#21589)
  • Always use global registry for object (#21622)
  • Check enum categories when reading csv (#21619)
  • Unspecialized prefiltering on nullable arrays (#21611)
  • Take into account scalar/partitioned columns in DataFrame::split_chunks (#21606)
  • Bad null handling in unordered row encoding (#21603)
  • Fix deadlock in new streaming CSV / NDJSON sinks (#21598)
  • Bad view index in BinaryViewBuilder (#21590)
  • Fix CSV count with comment prefix skipped empty lines (#21577)
  • New streaming IPC enum scan (#21570)
  • Several aspects related to ParquetColumnExpr (#21563)
  • Don't hit parquet::pre-filtered in case of pre-slice (#21565)
  • Categorical min/max panicking when string cache is enabled (#21552)
  • Don't encode IPC record batch twice (#21525)
  • Respect rewriting flag in Node rewriter (#21516)
  • Correct skip batch predicate for partial statistics (#21502)
  • Make the Parquet Sink properly phase aware (#21499)
  • Don't divide by zero in partitioned group-by (#21498)
  • Create new linearizer between rowwise new streaming sink phases (#21490)
  • Don't drop rows in sinks between new streaming phases (#21489)
  • Incorrect lazy schema for Expr.list.diff (#21484)
  • Fix unwrap None panic when filtering delta with missing columns (#21453)
  • Use stable sort for rolling-groupby (#21444)
  • Method dt.ordinal_day was returning UTC results as opposed to those on the local timestamp (#21410)
  • Use Kahan summation for rolling sum kernels. Fix numerical stability issues (#21413)
  • Add scalar checks for n and fill_value parameters in shift (#21292)
  • Upcast small integer dtypes for rolling sum operations (#21397)
  • Don't silently produce null values from invalid input to pl.datetime and pl.date (#21013)
  • Allow duration multiplied w/ primitive to propagate in IR schema (#21394)
  • Struct arithmetic broadcasting behavior (#21382)
  • Prefiltered optional plain primitive kernel (#21381)
  • Panic when projecting only row index from IPC file (#21361)
  • Properly update groups after gather in aggregation context (#21369)
  • Mark test as may_fail_auto_streaming (#21373)
  • Properly set fast_unique in EnumBuilder (#21366)
  • Rust test race condition (#21368)
  • Fix height validation in hstack_mut was bypassed when adding to empty frame (#21335)
  • Fix unequal DataFrame column heights from parquet hive scan with filter (#21340)
  • Ensure ASCII ellipsis fits in column width (#21275)
  • Fix ColumnNotFound error selecting len() after semi/anti join (#21355)
  • Merge Parquet nested and flat decoders (#21342)
  • Incorrect atomic ordering in Connector (#21341)
  • Method dt.offset_by was discarding month and year info if day was included in offset for timezone-aware columns (#21291)
  • Fix pickling polars.col on Python versions <3.11 (#21333)
  • Fix duplicate column names after join if suffix already present (#21315)
  • Skip Batches Expression for boolean literals (#21310)
  • Fix performance regression for eager join_where (#21308)
  • Fix incorrect predicate pushdown for predicates referring to right-join key columns (#21293)
  • Panic in to_physical for series of arrays and lists (#21289)
  • Fix inconsistency between code and comment (#21294)
  • Resolve deadlock due to leaking in Connector recv drop (#21296)
  • Incorrect result for merge_sorted with lexical categorical (#21278)
  • Add Int128 path for join_asof (#21282)
  • Categorical min/max returning String dtype rather than Categorical (#21232)
  • Checking overflow in Sliced function (#21207)
  • Adding a struct field using a literal raises InvalidOperationError (#21254)
  • Return nulls for is_finite, is_infinite, and is_nan when dtype is pl.Null (#21253)
  • Properly implement and test Skip Batch Predicate (#21269)
  • Infinite recursion when broadcasting into struct zip_outer_validity (#21268)
  • Deadlock due to bad logic in new-streaming join sampling (#21265)
  • Incorrect result for top_k/bottom_k when input is sorted (#21264)
  • UTF-8 validation of nested string slice in Parquet (#21262)
  • Raise instead of panicking when casting a Series to a Struct with the wrong number of fields (#21213)
  • Defer credential provider resolution to take place at query collection instead of construction (#21225)
  • Do not panic in strptime() if format ends with '%' (#21176)
  • Raise error instead of panicking for unsupported SQL operations (#20789)
  • Projection of only row index in new streaming IPC (#21167)
  • Fix projection count query optimization (#21162)
  • Fix Expr.over applying scale incorrectly for Decimal types (#21140)
  • Fix IO plugin predicate with failed serialization (#21136)
  • Correctly implement take_(opt_)chunked_unchecked for structs (#21134)
  • Restore printing backtraces on panics (#21131)
  • Use microseconds for Unity catalog datetime unit (#21122)
  • Fix incorrect output height for SQL SELECT COUNT(*) FROM (#21108)
  • Validate/coerce types for comparisons within join_where predicates (#21049)
  • Fix minor histogram issues (#21088)
  • Do not auto-init credential providers if credential fetch returns error (#21090)
  • Fix join_where incorrectly dropping transformations on RHS of equality expressions (#21067)
  • Quadratic allocations when loading nested Parquet column metadata (#21050)
  • Invalidate sortedness flag when sorting from pl.Categorical to pl.Categorical("lexical") (#21044)
  • Calling top_k on list type panics (#21043)
  • Fix rolling on empty DataFrame panicking (#21042)
  • Fix set_tbl_width_chars panicking with negative width (#20906)
  • Fix merge_sorted producing incorrect results or panicking for some logical types (#21018)
  • Fix all-null list aggregations returning Null dtype (#20992)
  • Ensure scalar-only with_columns are broadcasted on new-streaming (#20983)
  • Improve SQL interface behaviour when INTERVAL is not a fixed duration (#20958)
  • Add Arrow Float16 conversion DataType (#20970)
  • Feature-gate ClosedWindow (#20963)
  • Revert length check of patterns in str.extract_many() (#20953)
  • Add maintain order for flaky new-streaming test (#20954)
  • Allow for respawning of new streaming sinks (#20934)
  • Ensure Function name correctness in cse (#20929)
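
One fix above adopts Kahan summation for the rolling sum kernels (#21413). The sketch below shows the general technique in pure Python and compares it with a plain running sum. It is a textbook version under illustrative names, not the Polars kernel.

```python
def naive_sum(values):
    """Plain left-to-right floating-point sum."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation.

    `compensation` tracks the low-order bits lost when a small term is
    added to a large running total, and feeds them back into the next step.
    """
    total = 0.0
    compensation = 0.0
    for v in values:
        y = v - compensation
        t = total + y
        compensation = (t - total) - y  # rounding error of this addition
        total = t
    return total


# Ten tiny terms that are each too small to change 1.0 on their own.
data = [1.0] + [1e-16] * 10
print(naive_sum(data))  # 1.0, the small terms are lost entirely
print(kahan_sum(data))  # close to 1.000000000000001
```

An explicit loop is used for the naive version because Python's built-in `sum` may itself apply compensated summation to floats in recent versions.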

📖 Documentation

  • Improve join documentation (#22556)
  • Fix typo in structs page (#22504)
  • Add multiplexing page (#22426)
  • Improve interpolation documentation to clarify behavior of null values (#22274)
  • Add user guide section on working with Sheets in Colab (#22161)
  • Update distributed engine docs (#22128)
  • Add Polars Cloud release notes (#22021)
  • Remove trailing space in settings POLARS_CLOUD_CLIENT_ID (#21995)
  • Fix typo (#21954)
  • Fix 'pickleable' typo in docs (#21938)
  • Document polars_stream::run_query (#21928)
  • Change ctx to compute=ctx for all remote query examples (#21930)
  • Add sources and sinks to user-guide (#21780)
  • Add skrub to ecosystem.md (#21760)
  • Make python/rust getting-started consistent and clarify performance risk of infer_schema_length=None (#21734)
  • Update Polars Cloud interactive workflow examples (#21609)
  • Add cloud api reference to Ref guide (#21566)
  • Fix typo (#21554)
  • Move llm page under misc (#21550)
  • Polars Cloud docs (#21548)
  • Fix initial selector example (#21321)
  • Add pandas strictness API difference (#21312)
  • Add logo to Ask AI (#21261)
  • Fix docs for Catalog (#21252)
  • AI widget again (#21257)
  • Revert plugin (#21250)
  • Add kappa ask ai widget (#21243)
  • Improve Arrow key feature description (#21171)
  • Correct small typo in FileInfo (#21150)
  • Correct Arrow misconception (#21053)
  • Document IO plugins (#20982)
  • Ensure set_sorted description references single-column behavior (#20709)

📦 Build system

  • Update object_store to 0.12 (#22512)
  • Speed up CI by running a few more tests in parallel (#21057)

🛠️ Other improvements

  • Remove confusing error context calling .collect(_eager=True) (#22602)
  • Fix test_truncate_path test case (#22598)
  • Unify function flags into 1 bitset (#22573)
  • Display the operation behind in-memory-map (#22552)
  • Add extend_each_repeated to builders (#22549)
  • Improve zip state update (#22526)
  • Don't store name/dtype in grouper (#22525)
  • Add structure for dispatching iceberg to native scans (#22405)
  • Make node names in new-streaming consistent with plan visualization (#22477)
  • Update the cloud eligibility check (#22459)
  • Remove unused reduction code (#22462)
  • Pin to explicit macOS version in code coverage (#22432)
  • Add test for implode + over (#22437)
  • Fix CI by removing use_legacy_dataset (#22438)
  • Only use pytorch index-url for pytorch package (#22355)
  • Install pytorch for 3.13 on Windows (#22356)
  • Improve new-streaming multiscan physical plan visualization (#22415)
  • Make interpolate fix more robust (#22421)
  • Fix interpolate test (#22417)
  • Reduce hot table size in debug mode (#22400)
  • Replace intrinsic with non-intrinsic (#22401)
  • Improve code re-use for groupby and delete unused code (#22382)
  • Make streaming dispatch public (#22347)
  • Update rustc to 'nightly-2025-04-19' (#22342)
  • Update mozilla-actions/sccache-action (#22319)
  • Introduce and use UnifiedScanArgs (#22314)
  • Purge old parquet and scan code (#22226)
  • Add an __all__ entry to internal type definition module (#22254)
  • Move missing / cast columns policies from polars-stream to polars-plan (#22277)
  • Add online skew/kurtosis algorithm for future use in rolling kernels (#22261)
  • Use more logical delta definition in var/cov calcs (#22256)
  • Add Polars Cloud 0.0.7 release notes (#22223)
  • Change format name from list to implode (#22240)
  • Make other parallel parquet modes filter afterwards (#22228)
  • Close async reader issues (#22224)
  • Add BinaryArrayBuilder (#22225)
  • Remove old MultiScanExec for in-memory (#22184)
  • Remove unused multiscan code (#22183)
  • Separate FunctionOptions from DSL calls (#22133)
  • Dispatch new-streaming CSV source to updated multiscan (#21994)
  • Dispatch new-streaming IO plugin source to updated multiscan (#22009)
  • Undeprecate backward_fill and forward_fill (#22156)
  • Dispatch new-streaming IPC source to updated multiscan (#21993)
  • Schema callback for multi file reader interface (#22152)
  • Handle conversion of Duration specially in pyir (#22101)
  • Deprecate duplicate backward_fill and forward_fill interface (#22083)
  • Solve clippy lints for 1.86 (#22102)
  • Remove rust exclusive MaxBound and MinBound fill strategies (#22063)
  • Simplify HashKeys with code re-use (#22037)
  • Dispatch new-streaming Parquet source to updated multiscan (#21992)
  • Dispatch new-streaming NDJSON source to updated multiscan (#21990)
  • Add updated multiscan pipeline (#21925)
  • Fix slicing 0-width morsels on new streaming (#21940)
  • Change dynamic literals to be separate category (#21849)
  • Add bridge for multi scan (#21863)
  • Add POLARS_TIMEOUT_MS for timing out slow Polars tests (#21887)
  • Remove FileType in favor of ReaderCapabilities for new-streaming multiscan (#21881)
  • Disable --dist loadgroup in pytest (#21885)
  • Fix refcount assert being messed up by pytest assertion magic (#21884)
  • Remove FileSink from polars-parquet (#21865)
  • Add new FileReader / FileReaderBuilder interfaces for multi file (#21839)
  • Mark with_row_index_mut as unsafe (#21841)
  • Add ops components for refactored multiscan (#21812)
  • Add env vars to configure new-streaming buffer sizes (#21818)
  • Add MorselLinearizer convenience wrapper (#21811)
  • Add a slice enum to polars-utils (#21795)
  • Pass around StreamingExecutionState, containing num_pipelines and ExecutionState (#21796)
  • Add allocation-free empty/default to SharedStorage (#21768)
  • Remove variance numerical stability hack (#21749)
  • Only use chrono_tz timezones in hypothesis testing (#21721)
  • Remove order check from flaky test (#21730)
  • Add sinks into the DSL before optimization (#21713)
  • Add missing test case for #21701 (#21709)
  • Remove old-streaming from engine argument (#21667)
  • Add as_phys_any to PrivateSeries for downcasting (#21696)
  • Use FFI to read dataframe instead of transmute (#21673)
  • Work around typos ignore bug (#21672)
  • Add test for datetime_range nanosecond overflow (#21354)
  • Update to edition 2024 (#21662)
  • Update rustc (#21647)
  • Remove once_cell in favor of std equivalents (#21639)
  • Remove unused flag (#21642)
  • Support object from chunks (#21636)
  • Rename block_on_potential_spawn to block_in_place_on (#21627)
  • Push versioned docs on workflow dispatch (#21630)
  • Fail docs early (#21629)
  • Check major/minor in docs (#21626)
  • Add docs workflow (#21624)
  • Introduce Writeable and AsyncWriteable (#21599)
  • Add test for 21581 (#21617)
  • Remove even more parquet multiscan handling (#21601)
  • Add freeze_reset to the builders (#21587)
  • Remove multiscan handling from new streaming parquet source (#21584)
  • Add opt_gather and extend_nulls to builders (#21582)
  • Avoid downloading full parquet when initializing new streaming parquet source (#21580)
  • Prepare skeleton for partitioning sinks (#21536)
  • Add SeriesBuilder and DataFrameBuilder (#21567)
  • Use a oneshot channel for unrestricted_row_count, fix panic in new-streaming negative slice (#21559)
  • Don't take ownership of IRplan in new streaming engine (#21551)
  • Refactor code for re-use by streaming NDJSON source (#21520)
  • Simplify the phase handling of new streaming sinks (#21530)
  • Map Polars AssertionError to pyo3's AssertionError and improve macro flexibility (#21495)
  • Improve IPC sink node parallelism (#21505)
  • Use tikv-jemallocator (#21486)
  • Rename 'join_nulls' parameter to 'nulls_equal' in join functions (#21507)
  • Add remove_one to VarState for use in rolling variance (#21504)
  • Move rolling to polars-compute (#21503)
  • Remove Growable in favor of ArrayBuilder (#21500)
  • Add ArrayBuilders (#21370)
  • Introduce a Sink Node trait in the new streaming engine (#21458)
  • Add test for rolling stability sort (#21456)
  • Add test for empty .is_in predicate filter (#21455)
  • Test for unique length on multiple columns (#21418)
  • Dispatch ChunkedArray serialization to Series (IPC) serialization (#21422)
  • Refactor ordinal_day (#21416)
  • Move dsl related code under dsl/* (#21367)
  • Move storage of hive partitions to DataFrame (#21364)
  • Feature gate merge sorted in new streaming engine (#21338)
  • Remove new streaming old multiscan (#21300)
  • Add tests for fixed open issues (#21185)
  • Try to mimic all steps (#21249)
  • Require version for POLARS_VERSION (#21248)
  • Fix docs (#21246)
  • Remove unused file (#21240)
  • Improve hash join build sample implementation (#21236)
  • Add use_field_init_shorthand = true to rustfmt (#21237)
  • Don't mutate arena by default in Rewriting Visitor (#21234)
  • Disable the TraceMalloc allocator (#21231)
  • Use distributor channel in new-streaming CSV reader and prepare scanning routine for true parallel reading (#21189)
  • Add feature gate to old streaming deprecation warning (#21179)
  • Install seaborn when running remote benchmark (#21168)
  • Add test for equality filters in Parquet (#21114)
  • Add various tests for open issues (#21075)
  • Move python dsl and builder_dsl code to dsl folder (#21077)
  • Organize Python-related logic in polars-plan (#21070)
  • Improve binary dispatch (#21061)
  • Skip physical order test (#21060)
  • Add force_populate_read for debugging pagefault performance problems (#21054)
  • Fix new ruff lints (#21040)
  • Use string-based keyboard interrupt panic detection (#21030)
  • Add make fix for running cargo clippy --fix (#21024)
  • Spawn threads on our rayon pool in new-streaming (#21012)
  • Add tests for resolved issues (#20999)
  • Update code coverage workflow to use macos-latest runners (#20995)
  • Remove unnecessary unsafe around warning function (#20985)
  • Remove unused arrow file (#20974)
  • Remove thiserror dependency (#20979)
  • Deprecate the old streaming engine (#20949)
  • Move dt.replace tests to dedicated file, add "typing :: typed" classifier, remove unused testing function (#20945)
  • Extract merge sorted IR node (#20939)
  • Update copyright year (#20764)
  • Move Parquet deserialization to BitmapBuilder (#20896)
  • Also publish polars-python (#20933)
  • Remove verify_dict_indices_slice from main (#20928)

Thank you to all our contributors for making this release possible! @AH-Merii, @DavideCanton, @DeflateAwning, @EnricoMi, @GaelVaroquaux, @GiovanniGiacometti, @Jacob640, @JakubValtar, @Jesse-Bakker, @Julian-J-S, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @MaxJackson, @NathanHu725, @NeejWeej, @Shoeboxam, @YichiZhang0613, @aberres, @adamreeve, @alexander-beedie, @amotzop, @anath2, @arnabanimesh, @axellpadilla, @banflam, @borchero, @braaannigan, @brianmakesthings, @bschoenmaeckers, @cmdlineluser, @cnpryer, @coastalwhite, @d-reynol, @dependabot[bot], @dongchao-1, @edwinvehmaanpera, @eitsupi, @erikbrinkman, @etiennebacher, @florian-klein, @ghuls, @hemanth94, @henryharbeck, @itamarst, @jqnatividad, @jrycw, @kdn36, @kevinjqliu, @kgv, @lmmx, @lukemanley, @math-hiyoko, @mcrumiller, @mroeschke, @mullimanko, @nameexhaustion, @orlp, @petrosbar, @r-brink, @rgertenbach, @ritchie46, @sebasv, @siddharth-vi, @silannisik, @skritsotalakis, @stijnherfst, @taureandyernv, @thomasjpfan, @wence-, @ydagosto, @yiteng-guo, and @zachlefevre