Release Notes for rs-0.47.0
🏆 Highlights
- Enable common subplan elimination across plans in `collect_all` (#21747)
- Add lazy sinks (#21733)
- Add `PartitionByKey` for new streaming sinks (#21689)
- Enable new streaming memory sinks by default (#21589)
💥 Breaking changes
- Make bottom interval closed in `hist` (#22090)
🚀 Performance improvements
- Avoid alloc_zeroed in decompression (#22460)
- Lower Expr.(n_)unique to group_by on streaming engine (#22420)
- Chunk huge munmap calls (#22414)
- Add single-key variants of streaming group_by (#22409)
- Improve accumulate_dataframes_vertical performance (#22399)
- Optimize rolling_quantile with varying window sizes (#22353)
- Dedicated `rolling_skew` kernel (#22333)
- Call large munmaps in background thread (#22329)
- New streaming group_by implementation (#22285)
- Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
- Turn on `parallel=prefiltered` by default for new streaming (#22190)
- Add CSE to streaming groupby (#22196)
- Speed-up new streaming predicate filtering (#22179)
- Speedup new-streaming file row count (#22169)
- Fix quadratic behavior when casting Enums (#22008)
- Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
- Fast path for empty inner join (#21965)
- Add native semi/anti join in new streaming engine (#21937)
- Cache regex compilation globally (#21929)
- Use views for binary hash tables and add single-key binary variant (#21872)
- Avoid rechunking in gather (#21876)
- Switch ahash for foldhash (#21852)
- Put THP behind feature flag (#21853)
- Enable THP by default (#21829)
- Improve join performance for expanding joins (#21821)
- Use binary_search instead of contains in business-day functions (#21775)
- Implement linear-time rolling_min/max (#21770)
- Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
- Enable common subplan elimination across plans in `collect_all` (#21747)
- Allow elementwise functions in recursive lowering (#21653)
- Add primitive single-key hashtable to new-streaming join (#21712)
- Remove unnecessary black_boxes in Kahan summation (#21679)
- Box large enum variants (#21657)
- Improve join performance for new-streaming engine (#21620)
- Pre-fill caches (#21646)
- Optimize only a single cache input (#21644)
- Collect parquet statistics in one contiguous buffer (#21632)
- Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
- Don't maintain order when maintain_order=False in new streaming sinks (#21586)
- Pre-sort groups in group-by-dynamic (#21569)
- Provide a fallback skip batch predicate for constant batches (#21477)
- Parallelize the passing in new streaming multiscan (#21430)
- Toggle projection pushdown for eager rolling (#21405)
- Fix pathologic `rolling + group-by` performance and memory explosion (#21403)
- Add sampling to new-streaming equi join to decide between build/probe side (#21197)
- Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
- Implement native Expr.count() on new-streaming (#21126)
- Speed up list operations that use amortized_iter() (#20964)
- Use Cow as output for rechunk and add rechunk_mut (#21116)
- Reduce arrow slice mmap overhead (#21113)
- Reduce conversion cost in chunked string gather (#21112)
- Enable prefiltered by default for new streaming (#21109)
- Enable parquet column expressions for streaming (#21101)
- Deduplicate buffers again in stringview concat kernel (#21098)
- Add dedicated concatenate kernels (#21080)
- Rechunk only once during join probe gather (#21072)
- Speed up from_pandas when converting frame with multi-index columns (#21063)
- Change default memory prefetch to MADV_WILLNEED (#21056)
- Remove cast to boolean after comparison in optimizer (#21022)
- Split last rowgroup among all threads in new-streaming parquet reader (#21027)
- Recombine into larger morsels in new-streaming join (#21008)
- Improve `list.min` and `list.max` performance for logical types (#20972)
- Ensure count query selects minimal columns (#20923)
✨ Enhancements
- Support grouping by `pl.Array` (#22575)
- Preserve exception type and traceback for errors raised from Python (#22561)
- Use fixed-width font in streaming phys plan graph (#22540)
- Highlight nodes in streaming phys plan graph (#22535)
- Support BinaryOffset serde (#22528)
- Show physical stage graph (#22491)
- Add structure for dispatching iceberg to native scans (#22405)
- Add SQL support for checking array values with `IN` and `NOT IN` expressions (#22487)
- Add more IRBuilder utils (#22482)
- Support `DataFrame` and `Series` init from torch `Tensor` objects (#22177)
- Add `RoundMode` for Decimal and Float (#22248)
- Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
- Make streaming dispatch public (#22347)
- Add `rolling_kurtosis` (#22335)
- Support Cast in IO plugin predicates (#22317)
- Add `.sort(nulls_last=True)` to booleans, categoricals and enums (#22300)
- Add rolling min/max for temporals (#22271)
- Support literal:list agg (#22249)
- Support `implode + agg` (#22230)
- Dispatch scans to new-streaming by default (#22153)
- Improved expression autocomplete for `IPython`, `Jupyter`, and `Marimo` (#22221)
- Expose `FunctionIR::FastCount` in the python visitor (#22195)
- Add `SPLIT_PART` string function to the SQL interface (#22158)
- Allow scalar expr in `Expr.diff` (#22142)
- Support additional unsigned int aliases in the SQL interface (#22127)
- Add `STRING_TO_ARRAY` function to the SQL interface (#22129)
- Add dt.is_business_day (#21776)
- Add support for `Int128` parsing/recognition to the SQL interface (#22104)
- Allow sinking to abstract python `io` and `fs` classes (#21987)
- Add `add_alp_optimize_exprs` to `IRBuilder` (#22061)
- Add `cat.slice` (#21971)
- Support growing schema if line length increases during csv schema inference (#21979)
- Replace thread unsafe `GilOnceCell` with `Mutex` (#21927)
- Support modified dsl in file cache (#21907)
- Add support for io-plugins in new-streaming (#21870)
- Add `PartitionParted` (#21788)
- Add DoubleEndedIterator for CatIter (#21816)
- Minor improvements to EXPLAIN plan output (#21822)
- Add `polars_testing` folder with relevant files and `add_series_equal!()` functionality (#21722)
- Allow using `repeat_by` with (nested) lists and structs (#21206)
- Add support for rolling_(sum/min/max) for booleans through casting (#21748)
- Support multi-column sort for all nested types and nested search-sorted (#21743)
- Add lazy sinks (#21733)
- Add `PartitionByKey` for new streaming sinks (#21689)
- Fix replace flags (#21731)
- Add `mkdir` flag to sinks (#21717)
- Enable joins on list/array dtypes (#21687)
- Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
- Support all elementwise functions in IO plugin predicates (#21705)
- Stabilize Enum datatype (#21686)
- Support Polars int128 in from arrow (#21688)
- Use FFI to read dataframe instead of transmute (#21673)
- Enable new streaming memory sinks by default (#21589)
- Cloud support for new-streaming scans and sinks (#21621)
- Add len method to arr (#21618)
- Closeable files on unix (#21588)
- Add new `PartitionMaxSize` sink (#21573)
- Implement `unpack_dtypes()` functionality with unit tests (#21574)
- Support engine callback for `LazyFrame.profile` (#21534)
- Dispatch new-streaming CSV negative slice to separate node (#21579)
- Add NDJSON source to new streaming engine (#21562)
- Add lossy decoding to `read_csv` for non-utf8 encodings (#21433)
- Add 'nulls_equal' parameter to `is_in` (#21426)
- Improve numeric stability `rolling_{std, var, cov, corr}` (#21528)
- IR Serde cross-filter (#21488)
- Support writing `Time` type in json (#21454)
- Activate all optimizations in sinks (#21462)
- Add `AssertionError` variant to `PolarsError` in `polars-error` (#21460)
- Pass filter to inner readers in multiscan new streaming (#21436)
- Implement i128 -> str cast (#21411)
- Version DSL (#21383)
- Make user facing binary formats mostly self describing (#21380)
- Filter hive files using predicates in new streaming (#21372)
- Add negative slicing to new streaming multiscan (#21219)
- Publicize Expr DSL Function enums (#20421)
- Implement sorted flags for struct series (#21290)
- Support reading arrow Map type from Delta (#21330)
- Add a dedicated `remove` method for `DataFrame` and `LazyFrame` (#21259)
- Expose `include_file_paths` to python visitor (#21279)
- Implement `merge_sorted` for struct (#21205)
- Add positive slice for new streaming MultiScan (#21191)
- Don't take in rewriting visitor (#21212)
- Add SQL support for the `DELETE` statement (#21190)
- Add row index to new streaming multiscan (#21169)
- Improve DataFrame fmt in explain (#21158)
- Add projection pushdown to new streaming multiscan (#21139)
- Implement join on struct dtype (#21093)
- Use unique temporary directory path per user and restrict permissions (#21125)
- Enable new streaming multiscan for CSV (#21124)
- Environment `POLARS_MAX_CONCURRENT_SCANS` in multiscan for new streaming (#21127)
- Multi/Hive scans in new streaming engine (#21011)
- Add `linear_spaces` (#20941)
- Implement `merge_sorted` for binary (#21045)
- Hold string cache in new streaming engine and fix row-encoding (#21039)
- Support max/min method for Time dtype (#19815)
- Implement a streaming merge sorted node (#20960)
- Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
- Add negative slice support to new-streaming engine (#21001)
- Allow for more RG skipping by rewriting expr in planner (#20828)
- Rename catalog `schema` to `namespace` (#20993)
- Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
- Improved support for KeyboardInterrupts (#20961)
- Extract timezone info from python datetimes (#20822)
- Add hint for `POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY` to error message (#20942)
- Filter Parquet pages with `ParquetColumnExpr` (#20714)
🐞 Bug fixes
- Resolve `get()` SchemaMismatch panic (#22350)
- Panic in group_by_dynamic on single-row df with group_by (#22597)
- Add `new_streaming` feature to `polars` crate (#22601)
- Consistently use Unix epoch as origin for `dt.truncate` (except weekly buckets which start on Mondays) (#22592)
- Fix interpolate on dtype Decimal (#22541)
- CSV count rows skipped last line if file did not end with newline (#22577)
- Make nested strict casting actually strict (#22497)
- Make `replace` and `replace_strict` mapping use list literals (#22566)
- Allow pivot on `Time` column (#22550)
- Fix error when providing CSV schema with extra columns (#22544)
- Panic on bitwise op between Series and Expr (#22527)
- Multi-selector regex expansion (#22542)
- Streaming outer join coalesce bug (#22530)
- Bug in `.unique()` followed by `.slice()` (#22471)
- Fix error reading parquet with datetimes written by pandas (#22524)
- Fix `schema_overrides` not taking effect in NDJSON (#22521)
- Fold flags and verify scalar correctness in apply (#22519)
- Invalid values were triggering panics instead of returning `null` in `dt.to_date`/`dt.to_datetime` (#22500)
- Incorrectly dropped sort after unique for some queries (#22489)
- Fix incorrect ternary agg state with mixed columns and scalars (#22496)
- Make `replace` and `replace_strict` properly elementwise (#22465)
- Fix index out of bounds panic on parquet prefiltering (#22458)
- Integer underflow when checking parquet UTF-8 (#22472)
- Add implementation for `array.get` with idx overflow (#22449)
- Deprecate `str.` collection functions with flat strings and mark as elementwise (#22461)
- Deprecate flat `list.gather` and mark as elementwise (#22456)
- Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
- Reading of reencoded categorical in Parquet (#22436)
- Last thread in parquet predicate filter oob (#22429)
- Ensure at least one column projected for AnonymousScan (#22411)
- Fix rust -> python -> rust for map_batches (#22407)
- Fix chaining pl.lit(<list>) with .list.get(pl.col(...)) (#22367)
- Panic when visualizing streaming physical plan with joins (#22404)
- Fix incorrect filter after `LazyFrame.rename().select()` (#22380)
- Fix `select(len())` performance regression (#22363)
- Don't leak state during prefill CSE cache (#22341)
- Maintain float32 type in partitioned group-by (#22340)
- Resolve streaming panic on multiple `merge_sorted` (#22205)
- Fix ndjson nested types (#22325)
- Fix nested datetypes in ndjson (#22321)
- Check matching lengths for `pl.corr` (#22305)
- Move type coercion for `pl.duration` to planner (#22304)
- Check dtype to avoid panic with mixed types in min/max_horizontal (#21857)
- Coalesce correct column for new streaming full join (#22301)
- Don't collect `NaN` from Parquet Statistics (#22294)
- Set revmap for empty `AnyValue` to `Series` (#22293)
- Add an `__all__` entry to internal type definition module (#22254)
- Too-strict SQL UDF schema validation (#20202)
- Datetime parser was incorrectly parsing 8-digit fractional seconds when format specified to expect 9 (#22180)
- Expr.over() returning List type when DataFrame is empty (#22201)
- Deprecate using `is_in` with 2 equal types and mark as elementwise (#22178)
- Duplicate key column name in streaming group_by due to CSE (#22280)
- Raise `ColumnNotFoundError` for missing columns in `join_where` (#22268)
- Parquet filters for logical types and operations (#22253)
- Ensure floating-point accuracy in `hist` (#22245)
- Fix `str.to_integer` panics for certain inputs (#22243)
- Check matching key datatypes for new streaming joins (#22247)
- Incorrect length BinaryArray/ListBuilder (#22227)
- Incorrect condition on empty inner join fast path (#22208)
- Fallback predicate filter for `min=max` with `is_in` (#22213)
- Don't panic for `LruCachedFunc` for `size=0` (#22215)
- Writing masked out list values to json (#22210)
- Deadlock in streaming distributor (#22207)
- Implode in agg (#22197)
- Reduce GIL hold time for IO plugins in new-streaming (#22186)
- Enhance predicate validation and cast safety in `join_where` (#22112)
- Handle Parquet with compressed empty DataPage v2 (#22172)
- Schema error during lowering (#22175)
- Rewrite unroll of overlapping groups to mitigate out of range index panic (#22072)
- Incorrect rounding for very large/small numbers (#22173)
- Allow set input to `list.set_*` operations (#22163)
- Deadlock in join due to rayon nested task-stealing (#22159)
- Mark `Expr.repeat_by` as elementwise (#22068)
- Fix csv serializer panic by supporting ScalarColumn in as_single_chunk (#22146)
- Raise an error if a number doesn't have associated unit in duration strings (#22035)
- Add `i128` as supertype to boolean (#22138)
- Add broadcasts and error messages for many elementwise operations (#22130)
- Throw error for `n=0` on `list.gather_every` (#22122)
- Throw error for unsupported rolling operations (#22121)
- Error on unequal length `str.to_integer` arguments (#22100)
- Make bottom interval closed in `hist` (#22090)
- Avoid panic with strptime for out-of-bounds dates (#21208)
- Join revmaps for categoricals in `merge_sorted` (#21976)
- Fix glob expansion matching extra files (#21991)
- Ensure SQL dot-notation for nested column fields resolves correctly (#22109)
- Parquet filter performance regression from multiscan dispatch (#22116)
- Panic in `GroupBySinkState::into_source` with empty locals (#22095)
- Without the new_streaming feature, `Engine::Auto` always selects the in-memory engine (#22074)
- Fix to_integer panicking on invalid base (#22052)
- Panic for unequal length `ewm_mean_by` args (#22093)
- Add scalarity checks to `pl.repeat` (#22088)
- Type check `n` parameter of `pl.repeat` (#22071)
- Mark `bitwise_{count,leading,trailing}_{ones,zeros}` as elementwise (#22044)
- Mark `pl.*_ranges` functions correctly as element-wise (#22059)
- Correctly type check `pl.arctan2` (#22060)
- Mark `pl.business_day_count` as elementwise (#22055)
- Mark `str.to_decimal` properly as non-elementwise (#22040)
- Documented return type for `bin.encode` and `bin.decode` (#22022)
- Output name for `AExpr::Len` (#22041)
- Copy exprs from sink (#22029)
- Revert #22017 and improve block(_in_place)_on doc comment (#22031)
- Remove outdated depth warning (#22030)
- Expression pl.concat was incorrectly marked as elementwise (#22019)
- Use block_in_place_on to start streaming (#22017)
- Panic on empty aggregation in streaming (#22016)
- Error instead of panic for invalid durations in `dt.offset_by()` and `dt.round()` (#21982)
- Raise error instead of silently appending NULL in NDJSON parsing (#21953)
- Ensure AV is static before pushing to row buffer (#21967)
- Deadlock in new-streaming multiplexer (#21963)
- Release GIL in `collect_with_callback` (#21941)
- Panic in new RegexCache (#21935)
- Type hint of `cs.exclude()` is `SelectorType` instead of `Expr` (#21892)
- Allow `pivot` on empty frame for all integer index dtypes (#21890)
- Null panic on decimal aggregate (#21873)
- Join with categoricals on new-streaming engine (#21825)
- Fix div 0 partitioned group-by (#21842)
- Incorrect quote check in CSV parser (#21826)
- Respect header separator in `sink_csv` (#21814)
- Deprecation of `streaming=False` (#21813)
- Fix collect_all type-coercion (#21810)
- Memory leaks in SharedStorage (#21798)
- Make `None` refer to `uncompressed` in `sink_ipc` (#21786)
- Fix lazy schema for truediv ops involving List/Array dtypes (#21764)
- Fix error due to race condition in file cache (#21753)
- Clear NaNs due to zero-weight division in rolling var/std (#21761)
- Allow init from BigQuery Arrow data containing ExtensionType cols with irrelevant metadata (#21492)
- Disallow cast from boolean to categorical/enum (#21714)
- Don't check sortedness in `join_asof` when 'by' groups supplied, but issue warning (#21724)
- Incorrect multithread path taken for aggregations (#21727)
- Disallow cast to empty Enum (#21715)
- Fix `list.mean` and `list.median` returning Float64 for temporal types (#21144)
- Incorrect (FixedSize)ListArrayBuilder gather implementation (#21716)
- Always fallback in SkipBatchPredicate (#21711)
- New streaming multiscan deadlock (#21694)
- Ensure new-streaming join BuildState is correct even if never fed morsels (#21708)
- IO plugin; support empty iterator (#21704)
- Support nulls in multi-column sort (#21702)
- Window function check length of groups state (#21697)
- Support 128 sum reduction on new streaming (#21691)
- IPC round-trip of list of empty view with non-empty bufferset (#21671)
- Variance can never be negative (#21678)
- Incorrect loop length in new-streaming group by (#21670)
- Right join on multiple columns not coalescing left_on columns (#21669)
- Casting Struct to String panics if n_chunks > 1 (#21656)
- Fix `Future attached to different loop` error on `read_database_uri` (#21641)
- Fix deadlock in cache + hconcat (#21640)
- Properly handle phase transitions in row-wise sinks (#21600)
- Enable new streaming memory sinks by default (#21589)
- Always use global registry for object (#21622)
- Check enum categories when reading csv (#21619)
- Unspecialized prefiltering on nullable arrays (#21611)
- Take into account scalar/partitioned columns in DataFrame::split_chunks (#21606)
- Bad null handling in unordered row encoding (#21603)
- Fix deadlock in new streaming CSV / NDJSON sinks (#21598)
- Bad view index in BinaryViewBuilder (#21590)
- Fix CSV count with comment prefix skipped empty lines (#21577)
- New streaming IPC enum scan (#21570)
- Several aspects related to ParquetColumnExpr (#21563)
- Don't hit parquet::pre-filtered in case of pre-slice (#21565)
- Categorical min/max panicking when string cache is enabled (#21552)
- Don't encode IPC record batch twice (#21525)
- Respect rewriting flag in Node rewriter (#21516)
- Correct skip batch predicate for partial statistics (#21502)
- Make the Parquet Sink properly phase aware (#21499)
- Don't divide by zero in partitioned group-by (#21498)
- Create new linearizer between rowwise new streaming sink phases (#21490)
- Don't drop rows in sinks between new streaming phases (#21489)
- Incorrect lazy schema for `Expr.list.diff` (#21484)
- Fix unwrap None panic when filtering delta with missing columns (#21453)
- Use stable sort for rolling-groupby (#21444)
- Method `dt.ordinal_day` was returning UTC results as opposed to those on the local timestamp (#21410)
- Use Kahan summation for rolling sum kernels. Fix numerical stability issues (#21413)
- Add scalar checks for `n` and `fill_value` parameters in `shift` (#21292)
- Upcast small integer dtypes for rolling sum operations (#21397)
- Don't silently produce null values from invalid input to `pl.datetime` and `pl.date` (#21013)
- Allow duration multiplied w/ primitive to propagate in IR schema (#21394)
- Struct arithmetic broadcasting behavior (#21382)
- Prefiltered optional plain primitive kernel (#21381)
- Panic when projecting only row index from IPC file (#21361)
- Properly update groups after `gather` in aggregation context (#21369)
- Mark test as may_fail_auto_streaming (#21373)
- Properly set `fast_unique` in EnumBuilder (#21366)
- Rust test race condition (#21368)
- Fix height validation in `hstack_mut` being bypassed when adding to empty frame (#21335)
- Fix unequal DataFrame column heights from parquet hive scan with filter (#21340)
- Ensure ASCII ellipsis fits in column width (#21275)
- Fix ColumnNotFound error selecting `len()` after semi/anti join (#21355)
- Merge Parquet nested and flat decoders (#21342)
- Incorrect atomic ordering in Connector (#21341)
- Method `dt.offset_by` was discarding month and year info if day was included in offset for timezone-aware columns (#21291)
- Fix pickling `polars.col` on Python versions <3.11 (#21333)
- Fix duplicate column names after join if suffix already present (#21315)
- Skip Batches Expression for boolean literals (#21310)
- Fix performance regression for eager `join_where` (#21308)
- Fix incorrect predicate pushdown for predicates referring to right-join key columns (#21293)
- Panic in `to_physical` for series of arrays and lists (#21289)
- Fix inconsistency between code and comment (#21294)
- Resolve deadlock due to leaking in Connector recv drop (#21296)
- Incorrect result for merge_sorted with lexical categorical (#21278)
- Add `Int128` path for `join_asof` (#21282)
- Categorical min/max returning String dtype rather than Categorical (#21232)
- Checking overflow in Sliced function (#21207)
- Adding a struct field using a literal raises InvalidOperationError (#21254)
- Return nulls for `is_finite`, `is_infinite`, and `is_nan` when dtype is `pl.Null` (#21253)
- Properly implement and test Skip Batch Predicate (#21269)
- Infinite recursion when broadcasting into struct zip_outer_validity (#21268)
- Deadlock due to bad logic in new-streaming join sampling (#21265)
- Incorrect result for top_k/bottom_k when input is sorted (#21264)
- UTF-8 validation of nested string slice in Parquet (#21262)
- Raise instead of panicking when casting a Series to a Struct with the wrong number of fields (#21213)
- Defer credential provider resolution to take place at query collection instead of construction (#21225)
- Do not panic in `strptime()` if `format` ends with '%' (#21176)
- Raise error instead of panicking for unsupported SQL operations (#20789)
- Projection of only row index in new streaming IPC (#21167)
- Fix projection count query optimization (#21162)
- Fix `Expr.over` applying scale incorrectly for Decimal types (#21140)
- Fix IO plugin predicate with failed serialization (#21136)
- Correctly implement take_(opt_)chunked_unchecked for structs (#21134)
- Restore printing backtraces on panics (#21131)
- Use microseconds for Unity catalog datetime unit (#21122)
- Fix incorrect output height for SQL `SELECT COUNT(*) FROM` (#21108)
- Validate/coerce types for comparisons within join_where predicates (#21049)
- Fix minor histogram issues (#21088)
- Do not auto-init credential providers if credential fetch returns error (#21090)
- Fix `join_where` incorrectly dropping transformations on RHS of equality expressions (#21067)
- Quadratic allocations when loading nested Parquet column metadata (#21050)
- Invalidate sortedness flag when sorting from pl.Categorical to pl.Categorical("lexical") (#21044)
- Calling `top_k` on list type panics (#21043)
- Fix rolling on empty DataFrame panicking (#21042)
- Fix `set_tbl_width_chars` panicking with negative width (#20906)
- Fix `merge_sorted` producing incorrect results or panicking for some logical types (#21018)
- Fix all-null list aggregations returning Null dtype (#20992)
- Ensure scalar-only with_columns are broadcasted on new-streaming (#20983)
- Improve SQL interface behaviour when `INTERVAL` is not a fixed duration (#20958)
- Add Arrow Float16 conversion DataType (#20970)
- Feature-gate `ClosedWindow` (#20963)
- Revert length check of `patterns` in `str.extract_many()` (#20953)
- Add maintain order for flaky new-streaming test (#20954)
- Allow for respawning of new streaming sinks (#20934)
- Ensure Function name correctness in cse (#20929)
📖 Documentation
- Improve `join` documentation (#22556)
- Fix typo in structs page (#22504)
- Add multiplexing page (#22426)
- Improve interpolation documentation to clarify behavior of null values (#22274)
- Add user guide section on working with Sheets in Colab (#22161)
- Update distributed engine docs (#22128)
- Add Polars Cloud release notes (#22021)
- Remove trailing space in settings POLARS_CLOUD_CLIENT_ID (#21995)
- Fix typo (#21954)
- Fix 'pickleable' typo in docs (#21938)
- Document `polars_stream::run_query` (#21928)
- Change ctx to compute=ctx for all remote query examples (#21930)
- Add sources and sinks to user-guide (#21780)
- Add skrub to ecosystem.md (#21760)
- Make python/rust getting-started consistent and clarify performance risk of infer_schema_length=None (#21734)
- Update Polars Cloud interactive workflow examples (#21609)
- Add cloud api reference to Ref guide (#21566)
- Fix typo (#21554)
- Move llm page under misc (#21550)
- Polars Cloud docs (#21548)
- Fix initial selector example (#21321)
- Add pandas strictness API difference (#21312)
- Add logo to Ask AI (#21261)
- Fix docs for Catalog (#21252)
- AI widget again (#21257)
- Revert plugin (#21250)
- Add kappa ask ai widget (#21243)
- Improve Arrow key feature description (#21171)
- Correct small typo in `FileInfo` (#21150)
- Correct Arrow misconception (#21053)
- Document IO plugins (#20982)
- Ensure `set_sorted` description references single-column behavior (#20709)
📦 Build system
- Update object_store to 0.12 (#22512)
- Speed up CI by running a few more tests in parallel (#21057)
🛠️ Other improvements
- Remove confusing error context calling `.collect(_eager=True)` (#22602)
- Fix test_truncate_path test case (#22598)
- Unify function flags into 1 bitset (#22573)
- Display the operation behind `in-memory-map` (#22552)
- Add extend_each_repeated to builders (#22549)
- Improve zip state update (#22526)
- Don't store name/dtype in grouper (#22525)
- Add structure for dispatching iceberg to native scans (#22405)
- Make node names in new-streaming consistent with plan visualization (#22477)
- Update the cloud eligibility check (#22459)
- Remove unused reduction code (#22462)
- Pin to explicit macOS version in code coverage (#22432)
- Add test for `implode` + `over` (#22437)
- Fix CI by removing use_legacy_dataset (#22438)
- Only use pytorch index-url for `pytorch` package (#22355)
- Install pytorch for 3.13 on Windows (#22356)
- Improve new-streaming multiscan physical plan visualization (#22415)
- Make interpolate fix more robust (#22421)
- Fix interpolate test (#22417)
- Reduce hot table size in debug mode (#22400)
- Replace intrinsic with non-intrinsic (#22401)
- Improve code re-use for groupby and delete unused code (#22382)
- Make streaming dispatch public (#22347)
- Update rustc to 'nightly-2025-04-19' (#22342)
- Update mozilla-actions/sccache-action (#22319)
- Introduce and use `UnifiedScanArgs` (#22314)
- Purge old parquet and scan code (#22226)
- Add an `__all__` entry to internal type definition module (#22254)
- Move missing / cast columns policies from `polars-stream` to `polars-plan` (#22277)
- Add online skew/kurtosis algorithm for future use in rolling kernels (#22261)
- Use more logical delta definition in var/cov calcs (#22256)
- Add Polars Cloud 0.0.7 release notes (#22223)
- Change format name from list to implode (#22240)
- Make other parallel parquet modes filter afterwards (#22228)
- Close async reader issues (#22224)
- Add BinaryArrayBuilder (#22225)
- Remove old `MultiScanExec` for in-memory (#22184)
- Remove unused multiscan code (#22183)
- Separate `FunctionOptions` from DSL calls (#22133)
- Dispatch new-streaming CSV source to updated multiscan (#21994)
- Dispatch new-streaming IO plugin source to updated multiscan (#22009)
- Undeprecate `backward_fill` and `forward_fill` (#22156)
- Dispatch new-streaming IPC source to updated multiscan (#21993)
- Schema callback for multi file reader interface (#22152)
- Handle conversion of Duration specially in pyir (#22101)
- Deprecate duplicate `backward_fill` and `forward_fill` interface (#22083)
- Solve clippy lints for 1.86 (#22102)
- Remove rust exclusive `MaxBound` and `MinBound` fill strategies (#22063)
- Simplify HashKeys with code re-use (#22037)
- Dispatch new-streaming Parquet source to updated multiscan (#21992)
- Dispatch new-streaming NDJSON source to updated multiscan (#21990)
- Add updated multiscan pipeline (#21925)
- Fix slicing 0-width morsels on new streaming (#21940)
- Change dynamic literals to be separate category (#21849)
- Add bridge for multi scan (#21863)
- Add POLARS_TIMEOUT_MS for timing out slow Polars tests (#21887)
- Remove `FileType` in favor of `ReaderCapabilities` for new-streaming multiscan (#21881)
- Disable --dist loadgroup in pytest (#21885)
- Fix refcount assert being messed up by pytest assertion magic (#21884)
- Remove `FileSink` from `polars-parquet` (#21865)
- Add new `FileReader`/`FileReaderBuilder` interfaces for multi file (#21839)
- Mark `with_row_index_mut` as unsafe (#21841)
- Add ops components for refactored multiscan (#21812)
- Add env vars to configure new-streaming buffer sizes (#21818)
- Add MorselLinearizer convenience wrapper (#21811)
- Add a slice enum to polars-utils (#21795)
- Pass around StreamingExecutionState, containing num_pipelines and ExecutionState (#21796)
- Add allocation-free empty/default to SharedStorage (#21768)
- Remove variance numerical stability hack (#21749)
- Only use chrono_tz timezones in hypothesis testing (#21721)
- Remove order check from flaky test (#21730)
- Add sinks into the DSL before optimization (#21713)
- Add missing test case for #21701 (#21709)
- Remove old-streaming from engine argument (#21667)
- Add as_phys_any to PrivateSeries for downcasting (#21696)
- Use FFI to read dataframe instead of transmute (#21673)
- Work around typos ignore bug (#21672)
- Add test for `datetime_range` nanosecond overflow (#21354)
- Update to edition 2024 (#21662)
- Update rustc (#21647)
- Remove `once_cell` in favor of `std` equivalents (#21639)
- Remove unused flag (#21642)
- Support object from chunks (#21636)
- Rename `block_on_potential_spawn` to `block_in_place_on` (#21627)
- Push versioned docs on workflow dispatch (#21630)
- Fail docs early (#21629)
- Check major/minor in docs (#21626)
- Add docs workflow (#21624)
- Introduce `Writeable` and `AsyncWriteable` (#21599)
- Add test for 21581 (#21617)
- Remove even more parquet multiscan handling (#21601)
- Add freeze_reset to the builders (#21587)
- Remove multiscan handling from new streaming parquet source (#21584)
- Add opt_gather and extend_nulls to builders (#21582)
- Avoid downloading full parquet when initializing new streaming parquet source (#21580)
- Prepare skeleton for partitioning sinks (#21536)
- Add SeriesBuilder and DataFrameBuilder (#21567)
- Use a oneshot channel for `unrestricted_row_count`, fix panic in new-streaming negative slice (#21559)
- Don't take ownership of IRplan in new streaming engine (#21551)
- Refactor code for re-use by streaming NDJSON source (#21520)
- Simplify the phase handling of new streaming sinks (#21530)
- Map Polars `AssertionError` to pyo3's `AssertionError` and improve macro flexibility (#21495)
- Improve IPC sink node parallelism (#21505)
- Use tikv-jemallocator (#21486)
- Rename 'join_nulls' parameter to 'nulls_equal' in join functions (#21507)
- Add remove_one to VarState for use in rolling variance (#21504)
- Move rolling to polars-compute (#21503)
- Remove Growable in favor of ArrayBuilder (#21500)
- Add ArrayBuilders (#21370)
- Introduce a Sink Node trait in the new streaming engine (#21458)
- Add test for rolling stability sort (#21456)
- Add test for empty `.is_in` predicate filter (#21455)
- Test for unique length on multiple columns (#21418)
- Dispatch ChunkedArray serialization to Series (IPC) serialization (#21422)
- Refactor ordinal_day (#21416)
- Move dsl related code under dsl/* (#21367)
- Move storage of hive partitions to DataFrame (#21364)
- Feature gate merge sorted in new streaming engine (#21338)
- Remove new streaming old multiscan (#21300)
- Add tests for fixed open issues (#21185)
- Try to mimic all steps (#21249)
- Require version for POLARS_VERSION (#21248)
- Fix docs (#21246)
- Remove unused file (#21240)
- Improve hash join build sample implementation (#21236)
- Add use_field_init_shorthand = true to rustfmt (#21237)
- Don't mutate arena by default in Rewriting Visitor (#21234)
- Disable the TraceMalloc allocator (#21231)
- Use distributor channel in new-streaming CSV reader and prepare scanning routine for true parallel reading (#21189)
- Add feature gate to old streaming deprecation warning (#21179)
- Install seaborn when running remote benchmark (#21168)
- Add test for equality filters in Parquet (#21114)
- Add various tests for open issues (#21075)
- Move python dsl and builder_dsl code to dsl folder (#21077)
- Organize python related logics in polars-plan (#21070)
- Improve binary dispatch (#21061)
- Skip physical order test (#21060)
- Add force_populate_read for debugging pagefault performance problems (#21054)
- Fix new ruff lints (#21040)
- Use string-based keyboard interrupt panic detection (#21030)
- Add make fix for running cargo clippy --fix (#21024)
- Spawn threads on our rayon pool in new-streaming (#21012)
- Add tests for resolved issues (#20999)
- Update code coverage workflow to use macos-latest runners (#20995)
- Remove unnecessary unsafe around warning function (#20985)
- Remove unused arrow file (#20974)
- Remove thiserror dependency (#20979)
- Deprecate the old streaming engine (#20949)
- Move `dt.replace` tests to dedicated file, add "typing :: typed" classifier, remove unused testing function (#20945)
- Extract merge sorted IR node (#20939)
- Update copyright year (#20764)
- Move Parquet deserialization to `BitmapBuilder` (#20896)
- Also publish polars-python (#20933)
- Remove verify_dict_indices_slice from main (#20928)
Thank you to all our contributors for making this release possible! @AH-Merii, @DavideCanton, @DeflateAwning, @EnricoMi, @GaelVaroquaux, @GiovanniGiacometti, @Jacob640, @JakubValtar, @Jesse-Bakker, @Julian-J-S, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @MaxJackson, @NathanHu725, @NeejWeej, @Shoeboxam, @YichiZhang0613, @aberres, @adamreeve, @alexander-beedie, @amotzop, @anath2, @arnabanimesh, @axellpadilla, @banflam, @borchero, @braaannigan, @brianmakesthings, @bschoenmaeckers, @cmdlineluser, @cnpryer, @coastalwhite, @d-reynol, @dependabot[bot], @dongchao-1, @edwinvehmaanpera, @eitsupi, @erikbrinkman, @etiennebacher, @florian-klein, @ghuls, @hemanth94, @henryharbeck, @itamarst, @jqnatividad, @jrycw, @kdn36, @kevinjqliu, @kgv, @lmmx, @lukemanley, @math-hiyoko, @mcrumiller, @mroeschke, @mullimanko, @nameexhaustion, @orlp, @petrosbar, @r-brink, @rgertenbach, @ritchie46, @sebasv, @siddharth-vi, @silannisik, @skritsotalakis, @stijnherfst, @taureandyernv, @thomasjpfan, @wence-, @ydagosto, @yiteng-guo, @zachlefevre and dependabot[bot]