SystemDS 3.1.0-rc1 Release

Release date: March 2, 2023
Previous version: 3.0.0-rc2 (released June 15, 2022)
Magnitude: 48,864 Diff Delta
Contributors: 17 total committers
Commits: 215 commits in this release


Top Contributors in 3.1.0-rc1

Baunsgaard
kev-inn
BACtaki
mboehm7
sebwrede
MKehayov
phaniarnab
ilovemesomeramen
wedenigt
Shafaq-Siddiqi



What's Changed

Sub-task

[SYSTEMDS-2411] - Performance codegen kmeans mnist80m w/ compression
[SYSTEMDS-3088] - Prefetch instruction
[SYSTEMDS-3098] - Broadcast instruction
[SYSTEMDS-3256] - Create apply functions for cleaning primitives
[SYSTEMDS-3286] - LogicalEnumerator change with transitions concept and cleanups
[SYSTEMDS-3347] - Flatten the nested loop for parallel pipelines execution
[SYSTEMDS-3376] - Adding apply_pipeline() builtin for cleaning pipelines API
[SYSTEMDS-3401] - Release docker images with GitHub actions
[SYSTEMDS-3422] - Add monitoring tool testing workflows
[SYSTEMDS-3443] - Asynchronous Execution and Persist Spark Transformations
[SYSTEMDS-3466] - Future-based asynchronous execution of Spark actions
[SYSTEMDS-3469] - New operator linearization order to maximize inter-operator parallelism
[SYSTEMDS-3470] - Lineage-based reuse of Spark actions
[SYSTEMDS-3473] - Push down rmvar instructions for asynchronous instructions
[SYSTEMDS-3474] - Lineage-based reuse of asynchronous operators
[SYSTEMDS-3479] - Persist and reuse Spark RDDs
[SYSTEMDS-3497] - Refactor to add LOP rewrite step in compilation

Bug

[SYSTEMDS-1026] - Fix memory configuration in sparkDML.sh
[SYSTEMDS-1281] - OOM Error On Binary Write
[SYSTEMDS-1283] - Out of memory error
[SYSTEMDS-2948] - CLA Improved Run estimation
[SYSTEMDS-3045] - AttributeError: Function definition not found
[SYSTEMDS-3272] - applySchema built-in to set the schema of frame from DML
[SYSTEMDS-3339] - CSR TSMM left with filled rows bug
[SYSTEMDS-3353] - Sparse TSMM dense row blocks CSR
[SYSTEMDS-3354] - py4j.Py4JException: Method exceptionString([class org.apache.spark.SparkConf]) does not exist
[SYSTEMDS-3355] - MatrixBlock size using CSR when allowed
[SYSTEMDS-3379] - Federated NaN Values
[SYSTEMDS-3390] - countDistinctApprox() operation in AggregateUnaryCPInstruction is inefficient for row/col aggregations
[SYSTEMDS-3391] - Correct the release artifact generation date
[SYSTEMDS-3394] - Log4j incompatible dependencies
[SYSTEMDS-3396] - ConcurrentModificationException in federated execution
[SYSTEMDS-3398] - Jackson Core missing for json writing and reading in reduced binary
[SYSTEMDS-3400] - Fix Java doc warnings
[SYSTEMDS-3408] - Enqueue output not UTF-8 python
[SYSTEMDS-3409] - Read CSV directly without mtd python
[SYSTEMDS-3411] - Python configuration not loading defaults
[SYSTEMDS-3412] - Matrix Multiplication crash in Spark
[SYSTEMDS-3414] - Pipelines failing in Hybrid execution
[SYSTEMDS-3415] - Built-in tests failure in Git actions
[SYSTEMDS-3416] - Cleaning Pipelines failed with No space left on device
[SYSTEMDS-3417] - IndexOutOfBounds due to int overflow on replace
[SYSTEMDS-3418] - Cleaning Pipelines: Replace function failure in hybrid execution
[SYSTEMDS-3419] - Cleaning Pipelines: Block Sizes mismatch
[SYSTEMDS-3420] - Cleaning Pipelines in hybrid mode: Invalid block dimensions error
[SYSTEMDS-3424] - Federated Statistics print in non federated scenario
[SYSTEMDS-3425] - Spark Aggregate Binary operations parse to Fed instruction
[SYSTEMDS-3432] - FederationUtils.bindResponses causes out of memory because of sparse matrices.
[SYSTEMDS-3433] - Python IDE test Docs fail
[SYSTEMDS-3435] - MSVM robustness for non-existing classes
[SYSTEMDS-3436] - CLA ArrayOutOfBounds in sample
[SYSTEMDS-3437] - CLA Invalid Unique estimate DDC
[SYSTEMDS-3439] - Federated read cache cannot be disabled
[SYSTEMDS-3442] - Monitoring Heavy hitters not always correct list
[SYSTEMDS-3451] - Slow Federated Mlogreg on Criteo (dummy-coded)
[SYSTEMDS-3452] - Incorrect warning when reading scalars
[SYSTEMDS-3476] - Spark with default settings
[SYSTEMDS-3477] - Cleaning Pipelines: Task Parallel Experiments failing in spark mode
[SYSTEMDS-3498] - Unique() crashes with iterator EOF on vectors with >1K distinct items
[SYSTEMDS-3500] - Perftest: Mlogreg on 1M_1k_dense w/ unnecessary spark jobs
[SYSTEMDS-3501] - Perftest: lmDS on 1M_1k_dense with unnecessary spark tsmm
[SYSTEMDS-3503] - Java doc warnings

Epic

[SYSTEMDS-450] - Extended spark interfaces
[SYSTEMDS-3445] - Combining compression schemes together
[SYSTEMDS-3459] - Reorganization and cleanup of the internal representation of FrameBlocks

New Feature

[SYSTEMDS-2551] - Federated Compression Instruction
[SYSTEMDS-2699] - CLA IO Compressed Matrices
[SYSTEMDS-2754] - Compressed Max/Min Index support
[SYSTEMDS-2830] - Functional Compression
[SYSTEMDS-3280] - Homomorphic Encryption for Federated Parameter Servers
[SYSTEMDS-3303] - NN Builtin: Attention Layer
[SYSTEMDS-3325] - Multi-threaded tokenization
[SYSTEMDS-3337] - CLA TSMM direct multiplication
[SYSTEMDS-3360] - Federated async compression
[SYSTEMDS-3361] - Federated Workload-aware Compression
[SYSTEMDS-3369] - Timeout setting for all federated tests
[SYSTEMDS-3374] - Federated primitive for transferring a local data object to a federated representation
[SYSTEMDS-3404] - Synchronous with backup workers mode for Parameter Servers
[SYSTEMDS-3405] - Federated Write at site
[SYSTEMDS-3438] - CLA RowSlice compressed return
[SYSTEMDS-3478] - Bitset array for Frames
[SYSTEMDS-3481] - Frame from MatrixBlock improvement
[SYSTEMDS-3493] - Python windows Install
[SYSTEMDS-3494] - Python 3.9 support
[SYSTEMDS-3495] - Parallel Compressed Encode

Story

[SYSTEMDS-2783] - Lineage-based reuse in federated execution
[SYSTEMDS-3087] - Memory management and lazy evaluation in dynamic environments
[SYSTEMDS-3463] - Add unique() built-in function

Improvement

[SYSTEMDS-295] - Unexpected order when print !boolean
[SYSTEMDS-1169] - Clean Up and Automate Python Tests
[SYSTEMDS-1406] - Fix whitespace issues in main algorithms
[SYSTEMDS-1532] - Introduce Python scripts to launch SystemML from the Command Line
[SYSTEMDS-2513] - Improve the Development and user experience on Windows
[SYSTEMDS-2897] - CLA decompressing write
[SYSTEMDS-3185] - Federated Multi Tenant Backend
[SYSTEMDS-3187] - Add documentation for the release scripts
[SYSTEMDS-3192] - Test large dense block compression
[SYSTEMDS-3254] - CountDistinct Col and Row & Unique
[SYSTEMDS-3282] - Upper bound for number of decoders
[SYSTEMDS-3283] - Multi-threaded ctable instruction
[SYSTEMDS-3319] - CLA Generalize Bin Packing
[SYSTEMDS-3323] - CLA move combine of empty to estim
[SYSTEMDS-3328] - Federated transform for equi-height
[SYSTEMDS-3359] - Sample-based Recode Map Size Estimation
[SYSTEMDS-3386] - Refactor runtime replacement of CP or SP instructions with FED instructions
[SYSTEMDS-3393] - Use Java (JDK17) SIMD Implementation
[SYSTEMDS-3413] - Add row/col aggregation support to countDistinct() builtin function
[SYSTEMDS-3429] - Use Local Level of Parallelism when Transformencoding in Federated Mode
[SYSTEMDS-3440] - Federated Requests Coordinator Hostname
[SYSTEMDS-3444] - Spark Write CLA
[SYSTEMDS-3446] - DDC Append
[SYSTEMDS-3447] - SDC Append
[SYSTEMDS-3448] - Uncompressed Append
[SYSTEMDS-3449] - Const/Empty append
[SYSTEMDS-3450] - DDCFOR Append
[SYSTEMDS-3453] - Offsets Append
[SYSTEMDS-3454] - CLA Scheme primitive
[SYSTEMDS-3455] - Improved multi-threaded unary operations
[SYSTEMDS-3456] - MatrixBlock equals
[SYSTEMDS-3457] - MatrixBlock equals Sparse Specialization
[SYSTEMDS-3458] - Add support for Spark backend to countDistinct() builtin function
[SYSTEMDS-3460] - Move FrameBlock out of MatrixBlock path
[SYSTEMDS-3461] - FrameBlock Arrays separation
[SYSTEMDS-3462] - FrameBlock Iterators Factory Pattern
[SYSTEMDS-3464] - Python Combine Write
[SYSTEMDS-3465] - Typed return on CacheBlock Interface Slice
[SYSTEMDS-3467] - Add support for MULTI_BLOCK Spark backend support for countDistinct()
[SYSTEMDS-3471] - Enable multi-threaded transformencode/apply
[SYSTEMDS-3472] - Spark Append Frame Bug
[SYSTEMDS-3475] - Spark update version 3.3.1
[SYSTEMDS-3480] - Verify release scripts with github workflows
[SYSTEMDS-3482] - Parallel Hadoop IO startup
[SYSTEMDS-3484] - FrameAppend optimization
[SYSTEMDS-3485] - Precompile detect type patterns
[SYSTEMDS-3486] - Character Array Type
[SYSTEMDS-3487] - Array primitives with null
[SYSTEMDS-3488] - Compressed Frame Write
[SYSTEMDS-3489] - CLA Compress NaN
[SYSTEMDS-3490] - Compressed Transform Encode
[SYSTEMDS-3491] - CLA Specialized Column Indexes

Test

[SYSTEMDS-3395] - ColGroup Equivalence Tests
[SYSTEMDS-3397] - Python NN testExample

Wish

[SYSTEMDS-3171] - GIO - Mapping from binary data

Task

[SYSTEMDS-209] - Algorithm wrappers (ml pipelines, ml context)
[SYSTEMDS-563] - MR operations over frames
[SYSTEMDS-3148] - Federated Performance Tests
[SYSTEMDS-3228] - Builtin for k nearest neighbor graph construction
[SYSTEMDS-3229] - WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform
[SYSTEMDS-3241] - Federated quantile
[SYSTEMDS-3291] - Apply builtin for mice
[SYSTEMDS-3348] - Federated Monitoring Tool
[SYSTEMDS-3496] - New builtin function auc (area under ROC curve)

Dependency upgrade

[SYSTEMDS-3375] - CUDA11 / CUDNN8 support

Documentation

[SYSTEMDS-3407] - GMM is missing docs for seed and verbose
[SYSTEMDS-3434] - Python API does not include params for all kwargs


Full Changelog: https://github.com/apache/systemds/compare/2.2.0-rc1...3.0.0-rc2