LightGBM: v3.3.0 Release

Release date: October 9, 2021
Previous version: v3.2.1 (released April 12, 2021)
Magnitude: 17,227 Diff Delta
Contributors: 32 total committers

258 Commits in this Release

Top Contributors in v3.3.0

jameslamb
StrikerRUS
cyfdecyf
jmoralez
NovusEdge
david-cortes
ffineis
fabsig
sayantan1410
akshitadixit

Release Notes Published

Changes

💡 New Features

  • allow inclusion in C programs @drewmiller (#4608)
  • add param aliases from scikit-learn @StrikerRUS (#4637)
  • [python] add placeholders to titles in plotting functions @StrikerRUS (#4614)
  • [python-package] Support 2d collections as input for init_score in multiclass classification task @jmoralez (#4150)
  • [python] add parameter object_hook to method dump_model @xadupre (#4533)
  • [python] support Dataset.get_data for Sequence input. @cyfdecyf (#4472)
  • [python] allow to pass some params as pathlib.Path objects @StrikerRUS (#4440)
  • [python-package] Create Dataset from multiple data files @cyfdecyf (#4089)
  • [dask] add support for eval sets and custom eval functions @ffineis (#4101)
  • Add linear leaf models to json output (fixes #4186) @btrotta (#4329)
  • [dask] run Dask tests on aarch64 architecture @StrikerRUS (#3996)
  • [python] handle arbitrary length feature names in Python-package @StrikerRUS (#4293)
  • Precise text file parsing @cyfdecyf (#4081)
  • added aliases to params @StrikerRUS (#4205)
  • [swig] add wrapper for LGBM_DatasetGetFeatureNames @shuttie (#4103)
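
The object_hook entry above (#4533) is easiest to picture with Python's standard json module, whose json.loads accepts an argument of the same name and calls it on every decoded JSON object. The sketch below assumes dump_model forwards the hook to the JSON parser in that way. The sample text and the collect_leaf_values helper are made up for illustration and are not real dump_model output.

```python
import json

# Hand-written stand-in for a model dump; real dump_model() output has many more keys.
SAMPLE_DUMP = """
{"tree_info": [
  {"tree_structure": {
     "split_feature": 0,
     "left_child": {"leaf_value": 0.1},
     "right_child": {"leaf_value": -0.2}}}
]}
"""


def collect_leaf_values(json_text):
    """Parse json_text once, recording every leaf_value seen along the way."""
    leaf_values = []

    def hook(node):
        # json.loads calls this for each decoded JSON object (dict),
        # innermost objects first.
        if "leaf_value" in node:
            leaf_values.append(node["leaf_value"])
        return node  # returning the dict unchanged keeps the parsed structure intact

    parsed = json.loads(json_text, object_hook=hook)
    return parsed, leaf_values


parsed, leaves = collect_leaf_values(SAMPLE_DUMP)
print(leaves)
```

With LightGBM installed, a hook of this shape would presumably be passed as dump_model(object_hook=hook), so values can be gathered during parsing instead of in a second walk over the trees.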

🔨 Breaking

  • [python] deprecate "auto" value of ylabel argument of plot_metric() function @StrikerRUS (#4624)
  • [python] rename print_evaluation() into log_evaluation() @StrikerRUS (#4604)
  • [RFC][python] deprecate advanced args of train() and cv() functions and sklearn wrapper @StrikerRUS (#4574)
  • [RFC][python] deprecate silent and standalone verbose args. Prefer global verbose param @StrikerRUS (#4577)
  • [python] add 'auto' value for importance_type param in plotting @StrikerRUS (#4570)
  • [dask] Make output of feature contribution predictions for sparse matrices match those from sklearn estimators (fixes #3881) @jameslamb (#4378)
  • [R-package] change default nrounds to 100 to match LightGBM core library default @david-cortes (#4197)
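
For the print_evaluation() to log_evaluation() rename (#4604), code that has to run against releases on both sides of the change could look the callback up by name. This is a minimal sketch, not part of LightGBM: resolve_eval_logger is a hypothetical helper, and the two SimpleNamespace objects are stand-ins for an older and a newer lightgbm module so the example runs without the library installed.

```python
from types import SimpleNamespace


def resolve_eval_logger(lgb_module):
    """Return log_evaluation if the module has it, else fall back to print_evaluation."""
    factory = getattr(lgb_module, "log_evaluation", None)
    if factory is None:
        # Older releases only expose the previous name.
        factory = getattr(lgb_module, "print_evaluation")
    return factory


# Stand-ins for an old and a new lightgbm module (illustration only).
old_lgb = SimpleNamespace(print_evaluation=lambda period=1: f"print every {period}")
new_lgb = SimpleNamespace(
    print_evaluation=lambda period=1: f"print every {period}",
    log_evaluation=lambda period=1: f"log every {period}",
)

print(resolve_eval_logger(old_lgb)(period=10))
print(resolve_eval_logger(new_lgb)(period=10))
```

In real code the module argument would be the imported lightgbm package, and the returned callback would go into the callbacks list given to train().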

🚀 Efficiency Improvement

  • simplify and speed up comparisons for splits with identical gains @jameslamb (#4542)
  • factor out .size() checks in GetDataType() @jameslamb (#4541)
  • consolidate duplicate conditions in TextReader @jameslamb (#4530)
  • [python] replace numpy.zeros with numpy.empty for the speedup @StrikerRUS (#4410)
  • [R-package] avoid unnecessary computation of std deviations in lgb.cv() @jameslamb (#4360)
  • Replace division of exponential in Gamma loss @lorentzenchr (#4289)
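
The Gamma loss entry above (#4289) refers to a small algebraic rewrite. Assuming the usual Gamma objective with a log link, where the per-row loss is label * exp(-score) + score up to constants, the gradient and Hessian can be written either with a division by exp(score) or with a multiplication by exp(-score). The sketch below only checks that the two forms agree numerically; the function names are made up and this is not LightGBM's C++ code.

```python
import math


def gamma_grad_hess_division(score, label):
    """Gradient and Hessian written with a division by exp(score)."""
    exp_score = math.exp(score)
    return 1.0 - label / exp_score, label / exp_score


def gamma_grad_hess_multiply(score, label):
    """Same quantities written with a multiplication by exp(-score)."""
    exp_neg_score = math.exp(-score)
    return 1.0 - label * exp_neg_score, label * exp_neg_score


for score, label in [(0.0, 1.0), (1.5, 3.0), (-2.0, 0.25)]:
    g_div, h_div = gamma_grad_hess_division(score, label)
    g_mul, h_mul = gamma_grad_hess_multiply(score, label)
    # Both forms should match to floating-point tolerance.
    assert math.isclose(g_div, g_mul, rel_tol=1e-12, abs_tol=1e-12)
    assert math.isclose(h_div, h_mul, rel_tol=1e-12, abs_tol=1e-12)
```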

🐛 Bug Fixes

  • [R-package] fix segfaults caused by missing Booster and Dataset handles (fixes #4208) @jameslamb (#4586)
  • move Network method implementations from network.h to network.cpp (fixes #4464) @jameslamb (#4496)
  • [R-package] prevent memory leak if pointer fails to allocate @david-cortes (#4613)
  • [R-package] Fix R memory leaks (fixes #4282, fixes #3462) @david-cortes (#4597)
  • [python][sklearn] respect eval_at aliases in keyword arguments @StrikerRUS (#4599)
  • [dask] Fixed Dask type annotation @StrikerRUS (#4558)
  • [R-package] allow construction of Dataset from CSV without header (fixes #4553) @jameslamb (#4554)
  • [R-package] fix OpenMP checking on macOS (fixes #4131) @jameslamb (#4507)
  • [R-package] pass R-configured compiler flags to checks in configure @jameslamb (#4506)
  • [R-package] use C++ compiler for pre-compile checks on Windows @jameslamb (#4504)
  • [dask] find all needed ports in each host at once (fixes #4458) @jmoralez (#4498)
  • Fix undefined behavior with NaN input in CategoricalDecision() @hcho3 (#4468)
  • [dask] determine output shape of array in predict (fixes #4285) @jmoralez (#4351)
  • [fix] fix Reservoir Sampling in Sample of random.h (fix #4371 and #4134) @shiyu1994 (#4450)
  • [CUDA] fix CUDA memory error by reducing block number (#4315) @RobinDong (#4327)
  • [R-package] fix protection stack imbalance and unprotected objects (fixes #4390) @fabsig (#4391)
  • [dask] pass additional predict() parameters through when input is a Dask Array @jameslamb (#4399)
  • fix param aliases @StrikerRUS (#4387)
  • sync for init score of binary objective function @loveclj (#4332)
  • Fix undefined behavior in ArrayArgs::Partition() when interval size is 1 (fixes #4272) @kruda (#4280)
  • Log warning instead of fatal when parsing float get under/overflow. @cyfdecyf (#4336)
  • [fix] fix Sample when sampling only one element (fix #4134) @shiyu1994 (#4324)
  • [R-package] move more finalizer logic into C++ side to address memory leaks @jameslamb (#4353)
  • [tests][python] fix f-string in test_dask.py @StrikerRUS (#4373)
  • [fix] skip empty bins when calculating cnt_in_bin in BinMapper::FindBin (fix #4301) @shiyu1994 (#4325)
  • [fix] fix GatherInfoForThresholdNumerical boundary (fix #4286) @shiyu1994 (#4322)
  • fix calculation of weighted gamma loss (fixes #4174) @mayer79 (#4283)
  • [R-package] prevent symbol lookup conflicts (fixes #4045) @jameslamb (#4155)
  • [R-package] avoid misleading warnings when using interaction constraints (fixes #4108) @jameslamb (#4232)
  • [fix] Fix bug in data distributed learning with local empty leaf @shiyu1994 (#4185)
  • fix: Dataset::CreateValid init fields which saves to binary. @cyfdecyf (#4177)
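
Two of the fixes above (#4450 and #4324) concern reservoir sampling in Sample of random.h, including the case of sampling a single element. For readers unfamiliar with the technique, here is a textbook version (Algorithm R) in Python. It is a sketch for orientation only and is not a translation of LightGBM's implementation; the function name and signature are assumptions.

```python
import random


def reservoir_sample(n, k, rng=None):
    """Return k distinct indices from range(n), sorted, using Algorithm R."""
    rng = rng or random.Random()
    if k >= n:
        return list(range(n))  # nothing to sample: every index is kept
    reservoir = list(range(k))  # start with the first k indices
    for i in range(k, n):
        j = rng.randint(0, i)  # inclusive on both ends
        if j < k:
            reservoir[j] = i  # index i replaces a kept index with probability k / (i + 1)
    return sorted(reservoir)


sample = reservoir_sample(100, 10, random.Random(0))
single = reservoir_sample(100, 1, random.Random(0))
print(len(sample), len(single))
```

The single-element case (k = 1) goes through the same loop as any other k, which is the kind of boundary the fixes above are concerned with.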

📖 Documentation

  • [docs] add Mars to docs @StrikerRUS (#4616)
  • [docs] update link to MinGW-w64 site @StrikerRUS (#4606)
  • [docs] add lightgbm_ray to docs @jameslamb (#4584)
  • [docs][python] Refer to functions as callable in docstrings @StrikerRUS (#4575)
  • [R-package] fix warnings in demos @jameslamb (#4569)
  • [R-package] fix warnings in examples @jameslamb (#4568)
  • [python][docs] Refer to string type as str in docstrings @StrikerRUS (#4565)
  • [docs] add José Morales to repo maintainers @StrikerRUS (#4563)
  • [docs] update links to SynapseML (former MMLSpark) @StrikerRUS (#4564)
  • [python][docs] Refer to string type as str and add commas in list of ... types @StrikerRUS (#4557)
  • [docs][python] Improve description of eval_result argument in record_evaluation() @StrikerRUS (#4559)
  • [doc] Add link to Neptune hyperparam tuning guide @Blaizzy (#4529)
  • [docs] Update link to daal4py in README @StrikerRUS (#4532)
  • [docs] Add notes in installation guide, including ones about OpenMP @StrikerRUS (#4520)
  • [docs] [R-package] use CRAN-style builds when building pkgdown site @jameslamb (#4513)
  • [docs] Update link to mlr3-compliant interface in README @StrikerRUS (#4509)
  • [docs] document CLI behavior when label_column is omitted @jameslamb (#4485)
  • [docs] clarify description of prediction early stopping @StrikerRUS (#4411)
  • [docs][python] add versionadded to Sequence class in Python wrapper @StrikerRUS (#4441)
  • [docs] add lleaves to README @StrikerRUS (#4431)
  • [docs] Add shapash to the list of related projects @StrikerRUS (#4408)
  • [docs] update link to LightGBM example in MMLSpark repo @StrikerRUS (#4401)
  • [docs][R-package] add authors in R-package description @StrikerRUS (#4395)
  • fix: typo in python class _InnerPredictor docstring @cyfdecyf (#4389)
  • [dask] Dask Vector types for group, init_score, sample_weights (fixes #4375) @ffineis (#4380)
  • [docs] document sanitizers @StrikerRUS (#4365)
  • [docs][python] enhance keep_training_booster param description @StrikerRUS (#4364)
  • [docs] add anchor for nightly builds in docs @StrikerRUS (#4366)
  • [docs] document how to pass multi-value params from Python and R (fixes #4345) @jameslamb (#4346)
  • [docs] make building of C++ tests section collapsable @StrikerRUS (#4340)
  • [docs] replace broken mmlspark notebook link in docs @jameslamb (#4303)
  • [docs] clarify docs for LGBM_BoosterGetEvalNames and LGBM_BoosterGetEvalCounts (fixes #4264) @jameslamb (#4270)
  • [docs][R-package] update docs on C++ interface @jameslamb (#4257)
  • [docs][python] update some docs related to custom objective @StrikerRUS (#4245)
  • [docs][python][scikit-learn] added note for LGBMRanker @StrikerRUS (#4243)
  • [docs] fix broken MS MPI link in Installation Guide @jameslamb (#4224)
  • [R-package] clarify parameter documentation (fixes #4193) @jameslamb (#4202)
  • [docs][R-package] Update the explanation of num_threads (fixes #4192) @issactoast (#4199)
  • [docs] add working dir to R package docker run examples @jameslamb (#4190)
  • [docs] fix markdown in docs @StrikerRUS (#4191)
  • [docs] Add changes to gcc-tips @akshitadixit (#4187)
  • [docs] bring back macOS installation method with Homebrew formula in docs @StrikerRUS (#4182)

🧰 Maintenance

  • v3.3.0 release (fixes #4310) @jameslamb (#4633)
  • fix possible precision loss in xentropy and fair loss objectives @jameslamb (#4651)
  • [tests][python-package] refactor list_to_1d_numpy test to run without pandas installed @jmoralez (#4639)
  • [python] add type hints to _safe_call @strobelTha (#4641)
  • remove unused DCGCalculator::CalDCGAtK() @jameslamb (#4650)
  • [python][sklearn] add __sklearn_is_fitted__() method to be better compatible with scikit-learn API @StrikerRUS (#4636)
  • [ci] Use the latest gcc version in macOS CI jobs @StrikerRUS (#4640)
  • remove duplicated debug printing in CMakeLists.txt for MPI @StrikerRUS (#4644)
  • remove unused BinMapper::SizeForSpecificBin() @jameslamb (#4643)
  • [ci] ignore certificates for kitware apt channel in CUDA jobs (fixes #4646) @jameslamb (#4648)
  • [ci] bump CUDA version from 11.4.0 to 11.4.2 at CI @StrikerRUS (#4628)
  • [R-package] introduce Dataset methods set_field() and get_field() @jameslamb (#4571)
  • [ci] Recover running CUDA tests at CI (fixed #4611) @shiyu1994 (#4621)
  • [ci] Run cmakelint at CI and fix some errors @StrikerRUS (#4617)
  • [python] initialize installation options with boolean values in setup.py @StrikerRUS (#4620)
  • [python] fix mypy error in dask.py @StrikerRUS (#4615)
  • [ci] Stop running CUDA tests at CI @StrikerRUS (#4611)
  • [R-package] avoid unnecessary computation and add tests for Dataset set_reference() method @jameslamb (#4587)
  • [ci] fix link to LightGBM public e-mail @StrikerRUS (#4603)
  • [tests][dask] Use workers hostname in tests (fixes #4594) @jmoralez (#4595)
  • prefer spaces to tabs in CMakeLists.txt @jameslamb (#4593)
  • [ci] skip Dask tests on QEMU builds @jameslamb (#4600)
  • [ci] simplify docker info parsing in QEMU builds @StrikerRUS (#4592)
  • [ci] explicitly set --platform when running aarch64 image in QEMU builds @jameslamb (#4579)
  • [R-package] fix inaccurate error message in Dataset get_colnames() method @jameslamb (#4588)
  • [R-package] preserve uses of '...' in Dataset slice() method @jameslamb (#4581)
  • [R-package] fix inaccurate comments, remove unnecessary comments @jameslamb (#4582)
  • [R-package] deprecate the use of 'info' in Dataset @jameslamb (#4573)
  • [R-package] deprecate uses of '...' in Dataset slice() method @jameslamb (#4572)
  • [R-package] use {testthat} SummaryReporter in tests @jameslamb (#4567)
  • [python] Use double type for init_score array when set by predictor @StrikerRUS (#4510)
  • [ci] upgrade R to 4.1.1 @jameslamb (#4560)
  • [python] add type hints on train() in engine.py @jameslamb (#4544)
  • [R-package] add deprecation warnings on uses of '...' in predict() and reset_parameter() @jameslamb (#4548)
  • [docs] Clarify the fact that predict() on a file does not support saved Datasets (fixes #4034) @jameslamb (#4545)
  • [ci] Check for MM_PREFETCH and MM_MALLOC not only in CRAN builds @StrikerRUS (#4540)
  • [ci] Add checks that OpenMP is used in R-package builds @StrikerRUS (#4538)
  • [ci] Add checks that MM_PREFETCH and MM_MALLOC are used in CRAN builds @StrikerRUS (#4536)
  • [python] add type hints to logging functions in basic.py @jameslamb (#4527)
  • [python] add type hints in docs/conf.py @jameslamb (#4526)
  • [R-package] remove unused '...' in Booster constructor @jameslamb (#4523)
  • [R-package] add deprecation warnings about some uses of '...' @jameslamb (#4522)
  • [ci] use flag '--allow-releaseinfo-change' in some 'apt-get update' calls @jameslamb (#4524)
  • [ci] replace uses of backticks in test.sh with $() @jameslamb (#4519)
  • [ci] move Solaris and valgrind test steps into scripts @jameslamb (#4503)
  • [tests][dask] reduce number of collisions tests @jmoralez (#4501)
  • [R-package] remove unused variable R_SCRIPT in configure.win @jameslamb (#4505)
  • Update c_api LGBM_SampleIndices() comment. @cyfdecyf (#4490)
  • [R-package] quote path variables in build-cran-package.sh @jameslamb (#4499)
  • [python][tests] refactor tests with Sequence input @StrikerRUS (#4495)
  • [R-package] limit exported symbols in DLL @jameslamb (#4494)
  • [docs][ci] bump versions of R-package dependencies at RTD @StrikerRUS (#4488)
  • remove examples/.gitignore @jameslamb (#4486)
  • [python] Add type hints to helpers/parameter_generator.py @sagnik1511 (#4474)
  • [refactor] Use CreateSampleIndices() in c_api.cpp @cyfdecyf (#4478)
  • [python] parallelize MinGW make similarly to Unix make command @StrikerRUS (#4462)
  • [ci] remove preinstalled possibly conflicting software from PATH in CI jobs @StrikerRUS (#4463)
  • [ci] Add CI job running rchk on the R package (fixes #4400) @jameslamb (#4449)
  • [python] migrate to pathlib in setup.py and use absolute() on paths first @StrikerRUS (#4444)
  • [ci] add support for 8.0 and 8.6 CUDA archs @StrikerRUS (#4454)
  • [tests][python] added tests for early stop in prediction in ranking task @StrikerRUS (#4457)
  • [ci] bump CUDA version from 11.2.2 to 11.4.0 at CI @StrikerRUS (#4453)
  • [tests] clarify RuntimeError in distributed tests @StrikerRUS (#4452)
  • [python-package] use toarray() instead of todense() in tests and examples @jameslamb (#4446)
  • [python] migrate to pathlib in distributed tests @StrikerRUS (#4443)
  • [python] minor refactoring of Python code @StrikerRUS (#4442)
  • [tests][python] refactor file loading routine in C API test @StrikerRUS (#4437)
  • [tests] fix deprecation numpy warning @StrikerRUS (#4439)
  • [python-package] convert string concatenation to f-strings in test_engine.py (fixes #4136) @jameslamb (#4436)
  • [python] migrate to pathlib in python examples @StrikerRUS (#4428)
  • [python] migrate to pathlib in helper scripts @StrikerRUS (#4434)
  • [tests][cli] distributed training @jmoralez (#4254)
  • [python] migrate to pathlib in python tests @StrikerRUS (#4435)
  • [python] migrate to f-strings in interactive_plot_example.ipynb @StrikerRUS (#4430)
  • [ci] ensure interactive_plot_example notebook is run in interactive mode at CI @StrikerRUS (#4432)
  • [ci] add h5 files into .gitignore @StrikerRUS (#4429)
  • [python] migrate to pathlib in conf.py @StrikerRUS (#4427)
  • [python-package] f-string format updated in plot_example.py @amanjha8100 (#4421)
  • [python] migrate to pathlib in create_nuget.py @StrikerRUS (#4422)
  • [python-package] Add type hints to init for LGBMModel @seanytak (#4420)
  • [SWIG] fix compiler warning about unused variable in SWIG @StrikerRUS (#4419)
  • [tests] fix compiler warning about types conversion in cpp tests @StrikerRUS (#4418)
  • [dask] fix typehint on _pad_eval_names() @jameslamb (#4413)
  • [python] Add type hints to python-package/lightgbm/plotting.py @WestonKing-Leatham (#4367)
  • [tests][dask] add missing compute() in Dask test @jameslamb (#4412)
  • [tests][ci] run cpp tests with sanitizers on Linux and macOS @StrikerRUS (#4330)
  • [ci] [R-package] increase timeout on valgrind job @jameslamb (#4404)
  • [python] Improving the syntax of the fstrings in the file: .\examples\python-guide\advanced_example.py @sayantan1410 (#4386)
  • [python] Improving the syntax of prints in simple_example.py and sklearn_example.py @StrikerRUS (#4396)
  • [R-package] remove unnecessary comments @jameslamb (#4383)
  • [ci] Increase timeout value for QEMU builds @StrikerRUS (#4385)
  • [R-package] consolidate duplicate lists of Dataset info keys @jameslamb (#4381)
  • [tests] replace pytest.parametrize @StrikerRUS (#4377)
  • [ci] [R-package] add unit tests on monotone constraints @jameslamb (#4352)
  • [python] add type hints to check_dynamic_dependencies.py @greyhere (#4382)
  • [python] add type hints to python-package/setup.py @greyhere (#4376)
  • [R-package] remove defaults in internal functions @jameslamb (#4361)
  • [python] improving the syntax of the fstring in the file : tests/python_package_test/test_dask.py @sayantan1410 (#4358)
  • Updated tests/python_package_test/test_plotting.py to use f-strings @WestonKing-Leatham (#4359)
  • [R-package] remove unnecessary library() calls in tests @jameslamb (#4354)
  • [python-package] use f-strings for concatenation in examples/python-guide/logistic_regression.py @sagnik1511 (#4356)
  • [python-package] updated test_consistency.py to use f-strings @sayantan1410 (#4348)
  • [R-package] resolve test warning about is.na() and handles @jameslamb (#4341)
  • [R-package] factor out lgb.check.r6.class() @jameslamb (#4343)
  • [R-package] remove lgb.last_error() and LGBM_GetLastError_R() @jameslamb (#4344)
  • [R-package] remove unused argument in early stopping callback @jameslamb (#4342)
  • [R-package] remove uses of ... in Predictor constructor @jameslamb (#4338)
  • [R-package] remove unused code in lgb.params2str() @jameslamb (#4337)
  • [ci] upgrade R to 4.1.0 in CI @StrikerRUS (#4328)
  • [ci] cmake: remove linking to sanitizer library @cyfdecyf (#4176)
  • [ci] Increase timeout value for QEMU builds @StrikerRUS (#4326)
  • [python] improving the syntax of the fstring in the file : tests/python_package_test/test_basic.py @sayantan1410 (#4312)
  • [docs][python] fix LGBMRanker docstring @StrikerRUS (#4306)
  • [python] improve error message for required packages @StrikerRUS (#4304)
  • [tests][python] Handle data types more accurate in C API test @StrikerRUS (#4297)
  • [python-package] Improve Graphviz import error message (fixes #4299) @AngelikaAntsmae (#4302)
  • [python] Handle integer types more accurate in Python-to-C interface @StrikerRUS (#4292)
  • [python] Improving the syntax of the f-strings in the file: tests/c_api_test/test.py @sayantan1410 (#4294)
  • [CUDA] Add CUDA_ARCHITECTURES to fix CMake warnings (#3754) @RobinDong (#4268)
  • [R-package] Handle integer types more accurate in R-to-C interface @StrikerRUS (#4291)
  • [R-package] suppress Wcast-function-type warning in CMake-based gcc and MinGW builds (fixes #4273) @jameslamb (#4274)
  • [python] added f-string to python-package/lightgbm/basic.py @NovusEdge (#4143)
  • [python] added f-strings to python-package/lightgbm/dask.py @NovusEdge (#4144)
  • [ci] pin dask and distributed in CI jobs @jameslamb (#4288)
  • Migrate to f-strings in python-package\lightgbm\plotting.py (#4136) @akshitadixit (#4279)
  • [python] added f-strings to helpers/parameter_generator.py @NovusEdge (#4146)
  • [python] added f-string to python-package/lightgbm/callback.py @NovusEdge (#4142)
  • [R-package] manage Dataset and Booster handles as R external pointers (fixes #3016) @jameslamb (#4265)
  • [ci][docs] Unpin Sphinx version @StrikerRUS (#4277)
  • [docs] remove extra spaces in comments and docs @jameslamb (#4269)
  • [R-package] move creation of character vectors in some methods to C++ side @jameslamb (#4256)
  • [ci][docs] Restrict Sphinx version @StrikerRUS (#4267)
  • [python] added f-strings to python-package/lightgbm/engine.py @kantajitshaw (#4258)
  • fix param name @StrikerRUS (#4253)
  • [R-package] Use R standard routines to access character data in C++ @jameslamb (#4252)
  • [ci] Delete lock.yml @StrikerRUS (#4251)
  • Correct spelling @az0 (#4250)
  • [R-package] Use R standard routines to access numeric and integer array data in C++ @jameslamb (#4247)
  • [R-package] use R standard routine to access read-only ints passed to C++ @jameslamb (#4246)
  • [R-package] move Rinternals.h closer to where it is used @jameslamb (#4248)
  • [R-package] Convert LGBM_GetLastError_R to use R built-in types @jameslamb (#4242)
  • [R-package] remove pre-allocated call_state in C++ calls @jameslamb (#4244)
  • [ci] Install graphviz system-widely @StrikerRUS (#4238)
  • show specific error message in TCP accept/send/receive logs @jameslamb (#4128)
  • [ci] [python-package] remove unused import in tests @jameslamb (#4233)
  • Fix typo in binary file already exists error message. @cyfdecyf (#4231)
  • [R-package] fix warnings in unit tests @jameslamb (#4225)
  • [python][scikit-learn] change MRO @StrikerRUS (#3192)
  • [ci][docs] Unpin Breathe version in requirements.txt @StrikerRUS (#4222)
  • [R-package] Move error handling into C++ side @jameslamb (#4163)
  • [R-package] fix grammar in comments @david-cortes (#4215)
  • [dask] Fix typo mentioned in 4101 @ffineis (#4214)
  • [ci] parallelize R package installs in CI jobs @jameslamb (#4198)
  • [python] Migrate to f-strings in python-package/lightgbm/sklearn.py @akshitadixit (#4188)
  • [R-package] Make returned feature importances from lgb.importance() visible by default @david-cortes (#4194)
  • [ci] run cpp tests at CI @StrikerRUS (#4166)
  • [ci] unpin CMake version for CUDA + Clang toolchain @StrikerRUS (#4183)
  • [ci] Restore CUDA jobs at CI @StrikerRUS (#4172)
  • [ci] Bump version for development @StrikerRUS (#4171)
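
Many of the maintenance entries above migrate Python code to pathlib and f-strings. The pattern looks roughly like the before/after sketch below. The file layout (a data/train.txt next to the script) and both function names are hypothetical and are not taken from the LightGBM repository.

```python
import os
from pathlib import Path


def train_file_old(script_path):
    """Older style: os.path calls on strings."""
    base = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(base, "data", "train.txt")


def train_file_new(script_path):
    """Newer style: pathlib, calling absolute() first and then joining with '/'."""
    return Path(script_path).absolute().parent / "data" / "train.txt"


script = "/tmp/project/run.py"  # made-up location, used only for illustration
old = train_file_old(script)
new = train_file_new(script)

print("Loading " + old)  # string concatenation
print(f"Loading {new}")  # f-string; Path objects format as their string form
```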