Release history
Version 0.3
Changelog
Testing
- Pytest is used instead of nosetests. By Joan Massich.
Documentation
- Added a User Guide and extended some examples. By Guillaume Lemaitre.
Bug fixes
- Fixed a bug in utils.check_ratio such that an error is raised when the number of samples required is negative. By Guillaume Lemaitre.
- Fixed a bug in under_sampling.NearMiss version 3. The indices returned were wrong. By Guillaume Lemaitre.
- Fixed a bug in ensemble.BalanceCascade, combine.SMOTEENN, and SMOTETomek. By Guillaume Lemaitre.
- Fixed a bug in check_ratio to be able to pass arguments when ratio is a callable. By Guillaume Lemaitre.
- Fix bug in ADASYN to consider only samples from the current class when generating new samples. #354 by Guillaume Lemaitre.
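The two check_ratio fixes above can be illustrated with a minimal sketch. It is not the library's implementation: resolve_ratio, match_majority, and their signatures are hypothetical names, and the code only mimics the described behavior for an over-sampling scenario (extra keyword arguments are forwarded to a callable ratio, and an error is raised when a class would need a negative number of new samples).

```python
from collections import Counter


def resolve_ratio(ratio, y, **kwargs):
    """Hypothetical helper: return {class: number of samples to generate}."""
    counts = Counter(y)
    # A callable ratio receives the targets plus any extra keyword arguments.
    targets = ratio(y, **kwargs) if callable(ratio) else dict(ratio)
    to_generate = {}
    for label, n_target in targets.items():
        n_missing = n_target - counts[label]
        if n_missing < 0:
            # Over-sampling cannot remove samples, so this request is invalid.
            raise ValueError(
                "Class %r has %d samples but only %d were requested."
                % (label, counts[label], n_target)
            )
        to_generate[label] = n_missing
    return to_generate


def match_majority(y, scale=1.0):
    """Hypothetical callable ratio: scale * majority count for every class."""
    counts = Counter(y)
    n_max = max(counts.values())
    return {label: int(scale * n_max) for label in counts}


y = [0] * 90 + [1] * 10
plan = resolve_ratio(match_majority, y, scale=1.0)
```

Passing a dictionary that asks for fewer samples than a class already has, such as resolve_ratio({1: 5}, y), raises ValueError in this sketch.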
New features
- under_sampling.ClusterCentroids accepts a voting parameter that allows using the nearest neighbors of the centroids instead of the centroids themselves. This is more efficient for sparse input. By Guillaume Lemaitre.
- Turn off steps in pipeline.Pipeline using the None object. By Christos Aridas.
- Add a fetching function datasets.fetch_datasets in order to get some imbalanced datasets useful for benchmarking. By Guillaume Lemaitre.
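The idea of turning off a pipeline step with None can be sketched as follows. MiniPipeline is a hypothetical stand-in written for illustration, not pipeline.Pipeline itself; it only shows the general pattern of skipping any step that has been set to None.

```python
class MiniPipeline:
    """Hypothetical stand-in for a pipeline; not imbalanced-learn's class."""

    def __init__(self, steps):
        # steps is a list of (name, callable or None) pairs
        self.steps = list(steps)

    def set_step(self, name, step):
        # Replace the step registered under `name`, keeping the order intact.
        self.steps = [(n, step if n == name else s) for n, s in self.steps]
        return self

    def run(self, data):
        for _name, step in self.steps:
            if step is None:  # a None step is treated as switched off
                continue
            data = step(data)
        return data


pipe = MiniPipeline([
    ("scale", lambda xs: [x * 2 for x in xs]),
    ("shift", lambda xs: [x + 1 for x in xs]),
])
full = pipe.run([1, 2, 3])      # both steps applied
pipe.set_step("scale", None)    # turn the "scale" step off
partial = pipe.run([1, 2, 3])   # only "shift" applied
```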
Enhancement
- Add ensemble.BalancedBaggingClassifier, which is a meta-estimator to directly use ensemble.EasyEnsemble chained with a classifier. By Guillaume Lemaitre.
- All samplers accept sparse matrices, defaulting to the CSR type. By Guillaume Lemaitre.
- datasets.make_imbalance takes a ratio similarly to other samplers. It supports multiclass. By Guillaume Lemaitre.
- All the unit tests have been factorized and a utils.check_estimators has been derived from scikit-learn. By Guillaume Lemaitre.
- Script for automatic build of conda packages and uploading. By Guillaume Lemaitre.
- Remove seaborn dependence and improve the examples. By Guillaume Lemaitre.
- Adapt all classes to multi-class resampling. By Guillaume Lemaitre.
API changes summary
- __init__ has been removed from base.SamplerMixin to create a real mixin class. By Guillaume Lemaitre.
- Creation of a module exceptions to handle consistent raising of errors. By Guillaume Lemaitre.
- Creation of a module utils.validation to check recurrent patterns. By Guillaume Lemaitre.
- Move the under-sampling methods into the prototype_selection and prototype_generation submodules to make a clearer distinction. By Guillaume Lemaitre.
- Change ratio such that it can adapt to multiple class problems. By Guillaume Lemaitre.
Deprecation
- Deprecation of the use of min_c_ in datasets.make_imbalance. By Guillaume Lemaitre.
- Deprecation of the use of float in datasets.make_imbalance for the ratio parameter. By Guillaume Lemaitre.
- Deprecate the use of float as ratio in favor of dictionary, string, or callable. By Guillaume Lemaitre.
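As a rough illustration of moving from a float ratio to a dictionary, the sketch below converts a float into explicit per-class sample counts for a binary over-sampling problem. It rests on two assumptions that this changelog does not state: that the float meant the minority count divided by the majority count after resampling, and that the dictionary maps each class to its desired number of samples. float_ratio_to_dict is a hypothetical helper, not part of the library.

```python
from collections import Counter


def float_ratio_to_dict(y, ratio):
    """Hypothetical helper: float ratio -> {class: desired sample count}."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("A float ratio is only meaningful for two classes.")
    # most_common() orders the classes from most to least frequent.
    (majority, n_majority), (minority, _) = counts.most_common()
    # Assumption: ratio = n_minority / n_majority after resampling.
    return {majority: n_majority, minority: int(ratio * n_majority)}


y = ["a"] * 100 + ["b"] * 20
target_counts = float_ratio_to_dict(y, 0.5)
```

An explicit dictionary extends naturally to more than two classes, which a single float cannot describe.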
Version 0.2
Changelog
Bug fixes
- Fixed a bug in under_sampling.NearMiss, which was not picking the right samples during under-sampling for method 3. By Guillaume Lemaitre.
- Fixed a bug in ensemble.EasyEnsemble: corrected the random_state generation. By Guillaume Lemaitre and Christos Aridas.
- Fixed a bug in under_sampling.RepeatedEditedNearestNeighbours: added an additional stopping criterion to prevent the minority class from becoming a majority class or a class from disappearing. By Guillaume Lemaitre.
- Fixed a bug in under_sampling.AllKNN: added stopping criteria to prevent the minority class from becoming a majority class or a class from disappearing. By Guillaume Lemaitre.
- Fixed a bug in under_sampling.CondensedNearestNeighbour: corrected the list of indices returned. By Guillaume Lemaitre.
- Fixed a bug in ensemble.BalanceCascade: solved the issue of obtaining a single array if desired. By Guillaume Lemaitre.
- Fixed a bug in pipeline.Pipeline: a Pipeline can now be embedded in another Pipeline. By Christos Aridas.
- Fixed a bug in pipeline.Pipeline: solved the issue of putting two samplers in the same Pipeline. By Christos Aridas.
- Fixed a bug in under_sampling.CondensedNearestNeighbour: corrected the shape of sel_x when only one sample is selected. By Aliaksei Halachkin.
- Fixed a bug in under_sampling.NeighbourhoodCleaningRule: neighbours are selected instead of misclassified minority class samples. By Aleksandr Loskutov.
- Fixed a bug in over_sampling.ADASYN: corrected the creation of a new sample so that it lies between the minority sample and the nearest neighbour. By Rafael Wampfler.
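The property described in the ADASYN fix above, a new sample lying between a minority sample and its nearest neighbour, corresponds to a linear interpolation with a step in [0, 1]. The sketch below shows only that interpolation step under this reading; interpolate is a hypothetical function, and the example does not reproduce the rest of the ADASYN algorithm.

```python
import random


def interpolate(sample, neighbour, rng):
    """Return a point on the segment between sample and neighbour."""
    step = rng.random()  # uniform in [0, 1), so the point stays on the segment
    return [s + step * (n - s) for s, n in zip(sample, neighbour)]


rng = random.Random(0)  # seeded for reproducibility
minority_sample = [1.0, 2.0]
nearest_neighbour = [3.0, 6.0]
new_sample = interpolate(minority_sample, nearest_neighbour, rng)
```

A step outside [0, 1] would place the new point beyond one of the two endpoints, which is the kind of error such a fix guards against.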
New features
- Added AllKNN under sampling technique. By Dayvid Oliveira.
- Added a module metrics implementing some specific scoring functions for the problem of balancing. By Guillaume Lemaitre and Christos Aridas.
Enhancement
- Added support for bumpversion. By Guillaume Lemaitre.
- Validate the type of target in binary samplers. A warning is raised for the moment. By Guillaume Lemaitre and Christos Aridas.
- Change from cross_validation module to model_selection module for sklearn deprecation cycle. By Dayvid Oliveira and Christos Aridas.
API changes summary
- size_ngh has been deprecated in combine.SMOTEENN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.EditedNearestNeighbors. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.CondensedNearestNeighbour. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.OneSidedSelection. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.NeighbourhoodCleaningRule. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.RepeatedEditedNearestNeighbours. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.AllKNN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- Two base classes, BaseBinaryclassSampler and BaseMulticlassSampler, have been created to handle the target type and raise a warning in case of abnormality. By Guillaume Lemaitre and Christos Aridas.
- Move random_state to be assigned in the SamplerMixin initialization. By Guillaume Lemaitre.
- Provide estimators instead of parameters in combine.SMOTEENN and combine.SMOTETomek. Therefore, the list of parameters has been deprecated. By Guillaume Lemaitre and Christos Aridas.
- k has been deprecated in over_sampling.ADASYN. Use n_neighbors instead. By Guillaume Lemaitre.
- k and m have been deprecated in over_sampling.SMOTE. Use k_neighbors and m_neighbors instead. By Guillaume Lemaitre.
- n_neighbors accepts a KNeighborsMixin-based object for under_sampling.EditedNearestNeighbors, under_sampling.CondensedNearestNeighbour, under_sampling.NeighbourhoodCleaningRule, under_sampling.RepeatedEditedNearestNeighbours, and under_sampling.AllKNN. By Guillaume Lemaitre.
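The size_ngh to n_neighbors renames above follow a common deprecation pattern, sketched below. ExampleSampler is a hypothetical class written for illustration; the warning category, message, and default value are assumptions and may differ from what the library actually used.

```python
import warnings


class ExampleSampler:
    """Hypothetical sampler; not a class from imbalanced-learn."""

    def __init__(self, n_neighbors=3, size_ngh=None):
        if size_ngh is not None:
            # The old name still works but warns and feeds the new parameter.
            warnings.warn(
                "'size_ngh' is deprecated; use 'n_neighbors' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            n_neighbors = size_ngh
        self.n_neighbors = n_neighbors


# Old-style call: still accepted, but a warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = ExampleSampler(size_ngh=5)

# New-style call: no warning.
new_style = ExampleSampler(n_neighbors=5)
```

Keeping the old keyword for a release or two gives callers time to migrate before it is removed.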
Documentation changes
- Replace some remaining UnbalancedDataset occurrences. By Francois Magimel.
- Added doctest in the documentation. By Guillaume Lemaitre.
Version 0.1
Changelog
API
- First release of the stable API. By Fernando Nogueira, Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
New methods
- Under-sampling
  - Random majority under-sampling with replacement
  - Extraction of majority-minority Tomek links
  - Under-sampling with Cluster Centroids
  - NearMiss-(1 & 2 & 3)
  - Condensed Nearest Neighbour
  - One-Sided Selection
  - Neighbourhood Cleaning Rule
  - Edited Nearest Neighbours
  - Instance Hardness Threshold
  - Repeated Edited Nearest Neighbours
- Over-sampling
  - Random minority over-sampling with replacement
  - SMOTE - Synthetic Minority Over-sampling Technique
  - bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
  - SVM SMOTE - Support Vectors SMOTE
  - ADASYN - Adaptive synthetic sampling approach for imbalanced learning
- Over-sampling followed by under-sampling
  - SMOTE + Tomek links
  - SMOTE + ENN
- Ensemble sampling
  - EasyEnsemble
  - BalanceCascade