imblearn.datasets.make_imbalance
imblearn.datasets.make_imbalance(X, y, ratio, min_c_=None, random_state=None, **kwargs)
Turns a dataset into an imbalanced dataset at a specific ratio.
A simple toy dataset to visualize clustering and classification algorithms.
Read more in the User Guide.
Parameters:
X : ndarray, shape (n_samples, n_features)
Matrix containing the data to be imbalanced.
y : ndarray, shape (n_samples, )
Corresponding label for each sample in X.
ratio : str, dict, or callable, optional (default='auto')
Ratio to use for resampling the data set.
- If dict, the keys correspond to the targeted classes. The values correspond to the desired number of samples. All samples will be passed through if the class is not specified.
- If callable, a function taking y and returning a dict. The keys correspond to the targeted classes. The values correspond to the desired number of samples.
min_c_ : str or int, optional (default=None)
The identifier of the class to be the minority class. If None, min_c_ is set to be the current minority class. Only used when ratio is a float for back-compatibility.
Deprecated since version 0.2: min_c_ is deprecated in 0.2 and will be removed in 0.4. Use ratio by passing a dict instead.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
kwargs : dict, optional
Dictionary of additional keyword arguments to pass to ratio.
Returns:
X_resampled : ndarray, shape (n_samples_new, n_features)
The array containing the imbalanced data.
y_resampled : ndarray, shape (n_samples_new)
The corresponding labels of X_resampled.
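As a rough illustration of the dict form of ratio and of random_state described under Parameters, the sketch below under-samples each listed class to the requested count and passes unlisted classes through unchanged. The helper name subsample_by_class and its seed argument are hypothetical, and it uses only the standard library; it is a stand-in for the idea, not the library's implementation.

```python
import random
from collections import Counter, defaultdict


def subsample_by_class(X, y, ratio, seed=None):
    """Hypothetical stand-in for the dict behaviour; not imblearn code."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    kept = []
    for label, indices in by_class.items():
        if label in ratio:
            # Draw the requested number of samples without replacement.
            kept.extend(rng.sample(indices, ratio[label]))
        else:
            # Classes missing from the dict pass through untouched.
            kept.extend(indices)

    kept.sort()  # keep the original sample order
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data: 6 samples of class 0, 4 of class 1, 2 of class 2.
X = [[i] for i in range(12)]
y = [0] * 6 + [1] * 4 + [2] * 2

X_res, y_res = subsample_by_class(X, y, ratio={0: 2, 1: 3}, seed=42)
print(sorted(Counter(y_res).items()))  # [(0, 2), (1, 3), (2, 2)]
```

Fixing the seed makes the selection repeatable, which is the role random_state plays in the signature above.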
Notes
See Multiclass classification with under-sampling, make_imbalance function, and Usage of the ratio parameter for the different algorithm.
Examples
>>> from collections import Counter
>>> from sklearn.datasets import load_iris
>>> from imblearn.datasets import make_imbalance

>>> data = load_iris()
>>> X, y = data.data, data.target
>>> print('Distribution before imbalancing: {}'.format(Counter(y)))
Distribution before imbalancing: Counter({0: 50, 1: 50, 2: 50})
>>> X_res, y_res = make_imbalance(X, y, ratio={0: 10, 1: 20, 2: 30},
...                               random_state=42)
>>> print('Distribution after imbalancing: {}'.format(Counter(y_res)))
Distribution after imbalancing: Counter({2: 30, 1: 20, 0: 10})
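The callable form of ratio can be sketched in the same spirit. The function below takes the labels and returns a dict of target counts, as the parameter description requires. The name ratio_multiplier and its multipliers keyword are assumptions made for illustration; according to the kwargs description, such a keyword would be forwarded to the callable, for example make_imbalance(X, y, ratio=ratio_multiplier, multipliers={0: 0.2, 1: 0.4}) in the API version documented here. The snippet only exercises the callable itself, so it runs without imblearn installed.

```python
from collections import Counter


def ratio_multiplier(y, multipliers=None):
    """Hypothetical callable: build a {class: n_samples} dict from labels.

    Each class count is scaled by its entry in ``multipliers``; classes
    without an entry keep all of their samples.
    """
    multipliers = multipliers or {}
    counts = Counter(y)
    return {label: int(count * multipliers.get(label, 1.0))
            for label, count in counts.items()}


# Labels with the same class distribution as the iris targets.
y = [0] * 50 + [1] * 50 + [2] * 50

target = ratio_multiplier(y, multipliers={0: 0.2, 1: 0.4})
print(target)  # {0: 10, 1: 20, 2: 50}
```

Computing the counts from y keeps the target distribution relative to whatever data is passed in, rather than hard-coding absolute sample numbers as the dict form does.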