Metrics specific to imbalanced learning
Specific metrics have been developed to evaluate classifiers trained on imbalanced data. imbalanced_ensemble provides two additional metrics that are not implemented in sklearn: (i) the geometric mean (imbalanced_ensemble.metrics.geometric_mean_score()) and (ii) the index balanced accuracy (imbalanced_ensemble.metrics.make_index_balanced_accuracy()).
# Adapted from imbalanced-learn
# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
# License: MIT
print(__doc__)
RANDOM_STATE = 42
First, we will generate an imbalanced dataset.
from sklearn.datasets import make_classification
X, y = make_classification(
n_classes=3,
class_sep=2,
weights=[0.1, 0.9],
n_informative=10,
n_redundant=1,
flip_y=0,
n_features=20,
n_clusters_per_class=4,
n_samples=5000,
random_state=RANDOM_STATE,
)
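As a quick sanity check (not part of the original example), the resulting class distribution can be inspected with collections.Counter; the snippet below repeats the generation step so it runs standalone:

```python
from collections import Counter

from sklearn.datasets import make_classification

RANDOM_STATE = 42
X, y = make_classification(
    n_classes=3,
    class_sep=2,
    weights=[0.1, 0.9],
    n_informative=10,
    n_redundant=1,
    flip_y=0,
    n_features=20,
    n_clusters_per_class=4,
    n_samples=5000,
    random_state=RANDOM_STATE,
)
# Count how many samples fall in each class; the class with weight 0.9
# dominates the dataset.
print(Counter(y))
```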
We will split the data into a training and testing set.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X, y, stratify=y, random_state=RANDOM_STATE
)
We will create a pipeline made of a SMOTE over-sampler followed by a LinearSVC classifier.
from imbalanced_ensemble.pipeline import make_pipeline
from imbalanced_ensemble.sampler.over_sampling import SMOTE
from sklearn.svm import LinearSVC
model = make_pipeline(
SMOTE(random_state=RANDOM_STATE), LinearSVC(random_state=RANDOM_STATE)
)
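As background, the core idea of the SMOTE step can be sketched in a few lines of NumPy. This is a hypothetical, simplified illustration, not the library's implementation (which uses k nearest neighbors and more bookkeeping): each synthetic minority sample is an interpolation between a minority point and one of its minority-class neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy minority class with three points (hypothetical data).
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

def smote_like_sample(X, rng):
    """Draw one synthetic point between a minority sample and its nearest neighbor."""
    i = rng.integers(len(X))
    dists = np.linalg.norm(X - X[i], axis=1)
    dists[i] = np.inf  # exclude the point itself
    j = int(np.argmin(dists))
    gap = rng.random()  # interpolation factor in [0, 1)
    return X[i] + gap * (X[j] - X[i])

new_point = smote_like_sample(minority, rng)
# The synthetic point lies on the segment between two minority points.
print(new_point)
```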
Now, we will train the model on the training set and get the predictions for the testing set. Be aware that the resampling happens only when calling fit: the number of samples in y_pred is the same as in y_test.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Out:
C:\Softwares\Anaconda3\lib\site-packages\sklearn\svm\_base.py:985: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
warnings.warn("Liblinear failed to converge, increase "
The geometric mean corresponds to the square root of the product of the sensitivity and specificity. Combining these two metrics accounts for the imbalance of the dataset.
from imbalanced_ensemble.metrics import geometric_mean_score
print(f"The geometric mean is {geometric_mean_score(y_test, y_pred):.3f}")
Out:
The geometric mean is 0.938
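To illustrate what is being computed, for multi-class problems the geometric mean score is the n-th root of the product of the per-class recalls. The snippet below uses hypothetical toy labels, not the y_test and y_pred above:

```python
import numpy as np
from sklearn.metrics import recall_score

y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 2, 2, 0]

# Recall of each class, i.e. the class-wise sensitivities.
per_class_recall = recall_score(y_true, y_pred, average=None)
# Geometric mean = n-th root of the product of the per-class recalls.
g_mean = float(np.prod(per_class_recall) ** (1 / len(per_class_recall)))
print(f"per-class recalls: {per_class_recall}, geometric mean: {g_mean:.3f}")
```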
The index balanced accuracy (IBA) can wrap any metric to make it better suited to imbalanced learning problems.
from imbalanced_ensemble.metrics import make_index_balanced_accuracy
alpha = 0.1
geo_mean = make_index_balanced_accuracy(alpha=alpha, squared=True)(geometric_mean_score)
print(
f"The IBA using alpha={alpha} and the geometric mean: "
f"{geo_mean(y_test, y_pred):.3f}"
)
Out:
The IBA using alpha=0.1 and the geometric mean: 0.880
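For intuition, the IBA weighting from García et al. can be written out for the binary case. This is a hedged sketch with hypothetical toy labels (the library's multi-class handling may average differently): IBA_alpha = (1 + alpha * dominance) * M, where dominance = sensitivity - specificity and, with squared=True, M is the squared geometric mean.

```python
from sklearn.metrics import recall_score

alpha = 0.1
# Hypothetical binary toy labels, not the example's y_test/y_pred.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]

# With average=None, recall_score returns [recall of class 0, recall of
# class 1], i.e. [specificity, sensitivity] for a binary problem.
specificity, sensitivity = recall_score(y_true, y_pred, average=None)
dominance = sensitivity - specificity       # how skewed the errors are
squared_g_mean = sensitivity * specificity  # (sqrt(sens * spec)) ** 2
iba = (1 + alpha * dominance) * squared_g_mean
print(f"IBA(alpha={alpha}) = {iba:.3f}")
```

A larger alpha gives more weight to the dominance term, i.e. penalizes classifiers whose errors are concentrated in one class.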
alpha = 0.5
geo_mean = make_index_balanced_accuracy(alpha=alpha, squared=True)(geometric_mean_score)
print(
f"The IBA using alpha={alpha} and the geometric mean: "
f"{geo_mean(y_test, y_pred):.3f}"
)
Out:
The IBA using alpha=0.5 and the geometric mean: 0.880
Total running time of the script: (1 minute 5.239 seconds)
Estimated memory usage: 13 MB