Metrics specific to imbalanced learning
Specific metrics have been developed to evaluate classifiers that have been
trained on imbalanced data. imblearn mainly provides two additional metrics
that are not implemented in sklearn: (i) the geometric mean and (ii) the
index balanced accuracy.
# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
# License: MIT
print(__doc__)
RANDOM_STATE = 42
First, we will generate an imbalanced dataset.
from sklearn.datasets import make_classification
X, y = make_classification(
    n_classes=3,
    class_sep=2,
    weights=[0.1, 0.9],
    n_informative=10,
    n_redundant=1,
    flip_y=0,
    n_features=20,
    n_clusters_per_class=4,
    n_samples=5000,
    random_state=RANDOM_STATE,
)
We will split the data into a training and testing set.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=RANDOM_STATE
)
We will create a pipeline made of a SMOTE over-sampler followed by a LinearSVC classifier.
from imblearn.pipeline import make_pipeline
from imblearn.over_sampling import SMOTE
from sklearn.svm import LinearSVC
model = make_pipeline(
    SMOTE(random_state=RANDOM_STATE), LinearSVC(random_state=RANDOM_STATE)
)
Now, we will train the model on the training set and get the predictions
for the testing set. Be aware that the resampling happens only when calling
fit: the number of samples in y_pred is the same as in y_test.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Out:
/Users/glemaitre/Documents/packages/scikit-learn/sklearn/svm/_base.py:1199: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
warnings.warn(
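The fit-only resampling described above can be illustrated with a toy sketch in plain Python. The classes ToyOverSampler, MajorityClassifier and ToyPipeline are hypothetical stand-ins written for this illustration; they are not part of imblearn or sklearn and only mimic the idea that a sampler changes the data seen during fit but is skipped during predict.

```python
from collections import Counter


class ToyOverSampler:
    """Hypothetical sampler: repeats minority samples until classes are balanced."""

    def fit_resample(self, X, y):
        counts = Counter(y)
        target = max(counts.values())
        X_res, y_res = list(X), list(y)
        for label, count in counts.items():
            members = [x for x, lab in zip(X, y) if lab == label]
            # Duplicate existing samples of this class up to the majority size.
            for i in range(target - count):
                X_res.append(members[i % len(members)])
                y_res.append(label)
        return X_res, y_res


class MajorityClassifier:
    """Hypothetical classifier: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        self.n_seen_ = len(y)  # number of samples actually used for fitting
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ToyPipeline:
    """Hypothetical pipeline: resamples in fit, never in predict."""

    def __init__(self, sampler, classifier):
        self.sampler = sampler
        self.classifier = classifier

    def fit(self, X, y):
        X_res, y_res = self.sampler.fit_resample(X, y)
        self.classifier.fit(X_res, y_res)
        return self

    def predict(self, X):
        # The sampler is skipped here, so one prediction is returned per input.
        return self.classifier.predict(X)


X_train = [[0], [1], [2], [3], [4], [5]]
y_train = [0, 0, 0, 0, 0, 1]
X_test = [[6], [7], [8]]

pipe = ToyPipeline(ToyOverSampler(), MajorityClassifier()).fit(X_train, y_train)
y_pred = pipe.predict(X_test)
n_fitted = pipe.classifier.n_seen_
print(f"fitted on {n_fitted} samples, predicted {len(y_pred)} labels")
```

In this sketch the classifier is fitted on more samples than the training set contains, while the number of predictions still matches the number of test samples.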
The geometric mean corresponds to the square root of the product of the sensitivity and specificity. Because it combines both rates, a classifier cannot obtain a high score by performing well on the majority class alone.
from imblearn.metrics import geometric_mean_score
print(f"The geometric mean is {geometric_mean_score(y_test, y_pred):.3f}")
Out:
The geometric mean is 0.938
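To make the definition concrete, here is a minimal sketch of the binary case in plain Python. The helper binary_geometric_mean is a hypothetical name used only for this illustration; it is not the imblearn implementation and does not handle the multiclass averaging that geometric_mean_score supports.

```python
import math


def binary_geometric_mean(y_true, y_pred, positive=1):
    """Square root of sensitivity * specificity for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes to avoid a division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Two positive samples out of ten: an imbalanced toy target.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Always predicting the majority class gives 80% accuracy but a zero G-mean.
always_negative = [0] * 10
score_majority_only = binary_geometric_mean(y_true, always_negative)

# Finding one of the two positives gives sensitivity 0.5 and specificity 1.0.
one_hit = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
score_one_hit = binary_geometric_mean(y_true, one_hit)

print(f"{score_majority_only:.3f} {score_one_hit:.3f}")
```

The first prediction vector shows why accuracy alone can be misleading on imbalanced data: ignoring the minority class entirely drives the geometric mean to zero.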
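The index balanced accuracy (IBA) can wrap any metric so that it is suited to imbalanced learning problems: the score is weighted by the dominance, i.e. the difference between sensitivity and specificity, scaled by alpha.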
from imblearn.metrics import make_index_balanced_accuracy
alpha = 0.1
geo_mean = make_index_balanced_accuracy(alpha=alpha, squared=True)(geometric_mean_score)
print(
    f"The IBA using alpha={alpha} and the geometric mean: "
    f"{geo_mean(y_test, y_pred):.3f}"
)
Out:
The IBA using alpha=0.1 and the geometric mean: 0.880
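The weighting applied by the index balanced accuracy can be sketched by hand for a single binary problem. The function index_balanced_accuracy below is a hypothetical helper written for this illustration, assuming the usual definition IBA = (1 + alpha * (sensitivity - specificity)) * M, where M is the wrapped score (squared when squared=True). It is not the imblearn implementation.

```python
import math


def index_balanced_accuracy(score, sensitivity, specificity, alpha=0.1, squared=True):
    """Weight a score by the dominance (sensitivity - specificity)."""
    dominance = sensitivity - specificity
    base = score ** 2 if squared else score
    return (1.0 + alpha * dominance) * base


# Toy rates, chosen only for illustration.
sensitivity, specificity = 0.5, 1.0
g_mean = math.sqrt(sensitivity * specificity)

iba_small_alpha = index_balanced_accuracy(g_mean, sensitivity, specificity, alpha=0.1)
iba_large_alpha = index_balanced_accuracy(g_mean, sensitivity, specificity, alpha=0.5)

# With equal rates the dominance is zero, so alpha has no effect.
iba_balanced = index_balanced_accuracy(0.8, 0.8, 0.8, alpha=0.5)

print(f"{iba_small_alpha:.3f} {iba_large_alpha:.3f} {iba_balanced:.3f}")
```

A larger alpha penalizes the score more strongly when specificity exceeds sensitivity, and rewards it when sensitivity is the larger of the two. When both rates are equal, the result reduces to the (squared) wrapped score regardless of alpha.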
alpha = 0.5
geo_mean = make_index_balanced_accuracy(alpha=alpha, squared=True)(geometric_mean_score)
print(
    f"The IBA using alpha={alpha} and the geometric mean: "
    f"{geo_mean(y_test, y_pred):.3f}"
)
Out:
The IBA using alpha=0.5 and the geometric mean: 0.880