scikit_ext package¶
Submodules¶
scikit_ext.estimators module¶
Various scikit-learn estimators and meta-estimators
class scikit_ext.estimators.IterRandomEstimator(estimator, target_score=None, max_iter=10, random_state=None, scoring=<function calinski_harabaz_score>, fit_params=None, verbose=0)¶
Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

Meta-estimator intended primarily for unsupervised estimators whose fitted model can be heavily dependent on an arbitrary random initialization state. It is best used for problems where a fit_predict method is intended, so the only data used for prediction will be the same data on which the model was fitted.

The fit method will fit multiple iterations of the same base estimator, varying the random_state argument for each iteration. The iterations stop either when max_iter is reached or when the target score is obtained.

The model does not use cross validation to find the best estimator. It simply fits and scores on the entire input data set. No hyperparameter is being optimized here, only random initialization states. The idea is to find the best fitted model and keep that exact model, rather than to find the best hyperparameter set.
fit(X, y=None, **fit_params)¶

Run fit on the estimator attribute multiple times with various random_state arguments and choose the fitted estimator with the best score. Uses calinski_harabaz_score if no scoring is provided.

Parameters:

- X : array-like, shape = [n_samples, n_features]
  Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like, shape = [n_samples] or [n_samples, n_output], optional
  Target relative to X for classification or regression; None for unsupervised learning.
- **fit_params : dict of string -> object
  Parameters passed to the fit method of the estimator.
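A minimal usage sketch (not part of the package documentation), assuming a KMeans base estimator; only the documented constructor and fit signatures are used:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from scikit_ext.estimators import IterRandomEstimator

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Fit the same KMeans configuration under several random initialization
# states, keeping the fit with the best calinski_harabaz_score (the default
# scoring); iteration stops early if target_score is reached.
est = IterRandomEstimator(KMeans(n_clusters=4, n_init=1), max_iter=10)
est.fit(X)
# The best-scoring fitted model is retained by the meta-estimator; the exact
# attribute exposing it (e.g. a best_estimator_-style attribute) is an
# assumption and is not shown here.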
class scikit_ext.estimators.OneVsRestAdjClassifier(estimator, norm=None, **kwargs)¶
Bases: sklearn.multiclass.OneVsRestClassifier

One-vs-the-rest (OvR) multiclass strategy.

Also known as one-vs-all, this strategy consists in fitting one classifier per class. For each classifier, the class is fitted against all the other classes. In addition to its computational efficiency (only n_classes classifiers are needed), one advantage of this approach is its interpretability. Since each class is represented by one and only one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier. This is the most commonly used strategy for multiclass classification and is a fair default choice.

The adjusted version is a custom extension which overwrites the inherited predict_proba() method with a more flexible method allowing custom normalization of the predicted probabilities. Any norm argument that can be passed directly to sklearn.preprocessing.normalize is allowed. Additionally, norm=None will skip the normalization step altogether. To mimic the inherited OneVsRestClassifier behavior, set norm='l2'. All other methods are inherited from OneVsRestClassifier.
Parameters:

- estimator : estimator object
  An estimator object implementing fit and one of decision_function or predict_proba.
- n_jobs : int, optional, default: 1
  The number of jobs to use for the computation. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
- norm : str, optional, default: None
  Normalization method passed straight into sklearn.preprocessing.normalize as the norm input. A value of None (default) will skip the normalization step.

Attributes:

- estimators_ : list of n_classes estimators
  Estimators used for predictions.
- classes_ : array, shape = [n_classes]
  Class labels.
- label_binarizer_ : LabelBinarizer object
  Object used to transform multiclass labels to binary labels and vice-versa.
- multilabel_ : boolean
  Whether a OneVsRestClassifier is a multilabel classifier.
predict_proba(X)¶

Probability estimates. The returned estimates for all classes are ordered by label of classes.

Parameters:

- X : array-like, shape = [n_samples, n_features]

Returns:

- T : array-like, shape = [n_samples, n_classes]
  Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
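A brief usage sketch (not from the package docs) contrasting norm settings; the data set and base estimator are arbitrary choices:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from scikit_ext.estimators import OneVsRestAdjClassifier

X, y = load_iris(return_X_y=True)

# norm=None returns the raw per-class probability estimates.
raw = OneVsRestAdjClassifier(LogisticRegression(), norm=None).fit(X, y)

# norm='l1' rescales each row so the class probabilities sum to 1;
# norm='l2' would mimic the inherited OneVsRestClassifier behavior.
adj = OneVsRestAdjClassifier(LogisticRegression(), norm='l1').fit(X, y)

print(raw.predict_proba(X[:3]))
print(adj.predict_proba(X[:3]))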
class scikit_ext.estimators.OptimizedEnsemble(estimator, n_estimators_init=5, threshold=0.01, max_iter=10, step_function=<function <lambda>>, **kwargs)¶
Bases: sklearn.model_selection._search.BaseSearchCV

An optimized ensemble class. Will find the optimal n_estimators parameter for the given ensemble estimator, according to the specified input parameters.

The fit method will iterate through n_estimators options, starting with n_estimators_init and applying the step_function recursively from there. It stops at max_iter or when the score gain between iterations is less than threshold.

The OptimizedEnsemble class can then itself be used as an estimator, or the best_estimator_ attribute can be accessed directly, which is a fitted version of the input estimator with the optimal parameters.
fit(X, y, **fit_params)¶

Find the optimal n_estimators parameter using a custom optimization routine.

Parameters:

- X : array-like, shape = [n_samples, n_features]
  Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like, shape = [n_samples] or [n_samples, n_output], optional
  Target relative to X for classification or regression; None for unsupervised learning.
- **fit_params : dict of string -> object
  Parameters passed to the fit method of the estimator.
score(*args, **kwargs)¶

Call score on the estimator with the best found parameters. Only available if the underlying estimator supports score. This uses the score defined by the best_estimator_.score method.

Parameters:

- X : array-like, shape = [n_samples, n_features]
  Input data, where n_samples is the number of samples and n_features is the number of features.
- y : array-like, shape = [n_samples] or [n_samples, n_output], optional
  Target relative to X for classification or regression; None for unsupervised learning.

Returns:

- score : float
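A usage sketch (not part of the original documentation), assuming a random forest base estimator and the default step_function:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from scikit_ext.estimators import OptimizedEnsemble

X, y = load_iris(return_X_y=True)

# Start at n_estimators_init and grow n_estimators via step_function until the
# score gain between iterations falls below threshold or max_iter is reached.
opt = OptimizedEnsemble(
    RandomForestClassifier(random_state=0),
    n_estimators_init=5,
    threshold=0.01,
    max_iter=10,
)
opt.fit(X, y)

print(opt.best_estimator_)   # fitted estimator with the selected n_estimators
print(opt.score(X, y))       # delegates to best_estimator_.score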
scikit_ext.scorers module¶
Various scikit-learn scorers and scoring functions
scikit_ext.scorers.cluster_distribution_score(X, labels)¶

Compute the Cluster Distribution score for a clustering result.

Parameters:

- X : array-like, shape (n_samples, n_features)
  List of n_features-dimensional data points. Each row corresponds to a single data point.
- labels : array-like, shape (n_samples,)
  Predicted labels for each sample.

Returns:

- score : float
  The resulting Cluster Distribution score.
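A short usage sketch (assumed data and clustering model) for scoring a set of predicted cluster labels:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from scikit_ext.scorers import cluster_distribution_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)

# Score the clustering; the exact definition of the Cluster Distribution
# score is given by the package source, not by this sketch.
print(cluster_distribution_score(X, labels))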