skclean.handlers.Filter

class skclean.handlers.Filter(classifier, detector=None, threshold: float = 0.5, frac_to_filter: float = None, n_jobs=1, random_state=None)

Removes the samples most likely to be noisy from the dataset. The samples to remove can be selected in two ways: either a specified fraction of the samples with the lowest conf_score, or all samples whose conf_score is lower than a specified threshold.
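
The two selection modes described above can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the library's implementation: the helper name `select_clean_mask` is hypothetical, and the tie handling and rounding of the sample count are assumptions.

```python
import numpy as np


def select_clean_mask(conf_score, threshold=0.5, frac_to_filter=None):
    """Hypothetical helper: return a boolean mask of the samples to keep."""
    conf_score = np.asarray(conf_score, dtype=float)

    if frac_to_filter is None:
        # Threshold mode: keep samples scoring strictly above the threshold.
        return conf_score > threshold

    # Fraction mode: drop the lowest-scoring fraction of the samples.
    # Assumption: the number of dropped samples is rounded down.
    n_drop = int(frac_to_filter * len(conf_score))
    lowest_first = np.argsort(conf_score, kind="stable")
    mask = np.ones(len(conf_score), dtype=bool)
    mask[lowest_first[:n_drop]] = False
    return mask


scores = [0.9, 0.2, 0.6, 0.4]
kept_by_threshold = select_clean_mask(scores, threshold=0.5)
kept_by_fraction = select_clean_mask(scores, frac_to_filter=0.25)
```

With these scores, threshold mode drops the two samples scoring 0.5 or lower, while fraction mode drops only the single lowest-scoring sample.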

Parameters
  • classifier (object) – A classifier instance supporting sklearn API.

  • detector (BaseDetector or None, default=None) – Detector used to compute conf_score. Set it to None only if conf_score will be passed to fit() (e.g. when used inside a Pipeline with a BaseDetector preceding it). Otherwise a detector must be supplied at instantiation.

  • threshold (float, default=.5) – Samples with a conf_score higher than this value are kept; the rest are filtered out. A value of .5 implies majority voting, whereas .99 (i.e. a value close to, but less than, 1.0) implies consensus voting.

  • frac_to_filter (float, default=None) – Fraction of samples to filter out. Exactly one of threshold or frac_to_filter must be set.

  • n_jobs (int, default=1) – Number of parallel CPU cores to use.

  • random_state (int, default=None) – Set this value for reproducibility.
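
The majority versus consensus interpretation of threshold can be illustrated with a toy ensemble. The sketch below assumes that conf_score is the fraction of ensemble members whose prediction agrees with a sample's given label; how the score is actually computed depends on the detector in use, and the `agreement` matrix here is made-up illustrative data.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are 4 ensemble members.
# A 1 means that member's prediction agrees with the sample's given label.
agreement = np.array([
    [1, 1, 1, 1],  # all members agree
    [1, 1, 1, 0],  # 3 of 4 agree
    [1, 1, 0, 0],  # tie
    [0, 0, 0, 1],  # 1 of 4 agrees
])

# Assumed definition: conf_score is the fraction of agreeing members.
conf_score = agreement.mean(axis=1)

# threshold=.5 keeps samples that more than half of the members agree on.
majority_kept = conf_score > 0.5

# threshold=.99 keeps only samples that every member agrees on.
consensus_kept = conf_score > 0.99
```

Under this assumption, a tie is filtered out at .5 because only samples with a strictly higher score are kept, and .99 retains only the unanimously supported sample.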

Methods

__init__(classifier[, detector, threshold, …])

Initialize self.

fit(X, y[, conf_score])

get_params([deep])

Get parameters for this estimator.

predict(X)

predict_proba(X)

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.

Attributes

iterative