Feature Selectors

DiffVariableSelector

class bcselector.variable_selection.DiffVariableSelector[source]

Bases: bcselector.variable_selection._VariableSelector

Ranks all features in the dataset using the difference cost filter method.

Methods Summary

fit(data, target_variable, costs, lamb[, …])

Ranks all features in the dataset using the difference cost filter method.

score(model, scoring_function, **kwargs)

plot_scores([budget, …])

get_cost_results()

get_no_cost_results()

Methods Documentation

fit(data, target_variable, costs, lamb, j_criterion_func='cife', number_of_features=None, budget=None, stop_budget=False, **kwargs)[source]

Ranks all features in the dataset using the difference cost filter method.

Parameters
  • data (np.ndarray or pd.) – Matrix or data frame whose features are to be ranked.

  • target_variable (np.ndarray or pd.core.series.Series) – Vector or series holding the target variable. Its length must equal the number of rows in data.

  • costs (list or dict) – Costs of the features. Must contain one cost per column in data. When data is an np.ndarray, provide costs as a list of floats or integers. When data is a pd.DataFrame, provide costs as a list of floats or integers, or as a dict {'col_1': cost_1, …}.

  • lamb (int or float) – Cost scaling parameter. The higher lamb is, the greater the impact of cost on the selection.

  • j_criterion_func (str) – Method used to approximate the conditional mutual information. Must be one of ['mim', 'mifs', 'mrmr', 'jmi', 'cife']. All available methods can be listed by running: >>> from bcselector.information_theory.j_criterion_approximations import __all__

  • number_of_features (int) – Optional argument. Limits the number of features that are selected.

  • budget (int or float) – Optional argument. Limits the total cost of the selected features.

  • stop_budget (bool) – Optional argument. TODO: this argument must be deleted.

  • **kwargs – Arguments passed to difference_find_best_feature() function and then to j_criterion_func.
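
To illustrate how lamb, costs, budget and number_of_features might interact, here is a minimal standalone sketch. It assumes the difference criterion scores each feature as its relevance minus lamb times its cost, uses a plain MIM-style relevance (mutual information with the target), and stops at the first feature that would exceed the budget. The names mutual_information and rank_by_difference are hypothetical and are not part of bcselector; the library's actual implementation may differ.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def rank_by_difference(columns, target, costs, lamb,
                       budget=None, number_of_features=None):
    """Rank features by relevance - lamb * cost (assumed form of the criterion).

    Returns the selected column indices in rank order and their total cost.
    """
    scores = {
        i: mutual_information(col, target) - lamb * costs[i]
        for i, col in enumerate(columns)
    }
    order = sorted(scores, key=scores.get, reverse=True)

    selected, total_cost = [], 0.0
    for i in order:
        if number_of_features is not None and len(selected) >= number_of_features:
            break
        # Assumption: stop as soon as the next feature would exceed the budget.
        if budget is not None and total_cost + costs[i] > budget:
            break
        selected.append(i)
        total_cost += costs[i]
    return selected, total_cost


# Toy discrete data: column 0 copies the target but is expensive,
# column 1 is weakly informative and cheap, column 2 is constant.
target = [0, 0, 1, 1, 0, 1, 0, 1]
columns = [
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
costs = [1.0, 0.1, 0.5]

no_cost_order, _ = rank_by_difference(columns, target, costs, lamb=0)
cost_order, _ = rank_by_difference(columns, target, costs, lamb=1)
within_budget, spent = rank_by_difference(columns, target, costs, lamb=1, budget=0.5)
```

With lamb=0 the ranking ignores cost and the expensive, perfectly informative column comes first; with lamb=1 the cheap column overtakes it, which is the effect the lamb description refers to.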

Examples

>>> from bcselector.variable_selection import DiffVariableSelector
>>> dvs = DiffVariableSelector()
>>> dvs.fit(X, y, costs, lamb=1, j_criterion_func='mim')
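
The example above assumes that X, y and costs already exist. Since costs may be given either as a list or as a dict keyed by column name, the following sketch shows one way to check them against the columns before calling fit. The helper costs_as_list is hypothetical and not part of bcselector; it only illustrates the "one cost per column" requirement.

```python
def costs_as_list(column_names, costs):
    """Hypothetical helper: return costs as a list ordered like column_names."""
    if isinstance(costs, dict):
        # Dict form: every column must have an entry, order comes from the columns.
        missing = [name for name in column_names if name not in costs]
        if missing:
            raise ValueError(f"no cost given for columns: {missing}")
        return [costs[name] for name in column_names]

    # List form: positions must line up with the columns one to one.
    costs = list(costs)
    if len(costs) != len(column_names):
        raise ValueError(
            f"expected {len(column_names)} costs, got {len(costs)}"
        )
    return costs


column_names = ["age", "bmi", "glucose"]
from_dict = costs_as_list(column_names, {"glucose": 5, "age": 1, "bmi": 2.5})
from_list = costs_as_list(column_names, [1, 2.5, 5])
```

Both calls produce the costs in column order, so a mismatch between costs and columns surfaces before the selector is fitted.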

score(model, scoring_function, **kwargs)

plot_scores(budget=None, compare_no_cost_method=False, savefig=False, annotate=False, annotate_box=False, figsize=(12, 8), bbox_pos=(0.72, 0.6), plot_title=None, x_axis_title=None, y_axis_title=None, **kwargs)

get_cost_results()

get_no_cost_results()