hybparsimony.util package

Submodules

hybparsimony.util.complexity module

Complexity module.

This module contains predefined complexity functions for some of the most popular algorithms in the scikit-learn library:

  • linearModels_complexity: Any algorithm from ‘sklearn.linear_model’. Returns: 10^9·nFeatures + (sum of the squared coefs).

  • svm_complexity: Any algorithm from ‘sklearn.svm’. Returns: 10^9·nFeatures + (number of support vectors).

  • knn_complexity: Any algorithm from ‘sklearn.neighbors’. Returns: 10^9·nFeatures + 1/(number of neighbors).

  • mlp_complexity: Any algorithm from ‘sklearn.neural_network’. Returns: 10^9·nFeatures + (sum of the ANN squared weights).

  • randomForest_complexity: ‘sklearn.ensemble.RandomForestRegressor’ or ‘sklearn.ensemble.RandomForestClassifier’. Returns: 10^9·nFeatures + (the average of tree leaves).

  • xgboost_complexity: XGBoost sklearn model. Returns: 10^9·nFeatures + (the average of tree leaves * number of trees). (Experimental)

  • decision_tree_complexity: Any algorithm from ‘sklearn.tree’. Returns: 10^9·nFeatures + (number of leaves). (Experimental)

Otherwise:

  • generic_complexity: Any algorithm. Returns: the number of input features (nFeatures).
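Every formula in the list above shares the 10^9·nFeatures term. Provided the model-specific term stays below 10^9 (an assumption this page does not state explicitly), comparing two complexity values compares the number of features first and uses the internal term only to break ties. A minimal sketch of that arithmetic, using hypothetical helper names (`combined`, `split`) that are not part of hybparsimony:

```python
SCALE = 10**9  # weight of the feature count in the formulas above


def combined(n_features, internal):
    """Build a complexity value as 10^9 * nFeatures + internal term."""
    return SCALE * n_features + internal


def split(complexity):
    """Recover (nFeatures, internal term), assuming internal < 10^9."""
    return divmod(complexity, SCALE)


# 3 features with 500 support vectors vs. 4 features with 10 support vectors
a = combined(3, 500)
b = combined(4, 10)

print(a < b)      # the model with fewer features is the simpler one
print(split(a))   # feature count and internal term, separated again
```

Under this reading, the internal term matters only between models that use the same number of features.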

Other complexity functions can be defined with the following interface:

def complexity(model, nFeatures, **kwargs):
    complexity = nFeatures  # replace with the model-specific computation
    return complexity

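As an illustration of this interface, the sketch below follows the same 10^9·nFeatures + internal pattern as the predefined functions. The name `my_complexity`, the cap on the internal term, and the stand-in model object are all assumptions made for this example; the function only relies on the model exposing a flat `coef_` sequence.

```python
from types import SimpleNamespace


def my_complexity(model, nFeatures, **kwargs):
    """Hypothetical custom complexity: 10^9 * nFeatures + sum of squared coefs."""
    internal = sum(c * c for c in model.coef_)
    # Assumption: cap the internal term so it can never outweigh one feature.
    internal = min(internal, 1e9 - 1)
    return nFeatures * 1e9 + internal


# Stand-in for a fitted model; a real estimator would provide coef_ itself.
fake_model = SimpleNamespace(coef_=[1.0, 2.0])
print(my_complexity(fake_model, 3))
```

Any function with this signature could then be passed wherever a `complexity` argument is expected, for example to `getFitness`.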
hybparsimony.util.complexity.decision_tree_complexity(model, nFeatures, **kwargs)[source]

Complexity function for Decision Tree models.

Parameters

model: model

The model from which the internal complexity is calculated.

nFeatures: int

The number of selected features.

**kwargs

Other arguments.

Returns

int

10^9·nFeatures + (number of leaves)

hybparsimony.util.complexity.generic_complexity(model, nFeatures, **kwargs)[source]

Generic complexity function.

Parameters

model: model

The model from which the internal complexity is calculated.

nFeatures: int

The number of selected features.

**kwargs

Other arguments.

Returns

int

nFeatures.

hybparsimony.util.complexity.kernel_ridge_complexity(model, nFeatures, **kwargs)[source]
hybparsimony.util.complexity.knn_complexity(model, nFeatures, **kwargs)[source]

Complexity function for KNN models.

Parameters

model: model

The model from which the internal complexity is calculated.

nFeatures: int

The number of selected features.

**kwargs

Other arguments.

Returns

int

10^9·nFeatures + 1/(number of neighbors)
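The 1/(number of neighbors) term means that, for the same feature count, a KNN model with more neighbors is scored as less complex. The sketch below only illustrates the formula; it is not the library's implementation, and it assumes the model exposes an `n_neighbors` attribute, as scikit-learn's neighbors estimators do. The stand-in objects replace fitted models.

```python
from types import SimpleNamespace


def knn_complexity_sketch(model, nFeatures, **kwargs):
    """Illustrative only: 10^9 * nFeatures + 1 / (number of neighbors)."""
    return 1e9 * nFeatures + 1.0 / model.n_neighbors


# Stand-ins for fitted KNN models with 1 and 10 neighbors
k1 = SimpleNamespace(n_neighbors=1)
k10 = SimpleNamespace(n_neighbors=10)

# With the same 5 features, the 10-neighbor model gets the lower value.
print(knn_complexity_sketch(k10, 5) < knn_complexity_sketch(k1, 5))
```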

hybparsimony.util.complexity.linearModels_complexity(model, nFeatures, **kwargs)[source]

Complexity function for linear models.

Parameters

model: model

The model from which the internal complexity is calculated.

nFeatures: int

The number of selected features.

**kwargs

Other arguments.

Returns

int

10^9·nFeatures + (sum of the model squared coefs).

hybparsimony.util.complexity.mlp_complexity(model, nFeatures, **kwargs)[source]

Complexity function for MLP models.

Parameters

model: model

The model from which the internal complexity is calculated.

nFeatures: int

The number of selected features.

**kwargs

Other arguments.

Returns

int

10^9·nFeatures + (sum of the ANN squared weights)

hybparsimony.util.complexity.randomForest_complexity(model, nFeatures, **kwargs)[source]

Complexity function for RandomForest models.

Parameters

model: model

The model from which the internal complexity is calculated.

nFeatures: int

The number of selected features.

**kwargs

Other arguments.

Returns

int

10^9·nFeatures + (the average of tree leaves)
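One way to read "the average of tree leaves" is the mean leaf count over the trees in the forest. The sketch below takes that reading as an assumption and is not the library's code. It relies on the fitted forest exposing `estimators_` and each tree exposing `get_n_leaves()`, as scikit-learn's random forests and decision trees do; `StubTree` and `StubForest` are hypothetical stand-ins so the example runs without training a model.

```python
class StubTree:
    """Stand-in for a fitted decision tree."""

    def __init__(self, n_leaves):
        self._n_leaves = n_leaves

    def get_n_leaves(self):
        return self._n_leaves


class StubForest:
    """Stand-in for a fitted random forest."""

    def __init__(self, leaves_per_tree):
        self.estimators_ = [StubTree(n) for n in leaves_per_tree]


def forest_complexity_sketch(model, nFeatures, **kwargs):
    """Illustrative only: 10^9 * nFeatures + mean number of leaves per tree."""
    leaves = [tree.get_n_leaves() for tree in model.estimators_]
    return 1e9 * nFeatures + sum(leaves) / len(leaves)


forest = StubForest([4, 6, 8])
print(forest_complexity_sketch(forest, 2))
```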

hybparsimony.util.complexity.svm_complexity(model, nFeatures, **kwargs)[source]

Complexity function for SVM models.

Parameters

model: model

The model from which the internal complexity is calculated.

nFeatures: int

The number of selected features.

**kwargs

Other arguments.

Returns

int

10^9·nFeatures + (number of support vectors)

hybparsimony.util.complexity.xgboost_complexity(model, nFeatures, **kwargs)[source]

Complexity function for XGBoost model.

Parameters

model: model

The model from which the internal complexity is calculated.

nFeatures: int

The number of selected features.

**kwargs

Other arguments.

Returns

int

10^9·nFeatures + (the average of tree leaves * number of trees) (Experimental)

hybparsimony.util.fitness module

hybparsimony.util.fitness.fitness_for_parallel(algorithm, complexity, custom_eval_fun=<function cross_val_score>, cromosoma=None, X=None, y=None, ignore_warnings=True)[source]

Fitness function for hybparsimony, similar to ‘getFitness()’ but not nested, so that it can be pickled and therefore used in parallel execution.

Parameters

algorithm: object

The machine learning algorithm to optimize.

complexity: function

A function that calculates the complexity of the model. There are some functions available in hybparsimony.util.complexity.

custom_eval_fun: function

An evaluation function similar to scikit-learn’s ‘cross_val_score()’.

cromosoma: population.Chromosome class

Solution’s chromosome.

X: {array-like, dataframe} of shape (n_samples, n_features)

Input matrix.

y: {array-like, dataframe} of shape (n_samples,)

Target values (class labels in classification, real numbers in regression).

ignore_warnings: bool, default True

If True, warnings are ignored.

Returns

tuple

(np.array([model’s fitness value (J), model’s complexity]), model)

Examples

import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from hybparsimony import hybparsimony
from hybparsimony.util import svm_complexity, population
from hybparsimony.util.fitness import fitness_for_parallel
# load 'breast_cancer' dataset
breast_cancer = load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target
chromosome = population.Chromosome(params = [1.0, 0.2],
                                name_params = ['C','gamma'],
                                const = {'kernel':'rbf'},
                                cols= np.random.uniform(size=X.shape[1])>0.50,
                                name_cols = breast_cancer.feature_names)
print(fitness_for_parallel(SVC, svm_complexity,
                           custom_eval_fun=cross_val_score,
                           cromosoma=chromosome, X=X, y=y))
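The reason a non-nested fitness function matters for parallelism can be shown with the standard library alone: `pickle` refuses functions defined inside another function, whereas a module-level function with its arguments bound by `functools.partial` can be pickled by reference when it is importable by name, as in a normal script or module. The names below (`make_nested_fitness`, `flat_fitness`, `can_pickle`) are hypothetical and unrelated to hybparsimony's internals; this is a sketch of the general Python behaviour only.

```python
import functools
import pickle


def make_nested_fitness(scale):
    """Returns a closure, in the style of a nested fitness function."""
    def fitness(x):
        return scale * x
    return fitness


def flat_fitness(scale, x):
    """Module-level equivalent: every input is an explicit argument."""
    return scale * x


def can_pickle(obj):
    """True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


nested = make_nested_fitness(2.0)
flat = functools.partial(flat_fitness, 2.0)

print(nested(3.0) == flat(3.0))  # both compute the same value
print(can_pickle(nested))        # the local function cannot be pickled
```

Process-based parallel backends typically need to pickle the callable they dispatch, which is why the flat form is the one suited to that setting.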

hybparsimony.util.fitness.getFitness(algorithm, complexity, custom_eval_fun=<function cross_val_score>, ignore_warnings=True)[source]

Fitness function for hybparsimony.

Parameters

algorithm: object

The machine learning algorithm to optimize.

complexity: function

A function that calculates the complexity of the model. There are some functions available in hybparsimony.util.complexity.

custom_eval_fun: function

An evaluation function similar to scikit-learn’s ‘cross_val_score()’.

Returns

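function

A fitness function that, when called with (chromosome, X=X, y=y) as in the example below, returns np.array([model’s fitness value (J), model’s complexity]), model.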

Examples

Usage example for a binary classification model

import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from hybparsimony import hybparsimony
from hybparsimony.util import getFitness, svm_complexity, population
# load 'breast_cancer' dataset
breast_cancer = load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target
chromosome = population.Chromosome(params = [1.0, 0.2],
                                name_params = ['C','gamma'],
                                const = {'kernel':'rbf'},
                                cols= np.random.uniform(size=X.shape[1])>0.50,
                                name_cols = breast_cancer.feature_names)
print(getFitness(SVC,svm_complexity)(chromosome, X=X, y=y))

hybparsimony.util.hyb_aux module

hybparsimony.util.models module

hybparsimony.util.models.check_algorithm(algorithm, is_classification)[source]

hybparsimony.util.order module

hybparsimony.util.order.order(obj, kind='heapsort', decreasing=False, na_last=True)[source]

Function to order vectors.

This function wraps numpy.argsort, adding increasing or decreasing ordering and allowing NaN values to be placed either at the end or at the beginning.

Parameters

obj: numpy.array

Array to order.

kind: {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, optional

Sorting algorithm. The default is heapsort. Note that both ‘stable’ and ‘mergesort’ use timsort under the covers and, in general, the actual implementation will vary with data type.

decreasing: bool, optional

If True, sort in decreasing order.

na_last: bool, optional

Controls the treatment of NaN values. If True, missing values in the data are put last; if False, they are put first.
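The behaviour described by `decreasing` and `na_last` can be pictured with a small pure-Python sketch. It is not the library's implementation (which is described as building on numpy.argsort and accepts a `kind` argument); the name `order_sketch` is hypothetical, and the sketch assumes a flat sequence of floats.

```python
import math


def order_sketch(values, decreasing=False, na_last=True):
    """Return the indices that would sort values, with NaNs grouped at one end."""
    nan_idx = [i for i, v in enumerate(values) if math.isnan(v)]
    ok_idx = [i for i, v in enumerate(values) if not math.isnan(v)]
    # Sort only the non-NaN positions by their value.
    ok_idx.sort(key=lambda i: values[i], reverse=decreasing)
    return ok_idx + nan_idx if na_last else nan_idx + ok_idx


data = [3.0, float("nan"), 1.0, 2.0]
print(order_sketch(data))                   # [2, 3, 0, 1]
print(order_sketch(data, decreasing=True))  # [0, 3, 2, 1]
print(order_sketch(data, na_last=False))    # [1, 2, 3, 0]
```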

hybparsimony.util.parsimony_monitor module

hybparsimony.util.parsimony_monitor.parsimony_monitor(iter, current_best_score, current_best_complexity, fitnessval, bestfitnessVal, bestcomplexity, minutes_gen, digits=6, *args)[source]

Function for monitoring the evolution of the HYB-PARSIMONY algorithm.

Prints summary statistics of the fitness values at each iteration of a GA search.

Parameters

iter: int

Iteration.

current_best_score: float

The best score in the whole process (score of the best model).

current_best_complexity: float

The complexity of the best model in the whole process.

fitnessval: list

Fitness values of the population in that iteration.

bestfitnessVal: float

Best fitness value in this iteration (score of the best model in that iteration)

bestcomplexity: float

The complexity of the best model in that iteration.

minutes_gen: float

Time in minutes of that iteration.

digits: int

Minimal number of significant digits.

*args

Further arguments passed to or from other methods.

hybparsimony.util.parsimony_monitor.parsimony_summary(fitnessval, complexity, *args)[source]

hybparsimony.util.population module

class hybparsimony.util.population.Chromosome(params, name_params, const, cols, name_cols)[source]

Bases: object

property columns
property params
class hybparsimony.util.population.Population(params, columns, population=None)[source]

Bases: object

CATEGORICAL = 2
CONSTANT = 3
FLOAT = 1
INTEGER = 0
POWER = 4
getChromosome(key)[source]

This method returns a chromosome from the population.

Parameters

key: int

Chromosome row index.

Returns

Chromosome

A Chromosome object.

property paramsnames
property population
update_to_feat_thres(popSize, feat_thres)[source]

Module contents

hybparsimony.util.order(obj, kind='heapsort', decreasing=False, na_last=True)[source]

Function to order vectors.

This function wraps numpy.argsort, adding increasing or decreasing ordering and allowing NaN values to be placed either at the end or at the beginning.

Parameters

obj: numpy.array

Array to order.

kind: {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, optional

Sorting algorithm. The default is heapsort. Note that both ‘stable’ and ‘mergesort’ use timsort under the covers and, in general, the actual implementation will vary with data type.

decreasing: bool, optional

If True, sort in decreasing order.

na_last: bool, optional

Controls the treatment of NaN values. If True, missing values in the data are put last; if False, they are put first.

hybparsimony.util.parsimony_monitor(iter, current_best_score, current_best_complexity, fitnessval, bestfitnessVal, bestcomplexity, minutes_gen, digits=6, *args)[source]

Function for monitoring the evolution of the HYB-PARSIMONY algorithm.

Prints summary statistics of the fitness values at each iteration of a GA search.

Parameters

iter: int

Iteration.

current_best_score: float

The best score in the whole process (score of the best model).

current_best_complexity: float

The complexity of the best model in the whole process.

fitnessval: list

Fitness values of the population in that iteration.

bestfitnessVal: float

Best fitness value in this iteration (score of the best model in that iteration)

bestcomplexity: float

The complexity of the best model in that iteration.

minutes_gen: float

Time in minutes of that iteration.

digits: int

Minimal number of significant digits.

*args

Further arguments passed to or from other methods.