hybparsimony.util package
Submodules
hybparsimony.util.complexity module
Complexity module.
This module contains predefined complexity functions for some of the most popular algorithms in the scikit-learn library:
linearModels_complexity: Any algorithm from `sklearn.linear_model`. Returns: 10^9·nFeatures + (sum of the squared coefficients).
svm_complexity: Any algorithm from `sklearn.svm`. Returns: 10^9·nFeatures + (number of support vectors).
knn_complexity: Any algorithm from `sklearn.neighbors`. Returns: 10^9·nFeatures + 1/(number of neighbors).
mlp_complexity: Any algorithm from `sklearn.neural_network`. Returns: 10^9·nFeatures + (sum of the squared ANN weights).
randomForest_complexity: `sklearn.ensemble.RandomForestRegressor` or `sklearn.ensemble.RandomForestClassifier`. Returns: 10^9·nFeatures + (average number of leaves per tree).
xgboost_complexity: XGBoost scikit-learn model. Returns: 10^9·nFeatures + (average number of leaves per tree × number of trees) (Experimental).
decision_tree_complexity: Any algorithm from `sklearn.tree`. Returns: 10^9·nFeatures + (number of leaves) (Experimental).
Otherwise:
generic_complexity: Any algorithm. Returns: the number of input features (nFeatures).
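The 10^9 factor makes the number of features the dominant term: as long as the internal term stays below 10^9, it only breaks ties between models that use the same number of features. A small arithmetic sketch, using a hypothetical helper `weighted_complexity` that is not part of hybparsimony:

```python
def weighted_complexity(n_features, internal_term):
    # Hypothetical helper reproducing the documented pattern 10^9·nFeatures + internal term.
    return 10**9 * n_features + internal_term


# Model A: 3 features, 500 support vectors; model B: 4 features, 10 support vectors.
complexity_a = weighted_complexity(3, 500)
complexity_b = weighted_complexity(4, 10)

print(complexity_a, complexity_b)   # 3000000500 4000000010
print(complexity_a < complexity_b)  # True: fewer features wins despite more support vectors
```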
Other complexity functions can be defined with the following interface:
def complexity(model, nFeatures, **kwargs):
    complexity = ...  # compute the complexity of the fitted model here
    return complexity
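As an illustration of that interface, here is a hypothetical custom complexity function (the name `nonzero_coef_complexity` is invented for this sketch) that counts non-zero coefficients, which could suit sparse linear models such as Lasso. It assumes the model exposes `coef_` as fitted scikit-learn linear models do; a `SimpleNamespace` stand-in replaces a real model so the snippet runs without scikit-learn.

```python
import numpy as np
from types import SimpleNamespace


def nonzero_coef_complexity(model, nFeatures, **kwargs):
    """Hypothetical custom complexity: 10^9·nFeatures + number of non-zero coefficients."""
    n_nonzero = int(np.count_nonzero(model.coef_))
    return 10**9 * nFeatures + n_nonzero


# Stand-in for a fitted sparse linear model exposing `coef_`.
fake_lasso = SimpleNamespace(coef_=np.array([0.0, 1.5, 0.0, -2.0]))
print(nonzero_coef_complexity(fake_lasso, nFeatures=4))  # 4000000002
```

A function with this signature can then be passed as the `complexity` argument of `getFitness()` or `fitness_for_parallel()`.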
- hybparsimony.util.complexity.decision_tree_complexity(model, nFeatures, **kwargs)
Complexity function for Decision Tree models.
Parameters
- model: object
The model from which the internal complexity is calculated.
- nFeatures: int
The number of selected features.
- **kwargs
Other arguments.
Returns
- int
10^9·nFeatures + (number of leaves)
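A minimal sketch of the documented formula, assuming a fitted scikit-learn tree whose `get_n_leaves()` method returns the number of leaves; this is not necessarily how hybparsimony implements it, and `decision_tree_complexity_sketch` is a hypothetical name. A stand-in object replaces the real tree.

```python
from types import SimpleNamespace


def decision_tree_complexity_sketch(model, nFeatures, **kwargs):
    # Assumes a fitted scikit-learn tree: get_n_leaves() returns the leaf count.
    return 10**9 * nFeatures + int(model.get_n_leaves())


fake_tree = SimpleNamespace(get_n_leaves=lambda: 7)  # stand-in for a fitted tree with 7 leaves
print(decision_tree_complexity_sketch(fake_tree, nFeatures=2))  # 2000000007
```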
- hybparsimony.util.complexity.generic_complexity(model, nFeatures, **kwargs)
Generic complexity function.
Parameters
- model: object
The model from which the internal complexity is calculated.
- nFeatures: int
The number of selected features.
- **kwargs
Other arguments.
Returns
- int
nFeatures.
- hybparsimony.util.complexity.knn_complexity(model, nFeatures, **kwargs)
Complexity function for KNN models.
Parameters
- model: object
The model from which the internal complexity is calculated.
- nFeatures: int
The number of selected features.
- **kwargs
Other arguments.
Returns
- float
10^9·nFeatures + 1/(number of neighbors)
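A minimal sketch of the formula, assuming the model exposes `n_neighbors` as scikit-learn's `KNeighborsClassifier` and `KNeighborsRegressor` do (`knn_complexity_sketch` is a hypothetical name, not the library code). Because the internal term is 1/k, a larger k counts as simpler when the number of features is equal, which matches the usual intuition that more neighbors give a smoother model.

```python
from types import SimpleNamespace


def knn_complexity_sketch(model, nFeatures, **kwargs):
    # Internal term 1/k shrinks as the number of neighbors grows.
    return 10**9 * nFeatures + 1.0 / model.n_neighbors


k5 = SimpleNamespace(n_neighbors=5)    # stand-in for a fitted 5-NN model
k20 = SimpleNamespace(n_neighbors=20)  # stand-in for a fitted 20-NN model
print(knn_complexity_sketch(k20, 3) < knn_complexity_sketch(k5, 3))  # True
```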
- hybparsimony.util.complexity.linearModels_complexity(model, nFeatures, **kwargs)
Complexity function for linear models.
Parameters
- model: object
The model from which the internal complexity is calculated.
- nFeatures: int
The number of selected features.
- **kwargs
Other arguments.
Returns
- float
10^9·nFeatures + (sum of the squared model coefficients).
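A minimal sketch of the formula, assuming the fitted model stores its coefficients in `coef_` as scikit-learn linear models do (`linear_complexity_sketch` is a hypothetical name). `np.sum` also covers the 2-D `coef_` of multi-output or multi-class models.

```python
import numpy as np
from types import SimpleNamespace


def linear_complexity_sketch(model, nFeatures, **kwargs):
    # Sum of squared coefficients (squared L2 norm of the weights).
    return 10**9 * nFeatures + float(np.sum(np.asarray(model.coef_) ** 2))


fake_lr = SimpleNamespace(coef_=np.array([1.0, -2.0, 0.5]))  # stand-in for a fitted model
print(linear_complexity_sketch(fake_lr, nFeatures=3))  # 3000000005.25
```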
- hybparsimony.util.complexity.mlp_complexity(model, nFeatures, **kwargs)
Complexity function for MLP models.
Parameters
- model: object
The model from which the internal complexity is calculated.
- nFeatures: int
The number of selected features.
- **kwargs
Other arguments.
Returns
- float
10^9·nFeatures + (sum of the squared ANN weights)
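A minimal sketch, assuming (as in scikit-learn's `MLPClassifier` and `MLPRegressor`) that the fitted network keeps one weight matrix per layer in `coefs_`. Whether hybparsimony also counts the bias terms (`intercepts_`) is not stated here; this sketch leaves them out, and `mlp_complexity_sketch` is a hypothetical name.

```python
import numpy as np
from types import SimpleNamespace


def mlp_complexity_sketch(model, nFeatures, **kwargs):
    # Sum of squared weights over every layer's weight matrix (biases excluded).
    squared = sum(float(np.sum(w ** 2)) for w in model.coefs_)
    return 10**9 * nFeatures + squared


# Stand-in network: a 2x3 weight matrix of ones and a 3x1 matrix of twos.
fake_mlp = SimpleNamespace(coefs_=[np.ones((2, 3)), np.full((3, 1), 2.0)])
print(mlp_complexity_sketch(fake_mlp, nFeatures=2))  # 2000000018.0
```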
- hybparsimony.util.complexity.randomForest_complexity(model, nFeatures, **kwargs)
Complexity function for RandomForest models.
Parameters
- model: object
The model from which the internal complexity is calculated.
- nFeatures: int
The number of selected features.
- **kwargs
Other arguments.
Returns
- float
10^9·nFeatures + (average number of leaves per tree)
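A minimal sketch, assuming a fitted scikit-learn forest whose trees are in `estimators_` and each expose `get_n_leaves()`; `random_forest_complexity_sketch` is a hypothetical name, not the library implementation. Stand-in trees with 4, 6 and 8 leaves give an average of 6.

```python
import numpy as np
from types import SimpleNamespace


def random_forest_complexity_sketch(model, nFeatures, **kwargs):
    # Average number of leaves over all trees of the fitted forest.
    leaves = [tree.get_n_leaves() for tree in model.estimators_]
    return 10**9 * nFeatures + float(np.mean(leaves))


trees = [SimpleNamespace(get_n_leaves=lambda n=n: n) for n in (4, 6, 8)]
fake_rf = SimpleNamespace(estimators_=trees)
print(random_forest_complexity_sketch(fake_rf, nFeatures=5))  # 5000000006.0
```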
- hybparsimony.util.complexity.svm_complexity(model, nFeatures, **kwargs)
Complexity function for SVM models.
Parameters
- model: object
The model from which the internal complexity is calculated.
- nFeatures: int
The number of selected features.
- **kwargs
Other arguments.
Returns
- int
10^9·nFeatures + (number of support vectors)
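A minimal sketch, assuming the fitted model exposes `support_` (the indices of the support vectors) as scikit-learn's `SVC` and `SVR` do; `svm_complexity_sketch` is a hypothetical name and may differ from the library code.

```python
import numpy as np
from types import SimpleNamespace


def svm_complexity_sketch(model, nFeatures, **kwargs):
    # Number of support vectors = number of entries in `support_`.
    return 10**9 * nFeatures + len(model.support_)


fake_svc = SimpleNamespace(support_=np.array([0, 4, 9, 12]))  # stand-in: 4 support vectors
print(svm_complexity_sketch(fake_svc, nFeatures=10))  # 10000000004
```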
- hybparsimony.util.complexity.xgboost_complexity(model, nFeatures, **kwargs)
Complexity function for XGBoost models.
Parameters
- model: object
The model from which the internal complexity is calculated.
- nFeatures: int
The number of selected features.
- **kwargs
Other arguments.
Returns
- int
10^9·nFeatures + (average number of leaves per tree × number of trees) (Experimental)
hybparsimony.util.fitness module
- hybparsimony.util.fitness.fitness_for_parallel(algorithm, complexity, custom_eval_fun=<function cross_val_score>, cromosoma=None, X=None, y=None, ignore_warnings=True)
Fitness function for hybparsimony, similar to `getFitness()` but defined at module level instead of being nested, so that it can be pickled and therefore used for parallel evaluation.
Parameters
- algorithm: object
The machine learning algorithm to optimize.
- complexity: function
A function that calculates the complexity of the model. Predefined complexity functions are available in hybparsimony.util.complexity.
- custom_eval_fun: function
An evaluation function similar to scikit-learn's `cross_val_score()`.
- cromosoma: population.Chromosome
The chromosome of the solution to evaluate.
- X: {array-like, dataframe} of shape (n_samples, n_features)
Input matrix.
- y: {array-like, dataframe} of shape (n_samples,)
Target values (class labels in classification, real numbers in regression).
- ignore_warnings: bool, default=True
If True, warnings are ignored.
Returns
- tuple
np.array([model's fitness value (J), model's complexity]), model
Examples
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from hybparsimony.util import svm_complexity, population
from hybparsimony.util.fitness import fitness_for_parallel

# load 'breast_cancer' dataset
breast_cancer = load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target
# random feature mask: each column is selected with probability 0.5
chromosome = population.Chromosome(params=[1.0, 0.2],
                                   name_params=['C', 'gamma'],
                                   const={'kernel': 'rbf'},
                                   cols=np.random.uniform(size=X.shape[1]) > 0.50,
                                   name_cols=breast_cancer.feature_names)
print(fitness_for_parallel(SVC, svm_complexity,
                           custom_eval_fun=cross_val_score,
                           cromosoma=chromosome, X=X, y=y))
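The reason for a non-nested variant is picklability: process-based parallelism built on the standard `pickle` module (for example `multiprocessing.Pool`) must pickle the function it sends to worker processes, and `pickle` cannot serialize nested (local) functions, whereas module-level functions, and `functools.partial` objects wrapping them, are pickled by reference. A small standard-library sketch with hypothetical names (`make_fitness`, `fitness_top_level`, `is_picklable`), meant only to illustrate the general Python behavior:

```python
import pickle
from functools import partial


def make_fitness(weight):
    # Nested (closure) style, similar in spirit to getFitness().
    def fitness(x):
        return weight * x
    return fitness


def fitness_top_level(weight, x):
    # Module-level style, similar in spirit to fitness_for_parallel().
    return weight * x


def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


closure_fn = make_fitness(2.0)
partial_fn = partial(fitness_top_level, 2.0)

print(is_picklable(closure_fn))  # False: local objects cannot be pickled
print(is_picklable(partial_fn))  # True
print(partial_fn(3.0))           # 6.0
```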
- hybparsimony.util.fitness.getFitness(algorithm, complexity, custom_eval_fun=<function cross_val_score>, ignore_warnings=True)
Builds the fitness function for hybparsimony.
Parameters
- algorithm: object
The machine learning algorithm to optimize.
- complexity: function
A function that calculates the complexity of the model. Predefined complexity functions are available in hybparsimony.util.complexity.
- custom_eval_fun: function
An evaluation function similar to scikit-learn's `cross_val_score()`.
- ignore_warnings: bool, default=True
If True, warnings are ignored.
Returns
- function
A fitness function that is called with a chromosome and the data (X, y) and returns np.array([model's fitness value (J), model's complexity]), model.
Examples
Usage example for a binary classification model
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from hybparsimony.util import getFitness, svm_complexity, population

# load 'breast_cancer' dataset
breast_cancer = load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target
# random feature mask: each column is selected with probability 0.5
chromosome = population.Chromosome(params=[1.0, 0.2],
                                   name_params=['C', 'gamma'],
                                   const={'kernel': 'rbf'},
                                   cols=np.random.uniform(size=X.shape[1]) > 0.50,
                                   name_cols=breast_cancer.feature_names)
# getFitness() returns a function, which is then called on the chromosome and data
print(getFitness(SVC, svm_complexity)(chromosome, X=X, y=y))
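To make the closure pattern and the documented return value concrete, here is a self-contained sketch of a factory with the same shape as `getFitness()`. It is not the hybparsimony implementation: `get_fitness_sketch`, `MeanRegressor` and `toy_eval` are hypothetical, and the plain `params` dict and boolean `cols` mask are assumed stand-ins for what a `population.Chromosome` holds. The toy evaluation function scores on the training data only, unlike `cross_val_score`.

```python
import numpy as np


def get_fitness_sketch(algorithm, complexity, custom_eval_fun):
    """Hypothetical closure factory mirroring the documented getFitness() contract."""

    def fitness(params, cols, X, y):
        # `params` and `cols` are assumed stand-ins for the chromosome contents.
        X_sel = X[:, cols]
        model = algorithm(**params)
        J = float(np.mean(custom_eval_fun(model, X_sel, y)))  # mean validation score
        model.fit(X_sel, y)  # refit on the selected features to measure complexity
        return np.array([J, complexity(model, int(np.sum(cols)))]), model

    return fitness


class MeanRegressor:
    """Toy estimator that always predicts the training mean (plus `shift`)."""

    def __init__(self, shift=0.0):
        self.shift = shift

    def fit(self, X, y):
        self.mean_ = float(np.mean(y)) + self.shift
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def toy_eval(model, X, y):
    # Toy stand-in for cross_val_score: negative MSE on the training data.
    pred = model.fit(X, y).predict(X)
    return np.array([-np.mean((pred - y) ** 2)])


fitness = get_fitness_sketch(MeanRegressor, lambda m, n, **kw: n, toy_eval)
X = np.arange(12.0).reshape(4, 3)
y = np.array([1.0, 2.0, 3.0, 4.0])
scores, model = fitness({"shift": 0.0}, np.array([True, False, True]), X, y)
print(scores)  # J = -1.25 (negative MSE), complexity = 2 selected features
```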
hybparsimony.util.hyb_aux module
hybparsimony.util.models module
hybparsimony.util.order module
- hybparsimony.util.order.order(obj, kind='heapsort', decreasing=False, na_last=True)
Function to order vectors.
This function wraps numpy.argsort, allowing increasing or decreasing ordering and placing NaN values either at the end or at the beginning.
Parameters
- obj: numpy.array
Array to order.
- kind: {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
Sorting algorithm. The default is 'heapsort'. Note that both 'stable' and 'mergesort' use timsort under the covers and, in general, the actual implementation will vary with data type.
- decreasing: bool, optional
If True, the array is sorted in decreasing order.
- na_last: bool, optional
Controls the treatment of NaN values. If True, missing values in the data are placed last; if False, they are placed first.
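A minimal sketch of the behavior described above, built on `numpy.argsort`. It is not the hybparsimony source, and `order_sketch` is a hypothetical name: NaN positions are split off first, the remaining values are argsorted (negated for decreasing order), and the NaN indices are appended or prepended according to `na_last`.

```python
import numpy as np


def order_sketch(obj, kind="heapsort", decreasing=False, na_last=True):
    obj = np.asarray(obj, dtype=float)
    nan_mask = np.isnan(obj)
    idx_valid = np.flatnonzero(~nan_mask)  # positions of non-NaN values
    idx_nan = np.flatnonzero(nan_mask)     # positions of NaN values
    vals = obj[idx_valid]
    # Negating the values turns an ascending argsort into a descending one.
    sorted_pos = np.argsort(-vals if decreasing else vals, kind=kind)
    ordered = idx_valid[sorted_pos]
    if na_last:
        return np.concatenate([ordered, idx_nan])
    return np.concatenate([idx_nan, ordered])


x = [3.0, float("nan"), 1.0, 2.0]
print(order_sketch(x))                                  # [2 3 0 1]
print(order_sketch(x, decreasing=True, na_last=False))  # [1 0 3 2]
```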
hybparsimony.util.parsimony_monitor module
- hybparsimony.util.parsimony_monitor.parsimony_monitor(iter, current_best_score, current_best_complexity, fitnessval, bestfitnessVal, bestcomplexity, minutes_gen, digits=6, *args)
Function for monitoring the evolution of the HYB-PARSIMONY algorithm.
Prints summary statistics of the fitness values at each iteration of a GA search.
Parameters
- iter: int
Iteration number.
- current_best_score: float
The best score in the whole process so far (score of the best model).
- current_best_complexity: float
The complexity of the best model in the whole process so far.
- fitnessval: list
Fitness values of the population in that iteration.
- bestfitnessVal: float
Best fitness value in that iteration (score of the best model in that iteration).
- bestcomplexity: float
The complexity of the best model in that iteration.
- minutes_gen: float
Time in minutes of that iteration.
- digits: int
Minimal number of significant digits.
- *args
Further arguments passed to or from other methods.
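The exact output format of parsimony_monitor is not documented here. As a rough sketch, a monitor with the same signature could format its inputs with `digits` significant digits and print one line per iteration. The function name `monitor_sketch`, the line layout, and the choice of mean and median as summary statistics are all hypothetical.

```python
import numpy as np


def monitor_sketch(iter, current_best_score, current_best_complexity, fitnessval,
                   bestfitnessVal, bestcomplexity, minutes_gen, digits=6, *args):
    fit = np.asarray(fitnessval, dtype=float)
    fmt = f".{digits}g"  # `digits` significant digits
    line = (f"Iter {iter} | best={current_best_score:{fmt}} "
            f"(cplx={current_best_complexity:{fmt}}) | "
            f"iter best={bestfitnessVal:{fmt}} (cplx={bestcomplexity:{fmt}}) | "
            f"mean={np.nanmean(fit):{fmt}} median={np.nanmedian(fit):{fmt}} | "
            f"{minutes_gen:.2f} min")
    print(line)
    return line


line = monitor_sketch(1, 0.95, 3e9 + 10, [0.9, 0.8, 0.7], 0.9, 3e9 + 20, 0.5, 3)
```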
hybparsimony.util.population module
- class hybparsimony.util.population.Chromosome(params, name_params, const, cols, name_cols)
Bases: object
- property columns
- property params
- class hybparsimony.util.population.Population(params, columns, population=None)
Bases: object
- CATEGORICAL = 2
- CONSTANT = 3
- FLOAT = 1
- INTEGER = 0
- POWER = 4
- getChromosome(key)
This method returns a chromosome from the population.
Parameters
- key: int
Row index of the chromosome.
Returns
- Chromosome
A Chromosome object.
- property paramsnames
- property population
Module contents
- hybparsimony.util.order(obj, kind='heapsort', decreasing=False, na_last=True)
Same as hybparsimony.util.order.order, re-exported at package level; see that entry for details.
- hybparsimony.util.parsimony_monitor(iter, current_best_score, current_best_complexity, fitnessval, bestfitnessVal, bestcomplexity, minutes_gen, digits=6, *args)
Same as hybparsimony.util.parsimony_monitor.parsimony_monitor, re-exported at package level; see that entry for details.