glmdisc module

This module is dedicated to preprocessing tasks for logistic regression and post-learning graphical tools.

class glmdisc.glmdisc(test=True, validation=True, criterion='bic', iter=100, m_start=20)

Bases: object

This class implements supervised multivariate discretization, factor-level grouping, and interaction discovery for logistic regression.

bestFormula()

Returns the best formula found by the MCMC.

contData()

Returns the continuous data provided to the MCMC as a single pandas dataframe.

discreteData()

Returns the best discrete data found by the MCMC.

discretize(predictors_cont, predictors_qual)

Discretizes new continuous and categorical features using a previously fitted glmdisc object.

Keyword arguments:
predictors_cont – Continuous predictors to be discretized, as a numpy “numeric” array. Can be provided either here or with the __init__ method.
predictors_qual – Categorical features whose levels are to be merged, as a numpy “string” array. Can be provided either here or with the __init__ method.
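To make the two operations concrete, the following is a minimal NumPy sketch of what discretizing a continuous column and merging the levels of a categorical column look like. The cut points and the level grouping are hypothetical, hand-picked values; a fitted glmdisc object would supply its own, and this is not the library's implementation.

```python
import numpy as np

# Hypothetical inputs, shaped as described above:
# a numeric array for continuous predictors, a string array for categorical ones.
predictors_cont = np.array([[0.05], [0.40], [0.62], [0.97]])
predictors_qual = np.array([["red"], ["blue"], ["green"], ["blue"]])

# Assumed cut points for the continuous column (illustrative only).
cut_points = np.array([0.33, 0.66])
# np.digitize returns, for each value, the index of the interval it falls into.
cont_bins = np.digitize(predictors_cont[:, 0], cut_points)

# Assumed grouping of categorical levels (illustrative only):
# "red" and "green" are merged into group 0, "blue" stays alone in group 1.
level_groups = {"red": 0, "green": 0, "blue": 1}
qual_groups = np.array([level_groups[v] for v in predictors_qual[:, 0]])

# One discretized column per original feature.
discretized = np.column_stack([cont_bins, qual_groups])
print(discretized)
```

Each row of `discretized` holds an interval index for the continuous feature and a merged-group index for the categorical one.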

discretizeDummy(predictors_cont, predictors_qual)

Discretizes new continuous and categorical features using a previously fitted glmdisc object and returns them as dummy variables usable with the best_reglog object.

Keyword arguments:
predictors_cont – Continuous predictors to be discretized, as a numpy “numeric” array. Can be provided either here or with the __init__ method.
predictors_qual – Categorical features whose levels are to be merged, as a numpy “string” array. Can be provided either here or with the __init__ method.
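As an illustration of what "dummy variables" means here, the sketch below one-hot encodes an already discretized feature with plain NumPy. The helper `to_dummies` and the interval codes are hypothetical; the exact layout produced by discretizeDummy (for example, whether a reference level is dropped) is not specified in this documentation.

```python
import numpy as np

def to_dummies(codes, n_levels):
    """One-hot encode integer codes in [0, n_levels) as a 0/1 matrix.

    Hypothetical helper for illustration; not part of glmdisc.
    """
    codes = np.asarray(codes)
    dummies = np.zeros((codes.shape[0], n_levels), dtype=int)
    # Set a single 1 per row, in the column given by that row's code.
    dummies[np.arange(codes.shape[0]), codes] = 1
    return dummies

# Hypothetical discretized feature with three intervals (codes 0, 1, 2).
bin_codes = np.array([0, 1, 1, 2])
dummies = to_dummies(bin_codes, n_levels=3)
print(dummies)
```

Every row contains exactly one 1, so the resulting matrix can be passed to a logistic regression as a set of indicator columns.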

fit(predictors_cont, predictors_qual, labels)

Fits the glmdisc object.

Keyword arguments:
predictors_cont – Continuous predictors to be discretized, as a numpy “numeric” array. Can be provided either here or with the __init__ method.
predictors_qual – Categorical features whose levels are to be merged, as a numpy “string” array. Can be provided either here or with the __init__ method.
labels – Binary (0/1) labels of the observations, as a numpy “numeric” array. Must be of the same length as predictors_qual and predictors_cont.
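The input requirements above can be checked before fitting. The following sketch builds synthetic arrays of the documented kinds and validates them; `check_inputs` is a hypothetical helper, the data is made up, and the glmdisc calls appear only as comments that mirror the signatures documented on this page.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic data for illustration only.
predictors_cont = rng.uniform(size=(n, 2))                  # numeric array
predictors_qual = rng.choice(["a", "b", "c"], size=(n, 1))  # string array
labels = (predictors_cont[:, 0] > 0.5).astype(int)          # 0/1 labels

def check_inputs(predictors_cont, predictors_qual, labels):
    """Hypothetical pre-flight check of the documented input requirements."""
    if not np.issubdtype(predictors_cont.dtype, np.number):
        raise ValueError("predictors_cont must be a numeric array")
    if predictors_qual.dtype.kind not in ("U", "S", "O"):
        raise ValueError("predictors_qual must be an array of strings")
    if not np.isin(labels, [0, 1]).all():
        raise ValueError("labels must contain only 0 and 1")
    if not (len(predictors_cont) == len(predictors_qual) == len(labels)):
        raise ValueError("all inputs must have the same number of rows")
    return True

check_inputs(predictors_cont, predictors_qual, labels)

# With the package installed, fitting would then follow the documented signatures:
# model = glmdisc.glmdisc(test=True, validation=True, criterion='bic', iter=100, m_start=20)
# model.fit(predictors_cont, predictors_qual, labels)
```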

performance()

Returns the best performance found by the MCMC.

predict(predictors_cont, predictors_qual)

Predicts the label values with new continuous and categorical features using a previously fitted glmdisc object.

Keyword arguments:
predictors_cont – Continuous predictors to be discretized, as a numpy “numeric” array. Can be provided either here or with the __init__ method.
predictors_qual – Categorical features whose levels are to be merged, as a numpy “string” array. Can be provided either here or with the __init__ method.

glmdisc.vectorized(prob_matrix, items)