cubmods package#

Submodules#

cubmods.cub module#

CUB models in Python. Module for CUB (Combination of Uniform and Binomial).

Description#

This module contains methods and classes for the CUB model family.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cub.CUBresCUB00(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([ci, saveas, figsize])

Main function to plot an object of the Class.

plot_confell([figsize, ci, equal, ...])

Plots the estimated parameter values in the parameter space and the asymptotic confidence ellipse.

plot_ordinal([figsize, kind, ax, saveas])

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(ci=0.95, saveas=None, figsize=(7, 15))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • ci (float) – level \((1-\alpha/2)\) for the confidence ellipse

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_confell(figsize=(7, 5), ci=0.95, equal=True, magnified=False, ax=None, saveas=None)#

Plots the estimated parameter values in the parameter space and the asymptotic confidence ellipse.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None, i.e. when a new figure is created)

  • ci (float) – level \((1-\alpha/2)\) for the confidence ellipse

  • equal (bool) – if the plot must have equal aspect

  • magnified (bool) – if False the limits will be the entire parameter space, otherwise let matplotlib choose the limits

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), kind='bar', ax=None, saveas=None)#

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None, i.e. when a new figure is created)

  • kind (str) – choose a barplot ('bar', the default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cub.cmf(m, pi, xi)#

Cumulative probability of a specified CUB model.

\(\Pr(R \leq r | \pmb\theta),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

an array of the CMF for the specified model

Return type:

numpy array
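As a rough illustration of the quantity described above, the following sketch assumes the usual CUB parametrisation, \(\Pr(R=r|\pmb\theta) = \pi\binom{m-1}{r-1}(1-\xi)^{r-1}\xi^{m-r} + (1-\pi)/m\), and obtains the cumulative probabilities as running sums. It uses only the standard library; cub_pmf and cub_cmf are hypothetical helper names, not the package functions, and they return plain lists rather than numpy arrays.

```python
from itertools import accumulate
from math import comb


def cub_pmf(m, pi, xi):
    """Pr(R = r), r = 1..m, for the usual CUB mixture (assumed form)."""
    return [
        pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        + (1 - pi) / m
        for r in range(1, m + 1)
    ]


def cub_cmf(m, pi, xi):
    """Pr(R <= r), r = 1..m, as running sums of the pmf."""
    return list(accumulate(cub_pmf(m, pi, xi)))


cdf = cub_cmf(m=7, pi=0.7, xi=0.3)
print([round(c, 4) for c in cdf])  # non-decreasing; last entry is 1 up to rounding
```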

cubmods.cub.draw(m, pi, xi, n, df, formula, seed=None)#

Draw a random sample from a specified CUB model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

  • n (int) – number of ordinal responses to be drawn

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample containing ordinal responses drawn from the specified model
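The idea of drawing from a specified model with a reproducible seed can be sketched as follows. This is only an illustration under the assumed CUB probability formula \(\pi\binom{m-1}{r-1}(1-\xi)^{r-1}\xi^{m-r} + (1-\pi)/m\): draw_cub is a hypothetical helper that returns a plain list of responses, whereas the package function also takes df and formula and returns a CUBsample object.

```python
import random
from math import comb


def cub_pmf(m, pi, xi):
    """Pr(R = r), r = 1..m, for the usual CUB mixture (assumed form)."""
    return [
        pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        + (1 - pi) / m
        for r in range(1, m + 1)
    ]


def draw_cub(m, pi, xi, n, seed=None):
    """Draw n ordinal responses in 1..m with probabilities cub_pmf(m, pi, xi)."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    return rng.choices(range(1, m + 1), weights=cub_pmf(m, pi, xi), k=n)


sample = draw_cub(m=7, pi=0.7, xi=0.3, n=500, seed=42)
print(sample[:10])
```

Calling draw_cub twice with the same seed gives the same sample, which is the behaviour the seed parameter is meant to provide.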

cubmods.cub.gini(m, pi, xi)#

The Gini index of a specified CUB model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the Gini index of the model

Return type:

float
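A possible reading of this index, offered as an assumption rather than as the package's definition, is the normalized Gini heterogeneity index \(G = \frac{m}{m-1}\left(1 - \sum_{r=1}^m p_r^2\right)\), where \(p_r\) are the model probabilities. The sketch below uses hypothetical helper names and the usual CUB probability formula.

```python
from math import comb


def cub_pmf(m, pi, xi):
    """Pr(R = r), r = 1..m, for the usual CUB mixture (assumed form)."""
    return [
        pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        + (1 - pi) / m
        for r in range(1, m + 1)
    ]


def gini_normalized(m, pi, xi):
    """Normalized Gini heterogeneity (assumed definition): 0 = degenerate, 1 = uniform."""
    sum_sq = sum(p * p for p in cub_pmf(m, pi, xi))
    return (1 - sum_sq) * m / (m - 1)


print(gini_normalized(7, 0.7, 0.3))
```

Under this definition a purely uniform model (\(\pi = 0\)) has index 1, and a model with all mass on one category has index 0.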

cubmods.cub.init_theta(f, m)#

Preliminary estimators for CUB models without covariates.

Computes preliminary parameter estimates of a CUB model without covariates for given ordinal responses. These preliminary estimators are used within the package code to start the E-M algorithm.

Parameters:
  • f (array of int) – array of the absolute frequencies of given ordinal responses

  • m (int) – number of ordinal categories

Returns:

a tuple of \((\pi^{(0)}, \xi^{(0)})\)

cubmods.cub.laakso(m, pi, xi)#

The Laakso index of a specified CUB model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the Laakso index of the model

Return type:

float

cubmods.cub.loglik(m, pi, xi, f)#

Compute the log-likelihood function of a CUB model without covariates for a given absolute frequency distribution.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

  • f (array of int) – array of absolute frequency distribution

Returns:

the log-likelihood value

Return type:

float
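For a frequency distribution \(f_1, \ldots, f_m\), the log-likelihood of a model without covariates is \(\sum_{r=1}^m f_r \log \Pr(R=r|\pmb\theta)\). A minimal sketch, assuming the usual CUB probability formula and using a hypothetical helper name:

```python
from math import comb, log


def cub_loglik(m, pi, xi, f):
    """Sum over categories of f_r * log Pr(R = r) for the assumed CUB mixture."""
    p = [
        pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        + (1 - pi) / m
        for r in range(1, m + 1)
    ]
    return sum(fr * log(pr) for fr, pr in zip(f, p))


f = [12, 25, 48, 80, 105, 90, 40]  # illustrative absolute frequencies, m = 7
print(cub_loglik(7, 0.7, 0.3, f))
```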

cubmods.cub.mean(m, pi, xi)#

Expected value of a specified CUB model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the expected value of the model

Return type:

float
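Under the usual CUB parametrisation (an assumption here, since the formula is not stated above), the expected value has the closed form \(E(R) = \frac{m+1}{2} + \pi(m-1)\left(\frac{1}{2}-\xi\right)\). The sketch below computes the mean both from the probabilities and from this closed form; the helper names are hypothetical.

```python
from math import comb


def cub_pmf(m, pi, xi):
    """Pr(R = r), r = 1..m, for the usual CUB mixture (assumed form)."""
    return [
        pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        + (1 - pi) / m
        for r in range(1, m + 1)
    ]


def cub_mean(m, pi, xi):
    """Expected value as the probability-weighted sum of the categories."""
    return sum(r * p for r, p in enumerate(cub_pmf(m, pi, xi), start=1))


def cub_mean_closed(m, pi, xi):
    """Closed form: uniform mean plus the shift induced by the binomial part."""
    return (m + 1) / 2 + pi * (m - 1) * (0.5 - xi)


print(cub_mean(7, 0.7, 0.3), cub_mean_closed(7, 0.7, 0.3))
```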

cubmods.cub.median(m, pi, xi)#

The median of a specified CUB model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the median of the model

Return type:

float

cubmods.cub.mle(sample, m, df, formula, ass_pars=None, maxiter=500, tol=0.0001)#

Main function for CUB models without covariates.

Function to estimate and validate a CUB model without covariates for given ordinal responses.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (int) – maximum number of iterations allowed for running the optimization algorithm

  • tol (float) – fixed error tolerance for final estimates

Returns:

an instance of CUBresCUB00 (see the Class for details)

Return type:

object
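The roles of maxiter and tol can be illustrated with a bare-bones E-M loop for a CUB model without covariates. This is a textbook sketch, not the package's implementation: the starting values are naive, the stopping rule (absolute change in log-likelihood below tol) is an assumption, and em_cub and binom_shifted are hypothetical names.

```python
from math import comb, log


def binom_shifted(m, xi):
    """Shifted binomial probabilities b_r(xi), r = 1..m."""
    return [comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
            for r in range(1, m + 1)]


def em_cub(f, m, maxiter=500, tol=1e-4):
    """E-M sketch from absolute frequencies f; returns (pi, xi, loglik, niter)."""
    n = sum(f)

    def loglik(pi, xi):
        return sum(fr * log(pi * br + (1 - pi) / m)
                   for fr, br in zip(f, binom_shifted(m, xi)))

    avg = sum(r * fr for r, fr in enumerate(f, start=1)) / n
    pi = 0.5                                        # naive starting values
    xi = min(max((m - avg) / (m - 1), 0.01), 0.99)
    ll = loglik(pi, xi)
    niter = 0
    for niter in range(1, maxiter + 1):
        b = binom_shifted(m, xi)
        # E-step: posterior probability that a response r came from the binomial part
        tau = [pi * br / (pi * br + (1 - pi) / m) for br in b]
        # M-step: update pi, then xi from the tau-weighted mean of the responses
        w = [fr * tr for fr, tr in zip(f, tau)]
        pi = sum(w) / n
        rbar = sum(r * wr for r, wr in enumerate(w, start=1)) / sum(w)
        xi = (m - rbar) / (m - 1)
        new_ll = loglik(pi, xi)
        converged = abs(new_ll - ll) < tol
        ll = new_ll
        if converged:
            break
    return pi, xi, ll, niter


f = [12, 25, 48, 80, 105, 90, 40]  # illustrative absolute frequencies, m = 7
print(em_cub(f, m=7))
```

A smaller tol or a larger maxiter trades running time for accuracy of the final estimates; the package function additionally computes standard errors, tests and fit indices, which this sketch omits.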

cubmods.cub.pmf(m, pi, xi)#

Probability distribution of a specified CUB model.

\(\Pr(R = r | \pmb\theta),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the vector of the probability distribution of a CUB model.

Return type:

numpy array

cubmods.cub.prob(m, pi, xi, r)#

Probability \(\Pr(R = r | \pmb\theta)\) of a specified CUB model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

  • r (int) – ordinal value (must be \(1 \leq r \leq m\))

Returns:

the probability \(\Pr(R = r | \pmb\theta)\)

Return type:

float

cubmods.cub.skew(pi, xi)#

Normalized skewness index \(\eta\).

Parameters:
  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the skewness of the model

Return type:

float

cubmods.cub.std(m, pi, xi)#

Standard deviation of a specified CUB model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the standard deviation of the model

Return type:

float

cubmods.cub.var(m, pi, xi)#

Variance of a specified CUB model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the variance of the model

Return type:

float
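Treating the model as a mixture of a shifted binomial and a discrete uniform (the usual CUB form, assumed here), the variance can be written as \((m-1)\left[\pi\xi(1-\xi) + (1-\pi)\left(\frac{m+1}{12} + \pi(m-1)\left(\frac{1}{2}-\xi\right)^2\right)\right]\). The sketch below checks this closed form against a direct computation from the probabilities; helper names are hypothetical.

```python
from math import comb, sqrt


def cub_pmf(m, pi, xi):
    """Pr(R = r), r = 1..m, for the usual CUB mixture (assumed form)."""
    return [
        pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        + (1 - pi) / m
        for r in range(1, m + 1)
    ]


def cub_var(m, pi, xi):
    """Variance as E(R^2) - E(R)^2, computed from the probabilities."""
    p = cub_pmf(m, pi, xi)
    mean = sum(r * pr for r, pr in enumerate(p, start=1))
    return sum(r * r * pr for r, pr in enumerate(p, start=1)) - mean ** 2


def cub_var_closed(m, pi, xi):
    """Closed form from the mixture decomposition of the variance."""
    return (m - 1) * (
        pi * xi * (1 - xi)
        + (1 - pi) * ((m + 1) / 12 + pi * (m - 1) * (0.5 - xi) ** 2)
    )


print(cub_var(7, 0.7, 0.3), sqrt(cub_var(7, 0.7, 0.3)))  # variance, standard deviation
```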

cubmods.cub.varcov(m, pi, xi, ordinal)#

Compute the variance-covariance matrix of parameter estimates of a CUB model without covariates.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

  • ordinal (array of int) – array of ordinal responses

Returns:

the variance-covariance matrix of the CUB model

Return type:

numpy ndarray

cubmods.cub_0w module#

CUB models in Python. Module for CUB (Combination of Uniform and Binomial) with covariates for the feeling component.

Description:#

This module contains methods and classes for the CUB_0W model family.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cub_0w.CUBresCUB0W(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([saveas, figsize])

Main function to plot an object of the Class.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(saveas=None, figsize=(7, 5))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None, i.e. when a new figure is created)

  • kind (str) – choose a barplot ('bar', the default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cub_0w.draw(m, pi, gamma, W, df, formula, seed=None)#

Draw a random sample from a specified CUB model with covariates for the feeling component.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None; it must be \(\neq 0\)

Returns:

an instance of CUBsample containing ordinal responses drawn from the specified model

cubmods.cub_0w.effe01(gamma, esterno01, m)#

Auxiliary function for the log-likelihood estimation of CUB models with covariates for the feeling component.

Compute the opposite of the scalar function that is maximized when running the E-M algorithm for CUB models with covariates for the feeling parameter.

It is called as an argument for minimize within CUB function for models with covariates for feeling or for both feeling and uncertainty.

Parameters:
  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • esterno01 – a matrix binding together: the vector \(\pmb\tau\) of the posterior probabilities that each observation has been generated by the first component distribution of the mixture, the ordinal data \(\pmb r\) and the matrix \(\pmb w\) of the selected covariates accounting for an intercept term

  • m (int) – number of ordinal categories

Returns:

the expected value of the incomplete log-likelihood

Return type:

float
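To make the description above concrete, the sketch below writes the objective under the usual logistic link \(\xi_i = 1/(1+e^{-\pmb w_i \pmb\gamma})\), which is an assumption here. With that link, \(-\sum_i \tau_i\left[(r_i-1)\log(1-\xi_i) + (m-r_i)\log\xi_i\right]\) simplifies to \(\sum_i \tau_i\left[(r_i-1)\eta_i + (m-1)\log(1+e^{-\eta_i})\right]\) with \(\eta_i = \pmb w_i \pmb\gamma\). The function name is hypothetical, and it takes \(\pmb\tau\), the responses and the covariates as separate arguments instead of a single bound matrix.

```python
from math import exp, log


def neg_q_gamma(gamma, tau, sample, W, m):
    """Opposite of the tau-weighted binomial log-likelihood, as a function of gamma.

    gamma[0] is the intercept; W is a list of covariate rows (one per observation).
    """
    total = 0.0
    for t, r, w in zip(tau, sample, W):
        eta = gamma[0] + sum(g * x for g, x in zip(gamma[1:], w))
        total += t * ((r - 1) * eta + (m - 1) * log(1 + exp(-eta)))
    return total


tau = [0.8, 0.4, 0.9]
sample = [2, 5, 7]
W = [[0.1], [1.2], [-0.5]]
print(neg_q_gamma([0.3, -0.7], tau, sample, W, m=7))
```

A function of this shape, with gamma as its first argument, is what a numerical minimizer would be given in the M-step.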

cubmods.cub_0w.init_gamma(sample, m, W)#

Preliminary parameter estimates of a CUB model with covariates for the feeling component.

Compute preliminary parameter estimates for the feeling component of a CUB model fitted to ordinal responses. These estimates are set as initial values for parameters to start the E-M algorithm.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

an array \(\pmb\gamma^{(0)}\)

Return type:

array of float

cubmods.cub_0w.loglik(sample, m, pi, gamma, W)#

Log-likelihood function of a CUB model with covariates for the feeling component

Compute the log-likelihood function of a CUB model fitting ordinal data, with covariates for explaining the feeling component.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the log-likelihood value

Return type:

float

cubmods.cub_0w.mle(sample, m, W, df, formula, ass_pars=None, maxiter=500, tol=0.0001)#

Main function for CUB models with covariates for the feeling component.

Function to estimate and validate a CUB model for given ordinal responses, with covariates for explaining the feeling component.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (int) – maximum number of iterations allowed for running the optimization algorithm

  • tol (float) – fixed error tolerance for final estimates

Returns:

an instance of CUBresCUB0W (see the Class for details)

Return type:

object

cubmods.cub_0w.pmf(m, pi, gamma, W)#

Average probability distribution of a specified CUB model with covariates for the feeling component.

\(\frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the vector of the probability distribution.

Return type:

numpy array

cubmods.cub_0w.pmfi(m, pi, gamma, W)#

Probability distribution for each subject of a specified CUB model with covariates for the feeling component.

Auxiliary function of .draw().

\(\Pr(R_i=r|\pmb\theta; \pmb T_i),\; i=1 \ldots n ,\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the matrix of the probability distribution of dimension \(n \times m\)

Return type:

numpy ndarray
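The relation between the per-subject matrix and the average distribution returned by pmf can be sketched as follows. The logistic link \(\xi_i = 1/(1+e^{-\pmb w_i \pmb\gamma})\) is assumed, covariates are given as a plain list of rows rather than a pandas dataframe, and the function names are hypothetical.

```python
from math import comb, exp


def pmfi_0w(m, pi, gamma, W):
    """One row per subject, one column per category r = 1..m."""
    rows = []
    for w in W:
        eta = gamma[0] + sum(g * x for g, x in zip(gamma[1:], w))
        xi = 1.0 / (1.0 + exp(-eta))  # subject-specific feeling parameter
        rows.append([
            pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
            + (1 - pi) / m
            for r in range(1, m + 1)
        ])
    return rows


def pmf_0w(m, pi, gamma, W):
    """Average of the per-subject distributions, column by column."""
    rows = pmfi_0w(m, pi, gamma, W)
    return [sum(row[j] for row in rows) / len(rows) for j in range(m)]


W = [[1.0], [0.0], [2.5]]  # three subjects, one covariate
rows = pmfi_0w(5, 0.6, [0.2, -0.4], W)
avg = pmf_0w(5, 0.6, [0.2, -0.4], W)
print(len(rows), len(rows[0]))  # number of subjects, number of categories
```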

cubmods.cub_0w.prob(sample, m, pi, gamma, W)#

Probability distribution of a CUB model with covariates for the feeling component given an observed sample

Compute the probability distribution of a CUB model with covariates for the feeling component, given an observed sample.

\(\Pr(R_i=r_i|\pmb\theta;\pmb T_i),\; i=1 \ldots n\)

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the array of the probability distribution.

Return type:

numpy array

cubmods.cub_0w.varcov(sample, m, pi, gamma, W)#

Variance-covariance matrix of CUB models with covariates for the feeling component

Compute the variance-covariance matrix of parameter estimates of a CUB model with covariates for the feeling component.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the variance-covariance matrix of the CUB model

Return type:

numpy ndarray

cubmods.cub_y0 module#

CUB models in Python. Module for CUB (Combination of Uniform and Binomial) with covariates for the uncertainty component.

Description:#

This module contains methods and classes for the CUB_Y0 model family.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cub_y0.CUBresCUBY0(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([saveas, figsize])

Main function to plot an object of the Class.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(saveas=None, figsize=(7, 5))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None, i.e. when a new figure is created)

  • kind (str) – choose a barplot ('bar', the default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cub_y0.draw(m, beta, xi, Y, df, formula, seed=None)#

Draw a random sample from a specified CUB model with covariates for the uncertainty component.

Parameters:
  • m (int) – number of ordinal categories

  • xi (float) – feeling parameter \(\xi\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None; it must be \(\neq 0\)

Returns:

an instance of CUBsample containing ordinal responses drawn from the specified model

cubmods.cub_y0.effe10(beta, esterno10)#

Auxiliary function for the log-likelihood estimation of CUB models.

Compute the opposite of the scalar function that is maximized when running the E-M algorithm for CUB models with covariates for the uncertainty parameter.

It is passed as the objective function to the optimizer within the CUB function for models with covariates for uncertainty or for both feeling and uncertainty.

Parameters:
  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • esterno10 – A matrix binding together: the matrix \(\pmb y\) of the selected covariates (accounting for an intercept term) and a vector \(\tau\) (whose length equals the number of observations) of the posterior probabilities that each observation has been generated by the first component distribution of the mixture

Returns:

the expected value of the incomplete log-likelihood

Return type:

float
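Assuming the usual logistic link \(\pi_i = 1/(1+e^{-\pmb y_i \pmb\beta})\), the objective described above can be sketched as \(-\sum_i\left[\tau_i\log\pi_i + (1-\tau_i)\log(1-\pi_i)\right]\). The function name is hypothetical, and \(\pmb\tau\) and the covariates are passed separately instead of as one bound matrix.

```python
from math import exp, log


def neg_q_beta(beta, tau, Y):
    """Opposite of the tau-weighted Bernoulli log-likelihood, as a function of beta.

    beta[0] is the intercept; Y is a list of covariate rows (one per observation).
    """
    total = 0.0
    for t, y in zip(tau, Y):
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], y))
        pi_i = 1.0 / (1.0 + exp(-eta))  # subject-specific uncertainty parameter
        total -= t * log(pi_i) + (1 - t) * log(1 - pi_i)
    return total


tau = [0.8, 0.4, 0.9]
Y = [[0.1], [1.2], [-0.5]]
print(neg_q_beta([0.3, -0.7], tau, Y))
```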

cubmods.cub_y0.loglik(m, sample, Y, beta, xi)#

Log-likelihood function of a CUB model with covariates for the uncertainty component

Compute the log-likelihood function of a CUB model fitting ordinal responses with covariates for explaining the uncertainty component.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • xi (float) – feeling parameter \(\xi\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

Returns:

the log-likelihood value

Return type:

float
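A minimal sketch of this log-likelihood, assuming the logistic link \(\pi_i = 1/(1+e^{-\pmb y_i \pmb\beta})\) and the usual CUB mixture, is \(\sum_i \log\left[\pi_i\, b_{r_i}(\xi) + (1-\pi_i)/m\right]\), where \(b_r(\xi)\) is the shifted binomial probability. The function name is hypothetical and covariates are given as a plain list of rows.

```python
from math import comb, exp, log


def loglik_y0(m, sample, Y, beta, xi):
    """Log-likelihood with covariates on the uncertainty component only."""
    total = 0.0
    for r, y in zip(sample, Y):
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], y))
        pi_i = 1.0 / (1.0 + exp(-eta))
        b_r = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        total += log(pi_i * b_r + (1 - pi_i) / m)
    return total


sample = [2, 5, 7]
Y = [[0.1], [1.2], [-0.5]]
print(loglik_y0(7, sample, Y, beta=[0.3, -0.7], xi=0.3))
```

As a sanity check, a very negative intercept drives every \(\pi_i\) towards 0, and the value approaches \(n\log(1/m)\), the log-likelihood of a purely uniform model.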

cubmods.cub_y0.mle(sample, m, Y, df, formula, ass_pars=None, maxiter=500, tol=0.0001)#

Main function for CUB models with covariates for the uncertainty component.

Estimate and validate a CUB model for given ordinal responses, with covariates for explaining the uncertainty component.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (int) – maximum number of iterations allowed for running the optimization algorithm

  • tol (float) – fixed error tolerance for final estimates

Returns:

an instance of CUBresCUBY0 (see the Class for details)

Return type:

object

cubmods.cub_y0.pmf(m, beta, xi, Y)#

Average probability distribution of a specified CUB model with covariates.

\(\frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • xi (float) – feeling parameter \(\xi\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

Returns:

the vector of the probability distribution.

Return type:

numpy array

cubmods.cub_y0.pmfi(m, beta, xi, Y)#

Probability distribution for each subject of a specified CUB model with covariates.

Auxiliary function of .draw().

\(\Pr(R_i=r|\pmb\theta; \pmb T_i),\; i=1 \ldots n ,\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • xi (float) – feeling parameter \(\xi\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

Returns:

the matrix of the probability distribution of dimension \(n \times m\)

Return type:

numpy ndarray

cubmods.cub_y0.prob(m, sample, Y, beta, xi)#

Probability distribution of a CUB model with covariates for the uncertainty component given an observed sample

Compute the probability distribution of a CUB model with covariates for the uncertainty component, given an observed sample.

\(\Pr(R_i=r_i|\pmb\theta;\pmb T_i),\; i=1 \ldots n\)

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • xi (float) – feeling parameter \(\xi\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

Returns:

the array of the probability distribution.

Return type:

numpy array

cubmods.cub_y0.varcov(m, sample, Y, beta, xi)#

Variance-covariance matrix of CUB model with covariates for the uncertainty parameter.

Compute the variance-covariance matrix of parameter estimates of a CUB model with covariates for the uncertainty component.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • xi (float) – feeling parameter \(\xi\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

Returns:

the variance-covariance matrix of the CUB model

Return type:

numpy ndarray

cubmods.cub_yw module#

CUB models in Python. Module for CUB (Combination of Uniform and Binomial) with covariates for both feeling and uncertainty.

Description:#

This module contains methods and classes for the CUB_YW model family.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cub_yw.CUBresCUBYW(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([saveas, figsize])

Main function to plot an object of the Class.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(saveas=None, figsize=(7, 5))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None, i.e. when a new figure is created)

  • kind (str) – choose a barplot ('bar', the default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cub_yw.draw(m, beta, gamma, Y, W, df, formula, seed=None)#

Draw a random sample from a specified CUB model with covariates for both feeling and uncertainty.

Parameters:
  • m (int) – number of ordinal categories

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

Returns:

an instance of CUBsample containing ordinal responses drawn from the specified model

cubmods.cub_yw.loglik(m, sample, Y, W, beta, gamma)#

Log-likelihood function of a CUB model with covariates for both feeling and uncertainty.

Compute the log-likelihood function of a CUB model fitting ordinal data with covariates for explaining both the feeling and the uncertainty components.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the log-likelihood value

Return type:

float
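When both components depend on covariates, each subject has its own pair \((\pi_i, \xi_i)\). The sketch below assumes logistic links for both, \(\pi_i = 1/(1+e^{-\pmb y_i \pmb\beta})\) and \(\xi_i = 1/(1+e^{-\pmb w_i \pmb\gamma})\), and sums the log of the resulting mixture probabilities. Function names are hypothetical and covariates are plain lists of rows rather than pandas dataframes.

```python
from math import comb, exp, log


def logistic(eta):
    return 1.0 / (1.0 + exp(-eta))


def linpred(coef, row):
    """Intercept coef[0] plus the inner product of coef[1:] with a covariate row."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def loglik_yw(m, sample, Y, W, beta, gamma):
    """Log-likelihood with covariates on both uncertainty and feeling."""
    total = 0.0
    for r, y, w in zip(sample, Y, W):
        pi_i = logistic(linpred(beta, y))
        xi_i = logistic(linpred(gamma, w))
        b_r = comb(m - 1, r - 1) * (1 - xi_i) ** (r - 1) * xi_i ** (m - r)
        total += log(pi_i * b_r + (1 - pi_i) / m)
    return total


sample = [1, 4, 7]
Y = [[0.1], [1.2], [-0.5]]
W = [[1.0], [0.0], [2.5]]
print(loglik_yw(7, sample, Y, W, beta=[0.3, -0.7], gamma=[0.2, -0.4]))
```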

cubmods.cub_yw.mle(sample, m, Y, W, df, formula, ass_pars=None, maxiter=500, tol=0.0001)#

Main function for CUB models with covariates for both the uncertainty and the feeling components.

Estimate and validate a CUB model for given ordinal responses, with covariates for explaining both the feeling and the uncertainty components by means of logistic transform.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (int) – maximum number of iterations allowed for running the optimization algorithm

  • tol (float) – fixed error tolerance for final estimates

Returns:

an instance of CUBresCUBYW (see the Class for details)

Return type:

object

cubmods.cub_yw.pmf(m, beta, gamma, Y, W)#

Average probability distribution of a specified CUB model with covariates for both feeling and uncertainty.

\(\frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the vector of the probability distribution.

Return type:

numpy array

cubmods.cub_yw.pmfi(m, beta, gamma, Y, W)#

Probability distribution for each subject of a specified CUB model with covariates for both feeling and uncertainty.

Auxiliary function of .draw().

\(\Pr(R_i=r|\pmb\theta; \pmb T_i),\; i=1 \ldots n ,\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the matrix of the probability distribution of dimension \(n \times m\)

Return type:

numpy ndarray
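
The relation between the per-subject matrix and the average distribution can be sketched as follows. This is only an illustration under the standard CUB assumptions (logistic links, shifted Binomial plus Uniform mixture), written with plain lists; pmfi_sketch and pmf_sketch are hypothetical names, not the package functions.

```python
import math

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def cub_row(m, pi, xi):
    """Distribution over r = 1..m for one subject (assumed CUB mixture)."""
    return [
        pi * math.comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        + (1 - pi) / m
        for r in range(1, m + 1)
    ]

def pmfi_sketch(m, beta, gamma, Y, W):
    """One row per subject, one column per category: an n x m nested list."""
    rows = []
    for y_row, w_row in zip(Y, W):
        pi_i = logistic(beta[0] + sum(b * y for b, y in zip(beta[1:], y_row)))
        xi_i = logistic(gamma[0] + sum(g * w for g, w in zip(gamma[1:], w_row)))
        rows.append(cub_row(m, pi_i, xi_i))
    return rows

def pmf_sketch(m, beta, gamma, Y, W):
    """Column-wise average of the per-subject matrix."""
    rows = pmfi_sketch(m, beta, gamma, Y, W)
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(m)]

Y = [[0.0], [1.0], [1.0]]
W = [[-1.0], [0.0], [2.0]]
matrix = pmfi_sketch(4, [0.2, 0.8], [0.0, 0.5], Y, W)
average = pmf_sketch(4, [0.2, 0.8], [0.0, 0.5], Y, W)
print(average)
```

Every row of the matrix is itself a probability distribution, so the averaged vector also sums to one.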

cubmods.cub_yw.prob(m, sample, Y, W, beta, gamma)#

Probability distribution of a CUB model with covariates for both feeling and uncertainty.

Compute the probability distribution of a CUB model with covariates for both the feeling and the uncertainty components.

\(\Pr(R_i=r_i|\pmb\theta;\pmb T_i),\; i=1 \ldots n\)

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the array of the probability distribution.

Return type:

numpy array

cubmods.cub_yw.varcov(m, sample, Y, W, beta, gamma)#

Variance-covariance matrix of a CUB model with covariates for both uncertainty and feeling.

Compute the variance-covariance matrix of parameter estimates of a CUB model with covariates for both the uncertainty and the feeling components.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the variance-covariance matrix of the CUB model

Return type:

numpy ndarray

cubmods.cube module#

CUB models in Python. Module for CUBE (Combination of Uniform and Beta-Binomial).

Description:#

This module contains methods and classes for CUBE model family.

Manual, Examples and References:#

List of TODOs:#

  • TODO: adjust 3d plots legend

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cube.CUBresCUBE(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([ci, saveas, confell, test3, figsize])

Main function to plot an object of the Class.

plot3d(ax[, ci, magnified])

Plots the estimated parameter values in the parameter space and the asymptotic confidence ellipsoid with its projections.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(ci=0.95, saveas=None, confell=False, test3=True, figsize=(7, 15))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • ci (float) – level \((1-\alpha/2)\) for the confidence ellipsoid

  • confell (bool) – DEPRECATED, defaults to False

  • test3 (bool) – DEPRECATED, defaults to True

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot3d(ax, ci=0.95, magnified=False)#

Plots the estimated parameter values in the parameter space and the asymptotic confidence ellipsoid with its projections.

Parameters:
  • ci (float) – level \((1-\alpha/2)\) for the confidence ellipsoid

  • magnified (bool) – if False the limits will be the entire parameter space, otherwise let matplotlib choose the limits

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (useful only if ax is not None)

  • kind (str) – choose a barplot ('bar' default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cube.betar(m, xi, phi)#

Beta-Binomial distribution.

Return the Beta-Binomial distribution with given parameters.

Parameters:
  • m (int) – number of ordinal categories

  • xi (float) – feeling parameter \(\xi\)

  • phi (float) – overdispersion parameter \(\phi\)

Returns:

array of length \(m\) of the Beta-Binomial distribution.

Return type:

numpy array
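
A minimal sketch of this distribution, assuming the parameterization commonly used for CUBE models (a Beta-Binomial on \(m-1\) trials, shifted to \(r=1 \ldots m\), that reduces to the shifted Binomial when \(\phi=0\)). The function name betar_sketch is hypothetical, and it returns a plain list rather than a numpy array.

```python
import math

def betar_sketch(m, xi, phi):
    """Beta-Binomial probabilities for r = 1..m, with 0 < xi < 1 and phi >= 0."""
    # Normalizing product, shared by all categories.
    denom = math.prod(1 + phi * k for k in range(m - 1))
    probs = []
    for r in range(1, m + 1):
        low = math.prod(1 - xi + phi * k for k in range(r - 1))
        high = math.prod(xi + phi * k for k in range(m - r))
        probs.append(math.comb(m - 1, r - 1) * low * high / denom)
    return probs

overdispersed = betar_sketch(5, 0.3, 0.2)
binomial_case = betar_sketch(5, 0.3, 0.0)
print(overdispersed)
```

With \(\phi=0\) every product collapses to a power of \(\xi\) or \(1-\xi\), which is a quick way to check the formula against the ordinary Binomial.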

cubmods.cube.cmf(m, pi, xi, phi)#

Cumulative probability of a specified CUBE model.

\(\Pr(R \leq r | \pmb\theta),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

  • phi (float) – overdispersion parameter \(\phi\)

Returns:

array of length \(m\) of the cumulative probability of a CUBE model without covariates.

Return type:

numpy array
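
The cumulative probabilities are running sums of \(\Pr(R=r|\pmb\theta)\). The sketch below assumes the CUBE mixture \(\pi \cdot \text{Beta-Binomial} + (1-\pi)/m\) with the usual \((\xi, \phi)\) parameterization; cube_pmf_sketch and cube_cmf_sketch are hypothetical helpers, not the package implementation.

```python
import math
from itertools import accumulate

def cube_pmf_sketch(m, pi, xi, phi):
    """Pr(R=r), r=1..m: pi * Beta-Binomial + (1 - pi) * discrete Uniform."""
    denom = math.prod(1 + phi * k for k in range(m - 1))
    probs = []
    for r in range(1, m + 1):
        bb = (
            math.comb(m - 1, r - 1)
            * math.prod(1 - xi + phi * k for k in range(r - 1))
            * math.prod(xi + phi * k for k in range(m - r))
            / denom
        )
        probs.append(pi * bb + (1 - pi) / m)
    return probs

def cube_cmf_sketch(m, pi, xi, phi):
    """Pr(R<=r), r=1..m, as running sums of the probabilities."""
    return list(accumulate(cube_pmf_sketch(m, pi, xi, phi)))

cumulative = cube_cmf_sketch(7, 0.8, 0.3, 0.1)
print(cumulative)
```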

cubmods.cube.draw(m, pi, xi, phi, n, df, formula, seed=None)#

Draw a random sample from a specified CUBE model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

  • phi (float) – overdispersion parameter \(\phi\)

  • n (int) – number of ordinal responses to be drawn

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model
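
Drawing from a specified model amounts to sampling the categories \(1 \ldots m\) with the model probabilities as weights. The sketch below illustrates only that idea with the standard library: it returns a plain list rather than a CUBsample, ignores df and formula, and uses hypothetical names (cube_pmf_sketch, draw_sketch) together with an assumed CUBE parameterization.

```python
import math
import random

def cube_pmf_sketch(m, pi, xi, phi):
    """Assumed CUBE probabilities: pi * Beta-Binomial + (1 - pi) / m."""
    denom = math.prod(1 + phi * k for k in range(m - 1))
    out = []
    for r in range(1, m + 1):
        bb = (
            math.comb(m - 1, r - 1)
            * math.prod(1 - xi + phi * k for k in range(r - 1))
            * math.prod(xi + phi * k for k in range(m - r))
            / denom
        )
        out.append(pi * bb + (1 - pi) / m)
    return out

def draw_sketch(m, pi, xi, phi, n, seed=None):
    """Draw n ordinal responses; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    weights = cube_pmf_sketch(m, pi, xi, phi)
    return rng.choices(range(1, m + 1), weights=weights, k=n)

first = draw_sketch(5, 0.7, 0.4, 0.1, 50, seed=42)
second = draw_sketch(5, 0.7, 0.4, 0.1, 50, seed=42)
print(first[:10])
```

A private random.Random instance is used so that seeding the sketch does not disturb the global random state.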

cubmods.cube.effecube(params, tau, f, m)#

Auxiliary function for the log-likelihood estimation of CUBE models without covariates.

Define the opposite of the scalar function that is maximized when running the E-M algorithm for CUBE models without covariates.

Parameters:
  • params (array of float) – array of initial estimates for the feeling and the overdispersion parameters

  • tau (array) – a column vector of length \(m\) containing the posterior probabilities that each observed category has been generated by the first component distribution of the mixture

  • f (array) – array of the absolute frequencies of the observations

  • m (int) – number of ordinal categories

Returns:

the expected value of the incomplete log-likelihood

Return type:

float

cubmods.cube.init_theta(sample, m)#

Naive estimates for CUBE models without covariates.

Compute naive parameter estimates of a CUBE model without covariates for given ordinal responses. These preliminary estimators are used within the package code to start the E-M algorithm.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

Returns:

a tuple of \((\pi^{(0)}, \xi^{(0)}, \phi^{(0)})\)

Return type:

tuple of float

cubmods.cube.loglik(m, pi, xi, phi, f)#

Log-likelihood function of a CUBE model without covariates.

Compute the log-likelihood function of a CUBE model without covariates fitting the given absolute frequency distribution.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

  • phi (float) – overdispersion parameter \(\phi\)

  • f (array of int) – array of absolute frequency distribution

Returns:

the log-likelihood value

Return type:

float
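
Without covariates, the log-likelihood depends on the data only through the absolute frequencies: \(\ell(\pmb\theta) = \sum_{r=1}^m f_r \log \Pr(R=r|\pmb\theta)\). A minimal sketch, assuming the usual CUBE parameterization and using hypothetical helper names:

```python
import math

def cube_pmf_sketch(m, pi, xi, phi):
    """Assumed CUBE probabilities: pi * Beta-Binomial + (1 - pi) / m."""
    denom = math.prod(1 + phi * k for k in range(m - 1))
    out = []
    for r in range(1, m + 1):
        bb = (
            math.comb(m - 1, r - 1)
            * math.prod(1 - xi + phi * k for k in range(r - 1))
            * math.prod(xi + phi * k for k in range(m - r))
            / denom
        )
        out.append(pi * bb + (1 - pi) / m)
    return out

def loglik_sketch(m, pi, xi, phi, f):
    """Frequency-weighted sum of log-probabilities."""
    probs = cube_pmf_sketch(m, pi, xi, phi)
    return sum(f_r * math.log(p_r) for f_r, p_r in zip(f, probs))

f = [10, 20, 40, 20, 10]             # absolute frequencies for m = 5
ll = loglik_sketch(5, 0.8, 0.5, 0.1, f)
ll_uniform = loglik_sketch(5, 0.0, 0.5, 0.1, f)
print(ll, ll_uniform)
```

Setting \(\pi=0\) leaves only the Uniform component, so the value reduces to \(-n \log m\), which gives a simple sanity check.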

cubmods.cube.mean(m, pi, xi)#

Mean of a CUBE model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the expected value of the model

Return type:

float

cubmods.cube.mle(sample, m, df, formula, ass_pars=None, maxiter=1000, tol=1e-06)#

Main function for CUBE models without covariates.

Estimate and validate a CUBE model without covariates.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (int) – maximum number of iterations allowed for running the optimization algorithm

  • tol (float) – fixed error tolerance for final estimates

Returns:

an instance of CUBresCUBE (see the Class for details)

Return type:

object

cubmods.cube.pmf(m, pi, xi, phi)#

Probability distribution of a specified CUBE model.

\(\Pr(R = r | \pmb\theta),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

  • phi (float) – overdispersion parameter \(\phi\)

Returns:

array of length \(m\) of the distribution of a CUBE model without covariates.

Return type:

numpy array

cubmods.cube.prob(m, pi, xi, phi, r)#

Probability \(\Pr(R = r | \pmb\theta)\) of a CUBE model without covariates.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

  • phi (float) – overdispersion parameter \(\phi\)

  • r (int) – ordinal response

Returns:

the probability \(\Pr(R = r | \pmb\theta)\) of a CUBE model without covariates.

Return type:

numpy array

cubmods.cube.var(m, pi, xi, phi)#

Variance of a CUBE model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

  • phi (float) – overdispersion parameter \(\phi\)

Returns:

the variance of the model

Return type:

float
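
As a worked check of the two moments, the sketch below compares closed-form expressions with moments computed directly from the probabilities. The closed forms follow from treating the model as a mixture of a shifted Beta-Binomial and a discrete Uniform: \(E(R) = \frac{m+1}{2} + \pi(m-1)\left(\frac{1}{2}-\xi\right)\), which does not involve \(\phi\), and \(Var(R) = (m-1)\left[\pi\xi(1-\xi) + (1-\pi)\left(\frac{m+1}{12} + \pi(m-1)\left(\frac{1}{2}-\xi\right)^2\right)\right] + \pi\xi(1-\xi)(m-1)(m-2)\frac{\phi}{1+\phi}\). The parameterization and all function names are assumptions made for illustration.

```python
import math

def cube_pmf_sketch(m, pi, xi, phi):
    """Assumed CUBE probabilities: pi * Beta-Binomial + (1 - pi) / m."""
    denom = math.prod(1 + phi * k for k in range(m - 1))
    out = []
    for r in range(1, m + 1):
        bb = (
            math.comb(m - 1, r - 1)
            * math.prod(1 - xi + phi * k for k in range(r - 1))
            * math.prod(xi + phi * k for k in range(m - r))
            / denom
        )
        out.append(pi * bb + (1 - pi) / m)
    return out

def mean_closed(m, pi, xi):
    return (m + 1) / 2 + pi * (m - 1) * (0.5 - xi)

def var_closed(m, pi, xi, phi):
    base = (m - 1) * (
        pi * xi * (1 - xi)
        + (1 - pi) * ((m + 1) / 12 + pi * (m - 1) * (0.5 - xi) ** 2)
    )
    extra = pi * xi * (1 - xi) * (m - 1) * (m - 2) * phi / (1 + phi)
    return base + extra

m, pi, xi, phi = 7, 0.8, 0.3, 0.15
probs = cube_pmf_sketch(m, pi, xi, phi)
mean_num = sum(r * p for r, p in enumerate(probs, start=1))
var_num = sum((r - mean_num) ** 2 * p for r, p in enumerate(probs, start=1))
print(mean_num, mean_closed(m, pi, xi))
print(var_num, var_closed(m, pi, xi, phi))
```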

cubmods.cube.varcov(m, pi, xi, phi, sample)#

Variance-covariance matrix for CUBE models based on the observed information matrix.

Compute the variance-covariance matrix of parameter estimates for a CUBE model without covariates as the inverse of the observed information matrix.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • xi (float) – feeling parameter \(\xi\)

  • phi (float) – overdispersion parameter \(\phi\)

  • sample (array of int) – array of ordinal responses

Returns:

the variance-covariance matrix of the CUBE model

Return type:

numpy ndarray

cubmods.cube_0w0 module#

CUB models in Python. Module for CUBE (Combination of Uniform and Beta-Binomial) with covariates for the feeling component.

Description:#

This module contains methods and classes for CUBE_0W0 model family.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cube_0w0.CUBresCUBE0W0(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([saveas, figsize])

Main function to plot an object of the Class.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(saveas=None, figsize=(7, 5))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (useful only if ax is not None)

  • kind (str) – choose a barplot ('bar' default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cube_0w0.betabinomialxi(m, sample, xivett, phi)#

Beta-Binomial probabilities of ordinal responses, given feeling parameter for each observation.

Compute the Beta-Binomial probabilities of given ordinal responses, with feeling parameter specified for each observation, and with the same overdispersion parameter for all the responses.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array) – array of ordinal responses. Missing values are not allowed: they should be preliminarily deleted

  • xivett (array) – array of feeling parameters of the Beta-Binomial distribution for given ordinal responses

  • phi (float) – overdispersion parameter \(\phi\)

Returns:

array of the same length as sample: each entry is the Beta-Binomial probability of the given observation for the corresponding feeling and overdispersion parameters.

Return type:

array

cubmods.cube_0w0.draw(m, pi, gamma, phi, W, df, formula, seed=None)#

Draw a random sample from a specified CUBE model.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • phi (float) – overdispersion parameter \(\phi\)

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • n (int) – number of ordinal responses to be drawn

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model

cubmods.cube_0w0.effe(pars, sample, W, m)#

Auxiliary function for the log-likelihood estimation of CUBE models with covariates only for the feeling component.

Compute the opposite of the scalar function that is maximized when running the E-M algorithm for CUBE models with covariates only for the feeling component.

Parameters:
  • pars (array) – array of length equal to W.columns.size+3 whose entries are the initial parameter estimates

  • sample (array of int) – array of ordinal responses

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • m (int) – number of ordinal categories

Returns:

negative log-likelihood

Return type:

float

cubmods.cube_0w0.init_theta(m, sample, W, maxiter, tol)#

Preliminary estimates of parameters for CUBE models with covariates only for feeling.

Compute preliminary parameter estimates of a CUBE model with covariates only for feeling, given ordinal responses. These estimates are set as initial values to start the corresponding E-M algorithm within the package. Preliminary estimates for the uncertainty and the overdispersion parameters are computed by short runs of the E-M algorithm. For the feeling component, the function considers the nested CUB model with covariates and calls inibestgama to derive initial estimates for the coefficients of the selected covariates for feeling.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • maxiter (int) – maximum number of iterations allowed for preliminary iterations

  • tol (float) – fixed error tolerance for final estimates for preliminary iterations

Returns:

a tuple of \((\pi^{(0)}, \pmb \gamma^{(0)}, \phi^{(0)})\), where \(\pi^{(0)}\) is the initial estimate for the uncertainty parameter, \(\pmb \gamma^{(0)}\) is the vector of initial estimates for the feeling component (including an intercept term in the first entry), and \(\phi^{(0)}\) is the initial estimate for the overdispersion parameter.

Return type:

tuple

cubmods.cube_0w0.loglik(m, sample, W, pi, gamma, phi)#

Log-likelihood function of CUBE model with covariates only for feeling.

Compute the log-likelihood function of a CUBE model for ordinal data with subjects’ covariates only for feeling.

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • phi (float) – overdispersion parameter \(\phi\)

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • sample (array of int) – array of ordinal responses

Returns:

the log-likelihood value

Return type:

float

cubmods.cube_0w0.mle(sample, m, W, df, formula, ass_pars=None, maxiter=1000, tol=1e-06)#

Main function for CUBE models with covariates only for feeling.

Estimate and validate a CUBE model for ordinal data, with covariates only for explaining the feeling component.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (int) – maximum number of iterations allowed for preliminary iterations

  • tol (float) – fixed error tolerance for final estimates for preliminary iterations; the information matrix (to compute the variance-covariance matrix) is approximated with approx_hess() (see statsmodels.tools.numdiff for details)

Returns:

an instance of CUBresCUBE0W0 (see the Class for details)

Return type:

object

cubmods.cube_0w0.pmf(m, pi, gamma, phi, W)#

Average probability distribution of a specified CUBE model with covariates for the feeling component.

\(\frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • phi (float) – overdispersion parameter \(\phi\)

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the array of the average probability distribution

Return type:

numpy array

cubmods.cube_0w0.pmfi(m, pi, gamma, phi, W)#

Probability distribution for each subject of a specified CUBE model with covariates for feeling only.

Auxiliary function of .draw().

\(\Pr(R_i=r|\pmb\theta; \pmb T_i),\; i=1 \ldots n ,\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • phi (float) – overdispersion parameter \(\phi\)

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

the matrix of the probability distribution of dimension \(n \times m\)

Return type:

numpy ndarray
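
In this model only the feeling parameter varies across subjects, while \(\pi\) and \(\phi\) are shared. The sketch below assumes a logistic link \(\xi_i = 1/(1+e^{-\pmb w_i \pmb\gamma})\) and the usual Beta-Binomial parameterization; the names are hypothetical and plain lists stand in for the DataFrame W.

```python
import math

def betabin_row(m, xi, phi):
    """Beta-Binomial probabilities for r = 1..m (assumed parameterization)."""
    denom = math.prod(1 + phi * k for k in range(m - 1))
    return [
        math.comb(m - 1, r - 1)
        * math.prod(1 - xi + phi * k for k in range(r - 1))
        * math.prod(xi + phi * k for k in range(m - r))
        / denom
        for r in range(1, m + 1)
    ]

def pmfi_0w0_sketch(m, pi, gamma, phi, W):
    """n x m nested list: one distribution per subject, xi_i driven by W."""
    rows = []
    for w_row in W:
        eta = gamma[0] + sum(g * w for g, w in zip(gamma[1:], w_row))
        xi_i = 1.0 / (1.0 + math.exp(-eta))
        rows.append([pi * bb + (1 - pi) / m for bb in betabin_row(m, xi_i, phi)])
    return rows

W = [[0.0, 1.0], [1.5, 0.0], [-0.5, 1.0]]
matrix = pmfi_0w0_sketch(5, 0.9, [0.3, -0.8, 0.4], 0.1, W)
print(matrix[0])
```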

cubmods.cube_0w0.prob(m, sample, W, pi, gamma, phi)#

Probability distribution of a CUBE model with covariates for feeling.

Compute the probability distribution of a CUBE model with covariates for the feeling component only. Auxiliary function of .loglik().

\(\Pr(R_i=r_i|\pmb\theta;\pmb T_i),\; i=1 \ldots n\)

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • phi (float) – overdispersion parameter \(\phi\)

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • sample (array of int) – array of ordinal responses

Returns:

the array of the probability distribution.

Return type:

numpy array

cubmods.cube_ywz module#

CUB models in Python. Module for CUBE (Combination of Uniform and Beta-Binomial) with covariates.

Description:#

This module contains methods and classes for CUBE_YWZ model family.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cube_ywz.CUBresCUBEYWZ(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([saveas, figsize])

Main function to plot an object of the Class.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(saveas=None, figsize=(7, 5))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (useful only if ax is not None)

  • kind (str) – choose a barplot ('bar' default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cube_ywz.Qdue(pars, tauno, sample, W, Z, m)#

Auxiliary function for the log-likelihood estimation of CUBE models with covariates.

Define the opposite of one of the two scalar functions that are maximized when running the E-M algorithm for CUBE models with covariates for feeling, uncertainty and overdispersion.

Parameters:
  • pars (array) – array of initial estimates of parameters for the feeling component and the overdispersion effect

  • tauno (array) – the column vector of the posterior probabilities that each observed rating has been generated by the distribution of the first component of the mixture

  • sample (array of int) – array of ordinal responses

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • Z (pandas dataframe) – dataframe of covariates for explaining the overdispersion

  • m (int) – number of ordinal categories

cubmods.cube_ywz.Quno(beta, esterno1)#

Auxiliary function for the log-likelihood estimation of CUBE models with covariates.

Define the opposite of one of the two scalar functions that are maximized when running the E-M algorithm for CUBE models with covariates for feeling, uncertainty and overdispersion.

It is iteratively called as an argument of “optim” within CUBE function (with covariates) as the function to minimize to compute the maximum likelihood estimates for the feeling and the overdispersion components.

Parameters:
  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • esterno1 (ndarray) – matrix binding together the column vector of the posterior probabilities that each observed rating has been generated by the first component distribution of the mixture, with the matrix \(\pmb y\) of explicative variables for the uncertainty component, expanded with a unitary vector in the first column to consider also an intercept term

cubmods.cube_ywz.auxmat(m, xi, phi, a, b, c, d, e)#

Auxiliary matrix.

Returns an auxiliary matrix needed for computing the variance-covariance matrix of a CUBE model with covariates.

Parameters:
  • m (int) – number of ordinal categories

  • xi (array of float) – feeling parameters \(\pmb\xi\)

  • phi (array of float) – overdispersion parameters \(\pmb\phi\)

  • a,b,c,d,e (float) – see the reference paper Piccolo, 2015 for details

cubmods.cube_ywz.betabinomial(m, sample, xi, phi)#

Beta-Binomial probabilities of ordinal responses, with feeling and overdispersion parameters for each observation.

Compute the Beta-Binomial probabilities of ordinal responses, given feeling and overdispersion parameters for each observation.

The Beta-Binomial distribution is the Binomial distribution in which the probability of success at each trial is random and follows the Beta distribution. It is frequently used in Bayesian statistics, empirical Bayes methods and classical statistics as an overdispersed binomial distribution.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • xi (float) – feeling parameter \(\xi\)

  • phi (float) – overdispersion parameter \(\phi\)

Returns:

array of the same length as sample, containing the Beta-Binomial probabilities of each observation, for the corresponding feeling and overdispersion parameters.

Return type:

array

cubmods.cube_ywz.draw(m, beta, gamma, alpha, df, formula, Y, W, Z, seed=None)#

Draw a random sample from a specified CUBE model.

Parameters:
  • m (int) – number of ordinal categories

  • n (int) – number of ordinal responses to be drawn

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • alpha (array of float) – array \(\pmb \alpha\) of parameters for the overdispersion, whose length equals Z.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • Z (pandas dataframe) – dataframe of covariates for explaining the overdispersion

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model

cubmods.cube_ywz.init_theta(m, sample, W, p, v)#

Preliminary parameter estimates for CUBE models with covariates.

Compute preliminary parameter estimates for a CUBE model with covariates for all the three parameters. These estimates are set as initial values to start the E-M algorithm within maximum likelihood estimation.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • p (int) – number of covariates for the uncertainty component

  • v (int) – number of covariates for the overdispersion

Returns:

a tuple \((\pmb \beta^{(0)}, \pmb \gamma^{(0)}, \pmb \alpha^{(0)})\) of preliminary estimates of the parameter vectors for \(\pi = \pi(\pmb{\beta}),\; \xi=\xi(\pmb{\gamma}),\; \phi=\phi(\pmb{\alpha})\), respectively, of a CUBE model with covariates for all three parameters. In detail, they have length equal to Y.columns.size+1, W.columns.size+1 and Z.columns.size+1, respectively, to account for an intercept term for each component.

Return type:

tuple of arrays

cubmods.cube_ywz.loglik(m, sample, Y, W, Z, beta, gamma, alpha)#

Log-likelihood function of a CUBE model with covariates.

Compute the log-likelihood function of a CUBE model for ordinal responses, with covariates for explaining all the three parameters.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • Z (pandas dataframe) – dataframe of covariates for explaining the overdispersion

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • alpha (array of float) – array \(\pmb \alpha\) of parameters for the overdispersion, whose length equals Z.columns.size+1 to include an intercept term in the model (first entry)

Returns:

the log-likelihood value

Return type:

float
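The length convention for the coefficient arrays (number of covariate columns plus one for the intercept) can be illustrated with a small standalone sketch. It assumes a logistic link between the linear predictor and the parameter, which this page does not state; the helper name `logistic_with_intercept` and the data are hypothetical and not part of cubmods.

```python
import numpy as np

def logistic_with_intercept(covariates, coefs):
    """Map covariates to (0, 1) through a logistic link with an intercept.

    Hypothetical helper: `covariates` is an (n, k) array, for example the
    values of a covariate DataFrame obtained with Y.to_numpy().
    """
    covariates = np.asarray(covariates, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    n, k = covariates.shape
    # One coefficient per covariate column, plus the intercept (first entry).
    if coefs.size != k + 1:
        raise ValueError(f"expected {k + 1} coefficients, got {coefs.size}")
    # Prepend a column of ones so that coefs[0] acts as the intercept.
    design = np.column_stack([np.ones(n), covariates])
    return 1.0 / (1.0 + np.exp(-(design @ coefs)))

# Illustrative data: n = 3 subjects, one covariate, so two coefficients.
Y_values = np.array([[0.0], [1.0], [2.0]])
beta = [0.0, 1.0]
pi_i = logistic_with_intercept(Y_values, beta)
```

Passing an array with the wrong number of coefficients raises a ValueError, which mirrors the requirement that the first entry is always the intercept.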

cubmods.cube_ywz.mle(m, sample, Y, W, Z, df, formula, ass_pars=None, maxiter=1000, tol=0.01)#

Main function for CUBE models with covariates.

Function to estimate and validate a CUBE model with explanatory covariates for all three parameters.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • Z (pandas dataframe) – dataframe of covariates for explaining the overdispersion

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (int) – maximum number of iterations allowed for running the optimization algorithm

  • tol (float) – fixed error tolerance for final estimates

Returns:

an instance of CUBresCUBEYWZ (see the Class for details)

Return type:

object

cubmods.cube_ywz.pmf(m, beta, gamma, alpha, Y, W, Z)#

Average probability distribution of a specified CUBE model with covariates.

\(\frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • alpha (array of float) – array \(\pmb \alpha\) of parameters for the overdispersion, whose length equals Z.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • Z (pandas dataframe) – dataframe of covariates for explaining the overdispersion

Returns:

the array of the average probability distribution

Return type:

numpy array
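The average distribution in the formula above is the column-wise mean of the matrix of per-subject probabilities. A minimal NumPy sketch of that averaging step follows; the function name `average_pmf` and the input matrix are made up for illustration and do not come from the package.

```python
import numpy as np

def average_pmf(prob_matrix):
    """Average an (n, m) matrix of per-subject probabilities over subjects."""
    prob_matrix = np.asarray(prob_matrix, dtype=float)
    # Row i holds Pr(R_i = r) for r = 1..m; the mean over rows gives
    # (1/n) * sum_i Pr(R_i = r) for each category r.
    return prob_matrix.mean(axis=0)

# Illustrative values only: n = 2 subjects, m = 3 categories.
example = np.array([[0.2, 0.3, 0.5],
                    [0.4, 0.4, 0.2]])
avg = average_pmf(example)
```

Because every row sums to one, the averaged vector is itself a probability distribution over the m categories.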

cubmods.cube_ywz.pmfi(m, beta, gamma, alpha, Y, W, Z)#

Probability distribution for each subject of a specified CUBE model with covariates.

Auxiliary function of .draw().

\(\Pr(R_i=r|\pmb\theta; \pmb T_i),\; i=1 \ldots n ,\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • alpha (array of float) – array \(\pmb \alpha\) of parameters for the overdispersion, whose length equals Z.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • Z (pandas dataframe) – dataframe of covariates for explaining the overdispersion

Returns:

the matrix of the probability distribution of dimension \(n \times m\)

Return type:

numpy ndarray

cubmods.cube_ywz.varcov(m, sample, beta, gamma, alpha, Y, W, Z)#

Variance-covariance matrix of a CUBE model with covariates.

Compute the variance-covariance matrix of parameter estimates of a CUBE model with covariates for all the three parameters.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • Z (pandas dataframe) – dataframe of covariates for explaining the overdispersion

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • alpha (array of float) – array \(\pmb \alpha\) of parameters for the overdispersion, whose length equals Z.columns.size+1 to include an intercept term in the model (first entry)

Returns:

the variance-covariance matrix

Return type:

ndarray

cubmods.cubsh module#

CUB models in Python. Module for CUBSH (Combination of Uniform and Binomial with Shelter Effect).

Description:#

This module contains methods and classes for CUBSH model family.

Manual, Examples and References:#

List of TODOs:#

  • TODO: fix 3d plots legend

  • TODO: test all def _*(): (optional functions)

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cubsh.CUBresCUBSH(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by .mle() function. See here the Base for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([ci, saveas, confell, debug, test3, ...])

Main function to plot an object of the Class.

plot3d(ax[, ci, magnified])

Plots the estimated parameter values in the parameter space and the asymptotic confidence ellipsoid with its projections.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(ci=0.95, saveas=None, confell=False, debug=False, test3=True, figsize=(7, 15))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • ci (float) – level \((1-\alpha/2)\) for the confidence ellipsoid

  • confell (bool) – DEPRECATED, defaults to False

  • test3 (bool) – DEPRECATED, defaults to True

  • debug (bool) – DEPRECATED, defaults to False

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot3d(ax, ci=0.95, magnified=False)#

Plots the estimated parameter values in the parameter space and the asymptotic confidence ellipsoid with its projections.

Parameters:
  • ci (float) – level \((1-\alpha/2)\) for the confidence ellipsoid

  • magnified (bool) – if False the limits will be the entire parameter space, otherwise let matplotlib choose the limits

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (useful only if ax is None)

  • kind (str) – choose a barplot ('bar' default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cubsh.cmf(m, sh, pi1, pi2, xi)#

Cumulative probability of a specified CUBSH model, using alternative parametrization \((\pi_1, \pi_2)\).

\(\Pr(R \leq r | \pmb\theta),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi1 (float) – Mixing coefficient for the shifted Binomial component of the mixture distribution \(\pi_1\)

  • pi2 (float) – Mixing coefficient for the discrete Uniform component of the mixture distribution \(\pi_2\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the cumulative probability distribution

Return type:

array
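Since \(\Pr(R \leq r)\) is the running sum of \(\Pr(R = r)\), a cumulative distribution can be obtained from any probability array with a cumulative sum. The sketch below shows this relationship on made-up values; `cmf_from_pmf` is a hypothetical name, not the package function.

```python
import numpy as np

def cmf_from_pmf(pmf_values):
    """Return Pr(R <= r), r = 1..m, from an array of Pr(R = r)."""
    pmf_values = np.asarray(pmf_values, dtype=float)
    # The cumulative sum accumulates the probabilities category by category.
    return np.cumsum(pmf_values)

# Illustrative distribution over m = 4 categories.
p = np.array([0.1, 0.2, 0.3, 0.4])
c = cmf_from_pmf(p)
```

The last entry of the result is always 1 (up to rounding), which is a quick sanity check on the input distribution.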

cubmods.cubsh.cmf_delta(m, sh, pi, xi, delta)#

Cumulative probability of a specified CUBSH model, using canonic parametrization \((\pi, \delta)\).

\(\Pr(R \leq r | \pmb\theta),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi (float) – uncertainty parameter \(\pi\)

  • delta (float) – shelter choice parameter \(\delta\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the cumulative probability distribution

Return type:

array

cubmods.cubsh.draw(m, sh, pi, xi, delta, n, df, formula, seed=None)#

Draw a random sample from a specified CUBSH model, using canonic parametrization \((\pi, \delta)\).

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi (float) – uncertainty parameter \(\pi\)

  • delta (float) – shelter choice parameter \(\delta\)

  • xi (float) – feeling parameter \(\xi\)

  • n (int) – number of ordinal responses

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model
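The role of the seed parameter can be shown with a standalone sampler that draws ordinal responses from a given probability array. This is only a sketch under the assumption that NumPy's `default_rng` is used; the package may rely on a different generator, and `draw_ordinal` is a hypothetical name.

```python
import numpy as np

def draw_ordinal(pmf_values, n, seed=None):
    """Draw n ordinal responses in 1..m from the given probabilities."""
    pmf_values = np.asarray(pmf_values, dtype=float)
    m = pmf_values.size
    # A fixed seed makes the generator, and hence the sample, reproducible.
    rng = np.random.default_rng(seed)
    return rng.choice(np.arange(1, m + 1), size=n, p=pmf_values)

# Two draws with the same seed give identical samples.
sample_a = draw_ordinal([0.1, 0.2, 0.3, 0.4], n=50, seed=42)
sample_b = draw_ordinal([0.1, 0.2, 0.3, 0.4], n=50, seed=42)
```

With seed=None the generator is initialised from system entropy, so repeated calls generally return different samples.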

cubmods.cubsh.draw2(m, sh, pi1, pi2, xi, n, df, formula, seed=None)#

Draw a random sample from a specified CUBSH model, using alternative parametrization \((\pi_1, \pi_2)\).

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi1 (float) – Mixing coefficient for the shifted Binomial component of the mixture distribution \(\pi_1\)

  • pi2 (float) – Mixing coefficient for the discrete Uniform component of the mixture distribution \(\pi_2\)

  • xi (float) – feeling parameter \(\xi\)

  • n (int) – number of ordinal responses

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model

cubmods.cubsh.init_theta(f, m, sh)#

Preliminary estimators for CUBSH models.

Computes preliminary parameter estimates of a CUBSH model without covariates for given ordinal responses. These preliminary estimators are used within the package code to start the E-M algorithm.

Parameters:
  • f (array of int) – array of the absolute frequencies of given ordinal responses

  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

Returns:

a tuple of \((\pi_1^{(0)}, \pi_2^{(0)}, \xi^{(0)})\)

cubmods.cubsh.loglik(m, sh, pi1, pi2, xi, f)#

Log-likelihood of a CUB model with shelter effect

Compute the log-likelihood of a CUB model with a shelter effect for the given absolute frequency distribution.

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi1 (float) – Mixing coefficient for the shifted Binomial component of the mixture distribution \(\pi_1\)

  • pi2 (float) – Mixing coefficient for the discrete Uniform component of the mixture distribution \(\pi_2\)

  • xi (float) – feeling parameter \(\xi\)

  • f (array) – Vector of the absolute frequency distribution

Returns:

the log-likelihood value

Return type:

float
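For data summarised as absolute frequencies, the log-likelihood is the frequency-weighted sum of the log-probabilities, \(\sum_{r=1}^m f_r \log \Pr(R = r | \pmb\theta)\). A minimal sketch of that computation follows; it takes an arbitrary probability array instead of CUBSH parameters, and `loglik_from_freq` is a hypothetical name.

```python
import numpy as np

def loglik_from_freq(f, probs):
    """Log-likelihood of absolute frequencies f under probabilities probs."""
    f = np.asarray(f, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Each category r contributes f_r * log Pr(R = r).
    return float(np.sum(f * np.log(probs)))

# Illustrative frequencies over m = 4 categories, uniform model.
f = np.array([10, 20, 30, 40])
uniform = np.full(4, 0.25)
ll = loglik_from_freq(f, uniform)
```

Under the discrete Uniform distribution every observation contributes log(1/m), so the value here equals 100 * log(0.25).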

cubmods.cubsh.mean_delta(m, sh, pi, xi, delta)#

Expected value of a specified CUBSH model, using canonic parametrization \((\pi, \delta)\).

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi (float) – uncertainty parameter \(\pi\)

  • delta (float) – shelter choice parameter \(\delta\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the expected value of the model

Return type:

float
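The expected value of any ordinal model on the categories 1..m is \(\sum_{r=1}^m r \Pr(R = r)\). The sketch below computes it from a probability array rather than from \((\pi, \xi, \delta)\); `mean_from_pmf` is a hypothetical helper used only for illustration.

```python
import numpy as np

def mean_from_pmf(pmf_values):
    """Expected value of an ordinal variable with support 1..m."""
    pmf_values = np.asarray(pmf_values, dtype=float)
    categories = np.arange(1, pmf_values.size + 1)
    # Weighted sum of the categories by their probabilities.
    return float(np.sum(categories * pmf_values))

# Discrete Uniform over m = 5 categories has mean (m + 1) / 2 = 3.
expected = mean_from_pmf(np.full(5, 0.2))
```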

cubmods.cubsh.mle(sample, m, sh, df, formula, maxiter=500, tol=0.0001, ass_pars=None)#

Main function for CUB models with a shelter effect

Estimate and validate a CUB model with a shelter effect.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (int) – maximum number of iterations allowed for running the optimization algorithm

  • tol (float) – fixed error tolerance for final estimates

Returns:

an instance of CUBresCUBSH (see the Class for details)

Return type:

object

Raise:

Exception if \(m \leq 4\)

cubmods.cubsh.pi1pi2_to_pidelta(pi1, pi2)#

Compute \((\pi, \delta)\) from \((\pi_1, \pi_2)\)

\(\pi = \dfrac{\pi_1}{\pi_1 + \pi_2}\)

\(\delta = 1 - \pi_1 - \pi_2\)

Parameters:
  • pi1 (float) – Mixing coefficient for the shifted Binomial component of the mixture distribution \(\pi_1\)

  • pi2 (float) – Mixing coefficient for the discrete Uniform component of the mixture distribution \(\pi_2\)

Returns:

a tuple of \((\pi, \delta)\) the parameters of uncertainty and shelter choice, respectively

Return type:

tuple

cubmods.cubsh.pidelta_to_pi1pi2(pi, delta)#

Compute \((\pi_1, \pi_2)\) from \((\pi, \delta)\)

\(\pi_1 = (1 - \delta) \pi\)

\(\pi_2 = (1 - \delta)(1 - \pi)\)

Parameters:
  • pi (float) – uncertainty parameter \(\pi\)

  • delta (float) – shelter choice parameter \(\delta\)

Returns:

a tuple of \((\pi_1, \pi_2)\) the mixing coefficient of the shifted Binomial and the Uniform components, respectively

Return type:

tuple
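The two conversion formulas are inverses of each other, which can be checked with a round trip. The sketch below re-implements the formulas stated above in plain Python; the names `to_pi1pi2` and `to_pidelta` are hypothetical stand-ins and the code is not taken from the package.

```python
import math

def to_pi1pi2(pi, delta):
    """(pi, delta) -> (pi1, pi2), following the formulas above."""
    pi1 = (1 - delta) * pi
    pi2 = (1 - delta) * (1 - pi)
    return pi1, pi2

def to_pidelta(pi1, pi2):
    """(pi1, pi2) -> (pi, delta), following the formulas above."""
    pi = pi1 / (pi1 + pi2)
    delta = 1 - pi1 - pi2
    return pi, delta

# Round trip with illustrative values pi = 0.8, delta = 0.1.
pi1, pi2 = to_pi1pi2(0.8, 0.1)
pi_back, delta_back = to_pidelta(pi1, pi2)
```

Note that the conversion to \((\pi, \delta)\) divides by \(\pi_1 + \pi_2\), so it is undefined when both mixing coefficients are zero (that is, when \(\delta = 1\)).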

cubmods.cubsh.plot_simplex(pi1pi2list, ax=None, fname=None)#

Plot simplex of parameters of a CUBSH model.

Note

see the reference Iannario, 2012 for details

Warning

this function still needs several fixes

Parameters:
  • pi1pi2list (list) – list of [pi1, pi2] parameters

  • ax – matplotlib axis

  • fname (str) – if provided, save the plot to fname, defaults to None

cubmods.cubsh.pmf(m, sh, pi1, pi2, xi)#

Probability distribution of a specified CUBSH model, using alternative parametrization \((\pi_1, \pi_2)\).

\(\Pr(R = r | \pmb\theta),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi1 (float) – Mixing coefficient for the shifted Binomial component of the mixture distribution \(\pi_1\)

  • pi2 (float) – Mixing coefficient for the discrete Uniform component of the mixture distribution \(\pi_2\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the probability distribution

Return type:

array
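The parameter descriptions above imply a three-component mixture: a shifted Binomial weighted by \(\pi_1\), a discrete Uniform weighted by \(\pi_2\), and a point mass at the shelter category with the remaining weight \(1 - \pi_1 - \pi_2\). The sketch below is one way to write it, assuming the shifted Binomial form \(\binom{m-1}{r-1}(1-\xi)^{r-1}\xi^{m-r}\) common in the CUB literature; that form is not stated on this page, and `cubsh_pmf_sketch` is a hypothetical name.

```python
import numpy as np
from math import comb

def cubsh_pmf_sketch(m, sh, pi1, pi2, xi):
    """Pr(R = r), r = 1..m, for a Binomial/Uniform/shelter mixture."""
    r = np.arange(1, m + 1)
    # Assumed shifted Binomial component with feeling parameter xi.
    coefs = np.array([comb(m - 1, k - 1) for k in r])
    binom = coefs * (1 - xi) ** (r - 1) * xi ** (m - r)
    # Discrete Uniform component.
    uniform = np.full(m, 1 / m)
    # Degenerate component: all its mass sits on the shelter category sh.
    shelter = (r == sh).astype(float)
    return pi1 * binom + pi2 * uniform + (1 - pi1 - pi2) * shelter

# Illustrative parameters: m = 7 categories, shelter at category 1.
p = cubsh_pmf_sketch(m=7, sh=1, pi1=0.6, pi2=0.3, xi=0.4)
```

With these values the shelter weight is 0.1, so the probability of category 1 exceeds what the Binomial and Uniform components alone would give.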

cubmods.cubsh.pmf_delta(m, sh, pi, xi, delta)#

Probability distribution of a specified CUBSH model, using canonic parametrization \((\pi, \delta)\).

\(\Pr(R = r | \pmb\theta),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi (float) – uncertainty parameter \(\pi\)

  • delta (float) – shelter choice parameter \(\delta\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the probability distribution

Return type:

array

cubmods.cubsh.prob(m, sh, pi1, pi2, xi, r)#

Probability \(\Pr(R = r | \pmb\theta)\) of a CUBSH model without covariates, using alternative parametrization \((\pi_1, \pi_2)\).

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi1 (float) – Mixing coefficient for the shifted Binomial component of the mixture distribution \(\pi_1\)

  • pi2 (float) – Mixing coefficient for the discrete Uniform component of the mixture distribution \(\pi_2\)

  • xi (float) – feeling parameter \(\xi\)

  • r (int) – ordinal response

Returns:

the probability \(\Pr(R = r | \pmb\theta)\)

Return type:

float

cubmods.cubsh.proba_delta(m, sh, pi, xi, delta, r)#

Probability \(\Pr(R = r | \pmb\theta)\) of a CUBSH model without covariates, using canonic parametrization \((\pi, \delta)\).

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi (float) – uncertainty parameter \(\pi\)

  • delta (float) – shelter choice parameter \(\delta\)

  • xi (float) – feeling parameter \(\xi\)

  • r (int) – ordinal response

Returns:

the probability \(\Pr(R = r | \pmb\theta)\)

Return type:

float

cubmods.cubsh.std_delta(m, pi, xi, delta)#

Standard deviation of a specified CUBSH model, using canonic parametrization \((\pi, \delta)\).

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • delta (float) – shelter choice parameter \(\delta\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the standard deviation of the model

Return type:

float

cubmods.cubsh.var_delta(m, pi, xi, delta)#

Variance of a specified CUBSH model, using canonic parametrization \((\pi, \delta)\).

Parameters:
  • m (int) – number of ordinal categories

  • pi (float) – uncertainty parameter \(\pi\)

  • delta (float) – shelter choice parameter \(\delta\)

  • xi (float) – feeling parameter \(\xi\)

Returns:

the variance of the model

Return type:

float

cubmods.cubsh.varcov(m, sh, pi1, pi2, xi, n)#

Variance-covariance matrix for CUB models with shelter effect, using alternative parametrization \((\pi_1, \pi_2)\).

Compute the variance-covariance matrix of parameter estimates of a CUB model with shelter effect.

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi1 (float) – Mixing coefficient for the shifted Binomial component of the mixture distribution \(\pi_1\)

  • pi2 (float) – Mixing coefficient for the discrete Uniform component of the mixture distribution \(\pi_2\)

  • xi (float) – feeling parameter \(\xi\)

  • n (int) – number of ordinal responses

Returns:

the variance-covariance matrix

Return type:

numpy ndarray
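A variance-covariance matrix is typically used to derive standard errors (square roots of its diagonal) and Wald statistics (estimate divided by standard error), which correspond to the stderrs and wald fields listed in the result class signature. The sketch below shows that standard computation on made-up numbers; it is not the package's code, and `stderrs_and_wald` is a hypothetical name.

```python
import numpy as np

def stderrs_and_wald(estimates, varmat):
    """Standard errors and Wald statistics from a covariance matrix."""
    estimates = np.asarray(estimates, dtype=float)
    varmat = np.asarray(varmat, dtype=float)
    # Variances of the estimates lie on the diagonal.
    se = np.sqrt(np.diag(varmat))
    return se, estimates / se

# Illustrative 2x2 covariance matrix and estimates.
varmat = np.array([[0.04, 0.0],
                   [0.0, 0.01]])
se, wald = stderrs_and_wald([0.6, 0.3], varmat)
```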

cubmods.cubsh.varcov_pxd(m, sh, pi, xi, de, n)#

Variance-covariance matrix for CUB models with shelter effect, using canonic parametrization \((\pi, \delta)\).

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • pi (float) – uncertainty parameter \(\pi\)

  • de (float) – shelter choice parameter \(\delta\)

  • xi (float) – feeling parameter \(\xi\)

  • n (int) – number of ordinal responses

Returns:

the variance-covariance matrix

Return type:

numpy ndarray

cubmods.cubsh_ywx module#

CUB models in Python. Module for CUBSH (Combination of Uniform and Binomial with Shelter Effect) with covariates.

Description:#

This module contains methods and classes for CUBSH_YWX model family.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cubsh_ywx.CUBresCUBSHYWX(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by .mle() function. See here the Base for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([saveas, figsize])

Main function to plot an object of the Class.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(saveas=None, figsize=(7, 5))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (useful only if ax is None)

  • kind (str) – choose a barplot ('bar' default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cubsh_ywx.Q1(param, dati1, p)#

Auxiliary function for the log-likelihood estimation of GeCUB models.

Defines the opposite (that is, the negative) of one of the two scalar functions that are maximized when running the E-M algorithm for GeCUB models with covariates for feeling, uncertainty and shelter effect.

Parameters:
  • param (array) – array of initial estimates of parameters for the uncertainty component

  • dati1 (ndarray or dataframe) – auxiliary matrix

  • p (int) – number of covariates for the uncertainty component

cubmods.cubsh_ywx.Q2(param, dati2, m)#

Auxiliary function for the log-likelihood estimation of GeCUB models.

Defines the opposite (that is, the negative) of one of the two scalar functions that are maximized when running the E-M algorithm for GeCUB models with covariates for feeling, uncertainty and shelter effect.

Parameters:
  • param (array) – array of initial estimates of parameters for the feeling component

  • dati2 (ndarray or dataframe) – auxiliary matrix

  • m (int) – number of ordinal categories

cubmods.cubsh_ywx.draw(m, sh, beta, gamma, omega, Y, W, X, df, formula, seed=None)#

Draw a random sample from a specified CUBSH model with covariates (aka GeCUB model).

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • omega (array) – array \(\pmb \omega\) of parameters for the shelter effect, whose length equals X.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

  • n (int) – number of ordinal responses to be drawn

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model

cubmods.cubsh_ywx.init_theta(m, sample, p, s, W)#

Preliminary estimators for CUBSH models with covariates.

Computes preliminary parameter estimates of a CUBSH model with covariates for given ordinal responses. These preliminary estimators are used within the package code to start the E-M algorithm.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • p (int) – number of covariates for the uncertainty component

  • s (int) – number of covariates for the shelter effect

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

Returns:

a tuple \((\pmb \beta^{(0)}, \pmb \gamma^{(0)}, \pmb \omega^{(0)})\) of preliminary estimates of the parameter vectors for \(\pi = \pi(\pmb{\beta})\), \(\xi = \xi(\pmb{\gamma})\) and \(\delta = \delta(\pmb{\omega})\), respectively, of a CUBSH model with covariates for all three parameters. In detail, they have length equal to Y.columns.size+1, W.columns.size+1 and X.columns.size+1, respectively, to account for an intercept term for each component.

Return type:

tuple of arrays

cubmods.cubsh_ywx.loglik(m, sample, sh, Y, W, X, beta, gamma, omega)#

Log-likelihood function of a CUBSH model with covariates.

Compute the log-likelihood function of a CUBSH model for ordinal responses, with covariates for explaining all three parameters (GeCUB model).

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • omega (array) – array \(\pmb \omega\) of parameters for the shelter effect, whose length equals X.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

  • sample (array of int) – array of ordinal responses

Returns:

the log-likelihood value

Return type:

float

cubmods.cubsh_ywx.mle(m, sample, sh, Y, W, X, df, formula, ass_pars=None, maxiter=500, tol=0.0001)#

Main function for CUBSH models with covariates for all the components

Function to estimate and validate a CUBSH model for given ordinal responses, with covariates for explaining all the components and the shelter effect.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (int) – maximum number of iterations allowed for running the optimization algorithm

  • tol (float) – fixed error tolerance for final estimates

Returns:

an instance of CUBresCUBSHYWX (see the Class for details)

Return type:

object

cubmods.cubsh_ywx.pmf(m, sh, beta, gamma, omega, Y, W, X)#

Average probability distribution of a specified CUBSH model with covariates (aka GeCUB model).

\(\frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • omega (array) – array \(\pmb \omega\) of parameters for the shelter effect, whose length equals X.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

Returns:

the probability distribution

Return type:

array

cubmods.cubsh_ywx.pmfi(m, sh, beta, gamma, omega, Y, W, X)#

Probability distribution for each subject of a specified CUBSH model with covariates (aka GeCUB model).

Auxiliary function of .draw().

\(\Pr(R_i=r|\pmb\theta; \pmb T_i),\; i=1 \ldots n ,\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • omega (array) – array \(\pmb \omega\) of parameters for the shelter effect, whose length equals X.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

Returns:

the matrix of the probability distribution of dimension \(n \times m\)

Return type:

numpy ndarray

cubmods.cubsh_ywx.prob(m, sample, sh, Y, W, X, beta, gamma, omega)#

Probability distribution of a CUBSH model with covariates.

Compute the probability distribution of a CUBSH model with covariates.

\(\Pr(R_i=r_i|\pmb\theta;\pmb T_i),\; i=1 \ldots n\)

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • omega (array) – array \(\pmb \omega\) of parameters for the shelter effect, whose length equals X.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

  • sample (array of int) – array of ordinal responses

Returns:

the array of the probability distribution.

Return type:

numpy array
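The relation between the per-subject matrix \(\Pr(R_i=r)\) and the array \(\Pr(R_i=r_i)\) can be sketched in plain Python. This is only an illustration: the helper name `prob_observed_sketch` is hypothetical, the numbers are made up, and whether the library derives `prob` from `pmfi` in this way is an assumption.

```python
def prob_observed_sketch(pmfi, sample):
    """Pick, for each subject i, the probability of the observed category r_i."""
    # Categories are 1-based, list columns are 0-based.
    return [row[r - 1] for row, r in zip(pmfi, sample)]

# Two subjects, m = 3 categories (illustrative numbers only).
pmfi = [[0.2, 0.5, 0.3],
        [0.6, 0.3, 0.1]]
probs = prob_observed_sketch(pmfi, [2, 1])
```

The result has one entry per subject, which matches the one-dimensional return value documented above.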

cubmods.cubsh_ywx.varcov(sample, m, sh, Y, W, X, beta, gamma, omega)#

Variance-covariance matrix of a CUBSH model with covariates

Compute the variance-covariance matrix of parameter estimates of a CUBSH model with covariates.

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • beta (array of float) – array \(\pmb \beta\) of parameters for the uncertainty component, whose length equals Y.columns.size+1 to include an intercept term in the model (first entry)

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

  • omega (array) – array \(\pmb \omega\) of parameters for the shelter effect, whose length equals X.columns.size+1 to include an intercept term in the model (first entry)

  • Y (pandas dataframe) – dataframe of covariates for explaining the uncertainty component

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

  • sample (array of int) – array of ordinal responses

Returns:

the variance-covariance matrix of the model

Return type:

numpy ndarray

cubmods.cush module#

CUB models in Python. Module for CUSH (Combination of Uniform and Shelter effect).

Description:#

This module contains methods and classes for CUSH model family.

Manual, Examples and References:#

List of TODOs:#

  • TODO: check and fix gini & laakso

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cush.CUBresCUSH(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([ci, saveas, figsize])

Main function to plot an object of the Class.

plot_estim([ci, ax, magnified, figsize, saveas])

Plots the estimated parameter values in the parameter space and the asymptotic standard error.

plot_ordinal([figsize, kind, ax, saveas])

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(ci=0.95, saveas=None, figsize=(7, 8))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • ci (float) – level \((1-\alpha/2)\) for the standard error

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_estim(ci=0.95, ax=None, magnified=False, figsize=(7, 7), saveas=None)#

Plots the estimated parameter values in the parameter space and the asymptotic standard error.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None)

  • ci (float) – level \((1-\alpha/2)\) for the confidence ellipse

  • magnified (bool) – if False the limits will be the entire parameter space, otherwise let matplotlib choose the limits

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 7), kind='bar', ax=None, saveas=None)#

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None)

  • kind (str) – choose a barplot ('bar' default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cush.LRT(m, fc, n)#

Likelihood Ratio Test between the CUSH model and the null model.

Parameters:
  • m (int) – number of ordinal categories

  • fc (float) – relative frequency of the shelter category

  • n (int) – number of observations

Returns:

the value of the LRT

Return type:

float
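As a rough illustration, the statistic can be re-derived by hand from the standard CUSH definition. At the maximum likelihood estimate the shelter category has fitted probability `fc` and the other \(m-1\) categories share \(1-fc\) equally, while the null model is the discrete uniform. The function name `lrt_sketch` is hypothetical, the sketch assumes \(1/m < fc < 1\), and it is not necessarily how the library computes the value.

```python
import math

def lrt_sketch(m, fc, n):
    """Hypothetical re-derivation of the CUSH-vs-uniform likelihood ratio statistic."""
    # Fitted CUSH log-likelihood: shelter category has probability fc,
    # the remaining m-1 categories share 1-fc equally.
    ll_cush = n * (fc * math.log(fc) + (1 - fc) * math.log((1 - fc) / (m - 1)))
    # Null model: discrete uniform over the m categories.
    ll_null = -n * math.log(m)
    return 2 * (ll_cush - ll_null)
```

When `fc` equals \(1/m\) the two log-likelihoods coincide and the statistic is zero.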

cubmods.cush.draw(m, sh, delta, n, df, formula, seed=None)#

Draw a random sample from a specified CUSH model.

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • delta (float) – shelter choice parameter \(\delta\)

  • n (int) – number of ordinal responses

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model
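The sampling step can be sketched with the standard library alone. This is an assumption-laden illustration: `draw_cush_sketch` is a hypothetical name, it returns a plain list rather than a CUBsample instance, and it ignores the `df` and `formula` arguments of the real function.

```python
import random

def draw_cush_sketch(m, sh, delta, n, seed=None):
    """Hypothetical sampler: n ordinal responses from a CUSH(delta) model."""
    rng = random.Random(seed)  # private generator, so the seed is reproducible
    categories = range(1, m + 1)
    # Uniform share (1 - delta) / m everywhere, plus delta on the shelter category.
    probs = [(1 - delta) / m + (delta if r == sh else 0.0) for r in categories]
    return rng.choices(categories, weights=probs, k=n)
```

Calling it twice with the same seed returns the same responses.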

cubmods.cush.gini(delta)#

The Gini index of a specified CUSH model.

Parameters:

delta (float) – shelter choice parameter \(\delta\)

Returns:

the Gini index of the model

Return type:

float

cubmods.cush.laakso(m, delta)#

The Laakso index of a specified CUSH model.

Parameters:
  • m (int) – number of ordinal categories

  • delta (float) – shelter choice parameter \(\delta\)

Returns:

the Laakso index of the model

Return type:

float
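For orientation, both heterogeneity indexes can be computed directly from a probability distribution. The sketch below assumes the normalized Gini index \(G = \frac{m}{m-1}(1-\sum_r p_r^2)\) and the normalized Laakso index \(L = G / (m-(m-1)G)\). The function names are hypothetical, and the library's own implementation (flagged in the TODO list above) may differ.

```python
def gini_sketch(probs):
    """Normalized Gini heterogeneity index of a discrete distribution."""
    m = len(probs)
    return (1 - sum(p * p for p in probs)) * m / (m - 1)

def laakso_sketch(probs):
    """Normalized Laakso index, expressed through the Gini index."""
    m = len(probs)
    g = gini_sketch(probs)
    return g / (m - (m - 1) * g)

# CUSH distribution with m = 7, shelter at category 3, delta = 0.3.
m, sh, delta = 7, 3, 0.3
probs = [(1 - delta) / m + (delta if r == sh else 0.0) for r in range(1, m + 1)]
g = gini_sketch(probs)
l = laakso_sketch(probs)
```

Under these definitions the CUSH values reduce to \(G = 1-\delta^2\) and \(L = (1-\delta^2)/(1+(m-1)\delta^2)\), which is consistent with `gini` taking only `delta` as input.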

cubmods.cush.loglik(sample, m, sh, delta)#

Log-likelihood function for a CUSH model without covariates

Compute the log-likelihood function for a CUSH model without covariates for the given ordinal responses.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • delta (float) – shelter choice parameter \(\delta\)

Returns:

the log-likelihood value

Return type:

float
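A minimal sketch of this computation, assuming the standard CUSH probabilities \(\delta + (1-\delta)/m\) for the shelter category and \((1-\delta)/m\) elsewhere. The name `loglik_cush_sketch` is hypothetical and the code is not taken from the library.

```python
import math

def loglik_cush_sketch(sample, m, sh, delta):
    """Sum of log-probabilities of the observed responses under CUSH(delta)."""
    p_sh = delta + (1 - delta) / m   # probability of the shelter category
    p_other = (1 - delta) / m        # probability of any other category
    return sum(math.log(p_sh if r == sh else p_other) for r in sample)
```

With `delta=0` the model is uniform and the value reduces to \(-n \log m\).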

cubmods.cush.mean(m, sh, delta)#

Expected value of a specified CUSH model.

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • delta (float) – shelter choice parameter \(\delta\)

Returns:

the expected value of the model

Return type:

float

cubmods.cush.mle(sample, m, sh, df, formula, ass_pars=None, maxiter=None, tol=None)#

Main function for CUSH model without covariates.

Estimate and validate a CUSH model for given ordinal responses, without covariates.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (None) – defaults to None; ensures compatibility with gem.from_formula()

  • tol (None) – defaults to None; ensures compatibility with gem.from_formula()

Returns:

an instance of CUBresCUSH (see the Class for details)

Return type:

object
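For a CUSH model without covariates the maximum likelihood estimate has a closed form: setting the fitted shelter probability \(\delta + (1-\delta)/m\) equal to the observed relative frequency \(f_c\) gives \(\hat\delta = (m f_c - 1)/(m-1)\), truncated at zero. The sketch below illustrates only this step. The name `mle_delta_sketch` is hypothetical, and the real function also returns standard errors, fit measures and a CUBresCUSH object, none of which are reproduced here.

```python
def mle_delta_sketch(sample, m, sh):
    """Closed-form estimate of delta from the shelter relative frequency."""
    n = len(sample)
    fc = sum(1 for r in sample if r == sh) / n  # relative frequency of the shelter
    # Truncate at zero: a frequency below 1/m gives no evidence of a shelter effect.
    return max(0.0, (m * fc - 1) / (m - 1))

# 100 responses on a 7-point scale, 40 of them on the shelter category 3.
sample = [3] * 40 + [1, 2, 4, 5, 6, 7] * 10
delta_hat = mle_delta_sketch(sample, m=7, sh=3)
```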

cubmods.cush.pmf(m, sh, delta)#

Probability distribution of a specified CUSH model.

\(\Pr(R = r | \pmb\theta),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • delta (float) – shelter choice parameter \(\delta\)

Returns:

the probability distribution

Return type:

array
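The distribution is a two-component mixture: a degenerate distribution on the shelter category with weight \(\delta\) and a discrete uniform with weight \(1-\delta\). A plain-Python sketch follows. The name `pmf_cush_sketch` is hypothetical and the function returns a list rather than a NumPy array.

```python
def pmf_cush_sketch(m, sh, delta):
    """Pr(R = r) for r = 1..m under a CUSH model with shelter category sh."""
    return [
        delta * (1.0 if r == sh else 0.0) + (1 - delta) / m
        for r in range(1, m + 1)
    ]

probs = pmf_cush_sketch(m=7, sh=3, delta=0.3)
```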

cubmods.cush.var(m, sh, delta)#

Variance of a specified CUSH model.

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • delta (float) – shelter choice parameter \(\delta\)

Returns:

the variance of the model

Return type:

float
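The expected value and the variance can be checked numerically from any probability distribution over the categories \(1 \ldots m\). The sketch below does so for a CUSH distribution built by hand. The name `moments_sketch` is hypothetical, and the closed form \(\delta \cdot sh + (1-\delta)(m+1)/2\) for the mean follows from the mixture definition rather than from the library's source.

```python
def moments_sketch(probs):
    """Mean and variance of a distribution over categories 1..m."""
    mean = sum(r * p for r, p in enumerate(probs, start=1))
    second = sum(r * r * p for r, p in enumerate(probs, start=1))
    return mean, second - mean ** 2

# CUSH distribution with m = 7, shelter at category 2, delta = 0.25.
m, sh, delta = 7, 2, 0.25
probs = [(1 - delta) / m + (delta if r == sh else 0.0) for r in range(1, m + 1)]
mean, var = moments_sketch(probs)
```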

cubmods.cush2 module#

CUB models in Python. Module for CUSH2 (Combination of Uniform and 2 Shelter Choices).

Description:#

This module contains methods and classes for CUSH2 model family.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cush2.CUBresCUSH2(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([ci, saveas, figsize])

Main function to plot an object of the Class.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

plot_par_space([figsize, ax, ci, saveas])

Plots the estimated parameter values in the parameter space and the asymptotic standard error.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(ci=0.95, saveas=None, figsize=(7, 11))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • ci (float) – level \((1-\alpha/2)\) for the standard error

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None)

  • kind (str) – choose a barplot ('bar' default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_par_space(figsize=(7, 5), ax=None, ci=0.95, saveas=None)#

Plots the estimated parameter values in the parameter space and the asymptotic standard error.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None)

  • ci (float) – level \((1-\alpha/2)\) for the confidence ellipse

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cush2.draw(m, sh1, sh2, df, formula, delta1, delta2, n, seed=None)#

Draw a random sample from a specified CUSH2 model.

Parameters:
  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • delta1 (float) – 1st shelter choice parameter \(\delta_1\)

  • delta2 (float) – 2nd shelter choice parameter \(\delta_2\)

  • n (int) – number of ordinal responses

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model

cubmods.cush2.loglik(sample, m, c1, c2)#

Log-likelihood function for a CUSH2 model without covariates.

Compute the log-likelihood function for a CUSH2 model without covariates for the given ordinal responses.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • c1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • c2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

Returns:

the log-likelihood value

Return type:

float

cubmods.cush2.mle(sample, m, c1, c2, df, formula, ass_pars=None, maxiter=None, tol=None)#

Main function for CUSH2 models without covariates.

Estimate and validate a CUSH2 model for ordinal responses, without covariates.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • c1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • c2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (None) – defaults to None; ensures compatibility with gem.from_formula()

  • tol (None) – defaults to None; ensures compatibility with gem.from_formula()

Returns:

an instance of CUBresCUSH2 (see the Class for details)

Return type:

object

cubmods.cush2.pmf(m, c1, c2, d1, d2)#

Probability distribution of a specified CUSH2 model.

\(\Pr(R = r | \pmb\theta),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • c1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • c2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • d1 (float) – 1st shelter choice parameter \(\delta_1\)

  • d2 (float) – 2nd shelter choice parameter \(\delta_2\)

Returns:

the probability distribution

Return type:

array
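With two shelter categories the mixture has three components: weight \(\delta_1\) on category `c1`, weight \(\delta_2\) on category `c2`, and the remaining \(1-\delta_1-\delta_2\) spread uniformly. The sketch below assumes this standard definition with \(\delta_1+\delta_2 \le 1\). The name `pmf_cush2_sketch` is hypothetical and the function returns a list rather than a NumPy array.

```python
def pmf_cush2_sketch(m, c1, c2, d1, d2):
    """Pr(R = r) for r = 1..m under a CUSH2 model with shelters c1 and c2."""
    base = (1 - d1 - d2) / m   # uniform share given to every category
    probs = [base] * m
    probs[c1 - 1] += d1        # extra mass on the 1st shelter category
    probs[c2 - 1] += d2        # extra mass on the 2nd shelter category
    return probs

probs = pmf_cush2_sketch(m=7, c1=1, c2=7, d1=0.2, d2=0.3)
```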

cubmods.cush2.varcov(m, n, d1, d2, fc1, fc2)#

Compute the variance-covariance matrix of parameter estimates of a CUSH2 model without covariates.

Parameters:
  • m (int) – number of ordinal categories

  • n (int) – number of ordinal responses

  • d1 (float) – 1st shelter choice parameter \(\delta_1\)

  • d2 (float) – 2nd shelter choice parameter \(\delta_2\)

  • fc1 (float) – relative frequency of 1st shelter choice

  • fc2 (float) – relative frequency of 2nd shelter choice

Returns:

the variance-covariance matrix

Return type:

numpy ndarray

cubmods.cush2_x0 module#

CUB models in Python. Module for CUSH2 (Combination of Uniform and 2 Shelter Choices) with covariates for the 1st shelter choice.

Description:#

This module contains methods and classes for CUSH2 model family.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cush2_x0.CUBresCUSH2X0(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([saveas, figsize])

Main function to plot an object of the Class.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(saveas=None, figsize=(7, 5))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None)

  • kind (str) – choose a barplot ('bar' default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cush2_x0.draw(m, sh1, sh2, omega1, delta2, X1, df, formula, seed=None)#

Draw a random sample from a specified CUSH2 model, with covariates for the 1st shelter choice only.

Parameters:
  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • omega1 (array) – array \(\pmb \omega_1\) of parameters for the 1st shelter effect, whose length equals X1.columns.size+1 to include an intercept term in the model (first entry)

  • delta2 (float) – 2nd shelter choice parameter \(\delta_2\)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model

cubmods.cush2_x0.effe(pars, sample, m, sh1, sh2, X1)#

Auxiliary function for the log-likelihood estimation of CUSH2 models.

Compute the negative of the scalar function that is maximized when running the E-M algorithm for CUSH2 models with covariates for the 1st shelter choice.

Parameters:
  • pars (array) – array of parameters

  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

cubmods.cush2_x0.loglik(sample, m, sh1, sh2, omega1, delta2, X1)#

Log-likelihood function for a CUSH2 model with covariates for the 1st shelter choice only.

Compute the log-likelihood function for a CUSH2 model with covariates for the 1st shelter choice only, for the given ordinal responses.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • omega1 (array) – array \(\pmb \omega_1\) of parameters for the 1st shelter effect, whose length equals X1.columns.size+1 to include an intercept term in the model (first entry)

  • delta2 (float) – 2nd shelter choice parameter \(\delta_2\)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

Returns:

the log-likelihood value

Return type:

float

cubmods.cush2_x0.mle(sample, m, sh1, sh2, X1, df, formula, ass_pars=None)#

Main function for CUSH2 models with covariates for the 1st shelter choice only.

Estimate and validate a CUSH2 model for given ordinal responses, with covariates for the 1st shelter choice only.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

Returns:

an instance of CUBresCUSH2X0 (see the Class for details)

Return type:

object

cubmods.cush2_x0.pmf(m, sh1, sh2, omega1, delta2, X1)#

Average probability distribution of a specified CUSH2 model with covariates for the 1st shelter choice.

\(\frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • omega1 (array) – array \(\pmb \omega_1\) of parameters for the 1st shelter effect, whose length equals X1.columns.size+1 to include an intercept term in the model (first entry)

  • delta2 (float) – 2nd shelter choice parameter \(\delta_2\)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

Returns:

the average probability distribution

Return type:

array
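The averaging in the formula above amounts to taking column means of the \(n \times m\) matrix of per-subject probabilities. A small sketch with made-up numbers follows. The name `average_pmf_sketch` is hypothetical, and the library presumably works on NumPy arrays rather than nested lists.

```python
def average_pmf_sketch(pmfi):
    """Column means of an n x m matrix of per-subject probabilities."""
    n = len(pmfi)
    # zip(*pmfi) iterates over columns, i.e. over categories r = 1..m.
    return [sum(col) / n for col in zip(*pmfi)]

# Two subjects, m = 3 categories (illustrative numbers only).
pmfi = [[0.2, 0.5, 0.3],
        [0.6, 0.3, 0.1]]
avg = average_pmf_sketch(pmfi)
```

Since every row sums to one, the averaged distribution sums to one as well.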

cubmods.cush2_x0.pmfi(m, sh1, sh2, omega1, delta2, X1)#

Probability distribution for each subject of a specified CUSH2 model with covariates for the first shelter choice only.

Auxiliary function of .draw().

\(\Pr(R_i=r|\pmb\theta; \pmb T_i),\; i=1 \ldots n ,\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • omega1 (array) – array \(\pmb \omega_1\) of parameters for the 1st shelter effect, whose length equals X1.columns.size+1 to include an intercept term in the model (first entry)

  • delta2 (float) – 2nd shelter choice parameter \(\delta_2\)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

Returns:

the matrix of the probability distribution of dimension \(n \times m\)

Return type:

numpy ndarray

cubmods.cush2_xx module#

CUB models in Python. Module for CUSH2 (Combination of Uniform and 2 Shelter Choices) with covariates.

Description:#

This module contains methods and classes for CUSH2 model family with covariates for both shelter choices.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cush2_xx.CUBresCUSH2XX(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([saveas, figsize])

Main function to plot an object of the Class.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(saveas=None, figsize=(7, 5))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative average frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None)

  • kind (str) – choose a barplot ('bar' default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cush2_xx.draw(m, sh1, sh2, omega1, omega2, X1, X2, df, formula, seed=None)#

Draw a random sample from a specified CUSH2 model, with covariates for both shelter choices.

Parameters:
  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • omega1 (array) – array \(\pmb \omega_1\) of parameters for the 1st shelter effect, whose length equals X1.columns.size+1 to include an intercept term in the model (first entry)

  • omega2 (array) – array \(\pmb \omega_2\) of parameters for the 2nd shelter effect, whose length equals X2.columns.size+1 to include an intercept term in the model (first entry)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

  • X2 (DataFrame) – dataframe of covariates for explaining the 2nd shelter effect

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model

cubmods.cush2_xx.effe(pars, sample, m, sh1, sh2, X1, X2)#

Auxiliary function for the log-likelihood estimation of CUSH2 models.

Compute the negative of the scalar function that is maximized when running the E-M algorithm for CUSH2 models with covariates for both shelter choices.

Parameters:
  • pars (array) – array of parameters

  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

  • X2 (DataFrame) – dataframe of covariates for explaining the 2nd shelter effect

cubmods.cush2_xx.loglik(sample, m, sh1, sh2, omega1, omega2, X1, X2)#

Log-likelihood function for a CUSH2 model with covariates for both shelter choices.

Compute the log-likelihood function for a CUSH2 model with covariates for both shelter choices, for the given ordinal responses.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • omega1 (array) – array \(\pmb \omega_1\) of parameters for the 1st shelter effect, whose length equals X1.columns.size+1 to include an intercept term in the model (first entry)

  • omega2 (array) – array \(\pmb \omega_2\) of parameters for the 2nd shelter effect, whose length equals X2.columns.size+1 to include an intercept term in the model (first entry)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

  • X2 (DataFrame) – dataframe of covariates for explaining the 2nd shelter effect

Returns:

the log-likelihood value

Return type:

float

cubmods.cush2_xx.mle(sample, m, sh1, sh2, X1, X2, df, formula, ass_pars=None)#

Main function for CUSH2 models with covariates for both shelter choices.

Estimate and validate a CUSH2 model for given ordinal responses, with covariates for both shelter choices.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

  • X2 (DataFrame) – dataframe of covariates for explaining the 2nd shelter effect

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

Returns:

an instance of CUBresCUSH2XX (see the Class for details)

Return type:

object

cubmods.cush2_xx.pmf(m, sh1, sh2, omega1, omega2, X1, X2)#

Average probability distribution of a specified CUSH2 model with covariates for both shelter choices.

\(\frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • omega1 (array) – array \(\pmb \omega_1\) of parameters for the 1st shelter effect, whose length equals X1.columns.size+1 to include an intercept term in the model (first entry)

  • omega2 (array) – array \(\pmb \omega_2\) of parameters for the 2nd shelter effect, whose length equals X2.columns.size+1 to include an intercept term in the model (first entry)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

  • X2 (DataFrame) – dataframe of covariates for explaining the 2nd shelter effect

Returns:

the average probability distribution

Return type:

array

cubmods.cush2_xx.pmfi(m, sh1, sh2, omega1, omega2, X1, X2)#

Probability distribution for each subject of a specified CUSH2 model with covariates for both shelter choices.

Auxiliary function of .draw().

\(\Pr(R_i=r|\pmb\theta; \pmb T_i),\; i=1 \ldots n ,\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh1 (int) – Category corresponding to the 1st shelter choice \([1,m]\)

  • sh2 (int) – Category corresponding to the 2nd shelter choice \([1,m]\)

  • omega1 (array) – array \(\pmb \omega_1\) of parameters for the 1st shelter effect, whose length equals X1.columns.size+1 to include an intercept term in the model (first entry)

  • omega2 (array) – array \(\pmb \omega_2\) of parameters for the 2nd shelter effect, whose length equals X2.columns.size+1 to include an intercept term in the model (first entry)

  • X1 (DataFrame) – dataframe of covariates for explaining the 1st shelter effect

  • X2 (DataFrame) – dataframe of covariates for explaining the 2nd shelter effect

Returns:

the matrix of the probability distribution of dimension \(n \times m\)

Return type:

numpy ndarray

cubmods.cush_x module#

CUB models in Python. Module for CUSH (Combination of Uniform and Shelter effect) with covariates.

Description:#

This module contains methods and classes for CUSH model family.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.cush_x.CUBresCUSHX(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([saveas, figsize])

Main function to plot an object of the Class.

plot_ordinal([figsize, ax, kind, saveas])

Plots average relative frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(saveas=None, figsize=(7, 5))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots average relative frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None)

  • kind (str) – choose a barplot ('bar', default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.cush_x.draw(m, sh, omega, X, df, formula, seed=None)#

Draw a random sample from a specified CUSH model with covariates

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • omega (array) – array \(\pmb \omega\) of parameters for the shelter effect, whose length equals X.columns.size+1 to include an intercept term in the model (first entry)

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model

cubmods.cush_x.effe(pars, esterno, m, sh)#

Auxiliary function for the log-likelihood estimation of CUSH models with covariates

Compute the negative of the log-likelihood function for CUSH models with covariates to explain the shelter effect. It is passed to the optimizer within the .mle() function as the function to minimize.

Parameters:
  • pars (array) – array of initial parameter estimates

  • esterno (ndarray) – matrix binding together the vector of ordinal data and the matrix XX of explanatory variables whose first column is a column of ones needed to consider an intercept term

  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

cubmods.cush_x.loglik(m, sample, X, omega, sh)#

Log-likelihood function for CUSH models with covariates.

Compute the log-likelihood function for CUSH models with covariates to explain the shelter effect.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • omega (array) – array \(\pmb \omega\) of parameters for the shelter effect, whose length equals X.columns.size+1 to include an intercept term in the model (first entry)

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

Returns:

the log-likelihood value

Return type:

float

cubmods.cush_x.mle(m, sample, X, sh, df, formula, ass_pars=None, maxiter=None, tol=None)#

Main function for CUSH models with covariates.

Estimate and validate a CUSH model for ordinal responses, with covariates to explain the shelter effect.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • maxiter (None) – defaults to None; present to ensure compatibility with gem.from_formula()

  • tol (None) – defaults to None; present to ensure compatibility with gem.from_formula()

Returns:

an instance of CUBresCUSHX (see the Class for details)

Return type:

object

cubmods.cush_x.pmf(m, sh, omega, X)#

Average probability distribution of a specified CUSH model with covariates.

\(\frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • omega (array) – array \(\pmb \omega\) of parameters for the shelter effect, whose length equals X.columns.size+1 to include an intercept term in the model (first entry)

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

Returns:

the probability distribution

Return type:

array

cubmods.cush_x.pmfi(m, sh, omega, X)#

Probability distribution for each subject of a specified CUSH model with covariates

\(\Pr(R_i=r|\pmb\theta; \pmb T_i),\; i=1 \ldots n ,\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • omega (array) – array \(\pmb \omega\) of parameters for the shelter effect, whose length equals X.columns.size+1 to include an intercept term in the model (first entry)

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

Returns:

the matrix of the probability distribution of dimension \(n \times m\)

Return type:

numpy ndarray
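
The relation between pmfi() (one distribution per subject) and pmf() (their average) can be sketched in plain Python. The sketch assumes the usual CUSH formulation, \(\Pr(R_i=r) = \delta_i\,1[r=sh] + (1-\delta_i)/m\), with \(\delta_i\) obtained from the covariates through a logistic link. The helper names cush_pmfi_sketch and cush_pmf_sketch are hypothetical and are not part of cubmods.

```python
import math

def expit(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def cush_pmfi_sketch(m, sh, omega, X):
    """Per-subject CUSH probabilities as a list of n rows with m entries.

    X is a list of covariate rows without the intercept column;
    omega[0] is the intercept, omega[1:] are the covariate coefficients.
    """
    rows = []
    for x in X:
        eta = omega[0] + sum(w * v for w, v in zip(omega[1:], x))
        delta = expit(eta)  # shelter weight of this subject
        row = [(1.0 - delta) / m for _ in range(m)]  # uniform component
        row[sh - 1] += delta  # extra mass on the shelter category
        rows.append(row)
    return rows

def cush_pmf_sketch(m, sh, omega, X):
    """Average of the per-subject distributions over the n subjects."""
    p = cush_pmfi_sketch(m, sh, omega, X)
    n = len(p)
    return [sum(row[r] for row in p) / n for r in range(m)]

# Three subjects, one covariate, shelter at category 3 of 5.
X = [[0.0], [1.0], [2.0]]
P = cush_pmfi_sketch(m=5, sh=3, omega=[-1.0, 0.5], X=X)
avg = cush_pmf_sketch(m=5, sh=3, omega=[-1.0, 0.5], X=X)
```

Each row of P sums to one, and so does avg, since it is a mean of probability distributions.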

cubmods.cush_x.prob(m, sample, X, omega, sh)#

Probability distribution of a specified CUSH model with covariates.

\(\Pr(R_i=r_i|\pmb\theta;\pmb T_i),\;i = 1 \ldots n\)

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • sh (int) – Category corresponding to the shelter choice \([1,m]\)

  • omega (array) – array \(\pmb \omega\) of parameters for the shelter effect, whose length equals X.columns.size+1 to include an intercept term in the model (first entry)

  • X (pandas dataframe) – dataframe of covariates for explaining the shelter effect

Returns:

the probability array \(\Pr(R = r | \pmb\theta)\) for observed responses

Return type:

array

cubmods.gem module#

CUB models in Python. Module for GEM (Generalized Mixtures).

Description:#

This module contains methods and classes for GEM maximum likelihood estimation and sample drawing.

Manual, Examples and References:#

List of TODOs:#

  • TODO: implement best shelter search

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

cubmods.gem.draw(formula, df=None, m=7, model='cub', n=500, sh=None, seed=None, **params)#

Main function to draw a sample from GEneralized Mixture models.

Parameters:
  • formula (str) – a formula used to draw the sample, see Manual for details

  • df (DataFrame) – the DataFrame with covariates (if any)

  • m (int) – number of ordinal categories

  • model (str) – the model family; defaults to "cub"; other options are "cube" and "cush"

  • sh (int) – category corresponding to the shelter choice \([1,m]\)

  • n (int) – number of ordinal responses; it is only effective if the model is without covariates

  • params – keyword arguments (the **params in the signature) with the parameter values of the model to draw from; see Manual for details

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model

Return type:

obj

cubmods.gem.estimate(formula, df, m=None, model='cub', sh=None, ass_pars=None, options={})#

Main function to estimate and validate GEneralized Mixture models.

Parameters:
  • formula (str) – a formula used to estimate the model’s parameters, see Manual for details

  • df (DataFrame) – the DataFrame with observed ordinal sample and covariates (if any)

  • m (int) – number of ordinal categories

  • model (str) – the model family; defaults to "cub"; other options are "cube" and "cush"

  • sh (int) – category corresponding to the shelter choice \([1,m]\)

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

  • options (dict) – a dictionary of extra options maxiter and tol; see the reference guide for details

Returns:

an instance of the Base Class CUBres extended by the family module; see each module for details

Return type:

obj

cubmods.general module#

CUB models in Python. Module for General functions.

Description:#

This module contains methods and classes for general functions.

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

exception cubmods.general.InvalidCategoriesError(m, model)#

Bases: Exception

Exception: if m is not suitable for model.

exception cubmods.general.InvalidSampleSizeError(n)#

Bases: Exception

Exception: if the sample size is not strictly greater than zero.

exception cubmods.general.NoShelterError(model)#

Bases: Exception

Exception: if a shelter choice is needed but it hasn’t been provided.

exception cubmods.general.NotImplementedModelError(model, formula)#

Bases: Exception

Exception: if the requested model is known but not yet implemented.

exception cubmods.general.ParameterOutOfBoundsError(param, value)#

Bases: Exception

Exception: if the provided parameter value is out of bounds.

exception cubmods.general.ShelterGreaterThanM(m, sh)#

Bases: Exception

Exception: if the provided shelter choice is greater than \(m\).

exception cubmods.general.UnknownModelError(model)#

Bases: Exception

Exception: if the requested family is unknown.

cubmods.general.addones(A)#

Prepend a column of ones to the given matrix, so that an intercept term is also considered in CUB models with covariates.

Parameters:

A – a matrix to be expanded

Returns:

the expanded matrix

Return type:

same as A

cubmods.general.aic(l, p)#

Akaike Information Criterion.

Parameters:
  • l (float) – log-likelihood

  • p (int) – number of parameters

Returns:

the AIC value

Return type:

float

cubmods.general.bic(l, p, n)#

Bayesian Information Criterion.

Parameters:
  • l (float) – log-likelihood

  • p (int) – number of parameters

  • n (int) – number of observations

Returns:

the BIC value

Return type:

float
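
For reference, the two criteria can be written out directly. This is a minimal sketch using the textbook definitions \(AIC = -2\ell + 2p\) and \(BIC = -2\ell + p\log n\); the function names aic_sketch and bic_sketch are hypothetical and the package code may be organized differently.

```python
import math

def aic_sketch(l, p):
    """Akaike Information Criterion from log-likelihood l and p parameters."""
    return -2.0 * l + 2.0 * p

def bic_sketch(l, p, n):
    """Bayesian Information Criterion; n is the number of observations."""
    return -2.0 * l + p * math.log(n)

aic_value = aic_sketch(-100.0, 2)
bic_value = bic_sketch(-100.0, 2, 100)
```

Lower values indicate a better trade-off between fit and complexity; BIC penalizes each parameter more heavily than AIC as soon as \(\log n > 2\).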

cubmods.general.bitgamma(sample, m, W, gamma)#

Shifted Binomial distribution with covariates.

Return the shifted Binomial probabilities of ordinal responses where the feeling component is explained by covariates via a logistic link.

Parameters:
  • sample (array) – array of ordinal responses

  • m (int) – number of ordinal categories

  • W (pandas dataframe) – dataframe of covariates for explaining the feeling component

  • gamma (array of float) – array \(\pmb \gamma\) of parameters for the feeling component, whose length equals W.columns.size+1 to include an intercept term in the model (first entry)

Returns:

an array of the same length as sample, where each entry is the shifted Binomial probability for the corresponding observation and feeling value.

Return type:

array

cubmods.general.bitxi(m, sample, xi)#

Shifted Binomial probabilities of ordinal responses

Compute the shifted Binomial probabilities of ordinal responses.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array) – array of ordinal responses

  • xi (float) – feeling parameter \(\xi\)

Returns:

A vector of the same length as sample, where each entry is the shifted Binomial probability of the corresponding observation.

Return type:

array
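
A minimal sketch of the computation, assuming the parameterization commonly used for CUB models, \(\Pr(R=r) = \binom{m-1}{r-1}\,\xi^{m-r}(1-\xi)^{r-1}\) for \(r = 1, \ldots, m\). The names probbit_sketch and bitxi_sketch are hypothetical stand-ins, not the package functions.

```python
import math

def probbit_sketch(m, xi):
    """Shifted Binomial probabilities for the categories 1..m."""
    return [
        math.comb(m - 1, r - 1) * xi ** (m - r) * (1.0 - xi) ** (r - 1)
        for r in range(1, m + 1)
    ]

def bitxi_sketch(m, sample, xi):
    """Probability of each observed response under the shifted Binomial."""
    pmf = probbit_sketch(m, xi)
    return [pmf[r - 1] for r in sample]  # categories are 1-based

pmf = probbit_sketch(m=5, xi=0.3)
obs = bitxi_sketch(m=5, sample=[1, 5, 3], xi=0.3)
```

The second function only looks up, for every observation, the entry of the distribution that corresponds to its category.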

cubmods.general.choices(m)#

Array of ordinal categories.

Parameters:

m (int) – number of ordinal categories

Returns:

array of int from 1 to m

Return type:

array

cubmods.general.colsof(A)#

Number of columns of the given matrix or dataframe.

Parameters:

A (ndarray, dataframe) – the matrix or dataframe

Returns:

number of columns

Return type:

int

cubmods.general.conf_border(Sigma, mx, my, ax, conf=0.95, plane='z', xyz0=(0, 0, 0))#

Plot the bivariate projection of a trivariate confidence ellipse on a plane.

Auxiliary function of plot_ellipsoid().

Parameters:
  • Sigma (ndarray) – bivariate variance-covariance matrix

  • mx (float) – center of the ellipse on the \(x\) axis

  • my (float) – center of the ellipse on the \(y\) axis

  • ax – matplotlib axis

  • conf (float) – confidence level of the trivariate ellipsoid.

  • plane (str) – plane for the projection; could be x, y or z

  • xyz0 (tuple) – tuple of the bivariate ellipse position

cubmods.general.conf_ell(vcov, mux, muy, ci, ax, color='b', label=True, alpha=0.25)#

Plot bivariate confidence ellipse of estimated parameters at level ci\(=(1 - \alpha/2)\)

Parameters:
  • vcov (ndarray) – Variance-covariance matrix \(2 \times 2\)

  • mux (float) – estimate of first parameter

  • muy (float) – estimate of second parameter

  • ci (float) – confidence level \(=(1 - \alpha/2)\)

  • ax – matplotlib axis

  • color (str) – color of confidence ellipse

  • label (bool) – whether to add a label of confidence level

  • alpha (float) – transparency of confidence ellipse

cubmods.general.dissimilarity(p_obs, p_est)#

Normalized dissimilarity measure.

Compute the normalized dissimilarity measure between observed relative frequencies and estimated (theoretical) probabilities of a discrete distribution.

Parameters:
  • p_obs (array) – Vector of observed relative frequencies

  • p_est (array) – Vector of estimated (theoretical) probabilities

Returns:

Numeric value of the dissimilarity index, assessing the distance to a perfect fit.

Return type:

float
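
As an illustration, the sketch below assumes the usual definition of the index, \(Diss = \frac{1}{2}\sum_{r=1}^{m} |f_r - p_r|\), which ranges from 0 (perfect fit) to 1. The name dissimilarity_sketch is hypothetical.

```python
def dissimilarity_sketch(p_obs, p_est):
    """Half the sum of absolute differences between two distributions."""
    if len(p_obs) != len(p_est):
        raise ValueError("the two vectors must have the same length")
    return 0.5 * sum(abs(o - e) for o, e in zip(p_obs, p_est))

perfect = dissimilarity_sketch([0.5, 0.5], [0.5, 0.5])
worst = dissimilarity_sketch([1.0, 0.0], [0.0, 1.0])
partial = dissimilarity_sketch([0.25, 0.75], [0.5, 0.5])
```

The value can be read as the share of responses that would have to change category to obtain a perfect fit.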

cubmods.general.dummies2(df, DD)#

Create dummy variables from polychotomous variables.

Auxiliary function of cubmods.gem.from_formula(). A dummy variable is created for all polychotomous variables named C(<varname>).

Parameters:
  • df (DataFrame) – a DataFrame with all the covariates and the ordinal response

  • DD (list) – the list of all covariates for each component

Returns:

a tuple of the DataFrame with the dummy variables and the column names

Return type:

tuple

cubmods.general.equal3d(ax)#

Equalize 3d axes.

Auxiliary function of .plot_ellipsoid().

cubmods.general.expit(x)#

Expit function.

It is the inverse of logit. Aka sigmoid or standard logistic.

Parameters:

x (float) – the argument

Returns:

the expit of x

Return type:

float
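
The relation with logit() can be checked numerically. The sketch below uses the standard definitions \(\mathrm{expit}(x) = 1/(1+e^{-x})\) and \(\mathrm{logit}(p) = \log\frac{p}{1-p}\); the names expit_sketch and logit_sketch are hypothetical.

```python
import math

def expit_sketch(x):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-x))

def logit_sketch(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

half = expit_sketch(0.0)
roundtrip = logit_sketch(expit_sketch(0.7))
```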

cubmods.general.formula_parser(formula, model='cub')#

Parse a CUB class formula.

Auxiliary function of cubmods.gem functions.

TODO: add specific Exceptions for formula

Parameters:
  • formula (str) – the formula to be parsed

  • model (str) – the model family

Returns:

a tuple of the ordinal response column name and a list of all covariates’ column names for each component

Return type:

tuple

cubmods.general.freq(sample, m, dataframe=False)#

Absolute frequencies of an observed sample of ordinal responses.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • dataframe (bool) – if True return a DataFrame instead of an array, defaults to False

Returns:

the absolute frequencies of the observed sample

Return type:

array or dataframe
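
A minimal sketch of the counting step, assuming responses are coded as integers from 1 to m. The name freq_sketch is hypothetical, and the sketch returns a plain list rather than an array or DataFrame.

```python
def freq_sketch(sample, m):
    """Count how many responses fall in each category 1..m."""
    counts = [0] * m
    for r in sample:
        counts[r - 1] += 1  # categories are 1-based
    return counts

f = freq_sketch([1, 1, 3, 5], m=5)
```

Categories that were never chosen keep a count of zero, so the result always has length m.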

cubmods.general.get_cov_ellipsoid(cov, mu=array([0., 0., 0.]), ci=0.95)#

Return the 3d points representing the covariance matrix cov centred at mu, at confidence level ci\(=(1 - \alpha/2)\).

Auxiliary function of .plot_ellipsoid().

Parameters:
  • cov (ndarray) – Variance-covariance matrix \(3 \times 3\)

  • mu (array) – ellipsoid center \((x_0, y_0, z_0)\)

  • ci (float) – confidence level \(=(1 - \alpha/2)\)

Returns:

a tuple of 3d points (X, Y, Z)

Return type:

tuple

cubmods.general.get_minor(A, i, j)#

Get a minor of a matrix.

Auxiliary function of .plot_ellipsoid().

Note

Solution by PaulDong

Parameters:
  • A (ndarray) – a generic matrix

  • i (int) – row of the minor

  • j (int) – column of the minor

Returns:

the minor of A

Return type:

ndarray
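
A sketch of the idea on nested lists: the minor is read here as the submatrix obtained by deleting row i and column j. Zero-based indices are an assumption of this sketch, and get_minor_sketch is a hypothetical name.

```python
def get_minor_sketch(A, i, j):
    """Submatrix of A without row i and column j (0-based indices)."""
    return [
        [value for col, value in enumerate(row) if col != j]
        for r, row in enumerate(A)
        if r != i
    ]

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
minor = get_minor_sketch(A, 0, 1)
```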

cubmods.general.hadprod(Amat, xvett)#

Hadamard product of a matrix with a vector

Return the Hadamard product between the given matrix and vector: this operation corresponds to multiplying every row of the matrix by the corresponding element of the vector, and it is equivalent to left-multiplying the matrix by the diagonal matrix whose diagonal is the given vector. It is possible only if the length of the vector equals the number of rows of the matrix. It is an auxiliary function needed for computing the variance-covariance matrix of the estimated model with covariates.

Note

If xvett is a row vector, it is reshaped to a column vector.

Parameters:
  • Amat (ndarray) – A generic matrix

  • xvett (array) – A generic vector

Returns:

the Hadamard product \(\pmb A \odot \pmb x\)

Return type:

ndarray
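
The row-wise scaling can be compared with an explicit product by a diagonal matrix. In this sketch on nested lists, the diagonal matrix \(\mathrm{diag}(\pmb x)\) is the first factor of the product, \(\mathrm{diag}(\pmb x)\,\pmb A\). Both helper names are hypothetical.

```python
def hadprod_sketch(A, x):
    """Multiply every row of A by the matching element of x."""
    if len(A) != len(x):
        raise ValueError("len(x) must equal the number of rows of A")
    return [[a * xi for a in row] for row, xi in zip(A, x)]

def diag_product_sketch(A, x):
    """Explicit matrix product diag(x) @ A, for comparison only."""
    n, cols = len(x), len(A[0])
    D = [[x[i] if i == k else 0.0 for k in range(n)] for i in range(n)]
    return [
        [sum(D[i][k] * A[k][j] for k in range(n)) for j in range(cols)]
        for i in range(n)
    ]

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [2.0, 0.5, -1.0]
scaled = hadprod_sketch(A, x)
via_diag = diag_product_sketch(A, x)
```

The row-wise version avoids building the \(n \times n\) diagonal matrix, which matters when n is the number of observations.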

cubmods.general.kkk(sample, m)#

Sequence of combinatorial coefficients

Compute the sequence of binomial coefficients \(\binom{m-1}{r-1}\), for \(r = 1, \ldots, m\), and then return a vector of the same length as sample, whose i-th component is the corresponding binomial coefficient \(\binom{m-1}{r_i-1}\).

Parameters:
  • sample (array) – array of ordinal responses

  • m (int) – number of ordinal categories

Returns:

an array of \(\binom{m-1}{r_i-1}\)

Return type:

array
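
A short sketch of the lookup using only the standard library; kkk_sketch is a hypothetical name and the result is a plain list.

```python
import math

def kkk_sketch(sample, m):
    """Binomial coefficient C(m-1, r_i-1) for every observed response r_i."""
    coeffs = [math.comb(m - 1, r - 1) for r in range(1, m + 1)]
    return [coeffs[r - 1] for r in sample]

k = kkk_sketch([1, 3, 5, 2], m=5)
```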

cubmods.general.load_object(fname)#

Load a saved object from file.

It can be used to load a CUBsample or a CUBres object previously saved to a file.

Note

see the Classes for details about these objects

Parameters:

fname (str) – filename

Returns:

the loaded object, instance of CUBsample or CUBres

Return type:

object

cubmods.general.logis(Y, param)#

The logistic transform.

Create a matrix YY binding array Y with a vector of ones, placed as the first column of YY. It applies the logistic transform componentwise to the standard matrix multiplication between YY and param.

Parameters:
  • Y (ndarray, dataframe) – A generic matrix or a dataframe

  • param (array) – Vector of coefficients, whose length is Y.columns.size+1 (to consider also an intercept term)

Returns:

a vector whose length is Y.index.size and whose i-th component is the logistic function evaluated at the scalar product between the i-th row of YY and param
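
The two steps described above (binding a column of ones, then applying the logistic function to each linear predictor) can be sketched on nested lists. The name logis_sketch is hypothetical and the sketch does not handle DataFrames.

```python
import math

def logis_sketch(Y, param):
    """Logistic transform of [1, Y] @ param, one value per row of Y."""
    out = []
    for row in Y:
        yy = [1.0] + list(row)  # prepend the intercept column
        eta = sum(c * v for c, v in zip(param, yy))  # linear predictor
        out.append(1.0 / (1.0 + math.exp(-eta)))
    return out

# Two subjects, one covariate; param = [intercept, slope].
values = logis_sketch([[0.0], [2.0]], param=[0.0, 1.0])
```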

cubmods.general.logit(x)#

Logit function.

It is the inverse of the standard logistic function, aka log-odds.

Parameters:

x (float) – the argument

Returns:

the logit of x

Return type:

float

cubmods.general.lsat(f, n)#

Log-likelihood of saturated model.

Saturated level, that is, the theoretical maximum information that can be obtained by a model using as many parameters as possible. The saturated log-likelihood is computed by assuming that the model is specified by as many parameters as available observations. This is the extreme benchmark for comparing previous log-likelihood quantities.

Parameters:
  • f (array) – absolute frequencies of observed ordinal responses

  • n (int) – number of observations

Returns:

log-likelihood of saturated model

Return type:

float

cubmods.general.luni(m, n)#

Log-likelihood of null model.

Null level, that is, when no structure is searched for. Specifically, this is equivalent to assuming a discrete Uniform distribution over the support, so that every category has the same probability.

Parameters:
  • m (int) – number of ordinal categories

  • n (int) – number of observations

Returns:

the log-likelihood of null model

Return type:

float
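
The two benchmark log-likelihoods can be sketched as follows, assuming the usual expressions \(\ell_{uni} = -n \log m\) and \(\ell_{sat} = \sum_{r} f_r \log(f_r/n)\), where the sum runs over the categories with \(f_r > 0\). The names luni_sketch and lsat_sketch are hypothetical.

```python
import math

def luni_sketch(m, n):
    """Log-likelihood of the discrete Uniform (null) model."""
    return -n * math.log(m)

def lsat_sketch(f, n):
    """Log-likelihood of the saturated model from absolute frequencies f."""
    return sum(fr * math.log(fr / n) for fr in f if fr > 0)

f = [5, 10, 20, 5]
n = sum(f)
l_null = luni_sketch(len(f), n)
l_sat = lsat_sketch(f, n)
```

Any fitted model has a log-likelihood between these two values, and the two coincide only when the observed frequencies are perfectly uniform.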

cubmods.general.plot_ellipsoid(V, E, ax, zlabel, ci=0.95, magnified=False)#

Plot a trivariate confidence ellipsoid.

Parameters:
  • V (ndarray) – Variance-covariance matrix

  • E (array) – Vector of estimated parameters

  • ax – matplotlib axis

  • zlabel (str) – label for \(z\) axis

  • ci (float) – confidence level \((1 - \alpha/2)\)

  • magnified (bool) – if False plots in the full parameter space

cubmods.general.probbit(m, xi)#

Probability distribution of shifted binomial random variable.

Parameters:
  • m (int) – number of ordinal categories

  • xi (float) – feeling parameter \(\xi\)

Returns:

the vector of the probability distribution of a shifted Binomial model.

Return type:

array

cubmods.general.unique(l)#

Unique elements in a 3-dimensional list.

Auxiliary function of .dummies2().

Parameters:

l (list) – the list to analyze

Returns:

the list of unique elements

Return type:

list

cubmods.ihg module#

CUB models in Python. Module for IHG (Inverse HyperGeometric).

Description:#

This module contains methods and classes for IHG model family without covariates.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.ihg.CUBresIHG(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Object returned by the .mle() function. See the Base class CUBres for details.

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([ci, saveas, figsize])

Main function to plot an object of the Class.

plot_estim([ci, ax, magnified])

Plots the estimated parameter values in the parameter space and the asymptotic standard error.

plot_ordinal([figsize, ax, kind, saveas])

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(ci=0.95, saveas=None, figsize=(7, 8))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • ci (float) – level \((1-\alpha/2)\) for the standard error

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_estim(ci=0.95, ax=None, magnified=False)#

Plots the estimated parameter values in the parameter space and the asymptotic standard error.

Parameters:
  • ci (float) – level \((1-\alpha/2)\) for the standard error

  • magnified (bool) – if False the limits will be the entire parameter space, otherwise let matplotlib choose the limits

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots relative frequencies of observed sample, estimated probability distribution and, if provided, probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None)

  • kind (str) – choose a barplot ('bar', default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.ihg.draw(m, theta, n, df, formula, seed=None)#

Draw a random sample from a specified IHG model.

Parameters:
  • m (int) – number of ordinal categories

  • theta (float) – parameter \(\theta\) (probability of 1st shelter category)

  • n (int) – number of ordinal responses to be drawn

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model
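
The role of the seed can be illustrated with a generic sampler that draws n ordinal responses from a given probability distribution. This is only a sketch with the standard library random module; the random number generator used by the package is not documented here, so that choice is an assumption, and draw_sketch is a hypothetical name that returns a plain list rather than a CUBsample.

```python
import random

def draw_sketch(pmf, n, seed=None):
    """Draw n categories in 1..m with probabilities given by pmf."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    categories = list(range(1, len(pmf) + 1))
    return rng.choices(categories, weights=pmf, k=n)

pmf = [0.4, 0.3, 0.2, 0.1]
first = draw_sketch(pmf, n=50, seed=1)
second = draw_sketch(pmf, n=50, seed=1)
```

Calling the sampler twice with the same seed returns the same responses, which is what makes a simulation study repeatable.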

cubmods.ihg.effe(theta, m, f)#

Compute the negative log-likelihood function of an IHG model without covariates for a given absolute frequency distribution. Auxiliary function of mle() for the optimization algorithm.

Parameters:
  • theta (float) – parameter \(\theta\) (probability of 1st shelter category)

  • m (int) – number of ordinal categories

  • f (array of int) – array of absolute frequency distribution

Returns:

the negative log-likelihood value

Return type:

float

cubmods.ihg.init_theta(m, f)#

Preliminary estimators for IHG models without covariates.

Computes preliminary parameter estimates of an IHG model without covariates for given ordinal responses. These preliminary estimators are used within the package code to start the E-M algorithm.

Parameters:
  • f (array of int) – array of the absolute frequencies of given ordinal responses

  • m (int) – number of ordinal categories

Returns:

the value of \(\theta^{(0)}\)

cubmods.ihg.loglik(m, theta, f)#

Compute the log-likelihood function of an IHG model without covariates for a given absolute frequency distribution.

Parameters:
  • theta (float) – parameter \(\theta\) (probability of 1st shelter category)

  • m (int) – number of ordinal categories

  • f (array of int) – array of absolute frequency distribution

Returns:

the log-likelihood value

Return type:

float

cubmods.ihg.mle(m, sample, df, formula, ass_pars=None)#

Main function for IHG models without covariates.

Function to estimate and validate an IHG model without covariates for given ordinal responses.

Parameters:
  • sample (array of int) – array of ordinal responses

  • m (int) – number of ordinal categories

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

Returns:

an instance of CUBresIHG (see the Class for details)

Return type:

object

cubmods.ihg.pmf(m, theta)#

Probability distribution of a specified IHG model without covariates.

\(\Pr(R = r | \pmb\theta),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • theta (float) – parameter \(\theta\) (probability of 1st shelter category)

Returns:

the vector of the probability distribution of an IHG model.

Return type:

numpy array
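
A sketch of how such a distribution can be built, assuming the standard recursive form of the Inverse HyperGeometric distribution: \(\Pr(R=1) = \theta\) and \(\Pr(R=r+1) = \Pr(R=r)\,\frac{(1-\theta)(m-r)}{m-r-1+r\theta}\) for \(r = 1, \ldots, m-1\). The name ihg_pmf_sketch is hypothetical and the package may compute the probabilities differently.

```python
def ihg_pmf_sketch(m, theta):
    """IHG probabilities for the categories 1..m via the recursion."""
    pr = [theta]  # Pr(R = 1) equals theta
    for r in range(1, m):
        ratio = (1.0 - theta) * (m - r) / (m - r - 1 + r * theta)
        pr.append(pr[-1] * ratio)
    return pr

p = ihg_pmf_sketch(m=7, theta=0.3)
p3 = ihg_pmf_sketch(m=3, theta=0.5)
```

With m = 3 and \(\theta = 0.5\) the recursion gives the probabilities 1/2, 1/3 and 1/6.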

cubmods.ihg.var(m, theta)#

Variance of a specified IHG model.

Parameters:
  • m (int) – number of ordinal categories

  • theta (float) – parameter \(\theta\) (probability of 1st shelter category)

Returns:

the variance of the model

Return type:

float

cubmods.ihg_v module#

CUB models in Python. Module for IHG (Inverse HyperGeometric) with covariates.

Description:#

This module contains methods and classes for IHG model family with covariates.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.ihg_v.CUBresIHGV(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: CUBres

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

plot([saveas, figsize])

Main function to plot an object of the Class.

plot_ordinal([figsize, ax, kind, saveas])

Plots average relative frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

plot(saveas=None, figsize=(7, 5))#

Main function to plot an object of the Class.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

plot_ordinal(figsize=(7, 5), ax=None, kind='bar', saveas=None)#

Plots average relative frequencies of observed sample, estimated average probability distribution and, if provided, average probability distribution of a known model.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None)

  • kind (str) – choose a barplot ('bar', default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

cubmods.ihg_v.draw(m, nu, V, df, formula, seed=None)#

Draw a random sample from a specified IHG model with covariates

Parameters:
  • m (int) – number of ordinal categories

  • nu (array) – array \(\pmb \nu\) of parameters for \(\theta\), whose length equals V.columns.size+1 to include an intercept term in the model (first entry)

  • V (pandas dataframe) – dataframe of covariates for explaining the parameter \(\theta\)

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • seed (int, optional) – the seed to ensure reproducibility, defaults to None

Returns:

an instance of CUBsample (see here) containing ordinal responses drawn from the specified model

cubmods.ihg_v.effe(nu, m, sample, V)#

Auxiliary function for the log-likelihood estimation of IHG models with covariates

Compute the negative of the log-likelihood function for IHG models with covariates. It is passed to the optimizer within the .mle() function as the function to minimize.

Parameters:
  • nu (array) – array of initial parameter estimates

  • V (pandas dataframe) – dataframe of covariates for explaining the parameter \(\theta\)

  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

cubmods.ihg_v.init_theta(m, f)#

Preliminary estimators for IHG models without covariates.

Computes preliminary parameter estimates of an IHG model without covariates for given ordinal responses. These preliminary estimators are used within the package code to start the E-M algorithm.

Parameters:
  • f (array of int) – array of the absolute frequencies of given ordinal responses

  • m (int) – number of ordinal categories

Returns:

the array of \(\pmb\nu^{(0)}\)

cubmods.ihg_v.loglik(m, sample, V, nu)#

Log-likelihood function for IHG models with covariates.

Compute the log-likelihood function for IHG models with covariates to explain the parameter \(\theta\).

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • nu (array) – array \(\pmb \nu\) of parameters for \(\theta\), whose length equals V.columns.size+1 to include an intercept term in the model (first entry)

  • V (pandas dataframe) – dataframe of covariates for explaining the parameter \(\theta\)

Returns:

the log-likelihood value

Return type:

float

cubmods.ihg_v.mle(m, sample, V, df, formula, ass_pars=None)#

Main function for IHG models with covariates.

Estimate and validate an IHG model for ordinal responses, with covariates.

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • V (pandas dataframe) – dataframe of covariates for explaining the parameter \(\theta\)

  • df (DataFrame) – original DataFrame

  • formula (str) – the formula used

  • ass_pars (dictionary, optional) – dictionary of hypothesized parameters, defaults to None

Returns:

an instance of CUBresIHGV (see the Class for details)

Return type:

object

cubmods.ihg_v.pmf(m, V, nu)#

Average probability distribution of a specified IHG model with covariates.

\(\frac{1}{n} \sum_{i=1}^n \Pr(R_i=r|\pmb\theta; \pmb T_i),\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • nu (array) – array \(\pmb \nu\) of parameters for \(\theta\), whose length equals V.columns.size+1 to include an intercept term in the model (first entry)

  • V (pandas dataframe) – dataframe of covariates for explaining the parameter \(\theta\)

Returns:

the probability distribution

Return type:

array
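The averaging in the formula above can be sketched as follows: build one row of probabilities per subject, then take the column means. This is an illustration in plain Python under the same assumptions as the rest of this module's description (recursive IHG probabilities, logistic link for \(\theta\)). The names `pmfi_sketch` and `pmf_sketch` are hypothetical, and covariates are given as a list of rows instead of a DataFrame.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def ihg_pmf(m, theta):
    """IHG probabilities for r = 1..m, built by recursion from Pr(R=1) = theta."""
    p = [theta]
    for r in range(1, m):
        p.append(p[-1] * (1 - theta) * (m - r) / (m - r - 1 + r * theta))
    return p

def pmfi_sketch(m, V_rows, nu):
    """One row per subject: an n x m table of probabilities."""
    rows = []
    for v_i in V_rows:
        eta = nu[0] + sum(b * v for b, v in zip(nu[1:], v_i))
        rows.append(ihg_pmf(m, logistic(eta)))
    return rows

def pmf_sketch(m, V_rows, nu):
    """Average of the per-subject distributions, category by category."""
    rows = pmfi_sketch(m, V_rows, nu)
    n = len(rows)
    return [sum(row[r] for row in rows) / n for r in range(m)]

avg = pmf_sketch(4, [[0.0], [1.0], [2.0]], nu=[0.2, -0.4])
```

Because every row sums to one, the averaged distribution does too.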

cubmods.ihg_v.pmfi(m, V, nu)#

Probability distribution for each subject of a specified IHG model with covariates

\(\Pr(R_i=r|\pmb\theta; \pmb T_i),\; i=1 \ldots n ,\; r=1 \ldots m\)

Parameters:
  • m (int) – number of ordinal categories

  • nu (array) – array \(\pmb \nu\) of parameters for \(\theta\), whose length equals V.columns.size+1 to include an intercept term in the model (first entry)

  • V (pandas dataframe) – dataframe of covariates for explaining the parameter \(\theta\)

Returns:

the matrix of the probability distribution of dimension \(n \times m\)

Return type:

numpy ndarray

cubmods.ihg_v.prob(m, sample, V, nu)#

Probability distribution of an IHG model with covariates given an observed sample.

Compute the probability distribution of an IHG model with covariates, given an observed sample.

\(\Pr(R_i=r_i|\pmb\theta;\pmb T_i),\; i=1 \ldots n\)

Parameters:
  • m (int) – number of ordinal categories

  • sample (array of int) – array of ordinal responses

  • nu (array) – array \(\pmb \nu\) of parameters for \(\theta\), whose length equals V.columns.size+1 to include an intercept term in the model (first entry)

  • V (pandas dataframe) – dataframe of covariates for explaining the parameter \(\theta\)

Returns:

the array of the probability distribution.

Return type:

numpy array

cubmods.multicub module#

CUB models in Python. Module for MULTICUB and MULTICUBE.

Description:#

This module contains methods and classes for the MULTICUB and MULTICUBE tools.

Manual, Examples and References:#

List of TODOs:#

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

cubmods.multicub.multi(ords, ms=None, model='cub', title=None, labels=None, shs=None, plot=True, print_res=False, pos=None, xlim=(0, 1), ylim=(0, 1), equal=True, confell=True, alpha=0.2, ci=0.95, figsize=(7, 7), ax=None)#

Joint plot of estimated CUB models in the parameter space

Return a plot of estimated CUB models represented as points in the parameter space.

Parameters:
  • ords (list) – list of arrays of observed ordinal responses

  • model (str) – model family, either 'cub' (default) or 'cube'

  • title (str) – title of the plot

  • labels (list) – labels of the points

  • shs (int or list) – shelter effect(s); can be an int if the same shelter effect is valid for all samples or a list to specify different shelter choices

  • plot (bool) – if True (default) plot the results

  • print_res (bool) – if True print the results; defaults to False

  • pos (list) – position of the \(\delta\) or \(\phi\) estimated values

  • xlim (tuple) – x-axis limits

  • ylim (tuple) – y-axis limits

  • equal (bool) – if the plot must have equal aspect; defaults to True

  • alpha (float) – confidence ellipse transparency

  • confell (bool) – if True (default) plot confidence ellipse (for CUB model only)

  • ci (float) – level \((1-\alpha/2)\) for the confidence ellipse

  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None, that is, when a new figure is created)

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

Returns:

ax

cubmods.multicub.pos_kwargs(pos)#

Position of the \(\delta\) or \(\phi\) estimated values

    1
  8   2
7   @   3
  6   4
    5
Parameters:

pos (int) – position (1..8)

Returns:

a dictionary for matplotlib

Return type:

dict
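The contents of the returned dictionary are not documented here. Purely as an illustration of the idea, the compass-style positions in the diagram could be translated into matplotlib text alignment keywords as below. The function name `pos_kwargs_sketch` and the choice of the `ha` and `va` keys are assumptions, not the package's actual output.

```python
def pos_kwargs_sketch(pos):
    """Hypothetical mapping from position 1..8 to matplotlib text alignment."""
    table = {
        1: ("center", "bottom"),  # label above the point
        2: ("left", "bottom"),    # upper right
        3: ("left", "center"),    # right of the point
        4: ("left", "top"),       # lower right
        5: ("center", "top"),     # below the point
        6: ("right", "top"),      # lower left
        7: ("right", "center"),   # left of the point
        8: ("right", "bottom"),   # upper left
    }
    if pos not in table:
        raise ValueError("pos must be an integer from 1 to 8")
    ha, va = table[pos]
    return {"ha": ha, "va": va}

kw = pos_kwargs_sketch(3)
```

A dictionary of this shape could be unpacked into a matplotlib text call, for example `ax.text(x, y, label, **kw)`.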

cubmods.smry module#

CUB models in Python. Module for summary tools.

Description:#

This module contains methods and classes for summary tools.

List of TODOs:#

  • TODO: inferential results as a DataFrame in the Manual and in the examples

  • TODO: optional bounds in CUBE mle (CUBSH too?)

  • TODO: 2 decimal places in the 3D plots?

  • TODO: dissimilarity in multicub plot (add an option)

  • TODO: size of the phi points in multicube

Credits#

Author:

Massimo Pierini

Date:

2023-24

Credits:

Domenico Piccolo, Rosaria Simone

Contacts:

cub@maxpierini.it

Classes and Functions#

class cubmods.smry.CUBres(model, df, formula, m, n, sample, f, theoric, diss, est_names, estimates, e_types, varmat, stderrs, pval, wald, loglike, muloglik, loglikuni, AIC, BIC, seconds, time_exe, logliksat=None, dev=None, logliksatcov=None, niter=None, maxiter=None, tol=None, sh=None, rho=None, ass_pars=None)#

Bases: object

Default Class for MLE results; each model module extends this Class to an ad hoc Class with specific functions. An instance of the extended Class is returned by .mle() functions of model modules.

Variables:
  • model – the model family

  • df – the original DataFrame with observed sample and covariates (if any)

  • formula – the formula used to fit the data

  • m – number of ordinal categories

  • n – number of observed ordinal responses

  • sample – the observed sample of ordinal responses

  • f – absolute frequencies of the sample

  • theoric – estimated probability distribution

  • diss – dissimilarity index

  • est_names – name of estimated parameters

  • estimates – values of estimated parameters

  • e_types – parameters’ component

  • varmat – variance-covariance matrix of estimated parameters

  • stderrs – standard errors of estimated parameters

  • pval – p-values of estimated parameters

  • wald – Wald test statistics of estimated parameters

  • loglike – log-likelihood value

  • muloglik – average log-likelihood for each observation

  • loglikuni – log-likelihood of null model

  • AIC – Akaike Information Criterion

  • BIC – Bayesian Information Criterion

  • seconds – execution time of the algorithm

  • time_exe – when the algorithm has been executed

  • logliksat – log-likelihood of saturated model (for models without covariates only)

  • logliksatcov – deprecated

  • dev – deviance

  • niter – number of iterations of the EM algorithm

  • maxiter – maximum number of iterations of the EM algorithm

  • tol – fixed error tolerance

  • sh – shelter choice(s), if any

  • rho – coefficient of correlation between \(\pi\) and \(\xi\)

  • ass_pars – parameters of known model to be compared with the estimates
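Several of these attributes are simple functions of the others. The sketch below shows the usual definitions: the dissimilarity index as half the sum of absolute differences between relative frequencies and fitted probabilities, AIC as \(-2\ell + 2p\), and BIC as \(-2\ell + p\log n\). Treating the null model as the discrete uniform, so that its log-likelihood is \(-n\log m\), is an assumption here, as are the function name `fit_stats_sketch` and the argument `n_params`.

```python
import math

def fit_stats_sketch(f, theoric, loglike, n_params):
    """Illustrative summary statistics computed from frequencies and a fitted pmf."""
    n = sum(f)
    m = len(f)
    # dissimilarity between observed relative frequencies and fitted probabilities
    diss = 0.5 * sum(abs(fr / n - pr) for fr, pr in zip(f, theoric))
    return {
        "diss": diss,
        "muloglik": loglike / n,                    # average log-likelihood
        "loglikuni": -n * math.log(m),              # assumption: uniform null model
        "AIC": -2 * loglike + 2 * n_params,
        "BIC": -2 * loglike + n_params * math.log(n),
    }

stats = fit_stats_sketch(
    f=[40, 25, 15, 10, 10],
    theoric=[0.38, 0.27, 0.17, 0.10, 0.08],
    loglike=-145.0,
    n_params=1,
)
```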

Methods

as_dataframe()

DataFrame of estimated parameters

as_txt()

Print the summary.

save(fname)

Save a CUBresult object to file named fname + .cub.fit

summary()

Call as_txt()

as_dataframe()#

DataFrame of estimated parameters

as_txt()#

Print the summary. Auxiliary function of summary().

save(fname)#

Save a CUBresult object to file named fname + .cub.fit

summary()#

Call as_txt()

class cubmods.smry.CUBsample(rv, m, pars, model, df, formula, diss, theoric, par_names, p_types, sh=None, seed=None)#

Bases: object

An instance of this Class is returned by .draw() functions. See the corresponding model’s function for details.

Variables:
  • rv – array of drawn ordinal responses

  • m – number of ordinal categories

  • n – number of drawn responses

  • p – number of model’s parameters

  • pars – parameters’ values array

  • model – the model family

  • df – original DataFrame (if provided) with a column of the drawn sample

  • formula – the formula used to draw the sample

  • diss – dissimilarity index between drawn and theoretical distribution

  • theoric – theoretical distribution

  • par_names – names of the parameters

  • p_types – parameters’ component

  • sh – shelter choice(s), if any

  • seed – the seed used to ensure reproducibility
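To show how rv, seed and diss relate, here is a small standard-library sketch that draws ordinal responses from a given distribution with a fixed seed and then measures the dissimilarity between the drawn and the theoretical distribution. It does not use the package's .draw() functions; the name `draw_sketch` and its arguments are hypothetical.

```python
import random

def draw_sketch(theoric, n, seed=None):
    """Draw n ordinal responses in 1..m from the pmf `theoric`; return them with diss."""
    m = len(theoric)
    rng = random.Random(seed)  # a dedicated generator keeps the draw reproducible
    rv = rng.choices(range(1, m + 1), weights=theoric, k=n)
    # absolute frequencies of each category in the drawn sample
    freq = [rv.count(r) for r in range(1, m + 1)]
    diss = 0.5 * sum(abs(fr / n - pr) for fr, pr in zip(freq, theoric))
    return rv, diss

rv1, d1 = draw_sketch([0.5, 0.3, 0.15, 0.05], n=200, seed=42)
rv2, d2 = draw_sketch([0.5, 0.3, 0.15, 0.05], n=200, seed=42)
```

With the same seed the two draws are identical, which is the reproducibility that storing the seed is meant to provide.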

Methods

as_dataframe()

The parameters' values specified.

plot([figsize, kind, ax, saveas])

Basic plot function.

save(fname)

Save a CUBsample object to file named fname + cub.sample

summary()

Print the summary of the drawn sample.

as_dataframe()#

The parameters’ values specified.

Returns:

a DataFrame with parameters’ names and values

Return type:

DataFrame

plot(figsize=(7, 5), kind='bar', ax=None, saveas=None)#

Basic plot function.

Parameters:
  • figsize (tuple of float) – tuple of (length, height) for the figure (used only if ax is None, that is, when a new figure is created)

  • kind (str) – choose a barplot ('bar', default) or a scatterplot ('scatter')

  • ax (matplotlib ax, optional) – matplotlib axis, if None a new figure will be created, defaults to None

  • saveas (str) – if provided, name of the file to save the plot

Returns:

ax or a tuple (fig, ax)

save(fname)#

Save a CUBsample object to file named fname + cub.sample

summary()#

Print the summary of the drawn sample.

Module contents#