Manual#

The package cubmods can be used to build models within the CUB class, given an observed sample (and, possibly, a matrix of covariates), in order to estimate the parameters. In addition, for each family, random samples can be drawn from a specified model.

Currently, six families have been defined and implemented:

  • CUB (Combination of Uniform and Binomial)

  • CUBSH (CUB + a SHelter choice)

  • CUSH (Combination of Uniform and a SHelter choice)

  • CUSH2 (Combination of Uniform and 2 SHelter choices)

  • CUBE (Combination of Uniform and BEta-binomial)

  • IHG (Inverse HyperGeometric)

For each family, a model can be defined with or without covariates for one or more parameters.

Details about each family and examples are provided in the following sections.

Although each family has its own Maximum Likelihood Estimation function mle() that can be called directly, for example cub.mle(), the function gem.estimate() provides a simplified and generalised procedure for MLE.

Similarly, although each family has its own Random Sample Drawing function draw() that can be called directly, for example cub.draw(), the function gem.draw() provides a simplified and generalised procedure to draw a random sample.

In this manual gem functions will be used for the examples.

Notice that the Dissimilarity index is computed for models with covariates as well: it should be interpreted as the fraction of the sample that should be changed to achieve a perfect fit to the estimated average probability distribution (see Introduction Notes).
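For reference, the normalized dissimilarity index is usually defined as half the sum of the absolute differences between the observed relative frequencies and the estimated (average) probabilities. A minimal NumPy sketch of this computation, with hypothetical frequency and probability vectors and assuming this standard definition, is:

Script#
import numpy as np

# hypothetical observed relative frequencies and estimated probabilities
f = np.array([0.10, 0.15, 0.30, 0.25, 0.20])
p = np.array([0.12, 0.18, 0.28, 0.22, 0.20])

# fraction of the sample that would have to be moved to match p exactly
diss = 0.5 * np.abs(f - p).sum()
print(f"Dissimilarity = {diss:.4f}")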

The last section shows the basic usage of the multicub tool.

GeM usage#

GeM (Generalized Mixture) is the main module of the cubmods package: it provides simplified and generalized functions both to estimate a model from an observed sample and to draw a random sample from a specified model.

The function gem.estimate() is the main function for the estimation and validation of a model from an observed sample, calling the corresponding .mle() function of the specified family module, with or without covariates.

The function gem.draw() is the main function for drawing a random sample from a specified model, calling the .draw() function of the corresponding family module, with or without covariates.

Reference guide

The formula syntax#

Both functions need a formula, i.e. a string specifying the name of the ordinal variable (before the tilde symbol ~) and the covariates of the components (after the tilde symbol ~). Covariates for different components are separated by the pipe symbol |. The zero symbol 0 indicates no covariates for a certain component. The one symbol 1 indicates that only the parameter of the constant term is to be estimated. If more covariates explain a single component, the symbol + concatenates their names. Qualitative variable names must be placed within parentheses () preceded by a C, for example C(varname).

Warning

No column in the DataFrame should be named constant, 1 or 0. In the column names, only letters, numbers, and underscores _ are allowed. Spaces SHOULD NOT BE used in the column names: they should be discarded (i.e. replaced with "") or replaced with _. Similarly, any other symbol or special character (for instance -, !, @, etc.) SHOULD BE avoided and discarded or replaced with an underscore _.

For example, let’s suppose we have a DataFrame where response is the ordinal variable, and age and sex are, respectively, a quantitative and a qualitative variable explaining the feeling component only, in a cub family model. The formula will be formula = "response ~ 0 | age + C(sex)".

Note

Python will automatically order qualitative variables in alphanumeric order. So, for instance, a variable sex with two categories "M" and "F" will be ordered as ["F", "M"]; thus the dummy variable will be equal to 0 where sex=="F" and equal to 1 where sex=="M". Consequently, the estimated parameters will be the constant for sex=="F" and C.sex_M for sex=="M". If you want a different order for the categorical variables, you must specify it in the DataFrame, for instance with the pandas class Categorical. In the example:

Script#
1df["sex"] = pd.Categorical(
2    df["sex"],
3    categories=["M", "F"],
4    ordered=True
5)

Notice that spaces are allowed between symbols and variable names in the formula but they aren’t needed: a formula "ord ~ X | Y1 + Y2 | Z" is the same as "ord~X|Y1+Y2|Z".

Warning

The number of fields separated by the pipe | in a formula MUST BE equal to the number of parameters specifying the model family. Therefore: two for cub and cush2, three for cube and cub with shelter effect, and one for cush and ihg.
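For instance, the following formula strings (with hypothetical variable names) have the right number of fields for each family:

Script#
# example formula strings (hypothetical variable names);
# only the number of |-separated fields matters
formulas = {
    "cub":              "ord ~ Y | W",      # uncertainty | feeling
    "cub with shelter": "ord ~ Y | W | X",  # uncertainty | feeling | shelter (kwarg sh)
    "cube":             "ord ~ Y | W | Z",  # uncertainty | feeling | overdispersion
    "cush":             "ord ~ X",          # shelter (single category in kwarg sh)
    "cush2":            "ord ~ X1 | X2",    # two shelters (list of two categories in sh)
    "ihg":              "ord ~ V",          # preference
}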

Arguments of estimate and draw#

Within the function estimate, the number of ordinal categories m is internally retrieved if not specified (taking the maximum observed category), but it is advisable to pass it as an argument if some category has zero frequency. Within the function draw, instead, the number of ordinal categories m will default to 7 if not otherwise specified.

A pandas DataFrame must always be passed to the function estimate with the kwarg (keyword argument) df. It should contain, at least, the column of the observed sample and the columns of the covariates (if any). If no df is passed to the function draw for a model without covariates, an empty DataFrame will be created instead.

The number n of ordinal responses to be drawn will default to 500 if not otherwise specified in the function draw for models without covariates. For models with covariates, instead, n has no effect because the number of drawn ordinal responses will be equal to the number of rows of the passed DataFrame.

A seed can be specified for the function draw to ensure reproducibility. Notice that, for models with covariates, seed cannot be 0 (in that case, it will automatically be set to 1).

If no model is declared, the function takes "cub" as default. Currently implemented models are: "cub" (default), "cush", "cube", and "ihg". CUB models with shelter effect are automatically implemented using model="cub" and specifying a shelter choice with the kwarg sh. CUSH2 models are automatically implemented using model="cush" and passing a list of two categories to the kwarg sh instead of an integer, for instance sh=[2, 7].

The parameters’ values (with the kwargs of the corresponding family) must always be passed to the draw function: for example, pi and xi for CUB models without covariates, beta and gamma for CUB models with covariates for both uncertainty and feeling, etc. See the .draw() function reference of the corresponding family module for details.
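For example, two sketches of draw calls, one for a CUB model without covariates and one for a CUSH2 model (the parameter values here are arbitrary and purely illustrative), could look like the following:

Script#
from cubmods.gem import draw

# CUB without covariates: model="cub" is the default
cub_sample = draw(
    formula="ord ~ 0 | 0",
    m=9, pi=.7, xi=.3,
    n=500, seed=1)

# CUSH2: model="cush" plus a list of two shelter categories
cush2_sample = draw(
    formula="ord ~ 0 | 0",
    model="cush", sh=[2, 7],
    m=9, delta1=.15, delta2=.1,
    n=500, seed=1)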

If model="cub" (or nothing), then a CUB mixture model is fitted to the data to explain uncertainty, feeling (ordinal~Y|W) and possible shelter effect by further passing the extra argument sh for the corresponding category. Subjects’ covariates can be included by specifying covariates matrices in the formula as ordinal~Y|W|X, to explain uncertainty (Y), feeling (W) or shelter (X). Notice that covariates for the shelter effect can be included only if specified for both feeling and uncertainty too (GeCUB models) because, as in the R package CUB, only the models without covariates and with covariates for all components have been implemented. Nevertheless, the symbol 1 could be used to specify a different combination of components with covariates. For example, if we want to specify a CUB model with the covariate cov for uncertainty only, we could pass the formula ordinal ~ cov | 1 | 1: in this case, for feeling and shelter effect, the constant terms only (\(\gamma_0\) and \(\omega_0\)) will be estimated and the values of the estimated \(\xi\) and \(\delta\) could be computed as \(\hat\xi=\mathrm{expit}(\hat\gamma_0)\) and \(\hat\delta=\mathrm{expit}(\hat\omega_0)\), where \(\mathrm{expit}(x) = 1 / (1 + \exp(-x))\). See this example for the GeCUB model.

If model="cube", then a CUBE mixture model (Combination of Uniform and Beta-Binomial) is fitted to the data to explain uncertainty, feeling and overdispersion. Subjects’ covariates can also be included to explain the feeling component only or all three components, by specifying covariates matrices in the formula as ordinal~Y|W|Z to explain uncertainty (Y), feeling (W) or overdispersion (Z). For different combinations of components with covariates, the symbol 1 can be used. Notice that \(\hat\phi=e^{\hat\alpha_0}\).

If model="ihg", then an IHG model is fitted to the data. IHG models (Inverse HyperGeometric) are a particular case of CUBE models, for \(\phi = 1 - \xi\) (Iannario, 2012). The parameter \(\theta\) gives the probability of observing the first category and is therefore a direct measure of preference, attraction or pleasantness toward the investigated item. This is the reason why \(\theta\) is customarily referred to as the preference parameter of the IHG model. Covariates for the preference parameter \(\theta\) have to be specified in matrix form in the formula as ordinal~V.

If model="cush", then a CUSH model is fitted to the data (Combination of Uniform and SHelter effect). If a single category corresponding to the inflation is passed via the argument sh, a CUSH model is called and covariates for the shelter parameter \(\delta\) are specified in the formula as ordinal~X. If two categories corresponding to the inflation are passed via the argument sh (as a list or array), a CUSH2 model is called and covariates for the shelter parameters \((\delta_1,\delta_2)\) are specified in the formula as ordinal~X1|X2. Notice that, to specify covariates for a single shelter choice in a CUSH2 model, the formula should be ordinal~X1|0 and not ordinal~0|X2.

Extra arguments include the maximum number of iterations maxiter for the optimization algorithm, the required error tolerance tol, and a dictionary of parameters of a known model ass_pars (assumed parameters) to be compared with the estimates: these could be the parameters used to draw the sample, theoretical parameters, or howsoever specified parameters we want to (graphically) compare with the estimates.

Note

The ass_pars argument is effective for models with covariates too. The .plot() method will show the average probability distribution of the model specified with the assumed parameters.

Methods of estimate and draw#

For the objects returned by both functions, the methods .summary() and .plot() are always available to print a summary and to plot the results, respectively. For .plot() arguments and options, see here the CUBsample Class (for objects returned by draw) and the extended CUBres Classes of the corresponding family (for objects returned by estimate), defined in each family module.

The method .summary() of objects returned by the estimate function will print a summary of the inferential method applied to the observed sample. Along with estimated parameters, standard errors, Wald tests, and p-values, it will show some model metrics:

  • Correlation: available for CUB family models without covariates only; it is the correlation between \(\hat{\pi}\) and \(\hat{\xi}\)

  • Dissimilarity: the dissimilarity index, which can be interpreted as the fraction of the sample to be changed to achieve a perfect fit given the estimated parameters; available for all models, with and without covariates; for models with covariates, the average estimated probability distribution is used

  • Loglik(sat): the log-likelihood of the saturated model; see here and the reference paper Piccolo and Simone, 2019; available for models without covariates only

  • Loglik(MOD): the log-likelihood of the estimated model

  • Loglik(uni): the log-likelihood of the null model; see here and the reference paper Piccolo and Simone, 2019

  • Mean-loglik: mean log-likelihood, i.e. the log-likelihood of the estimated model divided by the number \(n\) of observed ordinal responses

  • Deviance: the Likelihood Ratio Test (LRT) statistic of the estimated model against the saturated one, i.e. twice the difference between Loglik(sat) and Loglik(MOD); available for models without covariates only

  • AIC: Akaike Information Criterion

  • BIC: Bayesian Information Criterion.

Calling .as_dataframe() will return a DataFrame of parameters’ names and values for objects of the Class CUBsample returned by draw. For objects of the extended Base Class CUBres returned by estimate, instead, it will return a DataFrame with each parameter’s component, name, estimated value, standard error, Wald test statistic and p-value.

Calling the method .save(fname), the object can be saved to a file called fname.cub.sample (for draw) or fname.cub.fit (for estimate). Saved objects can then be loaded using the function general.load_object(fname). See this example.

Attributes of estimate and draw#

For objects returned by both estimate and draw, the attributes .formula and .df are always available. The function draw will return the original DataFrame (if provided) with an extra column of the drawn ordinal response, named as specified in the formula.

Many other attributes can be called from objects of the Base Class CUBres returned by estimate, such as the computed loglikelihood, the AIC and BIC, etc. For details, see here the Base Class CUBres reference guide.

CUB family#

Basic family of the class CUB. See the references for details: Piccolo, 2003; D'Elia and Piccolo, 2005; Piccolo, 2006; Iannario and Piccolo, 2010; Iannario and Piccolo, 2009; Iannario et al., 2014; Iannario et al., 2022; Piccolo and Simone, 2019.

Without covariates#

Reference guide

A model of the CUB family for responses with \(m\) ordinal categories, without covariates is specified as

\[\Pr(R=r|\boldsymbol{\theta}) = \pi \dbinom{m-1}{r-1}(1-\xi)^{r-1}\xi^{m-r}+\dfrac{1-\pi}{m}, \; r = 1,2,\ldots,m\]

where \(\pi\) and \(\xi\) are the parameters for respectively the uncertainty and the feeling components.

Note that \((1-\pi)\) is the weight of the Uncertainty component, while \((1-\xi)\) measures the Feeling for a common positive wording.
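As a quick numerical check of the formula above, a minimal NumPy/SciPy sketch (not using cubmods itself) computing the CUB probability distribution for given \(m\), \(\pi\) and \(\xi\) could be:

Script#
import numpy as np
from scipy.special import comb

def cub_pmf(m, pi, xi):
    # shifted Binomial component b_r(xi), r = 1, ..., m
    r = np.arange(1, m + 1)
    b_r = comb(m - 1, r - 1) * (1 - xi)**(r - 1) * xi**(m - r)
    # mixture with the discrete Uniform component
    return pi * b_r + (1 - pi) / m

p = cub_pmf(m=10, pi=.7, xi=.2)
print(p.round(4), p.sum())  # the probabilities sum to 1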

In the following example, a sample will be drawn from a CUB model of \(n=500\) observations of an ordinal variable with \(m=10\) ordinal categories and parameters \((\pi=.7, \xi=.2)\). A seed=1 will be set to ensure reproducibility.

Notice that a Dissimilarity index is computed: this should be interpreted as the fraction of the drawn sample to be changed to achieve a perfect fit to the theoretical specified model the sample has been drawn from.

Script#
 1# import libraries
 2import matplotlib.pyplot as plt
 3from cubmods.gem import draw
 4
 5# draw a sample
 6drawn = draw(
 7    formula="ord ~ 0 | 0",
 8    m=10, pi=.7, xi=.2,
 9    n=500, seed=1)
10# print the summary of the drawn sample
11print(drawn.summary())
12# show the plot of the drawn sample
13drawn.plot()
14plt.show()
=======================================================================
=====>>> CUB model <<<===== Drawn random sample
=======================================================================
m=10  Sample size=500  seed=1
formula: ord~0|0
-----------------------------------------------------------------------
  component parameter  value
Uncertainty        pi    0.7
    Feeling        xi    0.2
=======================================================================
Sample metrics
Mean     = 7.368000
Variance = 5.687952
Std.Dev. = 2.384943
-----------------------------------------------------------------------
Dissimilarity = 0.0650938
=======================================================================
CUB00 drawn sample

Notice that, since the default value of the kwarg model is "cub", we do not need to specify it.

Calling drawn.as_dataframe() will return a DataFrame with the specified parameters of the theoretical model

     component parameter  value
0  Uncertainty        pi    0.7
1      Feeling        xi    0.2

Using the previously drawn sample, in the next example the parameters \((\hat\pi, \hat\xi)\) will be estimated.

Note that in the function gem.estimate:

  • df needs to be a pandas DataFrame; the attribute drawn.df will return a DataFrame with ord as the column name of the drawn ordinal response (as previously specified in the formula)

  • formula needs the ordinal variable name (ord in this case) and the covariates for each component (none in this case, so "0|0")

  • if m is not provided, the maximum observed ordinal value will be assumed and a warning will be raised

  • with ass_pars dictionary, the parameters of a known model (if any) can be specified; in this case, we’ll specify the known parameters used to draw the sample

Script#
 1# inferential method on drawn sample
 2fit = estimate(
 3    df=drawn.df,
 4    formula="ord~0|0",
 5    ass_pars={
 6        "pi": drawn.pars[0],
 7        "xi": drawn.pars[1]
 8    }
 9)
10# print the summary of MLE
11print(fit.summary())
12# show the plot of MLE
13fit.plot()
14plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUB model <<<===== ML-estimates
=======================================================================
m=10  Size=500  Iterations=13  Maxiter=500  Tol=1E-04
-----------------------------------------------------------------------
Uncertainty
    Estimates  StdErr    Wald  p-value
pi      0.675  0.0340  19.872   0.0000
-----------------------------------------------------------------------
Feeling
    Estimates  StdErr    Wald  p-value
xi      0.188  0.0090  20.808   0.0000
-----------------------------------------------------------------------
Correlation   = 0.2105
=======================================================================
Dissimilarity = 0.0599
Loglik(sat)   = -994.063
Loglik(MOD)   = -1000.111
Loglik(uni)   = -1151.293
Mean-loglik   = -2.000
Deviance      = 12.096
-----------------------------------------------------------------------
AIC = 2004.22
BIC = 2012.65
=======================================================================
Elapsed time=0.00202 seconds =====>>> Thu Sep 26 18:00:53 2024
=======================================================================
CUB 00 MLE

See here the reference guide of the general module and the reference paper Piccolo and Simone, 2019 for details about log-likelihoods, deviance and information criteria.

Calling fit.as_dataframe() will return a DataFrame with parameters’ estimated values and standard errors

     component parameter  estimate    stderr       wald        pvalue
0  Uncertainty        pi   0.67476  0.033954  19.872485  7.042905e-88
1      Feeling        xi   0.18817  0.009043  20.807551  3.697579e-96

As an example, we can now save the fit object to a file. By default, it will be saved as a pickle file.

Script#
1fit.save(fname="cub_mle_results")

The previous code will save a file called cub_mle_results.cub.fit.

We can then load the saved file with the code

Script#
1from cubmods.general import load_object
2
3myfit = load_object("cub_mle_results.cub.fit")

and we can apply to myfit the same methods and attributes of the original fit object.

With covariates#

Reference guide (0|W)

Reference guide (Y|0)

Reference guide (Y|W)

\[\Pr(R_i=r|\pmb\theta, \pmb y_i, \pmb w_i) = \pi_i \dbinom{m-1}{r-1}(1-\xi_i)^{r-1}\xi_i^{m-r}+\dfrac{1-\pi_i}{m} ,\; r = 1,2,\ldots,m\]
\[\begin{split}\left\{ \begin{array}{l} \pi_i = \dfrac{1}{1+\exp\{-\pmb y_i \pmb \beta\}} \\ \xi_i = \dfrac{1}{1+\exp\{-\pmb w_i \pmb \gamma\}} \end{array} \right. \quad \equiv \quad \left\{ \begin{array}{l} \mathrm{logit}(1-\pi_i) = - \pmb y_i \pmb \beta \vphantom{\dfrac{1}{1+\exp\{-\pmb y_i \pmb \beta\}}} \\ \mathrm{logit}(1-\xi_i) = - \pmb w_i \pmb \gamma \vphantom{\dfrac{1}{1+\exp\{-\pmb w_i \pmb \gamma\}}} \end{array} \right.\end{split}\]
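A minimal sketch of the logistic links above, computing the subject-specific \(\pi_i\) and \(\xi_i\) from hypothetical design rows (with the constant term in the first column) and coefficient vectors; note that cubmods.general also provides expit and logit helpers, used later in this manual, but here a plain NumPy version keeps the sketch self-contained:

Script#
import numpy as np

def expit(x):
    return 1 / (1 + np.exp(-x))

# hypothetical design rows: the first column is the constant term
Y = np.array([[1, 0.3],
              [1, 1.2]])            # uncertainty covariates y_i
W = np.array([[1, 4, 0.7],
              [1, 8, 0.1]])         # feeling covariates w_i
beta = np.array([1.0, -0.5])        # uncertainty coefficients
gamma = np.array([2.3, 0.2, -5.0])  # feeling coefficients

pi_i = expit(Y @ beta)   # subject-specific uncertainty parameters
xi_i = expit(W @ gamma)  # subject-specific feeling parameters
print(pi_i.round(4), xi_i.round(4))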

All three combinations of covariates have been implemented for the CUB family in both Python and R: for uncertainty only, for feeling only, and for both.

Here we’ll show an example with covariates for feeling only.

First of all, we’ll draw a random sample with two covariates for the feeling component: W1 and W2. Note that, having two covariates, we’ll need three \(\gamma\) parameters, to account for the constant term too.

Script#
 1# import libraries
 2import numpy as np
 3import pandas as pd
 4import matplotlib.pyplot as plt
 5from cubmods.gem import draw, estimate
 6
 7# Draw a random sample
 8n = 1000
 9np.random.seed(1)
10W1 = np.random.randint(1, 10, n)
11np.random.seed(42)
12W2 = np.random.random(n)
13df = pd.DataFrame({
14    "W1": W1, "W2": W2
15})
16drawn = draw(
17    formula="res ~ 0 | W1 + W2",
18    df=df,
19    m=10, n=n,
20    pi=0.8,
21    gamma=[2.3, 0.2, -5],
22)
23# print the summary
24print(drawn.summary())
=======================================================================
=====>>> CUB(0W) model <<<===== Drawn random sample
=======================================================================
m=10  Sample size=1000  seed=None
formula: res~0|W1+W2
-----------------------------------------------------------------------
  component parameter  value
Uncertainty        pi    0.8
    Feeling  constant    2.3
    Feeling        W1    0.2
    Feeling        W2   -5.0
=======================================================================
Sample metrics
Mean     = 4.566000
Variance = 8.089734
Std.Dev. = 2.844246
-----------------------------------------------------------------------
Dissimilarity = 0.0307673
=======================================================================
Script#
1# plot the drawn sample
2drawn.plot()
3plt.show()
CUB0W drawn sample
Script#
1# print the parameters' values
2print(drawn.as_dataframe())
     component parameter  value
0  Uncertainty        pi    0.8
1      Feeling  constant    2.3
2      Feeling        W1    0.2
3      Feeling        W2   -5.0
Script#
1# print the updated DataFrame
2print(drawn.df)
     W1        W2  res
0     6  0.374540    2
1     9  0.950714    7
2     6  0.731994    8
3     1  0.598658    8
4     1  0.156019    4
..   ..       ...  ...
995   3  0.091582    2
996   9  0.917314    9
997   4  0.136819    1
998   7  0.950237    3
999   8  0.446006    2

[1000 rows x 3 columns]

Finally, we’ll call estimate to estimate the parameters given the observed (actually, drawn) sample. We’ll pass the parameters used to draw the sample with ass_pars (as a dictionary) to graphically compare the assumed and the estimated average probability distributions.

Script#
 1# MLE estimation
 2fit = estimate(
 3    formula="res ~ 0 | W1+W2",
 4    df=drawn.df,
 5    ass_pars={
 6        "pi": drawn.pars[0],
 7        "gamma": drawn.pars[1:]
 8    }
 9)
10# Print MLE summary
11print(fit.summary())
12# plot the results
13fit.plot()
14plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUB(0W) model <<<===== ML-estimates
=======================================================================
m=10  Size=1000  Iterations=18  Maxiter=500  Tol=1E-04
-----------------------------------------------------------------------
Uncertainty
          Estimates  StdErr     Wald  p-value
pi            0.800  0.0198   40.499   0.0000
-----------------------------------------------------------------------
Feeling
          Estimates  StdErr     Wald  p-value
constant      2.353  0.1001   23.514   0.0000
W1            0.194  0.0138   14.034   0.0000
W2           -5.076  0.1454  -34.909   0.0000
=======================================================================
Dissimilarity = 0.0292
Loglik(MOD)   = -1807.052
Loglik(uni)   = -2302.585
Mean-loglik   = -1.807
-----------------------------------------------------------------------
AIC = 3622.10
BIC = 3641.74
=======================================================================
Elapsed time=0.09656 seconds =====>>> Thu Aug 15 18:31:21 2024
=======================================================================
CUB0W MLE

CUBSH family#

Basic family of the class CUB with shelter effect.

See the references for details: Corduas et al., 2009; Iannario, 2012; Piccolo and Simone, 2019.

Without covariates#

Reference guide

A model of the CUB family with shelter effect for responses with \(m\) ordinal categories, without covariates is specified as

\[\Pr(R=r|\boldsymbol{\theta}) = \delta D_r^{(c)} + (1-\delta)\left(\pi b_r(\xi) + \frac{1-\pi}{m} \right) ,\; r=1,2,\ldots,m\]

where \(\pi\) and \(\xi\) are the parameters for respectively the uncertainty and the feeling components, and \(\delta\) is the weight of the shelter effect.

Other parametrizations have been proposed, such as

\[\Pr(R=r|\boldsymbol{\theta}) = \lambda b_r(\xi) + (1-\lambda) \left[ \eta/m + (1-\eta) D_r^{(c)} \right] ,\; r=1,2,\ldots,m\]

where

\[\begin{split}\left\{ \begin{array}{l} \lambda = \pi(1-\delta) \\ \eta = \dfrac{(1-\pi)(1-\delta)}{1 - \pi(1-\delta)} \end{array} \right.\end{split}\]

See Piccolo and Simone, 2019 (pp 412-413) for the parameters’ interpretation.

Another parametrization, particularly useful for inferential purposes, is

\[\Pr(R=r|\boldsymbol{\theta}) = \pi_1 b_r(\xi) + \pi_2 /m + (1-\pi_1-\pi_2) D_r^{(c)} ,\; r=1,2,\ldots,m\]

where

\[\begin{split}\left\{ \begin{array}{l} \pi_1 = (1-\delta)\pi \\ \pi_2 = (1-\delta)(1-\pi) \end{array} \right.\end{split}\]

See the references for further details.
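A small sketch converting between the parametrizations above, using the values \(\pi=.8\), \(\xi=.4\), \(\delta=.15\) of the next example (\(\xi\) plays no role in the reparametrizations):

Script#
# parameters of the next example; xi is not involved in the reparametrizations
pi, delta = 0.8, 0.15

# (pi1, pi2) parametrization used for inference
pi1 = (1 - delta) * pi          # weight of the shifted Binomial component
pi2 = (1 - delta) * (1 - pi)    # weight of the discrete Uniform component
# 1 - pi1 - pi2 = delta is the weight of the shelter (degenerate) component

# (lambda, eta) parametrization
lam = pi * (1 - delta)
eta = (1 - pi) * (1 - delta) / (1 - pi * (1 - delta))

# pi1 = 0.68 and pi2 = 0.17 match the drawn-sample output below
print(f"pi1={pi1:.2f}  pi2={pi2:.2f}  lambda={lam:.2f}  eta={eta:.4f}")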

In the next example, we’ll draw an ordinal response and then estimate the parameters given the sample.

Script#
 1# import libraries
 2import matplotlib.pyplot as plt
 3from cubmods.gem import draw, estimate
 4
 5# draw a sample
 6drawn = draw(
 7    formula="ord ~ 0 | 0 | 0",
 8    m=7, sh=1,
 9    pi=.8, xi=.4, delta=.15,
10    n=1500, seed=42)
11
12print(drawn.as_dataframe())
     component parameter  value
0      Uniform       pi1   0.68
1     Binomial       pi2   0.17
2      Feeling        xi   0.40
3  Uncertainty       *pi   0.80
4      Shelter    *delta   0.15

Notice that:

  • since "cub" is the default value of the kwarg model, we do not need to specify it

  • we’ll pass to estimate kwargs whose values are taken from the object drawn

The method .plot() (of the fit object) also shows, in the parameter space, the trivariate confidence ellipsoid, which has not been implemented yet in the CUB package in R. The plot includes the marginal bivariate confidence ellipses too. Notice that, as proven in Pierini, 2024 (pp 28-30), the confidence level of the marginal ellipses is greater than the ellipsoid’s confidence level. Indeed, the radius \(r\) of a standardized sphere at confidence level \((1-\alpha_3)\) is equal to \(r = \sqrt{ F^{-1}_{\chi^2_{(3)}}(1-\alpha_3) }\), thus the confidence level of the bivariate marginal ellipses (which are sections of trivariate cylinders) is \((1-\alpha_2) = F_{\chi^2_{(2)}}(r^2)\).

Script#
 1# inferential method on drawn sample
 2fit = estimate(
 3    df=drawn.df, sh=drawn.sh,
 4    formula=drawn.formula,
 5    ass_pars={
 6        "pi1": drawn.pars[0],
 7        "pi2": drawn.pars[1],
 8        "xi": drawn.pars[2],
 9    }
10)
11# print the summary of MLE
12print(fit.summary())
13# show the plot of MLE
14fit.plot()
15plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUBSH model <<<===== ML-estimates
=======================================================================
m=7  Shelter=1  Size=1500  Iterations=59  Maxiter=500  Tol=1E-04
-----------------------------------------------------------------------
Alternative parametrization
       Estimates  StdErr    Wald  p-value
pi1        0.661  0.0307  21.508   0.0000
pi2        0.174  0.0344   5.041   0.0000
xi         0.388  0.0077  50.592   0.0000
-----------------------------------------------------------------------
Uncertainty
       Estimates  StdErr    Wald  p-value
pi         0.792  0.0400  19.813   0.0000
-----------------------------------------------------------------------
Feeling
       Estimates  StdErr    Wald  p-value
xi         0.388  0.0077  50.592   0.0000
-----------------------------------------------------------------------
Shelter effect
       Estimates  StdErr    Wald  p-value
delta      0.166  0.0116  14.327   0.0000
=======================================================================
Dissimilarity = 0.0049
Loglik(sat)   = -2734.302
Loglik(MOD)   = -2734.433
Loglik(uni)   = -2918.865
Mean-loglik   = -1.823
Deviance      = 0.263
-----------------------------------------------------------------------
AIC = 5474.87
BIC = 5490.81
=======================================================================
CUBSH 00 MLE

With covariates#

Reference guide

\[\Pr(R_i=r|\pmb\theta, \pmb y_i, \pmb w_i, \pmb x_i) = \delta_i D_r^{(c)} + (1-\delta_i)\left(\pi_i b_r(\xi_i) + \frac{1-\pi_i}{m} \right) ,\; r=1,2,\ldots,m\]
\[\begin{split}\left\{ \begin{array}{l} \pi_i = \dfrac{1}{1+\exp\{-\pmb y_i \pmb \beta\}} \\ \xi_i = \dfrac{1}{1+\exp\{-\pmb w_i \pmb \gamma\}} \\ \delta_i = \dfrac{1}{1+\exp\{-\pmb x_i \pmb \omega\}} \end{array} \right. \quad \equiv \quad \left\{ \begin{array}{l} \mathrm{logit}(1-\pi_i) = -\pmb y_i \pmb \beta \vphantom{\dfrac{1}{1+\exp\{-\pmb y_i \pmb \beta\}}} \\ \mathrm{logit}(1-\xi_i) = -\pmb w_i \pmb \gamma \vphantom{\dfrac{1}{1+\exp\{-\pmb w_i \pmb \gamma\}}} \\ \mathrm{logit}(\delta_i) = \pmb x_i \pmb \omega \vphantom{\dfrac{1}{1+\exp\{-\pmb x_i \pmb \omega\}}} \end{array} \right.\end{split}\]

Currently, only the model with covariates for all components (GeCUB) has been defined and implemented, as in the R package CUB.

Nevertheless, thanks to the symbol 1 provided by the formula syntax, we can specify different combinations of covariates.

For example, we’ll specify a CUB model with shelter effect, with covariates for uncertainty only. We’ll use the function logit to have better ‘control’ of the parameter values, because \(\gamma_0 = \mathrm{logit}(\xi)\) and similarly for \(\pi\) and \(\delta\).

Script#
 1# import libraries
 2import numpy as np
 3import pandas as pd
 4import matplotlib.pyplot as plt
 5from cubmods.general import expit, logit
 6from cubmods.gem import draw, estimate
 7
 8# Draw a random sample
 9n = 1000
10np.random.seed(1)
11W1 = np.random.randint(1, 10, n)
12df = pd.DataFrame({
13    "W1": W1,
14})
15drawn = draw(
16    formula="fee ~ W1 | 1 | 1",
17    df=df,
18    m=9, sh=2,
19    beta=[logit(.8), -.2],
20    gamma=[logit(.3)],
21    omega=[logit(.12)],
22)
23
24# MLE estimation
25fit = estimate(
26    formula="fee ~ W1 | 1 | 1",
27    df=drawn.df, sh=2,
28    ass_pars={
29        "beta":[logit(.8), -.2],
30        "gamma":[logit(.3)],
31        "omega":[logit(.12)],
32    }
33)
34# Print MLE summary
35print(fit.summary())
36# plot the results
37fit.plot()
38plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUBSH(YWX) model <<<===== ML-estimates
=======================================================================
m=9  Shelter=2  Size=1000  Iterations=25  Maxiter=500  Tol=1E-04
-----------------------------------------------------------------------
Uncertainty
          Estimates  StdErr     Wald  p-value
constant      0.992  0.3314    2.994   0.0028
W1           -0.127  0.0569   -2.228   0.0259
-----------------------------------------------------------------------
Feeling
          Estimates  StdErr     Wald  p-value
constant     -0.902  0.0381  -23.662   0.0000
-----------------------------------------------------------------------
Shelter effect
          Estimates  StdErr     Wald  p-value
constant     -2.074  0.1260  -16.462   0.0000
=======================================================================
Dissimilarity = 0.0139
Loglik(MOD)   = -2069.978
Loglik(uni)   = -2197.225
Mean-loglik   = -2.070
-----------------------------------------------------------------------
AIC = 4147.96
BIC = 4167.59
=======================================================================
Elapsed time=1.43850 seconds =====>>> Thu Aug 15 19:39:49 2024
=======================================================================
CUBSH YWX MLE

To get the estimated values of \(\hat\xi\) and \(\hat\delta\) we can use the function expit, because \(\hat\xi = \mathrm{expit}(\hat\gamma_0)\) and similarly for \(\hat\delta\). Then, we can approximate the standard errors of both \(\hat\xi\) and \(\hat\delta\), for instance with the forward difference \(\widehat{se}(\hat\xi) \approx \mathrm{expit}[\hat\gamma_0+\widehat{se}(\hat\gamma_0)] - \hat\xi\).

Script#
 1est_xi = expit(fit.estimates[2])
 2est_de = expit(fit.estimates[3])
 3est_xi_se = expit(fit.estimates[2]+fit.stderrs[2]) - est_xi
 4est_de_se = expit(fit.estimates[3]+fit.stderrs[3]) - est_de
 5print(
 6    "     estimates  stderr\n"
 7    f"xi      {est_xi:.4f}  {est_xi_se:.4f}"
 8    "\n"
 9    f"delta   {est_de:.4f}  {est_de_se:.4f}"
10)
     estimates  stderr
xi      0.2886  0.0079
delta   0.1116  0.0131

which, in fact, match the values used to draw the sample.
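Alternatively, a first-order delta-method approximation can be used: since \(\mathrm{expit}'(x) = \mathrm{expit}(x)[1-\mathrm{expit}(x)]\), we have \(\widehat{se}(\hat\xi) \approx \hat\xi(1-\hat\xi)\,\widehat{se}(\hat\gamma_0)\), and similarly for \(\hat\delta\). A minimal sketch reusing the fit object above:

Script#
# first-order delta method: se(expit(g)) ~ expit(g) * (1 - expit(g)) * se(g)
est_xi = expit(fit.estimates[2])
est_de = expit(fit.estimates[3])
est_xi_se = est_xi * (1 - est_xi) * fit.stderrs[2]
est_de_se = est_de * (1 - est_de) * fit.stderrs[3]
print(
    "     estimates  stderr\n"
    f"xi      {est_xi:.4f}  {est_xi_se:.4f}\n"
    f"delta   {est_de:.4f}  {est_de_se:.4f}"
)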

CUSH family#

Basic family of the class CUSH with a single shelter effect.

See the references for details: Capecchi and Piccolo, 2017; Piccolo and Simone, 2019.

Without covariates#

Reference guide

\[\Pr(R=r|\pmb\theta) = \delta D_r^{(c)} + (1-\delta)/m ,\; r=1,2,\ldots,m\]

In the example, we’ll draw a sample from a CUSH model without covariates and then estimate the parameter \(\delta\) given the observed sample.

Notice that, since the model is not the default "cub", we need to specify it.

Script#
 1# import libraries
 2import matplotlib.pyplot as plt
 3from cubmods.gem import draw, estimate
 4
 5# draw a sample
 6drawn = draw(
 7    formula="ord ~ 0",
 8    model="cush",
 9    sh=7,
10    m=7, delta=.15,
11    n=1500, seed=76)
12
13# inferential method on drawn sample
14fit = estimate(
15    df=drawn.df,
16    model="cush",
17    formula="ord~0",
18    sh=7,
19    ass_pars={
20        "delta": drawn.pars[0],
21    }
22)
23# print the summary of MLE
24print(fit.summary())
25# show the plot of MLE
26fit.plot()
27plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUSH model <<<===== ML-estimates
=======================================================================
m=7  Shelter=7  Size=1500
-----------------------------------------------------------------------
Shelter effect
       Estimates  StdErr   Wald  p-value
delta      0.124  0.0130  9.532   0.0000
=======================================================================
Dissimilarity = 0.0236
Loglik(sat)   = -2856.039
Loglik(MOD)   = -2859.923
Loglik(uni)   = -2918.865
Mean-loglik   = -1.907
Deviance      = 7.768
-----------------------------------------------------------------------
AIC = 5721.85
BIC = 5727.16
=======================================================================
Elapsed time=0.00113 seconds =====>>> Fri Aug 16 10:44:07 2024
=======================================================================
CUSH 0 MLE

With covariates#

Reference guide

\[\Pr(R_i=r|\pmb\theta,\pmb x_i) = \delta_i D_r^{(c)} + (1-\delta_i)/m ,\; r=1,2,\ldots,m\]
\[\delta_i = \dfrac{1}{1+\exp\{ - \pmb x_i \pmb\omega \}} \quad \equiv \quad \mathrm{logit}(\delta_i) = \pmb x_i \pmb\omega\]

In the example, we’ll draw a sample from a CUSH model with covariates and then estimate the parameter given the observed sample.

Notice that, since the model is not the default "cub", we need to specify it.

Script#
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cubmods.general import logit
from cubmods.gem import draw, estimate

# Draw a random sample
n = 1000
np.random.seed(1)
X = np.random.randint(1, 10, n)
df = pd.DataFrame({
    "X": X,
})
drawn = draw(
    formula="fee ~ X",
    model="cush",
    df=df,
    m=9, sh=5,
    omega=[logit(.05), .2],
)

# MLE estimation
fit = estimate(
    formula="fee ~ X",
    model="cush",
    df=drawn.df, sh=5,
    ass_pars={
        "omega": drawn.pars
    }
)
# Print MLE summary
print(fit.summary())
# plot the results
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUSH(X) model <<<===== ML-estimates
=======================================================================
m=9  Shelter=5  Size=1000
-----------------------------------------------------------------------
Shelter effect
          Estimates  StdErr    Wald  p-value
constant     -3.131  0.4361  -7.180   0.0000
X             0.229  0.0629   3.642   0.0003
=======================================================================
Dissimilarity = 0.0395
Loglik(MOD)   = -2130.030
Loglik(uni)   = -2197.225
Mean-loglik   = -2.130
-----------------------------------------------------------------------
AIC = 4264.06
BIC = 4273.87
=======================================================================
Elapsed time=0.01704 seconds =====>>> Fri Aug 16 10:54:11 2024
=======================================================================
CUSH X MLE

CUSH2 family#

Family of the class CUSH with two shelter effects (CUSH2).

This family was introduced by Pierini, 2024 (pp 16-20) and first implemented in this Python package. See Piccolo and Simone, 2019 as a reference for the CUB class of models.

These models are particularly useful whenever the shelter choices are not polarized, i.e. they’re not at the extremes of the ordinal variable support. In case of polarized responses, finite mixtures based on the Discretized Beta distribution (see Simone and Tutz, 2018 and Simone, 2022) can be used, which have not been implemented in this package yet.

Without covariates#

Reference guide

\[\Pr(R=r|\pmb\theta) = \delta_1 D_r^{(c_1)} + \delta_2 D_r^{(c_2)} + (1-\delta_1-\delta_2)/m ,\; r=1,2,\ldots,m\]
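A minimal NumPy sketch of the CUSH2 probability distribution above, using the values of the following example (\(m=7\), shelters at \(c_1=1\) and \(c_2=4\), \(\delta_1=.15\), \(\delta_2=.1\)):

Script#
import numpy as np

def cush2_pmf(m, c1, c2, delta1, delta2):
    # two degenerate (shelter) components plus a discrete Uniform
    r = np.arange(1, m + 1)
    return (delta1 * (r == c1) + delta2 * (r == c2)
            + (1 - delta1 - delta2) / m)

p = cush2_pmf(m=7, c1=1, c2=4, delta1=.15, delta2=.1)
print(p.round(4), p.sum())  # the probabilities sum to 1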

In the example, we’ll draw a sample from a CUSH2 model without covariates and then estimate the parameters given the observed sample.

Notice that, since the model is not the default "cub", we need to specify it. Since a list of two shelter categories is passed with the kwarg sh, a CUSH2 model will be called.

Script#
 1# import libraries
 2import matplotlib.pyplot as plt
 3from cubmods.gem import draw, estimate
 4
 5# draw a sample
 6drawn = draw(
 7    formula="ord ~ 0 | 0",
 8    model="cush",
 9    sh=[1,4],
10    m=7,
11    delta1=.15, delta2=.1,
12    n=1000, seed=42)
13
14# inferential method on drawn sample
15fit = estimate(
16    df=drawn.df,
17    model="cush",
18    formula="ord~0|0",
19    sh=drawn.sh,
20    ass_pars={
21        "delta1": drawn.pars[0],
22        "delta2": drawn.pars[1],
23    }
24)
25# print the summary of MLE
26print(fit.summary())
27# show the plot of MLE
28fit.plot()
29plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUSH2 model <<<===== ML-estimates
=======================================================================
m=7  Shelter=[1 4]  Size=1000
-----------------------------------------------------------------------
Shelter effects
        Estimates  StdErr    Wald  p-value
delta1      0.172  0.0149  11.512   0.0000
delta2      0.113  0.0163   6.930   0.0000
=======================================================================
Dissimilarity = 0.0176
Loglik(sat)   = -1849.206
Loglik(MOD)   = -1850.709
Loglik(uni)   = -1945.910
Mean-loglik   = -1.851
Deviance      = 3.006
-----------------------------------------------------------------------
AIC = 3705.42
BIC = 3715.23
=======================================================================
Elapsed time=0.00247 seconds =====>>> Fri Sep 27 11:32:02 2024
=======================================================================
CUSH2 00 MLE

With covariates#

Reference guide (X1|0)

Reference guide (X1|X2)

\[\Pr(R_i=r|\pmb\theta,\pmb x_{1i}, \pmb x_{2i}) = \delta_{1i} D_r^{(c_1)} + \delta_{2i} D_r^{(c_2)} + (1-\delta_{1i}- \delta_{2i})/m ,\; r=1,2,\ldots,m\]
\[\begin{split}\left\{ \begin{array}{l} \delta_{1i} = \dfrac{1}{1+\exp\{ - \pmb x_{1i} \pmb\omega_1 \}} \\ \delta_{2i} = \dfrac{1}{1+\exp\{ - \pmb x_{2i} \pmb\omega_2 \}} \end{array} \right. \quad \equiv \quad \left\{ \begin{array}{l} \mathrm{logit}(\delta_{1i}) = \pmb x_{1i} \pmb\omega_1 \vphantom{\dfrac{1}{1+\exp\{ - \pmb x_{1i} \pmb\omega_1 \}}} \\ \mathrm{logit}(\delta_{2i}) = \pmb x_{2i} \pmb\omega_2 \vphantom{\dfrac{1}{1+\exp\{ - \pmb x_{2i} \pmb\omega_2 \}}} \end{array} \right.\end{split}\]

Two CUSH2 models with covariates have been defined and implemented: for the first shelter choice only and for both.

In this example, we’ll draw a sample from a CUSH2 model with covariates for the first shelter choice only, and we’ll then estimate the parameters with a CUSH2 model with covariates for both shelter choices, using the symbol 1 in the formula for the second shelter choice to estimate the constant parameter only. This is usually not needed, but we do it here to confirm that \(\mathrm{expit}(\hat\omega_{20})=\hat\delta_2\).

Notice that, since the model is not the default "cub", we need to specify it.

Script#
 1# import libraries
 2import numpy as np
 3import pandas as pd
 4import matplotlib.pyplot as plt
 5from cubmods.general import logit, expit
 6from cubmods.gem import draw, estimate
 7
 8# Draw a random sample
 9n = 1000
10np.random.seed(1)
11X = np.random.randint(1, 10, n)
12df = pd.DataFrame({
13    "X": X,
14})
15drawn = draw(
16    formula="fee ~ X | 0",
17    model="cush",
18    df=df,
19    m=9, sh=[2, 8],
20    omega1=[logit(.05), .2],
21    delta2=.1
22)
23
24# MLE estimation
25fit = estimate(
26    formula="fee ~ X | 1",
27    model="cush",
28    df=drawn.df, sh=[2, 8],
29    ass_pars={
30        "omega1": drawn.pars[:2],
31        "omega2": [logit(drawn.pars[-1])]
32    }
33)
34# Print MLE summary
35print(fit.summary())
36# plot the results
37fit.plot()
38plt.show()
39
40est_de2 = expit(fit.estimates[2])
41est_de2_es = expit(fit.estimates[2]+fit.stderrs[2]) - est_de2
42print(
43    "     estimates  stderr\n"
44    f"delta2  {est_de2:.4f}  {est_de2_es:.4f}"
45)
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUSH2(X1,X2) model <<<===== ML-estimates
=======================================================================
m=9  Shelter=[2 8]  Size=1000
-----------------------------------------------------------------------
Shelter effect 1
          Estimates  StdErr     Wald  p-value
constant     -3.170  0.4216   -7.519   0.0000
X             0.207  0.0613    3.379   0.0007
-----------------------------------------------------------------------
Shelter effect 2
          Estimates  StdErr     Wald  p-value
constant     -2.276  0.1609  -14.149   0.0000
=======================================================================
Dissimilarity = 0.0305
Loglik(MOD)   = -2122.463
Loglik(uni)   = -2197.225
Mean-loglik   = -2.122
-----------------------------------------------------------------------
AIC = 4250.93
BIC = 4265.65
=======================================================================
Elapsed time=0.06553 seconds =====>>> Fri Aug 16 11:29:11 2024
=======================================================================
CUSH2 XX MLE
     estimates  stderr
delta2  0.0931  0.0145

Notice that, as proven by Iannario, 2012 (pp 7-8), CUB models with shelter effect generate a perfect fit at \(R=c\). It can be easily proven that CUSH2 models too generate perfect fits at both \(R=c_1\) and \(R=c_2\). Indeed, we can also graphically see that the estimated probability distribution is closer to the observed sample than the assumed model used to draw the sample, because of the perfect fits generated at \(R=2\) and \(R=8\).

CUBE family#

Family of the class CUBE (Combination of Uniform and BEta-binomial). CUB models are nested within CUBE models: in fact, a CUB model is equal to a CUBE model with the overdispersion parameter \(\phi=0\). Notice that \(0\leq\phi\leq0.2\) is the usual range of the overdispersion parameter.

See the references for details: Iannario, 2014; Piccolo, 2015; Piccolo and Simone, 2019.

Without covariates#

Reference guide

\[\Pr(R=r|\pmb{\theta}) = \pi \, \beta e_r(\xi,\phi)+\dfrac{1-\pi}{m} ,\; r=1,2,\ldots,m\]
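As a rough numerical check of the formula above, the following sketch assumes the usual CUBE parametrization in which \(\beta e_r(\xi,\phi)\) is the shifted Beta-Binomial probability on \(\{1,\ldots,m\}\) with Beta parameters \((1-\xi)/\phi\) and \(\xi/\phi\); this conversion is an assumption made here for illustration, not code taken from cubmods:

Script#
import numpy as np
from scipy.stats import betabinom

def cube_pmf(m, pi, xi, phi):
    # assumed conversion: shifted Beta-Binomial with a=(1-xi)/phi, b=xi/phi
    r = np.arange(1, m + 1)
    be_r = betabinom.pmf(r - 1, m - 1, (1 - xi) / phi, xi / phi)
    # mixture with the discrete Uniform component
    return pi * be_r + (1 - pi) / m

p = cube_pmf(m=9, pi=.7, xi=.3, phi=.15)
print(p.round(4), p.sum())  # the probabilities sum to 1

As \(\phi \to 0\) the Beta-Binomial component converges to the shifted Binomial \(b_r(\xi)\), which is why CUB models are nested within CUBE models.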

In this example, we’ll draw a sample from a CUBE model and then estimate the parameters given the observed sample.

Notice that, since the model is not the default "cub", we need to specify it.

The .plot() method of the object fit will show trivariate and bivariate confidence regions too, as in CUBSH models. See here for the values of confidence levels.

Script#
 1# import libraries
 2import matplotlib.pyplot as plt
 3from cubmods.gem import draw, estimate
 4
 5# draw a sample
 6drawn = draw(
 7    formula="ord ~ 0 | 0 | 0",
 8    model="cube",
 9    m=9, pi=.7, xi=.3, phi=.15,
10    n=500, seed=1)
11
12# inferential method on drawn sample
13fit = estimate(
14    df=drawn.df,
15    formula="ord~0|0|0",
16    model="cube",
17    ass_pars={
18        "pi": drawn.pars[0],
19        "xi": drawn.pars[1],
20        "phi": drawn.pars[2],
21    }
22)
23# print the summary of MLE
24print(fit.summary())
25# show the plot of MLE
26fit.plot()
27plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUBE model <<<===== ML-estimates
=======================================================================
m=9  Size=500  Iterations=62  Maxiter=1000  Tol=1E-06
-----------------------------------------------------------------------
Uncertainty
     Estimates  StdErr    Wald  p-value
pi       0.577  0.0633   9.108   0.0000
-----------------------------------------------------------------------
Feeling
     Estimates  StdErr    Wald  p-value
xi       0.251  0.0217  11.560   0.0000
-----------------------------------------------------------------------
Overdispersion
     Estimates  StdErr    Wald  p-value
phi      0.111  0.0402   2.754   0.0059
=======================================================================
Dissimilarity = 0.0426
Loglik(sat)   = -1037.855
Loglik(MOD)   = -1041.100
Loglik(uni)   = -1098.612
Mean-loglik   = -2.082
Deviance      = 6.491
-----------------------------------------------------------------------
AIC = 2088.20
BIC = 2100.84
=======================================================================
Elapsed time=0.07919 seconds =====>>> Fri Aug 16 12:18:49 2024
=======================================================================
CUBE 000 MLE

With covariates#

Reference guide (0|W|0)

Reference guide (Y|W|Z)

\[\Pr(R_i=r|\pmb{\theta};\pmb y_i, \pmb w_i, \pmb z_i) = \pi_i \, \beta e_r(\xi_i,\phi_i)+\dfrac{1-\pi_i}{m} ,\; r=1,2,\ldots,m\]
\[\begin{split}\left\{ \begin{array}{l} \pi_i = \dfrac{1}{1+\exp\{ -\pmb y_i \pmb\beta\}} \\ \xi_i = \dfrac{1}{1+\exp\{ -\pmb w_i \pmb\gamma\}} \\ \phi_i = \exp\{ \pmb z_i \pmb \alpha \} \end{array} \right. \quad \equiv \quad \left\{ \begin{array}{l} \mathrm{logit}(1-\pi_i) = -\pmb y_i \pmb\beta \vphantom{\dfrac{1}{1+\exp\{ -\pmb y_i \pmb\beta\}}} \\ \mathrm{logit}(1-\xi_i) = -\pmb w_i \pmb\gamma \vphantom{\dfrac{1}{1+\exp\{ -\pmb w_i \pmb\gamma\}}} \\ \log \phi_i = \pmb z_i \pmb \alpha \vphantom{\exp\{ \pmb z_i \pmb \alpha \}} \end{array} \right.\end{split}\]

Currently, as in the R package CUB, two CUBE models with covariates have been defined and implemented: for the feeling only and for all components. Nevertheless, the symbol 1 can always be used in the formula for different combinations of covariates.

In this example, we’ll draw a sample with covariates for feeling only and then estimate the parameters given the observed sample.

Script#
 1# import libraries
 2import numpy as np
 3import pandas as pd
 4import matplotlib.pyplot as plt
 5from cubmods.general import expit, logit
 6from cubmods.gem import draw, estimate
 7
 8# Draw a random sample
 9n = 1000
10np.random.seed(76)
11W = np.random.randint(1, 10, n)
12df = pd.DataFrame({
13    "W": W,
14})
15drawn = draw(
16    formula="fee ~ 0 | W | 0",
17    model="cube",
18    df=df,
19    m=9,
20    pi=.8,
21    gamma=[logit(.3), -.1],
22    phi=.12,
23)
24
25# MLE estimation
26fit = estimate(
27    formula="fee ~ 0 | W | 0",
28    model="cube",
29    df=drawn.df,
30    ass_pars={
31        "pi": drawn.pars[0],
32        "gamma": drawn.pars[1:-1],
33        "phi": drawn.pars[-1]
34    }
35)
36# Print MLE summary
37print(fit.summary())
38# plot the results
39fit.plot()
40plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUBE(0W0) model <<<===== ML-estimates
=======================================================================
m=9  Size=1000
-----------------------------------------------------------------------
Uncertainty
          Estimates  StdErr    Wald  p-value
pi            0.815  0.0343  23.733   0.0000
-----------------------------------------------------------------------
Feeling
          Estimates  StdErr    Wald  p-value
constant     -0.770  0.1012  -7.612   0.0000
W            -0.116  0.0191  -6.052   0.0000
-----------------------------------------------------------------------
Overdisperson
          Estimates  StdErr    Wald  p-value
phi           0.150  0.0260   5.779   0.0000
=======================================================================
Dissimilarity = 0.0183
Loglik(MOD)   = -1886.654
Loglik(uni)   = -2197.225
Mean-loglik   = -1.887
-----------------------------------------------------------------------
AIC = 3781.31
BIC = 3800.94
=======================================================================
Elapsed time=2.30903 seconds =====>>> Fri Aug 16 12:31:10 2024
=======================================================================
CUBE 0W0 MLE

Notice that the same results can be achieved using a CUBE model with covariates for all components and passing the symbol 1 to the uncertainty and overdispersion components.

Script#
 1# MLE estimation
 2fit = estimate(
 3    formula="fee ~ 1 | W | 1",
 4    model="cube",
 5    df=drawn.df,
 6    ass_pars={
 7        "beta": [logit(drawn.pars[0])],
 8        "gamma": drawn.pars[1:3],
 9        "alpha": [np.log(drawn.pars[3])]
10    }
11)
12# Print MLE summary
13print(fit.summary())
14# plot the results
15fit.plot()
16plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUBE(YWZ) model <<<===== ML-estimates
=======================================================================
m=9  Size=1000  Iterations=29  Maxiter=1000  Tol=1E-02
-----------------------------------------------------------------------
Uncertainty
          Estimates  StdErr     Wald  p-value
constant      1.423  0.2183    6.518   0.0000
-----------------------------------------------------------------------
Feeling
          Estimates  StdErr     Wald  p-value
constant     -0.778  0.1018   -7.639   0.0000
W            -0.117  0.0193   -6.074   0.0000
-----------------------------------------------------------------------
Overdispersion
          Estimates  StdErr     Wald  p-value
constant     -1.930  0.1756  -10.989   0.0000
=======================================================================
Dissimilarity = 0.0239
Loglik(MOD)   = -1886.690
Loglik(uni)   = -2197.225
Mean-loglik   = -1.887
-----------------------------------------------------------------------
AIC = 3781.38
BIC = 3801.01
=======================================================================
Elapsed time=50.02969 seconds =====>>> Fri Aug 16 12:33:36 2024
=======================================================================
CUBE YWZ MLE

In fact:

Script#
 1est_pi = expit(fit.estimates[0])
 2est_ph = np.exp(fit.estimates[3])
 3est_pi_se = expit(fit.estimates[0]+fit.stderrs[0]) - est_pi
 4est_ph_se = np.exp(fit.estimates[3]+fit.stderrs[3]) - est_ph
 5print(
 6    "     estimates  stderr\n"
 7    f"pi      {est_pi:.4f}  {est_pi_se:.4f}"
 8    "\n"
 9    f"phi     {est_ph:.4f}  {est_ph_se:.4f}"
10)
     estimates  stderr
pi      0.8058  0.0319
phi     0.1451  0.0279

IHG family#

Family of the class IHG (Inverse HyperGeometric).

See the references for details: D’Elia, 2003; D'Elia et al., 2005; Piccolo and Simone, 2019.

Without covariates#

Reference guide

\[\begin{split}\left\{ \begin{array}{l} \Pr(R=1|\theta) = \theta \\ \Pr(R=r+1|\theta) = \Pr(R=r|\theta)(1-\theta)\dfrac{m-r}{m-1-r(1-\theta)},\; r= 1,2, \ldots, m-1 \end{array} \right.\end{split}\]

which is equivalent to

\[\begin{split}\begin{array}{l} \Pr(R=r|\theta) = \frac{ \dbinom{m+B-r-1}{m-r} }{ \dbinom{m+B-1}{m-1} },\; r= 1,2, \ldots, m \\ \textrm{with } B = (m-1)\theta / (1 - \theta) \end{array}\end{split}\]
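A minimal sketch implementing the recursive definition above:

Script#
import numpy as np

def ihg_pmf(m, theta):
    # Pr(R=1) = theta, then the recursion for r = 1, ..., m-1
    p = np.empty(m)
    p[0] = theta
    for r in range(1, m):
        p[r] = p[r - 1] * (1 - theta) * (m - r) / (m - 1 - r * (1 - theta))
    return p

p = ihg_pmf(m=10, theta=.2)
print(p.round(4), p.sum())  # the probabilities sum to 1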

In this example, we’ll draw a sample from an IHG model and then estimate the parameter from the observed sample.

# import libraries
import matplotlib.pyplot as plt
from cubmods.gem import draw, estimate

# draw a sample
drawn = draw(
    formula="ord ~ 0",
    model="ihg",
    m=10, theta=.2,
    n=500, seed=42)

# inferential method on drawn sample
fit = estimate(
    df=drawn.df,
    formula="ord ~ 0",
    model="ihg",
    ass_pars={
        "theta": drawn.pars[0],
    }
)
# print the summary of MLE
print(fit.summary())
# show the plot of MLE
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> IHG model <<<===== ML-estimates
=======================================================================
m=10  Size=500
-----------------------------------------------------------------------
Theta
       Estimates  StdErr    Wald  p-value
theta      0.200  0.0086  23.292   0.0000
=======================================================================
Dissimilarity = 0.0639
Loglik(sat)   = -1044.100
Loglik(MOD)   = -1050.513
Loglik(uni)   = -1151.293
Mean-loglik   = -2.101
Deviance      = 12.824
-----------------------------------------------------------------------
AIC = 2103.03
BIC = 2107.24
=======================================================================
Elapsed time=0.00464 seconds =====>>> Fri Aug 16 12:47:55 2024
=======================================================================
IHG 0 MLE

With covariates#

Reference guide

\[\begin{split}\left\{ \begin{array}{l} \Pr(R_i=1|\pmb\theta;\pmb v_i) = \theta_i \\ \Pr(R_i=r+1|\pmb\theta;\pmb v_i) = \Pr(R_i=r|\pmb\theta;\pmb v_i)(1-\theta_i)\dfrac{m-r}{m-1-r(1-\theta_i)},\; r= 1, \ldots, m-1 \end{array} \right.\end{split}\]
\[\theta_i = \dfrac{1}{1 + \exp\{ - \pmb v_i \pmb \nu \}} \quad \equiv \quad \mathrm{logit}(\theta_i) = \pmb v_i \pmb \nu\]

In this example we’ll draw a sample from an IHG model with two covariates and then estimate the parameters given the observed sample. Notice that IHG models without covariates are unimodal; IHG models with covariates, however, can be bimodal, as the one in the following example.

Script#
 1# import libraries
 2import numpy as np
 3import pandas as pd
 4import matplotlib.pyplot as plt
 5from cubmods.gem import draw, estimate
 6from cubmods.general import logit
 7
 8# Draw a random sample
 9n = 1000
10np.random.seed(1)
11V1 = np.random.random(n)
12np.random.seed(42)
13V2 = np.random.random(n)
14df = pd.DataFrame({
15    "V1": V1, "V2": V2
16})
17
18# draw a sample
19drawn = draw(
20    df=df,
21    formula="ord ~ V1 + V2",
22    model="ihg",
23    m=10,
24    nu=[logit(.1), -2, 3],
25    seed=42)
26
27# inferential method on drawn sample
28fit = estimate(
29    df=drawn.df,
30    formula=drawn.formula,
31    model="ihg",
32    ass_pars={
33        "nu": drawn.pars,
34    }
35)
36# print the summary of MLE
37print(fit.summary())
38# show the plot of MLE
39fit.plot()
40plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> IHG(V) model <<<===== ML-estimates
=======================================================================
m=10  Size=1000
-----------------------------------------------------------------------
Theta
          Estimates  StdErr     Wald  p-value
constant     -2.368  0.0998  -23.741   0.0000
V1           -1.973  0.1438  -13.721   0.0000
V2            3.230  0.1451   22.261   0.0000
=======================================================================
Dissimilarity = 0.0455
Loglik(MOD)   = -1958.475
Loglik(uni)   = -2302.585
Mean-loglik   = -1.958
-----------------------------------------------------------------------
AIC = 3922.95
BIC = 3937.67
=======================================================================
Elapsed time=1.10664 seconds =====>>> Fri Aug 16 12:53:12 2024
=======================================================================
IHG V MLE

MULTICUB#

See Piccolo and Simone, 2019 as a reference.

Reference guide

With the multicub tool, parameters estimated from multiple observed samples can be shown in a single plot.

In this example, we’ll draw three samples from CUBE models and manually add a shelter category. Then we’ll use the multicub tool for CUB models, CUBE models and CUBSH models (which aren’t yet implemented in the R package CUB for the multicub tool).

Notice that, since the samples are drawn from a “CUBE model with shelter effect” (which has not been implemented yet), the estimated parameters’ values will differ from the theoretical ones of the specified CUBE model used to draw the sample.

The multicub tool in cubmods package can also show confidence ellipses for CUB models.

Script#
 1import numpy as np
 2import pandas as pd
 3import matplotlib.pyplot as plt
 4from cubmods.gem import draw
 5from cubmods.multicub import multi
 6
 7# draw random samples
 8df = pd.DataFrame()
 9for i, (pi, xi, phi) in enumerate(
10    zip([.9, .8, .7], [.3, .5, .7], [.05, .1, .15])
11    ):
12    drawn = draw(
13        formula="ord ~ 0 | 0 | 0",
14        m = 9, model="cube", n=1000,
15        pi=pi, xi=xi, phi=phi,
16        seed=1976
17    )
18    # add a shelter category at c=1
19    df[f"ord{i+1}"] = np.concatenate((
20        drawn.rv, np.repeat(1, 25)
21    ))
22
23# MULTI-CUB
24multi(
25    ords=df, ms=9, model="cub"
26)
27plt.show()
28# MULTI-CUBE
29multi(
30    ords=df, ms=9, model="cube"
31)
32plt.show()
33# MULTI-CUBSH
34multi(
35    ords=df, ms=9, model="cub", shs=1,
36    pos=[1, 6, 2]
37)
38plt.show()
MULTICUB MULTICUBE MULTICUBSH