Manual#
The package cubmods can be used to build models within the CUB class given an observed sample (and, optionally, the covariates matrix) in order to estimate the parameters. Also, for each family, random samples can be drawn from a specified model.
Currently, six families have been defined and implemented:
CUB (Combination of Uniform and Binomial)
CUBSH (CUB + a SHelter choice)
CUSH (Combination of Uniform and a SHelter choice)
CUSH2 (Combination of Uniform and 2 SHelter choices)
CUBE (Combination of Uniform and BEta-binomial)
IHG (Inverse HyperGeometric)
For each family, a model can be defined with or without covariates for one or more parameters.
Details about each family and examples are provided in the following sections.
Although each family has its own Maximum Likelihood Estimation function mle() that can be called directly, for example cub.mle(), the function gem.estimate() provides a simplified and generalised procedure for MLE.
Similarly, although each family has its own random sample drawing function draw() that can be called directly, for example cub.draw(), the function gem.draw() provides a simplified and generalised procedure to draw a random sample.
In this manual, gem functions will be used for the examples.
Notice that the Dissimilarity index is also computed for models with covariates: it should be interpreted as the fraction of the sample to be changed to achieve a perfect fit to the estimated average probability distribution (see Introduction Notes).
The last section shows the basic usage of the tool multicub.
GeM usage#
GeM (Generalized Mixture) is the main module of the cubmods package. It provides simplified and generalized functions both to estimate a model from an observed sample and to draw a random sample from a specified model.
The function gem.estimate() is the main function for the estimation and validation of a model from an observed sample; it calls the .mle() function of the specified family module, with or without covariates.
The function gem.draw() is the main function for drawing a random sample from a specified model; it calls the .draw() function of the specified family module, with or without covariates.
The formula syntax#
Both functions need a formula, a string specifying the name of the ordinal variable (before the tilde symbol ~) and the covariates of the components (after the tilde symbol ~).
Covariates for each component are separated by the pipe symbol |.
The zero symbol 0 indicates no covariates for a certain component.
The one symbol 1 indicates that we want to estimate the parameter of the constant term only.
If more than one covariate explains a single component, the symbol + concatenates the names.
Names of qualitative variables must be placed between parentheses () preceded by a C, for example C(varname).
Warning
No column in the DataFrame should be named constant, 1 or 0.
In the column names, only letters, numbers, and underscores _ are allowed.
Spaces SHOULD NOT BE used in the column names: they should be removed (i.e. replaced with "") or replaced with _.
Similarly, any other symbol or special character (for instance, -, !, @, etc.) SHOULD BE avoided, and either removed or replaced with an underscore _.
For example, suppose we have a DataFrame where response is the ordinal variable, and age and sex are respectively a quantitative and a qualitative variable explaining the feeling component only, in a cub family model. The formula will be formula = "response ~ 0 | age + C(sex)".
Note
Python will automatically order the categories of qualitative variables in alphanumeric order. So, for instance, a variable sex with two categories "M" and "F" will be ordered as ["F", "M"]; thus the dummy variable will be equal to 0 where sex=="F" and equal to 1 where sex=="M". Consequently, the estimated parameters will be the constant for sex=="F" and C.sex_M for sex=="M". If you want a different order for the categories, you must specify it in the DataFrame, for instance with the pandas class Categorical. In the example:
df["sex"] = pd.Categorical(
    df["sex"],
    categories=["M", "F"],
    ordered=True
)
Notice that spaces are allowed between symbols and variable names in the formula, but they are not needed: the formula "ord ~ X | Y1 + Y2 | Z" is the same as "ord~X|Y1+Y2|Z".
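As an illustration of the syntax, the following sketch splits a formula string into the name of the ordinal variable and one list of covariate names per component. The helper split_formula is hypothetical: it is not part of cubmods and does not reproduce how the package parses formulas; it only mirrors the rules stated above.

```python
def split_formula(formula):
    """Split 'ord ~ A | B + C' into ('ord', [['A'], ['B', 'C']]).

    Hypothetical helper for illustration only (not the cubmods parser).
    """
    # the ordinal variable is before the tilde, the components after it
    ordinal, components = formula.split("~")
    parsed = []
    for field in components.split("|"):
        # '+' concatenates covariate names; surrounding spaces are irrelevant
        names = [name.strip() for name in field.split("+")]
        # '0' means no covariates for this component
        parsed.append([] if names == ["0"] else names)
    return ordinal.strip(), parsed


with_spaces = split_formula("ord ~ X | Y1 + Y2 | Z")
without_spaces = split_formula("ord~X|Y1+Y2|Z")
print(with_spaces)
print(with_spaces == without_spaces)
```

Both spellings yield the same result, which is the point made in the text: spaces carry no meaning in a formula.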
Warning
The number of fields separated by the pipe symbol | in a formula MUST BE equal to the number of parameters specifying the model family. Therefore: two for cub and cush2, three for cube and cub with shelter effect, one for cush and ihg.
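The rule in the warning can be checked before calling the package. The sketch below is only an illustration: the dictionary EXPECTED_FIELDS and the function has_valid_field_count are hypothetical names, and the keys "cubsh" and "cush2" are labels chosen here, not values accepted by the model kwarg.

```python
# number of '|'-separated fields expected for each family, as listed above;
# the keys are labels chosen for this sketch, not cubmods arguments
EXPECTED_FIELDS = {
    "cub": 2, "cush2": 2,
    "cube": 3, "cubsh": 3,
    "cush": 1, "ihg": 1,
}


def has_valid_field_count(formula, family):
    """Return True if the formula has as many fields as the family requires."""
    right_hand_side = formula.split("~")[1]
    n_fields = len(right_hand_side.split("|"))
    return n_fields == EXPECTED_FIELDS[family]


print(has_valid_field_count("ord ~ 0 | 0", "cub"))   # two fields, two expected
print(has_valid_field_count("ord ~ 0 | 0", "cube"))  # two fields, three expected
```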
Arguments of estimate and draw#
Within the function estimate, the number of ordinal categories m is retrieved internally if not specified (taking the maximum observed category), but it is advisable to pass it as an argument to the call, because the maximum observed category underestimates m if the highest categories have zero frequency.
Within the function draw, instead, the number of ordinal categories m will default to 7 if not otherwise specified.
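A minimal sketch of why passing m explicitly is safer. The responses below are invented for illustration and no cubmods function is called: the code only mimics the "maximum observed category" rule described above.

```python
# hypothetical responses on a scale with m=7 categories
# in which nobody chose categories 6 and 7
observed = [1, 3, 3, 4, 2, 5, 4, 3, 1, 5]

true_m = 7
inferred_m = max(observed)  # what is assumed when m is not passed

print(inferred_m)
print(inferred_m == true_m)
```

Here the inferred number of categories is 5 instead of 7, so a model fitted without m would be defined on the wrong support.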
A pandas DataFrame must always be passed to the function estimate, with the kwarg (keyword argument) df.
It should contain, at least, a column with the observed sample and the columns of the covariates (if any).
If no df is passed to the function draw for a model without covariates, an empty DataFrame will be created.
The number n of ordinal responses to be drawn will default to 500 if not otherwise specified in the function draw for models without covariates. For models with covariates, instead, n has no effect because the number of drawn ordinal responses will be equal to the number of rows of the passed DataFrame.
A seed can be specified for the function draw to ensure reproducibility.
Notice that, for models with covariates, seed cannot be 0 (in that case, it will be automatically set to 1).
If no model is declared, the function takes "cub" as default.
Currently implemented models are: "cub" (default), "cush", "cube", and "ihg". CUB models with shelter effect are automatically implemented using model="cub" and specifying a shelter choice with the kwarg sh. CUSH2 models are automatically implemented using model="cush" and passing a list of two categories to the kwarg sh instead of an integer, for instance sh=[2, 7].
The parameters’ values must always be passed to the function draw, with the kwargs of the corresponding family: for example, pi and xi for CUB models without covariates, beta and gamma for CUB models with covariates for both feeling and uncertainty, etc. See the .draw() function reference of the corresponding family module for details.
If model="cub" (or nothing), then a CUB mixture model is fitted to the data to explain uncertainty and feeling (ordinal~Y|W) and a possible shelter effect, by further passing the extra argument sh for the corresponding category.
Subjects’ covariates can be included by specifying covariates matrices in the formula as ordinal~Y|W|X, to explain uncertainty (Y), feeling (W) or shelter (X).
Notice that covariates for the shelter effect can be included only if specified for both feeling and uncertainty too (GeCUB models) because, as in the R package CUB, only the models without covariates and with covariates for all components have been implemented.
Nevertheless, the symbol 1 can be used to specify a different combination of components with covariates.
For example, if we want to specify a CUB model with shelter effect with the covariate cov for uncertainty only, we can pass the formula ordinal ~ cov | 1 | 1: in this case, for feeling and shelter effect, only the constant terms (\(\gamma_0\) and \(\omega_0\)) will be estimated, and the values of the estimated \(\xi\) and \(\delta\) can be computed as \(\hat\xi=\mathrm{expit}(\hat\gamma_0)\) and \(\hat\delta=\mathrm{expit}(\hat\omega_0)\), where \(\mathrm{expit}(x) = 1 / (1 + \exp(-x))\).
See this example for the GeCUB model.
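The back-transformation can be sketched in plain Python. The functions below are stand-ins written for this illustration (the module cubmods.general provides expit and logit, as imported in the examples of this manual), and the two constant terms are hypothetical values, not estimates from a real fit.

```python
import math


def expit(x):
    """Inverse logit: maps a real number to the interval (0, 1)."""
    return 1 / (1 + math.exp(-x))


def logit(p):
    """Log-odds: inverse function of expit."""
    return math.log(p / (1 - p))


# hypothetical estimated constant terms
gamma_0 = -0.85  # feeling
omega_0 = -2.0   # shelter effect

xi_hat = expit(gamma_0)
delta_hat = expit(omega_0)
print(f"xi = {xi_hat:.4f}, delta = {delta_hat:.4f}")
```

Since logit is the inverse of expit, a parameter value can also be mapped the other way, which is handy when choosing constant terms to draw a sample with a given \(\xi\) or \(\delta\).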
If model="cube", then a CUBE mixture model (Combination of Uniform and Beta-Binomial) is fitted to the data to explain uncertainty, feeling and overdispersion. Subjects’ covariates can also be included to explain the feeling component only or all three components, by specifying covariates matrices in the formula as ordinal~Y|W|Z to explain uncertainty (Y), feeling (W) or overdispersion (Z). For different combinations of components with covariates, the symbol 1 can be used.
Notice that, when only the constant term is estimated for overdispersion, \(\hat\phi=e^{\hat\alpha_0}\).
If model="ihg", then an IHG model is fitted to the data. IHG models (Inverse HyperGeometric) are a particular case of CUBE models, for \(\phi = 1 - \xi\) (Iannario, 2012). The parameter \(\theta\) gives the probability of observing the first category and is therefore a direct measure of preference, attraction, or pleasantness toward the investigated item. This is the reason why \(\theta\) is customarily referred to as the preference parameter of the IHG model. Covariates for the preference parameter \(\theta\) have to be specified in matrix form in the formula as ordinal~V.
If model="cush", then a CUSH model is fitted to the data (Combination of Uniform and SHelter effect).
If a single category corresponding to the inflation is passed via the argument sh, a CUSH model is called and covariates for the shelter parameter \(\delta\) are specified in matrix form in the formula as ordinal~X.
If two categories corresponding to the inflation are passed via the argument sh (as a list or array), a CUSH2 model is called and covariates for the shelter parameters \((\delta_1,\delta_2)\) are specified in matrix form in the formula as ordinal~X1|X2.
Notice that, to specify covariates for a single shelter choice in a CUSH2 model, the formula should be ordinal~X1|0 and not ordinal~0|X2.
Extra arguments include the maximum number of iterations maxiter for the optimization algorithm, the required error tolerance tol, and a dictionary ass_pars (assumed parameters) with the parameters of a known model to be compared with the estimates: these could be the parameters used to draw the sample, theoretical parameters, or any other parameters we want to (graphically) compare with the estimates.
Note
The ass_pars argument is effective for models with covariates too. The .plot() method will show the average probability distribution of the model specified with the assumed parameters.
Methods of estimate and draw#
For the objects returned by both functions, the methods .summary() and .plot() are always available, to print a summary and plot the results, respectively. For .plot() arguments and options, see the CUBsample Class (for objects returned by draw) and the extended CUBres Classes of the corresponding family (for objects returned by estimate), defined in each family module.
The method .summary() of objects returned by the estimate function will print a summary of the inferential method applied to the observed sample. Along with estimated parameters, standard errors, Wald tests, and p-values, it will show some model metrics:
Correlation: available for CUB family models without covariates only; it is the correlation between \(\hat{\pi}\) and \(\hat{\xi}\)
Dissimilarity: the dissimilarity index, which can be interpreted as the fraction of the sample to be changed to achieve a perfect fit given the estimated parameters; available for all models with and without covariates; for models with covariates, the average estimated probability is used
Loglik(sat): the log-likelihood of the saturated model; see the reference paper Piccolo and Simone, 2019; available for models without covariates only
Loglik(MOD): the log-likelihood of the estimated model
Loglik(uni): the log-likelihood of the null model; see the reference paper Piccolo and Simone, 2019
Mean-loglik: mean log-likelihood, i.e. the log-likelihood of the estimated model divided by the number \(n\) of observed ordinal responses
Deviance: Likelihood Ratio Test (LRT) between the saturated and the estimated models; available for models without covariates only
AIC: Akaike Information Criterion
BIC: Bayesian Information Criterion.
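To make the interpretation of the Dissimilarity index concrete, here is a minimal sketch that assumes the usual definition of the normalized dissimilarity index: half the sum of the absolute differences between observed relative frequencies and fitted probabilities. Both the function name and the two distributions are hypothetical, and the code is not taken from cubmods.

```python
def dissimilarity(relative_freq, probabilities):
    """Half the sum of absolute differences between two distributions."""
    return 0.5 * sum(
        abs(f - p) for f, p in zip(relative_freq, probabilities)
    )


# hypothetical observed relative frequencies and fitted probabilities (m=5)
observed = [0.10, 0.20, 0.40, 0.20, 0.10]
fitted = [0.12, 0.18, 0.38, 0.22, 0.10]

diss = dissimilarity(observed, fitted)
print(round(diss, 4))
```

Under this definition the index lies between 0 (perfect fit) and 1, and a value of 0.04 reads as "4% of the sample would have to change category to match the fitted distribution exactly".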
Calling .as_dataframe() will return a DataFrame of parameters’ names and values for objects of the Class CUBsample returned by draw. For objects of the extended Base Class CUBres returned by estimate, instead, it will return a DataFrame with the parameters’ component, name, estimated value, standard error, Wald test statistic and p-value.
By calling the method .save(fname), the object can be saved to a file called fname.cub.sample (for draw) or fname.cub.fit (for estimate).
Saved objects can then be loaded using the function general.load_object(fname).
See this example.
Attributes of estimate and draw#
For objects returned by both estimate and draw, the attributes .formula and .df are always available. For objects returned by draw, .df is the original DataFrame (if provided) with an extra column of the drawn ordinal response, named as specified in the formula.
Many other attributes are available for objects of the Base Class CUBres returned by estimate, such as the computed log-likelihood, the AIC and BIC, etc. For details, see the Base Class CUBres reference guide.
CUB family#
Basic family of the class CUB. See the references for details: Piccolo, 2003; D'Elia and Piccolo, 2005; Piccolo, 2006; Iannario and Piccolo, 2010; Iannario and Piccolo, 2009; Iannario et al., 2014; Iannario et al., 2022; Piccolo and Simone, 2019.
Without covariates#
A model of the CUB family for responses with \(m\) ordinal categories, without covariates, is specified as
\[\Pr(R=r \mid \pi, \xi) = \pi \binom{m-1}{r-1} (1-\xi)^{r-1} \xi^{m-r} + \dfrac{1-\pi}{m}, \qquad r = 1, \ldots, m\]
where \(\pi\) and \(\xi\) are the parameters for the uncertainty and the feeling components, respectively.
Note that \((1-\pi)\) is the weight of the Uncertainty component and \((1-\xi)\) is the Feeling component for common positive wording.
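The probability distribution of a CUB model can be computed directly. The sketch below assumes the usual CUB probability mass function, \(\pi\, b_r(\xi) + (1-\pi)/m\) with \(b_r(\xi)\) the shifted binomial \(\binom{m-1}{r-1}(1-\xi)^{r-1}\xi^{m-r}\); the function cub_pmf is written for this illustration and is not the cubmods implementation. The parameter values are those used in the example of this section.

```python
from math import comb


def cub_pmf(m, pi, xi):
    """Probabilities of categories 1..m under a CUB model without covariates."""
    return [
        pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        + (1 - pi) / m
        for r in range(1, m + 1)
    ]


probs = cub_pmf(m=10, pi=0.7, xi=0.2)
mean = sum(r * p for r, p in enumerate(probs, start=1))
print(f"sum of probabilities = {sum(probs):.6f}")
print(f"expected value = {mean:.3f}")
```

The expected value of this theoretical distribution can be compared with the mean of a sample drawn from the same parameters.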
In the following example, a sample of \(n=500\) observations will be drawn from a CUB model for an ordinal variable with \(m=10\) ordinal categories and parameters \((\pi=.7, \xi=.2)\). A seed=1 will be set to ensure reproducibility.
Notice that a Dissimilarity index is computed: this should be interpreted as the fraction of the drawn sample to be changed to achieve a perfect fit to the theoretical specified model the sample has been drawn from.
# import libraries
import matplotlib.pyplot as plt
from cubmods.gem import draw

# draw a sample
drawn = draw(
    formula="ord ~ 0 | 0",
    m=10, pi=.7, xi=.2,
    n=500, seed=1)
# print the summary of the drawn sample
print(drawn.summary())
# show the plot of the drawn sample
drawn.plot()
plt.show()
=======================================================================
=====>>> CUB model <<<===== Drawn random sample
=======================================================================
m=10 Sample size=500 seed=1
formula: ord~0|0
-----------------------------------------------------------------------
component parameter value
Uncertainty pi 0.7
Feeling xi 0.2
=======================================================================
Sample metrics
Mean = 7.368000
Variance = 5.687952
Std.Dev. = 2.384943
-----------------------------------------------------------------------
Dissimilarity = 0.0650938
=======================================================================

Notice that, since the default value of the kwarg model is "cub", we do not need to specify it.
Calling drawn.as_dataframe() will return a DataFrame with the specified parameters of the theoretical model
component parameter value
0 Uncertainty pi 0.7
1 Feeling xi 0.2
Using the previously drawn sample, in the next example the parameters \((\hat\pi, \hat\xi)\) will be estimated.
Note that in the function gem.estimate:
df needs to be a pandas DataFrame; the attribute drawn.df will return a DataFrame with ord as column name of the drawn ordinal response (as previously specified in the formula)
formula needs the ordinal variable name (ord in this case) and the covariates for each component (none in this case, so "0|0")
if m is not provided, the maximum observed ordinal value will be assumed and a warning will be raised
with the ass_pars dictionary, the parameters of a known model (if any) can be specified; in this case, we’ll specify the known parameters used to draw the sample
# import the estimation function
from cubmods.gem import estimate

# inferential method on drawn sample
fit = estimate(
    df=drawn.df,
    formula="ord~0|0",
    ass_pars={
        "pi": drawn.pars[0],
        "xi": drawn.pars[1]
    }
)
# print the summary of MLE
print(fit.summary())
# show the plot of MLE
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUB model <<<===== ML-estimates
=======================================================================
m=10 Size=500 Iterations=13 Maxiter=500 Tol=1E-04
-----------------------------------------------------------------------
Uncertainty
Estimates StdErr Wald p-value
pi 0.675 0.0340 19.872 0.0000
-----------------------------------------------------------------------
Feeling
Estimates StdErr Wald p-value
xi 0.188 0.0090 20.808 0.0000
-----------------------------------------------------------------------
Correlation = 0.2105
=======================================================================
Dissimilarity = 0.0599
Loglik(sat) = -994.063
Loglik(MOD) = -1000.111
Loglik(uni) = -1151.293
Mean-loglik = -2.000
Deviance = 12.096
-----------------------------------------------------------------------
AIC = 2004.22
BIC = 2012.65
=======================================================================
Elapsed time=0.00202 seconds =====>>> Thu Sep 26 18:00:53 2024
=======================================================================

See the reference guide of the general module and the reference paper Piccolo and Simone, 2019 for details about log-likelihoods, deviance and information criteria.
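Several of the metrics printed in the summary above can be reproduced by hand. The sketch below assumes the standard definitions (null model equal to a discrete uniform over the \(m\) categories, deviance as twice the log-likelihood difference, AIC and BIC with two estimated parameters); under these assumptions the results agree with the printed figures up to rounding.

```python
import math

# values read from the summary above
n = 500            # sample size
m = 10             # number of ordinal categories
n_params = 2       # pi and xi
loglik_sat = -994.063
loglik_mod = -1000.111

loglik_uni = -n * math.log(m)             # discrete uniform over m categories
mean_loglik = loglik_mod / n
deviance = 2 * (loglik_sat - loglik_mod)
aic = -2 * loglik_mod + 2 * n_params
bic = -2 * loglik_mod + n_params * math.log(n)

print(f"Loglik(uni) = {loglik_uni:.3f}")
print(f"Mean-loglik = {mean_loglik:.3f}")
print(f"Deviance    = {deviance:.3f}")
print(f"AIC         = {aic:.2f}")
print(f"BIC         = {bic:.2f}")
```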
Calling fit.as_dataframe() will return a DataFrame with the parameters’ estimated values, standard errors, Wald statistics and p-values
component parameter estimate stderr wald pvalue
0 Uncertainty pi 0.67476 0.033954 19.872485 7.042905e-88
1 Feeling xi 0.18817 0.009043 20.807551 3.697579e-96
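The Wald statistic and the p-value in the table can be recomputed from the estimate and its standard error. The sketch assumes a two-sided test under the standard normal distribution; with this assumption the result agrees with the printed row for pi, up to the rounding of the inputs.

```python
import math

# estimate and standard error of pi, read from the DataFrame above
estimate_pi = 0.67476
stderr_pi = 0.033954

wald = estimate_pi / stderr_pi
# two-sided p-value under the standard normal distribution:
# 2 * (1 - Phi(|wald|)) = erfc(|wald| / sqrt(2))
p_value = math.erfc(abs(wald) / math.sqrt(2))

print(f"Wald = {wald:.3f}")
print(f"p-value = {p_value:.2e}")
```

Using erfc instead of 1 minus the normal CDF avoids the loss of precision that would otherwise turn such a small p-value into exactly zero.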
As an example, we can now save the fit object to a file. By default, it will be saved as a pickle file.
fit.save(fname="cub_mle_results")
The previous code will save a file cub_mle_results.cub.fit.
We can then load the saved file with the code
from cubmods.general import load_object

myfit = load_object("cub_mle_results.cub.fit")
and we can apply to myfit the same methods and attributes of the original fit object.
With covariates#
All three combinations of covariates have been implemented for the CUB family in both Python and R: for uncertainty only, for feeling only, and for both.
Here we’ll show an example with covariates for feeling only.
First of all, we’ll draw a random sample with two covariates for the feeling component: W1 and W2. Note that, having two covariates, we’ll need three \(\gamma\) parameters, to consider the constant term too.
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cubmods.gem import draw, estimate

# Draw a random sample
n = 1000
np.random.seed(1)
W1 = np.random.randint(1, 10, n)
np.random.seed(42)
W2 = np.random.random(n)
df = pd.DataFrame({
    "W1": W1, "W2": W2
})
# the ordinal response is named "res", as in the outputs and in the
# formula later passed to estimate
drawn = draw(
    formula="res ~ 0 | W1 + W2",
    df=df,
    m=10, n=n,
    pi=0.8,
    gamma=[2.3, 0.2, -5],
)
# print the summary
print(drawn.summary())
=======================================================================
=====>>> CUB(0W) model <<<===== Drawn random sample
=======================================================================
m=10 Sample size=1000 seed=None
formula: res~0|W1+W2
-----------------------------------------------------------------------
component parameter value
Uncertainty pi 0.8
Feeling constant 2.3
Feeling W1 0.2
Feeling W2 -5.0
=======================================================================
Sample metrics
Mean = 4.566000
Variance = 8.089734
Std.Dev. = 2.844246
-----------------------------------------------------------------------
Dissimilarity = 0.0307673
=======================================================================
# plot the drawn sample
drawn.plot()
plt.show()

# print the parameters' values
print(drawn.as_dataframe())
component parameter value
0 Uncertainty pi 0.8
1 Feeling constant 2.3
2 Feeling W1 0.2
3 Feeling W2 -5.0
# print the updated DataFrame
print(drawn.df)
W1 W2 res
0 6 0.374540 2
1 9 0.950714 7
2 6 0.731994 8
3 1 0.598658 8
4 1 0.156019 4
.. .. ... ...
995 3 0.091582 2
996 9 0.917314 9
997 4 0.136819 1
998 7 0.950237 3
999 8 0.446006 2
[1000 rows x 3 columns]
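With covariates, each subject has an individual feeling parameter. The sketch below assumes the logistic link \(\xi_i = \mathrm{expit}(\gamma_0 + \gamma_1 W1_i + \gamma_2 W2_i)\), which is consistent with the relation \(\hat\xi=\mathrm{expit}(\hat\gamma_0)\) given earlier in this manual, and applies it to the first two rows of the DataFrame printed above. It is a plain-Python illustration, not a call to cubmods.

```python
import math

gamma = [2.3, 0.2, -5]  # constant, W1, W2 (values used to draw the sample)

# first two rows of the DataFrame printed above: (W1, W2)
rows = [(6, 0.374540), (9, 0.950714)]

xi_values = []
for w1, w2 in rows:
    linear_predictor = gamma[0] + gamma[1] * w1 + gamma[2] * w2
    xi_i = 1 / (1 + math.exp(-linear_predictor))  # expit
    xi_values.append(xi_i)
    print(f"W1={w1}, W2={w2:.6f} -> xi = {xi_i:.4f}")
```

The two subjects end up with quite different values of \(\xi_i\), mostly because of the large negative coefficient of W2.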
Finally, we’ll call estimate to estimate the parameters given the observed (actually, drawn) sample.
We’ll pass the parameters used to draw the sample with ass_pars (as a dictionary) to graphically compare the assumed and the estimated average probability distributions.
# MLE estimation
fit = estimate(
    formula="res ~ 0 | W1+W2",
    df=drawn.df,
    ass_pars={
        "pi": drawn.pars[0],
        "gamma": drawn.pars[1:]
    }
)
# Print MLE summary
print(fit.summary())
# plot the results
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUB(0W) model <<<===== ML-estimates
=======================================================================
m=10 Size=1000 Iterations=18 Maxiter=500 Tol=1E-04
-----------------------------------------------------------------------
Uncertainty
Estimates StdErr Wald p-value
pi 0.800 0.0198 40.499 0.0000
-----------------------------------------------------------------------
Feeling
Estimates StdErr Wald p-value
constant 2.353 0.1001 23.514 0.0000
W1 0.194 0.0138 14.034 0.0000
W2 -5.076 0.1454 -34.909 0.0000
=======================================================================
Dissimilarity = 0.0292
Loglik(MOD) = -1807.052
Loglik(uni) = -2302.585
Mean-loglik = -1.807
-----------------------------------------------------------------------
AIC = 3622.10
BIC = 3641.74
=======================================================================
Elapsed time=0.09656 seconds =====>>> Thu Aug 15 18:31:21 2024
=======================================================================

CUBSH family#
Basic family of the class CUB with shelter effect.
See the references for details: Corduas et al., 2009; Iannario, 2012; Piccolo and Simone, 2019.
Without covariates#
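A model of the CUB family with shelter effect for responses with \(m\) ordinal categories, without covariates, is specified as
\[\Pr(R=r \mid \pi, \xi, \delta) = \delta D_r^{(c)} + (1-\delta)\left[\pi \binom{m-1}{r-1}(1-\xi)^{r-1}\xi^{m-r} + \dfrac{1-\pi}{m}\right], \qquad r = 1, \ldots, m\]
where \(D_r^{(c)}\) is equal to 1 if \(r=c\) (the shelter category) and 0 otherwise, \(\pi\) and \(\xi\) are the parameters for the uncertainty and the feeling components, respectively, and \(\delta\) is the weight of the shelter effect.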
Other parametrizations have been proposed; see Piccolo and Simone, 2019 (pp 412-413) for their form and for the parameters’ interpretation.
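Another parametrization, particularly useful for inferential issues, is
\[\Pr(R=r \mid \pi_1, \pi_2, \xi) = \pi_1 \binom{m-1}{r-1}(1-\xi)^{r-1}\xi^{m-r} + \dfrac{\pi_2}{m} + (1-\pi_1-\pi_2) D_r^{(c)}\]
where \(D_r^{(c)}\) is the indicator of the shelter category \(c\) and
\[\pi_1 = (1-\delta)\pi, \qquad \pi_2 = (1-\delta)(1-\pi)\]
See the references for further details.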
In the next example, we’ll draw an ordinal response and then estimate the parameters given the sample.
# import libraries
import matplotlib.pyplot as plt
from cubmods.gem import draw, estimate

# draw a sample
drawn = draw(
    formula="ord ~ 0 | 0 | 0",
    m=7, sh=1,
    pi=.8, xi=.4, delta=.15,
    n=1500, seed=42)

print(drawn.as_dataframe())
component parameter value
0 Uniform pi1 0.68
1 Binomial pi2 0.17
2 Feeling xi 0.40
3 Uncertainty *pi 0.80
4 Shelter *delta 0.15
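The values of pi1 and pi2 in the table above follow from pi and delta. The sketch below assumes the relations \(\pi_1 = (1-\delta)\pi\) and \(\pi_2 = (1-\delta)(1-\pi)\), which reproduce the printed values; the two function names are hypothetical and are not part of cubmods.

```python
def to_alternative(pi, delta):
    """(pi, delta) -> (pi1, pi2)."""
    return (1 - delta) * pi, (1 - delta) * (1 - pi)


def to_original(pi1, pi2):
    """(pi1, pi2) -> (pi, delta)."""
    return pi1 / (pi1 + pi2), 1 - pi1 - pi2


pi1, pi2 = to_alternative(pi=0.8, delta=0.15)
print(round(pi1, 2), round(pi2, 2))

pi, delta = to_original(pi1, pi2)
print(round(pi, 2), round(delta, 2))
```

The inverse mapping shows how the Uncertainty and Shelter rows of a summary can be recovered from the alternative parametrization.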
Notice that:
since "cub" is the default value of the kwarg model, we do not need to specify it
we’ll pass to estimate kwarg values taken from the object drawn
The method .plot() (of the fit object) also shows, in the parameter space, the trivariate confidence ellipsoid, which has not been implemented yet in the CUB package in R.
The plot includes the marginal bivariate confidence ellipses too. Notice that, as proven in Pierini, 2024 pp 28-30, the confidence level of the marginal ellipses is greater than the ellipsoid’s confidence level. Indeed, the radius \(r\) of a standardized sphere at confidence level \((1-\alpha_3)\) is equal to \(r = \sqrt{ F^{-1}_{\chi^2_{(3)}}(1-\alpha_3) }\), thus the confidence level of the bivariate marginal ellipses (which are sections of trivariate cylinders) is \((1-\alpha_2) = F_{\chi^2_{(2)}}(r^2)\).
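The relation between the two confidence levels can be evaluated numerically. The sketch below uses only the standard library: the closed-form distribution functions of the chi-square with 2 and 3 degrees of freedom, and a bisection to invert the latter. The ellipsoid level of 0.95 is an arbitrary value chosen for illustration, not necessarily the level used by the .plot() method.

```python
import math


def chi2_cdf_3(x):
    """CDF of the chi-square distribution with 3 degrees of freedom."""
    return (
        math.erf(math.sqrt(x / 2))
        - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    )


def chi2_cdf_2(x):
    """CDF of the chi-square distribution with 2 degrees of freedom."""
    return 1 - math.exp(-x / 2)


def chi2_ppf_3(prob, low=0.0, high=100.0):
    """Quantile of the chi-square(3) distribution, found by bisection."""
    for _ in range(200):
        mid = (low + high) / 2
        if chi2_cdf_3(mid) < prob:
            low = mid
        else:
            high = mid
    return (low + high) / 2


ellipsoid_level = 0.95                   # 1 - alpha_3 (illustrative choice)
r_squared = chi2_ppf_3(ellipsoid_level)  # squared radius of the sphere
marginal_level = chi2_cdf_2(r_squared)   # 1 - alpha_2

print(f"r^2 = {r_squared:.4f}")
print(f"marginal level = {marginal_level:.4f}")
```

With these numbers the marginal ellipses have a confidence level of about 0.98, which is indeed greater than the 0.95 of the ellipsoid.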
# inferential method on drawn sample
fit = estimate(
    df=drawn.df, sh=drawn.sh,
    formula=drawn.formula,
    ass_pars={
        "pi1": drawn.pars[0],
        "pi2": drawn.pars[1],
        "xi": drawn.pars[2],
    }
)
# print the summary of MLE
print(fit.summary())
# show the plot of MLE
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUBSH model <<<===== ML-estimates
=======================================================================
m=7 Shelter=1 Size=1500 Iterations=59 Maxiter=500 Tol=1E-04
-----------------------------------------------------------------------
Alternative parametrization
Estimates StdErr Wald p-value
pi1 0.661 0.0307 21.508 0.0000
pi2 0.174 0.0344 5.041 0.0000
xi 0.388 0.0077 50.592 0.0000
-----------------------------------------------------------------------
Uncertainty
Estimates StdErr Wald p-value
pi 0.792 0.0400 19.813 0.0000
-----------------------------------------------------------------------
Feeling
Estimates StdErr Wald p-value
xi 0.388 0.0077 50.592 0.0000
-----------------------------------------------------------------------
Shelter effect
Estimates StdErr Wald p-value
delta 0.166 0.0116 14.327 0.0000
=======================================================================
Dissimilarity = 0.0049
Loglik(sat) = -2734.302
Loglik(MOD) = -2734.433
Loglik(uni) = -2918.865
Mean-loglik = -1.823
Deviance = 0.263
-----------------------------------------------------------------------
AIC = 5474.87
BIC = 5490.81
=======================================================================

With covariates#
Only the model with covariates for all components (GeCUB) has currently been defined and implemented, as in the R package CUB.
Nevertheless, thanks to the symbol 1 provided by the formula, we can specify a different combination of covariates.
For example, we’ll specify a CUB model with shelter effect, with covariates for uncertainty only. We’ll use the function logit to have better ‘control’ of the parameters’ values, because \(\gamma_0 = \mathrm{logit}(\xi)\) and similarly for \(\pi\) and \(\delta\).
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cubmods.general import expit, logit
from cubmods.gem import draw, estimate

# Draw a random sample
n = 1000
np.random.seed(1)
W1 = np.random.randint(1, 10, n)
df = pd.DataFrame({
    "W1": W1,
})
drawn = draw(
    formula="fee ~ W1 | 1 | 1",
    df=df,
    m=9, sh=2,
    beta=[logit(.8), -.2],
    gamma=[logit(.3)],
    omega=[logit(.12)],
)

# MLE estimation
fit = estimate(
    formula="fee ~ W1 | 1 | 1",
    df=drawn.df, sh=2,
    ass_pars={
        "beta": [logit(.8), -.2],
        "gamma": [logit(.3)],
        "omega": [logit(.12)],
    }
)
# Print MLE summary
print(fit.summary())
# plot the results
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUBSH(YWX) model <<<===== ML-estimates
=======================================================================
m=9 Shelter=2 Size=1000 Iterations=25 Maxiter=500 Tol=1E-04
-----------------------------------------------------------------------
Uncertainty
Estimates StdErr Wald p-value
constant 0.992 0.3314 2.994 0.0028
W1 -0.127 0.0569 -2.228 0.0259
-----------------------------------------------------------------------
Feeling
Estimates StdErr Wald p-value
constant -0.902 0.0381 -23.662 0.0000
-----------------------------------------------------------------------
Shelter effect
Estimates StdErr Wald p-value
constant -2.074 0.1260 -16.462 0.0000
=======================================================================
Dissimilarity = 0.0139
Loglik(MOD) = -2069.978
Loglik(uni) = -2197.225
Mean-loglik = -2.070
-----------------------------------------------------------------------
AIC = 4147.96
BIC = 4167.59
=======================================================================
Elapsed time=1.43850 seconds =====>>> Thu Aug 15 19:39:49 2024
=======================================================================

To get the estimated values \(\hat\xi\) and \(\hat\delta\) we can use the function expit, because \(\hat\xi = \mathrm{expit}(\hat\gamma_0)\) and similarly for \(\hat\delta\). Then, in the spirit of the delta-method, we can approximate the standard errors of both \(\hat\xi\) and \(\hat\delta\), for instance
\(\widehat{es}(\xi) = \mathrm{expit}[\hat\gamma_0+\widehat{es}(\gamma_0)] - \hat\xi\).
est_xi = expit(fit.estimates[2])
est_de = expit(fit.estimates[3])
est_xi_se = expit(fit.estimates[2]+fit.stderrs[2]) - est_xi
est_de_se = expit(fit.estimates[3]+fit.stderrs[3]) - est_de
print(
    "     estimates  stderr\n"
    f"xi      {est_xi:.4f}  {est_xi_se:.4f}"
    "\n"
    f"delta   {est_de:.4f}  {est_de_se:.4f}"
)
estimates stderr
xi 0.2886 0.0079
delta 0.1116 0.0131
which are, in fact, close to the values used to draw the sample (\(\xi=0.3\) and \(\delta=0.12\)).
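For comparison, the first-order delta method gives the standard error of \(\mathrm{expit}(\hat\gamma_0)\) as the derivative of expit, \(\hat\xi(1-\hat\xi)\), multiplied by the standard error of \(\hat\gamma_0\). The sketch below uses the rounded constant terms and standard errors printed in the summary above and sets this formula side by side with the finite-difference version used in the text. It is a plain-Python illustration that does not use the fit object.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1 / (1 + math.exp(-x))


# constant terms and standard errors read from the summary above
gamma_0, se_gamma_0 = -0.902, 0.0381   # feeling
omega_0, se_omega_0 = -2.074, 0.1260   # shelter effect

results = {}
for name, est, se in [
    ("xi", gamma_0, se_gamma_0),
    ("delta", omega_0, se_omega_0),
]:
    value = expit(est)
    # first-order delta method: derivative of expit is p * (1 - p)
    se_delta_method = value * (1 - value) * se
    # finite-difference version used in the text
    se_difference = expit(est + se) - value
    results[name] = (value, se_delta_method, se_difference)
    print(f"{name:5s} {value:.4f} {se_delta_method:.4f} {se_difference:.4f}")
```

The two approximations are close here because the standard errors are small; they would diverge more for larger standard errors, since expit is not linear.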
CUSH family#
Basic family of the class CUSH with a single shelter effect.
See the references for details: Capecchi and Piccolo, 2017; Piccolo and Simone, 2019.
Without covariates#
In the example, we’ll draw a sample from a CUSH model without covariates and then estimate the parameter \(\delta\) given the observed sample.
Notice that, since the model is not the default "cub", we need to specify it.
# import libraries
import matplotlib.pyplot as plt
from cubmods.gem import draw, estimate

# draw a sample
drawn = draw(
    formula="ord ~ 0",
    model="cush",
    sh=7,
    m=7, delta=.15,
    n=1500, seed=76)

# inferential method on drawn sample
fit = estimate(
    df=drawn.df,
    model="cush",
    formula="ord~0",
    sh=7,
    ass_pars={
        "delta": drawn.pars[0],
    }
)
# print the summary of MLE
print(fit.summary())
# show the plot of MLE
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUSH model <<<===== ML-estimates
=======================================================================
m=7 Shelter=7 Size=1500
-----------------------------------------------------------------------
Shelter effect
Estimates StdErr Wald p-value
delta 0.124 0.0130 9.532 0.0000
=======================================================================
Dissimilarity = 0.0236
Loglik(sat) = -2856.039
Loglik(MOD) = -2859.923
Loglik(uni) = -2918.865
Mean-loglik = -1.907
Deviance = 7.768
-----------------------------------------------------------------------
AIC = 5721.85
BIC = 5727.16
=======================================================================
Elapsed time=0.00113 seconds =====>>> Fri Aug 16 10:44:07 2024
=======================================================================
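For a CUSH model without covariates the likelihood depends on the data only through the relative frequency \(f_c\) of the shelter category, which gives the closed-form estimate \(\hat\delta = (m f_c - 1)/(m - 1)\) (valid when \(f_c \ge 1/m\)). The sketch below illustrates this; the count of 374 responses at the shelter category is an assumption, chosen because it is consistent with the printed estimate, since the actual count is not shown in this manual.

```python
import math

m, n = 7, 1500      # from the summary above
n_shelter = 374     # assumed count of responses at the shelter category

f_c = n_shelter / n
# closed-form ML estimate of delta for a CUSH model without covariates
delta_hat = (m * f_c - 1) / (m - 1)

# CUSH probabilities: shelter category vs. each other category
p_shelter = delta_hat + (1 - delta_hat) / m
p_other = (1 - delta_hat) / m
loglik = (
    n_shelter * math.log(p_shelter)
    + (n - n_shelter) * math.log(p_other)
)

print(f"delta_hat = {delta_hat:.3f}")
print(f"loglik = {loglik:.3f}")
```

Under the assumed count, both the estimate and the log-likelihood agree with the values printed in the summary, up to rounding.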

With covariates#
In the example, we’ll draw a sample from a CUSH model with covariates and then estimate the parameter given the observed sample.
Notice that, since the model is not the default "cub", we need to specify it.
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cubmods.general import logit
from cubmods.gem import draw, estimate

# Draw a random sample
n = 1000
np.random.seed(1)
X = np.random.randint(1, 10, n)
df = pd.DataFrame({
    "X": X,
})
drawn = draw(
    formula="fee ~ X",
    model="cush",
    df=df,
    m=9, sh=5,
    omega=[logit(.05), .2],
)

# MLE estimation
fit = estimate(
    formula="fee ~ X",
    model="cush",
    df=drawn.df, sh=5,
    ass_pars={
        "omega": drawn.pars
    }
)
# Print MLE summary
print(fit.summary())
# plot the results
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUSH(X) model <<<===== ML-estimates
=======================================================================
m=9 Shelter=5 Size=1000
-----------------------------------------------------------------------
Shelter effect
Estimates StdErr Wald p-value
constant -3.131 0.4361 -7.180 0.0000
X 0.229 0.0629 3.642 0.0003
=======================================================================
Dissimilarity = 0.0395
Loglik(MOD) = -2130.030
Loglik(uni) = -2197.225
Mean-loglik = -2.130
-----------------------------------------------------------------------
AIC = 4264.06
BIC = 4273.87
=======================================================================
Elapsed time=0.01704 seconds =====>>> Fri Aug 16 10:54:11 2024
=======================================================================
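In the example above the shelter parameter is specified on the logit scale through omega=[logit(.05), .2]. The following sketch assumes the usual logistic link, delta_i = expit(omega_0 + omega_1 * X_i), and shows how the implied shelter weight changes over the covariate values 1 to 9. The helpers logit and expit are local stand-ins written for this illustration, not the functions from cubmods.general.

```python
from math import exp, log

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return log(p / (1 - p))

def expit(x):
    """Inverse of logit: maps a real number to (0, 1)."""
    return 1 / (1 + exp(-x))

# Parameters used to draw the sample in the example above
omega = [logit(0.05), 0.2]

# Implied shelter weight for each covariate value X = 1, ..., 9
deltas = {x: expit(omega[0] + omega[1] * x) for x in range(1, 10)}

for x, delta in deltas.items():
    print(f"X={x}  delta={delta:.4f}")
```

With a positive slope the shelter weight grows with X, so respondents with larger covariate values are more likely to choose the shelter category.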

CUSH2 family#
Family of the class CUSH with two shelter effects (CUSH2).
This family has been introduced by Pierini, 2024 (pp 16-20) and first implemented in this Python package. See Piccolo and Simone, 2019 as a reference for the CUB class models.
These models are particularly useful whenever the shelter choices are not polarized, i.e. they’re not at the extremes of the ordinal variable support. In case of polarized responses, finite mixtures based on the Discretized Beta distribution (see Simone and Tutz, 2018 and Simone, 2022) can be used, which have not been implemented in this package yet.
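As a minimal sketch, assuming the CUSH2 distribution is a mixture of two degenerate distributions at the shelter categories c1 and c2 and a discrete uniform over the m categories, the probabilities can be computed as follows. The function name cush2_pmf is hypothetical and is not part of cubmods.

```python
def cush2_pmf(m, sh, delta1, delta2):
    """Probabilities of a CUSH2 model (illustrative helper, not the cubmods API).

    m      : number of ordinal categories
    sh     : pair (c1, c2) of shelter categories, 1-based
    delta1 : weight of the first shelter choice
    delta2 : weight of the second shelter choice
    """
    c1, c2 = sh
    uniform_part = (1 - delta1 - delta2) / m   # share of each category from the uniform
    probs = [uniform_part] * m
    probs[c1 - 1] += delta1                    # extra mass at the first shelter
    probs[c2 - 1] += delta2                    # extra mass at the second shelter
    return probs

probs = cush2_pmf(m=7, sh=(1, 4), delta1=0.15, delta2=0.10)
for r, p in enumerate(probs, start=1):
    print(f"Pr(R={r}) = {p:.4f}")
```

All categories other than the two shelters share the same probability, which is why the model suits data with two isolated peaks over an otherwise flat distribution.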
Without covariates#
In the example, we’ll draw a sample from a CUSH2 model without covariates and then estimate the parameters given the observed sample.
Notice that, since the model is not the default "cub", we need to specify it.
Passing a list of two shelter categories with the kwarg sh calls a CUSH2 model.
# import libraries
import matplotlib.pyplot as plt
from cubmods.gem import draw, estimate

# draw a sample
drawn = draw(
    formula="ord ~ 0 | 0",
    model="cush",
    sh=[1, 4],
    m=7,
    delta1=.15, delta2=.1,
    n=1000, seed=42)

# inferential method on drawn sample
fit = estimate(
    df=drawn.df,
    model="cush",
    formula="ord~0|0",
    sh=drawn.sh,
    ass_pars={
        "delta1": drawn.pars[0],
        "delta2": drawn.pars[1],
    }
)
# print the summary of MLE
print(fit.summary())
# show the plot of MLE
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUSH2 model <<<===== ML-estimates
=======================================================================
m=7 Shelter=[1 4] Size=1000
-----------------------------------------------------------------------
Shelter effects
Estimates StdErr Wald p-value
delta1 0.172 0.0149 11.512 0.0000
delta2 0.113 0.0163 6.930 0.0000
=======================================================================
Dissimilarity = 0.0176
Loglik(sat) = -1849.206
Loglik(MOD) = -1850.709
Loglik(uni) = -1945.910
Mean-loglik = -1.851
Deviance = 3.006
-----------------------------------------------------------------------
AIC = 3705.42
BIC = 3715.23
=======================================================================
Elapsed time=0.00247 seconds =====>>> Fri Sep 27 11:32:02 2024
=======================================================================

With covariates#
Two CUSH2 models with covariates have been defined and implemented: for the first shelter choice only and for both.
In this example we’ll draw a sample from a CUSH2 model with covariates for the first shelter choice only, and then estimate the parameters with a CUSH2 model with covariates for both shelter choices, using the symbol 1 in the formula for the second shelter choice to estimate the constant parameter only. This is usually not needed, but we do it here to confirm that \(\mathrm{expit}(\hat\omega_{20})=\hat\delta_2\).
Notice that, since the model is not the default "cub", we need to specify it.
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cubmods.general import logit, expit
from cubmods.gem import draw, estimate

# Draw a random sample
n = 1000
np.random.seed(1)
X = np.random.randint(1, 10, n)
df = pd.DataFrame({
    "X": X,
})
drawn = draw(
    formula="fee ~ X | 0",
    model="cush",
    df=df,
    m=9, sh=[2, 8],
    omega1=[logit(.05), .2],
    delta2=.1
)

# MLE estimation
fit = estimate(
    formula="fee ~ X | 1",
    model="cush",
    df=drawn.df, sh=[2, 8],
    ass_pars={
        "omega1": drawn.pars[:2],
        "omega2": [logit(drawn.pars[-1])]
    }
)
# Print MLE summary
print(fit.summary())
# plot the results
fit.plot()
plt.show()

# back-transform the constant of the second shelter effect
est_de2 = expit(fit.estimates[2])
est_de2_es = expit(fit.estimates[2]+fit.stderrs[2]) - est_de2
print(
    " estimates stderr\n"
    f"delta2 {est_de2:.4f} {est_de2_es:.4f}"
)
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUSH2(X1,X2) model <<<===== ML-estimates
=======================================================================
m=9 Shelter=[2 8] Size=1000
-----------------------------------------------------------------------
Shelter effect 1
Estimates StdErr Wald p-value
constant -3.170 0.4216 -7.519 0.0000
X 0.207 0.0613 3.379 0.0007
-----------------------------------------------------------------------
Shelter effect 2
Estimates StdErr Wald p-value
constant -2.276 0.1609 -14.149 0.0000
=======================================================================
Dissimilarity = 0.0305
Loglik(MOD) = -2122.463
Loglik(uni) = -2197.225
Mean-loglik = -2.122
-----------------------------------------------------------------------
AIC = 4250.93
BIC = 4265.65
=======================================================================
Elapsed time=0.06553 seconds =====>>> Fri Aug 16 11:29:11 2024
=======================================================================

estimates stderr
delta2 0.0931 0.0145
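The back-transformation printed above can be checked by hand from the rounded values in the summary. The sketch below uses only the standard library; expit is a local stand-in for the function in cubmods.general. The second column follows the example's own shortcut, the shift in delta2 produced by moving the constant up by one standard error, which is a rough measure of spread rather than a formal standard error.

```python
from math import exp

def expit(x):
    """Inverse logit: maps a real number to (0, 1)."""
    return 1 / (1 + exp(-x))

# "Shelter effect 2" constant and its StdErr, as printed in the summary
omega20_hat = -2.276
omega20_se = 0.1609

delta2_hat = expit(omega20_hat)
# one-sided shift on the probability scale, as computed in the example
delta2_spread = expit(omega20_hat + omega20_se) - delta2_hat

print(f"delta2 {delta2_hat:.4f} {delta2_spread:.4f}")
```

The result agrees with the values 0.0931 and 0.0145 shown above, and the estimate is close to the delta2=.1 used to draw the sample.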
Notice that, as proven by Iannario, 2012 (pp 7-8), CUB models with shelter effect generate a perfect fit at \(R=c\). It can be easily proven that CUSH2 models too generate perfect fits at both \(R=c_1\) and \(R=c_2\). Indeed, we can also graphically see that the estimated probability distribution is closer to the observed sample than the assumed model used to draw the sample, because of the perfect fits generated at \(R=2\) and \(R=8\).
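The perfect fit at the two shelter categories can be illustrated with a short calculation. Without covariates, and assuming both estimates fall inside the parameter space, the CUSH2 likelihood depends on the data only through the relative frequencies f1 and f2 at the shelters, and it is maximised when the fitted probabilities at c1 and c2 equal f1 and f2. The sketch below solves for the corresponding delta values; the function name and the observed counts are hypothetical, chosen only for illustration.

```python
def cush2_closed_form(counts, sh):
    """Closed-form CUSH2 estimates from observed counts (illustrative helper).

    Valid only when both returned values are non-negative.
    """
    m = len(counts)
    n = sum(counts)
    c1, c2 = sh
    f1 = counts[c1 - 1] / n
    f2 = counts[c2 - 1] / n
    # common fitted probability of every non-shelter category
    u = (1 - f1 - f2) / (m - 2)
    return f1 - u, f2 - u

# Hypothetical observed counts for m=7 categories, shelters at 1 and 4
counts = [260, 105, 100, 215, 110, 100, 110]
n, m = sum(counts), len(counts)

delta1, delta2 = cush2_closed_form(counts, sh=(1, 4))

# Fitted probabilities at the shelters reproduce the observed frequencies
fitted_c1 = delta1 + (1 - delta1 - delta2) / m
fitted_c2 = delta2 + (1 - delta1 - delta2) / m
print(f"delta1={delta1:.3f}  fitted={fitted_c1:.3f}  observed={counts[0] / n:.3f}")
print(f"delta2={delta2:.3f}  fitted={fitted_c2:.3f}  observed={counts[3] / n:.3f}")
```

The remaining categories are all fitted with the same probability u, so any lack of fit is confined to the non-shelter categories.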
CUBE family#
Family of the class CUBE (Combination of Uniform and BEta-Binomial). CUB models are nested into CUBE models: in fact, a CUB model is equal to a CUBE model with the overdispersion parameter \(\phi=0\). Notice that \(0\leq\phi\leq0.2\) is the usual range of the overdispersion parameter.
See the references for details: Iannario, 2014; Piccolo, 2015; Piccolo and Simone, 2019.
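The nesting of CUB into CUBE can be checked numerically. The sketch below assumes the standard parameterisation in which the feeling component is a shifted beta-binomial with mean parameter 1-xi and overdispersion phi, written as a product so that phi=0 is allowed. The function names are hypothetical and this is not the cubmods implementation.

```python
from math import comb

def cube_pmf(m, pi, xi, phi):
    """CUBE probabilities for r = 1, ..., m (illustrative helper)."""
    # normalising product, common to all categories
    den = 1.0
    for k in range(m - 1):
        den *= 1 + phi * k
    probs = []
    for r in range(1, m + 1):
        num = 1.0
        for k in range(r - 1):          # r-1 factors for 1-xi
            num *= (1 - xi) + phi * k
        for k in range(m - r):          # m-r factors for xi
            num *= xi + phi * k
        betabin = comb(m - 1, r - 1) * num / den
        probs.append(pi * betabin + (1 - pi) / m)
    return probs

def cub_pmf(m, pi, xi):
    """CUB probabilities: shifted binomial mixed with a discrete uniform."""
    return [
        pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        + (1 - pi) / m
        for r in range(1, m + 1)
    ]

cube = cube_pmf(m=9, pi=0.7, xi=0.3, phi=0.15)
cube_phi0 = cube_pmf(m=9, pi=0.7, xi=0.3, phi=0.0)
cub = cub_pmf(m=9, pi=0.7, xi=0.3)

print("sum of CUBE probabilities:", round(sum(cube), 12))
print("max |CUBE(phi=0) - CUB|  :", max(abs(a - b) for a, b in zip(cube_phi0, cub)))
```

With phi=0 every factor phi*k vanishes and the beta-binomial collapses to the shifted binomial of the CUB model.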
Without covariates#
In this example, we’ll draw a sample from a CUBE model and then will estimate the parameters given the observed sample.
Notice that, since the model is not the default "cub", we need to specify it.
The .plot() method of the object fit will show trivariate and bivariate confidence regions too, as in CUBSH models. See here for the values of confidence levels.
# import libraries
import matplotlib.pyplot as plt
from cubmods.gem import draw, estimate

# draw a sample
drawn = draw(
    formula="ord ~ 0 | 0 | 0",
    model="cube",
    m=9, pi=.7, xi=.3, phi=.15,
    n=500, seed=1)

# inferential method on drawn sample
fit = estimate(
    df=drawn.df,
    formula="ord~0|0|0",
    model="cube",
    ass_pars={
        "pi": drawn.pars[0],
        "xi": drawn.pars[1],
        "phi": drawn.pars[2],
    }
)
# print the summary of MLE
print(fit.summary())
# show the plot of MLE
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUBE model <<<===== ML-estimates
=======================================================================
m=9 Size=500 Iterations=62 Maxiter=1000 Tol=1E-06
-----------------------------------------------------------------------
Uncertainty
Estimates StdErr Wald p-value
pi 0.577 0.0633 9.108 0.0000
-----------------------------------------------------------------------
Feeling
Estimates StdErr Wald p-value
xi 0.251 0.0217 11.560 0.0000
-----------------------------------------------------------------------
Overdispersion
Estimates StdErr Wald p-value
phi 0.111 0.0402 2.754 0.0059
=======================================================================
Dissimilarity = 0.0426
Loglik(sat) = -1037.855
Loglik(MOD) = -1041.100
Loglik(uni) = -1098.612
Mean-loglik = -2.082
Deviance = 6.491
-----------------------------------------------------------------------
AIC = 2088.20
BIC = 2100.84
=======================================================================
Elapsed time=0.07919 seconds =====>>> Fri Aug 16 12:18:49 2024
=======================================================================

With covariates#
Currently, as in the R package CUB, two CUBE models with covariates have been defined and implemented: for the feeling only and for all components.
Nevertheless, the symbol 1 can always be used in the formula for different combinations of covariates.
In this example, we’ll draw a sample with covariates for feeling only and then will estimate the parameters given the observed sample.
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cubmods.general import expit, logit
from cubmods.gem import draw, estimate

# Draw a random sample
n = 1000
np.random.seed(76)
W = np.random.randint(1, 10, n)
df = pd.DataFrame({
    "W": W,
})
drawn = draw(
    formula="fee ~ 0 | W | 0",
    model="cube",
    df=df,
    m=9,
    pi=.8,
    gamma=[logit(.3), -.1],
    phi=.12,
)

# MLE estimation
fit = estimate(
    formula="fee ~ 0 | W | 0",
    model="cube",
    df=drawn.df,
    ass_pars={
        "pi": drawn.pars[0],
        "gamma": drawn.pars[1:-1],
        "phi": drawn.pars[-1]
    }
)
# Print MLE summary
print(fit.summary())
# plot the results
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUBE(0W0) model <<<===== ML-estimates
=======================================================================
m=9 Size=1000
-----------------------------------------------------------------------
Uncertainty
Estimates StdErr Wald p-value
pi 0.815 0.0343 23.733 0.0000
-----------------------------------------------------------------------
Feeling
Estimates StdErr Wald p-value
constant -0.770 0.1012 -7.612 0.0000
W -0.116 0.0191 -6.052 0.0000
-----------------------------------------------------------------------
Overdisperson
Estimates StdErr Wald p-value
phi 0.150 0.0260 5.779 0.0000
=======================================================================
Dissimilarity = 0.0183
Loglik(MOD) = -1886.654
Loglik(uni) = -2197.225
Mean-loglik = -1.887
-----------------------------------------------------------------------
AIC = 3781.31
BIC = 3800.94
=======================================================================
Elapsed time=2.30903 seconds =====>>> Fri Aug 16 12:31:10 2024
=======================================================================

Notice that essentially the same results can be achieved using a CUBE model with covariates for all components and passing the symbol 1 to the uncertainty and overdispersion components.
# MLE estimation
fit = estimate(
    formula="fee ~ 1 | W | 1",
    model="cube",
    df=drawn.df,
    ass_pars={
        "beta": [logit(drawn.pars[0])],
        "gamma": drawn.pars[1:3],
        "alpha": [np.log(drawn.pars[3])]
    }
)
# Print MLE summary
print(fit.summary())
# plot the results
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> CUBE(YWZ) model <<<===== ML-estimates
=======================================================================
m=9 Size=1000 Iterations=29 Maxiter=1000 Tol=1E-02
-----------------------------------------------------------------------
Uncertainty
Estimates StdErr Wald p-value
constant 1.423 0.2183 6.518 0.0000
-----------------------------------------------------------------------
Feeling
Estimates StdErr Wald p-value
constant -0.778 0.1018 -7.639 0.0000
W -0.117 0.0193 -6.074 0.0000
-----------------------------------------------------------------------
Overdispersion
Estimates StdErr Wald p-value
constant -1.930 0.1756 -10.989 0.0000
=======================================================================
Dissimilarity = 0.0239
Loglik(MOD) = -1886.690
Loglik(uni) = -2197.225
Mean-loglik = -1.887
-----------------------------------------------------------------------
AIC = 3781.38
BIC = 3801.01
=======================================================================
Elapsed time=50.02969 seconds =====>>> Fri Aug 16 12:33:36 2024
=======================================================================

In fact:
# back-transform the constants to the original parameter scale
est_pi = expit(fit.estimates[0])
est_ph = np.exp(fit.estimates[3])
est_pi_se = expit(fit.estimates[0]+fit.stderrs[0]) - est_pi
est_ph_se = np.exp(fit.estimates[3]+fit.stderrs[3]) - est_ph
print(
    " estimates stderr\n"
    f"pi {est_pi:.4f} {est_pi_se:.4f}"
    "\n"
    f"phi {est_ph:.4f} {est_ph_se:.4f}"
)
estimates stderr
pi 0.8058 0.0319
phi 0.1451 0.0279
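As an alternative to the one-sided shift used above, standard errors on the original scale can be approximated with the delta method, which multiplies the standard error of the transformed parameter by the derivative of the back-transformation. The sketch below is not part of cubmods; it assumes the logit link for pi and the log link for phi implied by the example, and it uses the rounded constants from the second summary.

```python
from math import exp

def expit(x):
    """Inverse logit: maps a real number to (0, 1)."""
    return 1 / (1 + exp(-x))

# Constants and StdErr from the CUBE(YWZ) summary (rounded as printed)
beta0, beta0_se = 1.423, 0.2183      # uncertainty, logit scale
alpha0, alpha0_se = -1.930, 0.1756   # overdispersion, log scale

pi_hat = expit(beta0)
phi_hat = exp(alpha0)

# Delta method: se(g(t)) is approximately |g'(t)| * se(t)
pi_se = pi_hat * (1 - pi_hat) * beta0_se    # derivative of expit is p(1-p)
phi_se = phi_hat * alpha0_se                # derivative of exp is exp

print(f"pi  {pi_hat:.4f} {pi_se:.4f}")
print(f"phi {phi_hat:.4f} {phi_se:.4f}")
```

Under these assumptions the approximate standard errors are about 0.034 for pi and 0.025 for phi, close to the StdErr values 0.0343 and 0.0260 reported by the first fit, where pi and phi were estimated directly.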
IHG family#
Family of the class IHG (Inverse HyperGeometric).
See the references for details: D’Elia, 2003; D’Elia et al., 2005; Piccolo and Simone, 2019.
Without covariates#
In this example, we’ll draw a sample from an IHG model and then estimate the parameter from the observed sample.
# import libraries
import matplotlib.pyplot as plt
from cubmods.gem import draw, estimate

# draw a sample
drawn = draw(
    formula="ord ~ 0",
    model="ihg",
    m=10, theta=.2,
    n=500, seed=42)

# inferential method on drawn sample
fit = estimate(
    df=drawn.df,
    formula="ord ~ 0",
    model="ihg",
    ass_pars={
        "theta": drawn.pars[0],
    }
)
# print the summary of MLE
print(fit.summary())
# show the plot of MLE
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> IHG model <<<===== ML-estimates
=======================================================================
m=10 Size=500
-----------------------------------------------------------------------
Theta
Estimates StdErr Wald p-value
theta 0.200 0.0086 23.292 0.0000
=======================================================================
Dissimilarity = 0.0639
Loglik(sat) = -1044.100
Loglik(MOD) = -1050.513
Loglik(uni) = -1151.293
Mean-loglik = -2.101
Deviance = 12.824
-----------------------------------------------------------------------
AIC = 2103.03
BIC = 2107.24
=======================================================================
Elapsed time=0.00464 seconds =====>>> Fri Aug 16 12:47:55 2024
=======================================================================

With covariates#
In this example we’ll draw a sample from an IHG model with two covariates and then estimate the parameters given the observed sample. Notice that IHG models without covariates are unimodal, whereas IHG models with covariates can be bimodal, as the one in the following example.
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cubmods.gem import draw, estimate
from cubmods.general import logit

# Draw a random sample
n = 1000
np.random.seed(1)
V1 = np.random.random(n)
np.random.seed(42)
V2 = np.random.random(n)
df = pd.DataFrame({
    "V1": V1, "V2": V2
})

# draw a sample
drawn = draw(
    df=df,
    formula="ord ~ V1 + V2",
    model="ihg",
    m=10,
    nu=[logit(.1), -2, 3],
    seed=42)

# inferential method on drawn sample
fit = estimate(
    df=drawn.df,
    formula=drawn.formula,
    model="ihg",
    ass_pars={
        "nu": drawn.pars,
    }
)
# print the summary of MLE
print(fit.summary())
# show the plot of MLE
fit.plot()
plt.show()
warnings.warn("No m given, max(ordinal) has been taken")
=======================================================================
=====>>> IHG(V) model <<<===== ML-estimates
=======================================================================
m=10 Size=1000
-----------------------------------------------------------------------
Theta
Estimates StdErr Wald p-value
constant -2.368 0.0998 -23.741 0.0000
V1 -1.973 0.1438 -13.721 0.0000
V2 3.230 0.1451 22.261 0.0000
=======================================================================
Dissimilarity = 0.0455
Loglik(MOD) = -1958.475
Loglik(uni) = -2302.585
Mean-loglik = -1.958
-----------------------------------------------------------------------
AIC = 3922.95
BIC = 3937.67
=======================================================================
Elapsed time=1.10664 seconds =====>>> Fri Aug 16 12:53:12 2024
=======================================================================
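The remark that IHG models with covariates can be bimodal can be illustrated with a small calculation. The sketch below assumes the usual recursive form of the IHG probabilities, with Pr(R=1)=theta; the function name ihg_pmf is hypothetical and is not the cubmods API. A single theta gives a monotone distribution, while the average of two distributions with very different theta values, as happens when a covariate splits the sample, has a mode at each end of the support.

```python
def ihg_pmf(m, theta):
    """IHG probabilities for r = 1, ..., m via the usual recursion (illustrative)."""
    probs = [theta]
    for j in range(1, m):
        ratio = (1 - theta) * (m - j) / (m - j - 1 + j * theta)
        probs.append(probs[-1] * ratio)
    return probs

m = 10
low = ihg_pmf(m, theta=0.02)    # increasing: mode at r = m
high = ihg_pmf(m, theta=0.50)   # decreasing: mode at r = 1

# Average distribution for two equally sized groups of respondents
mix = [(a + b) / 2 for a, b in zip(low, high)]

for r, p in enumerate(mix, start=1):
    print(f"Pr(R={r}) = {p:.4f}")
```

The two theta values are arbitrary choices for illustration and are unrelated to the parameters of the example above.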

MULTICUB#
See Piccolo and Simone, 2019 as a reference.
With the multicub tool, parameters estimated from multiple observed samples can be shown in a single plot.
In this example, we’ll draw three samples from CUBE models and manually add a shelter category. Then we’ll use the multicub tool for CUB models, CUBE models and CUBSH models (the latter are not yet implemented for the multicub tool in the R package CUB).
Notice that, since the samples are drawn from a “CUBE model with shelter effect” (which has not been implemented yet), the estimated parameters’ values will differ from the theoretical ones of the specified CUBE model used to draw the sample.
The multicub tool in the cubmods package can also show confidence ellipses for CUB models.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cubmods.gem import draw
from cubmods.multicub import multi

# draw random samples
df = pd.DataFrame()
for i, (pi, xi, phi) in enumerate(
    zip([.9, .8, .7], [.3, .5, .7], [.05, .1, .15])
):
    drawn = draw(
        formula="ord ~ 0 | 0 | 0",
        m=9, model="cube", n=1000,
        pi=pi, xi=xi, phi=phi,
        seed=1976
    )
    # add a shelter category at c=1
    df[f"ord{i+1}"] = np.concatenate((
        drawn.rv, np.repeat(1, 25)
    ))

# MULTI-CUB
multi(
    ords=df, ms=9, model="cub"
)
plt.show()
# MULTI-CUBE
multi(
    ords=df, ms=9, model="cube"
)
plt.show()
# MULTI-CUBSH
multi(
    ords=df, ms=9, model="cub", shs=1,
    pos=[1, 6, 2]
)
plt.show()
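Appending 25 responses at category 1 to each sample of 1000 draws amounts to mixing the drawn distribution with a degenerate distribution at c=1. The sketch below computes the implied shelter weight and the contaminated probabilities; the function name contaminate is hypothetical, and the uniform distribution is only a placeholder for the CUBE probabilities of the example.

```python
def contaminate(probs, c, n, extra):
    """Mix a distribution with `extra` additional answers at category c.

    probs : probabilities of the model used to draw n observations
    Returns the shelter weight and the resulting probabilities.
    """
    w = extra / (n + extra)                 # share of the added answers
    mixed = [(1 - w) * p for p in probs]
    mixed[c - 1] += w                       # extra mass at the shelter
    return w, mixed

m = 9
placeholder = [1 / m] * m   # stand-in distribution, NOT the CUBE model of the example
w, mixed = contaminate(placeholder, c=1, n=1000, extra=25)

print(f"shelter weight = {w:.4f}")
print(f"Pr(R=1) after contamination = {mixed[0]:.4f}")
```

The shelter weight is 25/1025, roughly 2.4% of each sample, which is why the CUB and CUBE estimates move away from the parameters used to draw the samples.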


