BIP.Bayes
Basic likelihood tools, such as functions for computing likelihoods, Latin Hypercube sampling (efficient random sampling), and other tools that don't belong in other packages or that apply to multiple packages.
Melding Module
-
class BIP.Bayes.Melding.FitModel(K, model, inits, tf, thetanames, phinames, wl=1, nw=1, verbose=False, burnin=1000, constraints=[])
Fits a model to data, generating Bayesian posterior distributions of the model's inputs and outputs. The fitting process can be monitored via a curses interface.
-
AIC_from_RSS()
Calculates the Akaike information criterion (AIC) from the residual sum of squares (RSS) of the best-fitting run.
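For least-squares fits with Gaussian errors, a common form of the criterion is AIC = n·ln(RSS/n) + 2k. Here n is the number of observations and k the number of fitted parameters. The exact expression AIC_from_RSS uses is not stated here. The sketch below assumes this standard form, and aic_from_rss is a hypothetical name.

```python
import math

def aic_from_rss(rss, n, k):
    """AIC of a least-squares fit (standard form, up to an additive constant).

    rss: residual sum of squares of the best-fitting run
    n:   number of observations
    k:   number of fitted parameters
    """
    return n * math.log(rss / n) + 2 * k

# Lower AIC is better; the 2k term penalizes additional parameters.
print(aic_from_rss(rss=12.0, n=30, k=2))
```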
-
do_inference(prior, data, predlen, method, likvar, likfun=<functools._lru_cache_wrapper object>)
Calls the samplers and does the actual inference.
Parameters: - likfun – Likelihood function
- prior –
- data –
- predlen –
- method –
- likvar –
-
optimize(data, p0, optimizer='scipy', tol=0.0001, verbose=0, plot=0)
Finds the best-fitting parameters using an optimization approach.
Parameters: - data: Dictionary of observed series.
- p0: Sequence (list or tuple) of initial values for the parameters.
- optimizer: Optimization library to use: 'scipy' for fmin (Nelder-Mead) or 'oo' for OpenOpt.NLP.
- tol: Tolerance of the error.
- verbose: If true, shows statistics of the optimization run at the end.
- plot: If true, plots a run based on the optimized parameters.
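With optimizer='scipy', the entry says the search uses scipy's fmin (Nelder-Mead). The sketch below shows that idea in isolation. It minimizes the sum of squared residuals between a hypothetical growth model and synthetic observations. The model, the data, and the sum-of-squares objective are illustrative assumptions, not BIP's internals.

```python
import numpy as np
from scipy.optimize import fmin

def growth(params, n):
    """Hypothetical model: P_t = p0 * r**t for t = 1..n."""
    r, p0 = params
    return p0 * r ** np.arange(1, n + 1)

obs = growth((1.2, 5.0), 10)  # synthetic, noise-free observations

def sse(params):
    # Objective: sum of squared residuals between model output and data.
    return np.sum((growth(params, len(obs)) - obs) ** 2)

p0 = [1.1, 4.0]  # initial guess, playing the role of optimize()'s p0
best = fmin(sse, p0, xtol=1e-8, ftol=1e-8, maxiter=5000, maxfun=10000, disp=False)
print(best)
```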
-
plot_results(names=[], dbname='results', savefigs=0)
Plots the final results of the inference.
-
prior_sample()
Generates a set of samples from the starting Theta prior distributions for reporting purposes.
Returns: Dictionary of (name, sample) pairs.
-
run(data, method, likvar, likfun='Normal', pool=False, adjinits=True, ew=0, dbname='results', monitor=False, initheta=[])
Fits the model to the data.
Parameters: - data: Dictionary mapping variable names (keys) to observed series (values).
- method: Inference method: “ABC”, “SIR”, “MCMC” or “DREAM”.
- likfun: Likelihood function to use. Currently supported: “Normal” and “Poisson”.
- likvar: Variance of the likelihood function in the SIR and MCMC methods.
- pool: Whether to pool the priors on the model's outputs.
- adjinits: Whether to adjust inits to the data.
- ew: Whether to use expanding windows instead of moving ones.
- dbname: Name of the SQLite3 database.
- monitor: Whether to monitor certain variables during the inference. If not False, it should be a list of valid phi variable names.
- initheta: Starting position in parameter space for the sampler (used only by MCMC and DREAM).
-
set_priors(tdists, tpars, tlims, pdists, ppars, plims)
Sets the prior distributions for Phi and Theta.
Parameters: - pdists: Distributions for the output variables. For example: [scipy.stats.uniform, scipy.stats.norm]
- ppars: Parameters for the distributions in pdists. For example: [(0, 1), (0, 1)]
- plims: Limits of the range of each phi. List of (min, max) tuples.
- tdists: Same as pdists, but for the input parameters (Theta).
- tpars: Same as ppars, but for tdists.
- tlims: Limits of the range of each theta. List of (min, max) tuples.
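The sketch below shows how the three Theta lists line up, under two assumptions:

- Each pars tuple is unpacked into its scipy.stats distribution, so for uniform and norm it is (loc, scale).
- Out-of-range draws are simply rejected.

draw_prior is a hypothetical helper, and BIP's own handling of the limits may differ.

```python
import numpy as np
from scipy import stats

# Hypothetical example with two input parameters (Theta).
tdists = [stats.uniform, stats.norm]   # one scipy.stats distribution per theta
tpars = [(0, 1), (0, 1)]               # (loc, scale) for each distribution
tlims = [(0, 1), (-3, 3)]              # (min, max) range for each theta

def draw_prior(dists, pars, lims, size, seed=0):
    """Draw `size` samples per variable, discarding values outside the limits."""
    rng = np.random.default_rng(seed)
    samples = []
    for dist, par, (lo, hi) in zip(dists, pars, lims):
        frozen = dist(*par)            # freeze the distribution with its parameters
        kept = np.empty(0)
        while kept.size < size:
            draw = frozen.rvs(size=size, random_state=rng)
            kept = np.concatenate([kept, draw[(draw >= lo) & (draw <= hi)]])
        samples.append(kept[:size])
    return samples

theta_prior = draw_prior(tdists, tpars, tlims, size=1000)
```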
-
-
class BIP.Bayes.Melding.Meld(K, L, model, ntheta, nphi, alpha=0.5, verbose=0, viz=False)
Bayesian Melding class.
-
abcRun(fitfun=None, data={}, t=1, pool=False, savetemp=False)
Runs the model for inference using Approximate Bayesian Computation (ABC) techniques. This method is an alternative to sir.
Parameters: - fitfun: Callable that returns the goodness of fit of the model to the data as a number between 0 and 1, where 1 means a perfect fit.
- t: Number of time steps to retain at the end of the model run for fitting purposes.
- data: Dict containing the observed time series (lists of length t) of the state variables. This dict must have as many items as there are state variables, with keys matching the variable names. Unobserved variables must have an empty list as their value.
- pool: If True, pools the user-provided priors on the model's outputs with the model-induced priors.
- savetemp: Whether temporary results should be saved. Useful for long runs, since it allows resuming the simulation from the last saved state.
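ABC replaces the likelihood with a goodness-of-fit score. Below is a minimal rejection-ABC sketch. It assumes a fitfun that scores a simulated series against an observed one in [0, 1], as the parameter description requires. The following are illustrative choices, not BIP's implementation:

- the growth model,
- the 1/(1 + RMSD) score,
- the uniform prior,
- the 1% acceptance fraction.

```python
import numpy as np

def model(r, n=10, p0=1.0):
    """Hypothetical model: geometric growth P_t = p0 * r**t."""
    return p0 * r ** np.arange(1, n + 1)

def fitfun(sim, obs):
    """Goodness of fit in (0, 1]; 1 means a perfect fit."""
    rmsd = np.sqrt(np.mean((sim - obs) ** 2))
    return 1.0 / (1.0 + rmsd)

def abc_rejection(obs, n_draws=20000, keep=0.01, seed=1):
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(0.8, 1.5, size=n_draws)          # prior on r
    fits = np.array([fitfun(model(r, len(obs)), obs) for r in thetas])
    cutoff = np.quantile(fits, 1 - keep)                  # keep the best-fitting 1%
    return thetas[fits >= cutoff]

obs = model(1.1)
posterior = abc_rejection(obs)
```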
-
add_salt(dataset, band)
Adds a few extra data points, drawn from a uniform distribution, beyond the range of the dataset.
Parameters: - dataset: Vector of data.
- band: Fraction of the range to extend, in [0, 1).
Returns: Salted dataset.
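The entry does not say how many points are added or on which side. The sketch below assumes the same number of uniform draws just below the minimum and just above the maximum, each within band times the data range. n_extra is a hypothetical parameter.

```python
import numpy as np

def add_salt(dataset, band, n_extra=5, seed=None):
    """Append uniformly distributed points just outside the data range.

    band:    fraction of the range to extend on each side, 0 <= band < 1
    n_extra: number of points added on each side (hypothetical parameter)
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(dataset, dtype=float)
    lo, hi = data.min(), data.max()
    ext = band * (hi - lo)                            # width of the extension
    below = rng.uniform(lo - ext, lo, size=n_extra)   # points below the minimum
    above = rng.uniform(hi, hi + ext, size=n_extra)   # points above the maximum
    return np.concatenate([data, below, above])

salted = add_salt([2.0, 3.5, 4.0, 6.0], band=0.1, seed=0)
```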
-
current_plot(series, data, idx, vars=[], step=0)
Plots the last simulated series along with the data.
Parameters: - series: Record array with the simulated series.
- data: Dictionary with the full dataset.
- idx: Integer index of the curve to plot.
- vars: List of variable names to plot.
- step: Step of the chain.
-
filtM(cond, x, limits)
Multiple-condition filtering. Removes values in x[i] if the corresponding values in cond[i] are less than limits[i][0] or greater than limits[i][1].
Parameters: - cond: Array of conditions.
- limits: List of (ll, ul) tuples, with length equal to the number of rows in cond and x.
- x: Array to be filtered.
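Below is a numpy sketch of multiple-condition filtering. It assumes cond and x are 2-D arrays with the same number of columns. It also assumes a column is dropped from x when any of its conditions falls outside the matching limits, which keeps the result rectangular. Whether BIP drops whole columns or filters row by row is an assumption here, and filt_m is a hypothetical name.

```python
import numpy as np

def filt_m(cond, x, limits):
    """Keep the columns j of x for which every cond[i, j] lies within limits[i]."""
    cond = np.asarray(cond, dtype=float)
    x = np.asarray(x)
    keep = np.ones(cond.shape[1], dtype=bool)
    for row, (ll, ul) in zip(cond, limits):
        keep &= (row >= ll) & (row <= ul)   # all conditions must hold
    return x[:, keep]

cond = np.array([[0.1, 0.5, 0.9],
                 [10., 20., 30.]])
x = np.array([[1, 2, 3],
              [4, 5, 6]])
filtered = filt_m(cond, x, [(0, 0.6), (15, 35)])
```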
-
getPosteriors(t)
Updates the posteriors of the model's output for the last t time steps. Returns two record arrays:
- The posteriors of Theta.
- The posteriors of Phi for the last t values of the time series (self.L by t arrays).
Parameters: - t: Length of the posterior time series to return.
-
imp_sample(n, data, w)
Importance sampling.
Returns: A sample of size n.
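The signature suggests the classic weighted-resampling step: draw n values from data with probability proportional to the importance weights w. Below is a sketch under that assumption, with sampling with replacement and non-negative weights.

```python
import numpy as np

def imp_sample(n, data, w, seed=None):
    """Resample n items from data with probabilities proportional to w."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    return rng.choice(np.asarray(data), size=n, replace=True, p=w / w.sum())

sample = imp_sample(1000, [1.0, 2.0, 3.0], [0.0, 1.0, 3.0], seed=0)
```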
-
logPooling(phi)
Returns the probability associated with each phi[i] under the pooled pdf of phi and q2phi.
Parameters: - phi: Prior of Phi induced by the model and q1theta.
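In Bayesian melding, the model-induced prior on Phi and the user-supplied prior are usually combined by logarithmic pooling: q(phi) ∝ q1(phi)^alpha · q2(phi)^(1 - alpha). Here alpha is presumably the alpha argument of Meld (default 0.5). The sketch makes three assumptions:

- A Gaussian kernel density estimate stands in for the model-induced density.
- A frozen scipy.stats distribution stands in for q2phi.
- The pooled values are normalized over the sample.

```python
import numpy as np
from scipy import stats

def log_pooling_weights(phi, q2phi, alpha=0.5):
    """Pooled density at each phi[i], normalized over the sample.

    phi:   1-D sample of Phi induced by the model and the Theta prior
    q2phi: frozen scipy.stats distribution, the prior placed directly on Phi
    alpha: pooling weight given to the model-induced prior
    """
    q1 = stats.gaussian_kde(phi)(phi)   # KDE estimate of the model-induced density
    q2 = q2phi.pdf(phi)                 # density of the direct prior on Phi
    pooled = q1 ** alpha * q2 ** (1 - alpha)
    return pooled / pooled.sum()

rng = np.random.default_rng(0)
phi = rng.normal(1.0, 0.5, size=500)
w = log_pooling_weights(phi, stats.norm(0.8, 0.3))
```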
-
mcmc_run(data, t=1, likvariance=10, burnin=1000, nopool=False, method='MH', constraints=[], likfun=<functools._lru_cache_wrapper object>)
MCMC-based fitting.
Parameters: - data: Observed time series of the model's output.
- t: Length of the observed time series.
- likvariance: Variance of the Normal likelihood function.
- nopool: True if no priors on the outputs are available. This leads to faster calculations.
- method: Step method. Defaults to Metropolis-Hastings ('MH').
- constraints:
- likfun: Likelihood function.
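method='MH' selects Metropolis-Hastings. The generic random-walk version is sketched below with a Normal likelihood of fixed variance, mirroring likvariance. The flat prior, the step size, and the one-parameter target are illustrative assumptions, not how BIP builds its posterior.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_iter=5000, burnin=1000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings; returns the chain after burn-in."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post(theta)
    chain = []
    for i in range(burnin + n_iter):
        proposal = theta + rng.normal(scale=step, size=theta.shape)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        if i >= burnin:
            chain.append(theta.copy())
    return np.array(chain)

# Hypothetical data: Normal likelihood with fixed variance and a flat prior on the mean.
rng = np.random.default_rng(42)
data = rng.normal(3.0, 1.0, size=50)
likvariance = 1.0

def log_post(theta):
    return -0.5 * np.sum((data - theta[0]) ** 2) / likvariance

chain = metropolis_hastings(log_post, theta0=[0.0])
```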
-
run(*args)
Runs the model through the Melding inference. model is a callable that returns the output of the deterministic model, i.e., the model itself. The model is run self.K times to obtain phi = M(theta).
-
runModel(savetemp, t=1, k=None)
Runs the model k times, keeping a temporary save file so that the calculation can be resumed after an interruption.
Parameters: - savetemp: Boolean. Whether to create a temp file.
- t: Number of time steps.
Returns: - self.phi: A record array of shape (k, t) with the results.
-
setPhi(names, dists=[scipy.stats.norm], pars=[(0, 1)], limits=[(-5, 5)])
Sets up the model's outputs (Phi) and generates the samples from the prior distributions needed for the melding replicates.
Parameters: - names: List of strings with the names of the variables.
- dists: List of distributions from scipy.stats.
- pars: List of parameter tuples, one for each prior distribution.
- limits: Lower and upper limits on the support of the variables.
-
setPhiFromData(names, data, limits)
Sets up the model outputs and sets their prior distributions from the vectors in data. Use this method when the prior distributions are available as a sample from an empirical distribution, such as a Bayesian posterior. To expand the samples provided, K samples are generated from a kernel density estimate of the original sample.
Parameters: - names: List of strings with the names of the variables.
- data: List of vectors. Samples of the proposed distribution.
- limits: List of (ll, ul) tuples giving the lower and upper limits on the support of each variable.
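The kernel density step can be sketched with scipy.stats.gaussian_kde: fit a KDE to the supplied sample, resample K values, and discard those outside the variable's limits. The rejection step and the helper name kde_expand are assumptions for illustration. BIP's own handling of the limits may differ.

```python
import numpy as np
from scipy import stats

def kde_expand(sample, K, limits, seed=None):
    """Draw K values from a Gaussian KDE of `sample`, rejecting values outside limits."""
    lo, hi = limits
    kde = stats.gaussian_kde(np.asarray(sample, dtype=float))
    rng = np.random.default_rng(seed)
    out = np.empty(0)
    while out.size < K:
        draw = kde.resample(K, seed=rng)[0]   # shape (1, K) for 1-D data
        out = np.concatenate([out, draw[(draw >= lo) & (draw <= hi)]])
    return out[:K]

posterior_sample = np.random.default_rng(0).gamma(2.0, 1.0, size=200)
phi_prior = kde_expand(posterior_sample, K=5000, limits=(0, 10), seed=1)
```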
-
setTheta(names, dists=[scipy.stats.norm], pars=[(0, 1)], lims=[(0, 1)])
Sets up the model's inputs (Theta) and generates the samples from the prior distributions needed for the melding replicates.
Parameters: - names: List of strings with the names of the parameters.
- dists: List of distributions from scipy.stats.
- pars: List of parameter tuples, one for each prior distribution in dists.
-
setThetaFromData(names, data, limits)
Sets up the model inputs and sets their prior distributions from the vectors in data. Use this method when the prior distributions are available as a sample from an empirical distribution, such as a Bayesian posterior. To expand the samples provided, K samples are generated from a kernel density estimate of the original sample.
Parameters: - names: List of strings with the names of the parameters.
- data: List of vectors. Samples of a proposed distribution.
- limits: List of (min, max) tuples, one for each theta, to make sure samples are not generated outside these limits.
-
simple_plot(data, theta)
Evaluates the model once with theta as parameters and plots the result alongside the data.
-
sir(data={}, t=1, variance=0.1, pool=False, savetemp=False, likfun=<functools._lru_cache_wrapper object>)
Runs the model output through the Sampling-Importance-Resampling (SIR) algorithm. Returns 1 if successful, 0 otherwise.
Parameters: - data: Observed time series of the model's output.
- t: Length of the observed time series.
- variance: Variance of the Normal likelihood function.
- pool: False if no priors on the outputs are available. This leads to faster calculations.
- savetemp: Boolean. Whether to create a temp file.
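The three SIR steps are sampling from the prior, weighting by the likelihood, and resampling by weight. The sketch below shows them with a Normal likelihood whose variance plays the role of the variance argument. The model, the uniform prior, and the sample sizes are illustrative assumptions, not BIP's defaults.

```python
import numpy as np

def model(r, n):
    """Hypothetical model: P_t = r**t for t = 1..n (initial size 1)."""
    return r ** np.arange(1, n + 1)

def sir(obs, K=20000, n_post=2000, variance=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Sampling: draw K parameter values from the prior.
    thetas = rng.uniform(0.9, 1.3, size=K)
    # 2. Importance: weight each draw by its Normal log-likelihood.
    loglik = np.array([-0.5 * np.sum((model(r, len(obs)) - obs) ** 2) / variance
                       for r in thetas])
    w = np.exp(loglik - loglik.max())          # subtract the max for numerical stability
    # 3. Resampling: draw the posterior sample in proportion to the weights.
    return rng.choice(thetas, size=n_post, replace=True, p=w / w.sum())

post = sir(model(1.1, 10))
```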
-
-
BIP.Bayes.Melding.basicfit(s1, s2)
Calculates a basic goodness-of-fit measure between a model-generated time series and an observed time series, based on the mean squared error.
Parameters: - s1: Model-generated time series. Record array.
- s2: Observed time series. Dictionary with keys matching the names of s1.
Return: Root mean square deviation between s1 and s2.
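Below is a sketch of a per-variable RMSD. It assumes s1 is a record array with one field per variable and s2 a dict of observed series of the same length. Skipping unobserved (empty) series and averaging over variables are also assumptions, since the entry does not say how several variables are combined. rmsd_fit is a hypothetical name.

```python
import numpy as np

def rmsd_fit(s1, s2):
    """Mean over observed variables of the RMSD between simulated and observed series."""
    devs = []
    for name in s1.dtype.names:
        obs = np.asarray(s2.get(name, []), dtype=float)
        if obs.size == 0:
            continue                      # unobserved variable: nothing to compare
        sim = np.asarray(s1[name], dtype=float)
        devs.append(np.sqrt(np.mean((sim - obs) ** 2)))
    return float(np.mean(devs))

s1 = np.rec.fromarrays([[1., 2., 3.], [0., 0., 0.]], names="S,I")
s2 = {"S": [1., 2., 5.], "I": []}
fit = rmsd_fit(s1, s2)
```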
-
BIP.Bayes.Melding.clearNaN(obs)
Loops through an array with data series as columns and replaces NaNs with the mean of the other series.
Parameters: - obs: 2-dimensional numpy array.
Returns: Array of the same shape as obs.
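The sketch reads "the mean of the other series" as the row-wise mean of the non-missing values, with columns as series and rows as time steps, and uses numpy.nanmean. This interpretation is an assumption, and clear_nan is a hypothetical name. A row that is entirely NaN stays NaN.

```python
import numpy as np

def clear_nan(obs):
    """Replace each NaN with the mean of the non-NaN values in the same row."""
    obs = np.array(obs, dtype=float)          # work on a copy
    row_means = np.nanmean(obs, axis=1)       # mean of the other series at each time step
    rows, cols = np.where(np.isnan(obs))
    obs[rows, cols] = row_means[rows]
    return obs

filled = clear_nan([[1.0, np.nan, 3.0],
                    [2.0, 4.0, 6.0]])
```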
-
BIP.Bayes.Melding.enumRun(model, theta, k)
Returns the model results plus the run number.
Parameters: - model: model callable
- theta: model input list
- k: run number
Return: - res: result list
- k: run number
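Returning the run number with each result helps when runs finish out of order, for example in a parallel map: k tells the caller where each result belongs. Below is a sketch with a thread pool. Whether BIP calls model(theta) or model(*theta) is an assumption here. The toy model square is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def enum_run(model, theta, k):
    """Return the model output together with the run number k."""
    return model(theta), k

def square(theta):                      # hypothetical model
    return [t ** 2 for t in theta]

thetas = [[1, 2], [3, 4], [5, 6]]
results = [None] * len(thetas)
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(enum_run, square, th, k) for k, th in enumerate(thetas)]
    for fut in as_completed(futures):   # completion order may differ from k
        res, k = fut.result()
        results[k] = res                # k restores the original order
```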
-
BIP.Bayes.Melding.model(theta, n=1)
Model(r, p0, n=1). Simulates the Population Dynamic Model (PDM) Pt = r·P0 for n time steps, where P0 is the initial population size. Example model for testing purposes.
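The sketch below makes two assumptions. It reads "Pt = rP0 for n time steps" as the one-step recurrence P_t = r·P_{t-1}, so that P_t = r^t·P_0. It also takes theta = (r, p0). growth_model is a hypothetical name.

```python
import numpy as np

def growth_model(theta, n=1):
    """Geometric growth: each time step multiplies the population by r."""
    r, p0 = theta
    out = np.zeros(n)
    p = p0
    for i in range(n):
        p = r * p          # P_t = r * P_{t-1}
        out[i] = p
    return out
```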
-
BIP.Bayes.Melding.model_as_ra(theta, model, phinames)
Does a single run of model with parameters theta and returns the results as a record array.
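Below is a sketch of wrapping a model run as a numpy record array. It assumes model returns an array of shape (t, nphi), or a 1-D series for a single output. It also assumes phinames supplies one field name per column. The two-output toy model is hypothetical.

```python
import numpy as np

def model_as_ra(theta, model, phinames):
    """Run model once and wrap its output columns as a record array."""
    out = np.asarray(model(theta), dtype=float)
    out = out.reshape(len(out), -1)          # one column per output variable
    return np.rec.fromarrays(list(out.T), names=list(phinames))

def two_outputs(theta):                      # hypothetical model with outputs S and I
    s0, i0 = theta
    return np.column_stack([np.full(3, s0), np.arange(3) + i0])

ra = model_as_ra((10.0, 1.0), two_outputs, ["S", "I"])
```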
-
BIP.Bayes.Melding.plotRaHist(arr)
Plots a record array as a panel of histograms.
-
BIP.Bayes.Melding.randint(low, high=None, size=None, dtype='l')
Return random integers from low (inclusive) to high (exclusive).
Return random integers from the “discrete uniform” distribution of the specified dtype in the “half-open” interval [low, high). If high is None (the default), then results are from [0, low).
Parameters: - low: int. Lowest (signed) integer to be drawn from the distribution (unless high=None, in which case this parameter is one above the highest such integer).
- high: int, optional. If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None).
- size: int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
- dtype: dtype, optional. Desired dtype of the result. All dtypes are determined by their name, i.e., ‘int64’, ‘int’, etc., so byteorder is not available and a specific precision may have different C types depending on the platform. The default value is ‘np.int’. New in version 1.11.0.
Returns: - out: int or ndarray of ints. size-shaped array of random integers from the appropriate distribution, or a single such random int if size is not provided.
See also: - random.random_integers: similar to randint, only for the closed interval [low, high], and 1 is the lowest value if high is omitted. In particular, this other one is the one to use to generate uniformly distributed discrete non-integers.
>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Generate a 2 x 4 array of ints between 0 and 4, inclusive:
>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])
-
BIP.Bayes.Melding.random()
random_sample(size=None)
Return random floats in the half-open interval [0.0, 1.0).
Results are from the “continuous uniform” distribution over the stated interval. To sample Unif[a, b), b > a, multiply the output of random_sample by (b-a) and add a:
(b - a) * random_sample() + a
Parameters: - size: int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: - out: float or ndarray of floats. Array of random floats of shape size (unless size=None, in which case a single float is returned).
>>> np.random.random_sample()
0.47108547995356098
>>> type(np.random.random_sample())
<type 'float'>
>>> np.random.random_sample((5,))
array([ 0.30220482, 0.86820401, 0.1654503 , 0.11659149, 0.54323428])
Three-by-two array of random numbers from [-5, 0):
>>> 5 * np.random.random_sample((3, 2)) - 5
array([[-3.99149989, -0.52338984],
       [-2.99091858, -0.79479508],
       [-1.23204345, -1.75224494]])
-
BIP.Bayes.Melding.seed(seed=None)
Seed the generator.
This method is called when RandomState is initialized. It can be called again to re-seed the generator. For details, see RandomState.
Parameters: - seed: int or array_like, optional. Seed for RandomState. Must be convertible to 32 bit unsigned integers.
See also: RandomState
Log-Likelihood Functions
-
BIP.Bayes.like.Bernoulli(x, p)
Bernoulli log-likelihood.
Parameters: - x – data
- p – probability
>>> Bernoulli([0,1,1,1,0,0,1,1],0.5)
-5.54517744448
-
BIP.Bayes.like.Beta(x, a, b)
Beta log-likelihood.
Parameters: - x – data
- a –
- b –
>>> Beta([.2,.3,.7,.6,.4],2,5)
-0.434845728904
-
BIP.Bayes.like.Binomial(x, n, p)
Binomial log-likelihood.
Parameters: - x – data
- n –
- p –
>>> Binomial([2,3],6,0.3)
-2.81280615454
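For independent observations the log-likelihood is the sum of the per-observation log-pmf. The doctest value can be cross-checked that way with scipy.stats. This is only a check, not how BIP computes it.

```python
from scipy import stats

# Sum of binomial log-pmf values over the data x = [2, 3] with n = 6, p = 0.3.
loglik = stats.binom.logpmf([2, 3], 6, 0.3).sum()
print(loglik)
```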
-
BIP.Bayes.like.Categor(x, hist)
Categorical log-likelihood: a generalization of a Bernoulli process to variables with any constant number of discrete values.
Parameters: - x: data vector (list)
- hist: tuple (prob, classes), where classes contains the upper limit of each histogram class
>>> Categor([1],([.3,.7],[0,1]))
-0.356674943939
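The sketch assumes each observation falls into the first class whose upper limit is at least the observation. That class's probability is looked up, and the logs are summed. This reproduces the doctest value (ln 0.7). The lookup rule and the name categor are assumptions.

```python
import numpy as np

def categor(x, hist):
    """Categorical log-likelihood; hist = (probabilities, upper class limits)."""
    probs, classes = map(np.asarray, hist)
    # Index of the first class whose upper limit is >= each observation.
    # Observations above the last limit are not handled in this sketch.
    idx = np.searchsorted(classes, np.asarray(x), side="left")
    return float(np.sum(np.log(probs[idx])))

print(categor([1], ([.3, .7], [0, 1])))
```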
-
BIP.Bayes.like.Gamma(x, alpha, beta)
Gamma log-likelihood.
Parameters: - x – data
- alpha –
- beta –
>>> Gamma([2,3,7,6,4],2,2)
-11.015748357
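The doctest value matches a gamma density with shape 2 and scale 2, checked below with scipy.stats.gamma; a rate parameterization gives a different value. Both parameters equal 2 in the example, so it does not by itself show which of alpha and beta is the shape.

```python
from scipy import stats

x = [2, 3, 7, 6, 4]
loglik = stats.gamma.logpdf(x, a=2, scale=2).sum()   # shape 2, scale 2
print(loglik)
```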
-
BIP.Bayes.like.Lognormal(x, mu, tau)
Lognormal log-likelihood.
Parameters: - mu: mean
- tau: precision (1/sd)
>>> Lognormal((0.5,1,1.2),0,0.5)
-3.15728720569
-
BIP.Bayes.like.Negbin(x, r, p)
Negative binomial log-likelihood.
Parameters: - x – data
- r –
- p –
>>> Negbin([2,3],6,0.3)
-9.16117424315
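scipy.stats.nbinom counts the failures before the n-th success, where p is the success probability. With n = 6 and p = 0.3 it reproduces the doctest value, which suggests (but does not confirm) that Negbin uses the same parameterization.

```python
from scipy import stats

# Sum of negative-binomial log-pmf values for x = [2, 3], r = 6, p = 0.3.
loglik = stats.nbinom.logpmf([2, 3], 6, 0.3).sum()
print(loglik)
```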
-
BIP.Bayes.like.Normal
Normal log-likelihood.
Parameters: - mu: mean
- tau: precision (1/variance)
>>> Normal((0,),0,1)
-0.918938533205
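With tau as the precision (1/variance), each observation contributes ½·ln(τ/2π) − (τ/2)(x − μ)² to the log-likelihood. Below is a sketch of that formula, which reproduces the doctest value. normal_loglik is a hypothetical name, not BIP's code.

```python
import numpy as np

def normal_loglik(x, mu, tau):
    """Normal log-likelihood parameterized by precision tau = 1/variance."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * np.log(tau / (2 * np.pi)) - 0.5 * tau * (x - mu) ** 2))

print(normal_loglik((0,), 0, 1))
```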
-
BIP.Bayes.like.Poisson(x, mu)
Poisson log-likelihood.
Parameters: - x: vector of data
- mu: mean
>>> Poisson([2],2)
-1.30685281944
-
BIP.Bayes.like.Simple(x, w, a, start=0)
Find out what it is. ;-)
-
BIP.Bayes.like.Uniform(x, xmin, xmax)
Uniform log-likelihood.
Parameters: - x: data vector (list)
- xmin: lower limit of the distribution
- xmax: upper limit of the distribution
>>> Uniform([1.1,2.3,3.4,4],0,5)
-6.4377516497364011
>>> Uniform([1.1,2.3,3.4,6],0,5)
-inf
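Both doctest values can be checked with scipy.stats.uniform, which uses loc = xmin and scale = xmax − xmin. Each in-range point contributes ln(1/(xmax − xmin)), and any point outside the support makes the log-likelihood −inf.

```python
from scipy import stats

# U(0, 5): loc = xmin = 0, scale = xmax - xmin = 5.
inside = stats.uniform.logpdf([1.1, 2.3, 3.4, 4], loc=0, scale=5).sum()
outside = stats.uniform.logpdf([1.1, 2.3, 3.4, 6], loc=0, scale=5).sum()
print(inside, outside)
```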
-
BIP.Bayes.like.Weibull(x, alpha, beta)
Weibull log-likelihood.
Parameters: - x – data
- alpha –
- beta –
>>> Weibull([2,1,0.3,.5,1.7],1.5,3)
-7.811955373
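The doctest value matches a Weibull density with shape alpha = 1.5 and scale beta = 3. It can be cross-checked with scipy.stats.weibull_min, whose shape argument is c.

```python
from scipy import stats

x = [2, 1, 0.3, .5, 1.7]
loglik = stats.weibull_min.logpdf(x, 1.5, scale=3).sum()   # shape 1.5, scale 3
print(loglik)
```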
-
BIP.Bayes.like.find_best_tau
Returns the value of tau that maximizes the Normal log-likelihood for a fixed (x, mu).
Parameters: - x:
- mu:
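For a Normal likelihood with known mean mu, the log-likelihood in the precision is (n/2)·ln τ − (τ/2)·Σ(x − μ)² plus a constant. Setting its derivative to zero gives τ = n / Σ(x − μ)², the reciprocal of the mean squared deviation. Below is a sketch of that closed form. BIP may instead find the maximum numerically, and best_tau is a hypothetical name.

```python
import numpy as np

def best_tau(x, mu):
    """Precision that maximizes the Normal log-likelihood for fixed x and mu."""
    x = np.asarray(x, dtype=float)
    return x.size / np.sum((x - mu) ** 2)

print(best_tau([1.0, 3.0], 2.0))
```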
Plotting tools
Module with specialized plotting functions for the Melding results
-
BIP.Bayes.PlotMeld.peakdet(v, delta, x=None)
Converted from the MATLAB script at http://billauer.co.il/peakdet.html. Currently returns two lists of tuples, but arrays might be better.
function [maxtab, mintab]=peakdet(v, delta, x)
%PEAKDET Detect peaks in a vector
%        [MAXTAB, MINTAB] = PEAKDET(V, DELTA) finds the local
%        maxima and minima (“peaks”) in the vector V.
%        MAXTAB and MINTAB consists of two columns. Column 1
%        contains indices in V, and column 2 the found values.
%
%        With [MAXTAB, MINTAB] = PEAKDET(V, DELTA, X) the indices
%        in MAXTAB and MINTAB are replaced with the corresponding
%        X-values.
%
%        A point is considered a maximum peak if it has the maximal
%        value, and was preceded (to the left) by a value lower by
%        DELTA.
% Eli Billauer, 3.4.05 (Explicitly not copyrighted).
% This function is released to the public domain; Any use is allowed.
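The MATLAB routine translates to Python fairly directly. The scan alternates between looking for a maximum and looking for a minimum. A peak is committed once the signal has moved delta away from it. The result is two lists of (position, value) tuples, matching the return type described above. This is an illustration of the algorithm, not BIP's source.

```python
import numpy as np

def peakdet(v, delta, x=None):
    """Return (maxtab, mintab) as lists of (position, value) tuples."""
    v = np.asarray(v, dtype=float)
    x = np.arange(len(v)) if x is None else np.asarray(x)
    maxtab, mintab = [], []
    mn, mx = np.inf, -np.inf
    mnpos = mxpos = None
    lookformax = True
    for i, this in enumerate(v):
        if this > mx:
            mx, mxpos = this, x[i]
        if this < mn:
            mn, mnpos = this, x[i]
        if lookformax:
            if this < mx - delta:          # dropped delta below the running max
                maxtab.append((mxpos, mx))
                mn, mnpos = this, x[i]
                lookformax = False
        else:
            if this > mn + delta:          # rose delta above the running min
                mintab.append((mnpos, mn))
                mx, mxpos = this, x[i]
                lookformax = True
    return maxtab, mintab

maxtab, mintab = peakdet([0, 1, 0, -1, 0, 2, 0], 0.5)
```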
-
BIP.Bayes.PlotMeld.pred_new_cases(obs, series, weeks, names=[], title='Total new cases per window: predicted vs observed', ws=7)
Predicted vs. observed total new cases per window.
-
BIP.Bayes.PlotMeld.violin_plot(ax, data, positions, bp=False, prior=False)
Create violin plots on an axis.
Parameters: - ax: A subplot object
- data: A list of data sets to plot
- positions: x values to position the violins. Can be datetime.date objects.
- bp: Whether to plot the boxplot on top.
- prior: whether the first element of data is a Prior distribution.
Latin Hypercube Sampling
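Below is a generic illustration of Latin hypercube sampling, not BIP's implementation:

1. Split each dimension of [0, 1) into n equal strata.
2. Place exactly one point in each stratum per dimension.
3. Pair the strata at random across dimensions.
4. Map the uniform design through the inverse CDF of each prior.

latin_hypercube is a hypothetical name, and the standard normal marginals are an arbitrary choice.

```python
import numpy as np
from scipy import stats

def latin_hypercube(n, d, seed=None):
    """n points in [0, 1)^d with exactly one point per stratum in every dimension."""
    rng = np.random.default_rng(seed)
    u = np.empty((n, d))
    for j in range(d):
        # Random stratum order plus a uniform offset inside each stratum of width 1/n.
        u[:, j] = (rng.permutation(n) + rng.random(n)) / n
    return u

u = latin_hypercube(10, 2, seed=0)
theta = stats.norm.ppf(u)   # map the uniform design to N(0, 1) marginals
```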