BIP.Bayes

Basic likelihood tools, such as functions for computing likelihoods, Latin Hypercube sampling (efficient random sampling) and other tools which don’t belong in other packages, or which apply to multiple packages.

Melding Module

class BIP.Bayes.Melding.FitModel(K, model, inits, tf, thetanames, phinames, wl=1, nw=1, verbose=False, burnin=1000, constraints=[])

Fit a model to data, generating Bayesian posterior distributions of the inputs and outputs of the model. The fitting process can be monitored via a curses interface.

AIC_from_RSS()

Calculates the Akaike information criterion from the residual sum of squares of the best fitting run.
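For least-squares fits with Gaussian residuals, the AIC is commonly written as n·ln(RSS/n) + 2k. The sketch below assumes that form. The function name `aic_from_rss` and its arguments are illustrative and are not taken from the BIP source, which may include additional constant terms or a small-sample correction.

```python
import math


def aic_from_rss(rss, n_obs, n_params):
    """AIC of a least-squares fit, assuming i.i.d. Gaussian residuals.

    Uses AIC = n * ln(RSS / n) + 2 * k and drops the additive constants
    that are identical for all models fitted to the same data.
    """
    if rss <= 0 or n_obs <= 0:
        raise ValueError("rss and n_obs must be positive")
    return n_obs * math.log(rss / n_obs) + 2 * n_params
```

Only differences in AIC between models fitted to the same data are meaningful, which is why the dropped constants do not matter for model comparison.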

do_inference(prior, data, predlen, method, likvar, likfun=<functools._lru_cache_wrapper object>)

Call the samplers and do the actual inference

Parameters:
  • likfun – Likelihood function
  • prior
  • data
  • predlen
  • method
  • likvar
optimize(data, p0, optimizer='scipy', tol=0.0001, verbose=0, plot=0)

Finds best parameters using an optimization approach

Parameters:
  • data: Dictionary of observed series
  • p0: Sequence (list or tuple) of initial values for the parameters
  • optimizer: Optimization library to use: ‘scipy’: fmin (Nelder-Mead) or ‘oo’:OpenOpt.NLP
  • tol: Tolerance of the error
  • verbose: If true show stats of the optimization run at the end
  • plot: If true plots a run based on the optimized parameters.
plot_results(names=[], dbname='results', savefigs=0)

Plot the final results of the inference

prior_sample()

Generates a set of samples from the starting theta prior distributions for reporting purposes.

Returns: Dictionary with (name,sample) pairs
run(data, method, likvar, likfun='Normal', pool=False, adjinits=True, ew=0, dbname='results', monitor=False, initheta=[])

Fit the model against data

Parameters:
  • data: dictionary with variable names and observed series, as Key and value respectively.
  • method: Inference method: “ABC”, “SIR”, “MCMC” or “DREAM”
  • likfun: Likelihood function to be used. Currently supported: “Normal” and “Poisson”.
  • likvar: Variance of the likelihood function in the SIR and MCMC method
  • pool: Pool priors on model’s outputs.
  • adjinits: whether to adjust inits to data
  • ew: Whether to use expanding windows instead of moving ones.
  • dbname: name of the sqlite3 database
  • monitor: Whether to monitor certain variables during the inference. If not False, should be a list of valid phi variable names.
  • initheta: starting position in parameter space for the sampling to start. (only used by MCMC and DREAM)
set_priors(tdists, tpars, tlims, pdists, ppars, plims)

Set the prior distributions for Phi and Theta

Parameters:
  • pdists: distributions for the output variables. For example: [scipy.stats.uniform,scipy.stats.norm]
  • ppars: parameters for the distributions in pdists. For example: [(0,1),(0,1)]
  • plims: Limits of the range of each phi. List of (min,max) tuples.
  • tdists: same as pdists, but for input parameters (Theta).
  • tpars: same as ppars, but for tdists.
  • tlims: Limits of the range of each theta. List of (min,max) tuples.
class BIP.Bayes.Melding.Meld(K, L, model, ntheta, nphi, alpha=0.5, verbose=0, viz=False)

Bayesian Melding class

abcRun(fitfun=None, data={}, t=1, pool=False, savetemp=False)

Runs the model for inference through Approximate Bayesian Computation (ABC) techniques. This method should be used as an alternative to sir.

Parameters:
  • fitfun: Callable which will return the goodness of fit of the model to data as a number between 0-1, with 1 meaning perfect fit
  • t: number of time steps to retain at the end of the model run for fitting purposes.
  • data: dict containing observed time series (lists of length t) of the state variables. This dict must have as many items as there are state variables, with labels matching the variable names. Unobserved variables must have an empty list as value.
  • pool: if True, Pools the user provided priors on the model’s outputs, with the model induced priors.
  • savetemp: Whether temporary results should be saved. Useful for long runs. Allows for resuming the simulation from last sa
add_salt(dataset, band)

Adds a few extra data points beyond the dataset range. The extra points are drawn from a uniform distribution.

Parameters:
  • dataset: vector of data
  • band: Fraction of range to extend [0,1[
Returns:

Salted dataset.

current_plot(series, data, idx, vars=[], step=0)

Plots the last simulated series along with data

Parameters:
  • series: Record array with the simulated series.
  • idx: Integer index of the curve to plot.
  • data: Dictionary with the full dataset.
  • vars: List with variable names to be plotted.
  • step: Step of the chain
filtM(cond, x, limits)

Multiple condition filtering. Remove values in x[i], if corresponding values in cond[i] are less than limits[i][0] or greater than limits[i][1].

Parameters:
  • cond: is an array of conditions.
  • limits: is a list of tuples (ll,ul) with length equal to number of lines in cond and x.
  • x: array to be filtered.
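The filtering rule above can be sketched in plain Python. This is an illustration only: the name `filt_m` is hypothetical, rows are represented as lists, and the sketch assumes that a position is dropped from every row of x as soon as any row of cond violates its limits there (so that rows stay the same length). BIP's filtM may behave differently in that respect.

```python
def filt_m(cond, x, limits):
    """Keep only the columns of x whose cond values all lie inside limits.

    cond and x are lists of equal-length rows; limits[i] = (ll, ul) is the
    allowed closed interval for the values in cond[i].
    """
    n_cols = len(cond[0])
    keep = [
        j
        for j in range(n_cols)
        if all(ll <= row[j] <= ul for row, (ll, ul) in zip(cond, limits))
    ]
    return [[row[j] for j in keep] for row in x]
```

For example, with cond = [[0.1, 0.5, 2.0], [5, 50, 7]] and limits = [(0, 1), (0, 10)], only the first column passes both conditions.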
getPosteriors(t)

Updates the posteriors of the model’s output for the last t time steps. Returns two record arrays:
  • the posteriors of Theta;
  • the posteriors of the last t values of the Phi time series (self.L by t arrays).

Parameters:
  • t: length of the posterior time-series to return.
imp_sample(n, data, w)

Importance sampling

Returns: a sample of size n
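Importance (re)sampling draws items with replacement, with probability proportional to their weights. A minimal sketch using only the standard library follows. It mirrors the (n, data, w) argument order of the signature above, but it is not BIP's implementation.

```python
import random


def imp_sample(n, data, w, rng=random):
    """Draw n items from data, with replacement, proportionally to weights w."""
    if len(data) != len(w):
        raise ValueError("data and w must have the same length")
    # random.choices normalizes the weights internally.
    return rng.choices(data, weights=w, k=n)
```

Items with zero weight are never selected, so low-likelihood runs drop out of the resampled set.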
logPooling(phi)

Returns the probability associated with each phi[i] on the pooled pdf of phi and q2phi.

Parameters:
  • phi: prior of Phi induced by the model and q1theta.
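In logarithmic pooling, two densities for the same quantity are combined as a weighted geometric mean, q1(phi)^alpha * q2(phi)^(1 - alpha), up to a normalizing constant. The sketch below assumes that form and assumes that alpha weights the model-induced prior. Which of the two densities receives alpha in BIP is not stated here, and the function name `log_pool` is illustrative.

```python
def log_pool(q1_phi, q2_phi, alpha=0.5):
    """Unnormalized logarithmic pool of two densities.

    q1_phi and q2_phi hold the two density values evaluated at the same
    points phi[i]; alpha in [0, 1] is the pooling weight given to q1_phi.
    """
    return [a ** alpha * b ** (1.0 - alpha) for a, b in zip(q1_phi, q2_phi)]
```

With alpha = 0.5 (the default of the Meld constructor) both sources are weighted equally and the pooled value is their geometric mean.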
mcmc_run(data, t=1, likvariance=10, burnin=1000, nopool=False, method='MH', constraints=[], likfun=<functools._lru_cache_wrapper object>)

MCMC based fitting

Parameters:
  • data: observed time series on the model’s output
  • t: length of the observed time series
  • likvariance: variance of the Normal likelihood function
  • nopool: True if no priors on the outputs are available. Leads to faster calculations
  • method: Step method. Defaults to Metropolis-Hastings.
  • constraints:
  • likfun: Likelihood function
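For readers unfamiliar with the default step method, here is a generic one-parameter random-walk Metropolis-Hastings sampler. It is a sketch under stated assumptions (Gaussian proposal, scalar theta, user-supplied log-posterior) and does not reproduce the internals of mcmc_run. The names `metropolis_hastings`, `log_post` and `step` are illustrative.

```python
import math
import random


def metropolis_hastings(log_post, theta0, n_samples, step=0.5, burnin=1000, seed=0):
    """Random-walk Metropolis-Hastings for a scalar parameter."""
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    chain = []
    for i in range(burnin + n_samples):
        proposal = theta + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, exp(lp_new - lp)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            theta, lp = proposal, lp_new
        if i >= burnin:  # discard the burn-in iterations
            chain.append(theta)
    return chain
```

The burnin argument plays the same role as in the signature above: early samples, which still depend on the starting point, are discarded.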
run(*args)

Runs the model through the Melding inference. model is a callable which returns the output of the deterministic model, i.e. the model itself. The model is run self.K times to obtain phi = M(theta).

runModel(savetemp, t=1, k=None)

Handles running the model k times keeping a temporary savefile for resuming calculation in case of interruption.

Parameters:
  • savetemp: Boolean. create a temp file?
  • t: number of time steps
Returns:
  • self.phi: a record array of shape (k,t) with the results.
setPhi(names, dists=[<scipy.stats._continuous_distns.norm_gen object>], pars=[(0, 1)], limits=[(-5, 5)])

Set up the model’s outputs, or Phi, and generate the samples from prior distributions needed for the melding replicates.

Parameters:
  • names: list of strings with the names of the variables.
  • dists: is a list of RNG from scipy.stats
  • pars: is a list of tuples of variables for each prior distribution, respectively.
  • limits: lower and upper limits on the support of variables.
setPhiFromData(names, data, limits)

Set up the model outputs and set their prior distributions from the vectors in data. This method is to be used when the prior distributions are available in the form of a sample from an empirical distribution such as a Bayesian posterior. In order to expand the samples provided, K samples are generated from a kernel density estimate of the original sample.

Parameters:
  • names: list of strings with the names of the variables.
  • data: list of vectors. Samples of the proposed distribution.
  • limits: list of tuples (ll,ul),lower and upper limits on the support of variables.
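Expanding an empirical sample through a kernel density estimate can be sketched as follows. Sampling from a Gaussian KDE amounts to picking an original data point at random and adding Gaussian noise with the kernel bandwidth. The bandwidth rule used here (a normal-reference rule of thumb) and the rejection of draws outside the limits are assumptions, and `kde_resample` is a hypothetical name, not a BIP function.

```python
import random
import statistics


def kde_resample(sample, k, limits, seed=0):
    """Draw k values from a Gaussian KDE of sample, restricted to limits."""
    rng = random.Random(seed)
    ll, ul = limits
    # Normal-reference rule-of-thumb bandwidth (assumed, not BIP's choice).
    h = 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)
    out = []
    while len(out) < k:
        # Pick a data point, then jitter it with the Gaussian kernel.
        draw = rng.gauss(rng.choice(sample), h)
        if ll <= draw <= ul:  # reject draws outside the support
            out.append(draw)
    return out
```

For instance, `kde_resample([0.2, 0.4, 0.5, 0.6, 0.8], 200, (0, 1))` expands five observations into 200 draws that all stay inside (0, 1).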
setTheta(names, dists=[<scipy.stats._continuous_distns.norm_gen object>], pars=[(0, 1)], lims=[(0, 1)])

Set up the model’s inputs and generate the samples from prior distributions needed for the melding replicates.

Parameters:
  • names: list of strings with the names of the parameters.
  • dists: is a list of RNG from scipy.stats
  • pars: is a list of tuples of parameters for each prior distribution, respectively.
setThetaFromData(names, data, limits)

Set up the model inputs and set the prior distributions from the vectors in data. This method is to be used when the prior distributions are available in the form of a sample from an empirical distribution such as a Bayesian posterior. In order to expand the samples provided, K samples are generated from a kernel density estimate of the original sample.

Parameters:
  • names: list of strings with the names of the parameters.
  • data: list of vectors. Samples of a proposed distribution
  • limits: List of (min,max) tuples for each theta to make sure samples are not generated outside these limits.
simple_plot(data, theta)

Does a single evaluation of the model with theta as parameters and plots the result alongside the data

sir(data={}, t=1, variance=0.1, pool=False, savetemp=False, likfun=<functools._lru_cache_wrapper object>)

Run the model output through the Sampling-Importance-Resampling algorithm. Returns 1 if successful or 0 if not.

Parameters:
  • data: observed time series on the model’s output
  • t: length of the observed time series
  • variance: variance of the Normal likelihood function
  • pool: False if no priors on the outputs are available. Leads to faster calculations
  • savetemp: Boolean. create a temp file?
BIP.Bayes.Melding.basicfit(s1, s2)

Calculates a basic goodness-of-fit measure between a model-generated time series and an observed time series, based on the mean squared error.

Parameters:
  • s1: model-generated time series. record array.
  • s2: observed time series. dictionary with keys matching names of s1
Return:

Root mean square deviation between s1 and s2.
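A root mean square deviation over several named series can be sketched like this. The sketch assumes that both inputs can be indexed by variable name (plain dicts here, standing in for the record array and the dictionary described above) and that squared errors are pooled over all observed variables. How BIP aggregates across variables is not stated, and the name `rmsd` is illustrative.

```python
import math


def rmsd(s1, s2):
    """Root mean square deviation between simulated s1 and observed s2.

    s1 and s2 map variable names to equal-length sequences. Variables
    with an empty observed series contribute nothing.
    """
    sq_sum, n = 0.0, 0
    for name, obs in s2.items():
        for sim_val, obs_val in zip(s1[name], obs):
            sq_sum += (sim_val - obs_val) ** 2
            n += 1
    if n == 0:
        raise ValueError("no observations to compare")
    return math.sqrt(sq_sum / n)
```

A value of 0 means the simulated and observed series coincide. Larger values indicate a worse fit.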

BIP.Bayes.Melding.clearNaN(obs)

Loops through an array with data series as columns, and replaces NaNs with the mean of the other series.

Parameters:
  • obs: 2-dimensional numpy array
Returns:

array of the same shape as obs
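One possible reading of this rule is that a missing value at a given time step is filled with the mean of the values that the other series take at that same time step. The sketch below implements that reading for a list of rows (one row per time step, one column per series). It is an assumption about the intended behaviour, and `clear_nan` is a hypothetical name.

```python
import math


def clear_nan(obs):
    """Return a copy of obs with each NaN replaced by the mean of the
    non-NaN values in the same row (the other series at that time step)."""
    out = []
    for row in obs:
        valid = [v for v in row if not math.isnan(v)]
        # A row made only of NaNs cannot be filled and is left unchanged.
        fill = sum(valid) / len(valid) if valid else float("nan")
        out.append([fill if math.isnan(v) else v for v in row])
    return out
```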

BIP.Bayes.Melding.enumRun(model, theta, k)

Returns model results plus run number.

Parameters:
  • model: model callable
  • theta: model input list
  • k: run number
Return:
  • res: result list
  • k: run number
BIP.Bayes.Melding.model(theta, n=1)

model(r, p0, n=1): simulates the population dynamics model (PDM) Pt = rP0 for n time steps. P0 is the initial population size. Example model for testing purposes.

BIP.Bayes.Melding.model_as_ra(theta, model, phinames)

Does a single run of self.model and returns the results as a record array

BIP.Bayes.Melding.plotRaHist(arr)

Plots a record array as a panel of histograms

BIP.Bayes.Melding.randint(low, high=None, size=None, dtype='l')

Return random integers from low (inclusive) to high (exclusive).

Return random integers from the “discrete uniform” distribution of the specified dtype in the “half-open” interval [low, high). If high is None (the default), then results are from [0, low).

Parameters:
  • low: int. Lowest (signed) integer to be drawn from the distribution (unless high=None, in which case this parameter is one above the highest such integer).
  • high: int, optional. If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None).
  • size: int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
  • dtype: dtype, optional. Desired dtype of the result. All dtypes are determined by their name, i.e., ‘int64’, ‘int’, etc, so byteorder is not available and a specific precision may have different C types depending on the platform. The default value is ‘np.int’. New in version 1.11.0.
Returns:
  • out: int or ndarray of ints. size-shaped array of random integers from the appropriate distribution, or a single such random int if size not provided.
See also: random.random_integers, similar to randint, only for the closed interval [low, high], and 1 is the lowest value if high is omitted. In particular, this other one is the one to use to generate uniformly distributed discrete non-integers.

Examples:
>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Generate a 2 x 4 array of ints between 0 and 4, inclusive:

>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])
BIP.Bayes.Melding.random()

random_sample(size=None)

Return random floats in the half-open interval [0.0, 1.0).

Results are from the “continuous uniform” distribution over the stated interval. To sample Unif[a, b), b > a, multiply the output of random_sample by (b-a) and add a:

(b - a) * random_sample() + a

Parameters:
  • size: int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns:
  • out: float or ndarray of floats. Array of random floats of shape size (unless size=None, in which case a single float is returned).

Examples:
>>> np.random.random_sample()
0.47108547995356098
>>> type(np.random.random_sample())
<type 'float'>
>>> np.random.random_sample((5,))
array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428])

Three-by-two array of random numbers from [-5, 0):

>>> 5 * np.random.random_sample((3, 2)) - 5
array([[-3.99149989, -0.52338984],
       [-2.99091858, -0.79479508],
       [-1.23204345, -1.75224494]])
BIP.Bayes.Melding.seed(seed=None)

Seed the generator.

This method is called when RandomState is initialized. It can be called again to re-seed the generator. For details, see RandomState.

Parameters:
  • seed: int or array_like, optional. Seed for RandomState. Must be convertible to 32 bit unsigned integers.
See also: RandomState

Log-Likelihood Functions

BIP.Bayes.like.Bernoulli(x, p)

Log-Like Bernoulli

Parameters:
  • x – data
  • p – probability
>>> Bernoulli([0,1,1,1,0,0,1,1],0.5)
-5.54517744448
BIP.Bayes.like.Beta(x, a, b)

Log-Like Beta

Parameters:
  • x – data
  • a
  • b
>>> Beta([.2,.3,.7,.6,.4],2,5)
-0.434845728904
BIP.Bayes.like.Binomial(x, n, p)

Binomial Log-Likelihood

Parameters:
  • x – data
  • n
  • p
>>> Binomial([2,3],6,0.3)
-2.81280615454
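The doctest value can be reproduced from the textbook Binomial log-probability, summed over the data. The sketch below is a standalone illustration using only the standard library. The name `binomial_loglik` is hypothetical and this is not BIP's code.

```python
import math


def binomial_loglik(x, n, p):
    """Sum of log Binomial(k | n, p) over the observations k in x."""
    total = 0.0
    for k in x:
        # log of the binomial coefficient C(n, k), via log-gamma
        log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += log_comb + k * math.log(p) + (n - k) * math.log(1 - p)
    return total
```

Calling `binomial_loglik([2, 3], 6, 0.3)` agrees with the doctest output above to within floating-point rounding.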
BIP.Bayes.like.Categor(x, hist)

Categorical log-likelihood. Generalization of a Bernoulli process for variables with any constant number of discrete values.

Parameters:
  • x: data vector (list)
  • hist: tuple (prob,classes). classes contains the upper limits of the histogram classes
>>> Categor([1],([.3,.7],[0,1]))
-0.356674943939
BIP.Bayes.like.Gamma(x, alpha, beta)

Log-Like Gamma

Parameters:
  • x – data
  • alpha
  • beta
>>> Gamma([2,3,7,6,4],2,2)
-11.015748357
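The parameter list does not say whether beta is a rate or a scale. The doctest value is reproduced when alpha is treated as the shape and beta as the scale, as the following standalone sketch shows. This is an inference from the example output rather than from the BIP source, and `gamma_loglik` is a hypothetical name.

```python
import math


def gamma_loglik(x, alpha, beta):
    """Sum of log Gamma densities with shape alpha and scale beta."""
    return sum(
        (alpha - 1) * math.log(xi)
        - xi / beta
        - math.lgamma(alpha)
        - alpha * math.log(beta)
        for xi in x
    )
```

With a rate parameterization (density proportional to exp(-beta * x)) the same inputs give a much lower value, so the distinction matters when setting priors.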
BIP.Bayes.like.Lognormal(x, mu, tau)

Lognormal Log-likelihood

Parameters:
  • mu: mean
  • tau: precision (1/sd)
>>> Lognormal((0.5,1,1.2),0,0.5)
-3.15728720569
BIP.Bayes.like.Negbin(x, r, p)

Negative Binomial Log-Likelihood

Parameters:
  • x – data
  • r
  • p
>>> Negbin([2,3],6,0.3)
-9.16117424315
BIP.Bayes.like.Normal

Normal Log-like

Parameters:
  • mu: mean
  • tau: precision (1/variance)
>>> Normal((0,),0,1)
-0.918938533205
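Because tau is a precision rather than a standard deviation, the log-density reads 0.5·ln(tau/2π) − 0.5·tau·(x − mu)². The sketch below is a standalone illustration of that parameterization. The name `normal_loglik` is hypothetical and this is not BIP's implementation.

```python
import math


def normal_loglik(x, mu, tau):
    """Sum of log Normal densities with mean mu and precision tau = 1/variance."""
    return sum(
        0.5 * math.log(tau / (2 * math.pi)) - 0.5 * tau * (xi - mu) ** 2
        for xi in x
    )
```

For a single observation at the mean with tau = 1 this reduces to −0.5·ln(2π), which matches the doctest value above.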
BIP.Bayes.like.Poisson(x, mu)

Poisson Log-Likelihood function

Parameters:
  • x: vector of data
  • mu: mean

>>> Poisson([2],2)
-1.30685281944
BIP.Bayes.like.Simple(x, w, a, start=0)

Find out what it is. ;-)

BIP.Bayes.like.Uniform(x, xmin, xmax)

Uniform Log-likelihood

Parameters:
  • x: data vector (list)
  • xmin: lower limit of the distribution
  • xmax: upper limit of the distribution
>>> Uniform([1.1,2.3,3.4,4],0,5)
-6.4377516497364011
>>> Uniform([1.1,2.3,3.4,6],0,5)
-inf
BIP.Bayes.like.Weibull(x, alpha, beta)

Log-Like Weibull

Parameters:
  • x – data
  • alpha
  • beta
>>> Weibull([2,1,0.3,.5,1.7],1.5,3)
-7.811955373
BIP.Bayes.like.find_best_tau

Returns the value of tau which maximizes the Normal log-likelihood for a fixed (x, mu).

Parameters:
  • x
  • mu
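For a Normal likelihood parameterized by precision, the maximizing tau has a closed form: setting the derivative of n/2·ln(tau) − tau/2·Σ(x − mu)² to zero gives tau = n / Σ(x − mu)². The sketch below uses that closed form. BIP's find_best_tau may instead search numerically, and the name `best_tau` is illustrative.

```python
def best_tau(x, mu):
    """Precision tau that maximizes the Normal log-likelihood for fixed x and mu."""
    ss = sum((xi - mu) ** 2 for xi in x)
    if ss == 0:
        raise ValueError("all observations equal mu: the likelihood is unbounded in tau")
    return len(x) / ss
```

Equivalently, the optimal precision is the reciprocal of the mean squared deviation of the data from mu.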

Plotting tools

Module with specialized plotting functions for the Melding results

BIP.Bayes.PlotMeld.peakdet(v, delta, x=None)

Converted from the MATLAB script at http://billauer.co.il/peakdet.html. Currently returns two lists of tuples, but maybe arrays would be better.

Original MATLAB documentation, function [maxtab, mintab] = peakdet(v, delta, x):

PEAKDET Detect peaks in a vector. [MAXTAB, MINTAB] = PEAKDET(V, DELTA) finds the local maxima and minima (“peaks”) in the vector V. MAXTAB and MINTAB consist of two columns. Column 1 contains indices in V, and column 2 the found values.

With [MAXTAB, MINTAB] = PEAKDET(V, DELTA, X) the indices in MAXTAB and MINTAB are replaced with the corresponding X-values.

A point is considered a maximum peak if it has the maximal value, and was preceded (to the left) by a value lower by DELTA.

Eli Billauer, 3.4.05 (Explicitly not copyrighted). This function is released to the public domain; any use is allowed.
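The peak rule described above can be sketched in pure Python as follows. This is an illustrative reimplementation of the public-domain algorithm that returns two lists of (position, value) tuples. It is not copied from BIP, and details such as input validation are omitted.

```python
def peakdet(v, delta, x=None):
    """Detect local maxima and minima of v that stand out by more than delta."""
    if x is None:
        x = range(len(v))
    maxtab, mintab = [], []
    mn, mx = float("inf"), float("-inf")
    mnpos = mxpos = None
    look_for_max = True
    for pos, this in zip(x, v):
        # Track the running extremes since the last confirmed peak.
        if this > mx:
            mx, mxpos = this, pos
        if this < mn:
            mn, mnpos = this, pos
        if look_for_max:
            # A maximum is confirmed once the signal drops delta below it.
            if this < mx - delta:
                maxtab.append((mxpos, mx))
                mn, mnpos = this, pos
                look_for_max = False
        elif this > mn + delta:
            # A minimum is confirmed once the signal rises delta above it.
            mintab.append((mnpos, mn))
            mx, mxpos = this, pos
            look_for_max = True
    return maxtab, mintab
```

A larger delta suppresses small wiggles, so noisy series usually need a delta on the order of the noise amplitude.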

BIP.Bayes.PlotMeld.pred_new_cases(obs, series, weeks, names=[], title='Total new cases per window: predicted vs observed', ws=7)

Predicted total new cases in a window vs observed.

BIP.Bayes.PlotMeld.violin_plot(ax, data, positions, bp=False, prior=False)

Create violin plots on an axis

Parameters:
  • ax: A subplot object
  • data: A list of data sets to plot
  • positions: x values to position the violins. Can be datetime.date objects.
  • bp: Whether to plot the boxplot on top.
  • prior: whether the first element of data is a Prior distribution.

Latin Hypercube Sampling
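As a generic illustration of the technique named in this heading (not of the BIP module's own API, which is not documented here), Latin Hypercube sampling splits each dimension into n equal-width strata, draws one point per stratum, and shuffles the strata independently in each dimension. The function name `latin_hypercube` and its signature are assumptions.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims with exactly one point
    in each of the n_samples strata along every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each equal-width stratum ...
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # ... then shuffle so strata are paired at random across dimensions.
        rng.shuffle(col)
        columns.append(col)
    return [list(point) for point in zip(*columns)]
```

Compared with plain random sampling, every marginal range is covered evenly even for small n, which is why the technique is described as efficient random sampling.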
