Non-Gaussian Behavior; Testing Fits

Introduction

The various analyses in the Tutorial implicitly assume that every probability distribution relevant to a fit is Gaussian. The input data and priors are assumed Gaussian. The chi**2 function is assumed to be well approximated by a quadratic expansion about its minimum, so that exp(-chi**2/2) is Gaussian there, in order to estimate uncertainties for the best-fit parameters. Functions of those parameters are assumed to yield results that are described by Gaussian random variables. These assumptions are usually pretty good for high-statistics data, when standard deviations are small, but they can lead to problems with low statistics.

Here we present several methods for testing these assumptions. Some of these techniques, like the statistical bootstrap and Bayesian integration, can also be used to analyze non-Gaussian results.

Bootstrap Error Analysis; Non-Gaussian Output

The bootstrap provides an efficient way to check on a fit’s validity, and also a method for analyzing non-Gaussian outputs. The strategy is to:

  1. make a large number of “bootstrap copies” of the original input data and prior that differ from each other by random amounts characteristic of the underlying randomness in the original data and prior (see the gvar documentation for more information; a brief sketch of such copies follows this list);

  2. repeat the entire fit analysis for each bootstrap copy of the data and prior, extracting fit results from each;

  3. use the variation of the fit results from bootstrap copy to bootstrap copy to determine an approximate probability distribution (possibly non-Gaussian) for each result.
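
To see concretely what a bootstrap copy is, here is a brief sketch of step 1. It is an illustration only: it assumes x and y come from make_data() in the script below, and it uses gvar.bootstrap_iter, which generates bootstrap copies of a collection of GVars:

import gvar as gv

x, y = make_data()
# each bootstrap copy ybs has the same covariance matrix as y, but its
# means are offset by random amounts drawn from y's distribution
for ybs in gv.bootstrap_iter(y, n=3):
    print(gv.mean(ybs[:4]))         # first four means of this copy

The means fluctuate from copy to copy while the correlations are preserved; bootstrapped_fit_iter below does the same thing for the data and the prior together, and refits each copy.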

To illustrate, we return to our fit in the section on Correlated Parameters; Gaussian Bayes Factor, where the uncertainties on the final parameters were relatively large. We will use a bootstrap analysis to check the error estimates coming out of that fit. We do this by adding code right after the fit, in the main() function:

import numpy as np
import gvar as gv
import lsqfit

def main():
    x, y = make_data()
    prior = make_prior()
    fit = lsqfit.nonlinear_fit(prior=prior, data=(x,y), fcn=fcn)
    print(fit)
    print('p1/p0 =', fit.p[1] / fit.p[0], 'p3/p2 =', fit.p[3] / fit.p[2])
    print('corr(p0,p1) =', gv.evalcorr(fit.p[:2])[1,0])

    # bootstrap analysis: collect bootstrap data
    print('\nBootstrap Averages:')
    Nbs = 40                # number of bootstrap copies
    output = {'p':[], 'p1/p0':[], 'p3/p2':[]}
    for bsfit in fit.bootstrapped_fit_iter(Nbs):
        p = bsfit.pmean
        output['p'].append(p)
        output['p1/p0'].append(p[1] / p[0])
        output['p3/p2'].append(p[3] / p[2])

    # average over bootstrap copies and tabulate results
    output = gv.dataset.avg_data(output, bstrap=True)
    print(gv.tabulate(output))
    print('corr(p0,p1) =', gv.evalcorr(output['p'][:2])[1,0])

def make_data():
    x = np.array([
        4., 2., 1., 0.5, 0.25, 0.167, 0.125, 0.1, 0.0833, 0.0714, 0.0625
        ])
    y = gv.gvar([
        '0.198(14)', '0.216(15)', '0.184(23)', '0.156(44)', '0.099(49)',
        '0.142(40)', '0.108(32)', '0.065(26)', '0.044(22)', '0.041(19)',
        '0.044(16)'
        ])
    return x, y

def make_prior():
    p = gv.gvar(['0(1)', '0(1)', '0(1)', '0(1)'])
    p[1] = 20 * p[0] + gv.gvar('0.0(1)')     # p[1] correlated with p[0]
    return p

def fcn(x, p):
    return (p[0] * (x**2 + p[1] * x)) / (x**2 + x * p[2] + p[3])

if __name__ == '__main__':
    main()

The bootstrapped_fit_iter produces fits bsfit for each of Nbs=40 different bootstrap copies of the input data (y and the prior). We collect the mean values for the various parameters and functions of parameters from each fit, ignoring the uncertainties, and then calculate averages and covariance matrices from these results using gvar.dataset.avg_data(). The keyword bstrap=True tells avg_data() to use the spread of the bootstrap copies themselves as the uncertainty (rather than the error on their mean), as appropriate for a bootstrap analysis.

Most of the bootstrap results agree with the results coming directly from the fit:

Least Square Fit:
  chi2/dof [dof] = 0.61 [11]    Q = 0.82    logGBF = 19.129

Parameters:
              0   0.149 (17)     [  0.0 (1.0) ]  
              1    2.97 (34)     [     0 (20) ]  
              2    1.23 (61)     [  0.0 (1.0) ]  *
              3    0.59 (15)     [  0.0 (1.0) ]  

Settings:
  svdcut/n = 1e-12/0    tol = (1e-08*,1e-10,1e-10)    (itns/time = 20/0.1)

p1/p0 = 19.97(67)     p3/p2 = 0.48(22)
corr(p0,p1) = 0.957067820817

Bootstrap Averages:
  key/index         value
-------------------------
        p 0    0.146 (19)
          1     2.91 (29)
          2     1.14 (76)
          3     0.59 (16)
      p3/p2     0.49 (40)
      p1/p0    19.86 (80)
corr(p0,p1) = 0.949819729559

In particular, the bootstrap analysis confirms the previous error estimates (to within 10-30%, since Nbs=40) except for p3/p2, where the error is substantially larger in the bootstrap analysis.

If p3/p2 is important, one might want to look more closely at its distribution. We use the bootstrap to create histograms of the probability distributions of p3/p2 and p1/p0 by adding the following code to the end of the main() function:

print('Histogram Analysis:')
count = {'p1/p0':[], 'p3/p2':[]}
hist = {
    'p1/p0':gv.PDFHistogram(fit.p[1] / fit.p[0]),
    'p3/p2':gv.PDFHistogram(fit.p[3] / fit.p[2]),
    }

# collect bootstrap data
for bsfit in fit.bootstrapped_fit_iter(n=1000):
    p = bsfit.pmean
    count['p1/p0'].append(hist['p1/p0'].count(p[1] / p[0]))
    count['p3/p2'].append(hist['p3/p2'].count(p[3] / p[2]))

# calculate averages and covariances
count = gv.dataset.avg_data(count)

# print histogram statistics and show plots
import matplotlib.pyplot as plt
pltnum = 1
for k in count:
    print(k + ':')
    print(hist[k].analyze(count[k]).stats)
    plt.subplot(1, 2, pltnum)
    plt.xlabel(k)
    hist[k].make_plot(count[k], plot=plt)
    if pltnum == 2:
        plt.ylabel('')
    pltnum += 1
plt.show()

Here we do 1000 bootstrap copies (rather than 40) to improve the accuracy of the bootstrap results. The output from this code shows statistical analyses of the histogram data for p1/p0 and p3/p2:

Histogram Analysis:
p1/p0:
   mean = 19.970(23)   sdev = 0.722(16)   skew = 0.127(75)   ex_kurt = -0.01(14)
   median = 19.939(28)   plus = 0.737(35)   minus = 0.654(31)
p3/p2:
   mean = 0.5206(76)   sdev = 0.2396(64)   skew = 0.671(83)   ex_kurt = 0.88(18)
   median = 0.4894(79)   plus = 0.309(18)   minus = 0.1492(54)

The code also displays histograms of the probability distributions, where the dashed lines show the results expected directly from the fit (that is, in the Gaussian approximation):

[figure _images/eg3d.png: bootstrap histograms of the probability distributions for p1/p0 and p3/p2]

While the distribution for p1/p0 is consistent with the fit results (dashed line) and Gaussian, the distribution for p3/p2 is significantly skewed, with a much longer tail to the right. The final result for p3/p2 might more accurately be summarized as 0.48 with errors of +0.31 and -0.15, although the Gaussian estimate of 0.48±0.22 would suffice for many applications. The skewed distribution for p3/p2 is not particularly surprising given the ±50% uncertainty in the denominator p2.

Bayesian Integrals

Bayesian expectation values provide an alternative to least-squares fits. These expectation values are integrals over the fit parameters that are weighted by the probability density function (PDF for the parameters) proportional to exp(-chi**2/2), where chi**2 includes contributions from both the data and the priors. They can be used to calculate mean values of the parameters, their covariances, and the means and covariances of any function of the parameters. These will agree with the best-fit results of our least-squares fits provided chi**2 is well approximated by its quadratic expansion in the parameters — that is, insofar as exp(-chi**2/2) is well approximated by the Gaussian distribution in the parameters specified by their best-fit means and covariance matrix (from fit.p).
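
Concretely, the expectation value of any function g(p) of the fit parameters is the ratio of integrals over the entire parameter space

    <g(p)> = ∫ dp g(p) exp(-chi**2(p)/2) / ∫ dp exp(-chi**2(p)/2),

where the denominator simply normalizes the PDF. Parameter means correspond to g(p) = p, and covariances to the expectation of outer(p, p) minus the outer product of the means; this is what the code below computes.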

Here we use lsqfit.BayesIntegrator to evaluate Bayesian expectation values. lsqfit.BayesIntegrator uses the vegas module for adaptive multi-dimensional integration to evaluate expectation values. It integrates arbitrary functions of the parameters, multiplied by the probability density function, over the entire parameter space. (Module vegas must be installed for lsqfit.BayesIntegrator.)

To illustrate lsqfit.BayesIntegrator, we again revisit the analysis in the section on Correlated Parameters; Gaussian Bayes Factor. We modify the end of the main() function of our original code to evaluate the means and covariances of the parameters, and also their probability histograms, using a Bayesian integral:

import matplotlib.pyplot as plt
import numpy as np
import gvar as gv
import lsqfit

def main():
    x, y = make_data()
    prior = make_prior()
    fit = lsqfit.nonlinear_fit(prior=prior, data=(x,y), fcn=fcn)
    print(fit)

    # Bayesian integrator
    expval = lsqfit.BayesIntegrator(fit)

    # adapt integrator expval to PDF from fit
    neval = 1000
    nitn = 10
    expval(neval=neval, nitn=nitn)

    # <g(p)> gives mean and covariance matrix, and counts for histograms
    hist = [
        gv.PDFHistogram(fit.p[0]), gv.PDFHistogram(fit.p[1]),
        gv.PDFHistogram(fit.p[2]), gv.PDFHistogram(fit.p[3]),
        ]
    def g(p):
        return dict(
            mean=p,
            outer=np.outer(p, p),
            count=[
                hist[0].count(p[0]), hist[1].count(p[1]),
                hist[2].count(p[2]), hist[3].count(p[3]),
                ],
            )

    # evaluate expectation value of g(p)
    results = expval(g, neval=neval, nitn=nitn, adapt=False)

    # analyze results
    print('\nIterations:')
    print(results.summary())
    print('Integration Results:')
    pmean = results['mean']
    pcov =  results['outer'] - np.outer(pmean, pmean)
    print('    mean(p) =', pmean)
    print('    cov(p) =\n', pcov)

    # create GVars from results
    p = gv.gvar(gv.mean(pmean), gv.mean(pcov))
    print('\nBayesian Parameters:')
    print(gv.tabulate(p))

    # show histograms
    print('\nHistogram Statistics:')
    count = results['count']
    for i in range(4):
        # print histogram statistics
        print('p[{}]:'.format(i))
        print(hist[i].analyze(count[i]).stats)
        # make histogram plots
        plt.subplot(2, 2, i + 1)
        plt.xlabel('p[{}]'.format(i))
        hist[i].make_plot(count[i], plot=plt)
        if i % 2 != 0:
            plt.ylabel('')
    plt.show()

def make_data():
    x = np.array([
        4.    ,  2.    ,  1.    ,  0.5   ,  0.25  ,  0.167 ,  0.125 ,
        0.1   ,  0.0833,  0.0714,  0.0625
        ])
    y = gv.gvar([
        '0.198(14)', '0.216(15)', '0.184(23)', '0.156(44)', '0.099(49)',
        '0.142(40)', '0.108(32)', '0.065(26)', '0.044(22)', '0.041(19)',
        '0.044(16)'
        ])
    return x, y

def make_prior():
    p = gv.gvar(['0(1)', '0(1)', '0(1)', '0(1)'])
    p[1] = 20 * p[0] + gv.gvar('0.0(1)')     # p[1] correlated with p[0]
    return p

def fcn(x, p):
    return (p[0] * (x**2 + p[1] * x)) / (x**2 + x * p[2] + p[3])

if __name__ == '__main__':
    main()

Here expval is an integrator that is used to evaluate expectation values of arbitrary functions of the fit parameters. BayesIntegrator uses output (fit) from a least-squares fit to design a vegas integrator optimized for calculating expectation values. The integrator uses an iterative Monte Carlo algorithm that adapts to the probability density function after each iteration. See the vegas documentation for much more information.

We first call the integrator without a function. This allows it to adapt to the probability density function from the fit without the extra overhead of evaluating a function of the parameters. The integrator uses nitn=10 iterations of the vegas algorithm, with at most neval=1000 evaluations of the probability density function for each iteration.

We then use the optimized integrator to evaluate the expectation value of function g(p) (turning adaptation off with adapt=False). The expectation value of g(p) is returned in dictionary results.

The results from this script are:

Least Square Fit:
  chi2/dof [dof] = 0.61 [11]    Q = 0.82    logGBF = 19.129

Parameters:
              0   0.149 (17)     [  0.0 (1.0) ]  
              1    2.97 (34)     [     0 (20) ]  
              2    1.23 (61)     [  0.0 (1.0) ]  *
              3    0.59 (15)     [  0.0 (1.0) ]  

Settings:
  svdcut/n = 1e-12/0    tol = (1e-08*,1e-10,1e-10)    (itns/time = 20/0.1)


Iterations:
itn   integral        average         chi2/dof        Q
-------------------------------------------------------
  1   1.051(32)       1.051(32)           0.00     1.00
  2   1.015(21)       1.033(19)           0.74     0.94
  3   1.046(24)       1.037(15)           0.74     0.99
  4   1.058(36)       1.042(15)           0.75     1.00
  5   1.009(28)       1.036(13)           0.71     1.00
  6   0.982(24)       1.027(11)           0.72     1.00
  7   1.016(22)       1.025(10)           0.72     1.00
  8   1.031(27)       1.0260(97)          0.70     1.00
  9   1.122(77)       1.037(12)           0.72     1.00
 10   1.023(24)       1.035(11)           0.73     1.00

Integration Results:
    mean(p) = [0.15514(35) 3.1019(71) 1.453(16) 0.6984(34)]
    cov(p) =
[[0.000252(11) 0.00488(23) 0.00882(57) 0.001436(88)]
 [0.00488(23) 0.1044(47) 0.177(11) 0.0298(18)]
 [0.00882(57) 0.177(11) 0.362(29) 0.0331(37)]
 [0.001436(88) 0.0298(18) 0.0331(37) 0.0345(16)]]

Bayesian Parameters:
  index         value
---------------------
      0    0.155 (16)
      1     3.10 (32)
      2     1.45 (60)
      3     0.70 (19)

Histogram Statistics:
p[0] -
   mean = 0.15518(33)   sdev = 0.01659(29)   skew = 0.119(62)   ex_kurt = 0.52(40)
   median = 0.15471(23)   plus = 0.01610(47)   minus = 0.01553(24)
p[1] -
   mean = 3.1056(68)   sdev = 0.3382(61)   skew = 0.158(48)   ex_kurt = 0.15(10)
   median = 3.0969(46)   plus = 0.3244(97)   minus = 0.3157(48)
p[2] -
   mean = 1.454(16)   sdev = 0.626(20)   skew = 0.505(94)   ex_kurt = 0.37(17)
   median = 1.4082(86)   plus = 0.615(17)   minus = 0.5520(83)
p[3] -
   mean = 0.6717(41)   sdev = 0.1956(46)   skew = -0.39(10)   ex_kurt = 1.54(16)
   median = 0.6730(26)   plus = 0.2013(67)   minus = 0.1537(22)

The iterations table shows results from each of the nitn=10 vegas iterations used to evaluate the expectation values. Estimates for the integral of the probability density function are listed for each iteration. (Results from the integrator are approximate, with error estimates.) These are consistent with each other and with the (more accurate) overall average.

The integration results show that the Bayesian estimates for the means of the parameters are accurate to roughly 1% or better, which is sufficiently accurate here given the size of the standard deviations. Estimates for the covariance matrix elements are less accurate, which is typical. This information is converted into gvar.GVars for the parameters and tabulated under “Bayesian Parameters,” for comparison with the original fit results — the agreement is pretty good. This is further confirmed by the (posterior) probability distributions for each parameter:

[figure _images/eg3e.png: posterior probability histograms for parameters p[0] through p[3]]

The means are shifted slightly from the fit results and there is modest skewing, but the differences are not great.

As a second example of Bayesian integration, we return briefly to the problem described in Positive Parameters; Non-Gaussian Priors: we want the average a of noisy data subject to the constraint that the average must be positive. The constraint is likely to introduce strong distortions in the probability density function (PDF), given that the fit analysis suggests a value of 0.011±0.013. We plot the actual PDF using the following code, beginning with a fit that uses a flat prior for a (between 0 and 0.04). The flat prior is implemented by assigning a Gaussian 0(1)/sqrt(2) prior to erfinv(50a-1), which makes 50a-1, and therefore a, uniformly distributed over the allowed range:

import gvar as gv
import lsqfit

# data, prior, and fit function
y = gv.gvar([
   '-0.17(20)', '-0.03(20)', '-0.39(20)', '0.10(20)', '-0.03(20)',
   '0.06(20)', '-0.23(20)', '-0.23(20)', '-0.15(20)', '-0.01(20)',
   '-0.12(20)', '0.05(20)', '-0.09(20)', '-0.36(20)', '0.09(20)',
   '-0.07(20)', '-0.31(20)', '0.12(20)', '0.11(20)', '0.13(20)'
   ])

prior = {}
prior['erfinv(50a-1)'] = gv.gvar('0(1)') / gv.sqrt(2)

def fcn(p, N=len(y)):
    a = (1 + p['50a-1']) / 50.
    return N * [a]

# least-squares fit
fit = lsqfit.nonlinear_fit(prior=prior, data=y, fcn=fcn)
print(fit)
a = (1 + fit.p['50a-1']) / 50
print('a =', a)

# Bayesian analysis: histogram for a
hist = gv.PDFHistogram(a, nbin=16, binwidth=0.5)

def g(p):
    a = (1 + p['50a-1']) / 50
    return hist.count(a)

expval = lsqfit.BayesIntegrator(fit)
expval(neval=1000, nitn=10)
count = expval(g, neval=1000, nitn=10, adapt=False)

# print out results and show plot
print('\nHistogram Analysis:')
print (hist.analyze(count).stats)

hist.make_plot(count, show=True)

The output from this script is

Least Square Fit:
  chi2/dof [dof] = 0.85 [20]    Q = 0.65    logGBF = 5.2385

Parameters:
  erfinv(50a-1)   -0.42 (68)     [  0.00 (71) ]  
-----------------------------------------------
          50a-1   -0.44 (64)     [  0.00 (80) ]  

Settings:
  svdcut/n = 1e-12/0    tol = (1e-08*,1e-10,1e-10)    (itns/time = 15/0.0)

a = 0.011(13)

Histogram Analysis:
   mean = 0.014064(17)   sdev = 0.010702(16)   skew = 0.6245(21)   ex_kurt = -0.5521(45)
   median = 0.011856(14)   plus = 0.014450(31)   minus = 0.008771(17)

and the probability distribution for a looks like

[figure _images/eg6.png: probability distribution for a]

This distribution is distorted between a=0 and the mean value, but otherwise is fairly similar to the Gaussian result 0.011±0.013 (dashed line). A more accurate summary of the result for a would be 0.012 with errors of +0.014 and -0.009, though again the Gaussian result is not terribly misleading even in this case.

The Bayesian integrals are relatively simple in these examples. More complicated problems can require much more computer time to evaluate the integrals, with hundreds of thousands or millions of integrand evaluations per iteration (neval). This is particularly true as the number of parameters increases. BayesIntegrator uses information from the least-squares fit to simplify the integration for vegas by optimizing the integration variables used, but integrals over tens of variables are intrinsically challenging. BayesIntegrator can be used with MPI to run such integrals on multiple processors, for a considerable speed-up.

We used Bayesian integrals here to deal with non-Gaussian behavior in fit outputs. The case study Case Study: Outliers and Bayesian Integrals shows how to use them when the input data is not quite Gaussian.

Testing Fits with Simulated Data

Ideally we would test a fitting protocol by doing fits of data similar to our actual data but where we know the correct values of the fit parameters ahead of the fit. Method lsqfit.nonlinear_fit.simulated_fit_iter() returns an iterator that creates any number of such simulations of the original fit.

A key assumption underlying least-squares fitting is that the fit data y[i] are random samples from a distribution whose mean is the fit function fcn(x, fitp) evaluated with the best-fit values fitp for the parameters. simulated_fit_iter iterators generate simulated data by drawing other random samples from the same distribution, assigning them the same covariance matrix as the original data. The simulated data are fit using the same priors and fitter settings as in the original fit, and the results (an lsqfit.nonlinear_fit object) are returned by the iterator. The fits with simulated data should have good chi**2 values, and the results from these fits should agree, within errors, with the original fit results since the simulated data are from the same distribution as the original data. There is a problem with the fitting protocol if this is not the case most of the time.
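
The construction can be sketched in terms of the objects from the script below (this is only an illustration of the idea, not lsqfit's internal implementation):

import gvar as gv

# one simulated data set: its means scatter about fcn(x, fit.pmean) by
# random amounts drawn from y's distribution, and it carries y's
# covariance matrix
noise = gv.sample(y) - gv.mean(y)
y_sim = fcn(x, fit.pmean) + noise + (y - gv.mean(y))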

To illustrate we again examine the fits in the section on Correlated Parameters; Gaussian Bayes Factor: we add three fit simulations at the end of the main() function:

import numpy as np
import gvar as gv
import lsqfit

def main():
    x, y = make_data()
    prior = make_prior()
    fit = lsqfit.nonlinear_fit(prior=prior, data=(x,y), fcn=fcn)
    print(40 * '*' + ' real fit')
    print(fit.format(True))

    # 3 simulated fits
    for sfit in fit.simulated_fit_iter(n=3):
        # print simulated fit details
        print(40 * '=' + ' simulation')
        print(sfit.format(True))

        # compare simulated fit results with exact values (pexact=fit.pmean)
        diff = sfit.p - sfit.pexact
        print('\nsfit.p - pexact =', diff)
        print(gv.fmt_chi2(gv.chi2(diff)))
        print()

def make_data():
    x = np.array([
        4.    ,  2.    ,  1.    ,  0.5   ,  0.25  ,  0.167 ,  0.125 ,
        0.1   ,  0.0833,  0.0714,  0.0625
        ])
    y = gv.gvar([
        '0.198(14)', '0.216(15)', '0.184(23)', '0.156(44)', '0.099(49)',
        '0.142(40)', '0.108(32)', '0.065(26)', '0.044(22)', '0.041(19)',
        '0.044(16)'
        ])
    return x, y

def make_prior():
    p = gv.gvar(['0(1)', '0(1)', '0(1)', '0(1)'])
    p[1] = 20 * p[0] + gv.gvar('0.0(1)')        # p[1] correlated with p[0]
    return p

def fcn(x, p):
    return (p[0] * (x**2 + p[1] * x)) / (x**2 + x * p[2] + p[3])

if __name__ == '__main__':
    main()

This code produces the following output, showing how the input data fluctuate from simulation to simulation:

**************************************** real fit
Least Square Fit:
  chi2/dof [dof] = 0.61 [11]    Q = 0.82    logGBF = 19.129

Parameters:
              0   0.149 (17)     [  0.0 (1.0) ]  
              1    2.97 (34)     [     0 (20) ]  
              2    1.23 (61)     [  0.0 (1.0) ]  *
              3    0.59 (15)     [  0.0 (1.0) ]  

Fit:
     x[k]          y[k]      f(x[k],p)
--------------------------------------
        4    0.198 (14)     0.193 (11)  
        2    0.216 (15)     0.210 (10)  
        1    0.184 (23)     0.209 (15)  *
      0.5    0.156 (44)     0.177 (15)  
     0.25    0.099 (49)     0.124 (13)  
    0.167    0.142 (40)     0.094 (12)  *
    0.125    0.108 (32)     0.075 (11)  *
      0.1    0.065 (26)    0.0629 (96)  
   0.0833    0.044 (22)    0.0538 (87)  
   0.0714    0.041 (19)    0.0471 (79)  
   0.0625    0.044 (16)    0.0418 (72)  

Settings:
  svdcut/n = 1e-12/0    tol = (1e-08*,1e-10,1e-10)    (itns/time = 20/0.1)

======================================== simulation
Least Square Fit:
  chi2/dof [dof] = 1.2 [11]    Q = 0.27    logGBF = 15.278

Parameters:
              0   0.134 (14)     [  0.0 (1.0) ]  
              1    2.68 (29)     [     0 (20) ]  
              2    0.68 (47)     [  0.0 (1.0) ]  
              3    0.54 (12)     [  0.0 (1.0) ]  

Fit:
     x[k]          y[k]      f(x[k],p)
--------------------------------------
        4    0.200 (14)     0.186 (12)  
        2    0.192 (15)     0.212 (10)  *
        1    0.242 (23)     0.221 (16)  
      0.5    0.163 (44)     0.187 (18)  
     0.25    0.089 (49)     0.126 (15)  
    0.167    0.130 (40)     0.093 (13)  
    0.125    0.103 (32)     0.073 (11)  
      0.1    0.046 (26)    0.0599 (96)  
   0.0833    0.054 (22)    0.0508 (85)  
   0.0714    0.004 (19)    0.0441 (77)  **
   0.0625    0.060 (16)    0.0389 (70)  *

Settings:
  svdcut/n = None/0    tol = (1e-08*,1e-10,1e-10)    (itns/time = 7/0.0)


sfit.p - pexact = [-0.015(14) -0.29(29) -0.54(47) -0.05(12)]
chi2/dof = 0.34 [4]    Q = 0.85

======================================== simulation
Least Square Fit:
  chi2/dof [dof] = 1.1 [11]    Q = 0.38    logGBF = 17.048

Parameters:
              0   0.156 (18)     [  0.0 (1.0) ]  
              1    3.12 (36)     [     0 (20) ]  
              2    1.35 (66)     [  0.0 (1.0) ]  *
              3    0.77 (20)     [  0.0 (1.0) ]  

Fit:
     x[k]          y[k]      f(x[k],p)
--------------------------------------
        4    0.207 (14)     0.201 (11)  
        2    0.224 (15)     0.214 (10)  
        1    0.163 (23)     0.206 (14)  *
      0.5    0.162 (44)     0.167 (15)  
     0.25    0.124 (49)     0.113 (14)  
    0.167    0.111 (40)     0.084 (12)  
    0.125    0.085 (32)     0.066 (11)  
      0.1    0.097 (26)    0.0550 (93)  *
   0.0833    0.020 (22)    0.0469 (83)  *
   0.0714    0.043 (19)    0.0409 (75)  
   0.0625    0.031 (16)    0.0362 (68)  

Settings:
  svdcut/n = None/0    tol = (1e-08*,1e-10,1e-10)    (itns/time = 23/0.0)


sfit.p - pexact = [0.008(18) 0.15(36) 0.13(66) 0.18(20)]
chi2/dof = 0.22 [4]    Q = 0.93

======================================== simulation
Least Square Fit:
  chi2/dof [dof] = 0.76 [11]    Q = 0.68    logGBF = 17.709

Parameters:
              0   0.138 (14)     [  0.0 (1.0) ]  
              1    2.77 (29)     [     0 (20) ]  
              2    0.72 (46)     [  0.0 (1.0) ]  
              3    0.53 (11)     [  0.0 (1.0) ]  

Fit:
     x[k]          y[k]      f(x[k],p)
--------------------------------------
        4    0.196 (14)     0.193 (12)  
        2    0.218 (15)     0.221 (10)  
        1    0.240 (23)     0.231 (16)  
      0.5    0.157 (44)     0.198 (18)  
     0.25    0.157 (49)     0.134 (14)  
    0.167    0.022 (40)     0.099 (12)  *
    0.125    0.070 (32)     0.078 (11)  
      0.1    0.090 (26)    0.0644 (96)  
   0.0833    0.045 (22)    0.0547 (86)  
   0.0714    0.053 (19)    0.0475 (78)  
   0.0625    0.059 (16)    0.0420 (70)  *

Settings:
  svdcut/n = None/0    tol = (1e-08*,1e-10,1e-10)    (itns/time = 10/0.0)


sfit.p - pexact = [-0.010(14) -0.21(29) -0.51(46) -0.06(11)]
chi2/dof = 0.77 [4]    Q = 0.54

The parameters sfit.p produced by the simulated fits agree well with the original fit parameters pexact=fit.pmean, with good fits in each case. We calculate the chi**2 for the difference sfit.p - pexact in each case; good chi**2 values validate the parameter values, standard deviations, and correlations.

Goodness of Fit

The quality of a fit is often judged by the value of chi**2/N, where N is the number of degrees of freedom. Conventionally we expect chi**2/N to be of order 1 ± sqrt(2/N) since fluctuations in the mean values of the data are of order the uncertainties in the data. More precisely the means are assumed to be random samples drawn from a Gaussian distribution whose means are given by the best-fit function and whose covariance matrix comes from the data. There are two situations where this measure of goodness-of-fit becomes unreliable.
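
For example, with N = 11 degrees of freedom, as in the fits of earlier sections, sqrt(2/11) ≈ 0.43, so values of chi**2/N anywhere between roughly 0.6 and 1.4 would be unsurprising.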

The first situation is when there is a large SVD cut on the data. As discussed in SVD Cuts and Inadequate Statistics, an SVD cut increases the uncertainties in the data without increasing the random fluctuations in the data means. As a result contributions from the parts of the chi**2 function affected by the SVD cut tend to be much smaller than naively expected, artificially pulling chi**2/N down.

The second situation that compromises chi**2 is when some or all of the priors used in a fit are broad — that is, when a fit result for a parameter has a much smaller uncertainty than the corresponding prior, but a mean that is artificially close to the prior’s mean. This often arises when the means used in the priors are not random samples, which is frequently the case (unlike for the fit data). Again contributions to chi**2 associated with such priors tend to be much smaller than naively expected, pulling chi**2 down.

These complications can conspire to make chi**2/N significantly less than 1 even when the fit is good. Of greater concern, they can mask evidence of a bad fit: chi**2/N ≈ 1 is not necessarily evidence of a good fit in such situations.

A simple way to address these situations is to redo the fit with keyword parameter noise=True. This causes lsqfit.nonlinear_fit to add extra fluctuations to the means in the prior and the data that are characteristic of the probability distributions associated with the priors and the SVD cut, respectively:

prior =>  prior + (gv.sample(prior) - gv.mean(prior))
    y =>  y + gv.sample(y.correction)

These fluctuations should leave fit results unchanged (within errors) but increase chi**2/N so it is of order one.

To add this test to the fit from SVD Cuts and Inadequate Statistics, we modify the code to include a second fit at the end:

import numpy as np
import gvar as gv
import lsqfit

def main():
    ysamples = [
        [0.0092441016, 0.0068974057, 0.0051480509, 0.0038431422, 0.0028690492],
        [0.0092477405, 0.0069030565, 0.0051531383, 0.0038455855, 0.0028700587],
        [0.0092558569, 0.0069102437, 0.0051596569, 0.0038514537, 0.0028749153],
        [0.0092294581, 0.0068865156, 0.0051395262, 0.003835656, 0.0028630454],
        [0.009240534, 0.0068961523, 0.0051480046, 0.0038424661, 0.0028675632],
        ]
    y = gv.dataset.avg_data(ysamples)
    x = np.array([15., 16., 17., 18., 19.])
    def fcn(p):
        return p['a'] * gv.exp(- p['b'] * x)
    prior = gv.gvar(dict(a='0.75(5)', b='0.30(3)'))
    fit = lsqfit.nonlinear_fit(data=y, prior=prior, fcn=fcn, svdcut=0.0028)
    print(fit.format(True))

    print('\n================ Add noise to prior, SVD')
    noisyfit = lsqfit.nonlinear_fit(
        data=y, prior=prior, fcn=fcn, svdcut=0.0028, noise=True,
        )
    print(noisyfit.format(True))

if __name__ == '__main__':
    main()

Running this code gives the following output:

Least Square Fit:
  chi2/dof [dof] = 0.38 [5]    Q = 0.86    logGBF = 53.598

Parameters:
              a    0.74338 (29)      [ 0.750 (50) ]  
              b   0.292480 (42)      [ 0.300 (30) ]  

Settings:
  svdcut/n = 0.0028/3    tol = (1e-08*,1e-10,1e-10)    (itns/time = 5/0.0)


================ Add noise to prior, SVD
Least Square Fit:
  chi2/dof [dof] = 0.81 [5]    Q = 0.54    logGBF = 52.523

Parameters:
              a    0.74343 (29)      [ 0.803 (50) ]  *
              b   0.292485 (42)      [ 0.314 (30) ]  

Fit:
      key            y[key]         f(p)[key]
---------------------------------------------
        0    0.0092436 (39)    0.0092443 (32)  
        1    0.0068987 (35)    0.0069000 (26)  
        2    0.0051496 (30)    0.0051502 (21)  
        3    0.0038437 (23)    0.0038441 (17)  
        4    0.0028689 (17)    0.0028693 (14)  

Settings:
  svdcut/n = 0.0028/3*    tol = (1e-08*,1e-10,1e-10)    (itns/time = 5/0.0)

The fit with extra noise has a larger chi**2, as expected, but is still a good fit. It also gives fit parameters that agree within errors with those from the original fit. In general, there is probably something wrong with the original fit (e.g., svdcut too small, or priors inconsistent with the fit data) if adding noise makes chi**2/N significantly larger than one, or appreciably changes the best-fit values of the parameters.

Fit Residuals and Q-Q Plots

It can be useful to examine the normalized residuals from a fit (in array fit.residuals). These are the differences between the data and the corresponding values from the fit function using the best-fit values for the fit parameters. The differences are projected onto the eigenvectors of the correlation matrix and normalized by dividing by the square root of the corresponding eigenvalues. The statistical assumptions underlying lsqfit.nonlinear_fit imply that the normalized fit residuals should be uncorrelated and distributed randomly about zero in a Gaussian distribution.
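
A quick numerical check is easy to add after a fit. The following sketch assumes fit is the lsqfit.nonlinear_fit object from the previous section; the normalized residuals are stored in the array fit.residuals, and their squares sum to the fit's chi**2:

import numpy as np

res = fit.residuals
# residuals should scatter about zero with width of order one
print('sum(res**2) =', np.sum(res ** 2), '   fit.chi2 =', fit.chi2)
print('mean =', np.mean(res), '   std dev =', np.std(res))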

One way to test whether residuals from a fit have a Gaussian distribution is to make a Q-Q plot. Plots for the two fits from the previous section (one without extra noise on the left, and the other with noise) are:

[figures _images/eg10e1.png, _images/eg10e2.png: Q-Q plots of the fit residuals without extra noise (left) and with noise (right)]

These plots were made using lsqfit.nonlinear_fit.qqplot_residuals(), by adding the following lines at the end of the main() method:

fit.qqplot_residuals().show()
noisyfit.qqplot_residuals().show()

In each case the residuals are first ordered, from smallest to largest. They are then plotted against the value expected for a residual at that position in the list if the list elements were drawn at random from a Gaussian distribution of unit width and zero mean. (For example, given 100 samples from the Gaussian distribution, the sample in position 16 of the ordered list should have a value around -1 since 16% of the values should be more than one standard deviation below the mean.) The residuals are consistent with a Gaussian distribution if they fall on or near a straight line in such a plot.
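
The same construction can be written out by hand, which may make the plots more concrete. This sketch uses scipy for the Gaussian quantiles and is for illustration only; fit.qqplot_residuals() produces the plots, including the straight-line fits, automatically:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

res = np.sort(fit.residuals)                    # ordered residuals
n = len(res)
# Gaussian value expected at each position in the ordered list
expected = norm.ppf((np.arange(1, n + 1) - 0.5) / n)
plt.plot(expected, res, 'o')                    # Q-Q scatter
plt.plot(expected, expected, 'k:')              # unit slope, zero mean
plt.xlabel('theoretical quantiles')
plt.ylabel('ordered residuals')
plt.show()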

The plots show fits of the residuals to straight lines (red solid lines). The residuals in the left plot (without additional noise) are reasonably consistent with a straight line (and, therefore, a Gaussian distribution), but the slope (0.55) is much less than one. This is because chi2/dof = 0.38 for this fit is much smaller than one. Typically we expect the slope to be roughly the square root of chi2/dof (since chi2 equals the sum of the residuals squared).
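
Here, for example, sqrt(0.38) ≈ 0.62, roughly consistent with the fitted slope of 0.55.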

The residuals in the right plot are also quite linear in the Q-Q plot. In this case the fit residuals include extra noise associated with the priors and with the SVD cut, as discussed in the previous section. As a result chi2/dof = 0.81 is much closer to one, as is the resulting slope (0.90) in the Q-Q plot. The fit line through the residuals is much closer here to the dashed line in the plot, which is what would result if the residuals had unit standard deviation and zero mean.

The fits pictured above have relatively few residuals. Q-Q plots become increasingly compelling as the number of residuals increases. The following plots, from fits without (left) and with (right) prior/SVD noise, come from a lattice QCD analysis with 383 residuals:

[figures _images/bmixing1.png, _images/bmixing2.png: Q-Q plots from the lattice QCD analysis, without (left) and with (right) prior/SVD noise]