Package InversionTest :: Module StatisticalTests

Module StatisticalTests

General statistical tests that operate on population samples, such as the binomial test, sign test, and Wilcoxon signed rank test. Also, contains relevant CDF, PMF, and PDF functions for related distributions such as the binomial and normal distributions.

Note: If available, SciPy's optimized versions of binomial testing and normal CDF calculations are utilized by default. These should be marginally faster, as they have hooks into C code.

Author: Benjamin D. Nye
License: Apache License V2.0

Classes
  SignTestCancelationError
  SignTestException
  SignTestInvalidPairLengthError
  SystemReseededRandom
Random number generator that randomly re-positions a standard MT generator based on the system random entropy generator, so that any permutation is theoretically possible.
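The sketch below illustrates the general idea of drawing a Mersenne Twister generator's starting state from the operating system's entropy source instead of a fixed or time-based seed. It is a hypothetical stand-in (EntropyReseededRandom is not part of this module). It reseeds only before each shuffle, which may differ from how SystemReseededRandom actually re-positions its generator, and it does not by itself prove that every permutation is reachable.

```python
import random


class EntropyReseededRandom(random.Random):
    """Hypothetical sketch: a Mersenne Twister generator whose state is re-seeded
    from the OS entropy source (random.SystemRandom) before every shuffle."""

    # MT19937 keeps 624 32-bit words of internal state (19,968 bits).
    STATE_BITS = 624 * 32

    def __init__(self):
        self._entropy = random.SystemRandom()
        super().__init__(self._entropy.getrandbits(self.STATE_BITS))

    def shuffle(self, x):
        # Re-position the generator with fresh OS entropy, then shuffle x in place.
        self.seed(self._entropy.getrandbits(self.STATE_BITS))
        super().shuffle(x)


rng = EntropyReseededRandom()
data = list(range(10))
rng.shuffle(data)
```

Seeding with as many bits as the generator has state words lets the OS entropy influence the whole initial state rather than a short seed.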
Functions
tuple of (int or float, int or float)
absAndValKey(x)
Return the absolute value and the value of a number
float
binomialCDF(x, n, p)
Binomial CDF, using log-gamma implementation
bool
binomialNormalApproximationHeuristic(n, p, threshold=0.3, minN=25)
A heuristic for when the normal approximation is reasonable
float
binomialPMF(x, n, p)
Binomial PMF, using log-gamma implementation
float
binomialTest(x, n, p=0.5, alternative='two.sided')
Wrapper for the SciPy binomial two-sided test and binomial CDF calculations for one-tailed tests.
 
genericSymmetricPairedTest(statistic, statisticAlternative, alternative, *args, **kwds)
Generic symmetric paired test, to convert between different-sided results
float
permutationMeanTest(x, y, alternative='two.sided', pValue=0.99, iterations=100000, useStoppingRule=True, maxExactN=7)
Permutation test that tests for differences in the means of two samples (e.g., a two-sample t-like statistic of mean(s1)-mean(s2)).
float
permutationRankTest(x, y, alternative='two.sided', pValue=0.99, iterations=100000, useStoppingRule=True, maxExactN=7)
Permutation test that tests for differences in the ranks of two samples.
float
permutationTest(x, y, funct, alternative='two.sided')
A generic permutation hypothesis test between two sample populations.
float
pythonBinomialTest(x, n, p=0.5, alternative='two.sided', useMinlike=True)
Exact binomial test, where two-sided test uses a minlike formulation.
float
pythonNormalCDF(x, loc=0.0, scale=1.0)
Normal CDF (Phi) implementation based on the error function in Python 2.7
float
pythonSignTestStatistic(series, series2=None, mu=0.0, alternative='two.sided')
A sign test, which works based on the counts that are greater or less than the compared pairs or null hypothesis mean.
float, float
pythonWilcoxonSignedRankStatistic(series, series2=None, mu=0.0)
A Wilcoxon two-sided test.
float
scipyBinomialTestStatistic(x, n, p=0.5, alternative='two.sided')
Wrapper for the SciPy binomial two-sided test and binomial CDF calculations for one-tailed tests.
float, float
scipyWilcoxonStatistic(series, series2=None, mu=0.0)
The SciPy Wilcoxon two-sided test.
float
signTest(series, series2=None, mu=0.0, alternative='two.sided')
A sign test, which works based on the counts that are greater or less than the compared pairs or null hypothesis mean.
float
transformSymmetricPValueHypothesis(statisticVal, pValue, originalAlternative, newAlternative)
Transform a probability of one hypothesis into another hypothesis, assuming a symmetric distribution (such as a normal distribution).
float, float
wilcoxonMeanScore(series, series2=None, mu=0.0)
Get the mean Wilcoxon score, given equality
float, float
wilcoxonSignedRankStatistic(series, series2=None, mu=0.0)
A Wilcoxon two-sided test.
float
wilcoxonSignedRankTest(x, y=None, mu=0.0, alternative='two.sided')
A Wilcoxon Signed Rank test, for distributions symmetric around the median
Variables
  GREATER_THAN_HYPOTHESIS = 'greater'
  IS_SCIPY_AVAILABLE = True
  LESS_THAN_HYPOTHESIS = 'less'
  SQRT_OF_TWO = 1.41421356237
  TEST_HYPOTHESES = frozenset(['greater', 'less', 'two.sided'])
  TWO_SIDED_HYPOTHESIS = 'two.sided'
  __loader__ = <zipimporter object "C:\Python27\lib\site-package...
  __package__ = 'InversionTest'
  pi = 3.14159265359
Function Details

absAndValKey(x)

 

Return the absolute value and the value of a number

Parameters:
  • x (int or float) - Some number
Returns: tuple of (int or float, int or float)
abs(x), x
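A key of this shape is typically used to sort values by magnitude while keeping the order deterministic when magnitudes tie, for example before ranking absolute differences. That use is an assumption; abs_and_val_key below is a hypothetical stand-in, not the module's own code.

```python
def abs_and_val_key(x):
    """Hypothetical stand-in: sort key returning (abs(x), x)."""
    return abs(x), x


values = [-3, 2, -1, 1]
# Sort by magnitude first; equal magnitudes fall back to the signed value,
# so -1 precedes 1.
ordered = sorted(values, key=abs_and_val_key)
print(ordered)  # [-1, 1, 2, -3]
```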

binomialCDF(x, n, p)

 

Binomial CDF, using log-gamma implementation

Parameters:
  • x (int) - Number of successes
  • n (int) - Number of observations
  • p (float) - Probability of a success
Returns: float
Cumulative distribution function value P(X <= x) for the binomial distribution

binomialNormalApproximationHeuristic(n, p, threshold=0.3, minN=25)

 

A heuristic for when the normal approximation is reasonable

Parameters:
  • n (int) - Number of observations
  • p (float) - Probability of a success
  • threshold (float) - Threshold for the heuristic, in [0, 1] where 0 is never and 1 is always
  • minN (int) - Minimum n to have before using the approximation under any circumstances
Returns: bool
True (use normal distribution) if heuristic < threshold and n > minN, else False
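The documentation does not state which heuristic is computed. One published rule of thumb that matches the default threshold of 0.3 compares the absolute binomial skewness, |1 - 2p| / sqrt(n p (1 - p)), against the threshold. The sketch below assumes that rule; normal_approx_ok is a hypothetical name.

```python
import math


def normal_approx_ok(n, p, threshold=0.3, min_n=25):
    """Sketch: True if Binomial(n, p) looks symmetric enough for a normal approximation.

    Assumes the heuristic is the absolute binomial skewness,
    |1 - 2p| / sqrt(n * p * (1 - p)), which equals
    |sqrt((1 - p) / p) - sqrt(p / (1 - p))| / sqrt(n).
    """
    if n <= min_n or not 0.0 < p < 1.0:
        return False
    skewness = abs(1.0 - 2.0 * p) / math.sqrt(n * p * (1.0 - p))
    return skewness < threshold


print(normal_approx_ok(100, 0.5))  # True: symmetric, n above min_n
print(normal_approx_ok(30, 0.05))  # False: skewness is about 0.75
```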

binomialPMF(x, n, p)

 

Binomial PMF, using log-gamma implementation

Parameters:
  • x (int) - # of successes
  • n (int) - Number of observations
  • p (float) - Probability of a success
Returns: float
Point mass for x in the binomial distribution
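Working through log-gamma keeps intermediate values in log space, so the binomial coefficient never has to be formed as a potentially huge number. A minimal sketch of both the PMF and CDF, assuming the standard identity log C(n, x) = lgamma(n+1) - lgamma(x+1) - lgamma(n-x+1); the function names are hypothetical.

```python
import math


def lgamma_binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p), evaluated in log space via math.lgamma."""
    if x < 0 or x > n:
        return 0.0
    if p <= 0.0 or p >= 1.0:
        # Degenerate cases: all mass sits at 0 (p = 0) or at n (p = 1).
        return 1.0 if x == (0 if p <= 0.0 else n) else 0.0
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_choose + x * math.log(p) + (n - x) * math.log1p(-p))


def lgamma_binomial_cdf(x, n, p):
    """P(X <= x), summing the log-gamma PMF terms for k = 0..x."""
    total = sum(lgamma_binomial_pmf(k, n, p) for k in range(0, min(x, n) + 1))
    return min(1.0, float(total))


print(lgamma_binomial_cdf(1, 3, 0.5))  # 0.5 (up to rounding)
```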

binomialTest(x, n, p=0.5, alternative='two.sided')

 

Wrapper for the SciPy binomial two-sided test and binomial CDF calculations for one-tailed tests. This allows testing two-sided and both one-sided hypotheses.

Parameters:
  • x (int) - Number of successes
  • n (int) - Number of observations
  • p (float) - Assumed probability of a success, in [0,1]
  • alternative (str) - The alternative hypothesis for this test, from TEST_HYPOTHESES set
Returns: float
p-value of the test, in [0,1]

genericSymmetricPairedTest(statistic, statisticAlternative, alternative, *args, **kwds)

 

Generic symmetric paired test, to convert between different-sided results

Parameters:
  • statistic (callable) - A statistic function, in the form f(x, y, mu)
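  • statisticAlternative (str) - The alternative hypothesis assumed by the statistic function, from TEST_HYPOTHESES set
  • alternative (str) - The alternative hypothesis to report a p-value for, from TEST_HYPOTHESES set
  • args (list of object) - Variable arguments, to pass to the statistic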
  • kwds (dict of {str : object}) - Variable keyword arguments, to pass to the statistic

permutationMeanTest(x, y, alternative='two.sided', pValue=0.99, iterations=100000, useStoppingRule=True, maxExactN=7)

 

Permutation test that tests for differences in the means of two samples (e.g., a two-sample t-like statistic of mean(s1)-mean(s2)). The maxExactN parameter sets a cutoff for exact calculation, above which a Monte Carlo approximation is used instead.

Parameters:
  • x (list of float) - First sample set of data points
  • y (list of float) - Second sample set of data points
  • alternative (str) - The alternative hypothesis for this test, from TEST_HYPOTHESES set
  • pValue (float) - The p-value for the test to confirm, used for Monte Carlo early termination
  • iterations (int) - The maximum number of Monte Carlo iterations to run
  • useStoppingRule (bool) - If True, use a version of Monte Carlo with an unbiased early stopping rule
  • maxExactN (int) - The largest N = len(x) + len(y) for which an exact test value is calculated. Above this, a Monte Carlo approximation is used.
Returns: float
p-value of the test, given the alternative
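A minimal Monte Carlo sketch of a permutation test on mean(x) - mean(y). It omits the early stopping rule and the exact path for small N, and it uses the (hits + 1) / (iterations + 1) estimator, which is a design choice of this sketch rather than documented behaviour. The function name and the seed parameter are hypothetical.

```python
import random


def mc_permutation_mean_test(x, y, alternative="two.sided", iterations=10000, seed=None):
    """Sketch of a Monte Carlo permutation test on mean(x) - mean(y).

    No early stopping rule and no exact path for small N; both are omitted for brevity.
    """
    rng = random.Random(seed)
    nx, ny = len(x), len(y)
    observed = sum(x) / nx - sum(y) / ny
    eps = 1e-12 * max(1.0, abs(observed))  # tolerance for floating-point ties
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        stat = sum(pooled[:nx]) / nx - sum(pooled[nx:]) / ny
        if alternative == "greater":
            hits += stat >= observed - eps
        elif alternative == "less":
            hits += stat <= observed + eps
        else:
            hits += abs(stat) >= abs(observed) - eps
    # Counting the observed labelling as one extra permutation keeps p > 0.
    return (hits + 1) / (iterations + 1)


p = mc_permutation_mean_test([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], "greater", 2000, seed=1)
```

The exact p-value for this example is 1/252, so the Monte Carlo estimate should land close to 0.004.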

permutationRankTest(x, y, alternative='two.sided', pValue=0.99, iterations=100000, useStoppingRule=True, maxExactN=7)

 

Permutation test that tests for differences in the ranks of two samples.

Parameters:
  • x (list of float) - First sample set of data points
  • y (list of float) - Second sample set of data points
  • alternative (str) - The alternative hypothesis for this test, from TEST_HYPOTHESES set
  • pValue (float) - The p-value for the test to confirm, used for Monte Carlo early termination
  • iterations (int) - The maximum number of Monte Carlo iterations to run
  • useStoppingRule (bool) - If True, use a version of Monte Carlo with an unbiased early stopping rule
  • maxExactN (int) - The largest N = len(x) + len(y) for which an exact test value is calculated. Above this, a Monte Carlo approximation is used.
Returns: float
p-value of the test, given the alternative
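One way to build a rank-based permutation test is to replace the pooled observations with their ranks (averaging the ranks of tied values) and then run a mean-difference permutation test on those ranks. Whether permutationRankTest uses exactly this statistic is an assumption; the helper names below are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def rank_transform_pair(x, y):
    """Rank the pooled samples together, then split the ranks back into groups."""
    ranks = average_ranks(list(x) + list(y))
    return ranks[:len(x)], ranks[len(x):]


rx, ry = rank_transform_pair([3.1, 0.5], [2.0, 2.0])
print(rx, ry)  # [4.0, 1.0] [2.5, 2.5]
```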

permutationTest(x, y, funct, alternative='two.sided')

 

A generic permutation hypothesis test between two sample populations. Runs all permutations of funct(x',y'), where x' and y' are generated from the data points in x and y, then finds where funct(x,y) falls in the generated distribution. NOTE: This becomes crushingly slow once len(x) + len(y) exceeds 10. A Monte Carlo or partial-coverage permutation test is recommended for larger N.

Parameters:
  • x (list of float) - First sample set of data points
  • y (list of float) - Second sample set of data points
  • funct (callable) - Statistic function, in form funct(x, y)
  • alternative (str) - The alternative hypothesis for this test, from TEST_HYPOTHESES set
Returns: float
p-value of the test, given the alternative
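A minimal sketch of an exact permutation test. Instead of every ordering, it enumerates every way to assign observations to the first group; when funct ignores order within each group this gives the same p-value, because each split corresponds to the same number of orderings. Treating "two.sided" as |statistic| >= |observed| assumes the statistic is centred on 0 under the null. The function names are hypothetical.

```python
from itertools import combinations


def exact_permutation_test(x, y, funct, alternative="two.sided"):
    """Sketch of an exact permutation test that enumerates every split of the pooled
    data into groups of sizes len(x) and len(y)."""
    pooled = list(x) + list(y)
    nx = len(x)
    observed = funct(list(x), list(y))
    eps = 1e-12 * max(1.0, abs(observed))  # tolerance for floating-point ties
    hits = total = 0
    for chosen in combinations(range(len(pooled)), nx):
        chosen_set = set(chosen)
        xs = [pooled[i] for i in chosen]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        stat = funct(xs, ys)
        if alternative == "greater":
            hits += stat >= observed - eps
        elif alternative == "less":
            hits += stat <= observed + eps
        else:
            # Assumes the statistic is centred on 0 under the null hypothesis.
            hits += abs(stat) >= abs(observed) - eps
        total += 1
    return hits / total


def mean_difference(a, b):
    return sum(a) / len(a) - sum(b) / len(b)


p = exact_permutation_test([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], mean_difference, "greater")
print(p)  # 1/252, about 0.00397
```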

pythonBinomialTest(x, n, p=0.5, alternative='two.sided', useMinlike=True)

 

Exact binomial test, where the two-sided test uses a minlike formulation. This two-sided approach was chosen to match frameworks like R and SciPy. For reference, the minlike p-value is the sum of all p(k,n) for which p(k,n) <= p(x,n), where p(k,n) is the binomial probability of k successes in n observations.

Parameters:
  • x (int) - Number of successes
  • n (int) - Number of observations
  • p (float) - Assumed probability of a success, in [0,1]
  • alternative (str) - The alternative hypothesis for this test, from TEST_HYPOTHESES set
  • useMinlike (bool) - If True, calculate a minlike two-tailed p-value. Otherwise, return 2*min(p_low, p_high).
Returns: float
p-value of the test, in [0,1]
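A minimal pure-Python sketch of the exact test with both two-sided options, assuming 0 <= x <= n. It builds the PMF from math.comb, so it suits moderate n only: for very large n the integer-to-float conversion raises OverflowError. The 1 + 1e-7 relative tolerance on the minlike cutoff is an assumption of this sketch (R's binom.test uses a comparable tolerance). The function name is hypothetical.

```python
from math import comb


def exact_binomial_test(x, n, p=0.5, alternative="two.sided", use_minlike=True):
    """Sketch of an exact binomial test for moderate n."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    p_low = sum(pmf[: x + 1])   # P(X <= x)
    p_high = sum(pmf[x:])       # P(X >= x)
    if alternative == "less":
        return min(1.0, p_low)
    if alternative == "greater":
        return min(1.0, p_high)
    if not use_minlike:
        return min(1.0, 2 * min(p_low, p_high))
    # Minlike: sum the probabilities of all outcomes no more likely than x.
    # The small relative tolerance keeps rounding from excluding exact ties.
    cutoff = pmf[x] * (1 + 1e-7)
    return min(1.0, sum(q for q in pmf if q <= cutoff))


print(exact_binomial_test(7, 10))                         # 0.34375
print(exact_binomial_test(7, 10, alternative="greater"))  # 0.171875
```

For symmetric cases (p = 0.5) both two-sided options agree; they differ when p is skewed, e.g. x=0, n=5, p=0.3 gives about 0.331 (minlike) versus 0.336 (doubling).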

pythonNormalCDF(x, loc=0.0, scale=1.0)

 

Normal CDF (Phi) implementation based on the error function in Python 2.7

Parameters:
  • x (float) - Value for CDF, x in P(X<=x) where X ~ N(mu=loc, var=scale**2)
  • loc (float) - Location parameter (mean)
  • scale (float) - Scale parameter (standard deviation)
Returns: float
CDF value from the normal distribution
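The error-function form of the normal CDF is Phi(x) = 0.5 * (1 + erf((x - loc) / (scale * sqrt(2)))), with scale as the standard deviation. A minimal sketch using math.erf; normal_cdf is a hypothetical name.

```python
import math


def normal_cdf(x, loc=0.0, scale=1.0):
    """Phi((x - loc) / scale) via the error function; scale is the standard deviation."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return 0.5 * (1.0 + math.erf((x - loc) / (scale * math.sqrt(2.0))))


print(normal_cdf(0.0))                                   # 0.5
print(round(normal_cdf(1.96), 4))                        # 0.975
print(round(normal_cdf(12.0, loc=10.0, scale=2.0), 4))   # 0.8413
```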

pythonSignTestStatistic(series, series2=None, mu=0.0, alternative='two.sided')

 

A sign test, which works based on the counts that are greater or less than the compared pairs or null hypothesis mean. Uses the binomial test with p=0.5 to calculate the probability.

Parameters:
  • series (list of float) - The series of values to test
  • series2 (list of float or None) - An optional series of paired comparison values. If None, mu is used instead.
  • mu (float) - A comparison value to compare all values in series against (used only if series2 is None)
  • alternative (str) - The alternative hypothesis for this test, from TEST_HYPOTHESES set
Returns: float
p-value of the test, in [0,1]
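A minimal sketch of the sign test described above: count positive and negative differences and apply an exact binomial test with p = 0.5. Dropping zero differences is an assumption of this sketch, and it raises ValueError for unequal pair lengths, whereas the module defines its own SignTestInvalidPairLengthError. The function name is hypothetical.

```python
from math import comb


def sign_test(series, series2=None, mu=0.0, alternative="two.sided"):
    """Sketch of a sign test via an exact binomial test with p = 0.5.

    Zero differences (ties) are dropped, which is an assumption of this sketch.
    """
    if series2 is None:
        diffs = [v - mu for v in series]
    else:
        if len(series) != len(series2):
            raise ValueError("paired series must have the same length")
        diffs = [a - b for a, b in zip(series, series2)]
    plus = sum(1 for d in diffs if d > 0)
    minus = sum(1 for d in diffs if d < 0)
    n = plus + minus
    if n == 0:
        return 1.0

    def upper_tail(k):
        # P(X >= k) for X ~ Binomial(n, 0.5)
        return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

    if alternative == "greater":
        return upper_tail(plus)
    if alternative == "less":
        return upper_tail(minus)  # by symmetry, equals P(X <= plus)
    return min(1.0, 2 * upper_tail(max(plus, minus)))


print(sign_test(range(1, 11), alternative="greater"))  # 0.0009765625 (1/1024)
```

Because Binomial(n, 0.5) is symmetric, doubling the larger tail gives the same two-sided value as the minlike formulation.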

pythonWilcoxonSignedRankStatistic(series, series2=None, mu=0.0)

 

A Wilcoxon two-sided test. This uses a normal approximation and adjusts for ties using a variance penalty of (t^3-t)/48.0 for each tie of length t.

Parameters:
  • series (list of float) - A series of values
  • series2 (list of float or None) - An optional second series of values (if None, mu is used instead)
  • mu (float) - The presumed median for values (used only if series2 is None)
Returns: float, float
Wilcoxon statistic value (W+), two-sided p-value
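A minimal sketch of the normal approximation described above: W+ has null mean n(n+1)/4 and variance n(n+1)(2n+1)/24, reduced by (t^3 - t)/48 for each run of t tied absolute differences. Dropping zero differences and omitting a continuity correction are assumptions of this sketch; the function name is hypothetical.

```python
import math


def wilcoxon_signed_rank_normal(series, series2=None, mu=0.0):
    """Sketch: W+ and a two-sided p-value from the normal approximation,
    with the (t^3 - t)/48 tie penalty applied to the variance."""
    if series2 is None:
        diffs = [v - mu for v in series]
    else:
        diffs = [a - b for a, b in zip(series, series2)]
    diffs = [d for d in diffs if d != 0]  # assumption: zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_penalty = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        t = j - i + 1  # length of this run of tied |d| values
        tie_penalty += (t ** 3 - t) / 48.0
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    variance = n * (n + 1) * (2 * n + 1) / 24.0 - tie_penalty
    if variance <= 0:
        return w_plus, 1.0
    z = (w_plus - mean) / math.sqrt(variance)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return w_plus, math.erfc(abs(z) / math.sqrt(2.0))


print(wilcoxon_signed_rank_normal([1, -1, 2, -2]))  # (5.0, 1.0)
```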

scipyBinomialTestStatistic(x, n, p=0.5, alternative='two.sided')

 

Wrapper for the SciPy binomial two-sided test and binomial CDF calculations for one-tailed tests. This allows testing two-sided and both one-sided hypotheses.

Parameters:
  • x (int) - Number of successes
  • n (int) - Number of observations
  • p (float) - Assumed probability of a success, in [0,1]
  • alternative (str) - The alternative hypothesis for this test, from TEST_HYPOTHESES set
Returns: float
p-value of the test, in [0,1]

scipyWilcoxonStatistic(series, series2=None, mu=0.0)

 

The SciPy Wilcoxon two-sided test. This test always uses a normal approximation and does not adjust for ties properly, but should (theoretically) be faster than the pure-Python versions here, as it uses C code under the hood.

Parameters:
  • series (list of float) - A series of values
  • series2 (list of float or None) - An optional second series of values (if None, mu is used instead)
  • mu (float) - The presumed median for values (used only if series2 is None)
Returns: float, float
Wilcoxon statistic value of min(W+, W-), two-sided p-value

signTest(series, series2=None, mu=0.0, alternative='two.sided')

 

A sign test, which works based on the counts that are greater or less than the compared pairs or null hypothesis mean. Uses the binomial test with p=0.5 to calculate the probability.

Parameters:
  • series (list of float) - The series of values to test
  • series2 (list of float or None) - An optional series of paired comparison values. If None, mu is used instead.
  • mu (float) - A comparison value to compare all values in series against (used only if series2 is None)
  • alternative (str) - The alternative hypothesis for this test, from TEST_HYPOTHESES set
Returns: float
p-value of the test, in [0,1]

transformSymmetricPValueHypothesis(statisticVal, pValue, originalAlternative, newAlternative)

 

Transform a p-value computed for one alternative hypothesis into a p-value for another, assuming a symmetric distribution (such as a normal distribution). This is used when a function returns a statistic and p-value for one alternative hypothesis, but a p-value for a different alternative hypothesis is needed.

Parameters:
  • statisticVal (float) - Value of the statistic (assumes that the CDF integrates from lowest to highest values)
  • pValue (float) - Probability, as based on the original alternative for the statistic calculator
  • originalAlternative (str) - The alternative hypothesis assumed by the test, from TEST_HYPOTHESES set
  • newAlternative (str) - The new alternative hypothesis to report a p-value for
Returns: float
Adjusted p-value reflecting the new alternative hypothesis
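A minimal sketch of the conversion, assuming the statistic's null distribution is continuous and symmetric about a known centre (0 for a z-score). It first recovers the lower-tail probability P(T <= statistic) and then re-expresses it under the new alternative. The function name and the center parameter are hypothetical.

```python
def transform_symmetric_p_value(statistic, p_value, original, new, center=0.0):
    """Sketch: re-express a p-value under a different alternative hypothesis, assuming
    the statistic's null distribution is continuous and symmetric about `center`."""
    if original == new:
        return p_value
    # Recover the lower-tail probability P(T <= statistic) from the original p-value.
    if original == "less":
        lower = p_value
    elif original == "greater":
        lower = 1.0 - p_value
    else:  # "two.sided": p_value = 2 * min(lower, 1 - lower)
        lower = p_value / 2.0 if statistic <= center else 1.0 - p_value / 2.0
    if new == "less":
        return lower
    if new == "greater":
        return 1.0 - lower
    return min(1.0, 2.0 * min(lower, 1.0 - lower))


print(transform_symmetric_p_value(1.96, 0.05, "two.sided", "greater"))  # about 0.025
```

The statistic's side of the centre is what decides whether a two-sided p-value splits into p/2 or 1 - p/2 for a given one-sided alternative.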

wilcoxonMeanScore(series, series2=None, mu=0.0)

 

Get the mean Wilcoxon score, given equality

Parameters:
  • series (list of float) - A series of values
  • series2 (list of float or None) - An optional second series of values (if None, mu is used instead)
  • mu (float) - The presumed median for values (used only if series2 is None)
Returns: float, float
Mean Wilcoxon statistic value

wilcoxonSignedRankStatistic(series, series2=None, mu=0.0)

 

A Wilcoxon two-sided test. This uses a normal approximation and adjusts for ties using a variance penalty of (t^3-t)/48.0 for each tie of length t.

Parameters:
  • series (list of float) - A series of values
  • series2 (list of float or None) - An optional second series of values (if None, mu is used instead)
  • mu (float) - The presumed median for values (used only if series2 is None)
Returns: float, float
Wilcoxon statistic value (W+), two-sided p-value

wilcoxonSignedRankTest(x, y=None, mu=0.0, alternative='two.sided')

 

A Wilcoxon Signed Rank test, for distributions symmetric around the median

Parameters:
  • x (list of float) - A series of values
  • y (list of float or None) - An optional second series of values (if None, mu is used instead)
  • mu (float) - The presumed median for values (used only if y is None)
  • alternative (str) - The alternative hypothesis for this test, from TEST_HYPOTHESES set
Returns: float
p-value of the test, given the alternative

Variables Details

__loader__

Value:
<zipimporter object "C:\Python27\lib\site-packages\inversiontest-1.1-py2.7.egg\InversionTest\">