Package InversionTest :: Module InversionAnalysis
[hide private]
[frames] | no frames]

Module InversionAnalysis

Main module for inversion counting, hypothesis tests, and sequence transformations. The exposed API of the package relies on the implementations contained in this module.

Author: Benjamin D. Nye License: Apache License V2.0

Classes [hide private]
  GradedPosetSequence
An analysis of the inversion distribution of a graded poset This can be used to analyze the reference sequence against a sorted version of itself or against permutations of these elements in a sequence passed in (which may be graded or not).
  InvalidGradedPosetError
  MissingElementError
Functions [hide private]
float
_inversionComparisonTest(test, classTest, orderings, orderings2=None, seq=None, isGraded=False, hashableElements=True, measure='adaptive', **kwds)
A hypothesis test for comparing the similarity of some orderings based on some test function (test) and its equivalent in the GradedPosetSequence class (classTest).
list of object or list of list of object
filterElements(sequence, excludedElements, isGraded=False)
Remove elements from a nested list, if they are part of the excluded elements.
int
getDefaultIndex(element, maxIndex, defaultGrade=None)
Get a default index for a missing element.
object
getMedianValue(values)
Get the median of a sequence Each element of the sequence must be able to be averaged (added and divided by a number)
int
getValueIndex(value, sequence, isGraded=False, defaultGrade=None)
Get the index of the given value in the sequence
int
inversionCount(seq, seq2=None, isGraded=False, hashableElements=True)
Calculate the inversions between seq and seq2.
float
inversionCountCDF(seq, seq2=None, isGraded=False, hashableElements=True, cdfType='adaptive')
Get the cummulative distribution function probability for the number of inversions between seq and seq2.
int
inversionCountMax(seq, seq2=None, isGraded=False, hashableElements=True)
Calculate the maximum possible number of inversions, based on the sequence (# of elements and grade structure).
float
inversionCountMean(seq, seq2=None, isGraded=False, hashableElements=True)
Calculate the mean inversions across permutations, based on the sequence (# of elements and grade structure).
float
inversionCountVariance(seq, seq2=None, isGraded=False, hashableElements=True)
Calculate the variance of inversions across permutations, based on the sequence (# of elements and grade structure).
float
inversionPermutationMeanTest(orderings, orderings2, seq=None, isGraded=False, hashableElements=True, measure='adaptive', alternative='greater', pValue=0.95, iterations=100000, useStoppingRule=True, maxExactN=7)
Perform a Permutation or Monte Carlo Mean Difference Test on some measure (similarity or 1-CDF) for the match between the each sequence and the reference seq.
float
inversionPermutationRankTest(orderings, orderings2, seq=None, isGraded=False, hashableElements=True, measure='adaptive', alternative='greater', pValue=0.95, iterations=100000, useStoppingRule=True, maxExactN=7)
Perform a Permutation or Monte Carlo Rank on some measure (similarity or 1-CDF) for the match between the each sequence and the reference seq.
float
inversionSignTest(orderings, orderings2=None, seq=None, isGraded=False, hashableElements=True, measure='similarity', alternative='greater', mu=0.5)
A sign test based on the values of some measure.
float
inversionSimilarity(seq, seq2=None, isGraded=False, hashableElements=True)
Calculate the similarity between sequences: 1-inversions/maxInversions If seq2 is None, this works equivalently to seq2 being a sorted version of seq
float
inversionWilcoxonTest(orderings, orderings2=None, seq=None, isGraded=False, hashableElements=True, cdfType='adaptive', alternative='greater')
A Wilcoxon rank sum test based on the complement of the CDF (1-cdf).
list of int
makeFlatIndexSequence(sequence, referenceSequence, isSeqNested=False, isRefNested=False, tieFunction=<built-in function sorted>, indexFunction=<function getValueIndex at 0x05143FB0>)
Convert 'sequence' into a sequence of indices from the reference sequence, Note: This flattens the sequence, if it was nested, for inversion counting.
list of list of int or list of int
makeIndexSequence(sequence, referenceSequence, isSeqNested=False, isRefNested=False, tieFunction=<built-in function sorted>, indexFunction=<function getValueIndex at 0x05143FB0>)
Convert 'sequence' into a sequence of indices from the reference sequence, rather than objects Note: This retains the original sequence structure (so if the sequence was nested, it remains nested)
list of list of object
medianSequence(permutations, isGraded=False, hashableElements=True)
From the given permutations, make a sequence ordered by the median rank of each element.
int, list
mergeSortInversionCount(sequence)
Calculate the number of inversions requried to sort sequence via mergesort.
 
pairwiseGroupComparison(orderings, orderings2, isGraded=False, hashableElements=True, cdfType='adaptive', alternative='greater')
Variables [hide private]
  ADAPTIVE_CDF = 'adaptive'
  CDF_TYPES = frozenset(['adaptive', 'exact', 'normal'])
  DEFAULT_GRADES = frozenset(['Left', 'Max', 'Mean', 'Right', 'Z...
  EXACT_CDF = 'exact'
  GREATER_THAN_HYPOTHESIS = 'greater'
  LEFT_GRADE = 'Left'
  MAX_GRADE = 'Max'
  MEAN_GRADE = 'Mean'
  NORMAL_CDF = 'normal'
  RIGHT_GRADE = 'Right'
  SIMILARITY_METRIC = 'similarity'
  VALID_GRADE_CONTAINERS = (<type 'list'>, <type 'tuple'>, <type...
  ZERO_GRADE = 'Zero'
  __loader__ = <zipimporter object "C:\Python27\lib\site-package...
  __package__ = 'InversionTest'
Function Details [hide private]

_inversionComparisonTest(test, classTest, orderings, orderings2=None, seq=None, isGraded=False, hashableElements=True, measure='adaptive', **kwds)

 

A hypothesis test for comparing the similarity of some orderings based on some test function (test) and its equivalent in the GradedPosetSequence class (classTest).

Parameters:
  • test (callable) - A test function, in the form, f(orderings, orderings2, **kwds)
  • classTest (callable) - An unbound method for the GradedPosetSequence class, in the form f(self, orderings, orderings2, isGraded, measure, **kwds)
  • orderings (list of list of object or list of list of list of object) - One or more comparison sequences (first sample)
  • orderings2 (list of list of object or list of list of list of object) - One or more comparison sequences (second sample)
  • seq (list of object or list of list of object) - Sequence to calculate the number of inversions
  • isGraded (bool) - If True, both 'seq' and sequences in 'orderings' are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
  • measure (str) - The type of CDF or similarity to calculate for the inversion counts
Returns: float
Probability of the null hypothesis

filterElements(sequence, excludedElements, isGraded=False)

 

Remove elements from a nested list, if they are part of the excluded elements. Returns a copy of the list, in all cases

Parameters:
  • sequence (list of object or list of list of object) - Standard sequence ([x,y,...]) or nested sequence ([[x,..], [y,..],..])
  • excludedElements (list of object) - List of elements to be removed from the sequence
Returns: list of object or list of list of object
Sequence (in same structure) with excluded elements removed

getDefaultIndex(element, maxIndex, defaultGrade=None)

 

Get a default index for a missing element. Default grades are: LEFT_GRADE - Always -1, ranked before the lowest existing element(s) ZERO_GRADE - Always 0, ranked equal to the lowest element(s) MEAN_GRADE - Always half of the maximum grade MAX_GRADE - Always equal to the maximum grade, ranked equal to highest element(s) RIGHT_GRADE - Always maximum grade +1 than the maximum grade, after the highest element(s)

Parameters:
  • maxIndex (int) - Maximum possible grade for sequence
  • defaultGrade (str) - The type of default grade to use, from DEFAULT_GRADES set
Returns: int
Ranking for a default element

getMedianValue(values)

 

Get the median of a sequence Each element of the sequence must be able to be averaged (added and divided by a number)

Parameters:
  • values (list of object) - A list of values
Returns: object
Median value of the list

getValueIndex(value, sequence, isGraded=False, defaultGrade=None)

 

Get the index of the given value in the sequence

Parameters:
  • value (object) - Value to get the position for
  • sequence (list of list of object) - Nested sequence which is used to determine sorting value of elements
Returns: int
Sorting key for the value

inversionCount(seq, seq2=None, isGraded=False, hashableElements=True)

 

Calculate the inversions between seq and seq2. If seq2 is None, this works equivalently to seq2 being a sorted version of seq

Parameters:
  • seq (list of object or list of list of object) - Sequence to calculate the inversions for
  • seq2 (list of object or list of list of object) - A comparison sequence for the sequence 'seq'
  • isGraded (bool) - If True, both seq and seq2 are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
Returns: int
Number of inversions

inversionCountCDF(seq, seq2=None, isGraded=False, hashableElements=True, cdfType='adaptive')

 

Get the cummulative distribution function probability for the number of inversions between seq and seq2. If seq2 is None, this works equivalently to seq2 being a sorted version of seq

Parameters:
  • seq (list of object or list of list of object) - Sequence to calculate the number of inversions
  • seq2 (list of object or list of list of object) - A comparison sequence for the sequence 'seq'
  • isGraded (bool) - If True, both seq and seq2 are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
Returns: float
P(X<=x) where X is a R.V. based on all permutations and x is the actual number of inversions between seq and seq2

inversionCountMax(seq, seq2=None, isGraded=False, hashableElements=True)

 

Calculate the maximum possible number of inversions, based on the sequence (# of elements and grade structure). If the second optional sequence is provided, its grade structure is also considered. Note: This only depends on the structure of the sequence(s), not their order

Parameters:
  • seq (list of object or list of list of object) - Sequence to get the maximum possible inversions for any permutation
  • seq2 (list of object or list of list of object) - A comparison sequence for the sequence 'seq'
  • isGraded (bool) - If True, both seq and seq2 are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
Returns: int
Maximum inversions

inversionCountMean(seq, seq2=None, isGraded=False, hashableElements=True)

 

Calculate the mean inversions across permutations, based on the sequence (# of elements and grade structure). If the second optional sequence is provided, its grade structure is also considered. Note: This only depends on the structure of the sequence(s), not their order

Parameters:
  • seq (list of object or list of list of object) - Sequence to get the mean inversions for all possible permutations
  • seq2 (list of object or list of list of object) - A comparison sequence for the sequence 'seq'
  • isGraded (bool) - If True, both seq and seq2 are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
Returns: float
Mean inversions

inversionCountVariance(seq, seq2=None, isGraded=False, hashableElements=True)

 

Calculate the variance of inversions across permutations, based on the sequence (# of elements and grade structure). If the second optional sequence is provided, its grade structure is also considered. Note: This only depends on the structure of the sequence(s), not their order

Parameters:
  • seq (list of object or list of list of object) - Sequence to get the variance of inversions for all possible permutations
  • seq2 (list of object or list of list of object) - A comparison sequence for the sequence 'seq'
  • isGraded (bool) - If True, both seq and seq2 are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
Returns: float
Variance of inversions

inversionPermutationMeanTest(orderings, orderings2, seq=None, isGraded=False, hashableElements=True, measure='adaptive', alternative='greater', pValue=0.95, iterations=100000, useStoppingRule=True, maxExactN=7)

 

Perform a Permutation or Monte Carlo Mean Difference Test on some measure (similarity or 1-CDF) for the match between the each sequence and the reference seq. This requires two samples, as these are shuffled for form the permutations.

Parameters:
  • orderings (list of list of object or list of list of list of object) - One or more comparison sequences (first sample)
  • orderings2 (list of list of object or list of list of list of object) - One or more comparison sequences (second sample)
  • seq (list of object or list of list of object) - Sequence to calculate the number of inversions
  • isGraded (bool) - If True, both 'seq' and sequences in 'orderings' are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
  • measure (str) - The type of CDF to calculate for the inversion counts, from CDF_TYPES set
  • alternative (str) - Alternative hypothesis for the test, from StatisticalTests.TEST_HYPOTHESES set
  • pValue (float) - The p-Value for the test to confirm, used for Monte Carlo early termination
  • iterations (int) - The max number of iterations to run for Monte Carlo
  • useStoppingRule (bool) - If True, use version of MonteCarlo with an unbiased early stopping rule
  • maxExactN (int) - The largest N=len(x)+len(y) to calculate an exact test value. For values higher than this, use a Monte Carlo approximation.
Returns: float
Probability of the null hypothesis

inversionPermutationRankTest(orderings, orderings2, seq=None, isGraded=False, hashableElements=True, measure='adaptive', alternative='greater', pValue=0.95, iterations=100000, useStoppingRule=True, maxExactN=7)

 

Perform a Permutation or Monte Carlo Rank on some measure (similarity or 1-CDF) for the match between the each sequence and the reference seq. This requires two samples, as these are shuffled for form the permutations.

Parameters:
  • orderings (list of list of object or list of list of list of object) - One or more comparison sequences (first sample)
  • orderings2 (list of list of object or list of list of list of object) - One or more comparison sequences (second sample)
  • seq (list of object or list of list of object) - Sequence to calculate the number of inversions
  • isGraded (bool) - If True, both 'seq' and sequences in 'orderings' are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
  • measure (str) - The type of CDF to calculate for the inversion counts, from CDF_TYPES set
  • alternative (str) - Alternative hypothesis for the test, from StatisticalTests.TEST_HYPOTHESES set
  • pValue (float) - The p-Value for the test to confirm, used for Monte Carlo early termination
  • useStoppingRule (bool) - If True, use version of MonteCarlo with an unbiased early stopping rule
  • iterations (int) - The max number of iterations to run for Monte Carlo
  • maxExactN (int) - The largest N=len(x)+len(y) to calculate an exact test value. For values higher than this, use a Monte Carlo approximation.
Returns: float
Probability of the null hypothesis

inversionSignTest(orderings, orderings2=None, seq=None, isGraded=False, hashableElements=True, measure='similarity', alternative='greater', mu=0.5)

 

A sign test based on the values of some measure. This measure can be either similarity or the complement of the cdf (1-cdf), both of which are in the range [0,1] where 0 is the worst match and 1 is the best match. For each sequence in 'orderings', the measure is calculated comparing seq against the permutation. This creates a list of measure values that the sign test can be applied to. If two sets of orderings are provided, this runs a two-sample test and ignores mu. Note: A calculation based on similarity will provide different results than one based on CDF if mu!=0.5. When mu=0.5, both are identical.

Parameters:
  • orderings (list of list of object or list of list of list of object) - One or more comparison sequences (first sample)
  • orderings2 (list of list of object or list of list of list of object) - One or more comparison sequences (second sample)
  • seq (list of object or list of list of object) - Reference sequence to calculate the number of inversions against
  • isGraded (bool) - If True, both 'seq' and sequences in 'orderings' are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
  • measure (str) - The measure to calculate for each sequence, either SIMILARITY_METRIC or from CDF_TYPES set
  • mu (float) - Assumed mean of the sign test, to allow assymetric testing
  • alternative (str) - Alternate hypothesis for the test, from the TEST_HYPOTHESES set ('greater', 'less', 'two.sided')
Returns: float
Probability of the null hypothesis

inversionSimilarity(seq, seq2=None, isGraded=False, hashableElements=True)

 

Calculate the similarity between sequences: 1-inversions/maxInversions If seq2 is None, this works equivalently to seq2 being a sorted version of seq

Parameters:
  • seq (list of object or list of list of object) - Sequence to get the similarity for
  • seq2 (list of object or list of list of object) - A comparison sequence for the sequence 'seq'
  • isGraded (bool) - If True, both seq and seq2 are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
Returns: float
Similarity of sequences by inversion counts, in [0,1], where 1 is a perfect match

inversionWilcoxonTest(orderings, orderings2=None, seq=None, isGraded=False, hashableElements=True, cdfType='adaptive', alternative='greater')

 

A Wilcoxon rank sum test based on the complement of the CDF (1-cdf). For each ordering in 'orderings', 1-cdf is calculated comparing seq against the permutation. This creates a list of probabilities of getting a worse match by chance (one probability for each comparison sequence).

Parameters:
  • orderings (list of list of object or list of list of list of object) - One or more comparison sequences (first sample)
  • orderings2 (list of list of object or list of list of list of object) - One or more comparison sequences (second sample)
  • seq (list of object or list of list of object) - Sequence to calculate the number of inversions
  • isGraded (bool) - If True, both 'seq' and sequences in 'orderings' are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
  • cdfType (str) - The type of CDF to calculate for the inversion counts, from CDF_TYPES set
  • alternative (str) - Alternate hypothesis for the test, from the TEST_HYPOTHESES set ('greater', 'less', 'two.sided')
Returns: float
Probability of the null hypothesis

makeFlatIndexSequence(sequence, referenceSequence, isSeqNested=False, isRefNested=False, tieFunction=<built-in function sorted>, indexFunction=<function getValueIndex at 0x05143FB0>)

 

Convert 'sequence' into a sequence of indices from the reference sequence, Note: This flattens the sequence, if it was nested, for inversion counting. rather than objects, then flatten.

Parameters:
  • sequence (list of list of object or list of object) - Sequence to convert into indices
  • referenceSequence (list of list of object or list of object) - Sequence with "true ordering" used to collect element indices
  • isSeqNested (bool) - If True, sequence is nested (where nested elements share a grade)
  • isRefNested (bool) - If True, reference is nested (where nested elements share a grade)
  • indexFunction (callable) - Function for calculating the index of an element, as f(element, referenceSequence, isRefNested)
Returns: list of int
Flat sequence of indices, where an element's index is its grade in the reference sequence

makeIndexSequence(sequence, referenceSequence, isSeqNested=False, isRefNested=False, tieFunction=<built-in function sorted>, indexFunction=<function getValueIndex at 0x05143FB0>)

 

Convert 'sequence' into a sequence of indices from the reference sequence, rather than objects Note: This retains the original sequence structure (so if the sequence was nested, it remains nested)

Parameters:
  • sequence (list of list of object or list of object) - Sequence to convert into indices
  • referenceSequence (list of list of object or list of object) - Sequence with "true ordering" used to collect element indices
  • isSeqNested (bool) - If True, sequence is nested (where nested elements share a grade)
  • isRefNested (bool) - If True, reference is nested (where nested elements share a grade)
  • tieFunction (callable) - How to sort elements within the same grade (after applying indices), in form seq = f(seq) The 'sorted' function assumes they are incomparable and removes any inversions. By comparison, a reversed sort would apply an inversion penalty to elements in a grade.
  • indexFunction (callable) - Function for calculating the index of an element, as f(element, referenceSequence, isRefNested)
Returns: list of list of int or list of int
The 'sequence' parameter tranformed into integer indices, based on the reference sequence ordering.

medianSequence(permutations, isGraded=False, hashableElements=True)

 

From the given permutations, make a sequence ordered by the median rank of each element. Note: This requires that all permutations contain the same set of elements. If this condition does not hold, then permutations with less elements would by definition have lower median ranks. As such, this raises a KeyError if this case occurs.

Parameters:
  • permutations (list of list of object or list of list of list of object) - Permutations to gather the ranks for
  • isGraded (bool) - If True, both seq and seq2 are nested (where nested elements share a grade)
  • hashableElements (bool) - If True, elements can be hashed (improves speed)
Returns: list of list of object
A graded sequence based on the median ranks of elements. Grades occur if elements share the same median rank.

mergeSortInversionCount(sequence)

 

Calculate the number of inversions requried to sort sequence via mergesort. This function only works on a flat list. Note: Relies on python sort to do actual merging (but not the counting). More efficient this way, as the Python version is in C.

Parameters:
  • sequence (list) - Sequence to sort and count inversions
Returns: int, list
Number of inversions, sorted list

Variables Details [hide private]

DEFAULT_GRADES

Value:
frozenset(['Left', 'Max', 'Mean', 'Right', 'Zero'])

VALID_GRADE_CONTAINERS

Value:
(<type 'list'>, <type 'tuple'>, <type 'set'>)

__loader__

Value:
<zipimporter object "C:\Python27\lib\site-packages\inversiontest-1.1-p\
y2.7.egg\InversionTest\">