Package InversionTest :: Module InversionAnalysis :: Class GradedPosetSequence
[hide private]
[frames] | no frames]

Class GradedPosetSequence

object --+
         |
        GradedPosetSequence

An analysis of the inversion distribution of a graded poset This can be used to analyze the reference sequence against a sorted version of itself or against permutations of these elements in a sequence passed in (which may be graded or not). Note: When comparing two sequences, it should be noted that these inversion measures are symmetric with respect to which sequence is considered the reference versus the comparison.

Graded sequences are represented through nested lists, where each subsequence contains the elements of a grade. So then, the sequence: [[A,B,C], [1,2,3], [X,Y]] contains three grades, where A, B, and C have rank 1, [1,2,3] have rank 2, etc. Elements that share a grade cannot have inversions between them. The elements can be any valid Python objects that have valid comparison operations (including custom class instance). However, if any of the elements cannot be hashed, the hashableElements flag must be set false or problems will occur.

Note: Technically, while this code refers to ``permutations'', two sequences with the same elements but different grade structures are not technically permutations (though two with the same grade structure, regardless of structure, are certainly permutations). However, they are VERY close to permutations and may be considered analogous to permutations where you can't see the order of certain elements (i.e., censored observations).

Instance Methods [hide private]
 
__init__(self, sequence, isGraded=False, hashableElements=True, validate=False, defaultGrade=None)
Initialize the reference sequence
list of object or object
__iter__(self)
Iterate over the grades/elements
int
__len__(self)
Get the number of grades in the sequence
int or float
_calcGradedStatistic(self, operation, sequence=None, isGraded=False)
Calculate a series length statistic, adjusting for incomparable elements.
List of list of object
_calcOverloadedRanks(self, sequence, isGraded, otherSequence=None, isOtherGraded=False)
Calculate the grades with more than one element (overloaded grades).
list of list of object
_calcRankIntersections(self, overloadedRanks, overloadedRanks2)
Calculate the intersection of overloaded ranks between sequences.
float
_comparisonTest(self, test, sequences, sequences2=None, isGraded=False, measure='similarity', **kwds)
Perform a hypothesis test on either the similarity or the complements of the CDF values, which are a measure of the match between the each sequence and the reference seq.
int
_getElementIndex(self, element, sequence, isGraded=False)
Get the index of the given value in the sequence Note: This hashes element indices, if possible
 
_populateHashRanks(self)
Populate hash table that reports the grade of any element.
float
cdf(self, sequence=None, isGraded=False, cdfType='adaptive')
Calculate a CDF value for inversions between the reference sequence abd a comparison sequence.
float
correlation(self, sequence=None, isGraded=False)
The correlation of the reference sequence to a comparison sequence.
float
exactCDF(self, sequence=None, isGraded=False)
Calculate an exact CDF value for inversions between the reference sequence abd a comparison sequence.
int
getNumElements(self)
Get the number of elements in this sequence
list of list of object
getOverloadedRanks(self)
Get the elements in grades with more than one element
int
inversions(self, sequence=None, isGraded=False)
The inversions between the reference sequence and the comparison sequence.
 
isGraded(self)
Return if this is a graded poset.
list of object or object
iterElements(self)
Iterate over the elements (flat iteration over grades)
int
maxInversions(self, sequence=None, isGraded=False)
The maximum possible inversions, given the number of elements and the number sharing each grade (e.g.
int
mean(self, sequence=None, isGraded=False)
The mean of inversion across permutations, given the number of elements and the number sharing each grade (e.g.
float
normalCDF(self, sequence=None, isGraded=False)
Calculate an approximate CDF value for inversions between the reference sequence abd a comparison sequence.
float
permutationMeanTest(self, sequences, sequences2, isGraded=False, measure='adaptive', alternative='greater', pValue=0.95, iterations=100000, useStoppingRule=True, maxExactN=7)
Perform a Permutation or Monte Carlo Mean Difference test on some measure (similarity or 1-CDF) for the match between the each sequence and the reference seq.
float
permutationRankTest(self, sequences, sequences2, isGraded=False, measure='adaptive', alternative='greater', pValue=0.95, iterations=100000, useStoppingRule=True, maxExactN=7)
Perform a Permutation or Monte Carlo rank test on a measure (similarity or 1-CDF) for the match between the each sequence and the reference seq.
float
signTest(self, sequences, sequences2=None, isGraded=False, measure='similarity', mu=0.5, alternative='greater')
Perform a Sign test on either the similarity or the complements of the CDF values, which are a measure of the match between the each sequence and the reference seq.
float
similarity(self, sequence=None, isGraded=False)
The similarity of the reference sequence to a comparison sequence.
int
stdDev(self, sequence=None, isGraded=False)
The standard deviation of inversions across permutations, given the elements and the number sharing each grade (e.g.
 
validate(self)
Check that this sequence is valid, so that it has no repeated elements, that elements are hashable if sequence is hashable, and that a graded sequence contains iterated grades.
int
variance(self, sequence=None, isGraded=False)
The variance of inversions across permutations, given the number of elements and the number sharing each grade (e.g.
float
wilcoxonTest(self, sequences, sequences2=None, isGraded=False, cdfType='adaptive', alternative='greater')
Perform a Wilcoxon Rank Sum test on the complements of the CDF values, which are a measure of the match between the each sequence and the reference seq Mu cannot be set on this test, as it would violate the assumption of symmetry.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Methods [hide private]
bool
_calcNumElements(cls, sequence, isGraded)
Calculate the number of elements in the sequence.
 
validateSequence(cls, sequence, isGraded=False, hashableElements=True)
Check that a sequence is valid, so that it has no repeated elements, and that elements are hashable if sequence is hashable.
Class Variables [hide private]
  APPROXIMATION_RANKS_BOUND = 100
Properties [hide private]

Inherited from object: __class__

Method Details [hide private]

__init__(self, sequence, isGraded=False, hashableElements=True, validate=False, defaultGrade=None)
(Constructor)

 

Initialize the reference sequence

Parameters:
  • sequence (list of object or list of list of object) - Reference sequence, for comparing permutations against
  • isGraded (bool) - If true, sequence consists of subsequences containing each grade. Else, flat list.
  • hashableElements (bool) - If True, elements can be hashed (allowing slightly faster performance)
  • validate (bool) - If True, run basic validation on the sequence inside
Overrides: object.__init__

__iter__(self)

 

Iterate over the grades/elements

Returns: list of object or object
If nested, yield a grade (and all elements in it). If flat, yield an element.

__len__(self)
(Length operator)

 

Get the number of grades in the sequence

Returns: int
Number of grades

_calcGradedStatistic(self, operation, sequence=None, isGraded=False)

 

Calculate a series length statistic, adjusting for incomparable elements. This assumes that the operation can be evaluated on the total number of elements, then the operation is evaluated on the subsets of incomparable elements, which are subtracted off from the full-set operation value. If the 'sequence' param is provided, the grades for this sequence is subtracted off also (without double-counting). The max, mean, and variance can all be calculated using this approach.

Parameters:
  • operation (callable) - The operation to evaluate, in the form 'numeric = f(int)'
  • sequence (list of list of object or list of object) - A comparison sequence to evaluate grades for also. If None, ignored.
  • isGraded (bool) - If True, sequence is nested (where nested elements share a grade)
Returns: int or float
Value of the graded statistic

_calcNumElements(cls, sequence, isGraded)
Class Method

 

Calculate the number of elements in the sequence. This is trivial for a flat sequence and quick for a graded one.

Parameters:
  • sequence (list of list of object or list of object) - The sequence to count elements in
  • isGraded (bool) - If True, sequence is nested (where nested elements share a grade)
Returns: bool
The number of elements in the sequence (N)

_calcOverloadedRanks(self, sequence, isGraded, otherSequence=None, isOtherGraded=False)

 

Calculate the grades with more than one element (overloaded grades). If two sequences are provided, this only calculates overloaded ranks for elements that are contained in both sequences (in case one is missing elements) Note: If a sequence is not nested, it has no overloaded ranks by definition Note 2: The 'otherSequence' is only needed to handle missing elements

Parameters:
  • sequence (list of list of object or list of object) - Sequence to calculate the overloaded grades
  • isGraded (bool) - If True, sequence is nested (where nested elements share a grade)
  • otherSequence (list of list of object or list of object) - Optionally, a reference sequence to only count elements in both sequences
  • isOtherGraded (bool) - If True, sequence is nested (where nested elements share a grade)
Returns: List of list of object
List of subsets of elements that cannot be compared against each other

_calcRankIntersections(self, overloadedRanks, overloadedRanks2)

 

Calculate the intersection of overloaded ranks between sequences. This basically calculates "overloaded overloads" of ranks.

Parameters:
  • overloadedRanks (list of list of object param overloadedRanks2: Second set of overloaded (len > 1) ranks) - First set of overloaded (len > 1) ranks
  • overloadedRanks2 (list of list of object)
Returns: list of list of object
List of intersections between overloaded grades

_comparisonTest(self, test, sequences, sequences2=None, isGraded=False, measure='similarity', **kwds)

 

Perform a hypothesis test on either the similarity or the complements of the CDF values, which are a measure of the match between the each sequence and the reference seq. If two sequences are provided, this performs a two sample test of an appropriate type.

Parameters:
  • test (callable) - The hypothesis test function, in the form: f(sample1, sample1=None, **kwds)
  • sequences (list of list of object or list of list of list of object) - Multiple sequences that are permutations of the elements of this sequence
  • sequences2 (list of list of object or list of list of list of object) - Multiple sequences that are permutations (second sample)
  • isGraded (bool) - If True, each sequence is nested (where nested elements share a grade)
  • measure (str) - The measure to calculate for each sequence, either SIMILARITY_METRIC or from CDF_TYPES set
  • kwds (dict) - Additional test-specific parameters passed to the test
Returns: float
Probability of the null hypothesis

_getElementIndex(self, element, sequence, isGraded=False)

 

Get the index of the given value in the sequence Note: This hashes element indices, if possible

Parameters:
  • element (object) - Value to get the position for
  • sequence (list of list of object) - Nested sequence which is used to determine sorting value of elements
  • isGraded (bool) - If True, sequence is graded (nested). Else, flat sequence.
Returns: int
Sorting key for the value

_populateHashRanks(self)

 

Populate hash table that reports the grade of any element. Slight hit to memory, but significant boost in speed for comparisons.

cdf(self, sequence=None, isGraded=False, cdfType='adaptive')

 

Calculate a CDF value for inversions between the reference sequence abd a comparison sequence. If the 'sequence' param is None, the comparison sequence is a fully sorted form of the reference sequence. This can use either an exact distribution or a normal approximation.

Parameters:
  • sequence (list of list of object or list of object or None) - Comparison sequence (If None, equivalent to a sorted reference sequence)
  • isGraded (bool) - If true, sequence consists of subsequences containing each grade. Else, flat list.
  • cdfType (str) - The CDF function to use. If ADAPTIVE_CDF, automatically switches to a normal approximation when the # of elements exceeds a bound. If EXACT_CDF, calculates the PDF exactly and sums over it. This gets slow and is seldom necessary after about 100 elements. If NORMAL_CDF, uses the mean and variance to generate a normal distribution. Except for small numbers of elements or very large grades, this is advised.
Returns: float
P(x<=X) where x is a random variable for permutation inversions and X is the number of inversions between the reference and comparison sequences.

correlation(self, sequence=None, isGraded=False)

 

The correlation of the reference sequence to a comparison sequence. If the 'sequence' param is None, the comparison sequence is a fully sorted form of the reference sequence. This is just a rescaled form of similarity. This should also be equivalent to a Kendall's Tau correlation under standard conditions.

Parameters:
  • sequence (list of list of object or list of object or None) - Comparison sequence (If None, equivalent to a sorted reference sequence)
  • isGraded (bool) - If true, sequence consists of subsequences containing each grade. Else, flat list.
Returns: float
The correlation of the reference sequence to the comparison sequence, in [-1,1]

exactCDF(self, sequence=None, isGraded=False)

 

Calculate an exact CDF value for inversions between the reference sequence abd a comparison sequence. If the 'sequence' param is None, the comparison sequence is a fully sorted form of the reference sequence. Note: The exact CDF calculation works very slowly as N gets greater than 100 elements.

Parameters:
  • sequence (list of list of object or list of object or None) - Comparison sequence (If None, equivalent to a sorted reference sequence)
  • isGraded (bool) - If true, sequence consists of subsequences containing each grade. Else, flat list.
Returns: float
P(x<=X) where x is a random variable for permutation inversions and X is the number of inversions between the reference and comparison sequences.

getNumElements(self)

 

Get the number of elements in this sequence

Returns: int
Number of elements

getOverloadedRanks(self)

 

Get the elements in grades with more than one element

Returns: list of list of object
List of grades with their elements

inversions(self, sequence=None, isGraded=False)

 

The inversions between the reference sequence and the comparison sequence. If the 'sequence' param is None, the comparison sequence is a fully sorted form of the reference sequence.

Parameters:
  • sequence (list of list of object or list of object or None) - Comparison sequence (If None, equivalent to a sorted reference sequence)
  • isGraded (bool) - If true, sequence consists of subsequences containing each grade. Else, flat list.
Returns: int
If 'sequence' given, number of inversions to turn it into reference sequence. Else, the number of inversions to sort the reference sequence.

isGraded(self)

 

Return if this is a graded poset. If True, is graded.

iterElements(self)

 

Iterate over the elements (flat iteration over grades)

Returns: list of object or object
Yield all the elements in the poset

maxInversions(self, sequence=None, isGraded=False)

 

The maximum possible inversions, given the number of elements and the number sharing each grade (e.g. ties). If sequence param is given, the structure of the comparison sequence is considered also.

Parameters:
  • sequence (list of list of object or list of object or None) - Comparison sequence
  • isGraded (bool) - If true, sequence consists of subsequences containing each grade. Else, flat list.
Returns: int
Maximum possible number of inversions

mean(self, sequence=None, isGraded=False)

 

The mean of inversion across permutations, given the number of elements and the number sharing each grade (e.g. ties). If sequence param is given, the structure of the comparison sequence is considered also.

Parameters:
  • sequence (list of list of object or list of object or None) - Comparison sequence
  • isGraded (bool) - If true, sequence consists of subsequences containing each grade. Else, flat list.
Returns: int
Mean number of inversions across possible permutations

normalCDF(self, sequence=None, isGraded=False)

 

Calculate an approximate CDF value for inversions between the reference sequence abd a comparison sequence. If the 'sequence' param is None, the comparison sequence is a fully sorted form of the reference sequence. Note: The normal distribution gets quite accurate by N > 20 and should be used for larger sequences.

Parameters:
  • sequence (list of list of object or list of object or None) - Comparison sequence (If None, equivalent to a sorted reference sequence)
  • isGraded (bool) - If true, sequence consists of subsequences containing each grade. Else, flat list.
Returns: float
P(x<=X) where x is a random variable for permutation inversions and X is the number of inversions between the reference and comparison sequences.

permutationMeanTest(self, sequences, sequences2, isGraded=False, measure='adaptive', alternative='greater', pValue=0.95, iterations=100000, useStoppingRule=True, maxExactN=7)

 

Perform a Permutation or Monte Carlo Mean Difference test on some measure (similarity or 1-CDF) for the match between the each sequence and the reference seq. This requires two samples, as these are shuffled for form the permutations.

Parameters:
  • sequences (list of list of object or list of list of list of object) - Multiple sequences that are permutations of the elements of this sequence
  • sequences2 (list of list of object or list of list of list of object) - Multiple sequences that are permutations (second sample)
  • isGraded (bool) - If True, each sequence is nested (where nested elements share a grade)
  • measure (str) - The type of CDF to calculate for the inversion counts, from CDF_TYPES set
  • alternative (str) - Alternative hypothesis for the test, from StatisticalTests.TEST_HYPOTHESES set
  • pValue (float) - The p-Value for the test to confirm, used for Monte Carlo early termination
  • iterations (int) - The max number of iterations to run for Monte Carlo
  • useStoppingRule (bool) - If True, use version of MonteCarlo with an unbiased early stopping rule
  • maxExactN (int) - The largest N=len(x)+len(y) to calculate an exact test value. For values higher than this, use a Monte Carlo approximation.
Returns: float
Probability of the null hypothesis

permutationRankTest(self, sequences, sequences2, isGraded=False, measure='adaptive', alternative='greater', pValue=0.95, iterations=100000, useStoppingRule=True, maxExactN=7)

 

Perform a Permutation or Monte Carlo rank test on a measure (similarity or 1-CDF) for the match between the each sequence and the reference seq. This requires two samples, as these are shuffled for form the permutations.

Parameters:
  • sequences (list of list of object or list of list of list of object) - Multiple sequences that are permutations of the elements of this sequence
  • sequences2 (list of list of object or list of list of list of object) - Multiple sequences that are permutations (second sample)
  • isGraded (bool) - If True, each sequence is nested (where nested elements share a grade)
  • measure (str) - The type of CDF to calculate for the inversion counts, from CDF_TYPES set
  • alternative (str) - Alternative hypothesis for the test, from StatisticalTests.TEST_HYPOTHESES set
  • pValue (float) - The p-Value for the test to confirm, used for Monte Carlo early termination
  • iterations (int) - The max number of iterations to run for Monte Carlo
  • useStoppingRule (bool) - If True, use version of MonteCarlo with an unbiased early stopping rule
  • maxExactN (int) - The largest N=len(x)+len(y) to calculate an exact test value. For values higher than this, use a Monte Carlo approximation.
Returns: float
Probability of the null hypothesis

signTest(self, sequences, sequences2=None, isGraded=False, measure='similarity', mu=0.5, alternative='greater')

 

Perform a Sign test on either the similarity or the complements of the CDF values, which are a measure of the match between the each sequence and the reference seq. This has the advantage of calculating based on the similarity, which is quicker than calculating the CDF probability of a particular inversion counts. If two sequences are provided, this performs a two sample paired test.

Parameters:
  • sequences (list of list of object or list of list of list of object) - Multiple sequences that are permutations of the elements of this sequence
  • sequences2 (list of list of object or list of list of list of object) - Multiple sequences that are permutations (second sample)
  • isGraded (bool) - If True, each sequence is nested (where nested elements share a grade)
  • measure (str) - The measure to calculate for each sequence, either SIMILARITY_METRIC or from CDF_TYPES set
  • mu (float) - Assumed mean of the sign test, to allow assymetric testing
  • alternative (str) - Alternate hypothesis for the test, from StatisticalTests.TEST_HYPOTHESES set
Returns: float
Probability of the null hypothesis

similarity(self, sequence=None, isGraded=False)

 

The similarity of the reference sequence to a comparison sequence. If the 'sequence' param is None, the comparison sequence is a fully sorted form of the reference sequence.

Parameters:
  • sequence (list of list of object or list of object or None) - Comparison sequence (If None, equivalent to a sorted reference sequence)
  • isGraded (bool) - If true, sequence consists of subsequences containing each grade. Else, flat list.
Returns: float
The similarity of the reference sequence to the comparison sequence, in [0,1]

stdDev(self, sequence=None, isGraded=False)

 

The standard deviation of inversions across permutations, given the elements and the number sharing each grade (e.g. ties). If sequence param is not given, the structure of the comparison sequence is considered also.

Parameters:
  • sequence (list of list of object or list of object or None) - Comparison sequence
  • isGraded (bool) - If true, sequence consists of subsequences containing each grade. Else, flat list.
Returns: int
Standard deviation of inversions across possible permutations

variance(self, sequence=None, isGraded=False)

 

The variance of inversions across permutations, given the number of elements and the number sharing each grade (e.g. ties). If sequence param is given, the structure of the comparison sequence is considered also.

Parameters:
  • sequence (list of list of object or list of object or None) - Comparison sequence
  • isGraded (bool) - If true, sequence consists of subsequences containing each grade. Else, flat list.
Returns: int
Variance of inversions across possible permutations

wilcoxonTest(self, sequences, sequences2=None, isGraded=False, cdfType='adaptive', alternative='greater')

 

Perform a Wilcoxon Rank Sum test on the complements of the CDF values, which are a measure of the match between the each sequence and the reference seq Mu cannot be set on this test, as it would violate the assumption of symmetry. If two sequences are provided, this performs a two-sample paired test.

Parameters:
  • sequences (list of list of object or list of list of list of object) - Multiple sequences that are permutations of the elements of this sequence
  • sequences2 (list of list of object or list of list of list of object) - Multiple sequences that are permutations (second sample)
  • isGraded (bool) - If True, each sequence is nested (where nested elements share a grade)
  • cdfType (str) - The type of CDF to calculate for the inversion counts, from CDF_TYPES set
  • alternative (str) - Alternative hypothesis for the test, from StatisticalTests.TEST_HYPOTHESES set
Returns: float
Probability of the null hypothesis