match2seq

Match two sequences against each other, deleting all positions that differ. compareModels() compares the sequences of two structures and returns a residue mask for each of them.

Functions Overview

compareModels Initiates comparison of the sequences of two structure objects and returns two equal sequence lists (new_seqAA_1 and new_seqAA_2 should be identical) and the corresponding residue position lists.
compareSequences
del2mask Convert a list of (from, to) delete positions into a mask of 0s and 1s.
expandRepeats Expand a text fragment within a larger string so that it includes any sequence repetitions to its right or left edge.
expandRepeatsLeft Recursively identify sequence repeats on the left edge of s[start:end].
expandRepeatsRight Recursively identify sequence repeats on the right edge of s[start:end].
getEqual Gets only the positions of the sequences that are equal according to the OpCodes.
getEqualLists Extract information about regions in the sequences that are equal.
getOpCodes Compares two sequences and returns a list with the information needed to convert the first sequence into the second.
getSkipLists Extracts information about which residues have to be removed from sequence 1 (delete code) and sequence 2 (insert code).

Classes Overview

SequenceMatcher SequenceMatcher is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable.

match2seq Module Details

biskit.match2seq.getOpCodes(seq_1, seq_2)

Compares two sequences and returns a list with the information needed to convert the first sequence into the second.

Parameters:
  • seq_1 ([ str ]) – list of single letters
  • seq_2 ([ str ]) – list of single letters
Returns:

Opcodes from difflib, e.g.: [('delete', 0, 1, 0, 0), ('equal', 1, 4, 0, 3), ('insert', 4, 4, 3, 4), ('equal', 4, 180, 4, 180)]

Return type:

[tuples]
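The opcode format shown above is the one produced by Python's difflib.SequenceMatcher.get_opcodes(). A minimal sketch, assuming getOpCodes is a thin wrapper around that class (the helper name get_opcodes is hypothetical, not biskit's code), could look like this:

```python
from difflib import SequenceMatcher


def get_opcodes(seq_1, seq_2):
    """Hypothetical stand-in for getOpCodes: difflib opcodes turning seq_1 into seq_2."""
    # isjunk=None: no element is treated as junk.
    # autojunk=False: switch off difflib's popularity heuristic (see note below).
    matcher = SequenceMatcher(None, seq_1, seq_2, autojunk=False)
    return matcher.get_opcodes()


print(get_opcodes(list("XABC"), list("ABCY")))
# [('delete', 0, 1, 0, 0), ('equal', 1, 4, 0, 3), ('insert', 4, 4, 3, 4)]
```

Each tuple is (tag, i1, i2, j1, j2): seq_1[i1:i2] is deleted, inserted, replaced or kept to yield seq_2[j1:j2]. Passing autojunk=False is a choice made for this sketch: by default, SequenceMatcher treats very frequent elements of the second sequence as junk once it has 200 or more items, which can distort matches between long protein sequences.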

biskit.match2seq.getSkipLists(seqDiff)

Extracts information about which residues have to be removed from sequence 1 (delete code) and sequence 2 (insert code). Returns the deletions in the format (start_pos, length).

Parameters:
  • seqDiff ([tuples]) – opcodes
Returns:

Lists of tuples containing regions of the sequences that should be deleted. Example:

strucDel_1 = [(0, 1), (180, 4)]
strucDel_2 = [(3, 1), (207, 4)]

Return type:

[tuple], [tuple]
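Turning opcodes into (start_pos, length) skip lists only requires the 'delete' and 'insert' codes. The following is a hypothetical sketch (get_skip_lists is not part of biskit); like the description above, it looks only at those two codes and ignores 'replace':

```python
def get_skip_lists(opcodes):
    """Hypothetical sketch: (start_pos, length) regions to delete from each sequence."""
    del_1, del_2 = [], []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "delete":      # residues present only in sequence 1
            del_1.append((i1, i2 - i1))
        elif tag == "insert":    # residues present only in sequence 2
            del_2.append((j1, j2 - j1))
    return del_1, del_2


opcodes = [("delete", 0, 1, 0, 0), ("equal", 1, 4, 0, 3),
           ("insert", 4, 4, 3, 4), ("equal", 4, 180, 4, 180)]
print(get_skip_lists(opcodes))  # ([(0, 1)], [(3, 1)])
```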
biskit.match2seq.getEqualLists(seqDiff)

Extract information about regions in the sequences that are equal. Returns the equal regions in the format (start_pos, length).

Parameters:
  • seqDiff ([tuples]) – opcodes
Returns:

Lists of tuples containing regions of the sequences that are equal. Example:

strucEqual_1 = [(0, 216)]
strucEqual_2 = [(0, 216)]

Return type:

[tuple], [tuple]
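The equal regions follow directly from the 'equal' opcodes, with one (start_pos, length) entry per sequence. A hypothetical sketch (get_equal_lists is an illustrative name, not biskit's implementation):

```python
def get_equal_lists(opcodes):
    """Hypothetical sketch: (start_pos, length) regions shared by both sequences."""
    eq_1, eq_2 = [], []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            # The same block, expressed in the coordinates of each sequence.
            eq_1.append((i1, i2 - i1))
            eq_2.append((j1, j2 - j1))
    return eq_1, eq_2


opcodes = [("delete", 0, 1, 0, 0), ("equal", 1, 4, 0, 3),
           ("insert", 4, 4, 3, 4), ("equal", 4, 180, 4, 180)]
print(get_equal_lists(opcodes))  # ([(1, 3), (4, 176)], [(0, 3), (4, 176)])
```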
biskit.match2seq.expandRepeatsLeft(s, start, end, length=1)

Recursively identify sequence repeats on the left edge of s[start:end].

biskit.match2seq.expandRepeatsRight(s, start, end, length=1)

Recursively identify sequence repeats on the right edge of s[start:end].

biskit.match2seq.expandRepeats(s, start, size)

Expand a text fragment within a larger string so that it includes any sequence repetitions to its right or left edge.

Example

ABC[BC]CCCDE -> A[BCBCCCC]DE

The idea here is to avoid alignment mismatches due to duplications. The shorter sequence ABCDE and the longer sequence ABCBCCCCDE, for example, could be aligned in several ways:

  A--BC---DE      AB----C-DE
  ABCBCCCCDE  or  ABCBCCCCDE

We don’t know for sure which positions should be kept and which positions should be deleted in the longer string. So the most conservative approach is to remove the whole ambiguous fragment.

Parameters:
  • s (str) – input string
  • start (int) – start position of text fragment
  • size (int) – size of text fragment
Returns:

start and size of expanded fragment

Return type:

(int, int)
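The expansion can be sketched with two recursive helpers that grow the fragment one repeat unit at a time until no copy of an edge unit sits directly next to it. This is a hypothetical reimplementation, not biskit's code: the function names, and the rule that a repeat unit may be at most as long as the current fragment, are assumptions. It reproduces the ABC[BC]CCCDE example:

```python
def expand_repeats_left(s, start, end):
    """Move start left while s[start-L:start] repeats s[start:start+L]."""
    for length in range(1, end - start + 1):
        if start - length >= 0 and s[start - length:start] == s[start:start + length]:
            # Absorb one repeat unit, then look for further repeats.
            return expand_repeats_left(s, start - length, end)
    return start, end


def expand_repeats_right(s, start, end):
    """Move end right while s[end:end+L] repeats s[end-L:end]."""
    for length in range(1, end - start + 1):
        if end + length <= len(s) and s[end:end + length] == s[end - length:end]:
            return expand_repeats_right(s, start, end + length)
    return start, end


def expand_repeats(s, start, size):
    """Return (start, size) of the fragment expanded over edge repeats."""
    start, end = expand_repeats_left(s, start, start + size)
    start, end = expand_repeats_right(s, start, end)
    return start, end - start


print(expand_repeats("ABCBCCCCDE", 3, 2))  # (1, 7), i.e. A[BCBCCCC]DE
```

Recursion depth grows with the number of repeat units absorbed, which is fine for short ambiguous fragments but would need an iterative loop for very long homopolymer runs.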

biskit.match2seq.getEqual(seqAA, seqNr, equalList)

Gets only the positions of the sequences that are equal according to the OpCodes. This should not be necessary, but it might be useful for skipping the 'replace' OpCode.

Parameters:
  • seqAA ([str]) – list with the amino acid sequence in one letter code
  • seqNr ([int]) – list with the amino acid positions
  • equalList ([tuple], [tuple]) – Lists of tuples containing regions of the sequences that are equal
Returns:

lists of the amino acids and positions that lie in the equal regions

Return type:

[str], [int]
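A minimal sketch of the filtering step, assuming equalList is passed as the single list of (start_pos, length) tuples that belongs to this sequence (for example one of the two lists returned by getEqualLists); get_equal is a hypothetical name:

```python
def get_equal(seq_aa, seq_nr, equal_list):
    """Hypothetical sketch: keep only residues inside the (start, length) equal regions."""
    new_aa, new_nr = [], []
    for start, length in equal_list:
        # Copy residue letters and their positions for each equal block.
        new_aa.extend(seq_aa[start:start + length])
        new_nr.extend(seq_nr[start:start + length])
    return new_aa, new_nr


print(get_equal(list("XABCY"), [10, 11, 12, 13, 14], [(1, 3)]))
# (['A', 'B', 'C'], [11, 12, 13])
```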

biskit.match2seq.del2mask(seq, *delpos)

Convert a list of (from, to) delete positions into a mask of 0s and 1s.
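A hypothetical sketch of the conversion. Two conventions here are assumptions, since the description does not fix them: `to` is exclusive, like a Python slice end, and 0 marks a deleted position while 1 marks a kept one:

```python
def del_to_mask(seq, *delpos):
    """Hypothetical sketch: 1 marks a kept position, 0 a deleted one."""
    mask = [1] * len(seq)
    for start, stop in delpos:
        # Assumption: the 'to' bound is exclusive.
        for i in range(start, stop):
            mask[i] = 0
    return mask


print(del_to_mask("ABCDEF", (0, 1), (3, 5)))  # [0, 1, 1, 0, 0, 1]
```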

biskit.match2seq.compareSequences(seqAA_1, seqAA_2)

biskit.match2seq.compareModels(model_1, model_2)

Initiates comparison of the sequences of two structure objects and returns two equal sequence lists (new_seqAA_1 and new_seqAA_2 should be identical) and the corresponding residue position lists.

Parameters:
  • model_1 – first structure object
  • model_2 – second structure object
Returns:

tuple of atom masks for model_1 and model_2, e.g. ([0001011101111111], [1110000111110111])

Return type:

([1|0…],[1|0…])
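The overall idea (compare, then keep only the positions shared by both sequences) can be sketched on plain one-letter sequence strings. This is a simplified, hypothetical illustration: sequence_masks is not a biskit function, it works at the residue level rather than on structure objects and atom masks, and it skips the repeat expansion described under expandRepeats:

```python
from difflib import SequenceMatcher


def sequence_masks(seq_1, seq_2):
    """Hypothetical sketch: 1/0 masks keeping only positions shared by both sequences."""
    mask_1 = [0] * len(seq_1)
    mask_2 = [0] * len(seq_2)
    matcher = SequenceMatcher(None, seq_1, seq_2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Mark the matching block as kept in both sequences.
            mask_1[i1:i2] = [1] * (i2 - i1)
            mask_2[j1:j2] = [1] * (j2 - j1)
    return mask_1, mask_2


m1, m2 = sequence_masks("XABC", "ABCY")
print(m1, m2)  # [0, 1, 1, 1] [1, 1, 1, 0]
```

Applying each mask to its own sequence leaves identical residue strings, which is the property the masks are meant to guarantee.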