match2seq
Match two sequences against each other, deleting all positions that differ. compareModels() compares the sequences of two structures and returns a residue mask for each of them.
Functions Overview
compareModels | Initiates comparison of the sequences of two structure objects and returns two equal sequence lists (new_seqAA_1 and new_seqAA_2 should be identical) and the corresponding residue position lists.
compareSequences |
del2mask | Convert a list of (from, to) delete positions into a mask of 0 or 1.
expandRepeats | Expand a text fragment within a larger string so that it includes any sequence repetitions to its right or left edge.
expandRepeatsLeft | Recursively identify sequence repeats on the left edge of s[start:end].
expandRepeatsRight | Recursively identify sequence repeats on the right edge of s[start:end].
getEqual | Gets only the positions of the sequences that are equal according to the OpCodes.
getEqualLists | Extract information about regions in the sequences that are equal.
getOpCodes | Compares two sequences and returns a list with the information needed to convert the first sequence into the second.
getSkipLists | Extracts information about which residues have to be removed from sequence 1 (delete code) and sequence 2 (insert code).
Classes Overview
SequenceMatcher | SequenceMatcher is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable.
match2seq Module Details
biskit.match2seq.getOpCodes(seq_1, seq_2)
Compares two sequences and returns a list with the information needed to convert the first sequence into the second.
Parameters: - seq_1 ([ str ]) – list of single letters
- seq_2 ([ str ]) – list of single letters
Returns: Opcodes (operation codes) from difflib, for example:
[('delete', 0, 1, 0, 0), ('equal', 1, 4, 0, 3),
('insert', 4, 4, 3, 4), ('equal', 4, 180, 4, 180)]
Return type: [tuples]
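The opcode format above can be reproduced with the standard library. The following is a minimal sketch that uses Python's own `difflib.SequenceMatcher` rather than the class shipped in this module, so it illustrates the output format only; the helper name `get_opcodes_sketch` is hypothetical.

```python
from difflib import SequenceMatcher


def get_opcodes_sketch(seq_1, seq_2):
    """Return difflib opcodes describing how to turn seq_1 into seq_2.

    Hypothetical helper: it relies on the standard-library SequenceMatcher,
    not on biskit.match2seq.
    """
    matcher = SequenceMatcher(None, seq_1, seq_2)
    return matcher.get_opcodes()


# Sequence 1 has an extra leading residue, sequence 2 an extra 'Y'.
codes = get_opcodes_sketch(list("XABCD"), list("ABCYD"))
for code in codes:
    print(code)
```

Each tuple reads (tag, i1, i2, j1, j2): the slice seq_1[i1:i2] corresponds to seq_2[j1:j2] under the given operation.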
biskit.match2seq.getSkipLists(seqDiff)
Extracts information about which residues have to be removed from sequence 1 (delete code) and sequence 2 (insert code). Returns deletion codes in the format (start_pos, length).
Parameters: seqDiff ([tuples]) – opcodes
Returns: Lists of tuples containing regions of the sequences that should be deleted. Example:
strucDel_1 = [(0, 1), (180, 4)]
strucDel_2 = [(3, 1), (207, 4)]
Return type: [tuple], [tuple]
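As a rough illustration of the (start_pos, length) format, the sketch below derives skip lists from difflib-style opcodes. It is not the module's code. The name `skip_lists_sketch` is hypothetical, and treating a 'replace' opcode as a deletion on both sides is an assumption.

```python
def skip_lists_sketch(opcodes):
    """Collect (start_pos, length) regions to drop from each sequence.

    Assumption: 'replace' regions are removed from both sequences.
    """
    del_1, del_2 = [], []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag in ("delete", "replace"):
            del_1.append((i1, i2 - i1))  # region present only in sequence 1
        if tag in ("insert", "replace"):
            del_2.append((j1, j2 - j1))  # region present only in sequence 2
    return del_1, del_2


opcodes = [("delete", 0, 1, 0, 0), ("equal", 1, 4, 0, 3),
           ("insert", 4, 4, 3, 4), ("equal", 4, 180, 4, 180)]
del_1, del_2 = skip_lists_sketch(opcodes)
print(del_1, del_2)
```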
biskit.match2seq.getEqualLists(seqDiff)
Extract information about regions in the sequences that are equal. Returns the equal regions in the format (start_pos, length).
Parameters: seqDiff ([tuples]) – opcodes
Returns: Lists of tuples containing regions of the sequences that are equal. Example:
strucEqual_1 = [(0, 216)]
strucEqual_2 = [(0, 216)]
Return type: [tuple], [tuple]
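The equal regions can be extracted from opcodes in the same way. This is a minimal sketch under the assumption that only 'equal' opcodes contribute; `equal_lists_sketch` is a hypothetical name, not part of the module.

```python
def equal_lists_sketch(opcodes):
    """Collect (start_pos, length) regions that match in both sequences."""
    equal_1, equal_2 = [], []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            equal_1.append((i1, i2 - i1))  # position in sequence 1
            equal_2.append((j1, j2 - j1))  # position in sequence 2
    return equal_1, equal_2


opcodes = [("delete", 0, 1, 0, 0), ("equal", 1, 4, 0, 3),
           ("insert", 4, 4, 3, 4), ("equal", 4, 180, 4, 180)]
equal_1, equal_2 = equal_lists_sketch(opcodes)
print(equal_1, equal_2)
```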
biskit.match2seq.expandRepeatsLeft(s, start, end, length=1)
Recursively identify sequence repeats on the left edge of s[start:end].
biskit.match2seq.expandRepeatsRight(s, start, end, length=1)
Recursively identify sequence repeats on the right edge of s[start:end].
biskit.match2seq.expandRepeats(s, start, size)
Expand a text fragment within a larger string so that it includes any sequence repetitions to its right or left edge.
Example
ABC[BC]CCCDE -> A[BCCCC]DE
The idea here is to avoid alignment mismatches due to duplications. The above two sequences could be aligned in several ways, for example:
A--BC---DE    AB----C-DE
ABCBCCCCDE or ABCBCCCCDE
We don't know for sure which positions should be kept and which positions should be deleted in the longer string. So the most conservative approach is to remove the whole ambiguous fragment.
Parameters: - s (str) – input string
- start (int) – start position of text fragment
- size (int) – size of text fragment
Returns: start and size of expanded fragment
Return type: (int, int)
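To make the idea of repeat expansion concrete, here is a simplified, iterative sketch. It is an independent illustration and is not guaranteed to reproduce the results of the module's recursive expandRepeatsLeft and expandRepeatsRight; all function names are hypothetical. It assumes that a repeat is any copy of a leading (or trailing) piece of the fragment that sits directly next to it.

```python
def expand_left(s, start, end):
    """Move start leftwards while a leading piece of the fragment repeats."""
    for length in range(1, end - start + 1):
        frag = s[start:start + length]
        while start - length >= 0 and s[start - length:start] == frag:
            start -= length
    return start


def expand_right(s, start, end):
    """Move end rightwards while a trailing piece of the fragment repeats."""
    for length in range(1, end - start + 1):
        frag = s[end - length:end]
        while end + length <= len(s) and s[end:end + length] == frag:
            end += length
    return end


def expand_repeats_sketch(s, start, size):
    """Return (start, size) of the fragment grown over adjacent repeats."""
    end = start + size
    new_start = expand_left(s, start, end)
    new_end = expand_right(s, start, end)
    return new_start, new_end - new_start


s = "ABCBCCCCDE"
new_start, new_size = expand_repeats_sketch(s, 3, 2)  # fragment s[3:5] == "BC"
print(new_start, new_size, s[new_start:new_start + new_size])
```

On this input the sketch grows the fragment to start 1 and size 7, that is, the substring BCBCCCC, leaving only the unambiguous A and DE outside.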
biskit.match2seq.getEqual(seqAA, seqNr, equalList)
Gets only the positions of the sequences that are equal according to the OpCodes. This should not be necessary but might be useful to skip the 'replace' OpCode.
Parameters: - seqAA ([str]) – list with the amino acid sequence in one letter code
- seqNr ([int]) – list with the amino acid positions
- equalList ([tuple], [tuple]) – Lists of tuples containing regions of the sequences that are equal
Returns: lists of amino acids and positions where equal
Return type: [str], [int]
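A minimal sketch of this filtering step follows. It assumes that the regions are given as a single list of (start_pos, length) tuples for the sequence at hand; the name `get_equal_sketch` is hypothetical.

```python
def get_equal_sketch(seq_aa, seq_nr, equal_list):
    """Keep only residues and positions that fall inside equal regions.

    Assumption: equal_list holds (start_pos, length) tuples for this sequence.
    """
    equal_aa, equal_nr = [], []
    for start, length in equal_list:
        equal_aa.extend(seq_aa[start:start + length])
        equal_nr.extend(seq_nr[start:start + length])
    return equal_aa, equal_nr


seq_aa = list("XABCD")
seq_nr = [10, 11, 12, 13, 14]  # residue numbers, made up for illustration
aa, nr = get_equal_sketch(seq_aa, seq_nr, [(1, 3), (4, 1)])
print(aa, nr)
```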
biskit.match2seq.del2mask(seq, *delpos)
Convert a list of (from, to) delete positions into a mask of 0 or 1.
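The conversion from delete positions to a mask might look like the sketch below. Note the assumption: the tuples are interpreted as (start_pos, length), the format getSkipLists is documented to return, whereas the description above calls them (from, to). Check the source before relying on either reading. The name `del2mask_sketch` is hypothetical, and a plain list stands in for whatever array type the module returns.

```python
def del2mask_sketch(seq, *delpos):
    """Return a list with 1 for kept positions and 0 for deleted ones.

    Assumption: each entry of delpos is a (start_pos, length) tuple.
    """
    mask = [1] * len(seq)
    for start, length in delpos:
        for i in range(start, min(start + length, len(seq))):
            mask[i] = 0
    return mask


mask = del2mask_sketch("ABCDEFGH", (0, 1), (4, 2))
print(mask)
```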
biskit.match2seq.compareModels(model_1, model_2)
Initiates comparison of the sequences of two structure objects and returns two equal sequence lists (new_seqAA_1 and new_seqAA_2 should be identical) and the corresponding residue position lists.
Parameters: - model_1 – first structure object
- model_2 – second structure object
Returns: tuple of atom masks for model_1 and model_2, e.g. ( [0001011101111111], [1110000111110111] )
Return type: ([1|0…],[1|0…])
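The overall idea can be tried on plain strings without any structure objects. The sketch below is a simplified stand-in, not the module's implementation: it uses the standard-library `difflib.SequenceMatcher`, produces one mask entry per residue rather than per atom, and omits the repeat expansion described for expandRepeats. The name `compare_sequences_sketch` is hypothetical.

```python
from difflib import SequenceMatcher


def compare_sequences_sketch(seq_1, seq_2):
    """Return two 0/1 masks that keep only the matching positions."""
    mask_1 = [0] * len(seq_1)
    mask_2 = [0] * len(seq_2)
    matcher = SequenceMatcher(None, seq_1, seq_2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            mask_1[i1:i2] = [1] * (i2 - i1)
            mask_2[j1:j2] = [1] * (j2 - j1)
    return mask_1, mask_2


seq_1, seq_2 = "XABCD", "ABCYD"
mask_1, mask_2 = compare_sequences_sketch(seq_1, seq_2)

# Applying each mask to its own sequence should give identical strings.
kept_1 = "".join(aa for aa, keep in zip(seq_1, mask_1) if keep)
kept_2 = "".join(aa for aa, keep in zip(seq_2, mask_2) if keep)
print(mask_1, mask_2, kept_1, kept_2)
```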