Aligner

This module implements a Needleman-Wunsch+Gotoh sequence aligner.

class Data(score: float)

Private data class for the Needleman-Wunsch+Gotoh sequence aligner.

__init__(score: float)
score: float

The current score.

p: float

\(P_{m,n}\) in [Gotoh1982].

q: float

\(Q_{m,n}\) in [Gotoh1982].

pSize: int

The size of the p gap. \(k\) in [Gotoh1982].

qSize: int

The size of the q gap. \(k\) in [Gotoh1982].

class Aligner(start_score: float = -1.0, open_score: float = -1.0, extend_score: float = -0.5)

A generic Needleman-Wunsch+Gotoh sequence aligner.

This implementation uses Gotoh’s improvements to achieve \(\mathcal{O}(mn)\) running time and to reduce memory requirements to essentially the backtracking matrix only. Gotoh’s technique requires the gap weight to have the affine form \(w_k = uk + v\), where \(k\) is the gap size, \(v\) is the gap opening score, and \(u\) is the gap extension score.
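As a small worked illustration of the affine gap formula, the sketch below evaluates \(w_k = uk + v\) using the default open_score and extend_score values from the constructor signature. The function name gap_weight is hypothetical and is not part of this module.

```python
def gap_weight(k: int, open_score: float = -1.0, extend_score: float = -0.5) -> float:
    """Affine gap weight w_k = u*k + v for a gap of size k >= 1.

    Hypothetical helper for illustration; v is open_score, u is extend_score.
    """
    return extend_score * k + open_score


# The first gap position costs u + v; every further position adds only u.
weights = [gap_weight(k) for k in (1, 2, 3)]
print(weights)  # [-1.5, -2.0, -2.5]
```

Because each additional gap position adds the constant \(u\), the best gap ending at a cell can be derived from the neighbouring cell alone, which is what makes the \(\mathcal{O}(mn)\) recurrence possible.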

The aligner is type-agnostic and expects only to call the method Strategy.similarity() on the given strategy.
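To illustrate how an aligner can stay type-agnostic, the following minimal sketch computes only the global alignment score with a Gotoh-style recurrence and delegates all token comparison to a caller-supplied function. It is not this module's implementation: the names gotoh_score and exact_match are hypothetical, the plain callable stands in for Strategy.similarity() (whose exact signature is assumed here), start_score is not modelled, and no backtracking matrix is kept.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def gotoh_score(
    a: Sequence[T],
    b: Sequence[T],
    similarity: Callable[[T, T], float],
    open_score: float = -1.0,
    extend_score: float = -0.5,
) -> float:
    """Score of the best global alignment under affine gaps (sketch only)."""
    neg_inf = float("-inf")
    first_gap = open_score + extend_score  # w_1 = u + v
    n = len(b)
    # Row 0: the empty prefix of a against b[:j] is a single gap of size j.
    d_prev = [0.0] + [open_score + extend_score * j for j in range(1, n + 1)]
    p_prev = [neg_inf] * (n + 1)  # best score ending in a gap in b
    for i in range(1, len(a) + 1):
        d_cur = [open_score + extend_score * i] + [0.0] * n
        p_cur = [neg_inf] * (n + 1)
        q = neg_inf  # best score ending in a gap in a, for the current row
        for j in range(1, n + 1):
            # Either open a new gap from D or extend the existing one.
            p_cur[j] = max(d_prev[j] + first_gap, p_prev[j] + extend_score)
            q = max(d_cur[j - 1] + first_gap, q + extend_score)
            match = d_prev[j - 1] + similarity(a[i - 1], b[j - 1])
            d_cur[j] = max(match, p_cur[j], q)
        d_prev, p_prev = d_cur, p_cur
    return d_prev[n]


def exact_match(x: str, y: str) -> float:
    """Hypothetical similarity: 1.0 for equal tokens, 0.0 otherwise."""
    return 1.0 if x == y else 0.0


print(gotoh_score("abc", "abc", exact_match))   # 3.0
print(gotoh_score("abcd", "abd", exact_match))  # 1.5
```

In the second call three tokens match (3.0) and one gap of size 1 is opened (-1.5), giving 1.5. Since the function only ever calls similarity on pairs of elements, the sequences may hold strings, tokens, or any other type.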

__init__(start_score: float = -1.0, open_score: float = -1.0, extend_score: float = -0.5)
start_score: float

The gap opening score at the start of the string. Set this to 0 to find local alignments.

open_score: float

The gap opening score \(v\).

extend_score: float

The gap extension score \(u\).

align(strategy: super_collator.strategy.Strategy[super_collator.token.TT], tokens_a: Sequence[super_collator.token.Token[super_collator.token.TT]], tokens_b: Sequence[super_collator.token.Token[super_collator.token.TT]]) → Tuple[Sequence[super_collator.token.Token[super_collator.token.TT]], float]

Align two sequences.

Returns

the aligned sequence (of MultiTokens) and the score

build_debug_matrix(matrix: List[List[super_collator.aligner.Data]], len_matrix: List[List[int]], ts_a: Sequence[super_collator.token.Token[super_collator.token.TT]], ts_b: Sequence[super_collator.token.Token[super_collator.token.TT]]) → str

Build a human-readable debug matrix.

Parameters
  • matrix – the full scoring matrix

  • len_matrix – the backtracking matrix

  • ts_a – the first aligned string

  • ts_b – the second aligned string

Returns

the debug matrix as a human-readable string