Coverage for lingpy/algorithm/cython/_malign.py : 96%

""" This module provides various alignment functions in an optimized version. """
seqA, seqB, scorer, gap ):
    """
    Align two sequences using the Needleman-Wunsch algorithm.

    Parameters
    ----------
    seqA, seqB : list
        The sequences to be aligned, passed as lists.
    scorer : dict
        A dictionary with tuples of two segments as keys and numbers as
        values.
    gap : int
        The gap penalty.

    Returns
    -------
    alignment : tuple
        A tuple of the two aligned sequences and the similarity score.

    Notes
    -----
    This function is a straightforward implementation of the
    Needleman-Wunsch algorithm (:evobib:`Needleman1970`). We recommend using
    it if you want to test your own scoring dictionaries and profit from a
    fast implementation (since we use Cython, it is faster than pure Python
    implementations, as long as you use Python 3 and have Cython installed).
    If you want to test the NW algorithm without specifying a scoring
    dictionary, have a look at the wrapper function with the same name in
    the :py:class:`~lingpy.align.pairwise` module.
    """
# get the lengths of the strings
# define general and specific integers
# [autouncomment] cdef int i,j
# [autouncomment] cdef int sim # stores the similarity score
# define values for the main loop
# [autouncomment] cdef int gapA,gapB,match,penalty # for the loop
# define values for the traceback
# create matrix and traceback
# initialize matrix and traceback
# start the main loop
# get the penalty
# get the three scores
# evaluate the scores
else:
# get the similarity
else:
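Because the listing above keeps only the comments of the function body, the following is a minimal pure-Python sketch of the steps those comments name (initialize the matrix, fill it in the main loop from the three scores, trace back). It is not the module's Cython code: the name `nw_align_sketch`, the variable names, and the tie-breaking order in the traceback are assumptions made for illustration.

```python
def nw_align_sketch(seq_a, seq_b, scorer, gap):
    """Globally align two lists of segments (hypothetical helper).

    Returns (aligned_a, aligned_b, similarity).
    """
    len_a, len_b = len(seq_a), len(seq_b)

    # matrix[i][j]: best score for aligning seq_b[:i] with seq_a[:j]
    matrix = [[0] * (len_a + 1) for _ in range(len_b + 1)]
    for j in range(1, len_a + 1):
        matrix[0][j] = matrix[0][j - 1] + gap
    for i in range(1, len_b + 1):
        matrix[i][0] = matrix[i - 1][0] + gap

    # main loop: take the best of match, gap in A, gap in B
    for i in range(1, len_b + 1):
        for j in range(1, len_a + 1):
            match = matrix[i - 1][j - 1] + scorer[seq_a[j - 1], seq_b[i - 1]]
            gap_a = matrix[i - 1][j] + gap  # segment of B against a gap in A
            gap_b = matrix[i][j - 1] + gap  # segment of A against a gap in B
            matrix[i][j] = max(match, gap_a, gap_b)

    # traceback: recompute which move produced each cell
    alm_a, alm_b = [], []
    i, j = len_b, len_a
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and matrix[i][j] ==
                matrix[i - 1][j - 1] + scorer[seq_a[j - 1], seq_b[i - 1]]):
            alm_a.append(seq_a[j - 1])
            alm_b.append(seq_b[i - 1])
            i -= 1
            j -= 1
        elif i > 0 and matrix[i][j] == matrix[i - 1][j] + gap:
            alm_a.append('-')
            alm_b.append(seq_b[i - 1])
            i -= 1
        else:
            alm_a.append(seq_a[j - 1])
            alm_b.append('-')
            j -= 1

    return alm_a[::-1], alm_b[::-1], matrix[len_b][len_a]


# usage: a toy scorer with +1 for identical segments and -1 otherwise
toy_scorer = {(x, y): (1 if x == y else -1) for x in "abc" for y in "abc"}
nw_result = nw_align_sketch(list("abc"), list("ac"), toy_scorer, -1)
```

The sketch recomputes the traceback from the score matrix instead of storing a separate traceback matrix, which keeps it short at the cost of a few repeated lookups.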
seqA, seqB, normalized ):
    """
    Return the edit distance between two sequences.

    Parameters
    ----------
    seqA, seqB : list
        The sequences to be compared, passed as lists.
    normalized : bool
        Indicate whether the normalized or the unnormalized edit distance
        should be returned.

    Notes
    -----
    This function computes the edit distance between two list objects. We
    recommend using it if you need a fast implementation. Otherwise,
    especially if you want to pass strings, have a look at the wrapper
    function with the same name in the
    :py:class:`~lingpy.align.pairwise` module.

    Returns
    -------
    dist : { int, float }
        Either the normalized or the unnormalized edit distance.
    """
# [autouncomment] cdef int gapA,gapB,match
# [autouncomment] cdef int i,j,sim
# [autouncomment] cdef float dist
else:
else:
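As a rough illustration of what the docstring above describes, here is a small pure-Python edit distance over lists. The name `edit_dist_sketch` is hypothetical, and dividing by the length of the longer sequence for the normalized variant is an assumption, since the normalization formula is not visible in this listing.

```python
def edit_dist_sketch(seq_a, seq_b, normalized=False):
    """Levenshtein distance between two lists (hypothetical helper)."""
    len_a, len_b = len(seq_a), len(seq_b)

    # previous[j]: distance between the processed prefix of seq_b and seq_a[:j]
    previous = list(range(len_a + 1))
    for i in range(1, len_b + 1):
        current = [i] + [0] * len_a
        for j in range(1, len_a + 1):
            cost = 0 if seq_a[j - 1] == seq_b[i - 1] else 1
            current[j] = min(
                previous[j - 1] + cost,  # match or substitution
                previous[j] + 1,         # gap in A
                current[j - 1] + 1,      # gap in B
            )
        previous = current

    dist = previous[len_a]
    if normalized:
        # assumption: normalize by the length of the longer sequence
        longest = max(len_a, len_b)
        return dist / longest if longest else 0.0
    return dist


# usage
plain = edit_dist_sketch(list("kitten"), list("sitting"))
scaled = edit_dist_sketch(list("kitten"), list("sitting"), normalized=True)
```

Only two rows of the matrix are kept here because no traceback is needed for the distance alone.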
seqA, seqB, scorer, gap ):
    """
    Align two sequences using the Smith-Waterman algorithm.

    Parameters
    ----------
    seqA, seqB : list
        The sequences to be aligned, passed as lists.
    scorer : dict
        A dictionary with tuples of two segments as keys and numbers as
        values.
    gap : int
        The gap penalty.

    Returns
    -------
    alignment : tuple
        A tuple of the two aligned sequences and the similarity score.

    Notes
    -----
    This function is a straightforward implementation of the
    Smith-Waterman algorithm (:evobib:`Smith1981`). We recommend using it if
    you want to test your own scoring dictionaries and profit from a fast
    implementation (since we use Cython, it is faster than pure Python
    implementations, as long as you use Python 3 and have Cython installed).
    If you want to test the SW algorithm without specifying a scoring
    dictionary, have a look at the wrapper function with the same name in
    the :py:class:`~lingpy.align.pairwise` module.
    """
# basic stuff
# [autouncomment] cdef int i,j
# [autouncomment] cdef float gapA,gapB
# get the lengths of the strings
# [autouncomment] cdef str s
# define values for the main loop
# define values for the traceback
# create matrix and traceback
# start the main loop
# get the penalty
# get the three scores
# evaluate the scores
else:
# check for maximal score
# get the similarity
# start the traceback
else:
# return the alignment as a tuple of prefix, alignment, and suffix
return (
    ( almA[0:j], almA[j:jmax+jgap], almA[jmax+jgap:] ),
    ( almB[0:i], almB[i:imax+igap], almB[imax+igap:] ),
    sim
    )
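The local-alignment steps named in the comments above (clamp scores at zero, remember the maximal cell, trace back until a zero cell, return prefix, alignment, and suffix) can be sketched in pure Python as follows. This is an illustrative reconstruction under assumptions, not the module's code: `sw_align_sketch` and its variable names are hypothetical, and the prefix and suffix are sliced from the input sequences instead of from gap-padded lists.

```python
def sw_align_sketch(seq_a, seq_b, scorer, gap):
    """Locally align two lists of segments (hypothetical helper).

    Returns ((prefix_a, aligned_a, suffix_a),
             (prefix_b, aligned_b, suffix_b), similarity).
    """
    len_a, len_b = len(seq_a), len(seq_b)
    matrix = [[0] * (len_a + 1) for _ in range(len_b + 1)]
    best, imax, jmax = 0, 0, 0

    # main loop: like Needleman-Wunsch, but scores never drop below zero
    for i in range(1, len_b + 1):
        for j in range(1, len_a + 1):
            match = matrix[i - 1][j - 1] + scorer[seq_a[j - 1], seq_b[i - 1]]
            gap_a = matrix[i - 1][j] + gap
            gap_b = matrix[i][j - 1] + gap
            matrix[i][j] = max(0, match, gap_a, gap_b)
            # check for maximal score
            if matrix[i][j] > best:
                best, imax, jmax = matrix[i][j], i, j

    # traceback from the maximal cell until a zero cell is reached
    alm_a, alm_b = [], []
    i, j = imax, jmax
    while i > 0 and j > 0 and matrix[i][j] > 0:
        if matrix[i][j] == (matrix[i - 1][j - 1]
                            + scorer[seq_a[j - 1], seq_b[i - 1]]):
            alm_a.append(seq_a[j - 1])
            alm_b.append(seq_b[i - 1])
            i -= 1
            j -= 1
        elif matrix[i][j] == matrix[i - 1][j] + gap:
            alm_a.append('-')
            alm_b.append(seq_b[i - 1])
            i -= 1
        else:
            alm_a.append(seq_a[j - 1])
            alm_b.append('-')
            j -= 1

    return (
        (seq_a[:j], alm_a[::-1], seq_a[jmax:]),
        (seq_b[:i], alm_b[::-1], seq_b[imax:]),
        best,
    )


# usage: +2 for identical segments, -1 otherwise, gap penalty -1
sw_scorer = {(x, y): (2 if x == y else -1)
             for x in "xabcyz" for y in "xabcyz"}
sw_result = sw_align_sketch(list("xxabcyy"), list("zabcz"), sw_scorer, -1)
```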
seqA, seqB, scorer, gap ):
    """
    Align two sequences using the Waterman-Eggert algorithm.

    Parameters
    ----------
    seqA, seqB : list
        The input sequences, passed as lists.
    scorer : dict
        A dictionary with tuples of two segments as keys and numbers as
        values.
    gap : float
        The gap penalty.

    Notes
    -----
    This function is a straightforward implementation of the
    Waterman-Eggert algorithm (:evobib:`Waterman1987`). We recommend using
    it if you want to test your own scoring dictionaries and profit from a
    fast implementation (since we use Cython, it is faster than pure Python
    implementations, as long as you use Python 3 and have Cython installed).
    If you want to test the WE algorithm without specifying a scoring
    dictionary, have a look at the wrapper function with the same name in
    the :py:class:`~lingpy.align.pairwise` module.

    Returns
    -------
    alignments : list
        A list consisting of tuples. Each tuple gives the alignment of one
        of the subsequences of the input sequences. Each tuple contains the
        aligned part of the first sequence, the aligned part of the second
        sequence, and the score of the alignment.
    """
# basic defs
# [autouncomment] cdef int lenA,lenB,i,j,null,igap,jgap
# [autouncomment] cdef float sim,gapA,gapB,match,max_score
# [autouncomment] cdef str gap_char
# [autouncomment] cdef list matrix,traceback,tracer,seqA_tokens,seqB_tokens,almA,almB
# get the lengths of the strings
# define values for the main loop
# define values for the traceback
# create a tracer for positions in the matrix
# create matrix and traceback
# start the main loop
# add zero to the tracer
# get the penalty
# get the three scores
# evaluate the scores
else:
# assign the value to the tracer
# make list of alignments
# start the while loop
# get the maximal value
# if max_val is zero, break
# get the index of the maximal value of the matrix
# convert to matrix coordinates
# store in imax and jmax
# start the traceback
# make values for almA and almB
#tracer[i * (lenA+1) + j] = 0 # set tracer to zero
#tracer[i * (lenA+1) + j] = 0 # set tracer to zero
almB.insert(i,gap_char)
#tracer[i * (lenA+1) + j] = 0 # set tracer to zero
igap += 1
else:
# store values
# change values to 0 in the tracer
# retrieve the aligned parts of the sequences
# return the alignment as a tuple of prefix, alignment, and suffix
seqA, seqB, restricted_char = '' ):
    """
    Carry out a structural alignment analysis using Dijkstra's algorithm.

    Parameters
    ----------
    seqA, seqB : str
        The input sequences.
    restricted_chars : str (default = "")
        The characters which are used to separate secondary from primary
        segments in the input sequences. Currently, the use of restricted
        chars may fail to yield an alignment.

    Notes
    -----
    Structural alignment is understood here as an alignment of two
    sequences whose alphabets differ. The algorithm returns all alignments
    with minimal edit distance. Edit distance in this context refers to the
    number of edit operations needed to convert one sequence into the
    other, with repeated edit operations being penalized only once.
    """
# get basic variables
# [autouncomment] cdef int maxScore,thisScore,newScore,fullScore
# [autouncomment] cdef list out,queue,alm
# [autouncomment] cdef str restA,restB
# [autouncomment] cdef tuple residues
# get the max score
# set up the queue
queue = [
    [ [], 0, seqA, seqB ]
    ]
# while loop
# get the first element of the queue
# start adding match
else:
else:
# check for better score
#
# start adding gap
else:
else:
# check for better score
maxScore = fullScore
# add gap in a pass
else:
else:
# check for better score
maxScore = fullScore
seqA, seqB, resA, resB, normalized ):
    r"""
    Return the restricted edit distance between two sequences.

    Parameters
    ----------
    seqA, seqB : list
        The two sequences, passed as lists.
    resA, resB : str
        The restrictions, each passed as a string with the same length as
        the corresponding sequence. Two segments are restricted from
        matching if their symbols in the restriction strings differ. If the
        symbols are identical, no restriction applies.
    normalized : bool
        Determine whether the normalized or the unnormalized edit distance
        is returned.

    Notes
    -----
    Restrictions follow the definition of :evobib:`Heeringa2006`: segments
    that are not allowed to match are given a penalty of :math:`\infty`.
    We model restrictions as strings, for example consisting of the letters
    "c" and "v". The sequence "woldemort" could be modeled as "cvccvcvcc".
    When it is aligned with the sequence "walter" and its restriction
    string "cvccvc", matching two segments whose restriction symbols differ
    is heavily penalized, which prohibits an alignment of "vowels" and
    "consonants" ("v" and "c").
    """
# [autouncomment] cdef int gapA,gapB,match
# [autouncomment] cdef int i,j,sim
# [autouncomment] cdef float dist
# define alignments
# create matrix and traceback
else:
else:
    almA += ['-']
    almB += [seqB[i-1]]
    i -= 1
else:
    almA += [seqA[j-1]]
    almB += ['-']
    j -= 1
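To make the restriction rule from the docstring concrete, the sketch below computes an edit distance in which a match or substitution between two segments is given an infinite cost whenever their restriction symbols differ. The name `restricted_edit_dist_sketch` is hypothetical. Treating identical segments with differing restriction symbols as forbidden, and normalizing by the longer sequence, are assumptions not confirmed by this listing. The sketch returns only the distance and builds no alignment.

```python
def restricted_edit_dist_sketch(seq_a, seq_b, res_a, res_b, normalized=False):
    """Edit distance with forbidden matches (hypothetical helper)."""
    inf = float("inf")
    len_a, len_b = len(seq_a), len(seq_b)

    previous = list(range(len_a + 1))
    for i in range(1, len_b + 1):
        current = [i] + [0] * len_a
        for j in range(1, len_a + 1):
            if res_a[j - 1] != res_b[i - 1]:
                match = inf                  # restricted: may not be matched
            elif seq_a[j - 1] == seq_b[i - 1]:
                match = previous[j - 1]      # identical segments, no cost
            else:
                match = previous[j - 1] + 1  # allowed substitution
            current[j] = min(match, previous[j] + 1, current[j - 1] + 1)
        previous = current

    dist = previous[len_a]
    if normalized:
        # assumption: normalize by the length of the longer sequence
        longest = max(len_a, len_b)
        return dist / longest if longest else 0.0
    return dist


# usage: a vowel may not be matched with a consonant, so "a" vs. "t"
# costs two gaps instead of one substitution
forbidden = restricted_edit_dist_sketch(["a"], ["t"], "v", "c")
allowed = restricted_edit_dist_sketch(["a"], ["t"], "c", "c")
```

Because gaps always have a finite cost, the result stays finite even when every match is forbidden.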