pydna package

Submodules

pydna.amplify module

This module provides functions for PCR. Primers with 5’ tails as well as inverse PCR on circular templates are handled correctly.

class pydna.amplify.Amplicon(record, template=None, forward_primer=None, reverse_primer=None, saltc=None, forward_primer_concentration=None, reverse_primer_concentration=None, *args, **kwargs)[source]

Bases: pydna.dsdna.Dseqrecord

The Amplicon class holds information about a PCR reaction involving two primers and one template. This class is used by the Anneal class and is not meant to be instantiated directly.

Parameters:

forward_primer : SeqRecord(Biopython)

SeqRecord object holding the forward (sense) primer

reverse_primer : SeqRecord(Biopython)

SeqRecord object holding the reverse (antisense) primer

template : Dseqrecord

Dseqrecord object holding the template (circular or linear)

saltc : float, optional

saltc = monovalent cations (mM) (Na,K..) default value is 50mM This is used for Tm calculations.

forward_primer_concentration : float, optional

primer concentration (nM) default set to 1000nM = 1µM This is used for Tm calculations.

rc : float, optional

primer concentration (nM) default set to 1000nM = 1µM This is used for Tm calculations.

figure()[source]

This method returns a simple figure of the two primers binding to a part of the template.

5gctactacacacgtactgactg3
 |||||||||||||||||||||| tm 52.6 (dbd) 58.3
5gctactacacacgtactgactg...caagatagagtcagtaaccaca3
3cgatgatgtgtgcatgactgac...gttctatctcagtcattggtgt5
                          |||||||||||||||||||||| tm 49.1 (dbd) 57.7
                         3gttctatctcagtcattggtgt5
Returns:

figure:string :

A string containing a text representation of the primers annealing on the template (see example above).

Notes

tm in the figure above is the melting temperature (tm) for each primer calculated according to SantaLucia 1998 [1].

dbd is the tm calculation for enzymes with dsDNA binding domains like Pfu-Sso7d [2]. See [3] for more information.

References

[1]
  1. SantaLucia, “A Unified View of Polymer, Dumbbell, and Oligonucleotide DNA Nearest-neighbor Thermodynamics,” Proceedings of the National Academy of Sciences 95, no. 4 (1998): 1460.
[2]
  1. Nørholm, “A Mutant Pfu DNA Polymerprimerase Designed for Advanced Uracil-excision DNA Engineering,” BMC Biotechnology 10, no. 1 (2010): 21, doi:10.1186/1472-6750-10-21.
[3]http://www.thermoscientificbio.com/webtools/tmc/
flankdn(flankdnlength=50)[source]

Returns a Dseqrecord object containing flankdnlength bases downstream of the reverse primer footprint. Truncated if the template is not long enough.

                               <---- flankdn ------>

                3actactgactatctTAATAA5
                 ||||||||||||||
acgcattcagctactgtactactgactatctatcgtacatgtactatcgtat
flankup(flankuplength=50)[source]

Returns a Dseqrecord object containing flankuplength bases upstream of the forward primer footprint, Truncated if the template is not long enough.

<--- flankup --->

          5TAATAAactactgactatct3
                 ||||||||||||||
acgcattcagctactgtactactgactatctatcg
program()[source]

Returns a string containing a text representation of two proposed PCR programs. The first program is adapted for Taq poymerase, while the second is adapted for Pfu-Sso7d or other polymerases with DNA binding domain.

Taq (rate 30 nt/s)
Three-step|         30 cycles     |      |SantaLucia 1998
94.0°C    |94.0°C                 |      |SaltC 50mM
__________|_____          72.0°C  |72.0°C|
04min00s  |30s  \         ________|______|
          |      \ 46.0°C/ 0min 1s|10min |
          |       \_____/         |      |
          |         30s           |      |4-8°C

Pfu-Sso7d (rate 15s/kb)
Three-step|          30 cycles   |      |Breslauer1986,SantaLucia1998
98.0°C    |98.0°C                |      |SaltC 50mM
__________|_____          72.0°C |72.0°C|Primer1C   1µM
00min30s  |10s  \ 61.0°C ________|______|Primer2C   1µM
          |      \______/ 0min 0s|10min |
          |        10s           |      |4-8°C
class pydna.amplify.Anneal(primers, template, limit=13)[source]

Bases: object

Parameters:

primers : iterable containing SeqRecord objects

Primer sequences 5’-3’.

template : Dseqrecord object

The template sequence 5’-3’.

limit : int, optional

limit length of the annealing part of the primers.

Examples

>>> import pydna
>>> template = pydna.Dseqrecord("tacactcaccgtctatcattatctactatcgactgtatcatctgatagcac")
>>> from Bio.SeqRecord import SeqRecord
>>> p1 = pydna.read(">p1\ntacactcaccgtctatcattatc", ds = False)
>>> p2 = pydna.read(">p2\ngtgctatcagatgatacagtcg", ds = False)
>>> ann = pydna.Anneal((p1,p2), template)
>>> print ann.report()
Template <unknown name> 51 nt linear:
Primer p1 anneals forward at position 23
Primer p2 anneals reverse at position 29
>>> ann.products
[Amplicon(51)]
>>> amplicon_list = ann.products
>>> amplicon = amplicon_list.pop()
>>> amplicon
Amplicon(51)
>>> print amplicon.figure()
5tacactcaccgtctatcattatc...cgactgtatcatctgatagcac3
                           |||||||||||||||||||||| tm 50.6 (dbd) 60.5
                          3gctgacatagtagactatcgtg5
5tacactcaccgtctatcattatc3
 ||||||||||||||||||||||| tm 49.4 (dbd) 58.8
3atgtgagtggcagatagtaatag...gctgacatagtagactatcgtg5
>>> amplicon.annotations['date'] = '02-FEB-2013'   # Set the date for this example to pass the doctest
>>> print amplicon
Dseqrecord
circular: False
size: 51
ID: 51bp U96+TO06Y6pFs74SQx8M1IVTBiY
Name: 51bp_PCR_prod
Description: Product_p1_p2
Number of features: 2
/date=02-FEB-2013
Dseq(-51)
taca..gcac
atgt..cgtg
>>> print amplicon.program()

Taq (rate 30 nt/s)
Three-step|         30 cycles     |      |SantaLucia 1998
94.0°C    |94.0°C                 |      |SaltC 50mM
__________|_____          72.0°C  |72.0°C|
04min00s  |30s  \         ________|______|
          |      \ 44.0°C/ 0min 1s|10min |
          |       \_____/         |      |
          |         30s           |      |4-8°C

Pfu-Sso7d (rate 15s/kb)
Three-step|          30 cycles   |      |Breslauer1986,SantaLucia1998
98.0°C    |98.0°C                |      |SaltC 50mM
__________|_____          72.0°C |72.0°C|Primer1C   1µM
00min30s  |10s  \ 62.0°C ________|______|Primer2C   1µM
          |      \______/ 0min 0s|10min |
          |        10s           |      |4-8°C

>>> 

Attributes

products: list A list of Amplicon objects, one for each primer pair that may form a PCR product.
products[source]
report()[source]

This method is an alias of str(Annealobj). Returns a short string representation.

class pydna.amplify.Primer(seq_obj, position, footprint, tail)[source]

Bases: Bio.SeqRecord.SeqRecord

This class holds information about a primer on a template

pydna.amplify.basictm(primer, *args, **kwargs)[source]

Returns the melting temperature (Tm) of the primer using the basic formula.

Tm = (wA+xT)*2 + (yG+zC)*4 assumed 50mM monovalent cations

w = number of A in primer
x = number of T in primer
y = number of G in primer
z = number of C in primer
Parameters:

primer : string

Primer sequence 5’-3’

Returns:

tm : int

Examples

>>> from pydna.amplify import basictm
>>> basictm("ggatcc")
20
>>>
pydna.amplify.pcr(*args, **kwargs)[source]

pcr is a convenience function for Anneal to simplify its usage, especially from the command line. If more than one PCR product is formed, an exception is raised.

args is any iterable of sequences or an iterable of iterables of sequences. args will be greedily flattened.

Parameters:

args : iterable containing sequence objects

Several arguments are also accepted.

limit : int = 13, optional

limit length of the annealing part of the primers.

Returns:

product : Dseqrecord

a Dseqrecord object representing the PCR product. The direction of the PCR product will be the same as for the template sequence.

Notes

sequences in args could be of type:

  • string
  • Seq
  • SeqRecord
  • Dseqrecord

The last sequence will be interpreted as the template all preceeding sequences as primers.

This is a powerful function, use with care!

Examples

>>> import pydna
>>> template = pydna.Dseqrecord("tacactcaccgtctatcattatctactatcgactgtatcatctgatagcac")
>>> from Bio.SeqRecord import SeqRecord
>>> p1 = pydna.read(">p1\ntacactcaccgtctatcattatc", ds = False)
>>> p2 = pydna.read(">p2\ncgactgtatcatctgatagcac", ds = False).reverse_complement()
>>> pydna.pcr(p1, p2, template)
Amplicon(51)
>>> pydna.pcr([p1, p2], template)
Amplicon(51)
>>> pydna.pcr((p1,p2,), template)
Amplicon(51)
>>>
pydna.amplify.tmbreslauer86(primer, dnac=500.0, saltc=50, thermodynamics=False)[source]

Returns the melting temperature (Tm) of the primer using the nearest neighbour algorithm. Formula and thermodynamic data is taken from Breslauer 1986. These data are no longer widely used.

Breslauer 1986, table 2 [4]

pair dH dS dG
AA/TT 9.1 24.0 1.9
AT/TA 8.6 23.9 1.5
TA/AT 6.0 16.9 0.9
CA/GT 5.8 12.9 1.9
GT/CA 6.5 17.3 1.3
CT/GA 7.8 20.8 1.6
GA/CT 5.6 13.5 1.6
CG/GC 11.9 27.8 3.6
GC/CG 11.1 26.7 3.1
GG/CC 11.0 26.6 3.1
Parameters:

primer : string

Primer sequence 5’-3’ in UPPERCASE

Returns:

tm : float

References

[4]K.J. Breslauer et al., “Predicting DNA Duplex Stability from the Base Sequence,” Proceedings of the National Academy of Sciences 83, no. 11 (1986): 3746.

Examples

>>> from pydna.amplify import tmbreslauer86
>>> tmbreslauer86("ACGTCATCGACACTATCATCGAC")
64.28863985851899
pydna.amplify.tmbresluc(primer, primerc=500.0, saltc=50, thermodynamics=False)[source]

Returns the tm for a primer using a formula adapted to polymerases with a DNA binding domain.

Parameters:

primer : string

primer sequence 5’-3’

primerc : float

concentration (nM)

saltc : float, optional

Monovalent cation concentration (mM)

thermodynamics : bool, optional

prints details of the thermodynamic data to stdout. For debugging only.

Returns:

tm : float

the tm of the primer

tm,dH,dS : tuple

tm and dH and dS used for the calculation

pydna.amplify.tmstaluc98(primer, dnac=50, saltc=50, **kwargs)[source]

Returns the melting temperature (Tm) of the primer using the nearest neighbour algorithm. Formula and thermodynamic data is taken from SantaLucia 1998 [1]. This implementation gives the same answer as the one provided by Biopython (See Examples).

Thermodynamic data used:

pair dH dS
AA/TT 7.9 22.2
AT/TA 7.2 20.4
TA/AT 7.2 21.3
CA/GT 8.5 22.7
GT/CA 8.4 22.4
CT/GA 7.8 21.0
GA/CT 8.2 22.2
CG/GC 10.6 27.2
GC/CG 9.8 24.4
GG/CC 8.0 19.9
Parameters:

primer : string

Primer sequence 5’-3’ in UPPERCASE

Returns:

tm : float

tm of the primer

Examples

>>> from pydna.amplify import tmstaluc98
>>> from Bio.SeqUtils.MeltingTemp import Tm_staluc
>>> tmstaluc98("ACGTCATCGACACTATCATCGAC")
54.55597724052518
>>> Tm_staluc("ACGTCATCGACACTATCATCGAC")
54.555977240525124
>>>

pydna.assembly module

This module provides functions for assembly of sequences by homologous recombination and other related techniques. Given a list of sequences (Dseqrecords), all sequences will be analyzed for overlapping regions of DNA (common substrings).

The assembly algorithm is based on graph theory where each overlapping region forms a node and sequences separating the overlapping regions form edges.

class pydna.assembly.Assembly(dsrecs, limit=25)[source]

Bases: object

Assembly of a list of linear DNA fragments into linear or circular constructs. Accepts a list of Dseqrecords (source fragments) to initiate an Assembly object. Several methods are available for analysis of overlapping sequences, graph construction and assembly.

Parameters:

dsrecs : list

a list of Dseqrecord objects.

Examples

>>> from pydna import Assembly, Dseqrecord
>>> a = Dseqrecord("acgatgctatactgCCCCCtgtgctgtgctcta")
>>> b = Dseqrecord("tgtgctgtgctctaTTTTTtattctggctgtatc")
>>> c = Dseqrecord("tattctggctgtatcGGGGGtacgatgctatactg")
>>> x = Assembly((a,b,c), limit=14)
>>> x.analyze_overlaps()
'6 sequences analyzed of which 3 have shared homologies with totally 3 overlaps'
>>> x.create_graph()
"A graph with 5', 3' and 3 internal nodes was created"
>>> x.assemble_circular_from_graph()
'1 circular products were formed'
>>> x
Assembly object:
Sequences........................: 33, 34, 35
Sequences with shared homologies.: 33, 34, 35
Homology limit (bp)..............: 14
Number of overlaps...............: 3
Nodes in graph...................: 5
Assembly protocol................: overlaps
Circular products................: 59
Linear products..................: No linear products
>>> x.circular_products
[Dseqrecord(o59)]
>>> x.circular_products[0].seq.watson
'CCCCCtgtgctgtgctctaTTTTTtattctggctgtatcGGGGGtacgatgctatactg'
analyze_overlaps(limit=None)[source]

This method carries out a pairwise analysis for common sub strings (shared homologous sequences) among DNA fragments stored in the dsrecs property.

  • Common sub strings need to be equal to or longer than limit.
  • Common sub strings can be present anywhere within the sequences stored in the dsrecs property.

The analyzed_dsrec property will contain copies of the sequences in the dsrecs property that have common substrings. Common sub strings are added as features to the sequences in the analyzed_dsrec property.

Parameters:

limit : int

The shortest shared sub sequence to be considered. This will also set the limit property.

analyze_terminal_overlaps(limit=None)[source]

This method carries out a pairwise analysis for shared sub sequences equal to or longer than limit. The difference between this method and analyze_overlaps() is that shared sequences must start or end at the extremities of the source fragments.

Parameters:

limit : int

The shortest shared sub sequence to be considered. This will also set the limit property.

assemble_circular_from_graph()[source]

Assembles all circular products from the graph created by create_graph()

assemble_fusion_pcr_linear(limit=None)[source]

Convenience method to assembles all possible linear fusion PCR products. This method is a synonym of assemble_gibson_linear()

assemble_gibson_circular(limit=None)[source]

Convenience method to assembles all possible circular Gibson assembly products expected from in-vivo homologous recombination by the following steps:

  1. Analysis of source fragments using the analyze_terminal_overlaps() method
  2. Construction of a graph using the create_graph() method
  3. Assembly of possible circular products using the assemble_circular_from_graph() method

The property “circular_products” is initiated with a list of Fragment objects.

assemble_gibson_linear(limit=None)[source]

Convenience method to assembles all possible linear Gibson assembly products expected from in-vivo homologous recombination by the following steps:

  1. Analysis of source fragments using the analyze_terminal_overlaps() method
  2. Construction of a graph using the create_graph() method
  3. Assembly of possible circular products using the assemble_linear_from_graph() method

The property “linear_products” is initiated with a list of Fragment objects.

assemble_golden_gate(limit=None)[source]

Not implemented

assemble_hr_circular(limit=None)[source]

Convenience method to assembles all possible circular homologous recombination (hr) products expected from in-vivo homologous recombination by the following steps:

  1. Analysis of source fragments using the analyze_overlaps() method
  2. Construction of a graph using the create_graph() method
  3. Assembly of possible circular products using the assemble_circular_from_graph() method

The property “circular_products” is initiated with a list of Fragment objects.

assemble_hr_linear(limit=None)[source]

Convenience method to assembles all possible linear homologous recombination (hr) products expected from in-vivo homologous recombination by the following steps:

  1. Analysis of source fragments using the analyze_overlaps() method
  2. Construction of a graph using the create_graph() method
  3. Assembly of possible linear products using the assemble_linear_from_graph() method

The property “linear_products” is initiated with a list of Fragment objects.

assemble_linear_from_graph()[source]

Assembles all linear products from the graph created by create_graph()

assemble_slic(limit=None)[source]

Not implemented

create_graph()[source]

Creates a graph from analyzed source fragments. Nodes in the graph are shared sub sequences found as features in the source sequences. Edges are the sequences that connect the shared sub sequences in each source fragment. Two additional nodes called 5’ and 3’ are also added to represent DNA ends.

dsrecs = None

Sequences fed to this class is stored in this property

limit = None

The shortest common sub strings to be considered

max_nodes = None

The max number of nodes allowed. This can be reset to some other value

class pydna.assembly.Contig(record, source_fragments=, []*args, **kwargs)[source]

Bases: pydna.dsdna.Dseqrecord

This class holds information about a DNA assembly. This class is instantiated by the Assembly class and is not meant to be instantiated directly.

detailed_fig()[source]
detailed_figure()[source]

Synonym of detailed_fig()

small_fig()[source]

Returns a small ascii representation of the assembled fragments. Each fragment is represented by:

Size of common 5' substring|Name and size of DNA fragment| Size of common 5' substring

Linear:

frag20| 6
       \/
       /\
        6|frag23| 6
                 \/
                 /\
                  6|frag14

Circular:

 -|2577|61
|       \/
|       /\
|       61|5681|98
|               \/
|               /\
|               98|2389|557
|                       \/
|                       /\
|                       557-
|                          |
 --------------------------
small_figure()[source]

Synonym of small_fig()

class pydna.assembly.Fragment(record, start1=0, end1=0, start2=0, end2=0, alignment=0, i=0, *args, **kwargs)[source]

Bases: pydna.dsdna.Dseqrecord

This class holds information about a DNA fragment in an assembly. This class is instantiated by the Assembly class and is not meant to be instantiated directly.

pydna.download module

Provides a class for downloading sequences from genbank.

class pydna.download.Genbank(users_email, proxy=None, tool='pydna')[source]

Class to facilitate download from genbank.

Parameters:

users_email : string

Has to be a valid email address. You should always tell Genbanks who you are, so that they can contact you.

proxy : string, optional

String containing a proxy url: “proxy = “http://umiho.proxy.com:3128

tool : string, optional

Default is “pydna”. This is to tell Genbank which tool you are using.

Examples

>>> import pydna 
>>> gb=pydna.Genbank("me@mail.se", proxy = "http://proxy.com:3128")                  
>>> rec = gb.nucleotide("L09137") # <- pUC19 from genbank                            
>>> print len(rec)                                                                   
2686                                                                                 
nucleotide(item, start=None, stop=None, strand='watson')[source]

Download a genbank nuclotide record.

Item is a string containing one genbank acession number [5] for a nucleotide file. Start and stop are intervals to be downloaded. This is useful as some genbank records are large. If strand is “c”, “C”, “crick”, “Crick”, “antisense”,”Antisense”, “2” or 2, the watson(antisense) strand is returned, otherwise the sense strand is returned.

Alternatively, item can be a string containing an url that returns a sequence in genbank or FASTA format.

The genbank E-utilities can provide such urls [6].

Result is returned as a Dseqrecord object.

Genbank nucleotide accession numbers have this format:

A12345 = 1 letter + 5 numerals
AB123456 = 2 letters + 6 numerals

References

[5]http://www.dsimb.inserm.fr/~fuchs/M2BI/AnalSeq/Annexes/Sequences/Accession_Numbers.htm
[6]http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch
test()[source]

Test downloading the pUC19 plasmid sequence from genbank returns True if successful. Can be used to test proxy and other network settings.

pydna.dsdna module

Provides two classes, Dseq and Dseqrecord, for handling double stranded DNA sequences. Dseq and Dseqrecord are subclasses of Biopythons Seq and SeqRecord classes, respectively. These classes support the notion of circular and linear DNA.

class pydna.dsdna.Dseq(watson, crick=None, ovhg=None, linear=None, circular=None, alphabet=IUPACAmbiguousDNA())[source]

Bases: Bio.Seq.Seq

Dseq is a class designed to hold information for a double stranded DNA fragment. Dseq also holds information describing the topology of the DNA fragment (linear or circular).

Parameters:

watson : str

a string representing the watson (sense) DNA strand.

crick : str, optional

a string representing the crick (antisense) DNA strand.

ovhg : int, optional

A positive or negative number to describe the stagger between the watson and crick strands. see below for a detailed explanation.

linear : bool, optional

True indicates that sequence is linear, False that it is circular.

circular : bool, optional

True indicates that sequence is circular, False that it is linear.

alphabet : Bio.Alphabet, optional

Bio.Alphabet.IUPAC.IUPACAmbiguousDNA from the Biopython package is set as default.

Examples

Dseq is a subclass of the Biopython Seq object. It stores two strings representing the watson (sense) and crick(antisense) strands. two properties called linear and circular, and a numeric value ovhg (overhang) describing the stagger for the watson and crick strand in the 5’ end of the fragment.

The most common usage is probably to create a Dseq object as a part of a Dseqrecord object (see pydna.dsdna.Dseqrecord).

There are three ways of creating a Dseq object directly:

Only one argument (string):

>>> import pydna
>>> pydna.Dseq("aaa")
Dseq(-3)
aaa
ttt

The given string will be interpreted as the watson strand of a blunt, linear double stranded sequence object. The crick strand is created automatically from the watson strand.

Two arguments (string, string):

>>> import pydna
>>> pydna.Dseq("gggaaat","ttt")
Dseq(-7)
gggaaat
   ttt

If both watson and crick are given, but not ovhg an attempt will be made to find the best annealing between the strands. There are limitations to this! For long fragments it is quite slow. The length of the annealing sequences have to be at least half the length of the shortest of the strands.

Three arguments (string, string, ovhg=int):

The ovhg parameter controls the stagger at the five prime end:

ovhg=-2

XXXXX
  XXXXX

ovhg=-1

XXXXX
 XXXXX

ovhg=0

XXXXX
XXXXX

ovhg=1

 XXXXX
XXXXX

ovhg=2

  XXXXX
XXXXX

Example of creating Dseq objects with different amounts of stagger:

>>> pydna.Dseq(watson="agt",crick="actta",ovhg=-2)
Dseq(-7)
agt
  attca
>>> pydna.Dseq(watson="agt",crick="actta",ovhg=-1)
Dseq(-6)
agt
 attca
>>> pydna.Dseq(watson="agt",crick="actta",ovhg=0)
Dseq(-5)
agt
attca
>>> pydna.Dseq(watson="agt",crick="actta",ovhg=1)
Dseq(-5)
 agt
attca
>>> pydna.Dseq(watson="agt",crick="actta",ovhg=2)
Dseq(-5)
  agt
attca

If the ovhg parameter is psecified a crick strand also needs to be supplied, otherwise an exception is raised.

>>> pydna.Dseq(watson="agt",ovhg=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna_/dsdna.py", line 169, in __init__
    else:
Exception: ovhg defined without crick strand!

The default alphabet is set to Biopython IUPACAmbiguousDNA

The shape of the fragment is set by either:

  1. linear = False, True
  2. circular = True, False

Note that both ends of the DNA fragment has to be compatible to set circular = True (or linear = False).

>>> pydna.Dseq("aaa","ttt")
Dseq(-3)
aaa
ttt
>>> pydna.Dseq("aaa","ttt",ovhg=0)
Dseq(-3)
aaa
ttt
>>> pydna.Dseq("aaa", "ttt", linear = False ,ovhg=0)
Dseq(o3)
aaa
ttt
>>> pydna.Dseq("aaa", "ttt", circular = True , ovhg=0)
Dseq(o3)
aaa
ttt

Coercing to string

>>> a=pydna.Dseq("tttcccc","aaacccc")
>>> a
Dseq(-11)
    tttcccc
ccccaaa
>>> str(a)
'ggggtttcccc'

The double stranded part is accessible through the dsdata property:

>>> a.dsdata
'ttt'

A Dseq object can be longer that either the watson or crick strands.

<-- length -->
GATCCTTT
     AAAGCCTAG

The slicing of a linear Dseq object works mostly as it does for a string.

>>> s="ggatcc"
>>> s[2:3]
'a'
>>> s[2:4]
'at'
>>> s[2:4:-1]
''
>>> s[::2]
'gac'
>>> import pydna
>>> d=pydna.Dseq(s, linear=True)
>>> d[2:3]
Dseq(-1)
a
t
>>> d[2:4]
Dseq(-2)
at
ta
>>> d[2:4:-1]
Dseq(-0)


>>> d[::2]
Dseq(-3)
gac
ctg

The slicing of a circular Dseq object has a slightly different meaning.

>>> s="ggAtCc"
>>> d=pydna.Dseq(s, circular=True)
>>> d
Dseq(o6)
ggAtCc
ccTaGg
>>> d[4:3]
Dseq(-5)
CcggA
GgccT

The slice [X:X] produces an empty slice for a string, while this will return the linearized sequence starting at X:

>>> s="ggatcc"
>>> d=pydna.Dseq(s, circular=True)
>>> d
Dseq(o6)
ggatcc
cctagg
>>> d[3:3]
Dseq(-6)
tccgga
aggcct
>>>
T4(nucleotides=None)[source]

Fill in of five prime protruding ends and chewing back of three prime protruding ends by a DNA polymerase providing both 5’-3’ DNA polymerase activity and 3’-5’ nuclease acitivty (such as T4 DNA polymerase). This can be done in presence of any combination of the four A, G, C or T. Default are all four nucleotides together.

Alias for the t4() method

Parameters:nucleotides : str

Examples

>>> import pydna
>>> a=pydna.Dseq("gatcgatc")
>>> a
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4()
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4("t")
Dseq(-8)
gatcgat
 tagctag
>>> a.T4("a")
Dseq(-8)
gatcga
  agctag
>>> a.T4("g")
Dseq(-8)
gatcg
   gctag
>>>
circular[source]

The circular property

cut(*enzymes)[source]

Returns a list of linear Dseq fragments produced in the digestion. If there are no cuts, an empty list is returned.

Parameters:

enzymes : enzyme object or iterable of such objects

A Bio.Restriction.XXX restriction objects or iterable.

Returns:

frags : list

list of Dseq objects formed by the digestion

Examples

>>> from pydna import Dseq
>>> seq=Dseq("ggatccnnngaattc")
>>> seq
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>> from Bio.Restriction import BamHI,EcoRI
>>> type(seq.cut(BamHI))
<type 'list'>
>>> for frag in seq.cut(BamHI):
...     print frag.fig()
Dseq(-5)
g
cctag
Dseq(-14)
gatccnnngaattc
    gnnncttaag
>>> seq.cut(EcoRI, BamHI) ==  seq.cut(BamHI, EcoRI)
True
>>> a,b,c = seq.cut(EcoRI, BamHI)
>>> a+b+c
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>>
fig()[source]

Returns a representation of the sequence, truncated if longer than 30 bp:

Examples

>>> import pydna
>>> a=pydna.Dseq("atcgcttactagcgtactgatcatctgact")
>>> a
Dseq(-30)
atcgcttactagcgtactgatcatctgact
tagcgaatgatcgcatgactagtagactga
>>> a+=Dseq("A")
>>> a
Dseq(-31)
atcg..actA
tagc..tgaT
fill_in(nucleotides=None)[source]

Fill in of five prime protruding end with a DNA polymerase that has only DNA polymerase activity (such as exo-klenow [7]) and any combination of A, G, C or T. Default are all four nucleotides together.

Parameters:nucleotides : str

References

[7]http://en.wikipedia.org/wiki/Klenow_fragment#The_exo-_Klenow_fragment

Examples

>>> import pydna
>>> a=pydna.Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.fill_in()
Dseq(-3)
aaa
ttt
>>> b=pydna.Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.fill_in()
Dseq(-5)
caaag
gtttc
>>> b.fill_in("g")
Dseq(-5)
caaag
gtttc
>>> b.fill_in("tac")
Dseq(-5)
caaa
 tttc
>>> c=pydna.Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.fill_in()
Dseq(-5)
 aaac
gttt
>>>
find(sub, start=0, end=9223372036854775807)[source]

Find method, like that of a python string.

This behaves like the python string method of the same name.

Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].

Returns -1 if the subsequence is NOT found.

Parameters:

sub : string or Seq object

a string or another Seq object to look for.

start : int, optional

slice start.

end : int, optional

slice end.

Examples

>>> import pydna
>>> seq = Dseq("atcgactgacgtgtt")
>>> seq
Dseq(-15)
atcgactgacgtgtt
tagctgactgcacaa
>>> seq.find("gac")
3
>>> seq = pydna.Dseq(watson="agt",crick="actta",ovhg=-2)
>>> seq
Dseq(-7)
agt
  attca
>>> seq.find("taa")
2
five_prime_end()[source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

Examples

>>> import pydna
>>> a=pydna.Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.five_prime_end()
('blunt', '')
>>> a=pydna.Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.five_prime_end()
("3'", 't')
>>> a=pydna.Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.five_prime_end()
("5'", 'a')
>>>
linear[source]

The linear property

looped()[source]

Returns a circularized Dseq object. This can only be done if the two ends are compatible, otherwise a TypeError is raised.

Examples

>>> import pydna
>>> a=pydna.Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> a.looped()
Dseq(o8)
catcgatc
gtagctag
>>> a.T4("t")
Dseq(-8)
catcgat
 tagctag
>>> a.T4("t").looped()
Dseq(o7)
catcgat
gtagcta
>>> a.T4("a")
Dseq(-8)
catcga
  agctag
>>> a.T4("a").looped()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna/dsdna.py", line 357, in looped
    if type5 == type3 and str(sticky5) == str(rc(sticky3)):
TypeError: DNA cannot be circularized.
5' and 3' sticky ends not compatible!
>>>
mung()[source]

Simulates treatment a nuclease with 5’-3’ and 3’-5’ single strand specific exonuclease activity (such as mung bean nuclease [8])

    ggatcc    ->     gatcc
     ctaggg          ctagg

     ggatcc   ->      ggatc
    tcctag            cctag

>>> import pydna
>>> b=pydna.Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.mung()
Dseq(-3)
aaa
ttt
>>> c=pydna.Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.mung()
Dseq(-3)
aaa
ttt

References

[8]http://en.wikipedia.org/wiki/Mung_bean_nuclease
ovhg[source]

The ovhg property

rc()[source]

Alias of the reverse_complement method

reverse_complement()[source]

Returns a Dseq object where watson and crick have switched places.

Examples

>>> import pydna
>>> a=pydna.Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> b=a.reverse_complement()
>>> b
Dseq(-8)
gatcgatg
ctagctac
>>>
t4(*args, **kwargs)[source]

Alias for the T4() method

three_prime_end()[source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

>>> import pydna
>>> a=pydna.Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.three_prime_end()
('blunt', '')
>>> a=pydna.Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.three_prime_end()
("3'", 'a')
>>> a=pydna.Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.three_prime_end()
("5'", 't')
>>>
tolinear()[source]

Returns a blunt, linear copy of a circular Dseq object. This can only be done if the Dseq object is circular, otherwise a TypeError is raised.

Examples

>>> import pydna
>>> a=pydna.Dseq("catcgatc", circular=True)
>>> a
Dseq(o8)
catcgatc
gtagctag
>>> a.tolinear()
Dseq(-8)
catcgatc
gtagctag
>>>
class pydna.dsdna.Dseqrecord(record, circular=None, linear=None, n=1e-11, *args, **kwargs)[source]

Bases: Bio.SeqRecord.SeqRecord

Dseqrecord is a double stranded version of the Biopython SeqRecord [9] class. The Dseqrecord object holds a Dseq object describing the sequence. Additionally, Dseqrecord hold meta information about the sequence in the from of a list of SeqFeatures, in the same way as the SeqRecord does. The Dseqrecord can be initialized with a string, Seq, Dseq, SeqRecord or another Dseqrecord. The sequence information will be stored in a Dseq object in all cases. Dseqrecord objects can be read or parsed from sequences in Fasta, Embl or Genbank format.

There is a short representation associated with the Dseqrecord. Dseqrecord(-3) represents a linear sequence of length 2 while Dseqrecord(o7) represents a circular sequence of length 7.

Dseqrecord and Dseq share the same concept of length

<-- length -->
GATCCTTT
     AAAGCCTAG
Parameters:

record : string, Seq, SeqRecord, Dseq or other Dseqrecord object

This data will be used to form the seq property

circular : bool, optional

True or False reflecting the shape of the DNA molecule

linear : bool, optional

True or False reflecting the shape of the DNA molecule

References

[9]http://biopython.org/wiki/SeqRecord

Examples

>>> from pydna import Dseqrecord
>>> a=Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> a.seq
Dseq(-3)
aaa
ttt
>>> from Bio.Seq import Seq
>>> b=Dseqrecord(Seq("aaa"))
>>> b
Dseqrecord(-3)
>>> b.seq
Dseq(-3)
aaa
ttt
>>> from Bio.SeqRecord import SeqRecord
>>> c=Dseqrecord(SeqRecord(Seq("aaa")))
>>> c
Dseqrecord(-3)
>>> c.seq
Dseq(-3)
aaa
ttt
>>> a.seq.alphabet
IUPACAmbiguousDNA()
>>> b.seq.alphabet
IUPACAmbiguousDNA()
>>> c.seq.alphabet
IUPACAmbiguousDNA()
>>>
circular[source]

The circular property

cseguid()[source]

Returns the cSEGUID for the sequence. The cSEGUID is the SEGUID checksum calculated for the lexicographically minimal string rotation of the sequence or its reverse complement.

Only defined for circular sequences.

The cSEGUID checksum uniqely identifies a circular sequence.

Examples

>>> import pydna
>>> a=pydna.Dseqrecord("aaat", circular=True)
>>> a.cseguid()
'oopV+6158nHJqedi8lsshIfcqYA'
>>> a=pydna.Dseqrecord("ataa", circular=True)
>>> a.cseguid()
'oopV+6158nHJqedi8lsshIfcqYA'
cut(*enzymes)[source]

Digest the Dseqrecord object with one or more restriction enzymes. returns a list of linear Dseqrecords. If there are no cuts, an empty list is returned.

See also Dseq.cut()

Parameters:

enzymes : enzyme object or iterable of such objects

A Bio.Restriction.XXX restriction object or iterable of such.

Returns:

Dseqrecord_frags : list

list of Dseqrecord objects formed by the digestion

Examples

>>> import pydna
>>> a=pydna.Dseqrecord("ggatcc")
>>> from Bio.Restriction import BamHI
>>> a.cut(BamHI)
[Dseqrecord(-5), Dseqrecord(-5)]
>>> frag1, frag2 = a.cut(BamHI)
>>> frag1.seq
Dseq(-5)
g
cctag
>>> frag2.seq
Dseq(-5)
gatcc
    g
format(f='gb')[source]

Returns the sequence as a string using a format supported by Biopython SeqIO [10]. Default is “gb” which is short for Genbank.

References

[10]http://biopython.org/wiki/SeqIO

Examples

>>> import pydna
>>> a=pydna.Dseqrecord("aaa")
>>> a.annotations['date'] = '02-FEB-2013'
>>> a
Dseqrecord(-3)
>>> print a.format()
LOCUS       .                          3 bp    DNA     linear   UNK 02-FEB-2013
DEFINITION  .
ACCESSION   <unknown id>
VERSION     <unknown id>
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//
linear[source]

The linear property

linearize(*enzymes)[source]

This method is similar to cut() but throws an exception if there is not excactly on cut i.e. none or more than one digestion products.

looped()[source]

Returns a circular version of the Dseqrecord object. The underlying Dseq object has to have compatible ends.

Examples

>>> import pydna
>>> a=pydna.Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> b=a.looped()
>>> b
Dseqrecord(o3)
>>>
rc()[source]

alias of the reverse_complement method

reverse_complement()[source]

Returns the reverse complement.

Examples

>>> import pydna
>>> a=pydna.Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>
seguid()[source]

Returns the SEGUID [11] for the sequence.

References

[11]http://wiki.christophchamp.com/index.php/SEGUID

Examples

>>> import pydna
>>> a=pydna.Dseqrecord("aaa")
>>> a.seguid()
'YG7G6b2Kj/KtFOX63j8mRHHoIlE'
shifted(shift)[source]

Returns a circular Dseqrecord with a new origin <shift>. This only works on circular Dseqrecords. If we consider the following circular sequence:

GAAAT   <-- watson strand
CTTTA   <-- crick strand

The T and the G on the watson strand are linked together as well as the A and the C of the of the crick strand.

if shift is 1, this indicates a new origin at position 1:

new origin at the | symbol:

G|AAAT
C|TTTA

new sequence:

AAATG
TTTAC

Shift is always positive and 0<shift<length, so in the example below, permissible values of shift are 1,2 and 3

Examples

>>> import pydna
>>> a=pydna.Dseqrecord("aaat",circular=True)
>>> a
Dseqrecord(o4)
>>> a.seq
Dseq(o4)
aaat
ttta
>>> b=a.shifted(1)
>>> b
Dseqrecord(o4)
>>> b.seq
Dseq(o4)
aata
ttat
stamp()[source]

Adds a seguid stamp to the description property. This will show in the genbank format. The following string:

SEGUID <seguid>

will be appended to the description property of the Dseqrecord object (string).

The stamp can be verified with verify_stamp()

Examples

>>> import pydna
>>> a=pydna.Dseqrecord("aaa")
>>> a.stamp()
>>> a.description
'<unknown description> SEGUID YG7G6b2Kj/KtFOX63j8mRHHoIlE'
>>> a.verify_stamp()
True
synced(ref, limit=25)[source]

This function returns a new circular sequence (Dseqrecord object), which has ben rotated in such a way that there is maximum overlap between the sequence and ref, which may be a string, Biopython Seq, SeqRecord object or another Dseqrecord object.

The reason for using this could be to rotate a recombinant plasmid so that it starts at the same position as the vector backbone after cloning.

Examples

>>> import pydna
>>> a=pydna.Dseqrecord("gaat",circular=True)
>>> a.seq
Dseq(o4)
gaat
ctta
>>> d = a[2:] + a[:2]
>>> d.seq
Dseq(-4)
atga
tact
>>> insert=pydna.Dseqrecord("CCC")
>>> recombinant = (d+insert).looped()
>>> recombinant.seq
Dseq(o7)
atgaCCC
tactGGG
>>> recombinant.synced(a).seq
Dseq(o7)
gaCCCat
ctGGGta
tolinear()[source]

Returns a linear, blunt copy of a circular Dseqrecord object. The underlying Dseq object has to be circular.

Examples

>>> import pydna
>>> a=pydna.Dseqrecord("aaa", circular = True)
>>> a
Dseqrecord(o3)
>>> b=a.tolinear()
>>> b
Dseqrecord(-3)
>>>
verify_stamp()[source]

Verifies the SEGUID stamp in the description property is valid. True if stamp match the sequid calculated from the sequence. Exception raised if no stamp can be found.

>>> import pydna
>>> a=pydna.Dseqrecord("aaa")
>>> a.annotations['date'] = '02-FEB-2013'
>>> a.seguid()
'YG7G6b2Kj/KtFOX63j8mRHHoIlE'
>>> print a.format("gb")
LOCUS       .                          3 bp    DNA     linear   UNK 02-FEB-2013
DEFINITION  .
ACCESSION   <unknown id>
VERSION     <unknown id>
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//
>>> a.stamp()
>>> a
Dseqrecord(-3)
>>> print a.format("gb")
LOCUS       .                          3 bp    DNA     linear   UNK 02-FEB-2013
DEFINITION  <unknown description> SEGUID YG7G6b2Kj/KtFOX63j8mRHHoIlE
ACCESSION   <unknown id>
VERSION     <unknown id>
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//
>>> a.verify_stamp()
True
>>>
write(filename=None, f='gb')[source]

Writes the Dseqrecord to a file using the format f, which must be a format supported by Biopython SeqIO for writing [12]. Default is “gb” which is short for Genbank. Note that Biopython SeqIO reads more formats than it writes.

Filename is the path to the file where the sequece is to be written. The filename is optional, if it is not given, the description property (string) is used together with the format.

If obj is the Dseqrecord object, the default file name will be:

<obj.description>.<f>

Where <f> is “gb” by default. If the filename already exists and AND the sequence it contains is different, a new file name will be used so that the old file is not lost:

<obj.description>_NEW.<f>

References

[12]http://biopython.org/wiki/SeqIO
pydna.dsdna.parse(data, ds=True)[source]

This function returns all DNA sequences found in data. If no sequences are found, an empty list is returned. This is a greedy function, use carefully.

Parameters:

data : string or iterable

The data parameter is a string containing:

  1. an absolute path to a local file. The file will be read in text mode and parsed for EMBL, FASTA and Genbank sequences.
  2. an absolute path to a local directory. All files in the directory will be read and parsed as in 1.
  3. a string containing one or more sequences in EMBL, GENBANK, or FASTA format. Mixed formats are allowed.
  4. data can be a list or other iterable of 1 - 3

ds : bool

If True double stranded Dseqrecord objects are returned. If False single stranded Bio.SeqRecord [13] objects are returned.

Returns:

list :

contains Dseqrecord or SeqRecord objects

See also

read

References

[13]http://biopython.org/wiki/SeqRecord
pydna.dsdna.rc(sequence)[source]

returns the reverse complement of sequence (string) accepts mixed DNA/RNA

pydna.dsdna.read(data, ds=True)[source]

This function is similar the parse() function but expects one and only one sequence or and exception is thrown.

Parameters:

data : string

see below

ds : bool

Double stranded or single stranded DNA, Return “Dseqrecord” or “SeqRecord” objects.

Returns:

Dseqrecord :

contains the first Dseqrecord or SeqRecord object parsed.

See also

parse

Notes

The data parameter is similar to the data parameter for parse().

pydna.editor module

This module provides a class for opening a sequence using an editor that accepts a file as a command line argument.

ApE - A plasmid Editor [14] is and excellent editor for this purpose.

References

[14]http://biologylabs.utah.edu/jorgensen/wayned/ape/
class pydna.editor.Editor(shell_command_for_editor, tmpdir=None)[source]

The Editor class needs to be instantiated before use.

Parameters:

shell_command_for_editor : str

String containing the path to the editor

tmpdir : str, optional

String containing path to the temprary directory where sequence files are stored before opening.

Examples

>>> import pydna
>>> ape = pydna.Editor("tclsh8.6 /home/bjorn/.ApE/apeextractor/ApE.vfs/lib/app-AppMain/AppMain.tcl")
>>> ape.open("aaa") # This command opens the sequence in the ApE editor
open(*args, **kwargs)[source]

Open a sequence for editing in an external (DNA) editor.

Parameters:args : sequence or iterable of sequences

pydna.findsubstrings_suffix_arrays_python module

pydna.findsubstrings_suffix_arrays_python.common_sub_strings(stringx, stringy, limit=25)[source]

Finds all the common substrings between stringx and stringy longer than limit. This function is case sensitive.

returns a list of tuples describing the substrings The list is sorted longest -> shortest.

Examples

[(startx1,starty1,length1),(startx2,starty2,length2), ...]

startx1 = position in x where substring 1 starts starty1 = position in y where substring 1 starts length = lenght of substring

pydna.findsubstrings_suffix_arrays_python.terminal_overlap(stringx, stringy, limit=15)[source]

pydna.gel module

This module contain experimental functionality for smulation of agarose gel electrophoresis of DNA.

Note

This code is in a very early stage of development and documentation. Read the source code to get an idea for how it works.

pydna.gel.AgaroseGel(samples, *args, **kwargs)[source]
class pydna.gel.Gel(size, agarose=1)[source]

A Gel contains lanes into which DNA can be loaded and run. The Gel is exposured to see the location and intensity of DNA.

BORDER = 40
LANE_HEIGHT = 6
LANE_MARGIN = 6
LANE_WIDTH = 30.0
expose(exposure=0.1, aperture=2)[source]

Returns an image of the gel with the DNA highlighted

loadSample(lane, sample)[source]

Add list containing tuple of DNA length and concentrations to lane in gel.

run(time=30.0, voltage=20.0)[source]

Move loaded DNA down the gel at a rate dependent on voltage, DNA length and agarose concentration.

pydna.gel.gamma(x, t, k=1)[source]
pydna.gel.testColour(c)[source]

Convert integer to colour, which saturates if > 255

pydna.primer_design module

This module contain functions for primer design.

pydna.primer_design.assembly_primers(templates, minlength=16, maxlength=29, tot_length=40, target_tm=55.0, primerc=1000.0, saltc=50.0, formula=<function tmbresluc at 0x4046500>)[source]

This function return primer pairs that are useful for fusion of DNA sequences given in template. Given two sequences that we wish to fuse (a and b) to form fragment c.

 _________ a _________           __________ b ________
/                     \         /                     \
agcctatcatcttggtctctgca   <-->  TTTATATCGCATGACTCTTCTTT
|||||||||||||||||||||||         |||||||||||||||||||||||
tcggatagtagaaccagagacgt   <-->  AAATATAGCGTACTGAGAAGAAA

     agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
     ||||||||||||||||||||||||||||||||||||||||||||||
     tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA  
     \___________________ c ______________________/

We can design tailed primers to fuse a and b by fusion PCR, Gibson assembly or in-vivo homologous recombination. The basic requirements for the primers for the three techniques are the same.

Design tailed primers incorporating a part of the next or previous fragment to be assembled.

agcctatcatcttggtctctgca     
|||||||||||||||||||||||     
                gagacgtAAATATA

|||||||||||||||||||||||     
tcggatagtagaaccagagacgt

                       TTTATATCGCATGACTCTTCTTT
                       |||||||||||||||||||||||

                ctctgcaTTTATAT
                       |||||||||||||||||||||||
                       AAATATAGCGTACTGAGAAGAAA

PCR products with flanking sequences are formed in the PCR process.

agcctatcatcttggtctctgcaTTTATAT     
||||||||||||||||||||||||||||||     
tcggatagtagaaccagagacgtAAATATA
                \____________/

                   identical 
                   sequences
                 ____________        
                /            \        
                ctctgcaTTTATATCGCATGACTCTTCTTT
                ||||||||||||||||||||||||||||||
                gagacgtAAATATAGCGTACTGAGAAGAAA

The fragments can be fused by any of the techniques mentioned earlier.

agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
||||||||||||||||||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA  
Parameters:

templates : list of Dseqrecord

list Dseqrecord object for which fusion primers should be constructed.

minlength : int, optional

Minimum length of the annealing part of the primer.

maxlength : int, optional

Maximum length (including tail) for designed primers.

tot_length : int, optional

Maximum total length of a the primers

target_tm : float, optional

target tm for the primers

primerc : float, optional

Concentration of each primer in nM, set to 1000.0 nM by default

saltc : float, optional

Salt concentration (monovalet cations) tmbresluc set to 50.0 mM by default

formula : function

formula used for tm calculation this is the name of a function. built in options are:

  1. pydna.amplify.tmbresluc() (default)
  2. pydna.amplify.basictm()
  3. pydna.amplify.tmstaluc98()
  4. pydna.amplify.tmbreslauer86()

These functions are imported from the pydna.amplify module, but can be substituted for some other custom made function.

Returns:

primer_pairs : list of tuples of Bio.Seqrecord objects

[(forward_primer_1, reverse_primer_1),
 (forward_primer_2, reverse_primer_2), ...]

Examples

>>> import pydna
>>> a=pydna.Dseqrecord("atgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatg")
>>> b=pydna.Dseqrecord("ccaaacccaccaggtaccttatgtaagtacttcaagtcgccagaagacttcttggtcaagttgcc")
>>> c=pydna.Dseqrecord("tgtactggtgctgaaccttgtatcaagttgggtgttgacgccattgccccaggtggtcgtttcgtt")
>>> primer_pairs = pydna.assembly_primers([a,b,c])
>>> p=[]                                      
>>> for t, (f,r) in zip([a,b,c], primer_pairs): p.append(pydna.pcr(f,r,t))  
>>> p
[Amplicon(87), Amplicon(110), Amplicon(90)]
>>> assemblyobj = pydna.Assembly(p)     
>>> assemblyobj
Assembly object:
Sequences........................: 87, 110, 90
Sequences with shared homologies.: No analyzed sequences
Homology limit (bp)..............: 25
Number of overlaps...............: No overlaps
Nodes in graph...................: No graph
Assembly protocol................: No protocol
Circular products................: No circular products
Linear products..................: No linear products
>>>    
>>> assemblyobj.assemble_gibson_linear()
'5 linear products were formed with the following sizes (bp): 195, 155, 150, 47, 45'
>>> assemblyobj.linear_products
[Dseqrecord(-195), Dseqrecord(-155), Dseqrecord(-150), Dseqrecord(-47), Dseqrecord(-45)]
>>> assemblyobj.linear_products[0].seguid()
'1eNv3d/1PqDPP8qJZIVoA45Www8'
>>> (a+b+c).seguid()
'1eNv3d/1PqDPP8qJZIVoA45Www8'
>>> pydna.eq(a+b+c, assemblyobj.linear_products[0])
True
>>>
>>> print assemblyobj.linear_products[0].small_fig()
87bp_PCR_prod|47
              \/
              /\
              47|110bp_PCR_prod|45
                                \/
                                /\
                                45|90bp_PCR_prod
>>>                                
pydna.primer_design.cloning_primers(template, minlength=16, maxlength=29, fp=None, rp=None, fp_tail='', rp_tail='', target_tm=55.0, primerc=1000.0, saltc=50.0, formula=<function tmbresluc at 0x4046500>)[source]

This function can design primers for PCR amplification of a given sequence. This function accepts a Dseqrecord object containing the template sequence and returns a tuple cntaining two ::mod`Bio.SeqRecord.SeqRecord` objects describing the primers.

Primer tails can optionally be given in the form of strings.

An predesigned primer can be given, either the forward or reverse primers. In this case this function tries to design a primer with a Tm to match the given primer.

Parameters:

template : Dseqrecord

a Dseqrecord object.

minlength : int, optional

Minimum length of the annealing part of the primer

maxlength : int, optional

Maximum length (including tail) for designed primers.

fp, rp : SeqRecord, optional

optional Biopython SeqRecord objects containing one primer each.

fp_tail, rp_tail : string, optional

optional tails to be added to the forwars or reverse primers

target_tm : float, optional

target tm for the primers

primerc : float, optional

Concentration of each primer in nM, set to 1000.0 nM by default

saltc : float, optional

Salt concentration (monovalet cations) tmbresluc set to 50.0 mM by default

formula : function

formula used for tm calculation this is the name of a function. built in options are:

  1. pydna.amplify.tmbresluc() (default)
  2. pydna.amplify.basictm()
  3. pydna.amplify.tmstaluc98()
  4. pydna.amplify.tmbreslauer86()

These functions are imported from the pydna.amplify module, but can be substituted for some other custom made function.

Returns:

fp, rp : tuple

fp is a :mod:Bio.SeqRecord object describing the forward primer rp is a :mod:Bio.SeqRecord object describing the reverse primer

Examples

>>> import pydna
>>> t=pydna.Dseqrecord("atgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatg")
>>> t
Dseqrecord(-64)
>>> pf,pr = pydna.cloning_primers(t)
>>> pf
SeqRecord(seq=Seq('atgactgctaacccttc', Alphabet()), id='pfw64', name='pfw64', description='pfw64', dbxrefs=[])
>>> pr
SeqRecord(seq=Seq('catcgtaagtttcgaac', Alphabet()), id='prv64', name='prv64', description='prv64', dbxrefs=[])
>>> pcr_prod = pydna.pcr(pf, pr, t)
>>> pcr_prod
Amplicon(64)
>>>               
>>> print pcr_prod.figure()
5atgactgctaacccttc...gttcgaaacttacgatg3
                     ||||||||||||||||| tm 42.4 (dbd) 52.9
                    3caagctttgaatgctac5
5atgactgctaacccttc3
 ||||||||||||||||| tm 44.5 (dbd) 54.0
3tactgacgattgggaag...caagctttgaatgctac5
>>> pf,pr = pydna.cloning_primers(t, fp_tail="GGATCC", rp_tail="GAATTC")
>>> pf
SeqRecord(seq=Seq('GGATCCatgactgctaacccttc', Alphabet()), id='pfw64', name='pfw64', description='pfw64', dbxrefs=[])
>>> pr
SeqRecord(seq=Seq('GAATTCcatcgtaagtttcgaac', Alphabet()), id='prv64', name='prv64', description='prv64', dbxrefs=[])
>>> pcr_prod = pydna.pcr(pf, pr, t)
>>> print pcr_prod.figure()
      5atgactgctaacccttc...gttcgaaacttacgatg3
                           ||||||||||||||||| tm 42.4 (dbd) 52.9
                          3caagctttgaatgctacCTTAAG5
5GGATCCatgactgctaacccttc3
       ||||||||||||||||| tm 44.5 (dbd) 54.0
      3tactgacgattgggaag...caagctttgaatgctac5
>>> print pcr_prod.seq
GGATCCatgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatgGAATTC
>>>
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> pf = SeqRecord(Seq("atgactgctaacccttccttggtgttg"))
>>> pf,pr = pydna.cloning_primers(t, fp = pf, fp_tail="GGATCC", rp_tail="GAATTC")
>>> pf
SeqRecord(seq=Seq('GGATCCatgactgctaacccttccttggtgttg', Alphabet()), id='pfw64', name='pfw64', description='pfw64', dbxrefs=[])
>>> pr
SeqRecord(seq=Seq('GAATTCcatcgtaagtttcgaacgaaatgtcgtc', Alphabet()), id='prv64', name='prv64', description='prv64', dbxrefs=[])
>>> ampl = pydna.pcr(pf,pr,t)
>>> print ampl.figure()
      5atgactgctaacccttccttggtgttg...gacgacatttcgttcgaaacttacgatg3
                                     |||||||||||||||||||||||||||| tm 57.5 (dbd) 72.2
                                    3ctgctgtaaagcaagctttgaatgctacCTTAAG5
5GGATCCatgactgctaacccttccttggtgttg3
       ||||||||||||||||||||||||||| tm 59.0 (dbd) 72.3
      3tactgacgattgggaaggaaccacaac...ctgctgtaaagcaagctttgaatgctac5
>>> 

pydna.utils module

This module provides miscellaneous functions.

pydna.utils.ChenFoxLyndon(s)[source]

Decompose s into Lyndon words according to the Chen-Fox-Lyndon theorem. The arguments are the same as for ChenFoxLyndonBreakpoints but the return values are subsequences of s rather than indices of breakpoints.

Algorithms on strings and sequences based on Lyndon words. David Eppstein, October 2011.

pydna.utils.ChenFoxLyndonBreakpoints(s)[source]

Find starting positions of Chen-Fox-Lyndon decomposition of s. The decomposition is a set of Lyndon words that start at 0 and continue until the next position. 0 itself is not output, but the final breakpoint at the end of s is. The argument s must be of a type that can be indexed (e.g. a list, tuple, or string). The algorithm follows Duval, J. Algorithms 1983, but uses 0-based indexing rather than Duval’s choice of 1-based indexing.

Algorithms on strings and sequences based on Lyndon words. David Eppstein, October 2011.

pydna.utils.SmallestRotation(s)[source]

Find the rotation of s that is smallest in lexicographic order. Duval 1983 describes how to modify his algorithm to do so but I think it’s cleaner and more general to work from the ChenFoxLyndon output.

Algorithms on strings and sequences based on Lyndon words. David Eppstein, October 2011.

pydna.utils.copy_features(source_sr, target_sr, limit=10)[source]

This function tries to copy all features in source_seq and copy them to target_seq. Source_sr and target_sr are objects with a features property, such as Dseqrecord or Biopython SeqRecord.

Parameters:

source_seq : SeqRecord or Dseqrecord

The sequence to copy features from

target_seq : SeqRecord or Dseqrecord

The sequence to copy features to

Returns:

bool : True

This function acts on target_seq in place. No data is returned.

pydna.utils.cseguid(seq)[source]

Returns the cSEGUID for the sequence. The cSEGUID is the SEGUID checksum calculated for the lexicographically minimal string rotation of a DNA sequence. Only defined for circular sequences.

pydna.utils.eq(*args, **kwargs)[source]

Compares two or more DNA sequences for equality i.e. they represent the same DNA molecule. Comparisons are case insensitive.

Parameters:

args : iterable

iterable containing sequences args can be strings, Biopython Seq or SeqRecord, Dseqrecord or dsDNA objects.

circular : bool, optional

Consider all molecules circular or linear

linear : bool, optional

Consider all molecules circular or linear

Returns:

eq : bool

Returns True or False

Notes

Compares two or more DNA sequences for equality i.e. if they represent the same DNA molecule.

Two linear sequences are considiered equal if either:

  • They have the same sequence (case insensitive)
  • One sequence is the reverse complement of the other (case insensitive)

Two circular sequences are considered equal if they are circular permutations:

  1. They have the same lengt, AND
  2. One sequence or can be found in the concatenation of the other sequence with itself, OR
  3. The reverse complement can be found in the concatenation of the other sequence with itself.

The topology for the comparison can be set using one of the keywords linear or circular to True or False.

If circular or linear is not set, it will be deduced from the topology of each sequence for sequences that have a linear or circular attribute (like Dseq and Dseqrecord).

Examples

>>> from pydna import eq, Dseqrecord
>>> eq("aaa","AAA")
True
>>> eq("aaa","AAA","TTT")
True
>>> eq("aaa","AAA","TTT","tTt")
True
>>> eq("aaa","AAA","TTT","tTt", linear=True)
True
>>> eq("Taaa","aTaa", linear = True)
False
>>> eq("Taaa","aTaa", circular = True)
True
>>> a=Dseqrecord("Taaa")
>>> b=Dseqrecord("aTaa")
>>> eq(a,b)
False
>>> eq(a,b,circular=True)
True
>>> a=a.looped()
>>> b=b.looped()
>>> eq(a,b)
True
>>> eq(a,b,circular=False)
False
>>> eq(a,b,linear=True)
False
>>> eq(a,b,linear=False)
True
>>> eq("ggatcc","GGATCC")
True
>>> eq("ggatcca","GGATCCa")
True
>>> eq("ggatcca","tGGATCC")
True
pydna.utils.pairwise(iterable)[source]

s -> (s0,s1), (s1,s2), (s2, s3), ...

pydna.utils.shift_origin(seq, shift)[source]

Shift the origin of seq which is assumed to be a circular sequence.

Parameters:

seq : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord

sequence to be shifted.

Returns:

new_seq : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord

sequence with a new origin.

Examples

>>> import pydna
>>> pydna.shift_origin("taaa",1)
'aaat'
>>> pydna.shift_origin("taaa",0)
'taaa'
>>> pydna.shift_origin("taaa",2)
'aata'
>>> pydna.shift_origin("gatc",2)
'tcga'
pydna.utils.sync(seq, ref)[source]

Synchronize two circular sequences. This function tries to rotate the circular sequence seq so that it has a maximum overlap with ref.

Parameters:

seq : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord

sequence to be shifted.

ref : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord

The reference sequence.

Returns:

sequence : string, Biopython Seq, Biopython SeqRecord, Dseq or Dseqrecord

sequence with a new origin.

Examples

>>> import pydna
>>> pydna.sync("taaatc","aaataa")
'aaatct'
>>> pydna.sync("taaatc","aaataa")
'aaatct'
>>> pydna.sync("taaat","aaataa")
'aaatt'

Module contents

Python-dna

The Python-dna package.

copyright:Copyright 2013 by Björn Johansson. All rights reserved.
license:This code is part of the Python-dna distribution and governed by its license. Please see the LICENSE.txt file that should have been included as part of this package.