Biopython Tutorial and Cookbook
Jeff Chang, Brad Chapman, Iddo Friedberg, Thomas Hamelryck,
Michiel de Hoon, Peter Cock, Tiago Antão
Last Update – 3 April 2009 (Biopython 1.50 beta)
Contents
Introduction
What is Biopython?
Installing Biopython
FAQ
Quick Start – What can you do with Biopython?
General overview of what Biopython provides
Working with sequences
A usage example
Parsing sequence file formats
Connecting with biological databases
What to do next
Sequence objects
Sequences and Alphabets
Sequences act like strings
Slicing a sequence
Turning Seq objects into strings
Concatenating or adding sequences
Nucleotide sequences and (reverse) complements
Transcription
Translation
Translation Tables
MutableSeq objects
Working with strings directly
Sequence Input/Output
Parsing or Reading Sequences
Parsing sequences from the net
Sequence files as Dictionaries
Writing Sequence Files
Sequence Alignment Input/Output
Parsing or Reading Sequence Alignments
Writing Alignments
BLAST
Running BLAST locally
Running BLAST over the Internet
Saving BLAST output
Parsing BLAST output
The BLAST record class
Deprecated BLAST parsers
Dealing with PSI-BLAST
Dealing with RPS-BLAST
Accessing NCBI’s Entrez databases
Entrez Guidelines
EInfo: Obtaining information about the Entrez databases
ESearch: Searching the Entrez databases
EPost: Uploading a list of identifiers
ESummary: Retrieving summaries from primary IDs
EFetch: Downloading full records from Entrez
ELink: Searching for related items in NCBI Entrez
EGQuery: Obtaining counts for search terms
ESpell: Obtaining spelling suggestions
Specialized parsers
Using a proxy
Examples
Using the history and WebEnv
Swiss-Prot and ExPASy
Parsing Swiss-Prot files
Parsing Prosite records
Parsing Prosite documentation records
Parsing Enzyme records
Accessing the ExPASy server
Scanning the Prosite database
Going 3D: The PDB module
Structure representation
Disorder
Hetero residues
Some random usage examples
Common problems in PDB files
Other features
Bio.PopGen: Population genetics
GenePop
Coalescent simulation
Other applications
Future Developments
Supervised learning methods
The Logistic Regression Model
k-Nearest Neighbors
Naïve Bayes
Maximum Entropy
Markov Models
Graphics including GenomeDiagram
GenomeDiagram
Chromosomes
Cookbook – Cool things to do with it
Sequence parsing plus simple plots
Dealing with alignments
Substitution Matrices
BioSQL – storing sequences in a relational database
InterPro
The Biopython testing framework
Running the tests
Writing tests
Writing doctests
Advanced
The SeqRecord and SeqFeature classes
Parser Design
Substitution Matrices
Where to go from here – contributing to Biopython
Bug Reports + Feature Requests
Mailing lists and helping newcomers
Contributing Documentation
Maintaining a distribution for a platform
Contributing Unit Tests
Contributing Code
Appendix: Useful stuff about Python
What the heck is a handle?
This document was translated from LaTeX by HEVEA.