Author: | Oliver Beckstein |
---|---|
Year: | 2010 |
Copyright: | GNU Public License v3 |
Analysis of native contacts q over a trajectory.
The “fraction of native contacts” q(t) is a number between 0 and 1 and calculated as the total number of native contacts for a given time frame divided by the total number of contacts in the reference structure.
Classes are available for two somewhat different ways to perform a contact analysis:
See also
See http://lorentz.dynstr.pasteur.fr/joel/adenylate.php for an example of contact analysis applied to MinActionPath trajectories of AdK (although this was not performed with MDAnalysis — it’s provided as a very good illustrative example).
As an example we analyze the opening (“unzipping”) of salt bridges when the AdK enzyme opens up; this is one of the example trajectories in MDAnalysis.
import MDAnalysis
import MDAnalysis.analysis.contacts
from MDAnalysis.tests.datafiles import PSF,DCD
# example trajectory (transition of AdK from closed to open)
u = MDAnalysis.Universe(PSF,DCD)
# crude definition of salt bridges as contacts between NH/NZ in ARG/LYS and OE*/OD* in ASP/GLU.
# You might want to think a little bit harder about the problem before using this for real work.
sel_basic = "(resname ARG or resname LYS) and (name NH* or name NZ)"
sel_acidic = "(resname ASP or resname GLU) and (name OE* or name OD*)"
# reference groups (first frame of the trajectory, but you could also use a separate PDB, eg crystal structure)
acidic = u.selectAtoms(sel_acidic)
basic = u.selectAtoms(sel_basic)
# set up analysis of native contacts ("salt bridges"); salt bridges have a distance <6 A
CA1 = MDAnalysis.analysis.contacts.ContactAnalysis1(u, selection=(sel_acidic, sel_basic), refgroup=(acidic, basic), radius=6.0, outfile="qsalt.dat")
# iterate through trajectory and perform analysis of "native contacts" q
# (force=True ignores any previous results, force=True is useful when testing)
CA1.run(force=True)
# plot time series q(t) [possibly do "import pylab; pylab.clf()" do clear the figure first...]
CA1.plot(filename="adk_saltbridge_contact_analysis1.pdf", linewidth=3, color="blue")
# or plot the data in qsalt.dat yourself.
CA1.plot_qavg(filename="adk_saltbridge_contact_analysis1_matrix.pdf")
The first graph shows that when AdK opens, about 20% of the salt bridges that existed in the closed state disappear when the enzyme opens. They open in a step-wise fashion (made more clear by the movie http://sbcb.bioch.ox.ac.uk/oliver/Movies/AdK/AdK_zipper_cartoon.avi (divx, on Mac use http://perian.org)).
The output graphs can be made prettier but if you look at the code itself then you’ll quickly figure out what to do. The qavg plot is the matrix of all contacts, averaged over the trajectory. This plot makes more sense for an equilibrium trajectory than for the example above but is is included for illustration.
See the docs for ContactAnalysis1 for another example.
Analyze a single DIMS transition of AdK between its closed and open conformation and plot the trajectory projected on q1-q2:
import MDAnalysis.analysis.contacts
from MDAnalysis.tests.datafiles import *
C = MDAnalysis.analysis.contacts.ContactAnalysis(PSF, DCD)
C.run()
C.plot()
Compare the resulting pathway to the MinActionPath result for AdK.
Perform a native contact analysis (“q1-q2”).
The analysis of the trajectory is performed with the ContactAnalysis.run() method. The result is stored in ContactAnalysis.timeseries. It is a numpy array which contains the frame number at index 0, q1 and q2 at index 1 and 2, and the total number of contacts in 3 and 4.
frame q1 q2 n1 n2
The total number of contacts in the reference states 1 and 2 are stored in ContactAnalysis.nref (index 0 and 1).
Calculate native contacts from two reference structures.
Arguments : |
|
---|
The function calculates the percentage of native contacts q1 and q2 along a trajectory. “Contacts” are defined as the number of Ca atoms within radius of a primary Ca. q1 is the fraction of contacts relative to the reference state 1 (typically the starting conformation of the trajectory) and q2 is the fraction of contacts relative to the conformation 2.
The timeseries is written to a bzip2-compressed file in targetdir named “basename(trajectory)*infix*_q1q2.dat.bz2” and is also accessible as the attribute ContactAnalysis.timeseries.
Load the data file.
Return True if default output file already exists.
Disable with force=True (will always return False)
Plot q1-q2.
Calculate native contacts relative to state n.
If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
Return distance array with True for contacts.
If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
Analyze trajectory and produce timeseries.
Stores results in ContactAnalysis.timeseries (if store=True) and writes them to a bzip2-compressed data file.
Perform a very flexible native contact analysis with respect to a single reference.
This analysis class allows one to calculate the fraction of native contacts q between two arbitrary groups of atoms with respect to an arbitrary reference structure. For instance, as a reference one could take a crystal structure of a complex, and as the two groups atoms one selects two molecules A and B in the complex. Then the question to be answered by q is, is which percentage of the contacts between A and B persist during the simulation.
First prepare AtomGroup selections for the reference atoms; this example uses some arbitrary selections:
ref = Universe('crystal.pdb')
refA = re.selectAtoms('name CA and segid A and resid 6:100')
refB = re.selectAtoms('name CA and segid B and resid 1:40')
Load the trajectory:
u = Universe(topology, trajectory)
We then need two selection strings selA and selB that, when applied as u.selectAtoms(selA) produce a list of atoms that is equivalent to the reference (i.e. u.selectAtoms(selA) must select the same atoms as refA in this example):
selA = 'name CA and resid 1:95' # corresponds to refA
selB = 'name CA and resid 150:189' # corresponds to refB
Note
It is the user’s responsibility to provide a reference group (or groups) that describe equivalent atoms to the ones selected by selection.
Now we are ready to set up the analysis:
CA1 = ContactAnalysis1(u, selection=(selA,selB), refgroup=(refA,refB), radius=8.0, outfile="q.dat")
If the groups do not match in length then a ValueError is raised.
The analysis across the whole trajectory is performed with
CA1.run()
Results are saved to outfile (framenumber q N per line) and can also be plotted with
CA1.plot() # plots the time series q(t)
CA1.plot_qavg() # plots the matrix of average contacts <q>
Description of computed values in the output file:
Calculate native contacts within a group or between two groups.
Arguments : |
|
---|---|
Keywords : |
|
The function calculates the percentage of native contacts q1 along a trajectory. “Contacts” are defined as the number of atoms within radius of a given other atom. q1 is the fraction of contacts relative to the reference state 1 (typically the starting conformation of the trajectory).
The timeseries is written to a file outfile and is also accessible as the attribute ContactAnalysis1.timeseries.
Load the data file.
Return True if default output file already exists.
Disable with force=True (will always return False)
Plot q(t).
If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, ...). All other keyword arguments are passed on to pylab.plot().
Plot ContactAnalysis1.qavg, the matrix of average native contacts.
If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, ...). All other keyword arguments are passed on to pylab.imshow().
Calculate native contacts relative to reference state.
q is the matrix of contacts (e.g. q).
If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
This method is typically only used internally.
Return distance array with True for contacts.
d is the matrix of distances. The method uses the value of ContactAnalysis1.radius to determine if a distance < radius is considered a contact.
If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
This method is typically only used internally.
Analyze trajectory and produce timeseries.
Stores results in ContactAnalysis1.timeseries (if store=True) and writes them to a data file. The average q is written to a second data file.