3.1.2. Native contacts analysis — MDAnalysis.analysis.contacts
¶
Author: | Oliver Beckstein |
---|---|
Year: | 2010 |
Copyright: | GNU Public License v3 |
Analysis of native contacts q over a trajectory.
- a “contact” exists between two atoms i and j if the distance between them is smaller than a given radius
- a “native contact” exists between i and j if a contact exists and if the contact also exists between the equivalent atoms in a reference structure or conformation
The “fraction of native contacts” q(t) is a number between 0 and 1 and calculated as the total number of native contacts for a given time frame divided by the total number of contacts in the reference structure.
Classes are available for two somewhat different ways to perform a contact analysis:
- Contacts between two groups of atoms are defined with
ContactAnalysis1
), which allows one to calculate q(t) over time. This is especially useful in order to look at native contacts during an equilibrium simulation where one can also look at the average matrix of native contacts (seeContactAnalysis1.plot_qavg()
). - Contacts are defined within one group in a protein (e.g. all C-alpha atoms)
but relative to two different conformations 1 and 2, using
ContactAnalysis
. This allows one to do a q1-q2 analysis that shows how native contacts of state 1 change in comparison to native contacts of state 2. Transition pathways have been analyzed in terms of these two variables q1 and q2 that relate to the native contacts in the end states of the transition.
See also
See http://lorentz.dynstr.pasteur.fr/joel/adenylate.php for an example of contact analysis applied to MinActionPath trajectories of AdK (although this was not performed with MDAnalysis — it’s provided as a very good illustrative example).
Examples
3.1.2.1. One-dimensional contact analysis¶
As an example we analyze the opening (“unzipping”) of salt bridges when the AdK enzyme opens up; this is one of the example trajectories in MDAnalysis.
import MDAnalysis
import MDAnalysis.analysis.contacts
from MDAnalysis.tests.datafiles import PSF,DCD
# example trajectory (transition of AdK from closed to open)
u = MDAnalysis.Universe(PSF,DCD)
# crude definition of salt bridges as contacts between NH/NZ in ARG/LYS and OE*/OD* in ASP/GLU.
# You might want to think a little bit harder about the problem before using this for real work.
sel_basic = "(resname ARG or resname LYS) and (name NH* or name NZ)"
sel_acidic = "(resname ASP or resname GLU) and (name OE* or name OD*)"
# reference groups (first frame of the trajectory, but you could also use a separate PDB, eg crystal structure)
acidic = u.select_atoms(sel_acidic)
basic = u.select_atoms(sel_basic)
# set up analysis of native contacts ("salt bridges"); salt bridges have a distance <6 A
CA1 = MDAnalysis.analysis.contacts.ContactAnalysis1(u, selection=(sel_acidic, sel_basic), refgroup=(acidic,
basic), radius=6.0, outfile="qsalt.dat")
# iterate through trajectory and perform analysis of "native contacts" q
# (force=True ignores any previous results, force=True is useful when testing)
CA1.run(force=True)
# plot time series q(t) [possibly do "import pylab; pylab.clf()" do clear the figure first...]
CA1.plot(filename="adk_saltbridge_contact_analysis1.pdf", linewidth=3, color="blue")
# or plot the data in qsalt.dat yourself.
CA1.plot_qavg(filename="adk_saltbridge_contact_analysis1_matrix.pdf")
The first graph shows that when AdK opens, about 20% of the salt bridges that existed in the closed state disappear when the enzyme opens. They open in a step-wise fashion (made more clear by the movie http://sbcb.bioch.ox.ac.uk/oliver/Movies/AdK/AdK_zipper_cartoon.avi (divx, on Mac use http://perian.org)).
The output graphs can be made prettier but if you look at the code itself then you’ll quickly figure out what to do. The qavg plot is the matrix of all contacts, averaged over the trajectory. This plot makes more sense for an equilibrium trajectory than for the example above but is is included for illustration.
See the docs for ContactAnalysis1
for another example.
3.1.2.2. Two-dimensional contact analysis (q1-q2)¶
Analyze a single DIMS transition of AdK between its closed and open conformation and plot the trajectory projected on q1-q2:
import MDAnalysis.analysis.contacts
from MDAnalysis.tests.datafiles import *
C = MDAnalysis.analysis.contacts.ContactAnalysis(PSF, DCD)
C.run()
C.plot()
Compare the resulting pathway to the MinActionPath result for AdK.
3.1.2.2.1. Classes¶
-
class
MDAnalysis.analysis.contacts.
ContactAnalysis
(topology, trajectory, ref1=None, ref2=None, radius=8.0, targetdir='.', infix='', force=False, selection='name CA', centroids=False)[source]¶ Perform a native contact analysis (“q1-q2”).
The analysis of the trajectory is performed with the
ContactAnalysis.run()
method. The result is stored inContactAnalysis.timeseries
. It is a numpy array which contains the frame number at index 0, q1 and q2 at index 1 and 2, and the total number of contacts in 3 and 4.frame q1 q2 n1 n2
The total number of contacts in the reference states 1 and 2 are stored in
ContactAnalysis.nref
(index 0 and 1).Calculate native contacts from two reference structures.
Parameters: - *topology* – psf or pdb file
- *trajectory* – dcd or xtc/trr file
- *ref1* – structure of the reference conformation 1 (pdb); if
None
the first frame of the trajectory is chosen - *ref2* – structure of the reference conformation 2 (pdb); if
None
the last frame of the trajectory is chosen - *radius* – contacts are deemed any Ca within radius [8 A]
- *targetdir* – output files are saved there [.]
- *infix* – additional tag string that is inserted into the output filename of the data file [“”]
- *selection* – MDAnalysis selection string that selects the particles of interest; the default is to only select the C-alpha atoms in ref1 and *ref*2 [“name CA”]
Note
If selection produces more than one atom per residue then you will get multiple contacts per residue unless you also set centroids =
True
- *centroids* – If set to
True
, use the centroids for the selected atoms on a per-residue basis to compute contacts. This allows, for instance defining the sidechains as selection and then computing distances between sidechain centroids.
The function calculates the percentage of native contacts q1 and q2 along a trajectory. “Contacts” are defined as the number of Ca atoms (or per-residue centroids of a user defined selection) within radius of a primary Ca. q1 is the fraction of contacts relative to the reference state 1 (typically the starting conformation of the trajectory) and q2 is the fraction of contacts relative to the conformation 2.
The timeseries is written to a bzip2-compressed file in targetdir named “basename(trajectory)*infix*_q1q2.dat.bz2” and is also accessible as the attribute
ContactAnalysis.timeseries
.-
get_distance_array
(g, **kwargs)[source]¶ Calculate the self_distance_array for atoms in group g.
Keywords: - results
passed on to
MDAnalysis.lib.distances.self_distance_array()
as a preallocated array- centroids
True
: calculate per-residue centroids from the selected atoms;False
: consider each atom separately;None
: use the class default for centroids [None
]
-
output_exists
(force=False)[source]¶ Return True if default output file already exists.
Disable with force=True (will always return False)
-
qN
(q, n, out=None)[source]¶ Calculate native contacts relative to state n.
If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
-
class
MDAnalysis.analysis.contacts.
ContactAnalysis1
(*args, **kwargs)[source]¶ Perform a very flexible native contact analysis with respect to a single reference.
-
class
ContactAnalysis1
(topology, trajectory[, selection[, refgroup[, radius[, outfile]]]])¶
-
class
MDAnalysis.analysis.contacts.
ContactAnalysis1
(universe[, selection[, refgroup[, radius[, outfile]]]])[source]
This analysis class allows one to calculate the fraction of native contacts q between two arbitrary groups of atoms with respect to an arbitrary reference structure. For instance, as a reference one could take a crystal structure of a complex, and as the two groups atoms one selects two molecules A and B in the complex. Then the question to be answered by q is, is which percentage of the contacts between A and B persist during the simulation.
First prepare
AtomGroup
selections for the reference atoms; this example uses some arbitrary selections:ref = Universe('crystal.pdb') refA = re.select_atoms('name CA and segid A and resid 6:100') refB = re.select_atoms('name CA and segid B and resid 1:40')
Load the trajectory:
u = Universe(topology, trajectory)
We then need two selection strings selA and selB that, when applied as
u.select_atoms(selA)
produce a list of atoms that is equivalent to the reference (i.e.u.select_atoms(selA)
must select the same atoms asrefA
in this example):selA = 'name CA and resid 1:95' # corresponds to refA selB = 'name CA and resid 150:189' # corresponds to refB
Note
It is the user’s responsibility to provide a reference group (or groups) that describe equivalent atoms to the ones selected by selection.
Now we are ready to set up the analysis:
CA1 = ContactAnalysis1(u, selection=(selA,selB), refgroup=(refA,refB), radius=8.0, outfile="q.dat")
If the groups do not match in length then a
ValueError
is raised.The analysis across the whole trajectory is performed with
CA1.run()
Results are saved to outfile (
framenumber q N
per line) and can also be plotted withCA1.plot() # plots the time series q(t) CA1.plot_qavg() # plots the matrix of average contacts <q>
Description of computed values in the output file:
- N
- number of native contacts
- q
- fraction of native contacts relative to the reference
Calculate native contacts within a group or between two groups.
Parameters: - *topology* – psf or pdb file
- *trajectory* – dcd or xtc/trr file
- *universe* – instead of a topology/trajectory combination, one can also supply
a
MDAnalysis.Universe
Keywords: - selection
selection string that determines which distances are calculated; if this is a tuple or list with two entries then distances are calculated between these two different groups [“name CA or name B*”]
- refgroup
reference group, either a single
AtomGroup
(if there is only a single selection) or a list of two such groups. The reference contacts are directly computed from refgroup and hence the atoms in the reference group(s) must be equivalent to the ones produced by the selection on the input trajectory.- radius
contacts are deemed any atoms within radius [8.0 A]
- outfile
name of the output file; with the gz or bz2 suffix, a compressed file is written. The average <q> is written to a second, gzipped file that has the same name with ‘array’ included. E.g. for the default name “q1.dat.gz” the <q> file will be “q1.array.gz”. The format is the matrix in column-row format, i.e. selection 1 residues are the columns and selection 2 residues are rows. The file can be read with
np.loadtxt()
. [“q1.dat.gz”]
The function calculates the percentage of native contacts q1 along a trajectory. “Contacts” are defined as the number of atoms within radius of a given other atom. q1 is the fraction of contacts relative to the reference state 1 (typically the starting conformation of the trajectory).
The timeseries is written to a file outfile and is also accessible as the attribute
ContactAnalysis1.timeseries
.-
ContactAnalysis1.
output_exists
(force=False)[source]¶ Return True if default output file already exists.
Disable with force=True (will always return False)
-
ContactAnalysis1.
plot
(filename=None, **kwargs)[source]¶ Plot q(t).
-
plot
([filename, ...])[source]
If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, ...). All other keyword arguments are passed on to
pylab.plot()
.-
-
ContactAnalysis1.
plot_qavg
(filename=None, **kwargs)[source]¶ Plot
ContactAnalysis1.qavg
, the matrix of average native contacts.-
plot_qavg
([filename, ...])[source]
If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, ...). All other keyword arguments are passed on to
pylab.imshow()
.-
-
ContactAnalysis1.
qN
(q, out=None)[source]¶ Calculate native contacts relative to reference state.
q is the matrix of contacts (e.g.
q
).If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
This method is typically only used internally.
-
ContactAnalysis1.
qarray
(d, out=None)[source]¶ Return distance array with True for contacts.
d is the matrix of distances. The method uses the value of
ContactAnalysis1.radius
to determine if adistance < radius
is considered a contact.If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.
This method is typically only used internally.
-
ContactAnalysis1.
run
(store=True, force=False, start_frame=1, end_frame=None, step_value=1)[source]¶ Analyze trajectory and produce timeseries.
Stores results in
ContactAnalysis1.timeseries
(if store=True) and writes them to a data file. The average q is written to a second data file. start_frameThe value of the first frame number in the trajectory to be used (default: frame 1)- end_frame
- The value of the last frame number in the trajectory to be used (default: None – use all frames)
- step_value
- The number of frames to skip during trajectory iteration (default: use every frame)
-
class