3.1.2. Native contacts analysis — MDAnalysis.analysis.contacts

Author:Oliver Beckstein
Year:2010
Copyright:GNU Public License v3

Analysis of native contacts q over a trajectory.

  • a “contact” exists between two atoms i and j if the distance between them is smaller than a given radius
  • a “native contact” exists between i and j if a contact exists and if the contact also exists between the equivalent atoms in a reference structure or conformation

The “fraction of native contacts” q(t) is a number between 0 and 1 and calculated as the total number of native contacts for a given time frame divided by the total number of contacts in the reference structure.

Classes are available for two somewhat different ways to perform a contact analysis:

  1. Contacts between two groups of atoms are defined with ContactAnalysis1), which allows one to calculate q(t) over time. This is especially useful in order to look at native contacts during an equilibrium simulation where one can also look at the average matrix of native contacts (see ContactAnalysis1.plot_qavg()).
  2. Contacts are defined within one group in a protein (e.g. all C-alpha atoms) but relative to two different conformations 1 and 2, using ContactAnalysis. This allows one to do a q1-q2 analysis that shows how native contacts of state 1 change in comparison to native contacts of state 2. Transition pathways have been analyzed in terms of these two variables q1 and q2 that relate to the native contacts in the end states of the transition.

See also

See http://lorentz.dynstr.pasteur.fr/joel/adenylate.php for an example of contact analysis applied to MinActionPath trajectories of AdK (although this was not performed with MDAnalysis — it’s provided as a very good illustrative example).

3.1.2.1. Examples

3.1.2.1.1. One-dimensional contact analysis

As an example we analyze the opening (“unzipping”) of salt bridges when the AdK enzyme opens up; this is one of the example trajectories in MDAnalysis.

import MDAnalysis
import MDAnalysis.analysis.contacts
from MDAnalysis.tests.datafiles import PSF,DCD

# example trajectory (transition of AdK from closed to open)
u = MDAnalysis.Universe(PSF,DCD)

# crude definition of salt bridges as contacts between NH/NZ in ARG/LYS and OE*/OD* in ASP/GLU.
# You might want to think a little bit harder about the problem before using this for real work.
sel_basic = "(resname ARG or resname LYS) and (name NH* or name NZ)"
sel_acidic = "(resname ASP or resname GLU) and (name OE* or name OD*)"

# reference groups (first frame of the trajectory, but you could also use a separate PDB, eg crystal structure)
acidic = u.select_atoms(sel_acidic)
basic = u.select_atoms(sel_basic)

# set up analysis of native contacts ("salt bridges"); salt bridges have a distance <6 A
CA1 = MDAnalysis.analysis.contacts.ContactAnalysis1(u, selection=(sel_acidic, sel_basic), refgroup=(acidic,
basic), radius=6.0, outfile="qsalt.dat")

# iterate through trajectory and perform analysis of "native contacts" q
# (force=True ignores any previous results, force=True is useful when testing)
CA1.run(force=True)

# plot time series q(t) [possibly do "import pylab; pylab.clf()" do clear the figure first...]
CA1.plot(filename="adk_saltbridge_contact_analysis1.pdf", linewidth=3, color="blue")

# or plot the data in qsalt.dat yourself.
CA1.plot_qavg(filename="adk_saltbridge_contact_analysis1_matrix.pdf")

The first graph shows that when AdK opens, about 20% of the salt bridges that existed in the closed state disappear when the enzyme opens. They open in a step-wise fashion (made more clear by the movie http://sbcb.bioch.ox.ac.uk/oliver/Movies/AdK/AdK_zipper_cartoon.avi (divx, on Mac use http://perian.org)).

The output graphs can be made prettier but if you look at the code itself then you’ll quickly figure out what to do. The qavg plot is the matrix of all contacts, averaged over the trajectory. This plot makes more sense for an equilibrium trajectory than for the example above but is is included for illustration.

See the docs for ContactAnalysis1 for another example.

3.1.2.1.2. Two-dimensional contact analysis (q1-q2)

Analyze a single DIMS transition of AdK between its closed and open conformation and plot the trajectory projected on q1-q2:

import MDAnalysis.analysis.contacts
from MDAnalysis.tests.datafiles import *
C = MDAnalysis.analysis.contacts.ContactAnalysis(PSF, DCD)
C.run()
C.plot()

Compare the resulting pathway to the MinActionPath result for AdK.

3.1.2.2. Classes

class MDAnalysis.analysis.contacts.ContactAnalysis(topology, trajectory, ref1=None, ref2=None, radius=8.0, targetdir='.', infix='', force=False, selection='name CA', centroids=False)[source]

Perform a native contact analysis (“q1-q2”).

The analysis of the trajectory is performed with the ContactAnalysis.run() method. The result is stored in ContactAnalysis.timeseries. It is a numpy array which contains the frame number at index 0, q1 and q2 at index 1 and 2, and the total number of contacts in 3 and 4.

frame  q1 q2  n1 n2

The total number of contacts in the reference states 1 and 2 are stored in ContactAnalysis.nref (index 0 and 1).

Calculate native contacts from two reference structures.

Arguments:
topology

psf or pdb file

trajectory

dcd or xtc/trr file

ref1

structure of the reference conformation 1 (pdb); if None the first frame of the trajectory is chosen

ref2

structure of the reference conformation 2 (pdb); if None the last frame of the trajectory is chosen

radius

contacts are deemed any Ca within radius [8 A]

targetdir

output files are saved there [.]

infix

additional tag string that is inserted into the output filename of the data file [“”]

selection

MDAnalysis selection string that selects the particles of interest; the default is to only select the C-alpha atoms in ref1 and *ref*2 [“name CA”]

Note

If selection produces more than one atom per residue then you will get multiple contacts per residue unless you also set centroids = True

centroids

If set to True, use the centroids for the selected atoms on a per-residue basis to compute contacts. This allows, for instance defining the sidechains as selection and then computing distances between sidechain centroids.

The function calculates the percentage of native contacts q1 and q2 along a trajectory. “Contacts” are defined as the number of Ca atoms (or per-residue centroids of a user defined selection) within radius of a primary Ca. q1 is the fraction of contacts relative to the reference state 1 (typically the starting conformation of the trajectory) and q2 is the fraction of contacts relative to the conformation 2.

The timeseries is written to a bzip2-compressed file in targetdir named “basename(trajectory)*infix*_q1q2.dat.bz2” and is also accessible as the attribute ContactAnalysis.timeseries.

get_distance_array(g, **kwargs)[source]

Calculate the self_distance_array for atoms in group g.

Keywords:
results

passed on to MDAnalysis.lib.distances.self_distance_array() as a preallocated array

centroids

True: calculate per-residue centroids from the selected atoms; False: consider each atom separately; None: use the class default for centroids [None]

load(filename)[source]

Load the data file.

output_exists(force=False)[source]

Return True if default output file already exists.

Disable with force=True (will always return False)

plot(**kwargs)[source]

Plot q1-q2.

qN(q, n, out=None)[source]

Calculate native contacts relative to state n.

If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.

qarray(d, out=None)[source]

Return distance array with True for contacts.

If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.

run(store=True, force=False)[source]

Analyze trajectory and produce timeseries.

Stores results in ContactAnalysis.timeseries (if store=True) and writes them to a bzip2-compressed data file.

class MDAnalysis.analysis.contacts.ContactAnalysis1(*args, **kwargs)[source]

Perform a very flexible native contact analysis with respect to a single reference.

class ContactAnalysis1(topology, trajectory[, selection[, refgroup[, radius[, outfile]]]])
class MDAnalysis.analysis.contacts.ContactAnalysis1(universe[, selection[, refgroup[, radius[, outfile]]]])[source]

This analysis class allows one to calculate the fraction of native contacts q between two arbitrary groups of atoms with respect to an arbitrary reference structure. For instance, as a reference one could take a crystal structure of a complex, and as the two groups atoms one selects two molecules A and B in the complex. Then the question to be answered by q is, is which percentage of the contacts between A and B persist during the simulation.

First prepare AtomGroup selections for the reference atoms; this example uses some arbitrary selections:

ref = Universe('crystal.pdb')
refA = re.select_atoms('name CA and segid A and resid 6:100')
refB = re.select_atoms('name CA and segid B and resid 1:40')

Load the trajectory:

u = Universe(topology, trajectory)

We then need two selection strings selA and selB that, when applied as u.select_atoms(selA) produce a list of atoms that is equivalent to the reference (i.e. u.select_atoms(selA) must select the same atoms as refA in this example):

selA = 'name CA and resid 1:95'     # corresponds to refA
selB = 'name CA and resid 150:189'  # corresponds to refB

Note

It is the user’s responsibility to provide a reference group (or groups) that describe equivalent atoms to the ones selected by selection.

Now we are ready to set up the analysis:

CA1 = ContactAnalysis1(u, selection=(selA,selB), refgroup=(refA,refB), radius=8.0, outfile="q.dat")

If the groups do not match in length then a ValueError is raised.

The analysis across the whole trajectory is performed with

CA1.run()

Results are saved to outfile (framenumber q N per line) and can also be plotted with

CA1.plot()        # plots the time series q(t)
CA1.plot_qavg()   # plots the matrix of average contacts <q>

Description of computed values in the output file:

N
number of native contacts
q
fraction of native contacts relative to the reference

Calculate native contacts within a group or between two groups.

Arguments:
topology

psf or pdb file

trajectory

dcd or xtc/trr file

universe

instead of a topology/trajectory combination, one can also supply a MDAnalysis.Universe

Keywords:
selection

selection string that determines which distances are calculated; if this is a tuple or list with two entries then distances are calculated between these two different groups [“name CA or name B*”]

refgroup

reference group, either a single AtomGroup (if there is only a single selection) or a list of two such groups. The reference contacts are directly computed from refgroup and hence the atoms in the reference group(s) must be equivalent to the ones produced by the selection on the input trajectory.

radius

contacts are deemed any atoms within radius [8.0 A]

outfile

name of the output file; with the gz or bz2 suffix, a compressed file is written. The average <q> is written to a second, gzipped file that has the same name with ‘array’ included. E.g. for the default name “q1.dat.gz” the <q> file will be “q1.array.gz”. The format is the matrix in column-row format, i.e. selection 1 residues are the columns and selection 2 residues are rows. The file can be read with numpy.loadtxt(). [“q1.dat.gz”]

The function calculates the percentage of native contacts q1 along a trajectory. “Contacts” are defined as the number of atoms within radius of a given other atom. q1 is the fraction of contacts relative to the reference state 1 (typically the starting conformation of the trajectory).

The timeseries is written to a file outfile and is also accessible as the attribute ContactAnalysis1.timeseries.

ContactAnalysis1.load(filename)[source]

Load the data file.

ContactAnalysis1.output_exists(force=False)[source]

Return True if default output file already exists.

Disable with force=True (will always return False)

ContactAnalysis1.plot(filename=None, **kwargs)[source]

Plot q(t).

plot([filename, ...])[source]

If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, ...). All other keyword arguments are passed on to pylab.plot().

ContactAnalysis1.plot_qavg(filename=None, **kwargs)[source]

Plot ContactAnalysis1.qavg, the matrix of average native contacts.

plot_qavg([filename, ...])[source]

If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, ...). All other keyword arguments are passed on to pylab.imshow().

ContactAnalysis1.qN(q, out=None)[source]

Calculate native contacts relative to reference state.

q is the matrix of contacts (e.g. q).

If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.

This method is typically only used internally.

ContactAnalysis1.qarray(d, out=None)[source]

Return distance array with True for contacts.

d is the matrix of distances. The method uses the value of ContactAnalysis1.radius to determine if a distance < radius is considered a contact.

If out is supplied as a pre-allocated array of the correct shape then it is filled instead of allocating a new one in order to increase performance.

This method is typically only used internally.

ContactAnalysis1.run(store=True, force=False, start_frame=1, end_frame=None, step_value=1)[source]

Analyze trajectory and produce timeseries.

Stores results in ContactAnalysis1.timeseries (if store=True) and writes them to a data file. The average q is written to a second data file. start_frame

The value of the first frame number in the trajectory to be used (default: frame 1)
end_frame
The value of the last frame number in the trajectory to be used (default: None – use all frames)
step_value
The number of frames to skip during trajectory iteration (default: use every frame)