3.3.1. Generation and Analysis of HOLE pore profiles — MDAnalysis.analysis.hole

Author:Lukas Stelzl, Oliver Beckstein
Year:2011-2012
Copyright:GNU Public License v2

With the help of this module, HOLE can be run on frames in a trajectory. Data can be combined and analyzed. HOLE [Smart1993] [Smart1996] must be installed separately and can be obtained from http://www.smartsci.uk/hole/.

References

[Smart1993]O.S. Smart, J.M. Goodfellow and B.A. Wallace. The Pore Dimensions of Gramicidin A. Biophysical Journal 65:2455-2460, 1993.
[Smart1996]O.S. Smart, J.G. Neduvelil, X. Wang, B.A. Wallace, and M.S.P. Sansom. HOLE: A program for the analysis of the pore dimensions of ion channel structural models. J.Mol.Graph., 14:354–360, 1996. URL http://www.smartsci.uk/hole/

Examples

3.3.1.1. Single structure

Gramicidin A (gA) channel:

from MDAnalysis.analysis.hole import HOLE, HOLEtraj
from MDAnalysis.tests.datafiles import PDB_HOLE

H = HOLE(PDB_HOLE, executable="~/hole2/bin/hole")  # set path to your hole binary
H.run()
H.collect()
H.plot(linewidth=3, color="black", label=False)

3.3.1.2. Trajectory

Analyzing a trajectory:

u = MDAnalysis.Universe(psf, trajectory)
H = HOLEtraj(u, ...)
H.run()
H.plot3D()

The profiles are available as the attribute HOLEtraj.profiles (H.profiles in the example) and are indexed by frame number but can also be indexed by an arbitrary order parameter as shown in the next example.

3.3.1.3. Trajectory with RMSD as order parameter

In order to classify the HOLE profiles the RMSD to a reference structure is calculated for each trajectory frame (e.g. using the MDAnalysis.analysis.rms.RMSD analysis class). Then the HOLE profiles can be ordered by the RMSD, which acts as an order parameter.

import MDAnalysis.analysis.hole
import MDAnalysis.analysis.rms
MDAnalysis.start_logging()
ref = MDAnalysis.Universe("ref.psf", "ref.pdb")         # reference structure
u = MDAnalysis.Universe("ref.psf", "traj.xtc")          # trajectory

# calculate RMSD
R = MDAnalysis.analysis.rms.RMSD(u, reference=ref, select="protein", mass_weighted=True)
R.run()

# HOLE analysis with order parameters
H = MDAnalysis.analysis.hole.HOLEtraj(u, cvect=[0,0,1], orderparameters=R.rmsd[:,2])
H.run()

The HOLEtraj.profiles dictionary will have the order parameter as key for each frame. The plot functions will automatically sort the profiles by ascending order parameter. To access the individual profiles one can simply iterate over the sorted profiles (see HOLEtraj.sorted_profiles_iter())

for q, profile in H:
   print "orderparameter = %g" % q
   print "min(R) = %g" % profile.radius.min()

3.3.1.3.1. Data structures

A profile is stored as a numpy.recarray:

frame   rxncoord   radius
frame
integer frame number (only important when HOLE itself reads a trajectory)
rxncoord
the distance along the pore axis, in Å
radius
the pore radius, in Å

The HOLE.profiles or HOLEtraj.profiles dictionary holds one profile for each key. By default the keys are the frame numbers but HOLEtraj can take the optional orderparameters keyword argument and load an arbitrary order parameter for each frame. In that case, the key becomes the orderparameter.

Note

The profiles dict is not ordered and hence one typically needs to manually order the keys first. Furthermore, duplicate keys are not possible: In the case of duplicate orderparameters, the last one read will be stored in the dict.

3.3.1.3.2. Analysis

class MDAnalysis.analysis.hole.HOLE(filename, **kwargs)[source]

Run HOLE on a single frame or a DCD trajectory.

Only a subset of all HOLE control parameters is supported and can be set with keyword arguments.

HOLE (as a FORTRAN77 program) has a number of limitations when it comes to filename lengths (must be shorter than the empirically found HOLE.HOLE_MAX_LENGTH). This class tries to work around them by creating temporary symlinks to files when needed but this can still fail when permissions are not correctly set on the current directory.

Running HOLE with the HOLE class is a 3-step process:

  1. set up the class with all desired parameters
  2. run HOLE with HOLE.run()
  3. collect the data from the output file with HOLE.collect()

The class also provides some simple plotting functions of the collected data such as HOLE.plot() or HOLE.plot3D().

New in version 0.7.7.

Set up parameters to run HOLE on PDB filename.

Parameters:

*filename*

The filename is used as input for HOLE in the “COORD” card of the input file. It specifies the name of a PDB co-ordinate file to be used. This must be in Brookhaven protein databank format or something closely approximating this. Both ATOM and HETATM records are read. Note that if water molecules or ions are present in the channel these can be ignored on read by the use of the ignore_residues keyword.

Wildcard pattern. A new feature (in release 2.1 of HOLE) was the option to include a wild card (*) in the filename. e.g., filename = “ab*.pdb” will apply hole to all files in the directory whose name starts with ab and ends with .pdb. This is intended to aid the analysis of multiple copies of the same molecule - produced during molecular dynamics or other method. The hole procedure will be applied to each file in turn with the same setup conditions (initial point, sampling distance etc.). Graphics files will contain a combination of the individual runs, one after another. Note that the pdb files are read independently so that they need not have an identical number of atoms or atom order etc. (though they should be sufficiently similar for a HOLE run from identical starting conditions to be useful).

See also

An alternative way to load in multiple files is a direct read from a CHARMM binary dynamics DCD coordinate file - using the dcd keyword or use HOLEtraj.

Keywords:
dcd

DCD trajectory (must be supplied together with a matching PDB file filename) and then HOLE runs its analysis on each frame.

It does multiple HOLE runs on positions taken from a CHARMM binary dynamics format .DCD trajectory file. The dcd file must have exactly the same number of atoms in exactly the same order as the pdb file specified by filename. Note that if this option is used the pdb file is used as a template only - the coordinates are ignored. Note that structural parameters determined for each individual structure are written in a tagged format so that it is possible to extract the information from the text output file using a grep command. The reading of the file can be controlled by the step keyword and/or setting HOLE.dcd_iniskip to the number of frames to be skipped initially.

Note

HOLE is very picky and does not read all DCD-like formats. If in doubt, look into the logfile for error diagnostics.

At the moment, DCDs generated with MDAnalysis are not accepted by HOLE — use HOLEtraj, which works with anything that MDAnalysis can read.

logfile

name of the file collecting HOLE’s output (which can be parsed using HOLE.collect() [“hole.out”]

sphpdb

name of the HOLE sph file, a PDB-like file containig the coordinates of the pore centers [“hole.sph”]

step

step size for going through the trajectory (skips step - 1 frames) [1]

cpoint

coordinates of a point inside the pore, e.g. [12.3, 0.7, 18.55]. If None then HOLE’s own simple search algorithm is used.

This specifies a point which lies within the channel, for simple channels such as gramicidin results do not show great sensitivity to the exact point taken. An easy way to produce an initial point is to use molecular graphics to find two atoms which lie either side of the pore and to average their co-ordinates. Or if the channel structure contains water molecules or counter ions then take the coordinates of one of these (and use the ignore_residues keyword to ignore them in the pore radius calculation).

If this card is not specified then HOLE now (from version 2.2) attempts to make a guess where the channel will be. The procedure assumes the channel is reasonably symmetric. The initial guess on cpoint will be the centroid of all alpha carbon atoms (name ‘CA’ in pdb file). This is then refined by a crude grid search up to 5 Å from the original position. This procedure works most of the time but is clearly far from infallible — results should be careful checked (with molecular graphics) if it is used. [None]

cvect

Search direction, should be parallel to the pore axis, e.g. [0,0,1] for the z-axis.

If this keyword is None then HOLE now attempts to make a guess where the channel will be. The procedure assumes the channel is reasonably symmetric. The guess will be either along the X axis (1,0,0), Y axis (0,1,0) or Z axis (0,0,1). If the structure is not aligned on one of these axis the results will clearly be approximate. If a guess is used then results should be carefully checked. [None]

sample

distance of sample points in Å

Specifies the distance between the planes used in the HOLE procedure. The default value is 0.2 Å, this should be reasonable for most purposes. However, if you wish to visualize a very tight constriction then specify a smaller value. [0.2]

dotden

density of facettes for generating a 3D pore representation

This number controls the density of dots which will be used by the program. A sphere of dots is placed on each centre determined in the Monte Carlo procedure. Only dots which do not lie within any other sphere are considered. The actual number of dots written is therefore controlled by dotden and sample. dotden should be set to between 5 (few dots per sphere) and 35 (large number of dots per sphere). [15]

endrad

Radius which is considered to be the end of the pore. This keyword can be used to specify the radius above which the program regards a result as indicating that the end of the pore has been reached. The default value is 22.0 Å. This may need to be increased for large channels or reduced for small. [22.0]

shorto

Determines the output of output in the logfile; for automated processing this must be < 3.

  • 0: Full text output
  • 1: All text output given except “run in progress” (i.e., detailed contemporary description of what HOLE is doing).
  • 2: Ditto plus no graph type output - only leaving minimum radius and conductance calculations.
  • 3: All text output other than input card mirroring and error messages turned off.

ignore_residues

sequence of three-letter residues that are not taken into account during the calculation; wildcards are not supported [ ["SOL","WAT", "TIP", "HOH", "K  ", "NA ", "CL "] ]

radius

Path to the radii; if set to None then a set of default radii, SIMPLE2_RAD, is used (an extension of simple.rad from the HOLE distribution)

This specifies the name for the file specifying van der Waals radii for each atom. A number of files with different values are supplied with HOLE.

sphpdb

This keyword specifies the filename for output of the sphere centre information in pdb form. Its typical suffix is ”.sph”. The co-ordinates are set to the sphere centres and the occupancies are the sphere radii. All centres are assigned the atom name QSS and residue name SPH and the residue number is set to the storage number of the centre. The file can be imported into molecular graphics programs but are likely to be bonded together in a awful manner - as the points are very close to one another. In VMD sph objects are best displayed as “Points”. Displaying .sph objects rather than rendered or dot surfaces can be useful to analyze the distance of particular atoms from the sphere-centre line.

Most usefully .sph files can be used to produce molecular graphical output from a hole run. This is achieved by using the sph_process program to read the .sph file. [“hole.sph”]

executable

Path to the hole executable (e.g. /opt/hole/exe/hole); the other programs sph_process and sos_triangle are assumed to live in the same directory as hole. If hole is found on the PATH then the bare executable name is sufficient. [“hole”]

profiles

After running HOLE.collect(), this dict contains all the HOLE profiles, indexed by the frame number. If only a single frame was analyzed then this will be HOLE.profiles[0]. Note that the order is random; one needs to sort the keys first.

HOLE_MAX_LENGTH = 70

Maximum number of characters in a filename (limitation of HOLE)

__delattr__

x.__delattr__(‘name’) <==> del x.name

__format__()

default object formatter

__getattribute__

x.__getattribute__(‘name’) <==> x.name

__hash__
__reduce__()

helper for pickle

__reduce_ex__()

helper for pickle

__repr__
__setattr__

x.__setattr__(‘name’, value) <==> x.name = value

__sizeof__() → int

size of object in memory, in bytes

__str__
check_and_fix_long_filename(filename, tmpdir='.')[source]

Return filename suitable for HOLE.

HOLE is limited to filenames <= HOLE.HOLE_MAX_LENGTH. This method

  1. returns filename if HOLE can process it

  2. returns a relative path (see os.path.relpath()) if that shortens the path sufficiently

  3. creates a symlink to filename (os.symlink()) in a safe temporary directory and returns the path of the symlink. The temporary directory and the symlink are stored in HOLE.tempfiles and HOLE.tempdirs and deleted when the HOLE instance is deleted or garbage collected.

    By default the temporary directory is created inside the current directory in order to keep that path name short. This can be changed with the tmpdir keyword (e.g. one can use “/tmp”).

collect(**kwargs)[source]

Parse the output from a HOLE run into numpy recarrays.

Can deal with outputs containing multiple frames. Output format:

frame  rxncoord   radius

The method saves the result as HOLE.profiles, a dictionary indexed by the frame number. Each entry is a numpy.recarray.

If the keyword outdir is supplied (e.g. ”.”) then each profile is saved to a gzipped data file.

Keywords:
run

identifier, free form [1]

outdir

save output data under outdir/run if set to any other value but None [None]

create_vmd_surface(filename='hole.vmd', **kwargs)[source]

Process HOLE output to create a smooth pore surface suitable for VMD.

Takes the sphpdb file and feeds it to sph_process and sos_triangle as described under Visualization of HOLE results.

Load the output file filename into VMD by issuing in the tcl console

source hole.vmd

The level of detail is determined by HOLE.dotden (which can be overriden by keyword dotden).

The surface will be colored so that parts that are inaccessible to water (pore radius < 1.15 Å) are red, water accessible parts (1.15 Å > pore radius < 2.30 Å) are green and wide areas (pore radius > 2.30 Å are blue).

default_ignore_residues = ['SOL', 'WAT', 'TIP', 'HOH', 'K ', 'NA ', 'CL ']

List of residues that are ignore by default. Can be changed with the ignore_residues keyword.

min_radius()

Return the minimum radius over all profiles as a function of q

plot(**kwargs)

Plot profiles in a 1D graph.

Lines are coloured according to the colour map cmap.

Keywords:
step

only plot every step profile [1]

cmap

matplotlib color map to continuously color graphs [jet]

color

color or iterable of colors to specify graph colors; None will use cmap instead [None]

linestyle

linestyle supported by matplotlib.pyplot.plot(); an iterable will be applied in turn [‘-‘]

yshift

displace each R(zeta) profile by yshift in the y-direction [0]

frames

integer or list: only plot these specific frame(s); the default is to plot everything (see also step). [None]

label

ignored and replaced with the frame number; if it is set to “_nolegend_” or False then no legend is displayed

ax

matplotlib.axes.Axes instance to plot into [None]

All other kwargs are passed to matplotlib.pyplot.plot().

plot3D(**kwargs)

Stacked graph of profiles.

Lines are coloured according to the colour map cmap.

Keywords:
step

only plot every step profile [1]

cmap

matplotlib color map [jet]

rmax

only display radii up to rmax [None]

ylabel

label of the reaction coordinate axis [“frames”]

Based on Stack Overflow post 3d plots using matplotlib.

run(**kwargs)[source]

Run HOLE on the input file.

save(filename='hole.pickle')

Save profiles as a Python pickle file filename.

Load profiles dictionary with

import cPickle
profiles = cPickle.load(open(filename))
sorted_profiles_iter()

Return an iterator over profiles sorted by frame/order parameter q.

The iterator produces tuples (q, profile).

class MDAnalysis.analysis.hole.HOLEtraj(universe, **kwargs)[source]

Analyze all frames in a trajectory.

The HOLE class provides a direct interface to HOLE. HOLE itself has limited support for analysing trajectories but cannot deal with all the trajectory formats understood by MDAnalysis. This class can take any universe and feed it to HOLE. By default it sequentially creates a PDB for each frame and runs HOLE on the frame.

Set up the class.

Parameters:
  • *universe* – The input trajectory as part of a Universe. trajectory is converted to a sequence of PDB files and HOLE is run on each individual file. (Use the start, stop, and step keywords to slice the trajectory.)
  • *orderparameters* – Sequence or text file with list of numbers corresponding to the frames in the trajectory.
  • stop, step (*start*,) – frame indices to slice the trajectory as universe.trajectory[start, stop, step]
  • *selection* – selection string for select_atoms() to select the group of atoms that is to be analysed by HOLE [“protein”]
  • *cpoint*

    Point inside the pore.

    If set to True then cpoint is guessed as the center_of_geometry() of the selection from the first frame of the trajectory.

    If cpoint is not set or set to None then HOLE guesses it with its own algorithm (for each individual frame). [None]

  • *kwargs* – All other keywords are passed on to HOLE (see there for description).
profiles

After running HOLE.collect(), this dict contains all the HOLE profiles, indexed by the frame number or the order parameter (if orderparameters was supplied). Note that the order is random; one needs to sort the keys first.

__delattr__

x.__delattr__(‘name’) <==> del x.name

__format__()

default object formatter

__getattribute__

x.__getattribute__(‘name’) <==> x.name

__hash__
__reduce__()

helper for pickle

__reduce_ex__()

helper for pickle

__repr__
__setattr__

x.__setattr__(‘name’, value) <==> x.name = value

__sizeof__() → int

size of object in memory, in bytes

__str__
guess_cpoint(**kwargs)[source]

Guess a point inside the pore.

This method simply uses the center of geometry of the protein selection as a guess. selection is “protein” by default.

min_radius()

Return the minimum radius over all profiles as a function of q

plot(**kwargs)

Plot profiles in a 1D graph.

Lines are coloured according to the colour map cmap.

Keywords:
step

only plot every step profile [1]

cmap

matplotlib color map to continuously color graphs [jet]

color

color or iterable of colors to specify graph colors; None will use cmap instead [None]

linestyle

linestyle supported by matplotlib.pyplot.plot(); an iterable will be applied in turn [‘-‘]

yshift

displace each R(zeta) profile by yshift in the y-direction [0]

frames

integer or list: only plot these specific frame(s); the default is to plot everything (see also step). [None]

label

ignored and replaced with the frame number; if it is set to “_nolegend_” or False then no legend is displayed

ax

matplotlib.axes.Axes instance to plot into [None]

All other kwargs are passed to matplotlib.pyplot.plot().

plot3D(**kwargs)

Stacked graph of profiles.

Lines are coloured according to the colour map cmap.

Keywords:
step

only plot every step profile [1]

cmap

matplotlib color map [jet]

rmax

only display radii up to rmax [None]

ylabel

label of the reaction coordinate axis [“frames”]

Based on Stack Overflow post 3d plots using matplotlib.

run(**kwargs)[source]

Run HOLE on the whole trajectory and collect profiles.

Keyword arguments start, stop, and step can be used to only analyse part of the trajectory.

run_hole(pdbfile, **kwargs)[source]

Run HOLE on a single PDB file pdbfile.

save(filename='hole.pickle')

Save profiles as a Python pickle file filename.

Load profiles dictionary with

import cPickle
profiles = cPickle.load(open(filename))
sorted_profiles_iter()

Return an iterator over profiles sorted by frame/order parameter q.

The iterator produces tuples (q, profile).

MDAnalysis.analysis.hole.SIMPLE2_RAD = '\nremark: Time-stamp: <2005-11-21 13:57:55 oliver> [OB]\nremark: van der Waals radii: AMBER united atom\nremark: from Weiner et al. (1984), JACS, vol 106 pp765-768\nremark: Simple - Only use one value for each element C O H etc.\nremark: van der Waals radii\nremark: general last\nVDWR C??? ??? 1.85\nVDWR O??? ??? 1.65\nVDWR S??? ??? 2.00\nVDWR N??? ??? 1.75\nVDWR H??? ??? 1.00\nVDWR H? ??? 1.00\nVDWR P??? ??? 2.10\nremark: ASN, GLN polar H (odd names for these atoms in xplor)\nVDWR E2? GLN 1.00\nVDWR D2? ASN 1.00\nremark: amber lone pairs on sulphurs\nVDWR LP?? ??? 0.00\nremark: for some funny reason it wants radius for K even though\nremark: it is on the exclude list\nremark: Use Pauling hydration radius (Hille 2001) [OB]\nVDWR K? ??? 1.33\nVDWR NA? ??? 0.95\nVDWR CL? ??? 1.81\nremark: funny hydrogens in gA structure [OB]\nVDWR 1H? ??? 1.00\nVDWR 2H? ??? 1.00\nVDWR 3H? ??? 1.00\nremark: need bond rad for molqpt option\nBOND C??? 0.85\nBOND N??? 0.75\nBOND O??? 0.7\nBOND S??? 1.1\nBOND H??? 0.5\nBOND P??? 1.0\nBOND ???? 0.85\n'

Built-in HOLE radii (based on simple.rad from the HOLE distribution): van der Waals radii are AMBER united atom from Weiner et al. (1984), JACS, vol 106 pp765-768. Simple - Only use one value for each element C O H etc. Added radii for K+, NA+, CL- (Pauling hydration radius from Hille 2002). The data file can be written with the convenience function write_simplerad2().

3.3.1.3.3. Utilities

MDAnalysis.analysis.hole.write_simplerad2(filename='simple2.rad')[source]

Write the built-in radii in SIMPLE2_RAD to filename.

Does nothing if filename already exists.

MDAnalysis.analysis.hole.seq2str(v)[source]

Return sequence v as a string of numbers with spaces as separators.

In the special case of None, the empty string “” is returned.

class MDAnalysis.analysis.hole.ApplicationError[source]

Raised when an external application failed.

The error code is specific for the application.

New in version 0.7.7.