3.3.1. Generation and Analysis of HOLE pore profiles — MDAnalysis.analysis.hole
¶
Author: | Lukas Stelzl, Oliver Beckstein |
---|---|
Year: | 2011-2012 |
Copyright: | GNU Public License v2 |
With the help of this module, HOLE can be run on frames in a trajectory. Data can be combined and analyzed. HOLE [Smart1993] [Smart1996] must be installed separately.
References
[Smart1993] | O.S. Smart, J.M. Goodfellow and B.A. Wallace. The Pore Dimensions of Gramicidin A. Biophysical Journal 65:2455-2460, 1993. |
[Smart1996] | O.S. Smart, J.G. Neduvelil, X. Wang, B.A. Wallace, and M.S.P. Sansom. HOLE: A program for the analysis of the pore dimensions of ion channel structural models. J.Mol.Graph., 14:354–360, 1996. URL http://hole.biop.ox.ac.uk/hole. |
3.3.1.1. Examples¶
3.3.1.1.1. Single structure¶
Gramicidin A (gA) channel:
from MDAnalysis.analysis.hole import HOLE, HOLEtraj
from MDAnalysis.tests.datafiles import PDB_HOLE
H = HOLE(PDB_HOLE, executable="~/hole2/bin/hole") # set path to your hole binary
H.run()
H.collect()
H.plot(linewidth=3, color="black", label=False)
3.3.1.1.2. Trajectory¶
Analyzing a trajectory:
u = MDAnalysis.Universe(psf, trajectory)
H = HOLEtraj(u, ...)
H.run()
H.plot3D()
The profiles are available as the attribute HOLEtraj.profiles
(H.profiles
in the example) and are indexed by frame number but
can also be indexed by an arbitrary order parameter as shown in the
next example.
3.3.1.1.3. Trajectory with RMSD as order parameter¶
In order to classify the HOLE profiles the RMSD to a reference
structure is calculated for each trajectory frame (e.g. using the
MDAnalysis.analysis.rms.RMSD
analysis class). Then the HOLE
profiles can be ordered by the RMSD, which acts as an order
parameter.
import MDAnalysis.analysis.hole
import MDAnalysis.analysis.rms
MDAnalysis.start_logging()
ref = MDAnalysis.Universe("ref.psf", "ref.pdb") # reference structure
u = MDAnalysis.Universe("ref.psf", "traj.xtc") # trajectory
# calculate RMSD
R = MDAnalysis.analysis.rms.RMSD(u, reference=ref, select="protein", mass_weighted=True)
R.run()
# HOLE analysis with order parameters
H = MDAnalysis.analysis.hole.HOLEtraj(u, cvect=[0,0,1], orderparameters=R.rmsd[:,2])
H.run()
The HOLEtraj.profiles
dictionary will have the order parameter as key
for each frame. The plot functions will automatically sort the profiles by
ascending order parameter. To access the individual profiles one can simply
iterate over the sorted profiles (see HOLEtraj.sorted_profiles_iter()
)
for q, profile in H:
print "orderparameter = %g" % q
print "min(R) = %g" % profile.radius.min()
3.3.1.2. Data structures¶
A profile is stored as a numpy.recarray
:
frame rxncoord radius
- frame
- integer frame number (only important when HOLE itself reads a trajectory)
- rxncoord
- the distance along the pore axis, in Å
- radius
- the pore radius, in Å
The HOLE.profiles
or HOLEtraj.profiles
dictionary
holds one profile for each key. By default the keys are the frame
numbers but HOLEtraj
can take the optional orderparameters
keyword argument and load an arbitrary order parameter for each
frame. In that case, the key becomes the orderparameter.
Note
The profiles dict is not ordered and hence one typically needs to manually order the keys first. Furthermore, duplicate keys are not possible: In the case of duplicate orderparameters, the last one read will be stored in the dict.
3.3.1.3. Analysis¶
-
class
MDAnalysis.analysis.hole.
HOLE
(filename, **kwargs)[source]¶ Run HOLE on a single frame or a DCD trajectory.
Only a subset of all HOLE control parameters is supported and can be set with keyword arguments.
HOLE (as a FORTRAN77 program) has a number of limitations when it comes to filename lengths (must be shorter than the empirically found
HOLE.HOLE_MAX_LENGTH
). This class tries to work around them by creating temporary symlinks to files when needed but this can still fail when permissions are not correctly set on the current directory.Running HOLE with the
HOLE
class is a 3-step process:- set up the class with all desired parameters
- run HOLE with
HOLE.run()
- collect the data from the output file with
HOLE.collect()
The class also provides some simple plotting functions of the collected data such as
HOLE.plot()
orHOLE.plot3D()
.New in version 0.7.7.
Set up parameters to run HOLE on PDB filename.
Arguments: filename
The filename is used as input for HOLE in the “COORD” card of the input file. It specifies the name of a PDB co-ordinate file to be used. This must be in Brookhaven protein databank format or something closely approximating this. Both ATOM and HETATM records are read. Note that if water molecules or ions are present in the channel these can be ignored on read by the use of the ignore_residues keyword.
Wildcard pattern. A new feature (in release 2.1 of HOLE) was the option to include a wild card (
*
) in the filename. e.g., filename = “ab*.pdb” will apply hole to all files in the directory whose name starts withab
and ends with.pdb
. This is intended to aid the analysis of multiple copies of the same molecule - produced during molecular dynamics or other method. The hole procedure will be applied to each file in turn with the same setup conditions (initial point, sampling distance etc.). Graphics files will contain a combination of the individual runs, one after another. Note that the pdb files are read independently so that they need not have an identical number of atoms or atom order etc. (though they should be sufficiently similar for a HOLE run from identical starting conditions to be useful).See also
An alternative way to load in multiple files is a direct read from a CHARMM binary dynamics DCD coordinate file - using the dcd keyword or use
HOLEtraj
.Keywords: - dcd
DCD trajectory (must be supplied together with a matching PDB file filename) and then HOLE runs its analysis on each frame.
It does multiple HOLE runs on positions taken from a CHARMM binary dynamics format .DCD trajectory file. The dcd file must have exactly the same number of atoms in exactly the same order as the pdb file specified by filename. Note that if this option is used the pdb file is used as a template only - the coordinates are ignored. Note that structural parameters determined for each individual structure are written in a tagged format so that it is possible to extract the information from the text output file using a grep command. The reading of the file can be controlled by the step keyword and/or setting
HOLE.dcd_iniskip
to the number of frames to be skipped initially.Note
HOLE is very picky and does not read all DCD-like formats. If in doubt, look into the logfile for error diagnostics.
At the moment, DCDs generated with MDAnalysis are not accepted by HOLE — use
HOLEtraj
, which works with anything that MDAnalysis can read.
logfile
name of the file collecting HOLE’s output (which can be parsed using
HOLE.collect()
[“hole.out”]sphpdb
name of the HOLE sph file, a PDB-like file containig the coordinates of the pore centers [“hole.sph”]
step
step size for going through the trajectory (skips step - 1 frames) [1]
cpoint
coordinates of a point inside the pore, e.g.
[12.3, 0.7, 18.55]
. IfNone
then HOLE’s own simple search algorithm is used.This specifies a point which lies within the channel, for simple channels such as gramicidin results do not show great sensitivity to the exact point taken. An easy way to produce an initial point is to use molecular graphics to find two atoms which lie either side of the pore and to average their co-ordinates. Or if the channel structure contains water molecules or counter ions then take the coordinates of one of these (and use the ignore_residues keyword to ignore them in the pore radius calculation).
If this card is not specified then HOLE now (from version 2.2) attempts to make a guess where the channel will be. The procedure assumes the channel is reasonably symmetric. The initial guess on cpoint will be the centroid of all alpha carbon atoms (name ‘CA’ in pdb file). This is then refined by a crude grid search up to 5 Å from the original position. This procedure works most of the time but is clearly far from infallible — results should be careful checked (with molecular graphics) if it is used. [
None
]cvect
Search direction, should be parallel to the pore axis, e.g.
[0,0,1]
for the z-axis.If this keyword is
None
then HOLE now attempts to make a guess where the channel will be. The procedure assumes the channel is reasonably symmetric. The guess will be either along the X axis (1,0,0), Y axis (0,1,0) or Z axis (0,0,1). If the structure is not aligned on one of these axis the results will clearly be approximate. If a guess is used then results should be carefully checked. [None
]- sample
distance of sample points in Å
Specifies the distance between the planes used in the HOLE procedure. The default value is 0.2 Å, this should be reasonable for most purposes. However, if you wish to visualize a very tight constriction then specify a smaller value. [0.2]
- dotden
density of facettes for generating a 3D pore representation
This number controls the density of dots which will be used by the program. A sphere of dots is placed on each centre determined in the Monte Carlo procedure. Only dots which do not lie within any other sphere are considered. The actual number of dots written is therefore controlled by dotden and sample. dotden should be set to between 5 (few dots per sphere) and 35 (large number of dots per sphere). [15]
- endrad
Radius which is considered to be the end of the pore. This keyword can be used to specify the radius above which the program regards a result as indicating that the end of the pore has been reached. The default value is 22.0 Å. This may need to be increased for large channels or reduced for small. [22.0]
shorto
Determines the output of output in the logfile; for automated processing this must be < 3.
- 0: Full text output
- 1: All text output given except “run in progress” (i.e., detailed contemporary description of what HOLE is doing).
- 2: Ditto plus no graph type output - only leaving minimum radius and conductance calculations.
- 3: All text output other than input card mirroring and error messages turned off.
ignore_residues
sequence of three-letter residues that are not taken into account during the calculation; wildcards are not supported [
["SOL","WAT", "TIP", "HOH", "K ", "NA ", "CL "]
]- radius
Path to the radii; if set to None then a set of default radii,
SIMPLE2_RAD
, is used (an extension ofsimple.rad
from the HOLE distribution)This specifies the name for the file specifying van der Waals radii for each atom. A number of files with different values are supplied with HOLE.
sphpdb
This keyword specifies the filename for output of the sphere centre information in pdb form. Its typical suffix is ”.sph”. The co-ordinates are set to the sphere centres and the occupancies are the sphere radii. All centres are assigned the atom name QSS and residue name SPH and the residue number is set to the storage number of the centre. The file can be imported into molecular graphics programs but are likely to be bonded together in a awful manner - as the points are very close to one another. In VMD sph objects are best displayed as “Points”. Displaying .sph objects rather than rendered or dot surfaces can be useful to analyze the distance of particular atoms from the sphere-centre line.
Most usefully .sph files can be used to produce molecular graphical output from a hole run. This is achieved by using the sph_process program to read the .sph file. [“hole.sph”]
executable
Path to the hole executable (e.g.
/opt/hole/exe/hole
); the other programs sph_process and sos_triangle are assumed to live in the same directory as hole. If hole is found on thePATH
then the bare executable name is sufficient. [“hole”]-
profiles
¶ After running
HOLE.collect()
, this dict contains all the HOLE profiles, indexed by the frame number. If only a single frame was analyzed then this will beHOLE.profiles[0]
. Note that the order is random; one needs to sort the keys first.
-
HOLE_MAX_LENGTH
= 70¶ Maximum number of characters in a filename (limitation of HOLE)
-
check_and_fix_long_filename
(filename, tmpdir='.')[source]¶ Return filename suitable for HOLE.
HOLE is limited to filenames <=
HOLE.HOLE_MAX_LENGTH
. This methodreturns filename if HOLE can process it
returns a relative path (see
os.path.relpath()
) if that shortens the path sufficientlycreates a symlink to filename (
os.symlink()
) in a safe temporary directory and returns the path of the symlink. The temporary directory and the symlink are stored inHOLE.tempfiles
andHOLE.tempdirs
and deleted when theHOLE
instance is deleted or garbage collected.By default the temporary directory is created inside the current directory in order to keep that path name short. This can be changed with the tmpdir keyword (e.g. one can use “/tmp”).
-
collect
(**kwargs)[source]¶ Parse the output from a HOLE run into numpy recarrays.
Can deal with outputs containing multiple frames. Output format:
frame rxncoord radius
The method saves the result as
HOLE.profiles
, a dictionary indexed by the frame number. Each entry is anumpy.recarray
.If the keyword outdir is supplied (e.g. ”.”) then each profile is saved to a gzipped data file.
Keywords: - run
identifier, free form [1]
- outdir
save output data under outdir/run if set to any other value but
None
[None
]
-
create_vmd_surface
(filename='hole.vmd', **kwargs)[source]¶ Process HOLE output to create a smooth pore surface suitable for VMD.
Takes the
sphpdb
file and feeds it to sph_process and sos_triangle as described under Visualization of HOLE results.Load the output file filename into VMD by issuing in the tcl console
source hole.vmd
The level of detail is determined by
HOLE.dotden
(which can be overriden by keyword dotden).The surface will be colored so that parts that are inaccessible to water (pore radius < 1.15 Å) are red, water accessible parts (1.15 Å > pore radius < 2.30 Å) are green and wide areas (pore radius > 2.30 Å are blue).
-
default_ignore_residues
= ['SOL', 'WAT', 'TIP', 'HOH', 'K ', 'NA ', 'CL ']¶ List of residues that are ignore by default. Can be changed with the ignore_residues keyword.
-
min_radius
()¶ Return the minimum radius over all profiles as a function of q
-
plot
(**kwargs)¶ Plot profiles in a 1D graph.
Lines are coloured according to the colour map cmap.
Keywords: - step
only plot every step profile [1]
- cmap
matplotlib color map to continuously color graphs [jet]
- color
color or iterable of colors to specify graph colors;
None
will use cmap instead [None
]- linestyle
linestyle supported by
matplotlib.pyplot.plot()
; an iterable will be applied in turn [‘-‘]- yshift
displace each R(zeta) profile by yshift in the y-direction [0]
- frames
integer or list: only plot these specific frame(s); the default is to plot everything (see also step). [
None
]- label
ignored and replaced with the frame number; if it is set to “_nolegend_” or
False
then no legend is displayed- ax
matplotlib.axes.Axes
instance to plot into [None
]
All other kwargs are passed to
matplotlib.pyplot.plot()
.
-
plot3D
(**kwargs)¶ Stacked graph of profiles.
Lines are coloured according to the colour map cmap.
Keywords: - step
only plot every step profile [1]
- cmap
matplotlib color map [jet]
- rmax
only display radii up to rmax [
None
]- ylabel
label of the reaction coordinate axis [“frames”]
Based on Stack Overflow post 3d plots using matplotlib.
-
save
(filename='hole.pickle')¶ Save
profiles
as a Python pickle file filename.Load profiles dictionary with
import cPickle profiles = cPickle.load(open(filename))
-
sorted_profiles_iter
()¶ Return an iterator over profiles sorted by frame/order parameter q.
The iterator produces tuples
(q, profile)
.
-
class
MDAnalysis.analysis.hole.
HOLEtraj
(universe, **kwargs)[source]¶ Analyze all frames in a trajectory.
The
HOLE
class provides a direct interface to HOLE. HOLE itself has limited support for analysing trajectories but cannot deal with all the trajectory formats understood by MDAnalysis. This class can take any universe and feed it to HOLE. By default it sequentially creates a PDB for each frame and runs HOLE on the frame.Set up the class.
Arguments: - universe
The input trajectory as part of a
Universe
. trajectory is converted to a sequence of PDB files and HOLE is run on each individual file. (Use the start, stop, and step keywords to slice the trajectory.)- orderparameters
Sequence or text file with list of numbers corresponding to the frames in the trajectory.
- start, stop, step
frame indices to slice the trajectory as
universe.trajectory[start, stop, step]
- selection
selection string for
selectAtoms()
to select the group of atoms that is to be analysed by HOLE [“protein”]- cpoint
Point inside the pore.
If set to
True
then cpoint is guessed as thecenterOfGeometry()
of the selection from the first frame of the trajectory.If cpoint is not set or set to
None
then HOLE guesses it with its own algorithm (for each individual frame). [None
]- kwargs
All other keywords are passed on to
HOLE
(see there for description).
-
profiles
¶ After running
HOLE.collect()
, this dict contains all the HOLE profiles, indexed by the frame number or the order parameter (if orderparameters was supplied). Note that the order is random; one needs to sort the keys first.
-
guess_cpoint
(**kwargs)[source]¶ Guess a point inside the pore.
This method simply uses the center of geometry of the protein selection as a guess. selection is “protein” by default.
-
min_radius
()¶ Return the minimum radius over all profiles as a function of q
-
plot
(**kwargs)¶ Plot profiles in a 1D graph.
Lines are coloured according to the colour map cmap.
Keywords: - step
only plot every step profile [1]
- cmap
matplotlib color map to continuously color graphs [jet]
- color
color or iterable of colors to specify graph colors;
None
will use cmap instead [None
]- linestyle
linestyle supported by
matplotlib.pyplot.plot()
; an iterable will be applied in turn [‘-‘]- yshift
displace each R(zeta) profile by yshift in the y-direction [0]
- frames
integer or list: only plot these specific frame(s); the default is to plot everything (see also step). [
None
]- label
ignored and replaced with the frame number; if it is set to “_nolegend_” or
False
then no legend is displayed- ax
matplotlib.axes.Axes
instance to plot into [None
]
All other kwargs are passed to
matplotlib.pyplot.plot()
.
-
plot3D
(**kwargs)¶ Stacked graph of profiles.
Lines are coloured according to the colour map cmap.
Keywords: - step
only plot every step profile [1]
- cmap
matplotlib color map [jet]
- rmax
only display radii up to rmax [
None
]- ylabel
label of the reaction coordinate axis [“frames”]
Based on Stack Overflow post 3d plots using matplotlib.
-
run
(**kwargs)[source]¶ Run HOLE on the whole trajectory and collect profiles.
Keyword arguments start, stop, and step can be used to only analyse part of the trajectory.
-
save
(filename='hole.pickle')¶ Save
profiles
as a Python pickle file filename.Load profiles dictionary with
import cPickle profiles = cPickle.load(open(filename))
-
sorted_profiles_iter
()¶ Return an iterator over profiles sorted by frame/order parameter q.
The iterator produces tuples
(q, profile)
.
-
MDAnalysis.analysis.hole.
SIMPLE2_RAD
= '\nremark: Time-stamp: <2005-11-21 13:57:55 oliver> [OB]\nremark: van der Waals radii: AMBER united atom\nremark: from Weiner et al. (1984), JACS, vol 106 pp765-768\nremark: Simple - Only use one value for each element C O H etc.\nremark: van der Waals radii\nremark: general last\nVDWR C??? ??? 1.85\nVDWR O??? ??? 1.65\nVDWR S??? ??? 2.00\nVDWR N??? ??? 1.75\nVDWR H??? ??? 1.00\nVDWR H? ??? 1.00\nVDWR P??? ??? 2.10\nremark: ASN, GLN polar H (odd names for these atoms in xplor)\nVDWR E2? GLN 1.00\nVDWR D2? ASN 1.00\nremark: amber lone pairs on sulphurs\nVDWR LP?? ??? 0.00\nremark: for some funny reason it wants radius for K even though\nremark: it is on the exclude list\nremark: Use Pauling hydration radius (Hille 2001) [OB]\nVDWR K? ??? 1.33\nVDWR NA? ??? 0.95\nVDWR CL? ??? 1.81\nremark: funny hydrogens in gA structure [OB]\nVDWR 1H? ??? 1.00\nVDWR 2H? ??? 1.00\nVDWR 3H? ??? 1.00\nremark: need bond rad for molqpt option\nBOND C??? 0.85\nBOND N??? 0.75\nBOND O??? 0.7\nBOND S??? 1.1\nBOND H??? 0.5\nBOND P??? 1.0\nBOND ???? 0.85\n'¶ Built-in HOLE radii (based on
simple.rad
from the HOLE distribution): van der Waals radii are AMBER united atom from Weiner et al. (1984), JACS, vol 106 pp765-768. Simple - Only use one value for each element C O H etc. Added radii for K+, NA+, CL- (Pauling hydration radius from Hille 2002). The data file can be written with the convenience functionwrite_simplerad2()
.
3.3.1.4. Utilities¶
-
MDAnalysis.analysis.hole.
write_simplerad2
(filename='simple2.rad')[source]¶ Write the built-in radii in
SIMPLE2_RAD
to filename.Does nothing if filename already exists.