3.2.1. Hydrogen Bond analysis — MDAnalysis.analysis.hbonds.hbond_analysis

Author: David Caplan, Lukas Grossar, Oliver Beckstein
Year: 2010-2012
Copyright: GNU Public License v3

Given a Universe (a simulation trajectory with one or more frames), measure all hydrogen bonds between selections 1 and 2 in each frame.

The HydrogenBondAnalysis class is modeled after the VMD HBONDS plugin.

Options:
  • update_selection1, update_selection2 (True): update the selections at each frame?
  • selection1_type (“both”): selection 1 is the “donor”, the “acceptor”, or “both”
  • distance cutoff (Å): 3.0 (hydrogen-acceptor distance by default; see distance_type)
  • Angle cutoff (degrees): 120.0
  • forcefield to switch between default values for different force fields
  • donors and acceptors atom types (to add additional atom names)

3.2.1.1. Output

The results are
  • the identities of donor and acceptor heavy-atoms,
  • the distance between the acceptor heavy atom and the hydrogen atom that is bonded to the donor heavy atom,
  • the donor-hydrogen-acceptor angle (180º is linear).

Hydrogen bond data are stored per frame in HydrogenBondAnalysis.timeseries. (In the following description, # indicates comments that are not part of the output.)

results = [
    [ # frame 1
       [ # hbond 1
          <donor index>, <acceptor index>, <donor string>, <acceptor string>, <distance>, <angle>
       ],
       [ # hbond 2
          <donor index>, <acceptor index>, <donor string>, <acceptor string>, <distance>, <angle>
       ],
       ....
    ],
    [ # frame 2
      [ ... ], [ ... ], ...
    ],
    ...
]

Note

For historic reasons, the donor index and acceptor index are 1-based indices. To get the Atom.index (the 0-based index typically used in MDAnalysis), simply subtract 1. For instance, to find an atom in Universe.atoms by index from the output one would use u.atoms[index-1].
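As a minimal sketch of how the nested layout and the index convention fit together, the following walks over a small hand-written list shaped like timeseries. The sample values, the placeholder strings and the helper name iter_hbonds are illustrative assumptions, not output from a real run.

```python
# Hypothetical data in the documented layout:
# [donor index, acceptor index, donor string, acceptor string, distance, angle]
# The two strings are placeholders; their real format is not shown here.
sample_timeseries = [
    [  # frame 1
        [12, 340, "<donor A>", "<acceptor B>", 2.1, 165.0],
        [57, 901, "<donor C>", "<acceptor D>", 2.8, 131.5],
    ],
    [  # frame 2
        [12, 340, "<donor A>", "<acceptor B>", 2.3, 158.2],
    ],
]


def iter_hbonds(timeseries):
    """Yield (frame number, 0-based donor index, 0-based acceptor index, distance, angle)."""
    for frame_number, frame in enumerate(timeseries):
        for donor_idx, acceptor_idx, _dstr, _astr, distance, angle in frame:
            # Stored indices are 1-based; subtract 1 before indexing Universe.atoms.
            yield frame_number, donor_idx - 1, acceptor_idx - 1, distance, angle


records = list(iter_hbonds(sample_timeseries))
print(records[0])
```

With a real Universe u, the converted indices could then be used directly as u.atoms[i].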

Using the HydrogenBondAnalysis.generate_table() method one can reformat the results as a flat “normalised” table that is easier to import into a database for further processing. HydrogenBondAnalysis.save_table() saves the table to a pickled file. The table itself is a numpy.recarray.

3.2.1.2. Detection of hydrogen bonds

Hydrogen bonds are recorded based on a geometric criterion:

  1. The distance between acceptor and hydrogen is less than or equal to distance (default is 3 Å).
  2. The angle between donor-hydrogen-acceptor is greater than or equal to angle (default is 120º).

The cut-off values angle and distance can be set as keywords to HydrogenBondAnalysis.
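The two criteria can be illustrated with plain coordinate tuples. This is only a sketch of the geometric test using the default cut-offs; the function names hbond_geometry and is_hbond are hypothetical and this is not the code used by HydrogenBondAnalysis.

```python
import math


def _norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def hbond_geometry(donor, hydrogen, acceptor):
    """Return (H-A distance, D-H-A angle in degrees) for three (x, y, z) points."""
    # Vectors from the hydrogen (the apex of the angle) to donor and acceptor.
    hd = [d - h for d, h in zip(donor, hydrogen)]
    ha = [a - h for a, h in zip(acceptor, hydrogen)]
    cos_theta = sum(p * q for p, q in zip(hd, ha)) / (_norm(hd) * _norm(ha))
    # Clamp to [-1, 1] to protect acos against rounding errors.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
    return _norm(ha), theta


def is_hbond(donor, hydrogen, acceptor, distance=3.0, angle=120.0):
    """Apply both criteria: H-A distance <= distance and D-H-A angle >= angle."""
    h_a_distance, d_h_a_angle = hbond_geometry(donor, hydrogen, acceptor)
    return h_a_distance <= distance and d_h_a_angle >= angle


# A perfectly linear arrangement with an H-A distance of 2 Å.
print(is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)))
```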

Donor and acceptor heavy atoms are detected from atom names. The current defaults are appropriate for the CHARMM27 and GLYCAM06 force fields as defined in Table Default atom names for hydrogen bonding analysis.

Hydrogen atoms bonded to a donor are searched with one of two algorithms, selected with the detect_hydrogens keyword.

distance

Searches for all hydrogens (name “H*” or name “[123]H” or type “H”) in the same residue as the donor atom within a cut-off distance of 1.2 Å.

heuristic

Looks at the next three atoms in the list of atoms following the donor and selects any atom whose name matches (name “H*” or name “[123]H”).

The distance search is more rigorous but slower and is set as the default. Until release 0.7.6, only the heuristic search was implemented.
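The idea behind the heuristic scan can be sketched on a plain list of atom names. The name patterns are the ones quoted above; the helper names, the example residue and the use of fnmatch are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Name patterns quoted in the text for the heuristic search.
HYDROGEN_PATTERNS = ("H*", "[123]H")


def looks_like_hydrogen(name):
    """True if the atom name matches one of the hydrogen name patterns."""
    return any(fnmatchcase(name, pattern) for pattern in HYDROGEN_PATTERNS)


def hydrogens_after(atom_names, donor_position, window=3):
    """Scan the `window` atoms that follow the donor and keep hydrogen-like names."""
    following = atom_names[donor_position + 1:donor_position + 1 + window]
    return [name for name in following if looks_like_hydrogen(name)]


# Hypothetical atom ordering within one residue.
names = ["N", "HN", "CA", "HA", "CB", "HB1"]
print(hydrogens_after(names, names.index("N")))
```

In this made-up ordering the scan for N also picks up HA, which is bonded to CA. A name scan alone cannot tell the two apart, which is why a distance check between hydrogen and donor is the more rigorous test.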

Changed in version 0.7.6: Distance search added (see HydrogenBondAnalysis._get_bonded_hydrogens_dist()) and heuristic search improved (HydrogenBondAnalysis._get_bonded_hydrogens_list())

Default heavy atom names for CHARMM27 force field.

group       donor           acceptor    comments
----------  --------------  ----------  --------------------------------------
main chain  N               O
water       OH2, OW         OH2, OW     SPC, TIP3P, TIP4P (CHARMM27, Gromacs)
ARG         NE, NH1, NH2
ASN         ND2             OD1
ASP                         OD1, OD2
CYS         SG
CYH                         SG          possible false positives for CYS
GLN         NE2             OE1
GLU                         OE1, OE2
HIS         ND1, NE2        ND1, NE2    presence of H determines if donor
HSD         ND1             NE2
HSE         NE2             ND1
HSP         ND1, NE2
LYS         NZ
MET                         SD          see e.g. [Gregoret1991]
SER         OG              OG
THR         OG1             OG1
TRP         NE1
TYR         OH              OH

Heavy atom types for GLYCAM06 force field.

element  donor      acceptor
-------  ---------  ----------------------
N        N, NT, N3  N, NT
O        OH, OW     O, O2, OH, OS, OW, OY
S                   SM

Donor and acceptor names for the CHARMM27 force field will also work for e.g. OPLS/AA (tested in Gromacs). Residue names in the table are for information only and are not taken into account when determining acceptors and donors. This can potentially lead to some ambiguity in the assignment of donors/acceptors for residues such as histidine or cytosine.

For more information about the naming convention in GLYCAM06 have a look at the Carbohydrate Naming Convention in Glycam.

The lists of donor and acceptor names can be extended by providing lists of atom names in the donors and acceptors keywords to HydrogenBondAnalysis. If the lists are entirely inappropriate (e.g. when analysing simulations done with a force field that uses very different atom names) then one should either use the value “other” for forcefield to set no default values, or derive a new class and set the default list oneself:

class HydrogenBondAnalysis_OtherFF(HydrogenBondAnalysis):
      DEFAULT_DONORS = {"OtherFF": tuple(set([...]))}
      DEFAULT_ACCEPTORS = {"OtherFF": tuple(set([...]))}

Then simply use the new class instead of the parent class and call it with forcefield = “OtherFF”. Please also consider contributing the list of heavy atom names to MDAnalysis.

References

[Gregoret1991] L.M. Gregoret, S.D. Rader, R.J. Fletterick, and F.E. Cohen. Hydrogen bonds involving sulfur atoms in proteins. Proteins, 9(2):99–107, 1991. doi:10.1002/prot.340090204.

3.2.1.3. Example

All protein-water hydrogen bonds can be analysed with

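import MDAnalysis
import MDAnalysis.analysis.hbonds

# PSF and PDB stand for the paths to the topology and coordinate files
u = MDAnalysis.Universe(PSF, PDB, permissive=True)
# selection 1 is the protein, selection 2 is the water
h = MDAnalysis.analysis.hbonds.HydrogenBondAnalysis(u, 'protein', 'resname TIP3', distance=3.0, angle=120.0)
h.run()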

The results are stored as the attribute HydrogenBondAnalysis.timeseries; see Output for the format and further options.

Note

Due to the way HydrogenBondAnalysis is implemented, it is more efficient to have the second selection (selection2) be the larger group, e.g. the water when looking at water-protein H-bonds or the whole protein when looking at ligand-protein interactions.

3.2.1.4. Classes

class MDAnalysis.analysis.hbonds.hbond_analysis.HydrogenBondAnalysis(universe, selection1='protein', selection2='all', selection1_type='both', update_selection1=True, update_selection2=True, filter_first=True, distance_type='hydrogen', distance=3.0, angle=120.0, forcefield='CHARMM27', donors=None, acceptors=None, start=None, stop=None, step=None, verbose=False, detect_hydrogens='distance')

Perform a hydrogen bond analysis

The analysis of the trajectory is performed with the HydrogenBondAnalysis.run() method. The result is stored in HydrogenBondAnalysis.timeseries. See timeseries for the format.

The default atom names are taken from the CHARMM 27 force field files, which will also work for e.g. OPLS/AA in Gromacs, and GLYCAM06.

Donors (associated hydrogens are deduced from topology)
  • CHARMM 27: N of the main chain, water OH2/OW, ARG NE/NH1/NH2, ASN ND2, HIS ND1/NE2, SER OG, TYR OH, CYS SG, THR OG1, GLN NE2, LYS NZ, TRP NE1
  • GLYCAM06: N, NT, N3, OH, OW

Acceptors
  • CHARMM 27: O of the main chain, water OH2/OW, ASN OD1, ASP OD1/OD2, CYH SG, GLN OE1, GLU OE1/OE2, HIS ND1/NE2, MET SD, SER OG, THR OG1, TYR OH
  • GLYCAM06: N, NT, O, O2, OH, OS, OW, OY, P, S, SM

Changed in version 0.7.6: DEFAULT_DONORS/ACCEPTORS is now embedded in a dict to switch between default values for different force fields.

Set up calculation of hydrogen bonds between two selections in a universe.

The timeseries is accessible as the attribute HydrogenBondAnalysis.timeseries.

Some initial checks are performed. If there are no atoms selected by selection1 or selection2 or if no donor hydrogens or acceptor atoms are found then a SelectionError is raised for any selection that does not update (update_selection1 and update_selection2 keywords). For selections that are set to update, only a warning is logged because it is assumed that the selection might contain atoms at a later frame (e.g. for distance based selections).

If no hydrogen bonds are detected or if the initial check fails, look at the log output (enable with MDAnalysis.start_logging() and set verbose = True). It is likely that the default names for donors and acceptors are not suitable (especially for non-standard ligands). In this case, either change the forcefield or use customized donors and/or acceptors.

Note

In order to speed up processing, atoms are filtered by a coarse distance criterion before a detailed hydrogen bonding analysis is performed (filter_first = True). If one of your selections is e.g. the solvent then update_selection1 (or update_selection2) must also be True so that the list of candidate atoms is updated at each step: this is now the default.

If your selections will essentially remain the same for all time steps (i.e. residues are not moving farther than 3 x distance), for instance, if no water or large conformational changes are involved or if the optimization is disabled (filter_first = False) then you can improve performance by setting the update_selection keywords to False.
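The coarse pre-filter can be pictured as follows. This is a brute-force sketch on coordinate tuples that assumes the filter keeps atoms of the second group lying within 3 x distance of any atom of the first group; the function name coarse_filter is hypothetical and the search actually used by the class is not described here.

```python
import math


def coarse_filter(group1, group2, distance=3.0, factor=3.0):
    """Keep the points of group2 that lie within factor*distance of any point in group1."""
    cutoff = factor * distance
    kept = []
    for q in group2:
        # Brute-force check against every point of group1.
        if any(math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) <= cutoff
               for p in group1):
            kept.append(q)
    return kept


# Hypothetical coordinates in Å: one solute atom and four solvent atoms.
solute = [(0.0, 0.0, 0.0)]
solvent = [(1.0, 0.0, 0.0), (8.9, 0.0, 0.0), (9.1, 0.0, 0.0), (20.0, 0.0, 0.0)]
candidates = coarse_filter(solute, solvent)
print(candidates)
```

Only the candidates that survive this step need the more expensive angle calculation, which is why the candidate list has to be refreshed whenever atoms can move in or out of range.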

Arguments:
universe

Universe object

selection1

Selection string for first selection [‘protein’]

selection2

Selection string for second selection [‘all’]

selection1_type

Selection 1 can be ‘donor’, ‘acceptor’ or ‘both’. Note that the value for selection1_type automatically determines how selection2 handles donors and acceptors: If selection1 contains ‘both’ then selection2 will also contain both. If selection1 is set to ‘donor’ then selection2 is ‘acceptor’ (and vice versa). [‘both’].

update_selection1

Update selection 1 at each frame? [True]

update_selection2

Update selection 2 at each frame? [True]

filter_first

Filter selection 2 first, keeping only atoms within 3*distance [True]

distance

Distance cutoff for hydrogen bonds; only interactions with a H-A distance <= distance (and the appropriate D-H-A angle, see angle) are recorded. (Note: distance_type can change this to the D-A distance.) [3.0]

angle

Angle cutoff for hydrogen bonds; an ideal H-bond has an angle of 180º. A hydrogen bond is only recorded if the D-H-A angle is >= angle. The default of 120º also finds fairly non-specific hydrogen interactions and a possibly better value is 150º. [120.0]

forcefield

Name of the forcefield used. Switches between different DEFAULT_DONORS and DEFAULT_ACCEPTORS values. Available values: “CHARMM27”, “GLYCAM06”, “other” [“CHARMM27”]

donors

Extra H donor atom types (in addition to those in DEFAULT_DONORS), must be a sequence.

acceptors

Extra H acceptor atom types (in addition to those in DEFAULT_ACCEPTORS), must be a sequence.

start

starting frame-index for analysis, None is the first one, 0. start and stop are 0-based frame indices and are used to slice the trajectory (if supported) [None]

stop

last trajectory frame for analysis, None is the last one [None]

step

read every step between start and stop, None selects 1. Note that not all trajectory readers perform well with a step different from 1 [None]

verbose

If set to True enables per-frame debug logging. This is disabled by default because it generates a very large amount of output in the log file. (Note that a logger must have been started to see the output, e.g. using MDAnalysis.start_logging().)

detect_hydrogens

Determine the algorithm to find hydrogens connected to donor atoms. Can be “distance” (default; finds all hydrogens in the donor’s residue within a cutoff of the donor) or “heuristic” (looks for the next few atoms in the atom list). “distance” should always give the correct answer but “heuristic” is faster, especially when the donor list is updated for each frame. [“distance”]

distance_type

Measure hydrogen bond lengths between donor and acceptor heavy atoms (“heavy”) or between donor hydrogen and acceptor heavy atom (“hydrogen”). If using “heavy” then one should set the distance cutoff to a higher value such as 3.5 Å. [“hydrogen”]

Raises:

SelectionError is raised for each static selection without the required donors and/or acceptors.

Changed in version 0.7.6: New verbose keyword (and per-frame debug logging disabled by default).

New detect_hydrogens keyword to switch between two different algorithms to detect hydrogens bonded to donor. “distance” is a new, rigorous distance search within the residue of the donor atom, “heuristic” is the previous list scan (improved with an additional distance check).

New forcefield keyword to switch between different values of DEFAULT_DONORS/ACCEPTORS to accommodate different force fields. Also has an option “other” for no default values.

Changed in version 0.8: The new default for update_selection1 and update_selection2 is now True (see Issue 138). Set to False if your selections only need to be determined once (will increase performance).

Changed in version 0.9.0: New keyword distance_type to select between calculation between heavy atoms or hydrogen-acceptor. It defaults to the previous behavior (i.e. “hydrogen”).

Changed in version 0.11.0: Initial checks for selections that potentially raise SelectionError.

timesteps

List of the times of each timestep. This can be used together with timeseries to find the specific time point of a hydrogen bond existence, or see table.

timeseries

Results of the hydrogen bond analysis, stored for each frame. In the following description, # indicates comments that are not part of the output:

results = [
    [ # frame 1
       [ # hbond 1
          <donor index>, <acceptor index>, <donor string>, <acceptor string>, <distance>, <angle>
       ],
       [ # hbond 2
          <donor index>, <acceptor index>, <donor string>, <acceptor string>, <distance>, <angle>
       ],
       ....
    ],
    [ # frame 2
      [ ... ], [ ... ], ...
    ],
    ...
]

The time of each step is not stored with each hydrogen bond frame but in timesteps.

Note

The index is a 1-based index. To get the Atom.index (the 0-based index typically used in MDAnalysis), simply subtract 1. For instance, to find an atom in Universe.atoms by index one would use u.atoms[index-1].

table

A normalised table of the data in HydrogenBondAnalysis.timeseries, generated by HydrogenBondAnalysis.generate_table(). It is a numpy.recarray with the following columns:

  1. “time”
  2. “donor_idx”
  3. “acceptor_idx”
  4. “donor_resnm”
  5. “donor_resid”
  6. “donor_atom”
  7. “acceptor_resnm”
  8. “acceptor_resid”
  9. “acceptor_atom”
  10. “distance”
  11. “angle”

It takes up more space than timeseries but it is easier to analyze and to import into databases (e.g. using recsql).

Note

The index is a 1-based index. To get the Atom.index (the 0-based index typically used in MDAnalysis), simply subtract 1. For instance, to find an atom in Universe.atoms by index one would use u.atoms[index-1].
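The relation between timeseries, timesteps and the flat table can be sketched with plain Python containers. The sample data and the names Row and flatten are assumptions; the residue and atom name columns are left out because they have to be looked up in the Universe, and the real table is a numpy.recarray rather than a list of tuples.

```python
from collections import namedtuple

# A reduced set of the documented columns; residue/atom columns are omitted.
Row = namedtuple("Row", "time donor_idx acceptor_idx distance angle")


def flatten(timesteps, timeseries):
    """Turn the per-frame nested lists into one row per hydrogen bond."""
    rows = []
    for time, frame in zip(timesteps, timeseries):
        for donor_idx, acceptor_idx, _dstr, _astr, distance, angle in frame:
            # The time comes from timesteps; it is not stored with each bond.
            rows.append(Row(time, donor_idx, acceptor_idx, distance, angle))
    return rows


# Hypothetical data: two frames at 0.0 and 1.0 (time units as in the trajectory).
sample_timesteps = [0.0, 1.0]
sample_timeseries = [
    [[12, 340, "<donor A>", "<acceptor B>", 2.1, 165.0]],
    [[12, 340, "<donor A>", "<acceptor B>", 2.3, 158.2],
     [57, 901, "<donor C>", "<acceptor D>", 2.8, 131.5]],
]
rows = flatten(sample_timesteps, sample_timeseries)
print(rows[0])
```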

_get_bonded_hydrogens(atom, **kwargs)

Find hydrogens bonded to atom.

This method is typically not called by a user but it is documented to facilitate understanding of the internals of HydrogenBondAnalysis.

Returns: list of hydrogens (can be an AtomGroup) or empty list [] if none were found.

Changed in version 0.7.6: Can switch algorithm by using the detect_hydrogens keyword to the constructor. kwargs can be used to supply arguments for algorithm.

_get_bonded_hydrogens_dist(atom)

Find hydrogens bonded within cutoff to atom.

  • hydrogens are detected by either name (“H*”, “[123]H*”) or type (“H”); this is not fool-proof as the atom type is not always a character but the name pattern should catch most typical occurrences.
  • The distance from atom is calculated for all hydrogens in the residue and only those within a cutoff are kept. The cutoff depends on the heavy atom (more precisely, on its element, which is taken as the first letter of its name atom.name[0]) and is parameterized in HydrogenBondAnalysis.r_cov. If no match is found then the default of 1.5 Å is used.

The performance of this implementation could be improved once the topology always contains bonded information; it currently uses the selection parser with an “around” selection.
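The element-dependent cutoff can be sketched with a defaultdict that mirrors the values listed for HydrogenBondAnalysis.r_cov. The helper bonded_hydrogens and the coordinates are hypothetical, and the sketch uses plain tuples instead of the selection parser.

```python
import math
from collections import defaultdict

# Values as listed for HydrogenBondAnalysis.r_cov; 1.5 Å is the documented fallback.
r_cov = defaultdict(lambda: 1.5, {"N": 1.31, "O": 1.31, "P": 1.58, "S": 1.55})


def bonded_hydrogens(donor_name, donor_pos, residue_hydrogens):
    """Return the names of hydrogens within the element-specific cutoff of the donor.

    residue_hydrogens is a list of (name, (x, y, z)) for the hydrogens found
    in the donor's residue.
    """
    cutoff = r_cov[donor_name[0]]  # element taken as the first letter of the name
    return [
        name for name, pos in residue_hydrogens
        if math.sqrt(sum((a - b) ** 2 for a, b in zip(donor_pos, pos))) <= cutoff
    ]


# Hypothetical lysine fragment: HZ1 sits on NZ, HE1 belongs to a neighbouring carbon.
hydrogens = [("HZ1", (1.0, 0.0, 0.0)), ("HE1", (2.1, 0.0, 0.0))]
print(bonded_hydrogens("NZ", (0.0, 0.0, 0.0), hydrogens))
```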

New in version 0.7.6.

_get_bonded_hydrogens_list(atom, **kwargs)

Find “bonded” hydrogens to the donor atom.

At the moment this relies on the assumption that the hydrogens are listed directly after the heavy atom in the topology. If this is not the case then this function will fail.

Hydrogens are detected by name H*, [123]H* and they have to be within a maximum distance from the heavy atom. The cutoff distance depends on the heavy atom and is parameterized in HydrogenBondAnalysis.r_cov.

Changed in version 0.7.6: Added detection of [123]H and additional check that a selected hydrogen is bonded to the donor atom (i.e. its distance to the donor is less than the covalent radius stored in HydrogenBondAnalysis.r_cov or the default 1.5 Å).

Changed name to _get_bonded_hydrogens_list() and added kwargs so that it can be used instead of _get_bonded_hydrogens_dist().

DEFAULT_ACCEPTORS = {'GLYCAM06': ('SM', 'OH', 'NT', 'O', 'N', 'OY', 'OW', 'OS', 'O2'), 'other': (), 'CHARMM27': ('OH', 'OG', 'OD1', 'OD2', 'OG1', 'O', 'ND1', 'NE2', 'OE2', 'OW', 'SG', 'OE1', 'OH2', 'SD')}

default atom names that are treated as hydrogen acceptors (see Default heavy atom names for CHARMM27 force field.) Use the keyword acceptors to add a list of additional acceptor names.

DEFAULT_DONORS = {'GLYCAM06': ('OW', 'N3', 'NT', 'OH', 'N'), 'other': (), 'CHARMM27': ('OH', 'OG', 'NE2', 'OG1', 'NE', 'N', 'ND1', 'NZ', 'NH1', 'NH2', 'OW', 'ND2', 'SG', 'OH2', 'NE1')}

Use the keyword donors to add a list of additional donor names.
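How the forcefield and donors keywords interact can be pictured as a lookup followed by a union. The tuples below are copied from DEFAULT_DONORS; the function donor_names and the atom name "NX" are hypothetical, and the sketch assumes a plain set union, which may differ from what the class does internally.

```python
# Copied from the DEFAULT_DONORS attribute shown above.
DEFAULT_DONORS = {
    "GLYCAM06": ("OW", "N3", "NT", "OH", "N"),
    "other": (),
    "CHARMM27": ("OH", "OG", "NE2", "OG1", "NE", "N", "ND1", "NZ", "NH1",
                 "NH2", "OW", "ND2", "SG", "OH2", "NE1"),
}


def donor_names(forcefield="CHARMM27", donors=None):
    """Combine the force field defaults with user-supplied extra donor names."""
    extra = donors if donors is not None else ()
    return set(DEFAULT_DONORS[forcefield]) | set(extra)


# "NX" is a made-up ligand atom name added on top of the CHARMM27 defaults.
with_ligand = donor_names("CHARMM27", donors=["NX"])
# With "other" there are no defaults, so only the supplied names remain.
custom_only = donor_names("other", donors=("O1", "N2"))
print(sorted(custom_only))
```

The same pattern applies to DEFAULT_ACCEPTORS and the acceptors keyword.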

calc_angle(d, h, a)

Calculate the angle (in degrees) between donor d, hydrogen h and acceptor a, with the hydrogen at the apex.

calc_eucl_distance(a1, a2)

Calculate the Euclidean distance between two atoms.

count_by_time()

Counts the number of hydrogen bonds per timestep.

Returns: a numpy.recarray
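
The counting itself amounts to pairing each entry of timesteps with the length of the corresponding frame in timeseries. A minimal sketch with made-up data follows; the helper name count_per_timestep is hypothetical and it returns a plain list instead of a numpy.recarray.

```python
def count_per_timestep(timesteps, timeseries):
    """Pair each time with the number of hydrogen bonds found in that frame."""
    return [(time, len(frame)) for time, frame in zip(timesteps, timeseries)]


# Hypothetical data: three frames, the second without any hydrogen bond.
sample_timesteps = [0.0, 1.0, 2.0]
sample_timeseries = [
    [[12, 340, "<donor A>", "<acceptor B>", 2.1, 165.0],
     [57, 901, "<donor C>", "<acceptor D>", 2.8, 131.5]],
    [],
    [[12, 340, "<donor A>", "<acceptor B>", 2.3, 158.2]],
]
counts = count_per_timestep(sample_timesteps, sample_timeseries)
print(counts)
```
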
count_by_type()

Counts the frequency of hydrogen bonds of a specific type.

Processes HydrogenBondAnalysis.timeseries and returns a numpy.recarray containing atom indices, residue names, residue numbers (for donors and acceptors) and the fraction of the total time during which the hydrogen bond was detected.

Returns: a numpy.recarray
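
The fraction of time can be illustrated by counting in how many frames a given donor-acceptor pair appears. The sketch below keys each bond on the pair of atom indices only, which is an assumption; the helper name fraction_by_pair and the data are made up, and the real method additionally reports residue names and numbers.

```python
from collections import Counter


def fraction_by_pair(timeseries):
    """Map (donor index, acceptor index) to the fraction of frames containing it."""
    frames_with_pair = Counter()
    for frame in timeseries:
        # A set ensures a pair is counted at most once per frame.
        frames_with_pair.update({(hbond[0], hbond[1]) for hbond in frame})
    n_frames = len(timeseries)
    return {pair: n / n_frames for pair, n in frames_with_pair.items()}


# Hypothetical data: pair (12, 340) occurs in both frames, (57, 901) in one.
sample_timeseries = [
    [[12, 340, "<donor A>", "<acceptor B>", 2.1, 165.0],
     [57, 901, "<donor C>", "<acceptor D>", 2.8, 131.5]],
    [[12, 340, "<donor A>", "<acceptor B>", 2.3, 158.2]],
]
fractions = fraction_by_pair(sample_timeseries)
print(fractions[(12, 340)], fractions[(57, 901)])
```
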
generate_table()

Generate a normalised table of the results.

The table is stored as a numpy.recarray in the attribute table and can be used with e.g. recsql.

Columns:
  1. “time”
  2. “donor_idx”
  3. “acceptor_idx”
  4. “donor_resnm”
  5. “donor_resid”
  6. “donor_atom”
  7. “acceptor_resnm”
  8. “acceptor_resid”
  9. “acceptor_atom”
  10. “distance”
  11. “angle”
r_cov = defaultdict(lambda: 1.5, {'P': 1.58, 'S': 1.55, 'O': 1.31, 'N': 1.31})

A collections.defaultdict of covalent radii of common donors (used in _get_bonded_hydrogens_list() to check if a hydrogen is sufficiently close to its donor heavy atom). Values are stored for N, O, P, and S. Any other heavy atoms are assumed to have hydrogens covalently bound at a maximum distance of 1.5 Å.

run(**kwargs)

Analyze trajectory and produce timeseries.

Stores the hydrogen bond data per frame as HydrogenBondAnalysis.timeseries (see there for output format).

The method accepts a number of keywords, amongst them quiet (default False), which silences the progress output (see ProgressMeter), and verbose (which can be used to change the value provided with the class constructor).

See also

HydrogenBondAnalysis.generate_table() for processing the data into a different format.

Changed in version 0.7.6: Results are not returned, only stored in timeseries and duplicate hydrogen bonds are removed from output (can be suppressed with remove_duplicates = False)

Changed in version 0.11.0: Accept quiet keyword. Analysis will now proceed through frames even if no donors or acceptors were found in a particular frame.

save_table(filename='hbond_table.pickle')

Saves table to a pickled file.

Load with

import cPickle
with open(filename, 'rb') as f:  # binary mode; the file is closed after loading
    table = cPickle.load(f)

See also

cPickle module and numpy.recarray

timesteps_by_type()

Frames during which each hydrogen bond existed, sorted by hydrogen bond.

Processes HydrogenBondAnalysis.timeseries and returns a numpy.recarray containing atom indices, residue names, residue numbers (for donors and acceptors) and a list of timesteps at which the hydrogen bond was detected.

Returns: a numpy.recarray