5.1. Coordinate/Trajectory Readers and Writers — MDAnalysis.coordinates
¶
The coordinates submodule contains code to read coordinates, either
single frames (e.g. the PDB module) or trajectories (such as the DCD
reader). All readers are supposed to expose a Reader
object
that presents a common Trajectory API to other code.
The Universe
contains the API
entry point attribute
Universe.trajectory
that points to the actual Reader
object; all Readers are supposed to be accessible through this entry
point in the same manner (“duck typing”).
In order to write coordinates, a factory function is provided
(MDAnalysis.coordinates.core.writer()
) which is made available
as MDAnalysis.Writer()
) that returns a Writer appropriate for
the desired file format (as indicated by the filename
suffix). Furthermore, a trajectory
Reader
can also have a method
Writer()
that returns an
appropriate Writer
for the file
format of the trajectory.
In analogy to MDAnalysis.coordinates.core.writer()
, there is
also a MDAnalysis.coordinates.core.reader()
function available
to return a trajectory Reader
instance although this is often not needed because the
Universe
class can choose an
appropriate reader automatically.
5.1.1. Using Readers¶
Normally, one does not explicitly need to select a reader. This is handled
automatically when creating a Universe
and
the appropriate reader for the file type is selected (typically by the file
extension but this choice can be overriden with the format
argument to
Universe
).
5.1.2. Using Writers¶
A typical approach is to generate a new trajectory from an old one, e.g. to only keep the protein:
u = MDAnalysis.Universe(PDB, XTC)
protein = u.selectAtoms("protein")
with MDAnalysis.Writer("protein.xtc", protein.numberOfAtoms()) as W:
for ts in u.trajectory:
W.write(protein)
Using the with()
statement will automatically close the trajectory when
the last frame has been written.
5.1.3. Supported coordinate formats¶
The table below lists the coordinate file formats understood by MDAnalysis. The
emphasis is on formats that are used in popular molecular dynamics codes. By
default, MDAnalysis figures out formats by looking at the extension. Thus, a
DCD file always has to end with ”.dcd” to be recognized as such unless the
format is explicitly specified with the format keyword to
Universe
or
load_new()
. A number of files are
also recognized when they are compressed with gzip or
bzip2 such as ”.xyz.bz2”.
Name | extension | IO | remarks |
---|---|---|---|
CHARMM, NAMD | dcd | r/w | standard CHARMM binary trajectory; endianness is
autodetected. Fixed atoms may not be handled
correctly (requires testing). Module
MDAnalysis.coordinates.DCD |
LAMMPS | dcd | r/w | CHARMM-style binary trajectory; endianness is
autodetected. Units are appropriate for LAMMPS.
Module MDAnalysis.coordinates.LAMMPS |
LAMMPS [1] | data | r | Single frame of coordinates read from .data files |
Gromacs | xtc | r/w | Compressed (lossy) xtc trajectory format. Module
MDAnalysis.coordinates.XTC |
Gromacs | trr | r/w | Full precision trr trajectory. Coordinates and
velocities are processed. Module
MDAnalysis.coordinates.TRR |
XYZ [1] | xyz | r/w | Generic white-space separate XYZ format; can be
compressed (gzip or bzip2). Module
MDAnalysis.coordinates.XYZ |
GAMESS | gms, log, out | r | Generic semi-formatted GAMESS output log; can be
compressed (gzip or bzip2). Module
MDAnalysis.coordinates.GMS |
AMBER | trj, mdcrd | r | formatted (ASCII) trajectories; the presence of a
periodic box is autodetected (experimental).
Module MDAnalysis.coordinates.TRJ |
AMBER | ncdf | r/w | binary (NetCDF) trajectories are fully supported with
optional netcdf4-python module (coordinates and
velocities). Module MDAnalysis.coordinates.TRJ |
Brookhaven [1] | pdb | r/w | a simplified PDB format (as used in MD simulations)
is read by default; the full format can be read by
supplying the permissive=False flag to
MDAnalysis.Universe . Multiple frames (MODEL)
are supported but require the multiframe keyword.
Module MDAnalysis.coordinates.PDB |
XPDB | pdb | r | Extended PDB format (can use 5-digit residue
numbers). To use, specify the format “XPBD”
explicitly: Universe(..., format="XPDB") .
Module MDAnalysis.coordinates.PDB |
PDBQT [1] | pdbqt | r/w | file format used by AutoDock with atom types t
and partial charges q. Module:
MDAnalysis.coordinates.PDBQT |
PQR [1] | pqr | r/w | PDB-like but whitespace-separated files with charge
and radius information. Module
MDAnalysis.coordinates.PQR |
GROMOS96 [1] | gro | r/w | basic GROMOS96 format (velocities as well). Module
MDAnalysis.coordinates.GRO |
CHARMM CARD [1] | crd | r/w | “CARD” coordinate output from CHARMM; deals with
either standard or EXTended format. Module
MDAnalysis.coordinates.CRD |
DESRES [1] | dms | r | DESRES Molecular Structure file format reader.
Module MDAnalysis.coordinates.DMS |
IBIsCO/YASP | trz | r/w | Binary IBIsCO or YASP trajectories Module
MDAnalysis.coordinates.TRZ |
MOL2 | mol2 | r/w | Text-based Tripos molecular structure format
MDAnalysis.coordinates.MOL2 |
[1] | (1, 2, 3, 4, 5, 6, 7, 8) This format can also be used to provide basic topology
information (i.e. the list of atoms); it is possible to create a
full Universe by simply
providing a file of this format: u = Universe(filename) |
5.1.4. Trajectory API¶
The Trajectory API defines how classes have to be structured that allow reading and writing of coordinate files. By following the API it is possible to seamlessly enhance the I/O capabilities of MDAnalysis. The actual underlying I/O code can be written in C or python or a mixture thereof.
Typically, each format resides in its own module, named by the format specifier (and using upper case by convention).
Reader and Writer classes are derived from base classes in
MDAnalysis.coordinates.base
.
5.1.4.1. History¶
- 2010-04-30 Draft [orbeckst]
- 2010-08-20 added single frame writers to API [orbeckst]
- 2010-10-09 added write() method to Writers [orbeckst]
- 2010-10-19 use close() instead of close_trajectory() [orbeckst]
- 2010-10-30 clarified Writer write() methods (see also Issue 49)
- 2011-02-01 extended call signatur of Reader class
- 2011-03-30 optional Writer() method for Readers
- 2011-04-18 added time and frame managed attributes to Reader
- 2011-04-20 added volume to Timestep
- 2012-02-11 added _velocities to Timestep
- 2012-05-24 multiframe keyword to distinguish trajectory from single frame writers
- 2012-06-04 missing implementations of Reader.__getitem__ should raise
TypeError
- 2013-08-02 Readers/Writers must conform to the Python Context Manager API
- 2015-01-15 Timestep._init_unitcell() method added
5.1.4.2. Registry¶
In various places, MDAnalysis tries to automatically select appropriate formats (e.g. by looking at file extensions). In order to allow it to choose the correct format, all I/O classes must be registered in one of three dictionaries with the format (typically the file extension in upper case):
- Trajectory reader classes must be added to
MDAnalysis.coordinates._trajectory_readers
. - Trajectory writer classes must be added to
MDAnalysis.coordinates._trajectory_writers
. - Single-frame writer classes must be added to to
MDAnalysis.coordinates._frame_writers
.
5.1.4.3. Timestep class¶
A Timestep instance holds data for the current frame. It is updated whenever a new frame of the trajectory is read.
Timestep classes are derived from
MDAnalysis.coordinates.base.Timestep
, which is the primary
implementation example (and used directly for the DCDReader).
5.1.4.3.1. Methods¶
__init__(arg)
depending on the type of arg, Timestep instances are created in different ways:
int
- empty Timestep for arg atoms (allocate arrays etc)
Timestep
- makes a deep copy of the arg
numpy.ndarray
- update the Timesteps positions array with the contents of arg
Anything else raises an exception; in particular it is not possible to create a “empty”
Timestep
instance.__getitem__(atoms)
- position(s) of atoms; can be a slice or numpy array and then returns coordinate array
__len__()
- number of coordinates (atoms) in the frame
__iter__()
- iterator over all coordinates
copy()
- deep copy of the instance
_init_unitcell
- hook that returns empty data structure for the unitcell representation of this particular file format; called from within
__init__()
to initializeTimestep._unitcell
.
5.1.4.3.2. Attributes¶
numatoms
- number of atoms in the frame
frame
- current frame number (1-based)
dimensions
system box dimensions (x, y, z, alpha, beta, gamma) (typically implemented as a property because it needs to translate whatever is in the underlying
_unitcell
attribute. Also comes with a setter that takes a MDAnalysis box so that one can doTimestep.dimensions = [A, B, C, alpha, beta, gamma]which then converts automatically to the underlying representation and stores it in
Timestep._unitcell
.volume
- system box volume (derived as the determinant of the box vectors of
dimensions
)
5.1.4.3.3. Private attributes¶
These attributes are set directly by the underlying trajectory
readers. Normally the user should not have to directly access those (although in
some cases it is convenient to directly use _pos
).
_pos
- raw coordinates, a
numpy.float32
array;X = _pos[:,0], Y = _pos[:,1], Z = _pos[:,2]
_unitcell
native unit cell description; the format depends on the underlying trajectory format. A user should use the
dimensions
attribute to access the data in a canonical format instead of accessingTimestep._unitcell
directly.The method
Timestep._init_unitcell()
is a hook to initialize this attribute.
Optional attributes (only implemented by some readers); if an optional
attribute does not exist, a AttributeError
is raised and the calling
code should handle this gracefully.
_velocities
- raw velocities, a
numpy.float32
array containing velocities (similar to_pos
)
5.1.4.4. Trajectory Reader class¶
Trajectory readers are derived from MDAnalysis.coordinates.base.Reader
.
Typically, many methods and attributes are overriden.
5.1.4.4.1. Methods¶
The MDAnalysis.coordinates.DCD.DCDReader
class is the primary
implementation example.
Mandatory methods
The following methods must be implemented in a Reader class.
__init__(filename, **kwargs)
open filename; other kwargs are processed as needed and the Reader is free to ignore them. Typically, MDAnalysis supplies as much information as possible to the Reader in kwargs; at the moment the following data are supplied in keywords when a trajectory is loaded from within
MDAnalysis.Universe
:
- numatoms: the number of atoms (known from the topology)
__iter__()
allow iteration from beginning to end:
for ts in trajectory: print ts.frameclose()
- close the file and cease I/O
__del__()
- ensure that the trajectory is closed
next()
- advance to next time step or raise
IOError
when moving past the last framerewind()
- reposition to first frame
__entry__()
- entry method of a Context Manager (returns self)
__exit__()
- exit method of a Context Manager, should call
close()
.
Optional methods
Not all trajectory formats support the following methods, either because the data are not available or the methods have not been implemented. Code should deal with missing methods gracefully.
__len__()
- number of frames in trajectory
__getitem__(arg)
advance to time step arg = frame and return
Timestep
; or if arg is a slice, then return an iterator over that part of the trajectory.The first functionality allows one to randomly access frames in the trajectory:
universe.trajectory[314]would load frame 314 into the current
Timestep
.Using slices allows iteration over parts of a trajectory
for ts in universe.trajectory[1000:2000]: process_frame(ts) # do somethingor skipping frames
for ts in universe.trajectory[1000::100]: process_frame(ts) # do somethingThe last example starts reading the trajectory at frame 1000 and reads every 100th frame until the end.
The performance of the
__getitem__()
method depends on the underlying trajectory reader and if it can implement random access to frames. In many cases this is not easily (or reliably) implementable and thus one is restricted to sequential iteration.If the Reader is not able to provide random access to frames then it should raise
TypeError
on indexing. It is possible to partially implement__getitem__
(as done onMDAnalysis.coordinates.base.Reader.__getitem__
where slicing the full trajectory is equivalent toMDAnalysis.coordinates.base.Reader.__iter__
(which is always implemented) and other slices raiseTypeError
.Note
__getitem__
uses 0-based indices for frames so that indexing and slicing works exactly as in Python. However, theTimestep.frame
attribute (the “frame number”) is 1-based. Thus, the first frame in a trajectory can be accessed astrajectory[0]
(frame index 0) and the corresponding frame number is 1 (trajectory.ts.frame == 1
).Writer(filename, **kwargs)
returns a
Writer
which is set up with the same parameters as the trajectory that is being read (e.g. time step, length etc), which facilitates copying and simple on-the-fly manipulation.If no Writer is defined then a
NotImplementedError
is raised.The kwargs can be used to customize the Writer as they are typically passed through to the init method of the Writer, with sensible defaults filled in; the actual keyword arguments depend on the Writer.
timeseries(atomGroup, [start[,stop[,skip[,format]]]])
- returns a subset of coordinate data
correl(timeseriesCollection[,start[,stop[,skip]]])
- populate a
MDAnalysis.core.Timeseries.TimeseriesCollection
object with observable timeseries computed from the trajectory
5.1.4.4.2. Attributes¶
filename
- filename of the trajectory
numatoms
- number of atoms (coordinate sets) in a frame (constant)
numframes
- total number of frames (if known) –
None
if not knownfixed
- bool, saying if there are fixed atoms (e.g. dcds)
skip
- step size for iterating through the trajectory [1]
skip_timestep
- number of integrator steps between frames + 1 (i.e. the stride at which the MD simulation was sampled)
delta
- integrator time step (in native units); hence the “length” of a trajctory frame is
skip_timestep*delta
time unitsperiodic
- contains box information for periodic boundary conditions
ts
- the
Timestep
object; typically customized for each trajectory format and derived frombase.Timestep
.units
- dictionary with keys time, length, speed, force and the appropriate unit (e.g. ‘AKMA’ and ‘Angstrom’ for Charmm dcds, ‘ps’ and ‘nm’ for Gromacs trajectories,
None
and ‘Angstrom’ for PDB). Any field not used should be set toNone
.format
- string that identifies the file format, e.g. “DCD”, “PDB”, “CRD”, “XTC”, “TRR”; this is typically the file extension in upper case.
dt
- time between frames in ps; a managed attribute (read only) that computes on the fly
skip_timestep * delta
and converts to the MDAnalysis base unit for time (pico seconds by default)totaltime
- total length of the trajectory =
numframes * dt
time
- time of the current time step, in MDAnalysis time units (ps)
frame
- frame number of the current time step (1-based)
Optional attributes
compressed
- string that identifies the compression (e.g. “gz” or “bz2”) or
None
.
5.1.4.5. Trajectory Writer class¶
Trajectory writers are derived from
MDAnalysis.coordinates.base.Writer
. They are used to write
multiple frames to a trajectory file. Every time the
write()
method is called,
another frame is appended to the trajectory.
Typically, many methods and attributes are overriden.
Signature:
W = TrajectoryWriter(filename,numatoms,**kwargs)
W.write_next_timestep(Timestep)
or:
W.write(AtomGroup) # write a selection
W.write(Universe) # write a whole universe
W.write(Timestep) # same as write_next_timestep()
5.1.4.5.1. Methods¶
__init__(filename,numatoms[,start[,step[,delta[,remarks]]]])
- opens filename and writes header if required by format
write(obj)
- write Timestep data in obj
write_next_timestep([timestep])
- write data in timestep to trajectory file
convert_dimensions_to_unitcell(timestep)
- take the dimensions from the timestep and convert to the native unitcell representation of the format
close()
- close file and finish I/O
__del__()
- ensures that close() is called
5.1.4.5.2. Attributes¶
filename
- name of the trajectory file
start, stop, step
- first and last frame number (1-based) and step
units
- dictionary with keys time, length, speed, force and the appropriate unit (e.g. ‘AKMA’ and ‘Angstrom’ for Charmm dcds, ‘ps’ and ‘nm’ for Gromacs trajectories,
None
and ‘Angstrom’ for PDB). Any field not used should be set toNone
.format
- string that identifies the file format, e.g. “DCD”, “PDB”, “CRD”, “XTC”, “TRR”
Optional
ts
Timestep
instance
5.1.4.6. Single Frame Writer class¶
A single frame writer is a special case of a trajectory writer in that it writes only a single coordinate frame to a file, for instance, a pdb or gro file. Unlike trajectory formats, which only contains coordinates, single frame formats contain much more information (e.g. atom and residue names and numbers) and hence it is possible to write selections of atoms in a meaningful way.
Signature:
W = FrameWriter(filename, **kwargs)
W.write(AtomGroup)
W.write(Universe)
The blanket kwargs is required so that one can pass the same kind of
arguments (filename and numatoms) as for the Trajectory writers. In
this way, the simple writer()
factory function can be used for all writers.
5.1.5. Reader/Writer registry¶
The following data structures connect reader/writer classes to their format identifiers. They are documented for programmers who want to enhance MDAnalysis; the casual user is unlikely to access them directly.
Some formats can either write single frames or trajectories (such as
PDB). In theses cases, the kind of writer is selected with the
multiframe keyword to MDAnalysis.coordinates.core.get_writer()
(or the writer itself).
-
MDAnalysis.coordinates.__init__.
_trajectory_readers
= {'CHAIN': <class 'MDAnalysis.coordinates.base.ChainReader'>, 'XPDB': <class 'MDAnalysis.coordinates.PDB.ExtendedPDBReader'>, 'TRJ': <class 'MDAnalysis.coordinates.TRJ.TRJReader'>, 'TRZ': <class 'MDAnalysis.coordinates.TRZ.TRZReader'>, 'MDCRD': <class 'MDAnalysis.coordinates.TRJ.TRJReader'>, 'LAMMPS': <class 'MDAnalysis.coordinates.LAMMPS.DCDReader'>, 'DCD': <class 'MDAnalysis.coordinates.DCD.DCDReader'>, 'PDB': <class 'MDAnalysis.coordinates.PDB.PDBReader'>, 'PDBQT': <class 'MDAnalysis.coordinates.PDBQT.PDBQTReader'>, 'MOL2': <class 'MDAnalysis.coordinates.MOL2.MOL2Reader'>, 'PQR': <class 'MDAnalysis.coordinates.PQR.PQRReader'>, 'DMS': <class 'MDAnalysis.coordinates.DMS.DMSReader'>, 'XTC': <class 'MDAnalysis.coordinates.xdrfile.XTC.XTCReader'>, 'CRD': <class 'MDAnalysis.coordinates.CRD.CRDReader'>, 'GMS': <class 'MDAnalysis.coordinates.GMS.GMSReader'>, 'GRO': <class 'MDAnalysis.coordinates.GRO.GROReader'>, 'TRR': <class 'MDAnalysis.coordinates.xdrfile.TRR.TRRReader'>, 'DATA': <class 'MDAnalysis.coordinates.LAMMPS.DATAReader'>, 'XYZ': <class 'MDAnalysis.coordinates.XYZ.XYZReader'>, 'NCDF': <class 'MDAnalysis.coordinates.TRJ.NCDFReader'>}¶ standard trajectory readers (dict with identifier as key and reader class as value)
-
MDAnalysis.coordinates.__init__.
_topology_coordinates_readers
= {'PDBQT': <class 'MDAnalysis.coordinates.PDBQT.PDBQTReader'>, 'XPDB': <class 'MDAnalysis.coordinates.PDB.ExtendedPDBReader'>, 'DMS': <class 'MDAnalysis.coordinates.DMS.DMSReader'>, 'PQR': <class 'MDAnalysis.coordinates.PQR.PQRReader'>, 'CRD': <class 'MDAnalysis.coordinates.CRD.CRDReader'>, 'GMS': <class 'MDAnalysis.coordinates.GMS.GMSReader'>, 'GRO': <class 'MDAnalysis.coordinates.GRO.GROReader'>, 'MOL2': <class 'MDAnalysis.coordinates.MOL2.MOL2Reader'>, 'DATA': <class 'MDAnalysis.coordinates.LAMMPS.DATAReader'>, 'PDB': <class 'MDAnalysis.coordinates.PDB.PrimitivePDBReader'>}¶ readers of files that contain both topology/atom data and coordinates (currently only the keys are used)
-
MDAnalysis.coordinates.__init__.
_trajectory_readers_permissive
= {'TRJ': <class 'MDAnalysis.coordinates.TRJ.TRJReader'>, 'MDCRD': <class 'MDAnalysis.coordinates.TRJ.TRJReader'>, 'TRZ': <class 'MDAnalysis.coordinates.TRZ.TRZReader'>, 'XTC': <class 'MDAnalysis.coordinates.xdrfile.XTC.XTCReader'>, 'CRD': <class 'MDAnalysis.coordinates.CRD.CRDReader'>, 'GMS': <class 'MDAnalysis.coordinates.GMS.GMSReader'>, 'GRO': <class 'MDAnalysis.coordinates.GRO.GROReader'>, 'TRR': <class 'MDAnalysis.coordinates.xdrfile.TRR.TRRReader'>, 'DATA': <class 'MDAnalysis.coordinates.LAMMPS.DATAReader'>, 'PDB': <class 'MDAnalysis.coordinates.PDB.PrimitivePDBReader'>, 'XPDB': <class 'MDAnalysis.coordinates.PDB.ExtendedPDBReader'>, 'LAMMPS': <class 'MDAnalysis.coordinates.LAMMPS.DCDReader'>, 'CHAIN': <class 'MDAnalysis.coordinates.base.ChainReader'>, 'DCD': <class 'MDAnalysis.coordinates.DCD.DCDReader'>, 'PDBQT': <class 'MDAnalysis.coordinates.PDBQT.PDBQTReader'>, 'PQR': <class 'MDAnalysis.coordinates.PQR.PQRReader'>, 'DMS': <class 'MDAnalysis.coordinates.DMS.DMSReader'>, 'XYZ': <class 'MDAnalysis.coordinates.XYZ.XYZReader'>, 'MOL2': <class 'MDAnalysis.coordinates.MOL2.MOL2Reader'>, 'NCDF': <class 'MDAnalysis.coordinates.TRJ.NCDFReader'>}¶ hack: readers that ignore most errors (permissive=True); at the moment the same as
_trajectory_readers
with the exception of the the PDB reader (PDBReader
is replaced by
-
MDAnalysis.coordinates.__init__.
_compressed_formats
= ['XYZ', 'TRJ', 'MDCRD', 'PQR', 'PDBQT']¶ formats of readers that can also handle gzip or bzip2 compressed files
-
MDAnalysis.coordinates.__init__.
_frame_writers
= {'XYZ': <class 'MDAnalysis.coordinates.XYZ.XYZWriter'>, 'PDBQT': <class 'MDAnalysis.coordinates.PDBQT.PDBQTWriter'>, 'MOL2': <class 'MDAnalysis.coordinates.MOL2.MOL2Writer'>, 'PQR': <class 'MDAnalysis.coordinates.PQR.PQRWriter'>, 'CRD': <class 'MDAnalysis.coordinates.CRD.CRDWriter'>, 'GRO': <class 'MDAnalysis.coordinates.GRO.GROWriter'>, 'PDB': <class 'MDAnalysis.coordinates.PDB.PrimitivePDBWriter'>}¶ frame writers: export to single frame formats such as PDB, gro, crd Signature:
W = FrameWriter(filename) W.write(AtomGroup)
-
MDAnalysis.coordinates.__init__.
_trajectory_writers
= {'MOL2': <class 'MDAnalysis.coordinates.MOL2.MOL2Writer'>, 'TRZ': <class 'MDAnalysis.coordinates.TRZ.TRZWriter'>, 'XTC': <class 'MDAnalysis.coordinates.xdrfile.XTC.XTCWriter'>, 'XYZ': <class 'MDAnalysis.coordinates.XYZ.XYZWriter'>, 'LAMMPS': <class 'MDAnalysis.coordinates.LAMMPS.DCDWriter'>, 'DCD': <class 'MDAnalysis.coordinates.DCD.DCDWriter'>, 'TRR': <class 'MDAnalysis.coordinates.xdrfile.TRR.TRRWriter'>, 'PDB': <class 'MDAnalysis.coordinates.PDB.MultiPDBWriter'>, 'NCDF': <class 'MDAnalysis.coordinates.TRJ.NCDFWriter'>}¶ trajectory writers: export frames, typically only saving coordinates Signature:
W = TrajectoryWriter(filename,numatoms,**kwargs) W.write_next_timestep(TimeStep) W.write(Timestep) W.write(AtomGroup) W.write(Universe)