7.9. Helper functions — MDAnalysis.core.util
Small helper functions that don’t fit anywhere else.
7.9.1. Files and directories
MDAnalysis.core.util.filename(name, ext=None, keep=False)
Return a new name that has the suffix ext attached; replaces other extensions.
Arguments:
- name
  filename; the extension is replaced unless keep=True; name can also be a NamedStream (and its NamedStream.name will be changed accordingly)
- ext
  extension
- keep
  False: replace the existing extension with ext; True: keep the old extension if one existed
Changed in version 0.9.0: Also permits NamedStream to pass through.
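The extension handling described above can be sketched with os.path.splitext. This is a minimal illustration for plain string names only, not the library's implementation: the helper name with_extension is hypothetical, and accepting ext with or without a leading dot is an assumption of the sketch.

```python
import os.path

def with_extension(name, ext=None, keep=False):
    """Hypothetical stand-in that mimics the documented behavior for plain strings."""
    if ext is None:
        return name
    if not ext.startswith(os.path.extsep):
        ext = os.path.extsep + ext           # assumption: accept "pdb" as well as ".pdb"
    root, old_ext = os.path.splitext(name)
    if keep and old_ext:
        return name                          # keep=True: an existing extension wins
    return root + ext                        # otherwise replace (or attach) the extension

print(with_extension("protein.gro", "pdb"))             # protein.pdb
print(with_extension("protein.gro", "pdb", keep=True))  # protein.gro
print(with_extension("protein", "pdb", keep=True))      # protein.pdb
```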
MDAnalysis.core.util.openany(directory[, mode='r'])
Context manager to open a compressed (bzip2, gzip) or plain file (uses anyopen()).
MDAnalysis.core.util.anyopen(datasource, mode='r', reset=True)
Open datasource (gzipped, bzipped, uncompressed) and return a stream.
datasource can be a filename or a stream (see isstream()). By default, a stream is reset to its start if possible (via seek() or reset()).
If possible, the attribute stream.name is set to the filename, or to "<stream>" if no filename could be associated with the datasource.
Arguments:
- datasource
  a file (from file or open()) or a stream (e.g. from urllib2.urlopen() or cStringIO.StringIO)
- mode
  'r', 'w', or 'a'; more complicated modes ('r+', 'w+') are not supported because only the first letter is looked at ['r']
- reset
  try to read (mode 'r') the stream from the start [True]
Returns: stream, which is a file-like object
Changed in version 0.9.0: Only returns the stream and tries to set stream.name = filename instead of the previous behavior of returning a tuple (stream, filename).
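As a rough illustration of opening gzipped, bzipped, or plain files through one call, the sketch below uses only the standard library. The helper name open_compressed_or_plain is hypothetical, and choosing the opener from the file extension is an assumption of this sketch; the text above does not say how the library detects the compression.

```python
import bz2
import gzip
import os
import tempfile

def open_compressed_or_plain(path, mode="r"):
    """Hypothetical helper: choose an opener from the file extension."""
    if mode[0] not in "rwa":
        raise ValueError("only 'r', 'w' and 'a' are handled in this sketch")
    text_mode = mode[0] + "t"                # only the first letter of mode is used
    if path.endswith(".gz"):
        return gzip.open(path, text_mode)
    if path.endswith(".bz2"):
        return bz2.open(path, text_mode)
    return open(path, text_mode)

# Write and re-read the same line in all three flavors.
tmpdir = tempfile.mkdtemp()
contents = {}
for fname in ("frame.pdb", "frame.pdb.gz", "frame.pdb.bz2"):
    path = os.path.join(tmpdir, fname)
    with open_compressed_or_plain(path, "w") as out:
        out.write("TITLE Lonely Ion\n")
    with open_compressed_or_plain(path) as stream:
        contents[fname] = stream.read()
```

After the loop, every entry of contents holds the same text, regardless of how the file was stored on disk.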
MDAnalysis.core.util.greedy_splitext(p)
Split extension in path p at the left-most separator.
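The difference from os.path.splitext, which splits at the right-most separator, can be shown with a small stand-alone sketch. The function name split_at_first_dot is hypothetical, and returning the directory as part of the root is an assumption, since the text above does not state it.

```python
import os.path

def split_at_first_dot(p):
    """Hypothetical re-implementation: peel off extensions until none is left."""
    directory, root = os.path.split(p)
    extension = ""
    while True:
        root, ext = os.path.splitext(root)
        if not ext:
            break
        extension = ext + extension
    return os.path.join(directory, root), extension

path = os.path.join("run", "traj.pdb.bz2")
print(os.path.splitext(path))     # extension is '.bz2'
print(split_at_first_dot(path))   # extension is '.pdb.bz2'
```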
MDAnalysis.core.util.which(program)
Determine the full path of the executable program on PATH.
(Jay at http://stackoverflow.com/questions/377017/test-if-executable-exists-in-python)
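A PATH lookup of this kind can be sketched as follows. The name find_executable and its optional path argument are hypothetical and exist only to make the sketch easy to try; returning None for a missing program is likewise an assumption. On Python 3.3 and later, the standard library's shutil.which() offers the same service.

```python
import os

def find_executable(program, path=None):
    """Hypothetical PATH lookup; returns the full path or None."""
    def is_executable(candidate):
        return os.path.isfile(candidate) and os.access(candidate, os.X_OK)

    if os.path.dirname(program):             # a path was given: check it directly
        return program if is_executable(program) else None
    search = os.environ.get("PATH", "") if path is None else path
    for directory in search.split(os.pathsep):
        candidate = os.path.join(directory, program)
        if is_executable(candidate):
            return candidate
    return None
```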
7.9.2. Streams
Many of the readers are not restricted to just reading files. They can
also use gzip-compressed or bzip2-compressed files (through the
internal use of openany()). It is also possible to provide more
general streams as inputs, such as cStringIO.StringIO() instances
(essentially, memory buffers), by wrapping these instances
in a NamedStream. This NamedStream can then be
used in place of an ordinary file name (typically with a
MDAnalysis.core.AtomGroup.Universe, but it is also possible to
write to such a stream using MDAnalysis.Writer()).
In the following example, we use a PDB stored as a string pdb_s:
import MDAnalysis
from MDAnalysis.core.util import NamedStream
import cStringIO
pdb_s = "TITLE Lonely Ion\nATOM 1 NA NA+ 1 81.260 64.982 10.926 1.00 0.00\n"
u = MDAnalysis.Universe(NamedStream(cStringIO.StringIO(pdb_s), "ion.pdb"))
print(u)
# <Universe with 1 atoms>
print(u.atoms.positions)
# [[ 81.26000214 64.98200226 10.92599964]]
It is important to provide a proper pseudo file name with the correct extension
(".pdb") to NamedStream because the file type recognition uses the
extension of the file name to determine the file format. Alternatively,
provide the format="pdb" keyword argument to the Universe.
The use of streams becomes more interesting when MDAnalysis is used as glue
between different analysis packages and when one can arrange things so that
intermediate frames (typically in the PDB format) are not written to disk but
remain in memory via e.g. cStringIO buffers.
Note
A remote connection created by urllib2.urlopen() is not seekable
and therefore will often not work as an input. But try it...
class MDAnalysis.core.util.NamedStream(stream, filename, reset=True, close=False)
Stream that also provides a (fake) name.
By wrapping a stream stream in this class, it can be passed to code that inspects the filename to make decisions. For instance, os.path.split() will work correctly on a NamedStream.
The class can be used as a context manager.
NamedStream is derived from io.IOBase (to indicate that it is a stream) and basestring (so that one can use iterable() in the same way as for strings).
Example
Wrap a cStringIO.StringIO() instance to write to:
    import cStringIO
    import os.path
    stream = cStringIO.StringIO()
    f = NamedStream(stream, "output.pdb")
    print(os.path.splitext(f))
Wrap a file instance to read from:
    stream = open("input.pdb")
    f = NamedStream(stream, stream.name)
Use as a context manager (with close=True, closes stream automatically when the with block is left):
    with NamedStream(open("input.pdb"), "input.pdb", close=True) as f:
        # use f
        print(f.closed)  # --> False
        # ...
    print(f.closed)      # --> True
Note
This class uses its own __getitem__() method, so if stream implements stream.__getitem__() then that will be masked and this class should not be used.
Warning
By default, NamedStream.close() will not close the stream but instead reset() it to the beginning. [1] Provide the force=True keyword to NamedStream.close() to always close the stream.
Initialize the NamedStream from a stream and give it a name.
The constructor attempts to rewind the stream to the beginning unless the keyword reset is set to False. If rewinding fails, a MDAnalysis.StreamWarning is issued.
Note
By default, this stream will not be closed by with and close() (see there) unless the close keyword is set to True.
Arguments:
- stream
  open stream (e.g. file or cStringIO.StringIO())
- filename
  the filename that should be associated with the stream
Keywords:
- reset
  rewind the stream to the beginning when the instance is constructed [True]
- close
  close the stream when a with block exits or when close() is called [False]
New in version 0.9.0.
close(force=False)
Reset or close the stream.
If NamedStream.close_stream is set to False (the default) then this method will not close the stream and only reset() it.
If the force=True keyword is provided, the stream will be closed.
Note
This close() method is non-standard. del NamedStream always closes the underlying stream.
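The reset-instead-of-close behavior can be made concrete with a small stand-alone wrapper. The class NamedStreamSketch is hypothetical and is not the library's NamedStream. In particular, it exposes the name to os.path functions through the Python 3 __fspath__ protocol rather than by deriving from basestring, and it forwards only read().

```python
import io
import os.path

class NamedStreamSketch(object):
    """Hypothetical wrapper: a stream plus a name, with reset-on-close."""

    def __init__(self, stream, filename, reset=True, close=False):
        self.stream = stream
        self.name = filename
        self.close_stream = close
        if reset:
            self.reset()

    def __fspath__(self):                    # lets os.path functions see the name
        return self.name

    def reset(self):
        self.stream.seek(0)

    def close(self, force=False):
        if self.close_stream or force:
            self.stream.close()
        else:
            self.reset()                     # default: rewind, keep the stream open

    @property
    def closed(self):
        return self.stream.closed

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

    def read(self):
        return self.stream.read()

buf = io.StringIO("TITLE Lonely Ion\n")
with NamedStreamSketch(buf, "ion.pdb") as f:
    first = f.read()
second = f.read()                            # still readable: close() only rewound
print(os.path.splitext(f))                   # ('ion', '.pdb')
f.close(force=True)
print(f.closed)                              # True
```

Because leaving the with block only rewinds the buffer, the second read() returns the same text as the first; the stream is closed only by the explicit close(force=True).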
closed
True if stream is closed.
fileno()
Return the underlying file descriptor (an integer) of the stream if it exists.
An IOError is raised if the IO object does not use a file descriptor.
flush()
Flush the write buffers of the stream if applicable.
This does nothing for read-only and non-blocking streams. For file objects one also needs to call os.fsync() to write contents to disk.
readable()
Return True if the stream can be read from.
If False, read() will raise IOError.
seek(offset, whence=0)
Change the stream position to the given byte offset.
offset is interpreted relative to the position indicated by whence. Values for whence are:
- io.SEEK_SET or 0: start of the stream (the default); offset should be zero or positive
- io.SEEK_CUR or 1: current stream position; offset may be negative
- io.SEEK_END or 2: end of the stream; offset is usually negative
Returns: the new absolute position.
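The three whence values behave the same way on any seekable standard-library stream, so they can be tried on an io.BytesIO buffer without involving NamedStream at all:

```python
import io

stream = io.BytesIO(b"0123456789")
print(stream.seek(4))                  # 4: absolute position from the start
print(stream.seek(2, io.SEEK_CUR))     # 6: two bytes past the current position
print(stream.seek(-3, io.SEEK_END))    # 7: three bytes before the end
print(stream.read())                   # b'789'
```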
seekable()
Return True if the stream supports random access.
If False, seek(), tell() and truncate() will raise IOError.
MDAnalysis.core.util.isstream(obj)
Detect if obj is a stream.
We consider anything a stream that has the method close() and either set of the following:
- read(), readline(), readlines()
- write(), writeline(), writelines()
See also
Arguments:
- obj
  stream or string
Returns: True if obj is a stream, False otherwise
New in version 0.9.0.
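The rule stated above amounts to a duck-typing test, which could be written roughly like this. The names looks_like_stream and has_methods are hypothetical; only the method lists come from the text.

```python
import io

READ_METHODS = ("read", "readline", "readlines")
WRITE_METHODS = ("write", "writeline", "writelines")

def has_methods(obj, names):
    """True if obj has a callable attribute for every name."""
    return all(callable(getattr(obj, name, None)) for name in names)

def looks_like_stream(obj):
    """Hypothetical duck-typing check following the rule stated above."""
    if not has_methods(obj, ("close",)):
        return False
    return has_methods(obj, READ_METHODS) or has_methods(obj, WRITE_METHODS)

print(looks_like_stream(io.StringIO("ATOM")))  # True
print(looks_like_stream("input.pdb"))          # False
```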
7.9.3. Containers and lists
MDAnalysis.core.util.iterable(obj)
Returns True if obj can be iterated over and is not a string.
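A check of this kind is handy when an argument may be either a single name or a collection of names. The sketch below is a hypothetical equivalent, not the library's code; treating bytes like strings is an assumption of the sketch.

```python
def is_iterable_not_string(obj):
    """Hypothetical equivalent: iterable, but strings count as scalars."""
    if isinstance(obj, (str, bytes)):        # assumption: bytes are treated like str
        return False
    try:
        iter(obj)
    except TypeError:
        return False
    return True

def as_list(names):
    """Wrap a single name in a list; pass collections through as a list."""
    return list(names) if is_iterable_not_string(names) else [names]

print(as_list("CA"))          # ['CA']
print(as_list(("CA", "CB")))  # ['CA', 'CB']
```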
7.9.4. File parsing
class MDAnalysis.core.util.FORTRANReader(fmt)
FORTRANReader provides a method to parse FORTRAN formatted lines in a file.
Usage:
    atomformat = FORTRANReader('2I10,2X,A8,2X,A8,3F20.10,2X,A8,2X,A8,F20.10')
    for line in open('coordinates.crd'):
        serial, TotRes, resName, name, x, y, z, chainID, resSeq, tempFactor = atomformat.read(line)
Fortran format edit descriptors; see Fortran Formats for the syntax.
Only simple one-character specifiers are supported here: I F E A X (see FORTRAN_format_regex).
Strings are stripped of leading and trailing white space.
Set up the reader with the FORTRAN format string.
The string fmt should look like '2I10,2X,A8,2X,A8,3F20.10,2X,A8,2X,A8,F20.10'.
number_of_matches(line)
Return how many format entries could be populated with legal values.
parse_FORTRAN_format(edit_descriptor)
Parse the descriptor.
parse_FORTRAN_format(edit_descriptor) --> dict
Returns: dict with totallength (in chars), repeat, length, format, decimals
Raises: ValueError if the edit_descriptor is not recognized and cannot be parsed
Note
Specifiers: L ES EN T TL TR / r S SP SS BN BZ are not supported, and neither are the scientific notation Ew.dEe forms.
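How a format string turns into fixed-width slices can be sketched in a few lines. Everything here is illustrative: the names DESCRIPTOR, parse_descriptor and read_fixed_width are hypothetical, the pattern is a simplified variant of the documented regular expression, and only the I, F, E, A and X specifiers are handled.

```python
import re

# Simplified pattern for one edit descriptor such as "2I10", "3F20.10", "A8" or "2X".
DESCRIPTOR = re.compile(
    r"(?P<repeat>\d*)(?P<format>[IFEAX])"
    r"(?:(?P<length>\d+)(?:\.(?P<decimals>\d+))?)?$"
)
CONVERTERS = {"I": int, "F": float, "E": float, "A": str.strip}

def parse_descriptor(edit_descriptor):
    """Return (repeat, format, length) or raise ValueError."""
    m = DESCRIPTOR.match(edit_descriptor.strip().upper())
    if m is None:
        raise ValueError("unrecognized edit descriptor: %r" % edit_descriptor)
    fmt = m.group("format")
    if fmt == "X":
        length = 1                            # X skips single characters
    elif m.group("length") is None:
        raise ValueError("missing field width: %r" % edit_descriptor)
    else:
        length = int(m.group("length"))
    return int(m.group("repeat") or 1), fmt, length

def read_fixed_width(fmt, line):
    """Slice line according to the comma-separated descriptors in fmt."""
    values, pos = [], 0
    for descriptor in fmt.split(","):
        repeat, code, length = parse_descriptor(descriptor)
        for _ in range(repeat):
            field = line[pos:pos + length]
            pos += length
            if code != "X":                   # skipped columns yield no value
                values.append(CONVERTERS[code](field))
    return values

line = "%5d%5d  %-4s%8.3f" % (1, 12, "NA", 81.26)
print(read_fixed_width("2I5,2X,A4,F8.3", line))  # [1, 12, 'NA', 81.26]
```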
MDAnalysis.core.util.FORTRAN_format_regex = '(?P<repeat>\\d+?)(?P<format>[IFEAX])(?P<numfmt>(?P<length>\\d+)(\\.(?P<decimals>\\d+))?)?'
Regular expression (see re) to parse a simple FORTRAN edit descriptor.
(?P<repeat>\d?)(?P<format>[IFELAX])(?P<numfmt>(?P<length>\d+)(\.(?P<decimals>\d+))?)?
7.9.5. Data manipulation and handling
7.9.6. Strings
MDAnalysis.core.util.convert_aa_code(x)
Converts between 3-letter and 1-letter amino acid codes.
See also
Data are defined in amino_acid_codes and inverse_aa_codes.
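A two-way lookup of this kind needs only a table and its inverse. The sketch below is limited to the 20 standard amino acids; the names ONE_TO_THREE, THREE_TO_ONE and convert_code are hypothetical, the library's own tables may contain more entries, and raising ValueError for unknown codes is an assumption.

```python
# Standard 20 amino acids only (assumption: the library's tables may be larger).
ONE_TO_THREE = {
    "A": "ALA", "C": "CYS", "D": "ASP", "E": "GLU", "F": "PHE",
    "G": "GLY", "H": "HIS", "I": "ILE", "K": "LYS", "L": "LEU",
    "M": "MET", "N": "ASN", "P": "PRO", "Q": "GLN", "R": "ARG",
    "S": "SER", "T": "THR", "V": "VAL", "W": "TRP", "Y": "TYR",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}

def convert_code(x):
    """Hypothetical converter: 1-letter -> 3-letter and 3-letter -> 1-letter."""
    code = x.upper()
    table = ONE_TO_THREE if len(code) == 1 else THREE_TO_ONE
    try:
        return table[code]
    except KeyError:
        raise ValueError("unknown amino acid code: %r" % x)

print(convert_code("K"))    # LYS
print(convert_code("LYS"))  # K
```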
MDAnalysis.core.util.parse_residue(residue)
Process residue string.
Examples:
- "LYS300:HZ1" --> ("LYS", 300, "HZ1")
- "K300:HZ1" --> ("LYS", 300, "HZ1")
- "K300" --> ("LYS", 300, None)
- "4GB300:H6O" --> ("4GB", 300, "H6O")
- "4GB300" --> ("4GB", 300, None)
Argument: The residue must contain a 1-letter, 3-letter, or 4-letter residue string, a number (the resid), and optionally an atom identifier, which must be separated from the residue by a colon (":"). White space is allowed in between.
Returns: (3-letter aa string, resid, atomname); known 1-letter aa codes are converted to 3-letter codes
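The examples above can be reproduced with a single regular expression. This is a sketch under stated assumptions, not the library's parser: RESIDUE, ONE_TO_THREE and parse_residue_sketch are hypothetical names, the pattern was written to fit the five examples, and the 1-letter table covers only the 20 standard amino acids.

```python
import re

# 1-letter to 3-letter codes for the 20 standard amino acids (assumed table).
ONE_TO_THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY",
    "ALA CYS ASP GLU PHE GLY HIS ILE LYS LEU MET ASN PRO GLN ARG SER THR VAL TRP TYR".split(),
))

RESIDUE = re.compile(r"""
    (?P<name>[A-Z]|[0-9A-Z][A-Z]{2,3})   # 1-letter code, or 3/4-letter residue name
    \s*(?P<resid>\d+)                    # residue number
    \s*(?::\s*(?P<atom>\w+))?            # optional ":atomname"
    $""", re.VERBOSE | re.IGNORECASE)

def parse_residue_sketch(residue):
    """Return (residue name, resid, atom name or None)."""
    m = RESIDUE.match(residue.strip())
    if m is None:
        raise ValueError("cannot parse residue string: %r" % residue)
    name = m.group("name").upper()
    if len(name) == 1:
        name = ONE_TO_THREE.get(name, name)  # expand known 1-letter codes
    return name, int(m.group("resid")), m.group("atom")

print(parse_residue_sketch("K300:HZ1"))   # ('LYS', 300, 'HZ1')
print(parse_residue_sketch("4GB300"))     # ('4GB', 300, None)
```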
7.9.7. Mathematics and Geometry
MDAnalysis.core.util.normal(vec1, vec2)
Returns the unit vector normal to two vectors.
\[\hat{\mathbf{n}} = \frac{\mathbf{v}_1 \times \mathbf{v}_2}{|\mathbf{v}_1 \times \mathbf{v}_2|}\]
If the two vectors are collinear, the vector \(\mathbf{0}\) is returned.
MDAnalysis.core.util.norm(v)
Returns the length of a vector, sqrt(v.v).
\[v = \sqrt{\mathbf{v}\cdot\mathbf{v}}\]
Faster than numpy.linalg.norm() because no frills.
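Both formulas are short enough to write out directly. The sketch below uses plain tuples and the math module to stay free of dependencies, so it is not the library's code; the function names merely mirror the documented ones.

```python
import math

def norm(v):
    """Length of a vector: sqrt(v . v)."""
    return math.sqrt(sum(x * x for x in v))

def normal(vec1, vec2):
    """Unit vector along vec1 x vec2, or the zero vector if they are collinear."""
    a1, a2, a3 = vec1
    b1, b2, b3 = vec2
    cross = (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)
    n = norm(cross)
    if n == 0.0:
        return cross                         # collinear input: return the zero vector
    return tuple(x / n for x in cross)

print(normal((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # (0.0, 0.0, 1.0)
print(normal((1.0, 2.0, 3.0), (2.0, 4.0, 6.0)))  # (0.0, 0.0, 0.0)
print(norm((3.0, 4.0, 0.0)))                     # 5.0
```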
7.9.8. Class decorators
MDAnalysis.core.util.cached(key)
Cache a property within a class.
Requires the class to have a cache dict called "_cache".
Usage:
    class A(object):
        def __init__(self):
            self._cache = dict()

        @property
        @cached('keyname')
        def size(self):
            # This code gets run only if the lookup of keyname fails.
            # After this code has been run once, the result is stored in
            # _cache with the key: 'keyname'
            size = 10.0
            return size
New in version 0.9.0.
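A decorator with the behavior described above could look roughly like the following. This is a sketch, not the library's source: it only assumes, as the text says, that the instance carries a dict named _cache. The class Box and its calls counter are hypothetical and serve to show when the method body actually runs.

```python
from functools import wraps

def cached(key):
    """Sketch of a decorator that stores the method's result in self._cache[key]."""
    def decorator(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            try:
                return self._cache[key]      # cache hit: skip the computation
            except KeyError:
                result = self._cache[key] = func(self, *args, **kwargs)
                return result
        return wrapper
    return decorator

class Box(object):
    def __init__(self):
        self._cache = dict()
        self.calls = 0

    @property
    @cached("size")
    def size(self):
        self.calls += 1                      # counts how often the body really runs
        return 10.0

box = Box()
print(box.size, box.size, box.calls)   # 10.0 10.0 1
del box._cache["size"]                 # dropping the entry forces a recomputation
print(box.size, box.calls)             # 10.0 2
```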
Footnotes
[1] The reason why NamedStream.close() does
not close a stream by default (but just rewinds it to the
beginning) is so that one can use the class NamedStream as
a drop-in replacement for file names, which are often re-opened
(e.g. when the same file is used as a topology and coordinate file
or when repeatedly iterating through a trajectory in some
implementations). The close=True keyword can be supplied in
order to make NamedStream.close() actually close the
underlying stream, and NamedStream.close(force=True) will also
close it.