3.1.5. Calculating path similarity — MDAnalysis.analysis.psa
¶
Author: | Sean Seyler |
---|---|
Year: | 2015 |
Copyright: | GNU Public License v3 |
New in version 0.10.0.
The module contains code to calculate the geometric similarity of trajectories using path metrics such as the Hausdorff or Fréchet distances [Seyler2015]. The path metrics are functions of two paths and return a nonnegative number, i.e., a distance. Two paths are identical if their distance is zero, and large distances indicate dissimilarity. Each path metric is a function of the individual points (e.g., coordinate snapshots) that comprise each path and, loosely speaking, identify the two points, one per path of a pair of paths, where the paths deviate the most. The distance between these points of maximal deviation is measured by the root mean square deviation (RMSD), i.e., to compute structural similarity.
One typically computes the pairwise similarity for an ensemble of paths to produce a symmetric distance matrix, which can be clustered to, at a glance, identify patterns in the trajectory data. To properly analyze a path ensemble, one must select a suitable reference structure to which all paths (each conformer in each path) will be universally aligned using the rotations determined by the best-fit rmsds. Distances between paths and their structures are then computed directly with no further alignment. This pre-processing step is necessary to preserve the metric properties of the Hausdorff and Fréchet metrics; using the best-fit rmsd on a pairwise basis does not generally preserve the triangle inequality.
See also
The PSAnalysisTutorial outlines a typical application of PSA to a set of trajectories, including doing proper alignment, performing distance comparisons, and generating heat map-dendrogram plots from hierarchical clustering.
References
[Seyler2015] | Sean L. Seyler, Avishek Kumar, Michael F. Thorpe, Oliver Beckstein. Path Similarity Analysis: a Method for Quantifying Macromolecular Pathways. arXiv:1505.04807 (2015). |
3.1.5.1. Helper functions and variables¶
The following convenience functions are used by other functions in this module.
-
MDAnalysis.analysis.psa.
sqnorm
(v, axis=None)[source]¶ Compute the sum of squares of elements along specified axes.
Parameters: - *v* –
numpy.ndarray
of coordinates - *axes* – None or int or tuple of ints, optional
Axes or axes along which a sum is performed. The default (axes =
None
) performs a sum over all the dimensions of the input array. The value of axes may be negative, in which case it counts from the last axis to the zeroth axis.
Returns: float, the sum of the squares of the elements of v along axes
- *v* –
-
MDAnalysis.analysis.psa.
get_msd_matrix
(P, Q, axis=None)[source]¶ Generate the matrix of pairwise mean-squared deviations (MSDs) between all pairs of points in P and Q, each pair having a point from P and a point from Q.
P (Q) is a
numpy.ndarray
of \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,MDAnalysis.core.AtomGroup.AtomGroup.coordinates()
). The pairwise MSD matrix has dimensions \(N_p\) by \(N_q\).Parameters: - *P* –
numpy.ndarray
representing a path - *Q* –
numpy.ndarray
representing a path
Returns: numpy.ndarray of pairwise MSDs between points in P and points in Q
Return type: :class
- *P* –
-
MDAnalysis.analysis.psa.
get_coord_axes
(path)[source]¶ Return the number of atoms and the axes (axis) corresponding to atoms and coordinates for a given path.
The path is assumed to be a
numpy.ndarray
where the 0th axis corresponds to a frame (a snapshot of coordinates). The \(3N\) (Cartesian) coordinates are assumed to be either:- all in the 1st axis, starting with the x,y,z coordinates of the first atom, followed by the x,*y*,*z* coordinates of the 2nd, etc.
- in the 1st and 2nd axis, where the 1st axis indexes the atom number and the 2nd axis contains the x,*y*,*z* coordinates of each atom.
Parameters: *path* – numpy.ndarray
representing a pathReturns: (int, (int, ...)), the number of atoms and the axes containing coordinates
3.1.5.2. Classes, methods, and functions¶
-
MDAnalysis.analysis.psa.
get_path_metric_func
(name)[source]¶ Selects a path metric function by name.
Parameters: *name* – string, name of path metric Returns: The path metric function specified by name (if found).
-
MDAnalysis.analysis.psa.
hausdorff
(P, Q)[source]¶ Calculate the Hausdorff distance between two paths.
P (Q) is a
numpy.ndarray
of \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,MDAnalysis.core.AtomGroup.AtomGroup.coordinates()
). P (Q) has either shape \(N_p \times N \times 3\) (\(N_q \times N \times 3\)), or \(N_p \times (3N)\) (\(N_q \times (3N)\)) in flattened form.Parameters: - *P* –
numpy.ndarray
representing a path - *Q* –
numpy.ndarray
representing a path
Returns: float, the Hausdorff distance between paths P and Q
Example
>>> from MDAnalysis.tests.datafiles import PSF, DCD >>> u = Universe(PSF,DCD) >>> mid = len(u.trajectory)/2 >>> ca = u.select_atoms('name CA') >>> P = numpy.array([ ... ca.positions for _ in u.trajectory[:mid:] ... ]) # first half of trajectory >>> Q = numpy.array([ ... ca.positions for _ in u.trajectory[mid::] ... ]) # second half of trajectory >>> hausdorff(P,Q) 4.7786639840135905 >>> hausdorff(P,Q[::-1]) # hausdorff distance w/ reversed 2nd trajectory 4.7786639840135905
- *P* –
-
MDAnalysis.analysis.psa.
hausdorff_wavg
(P, Q)[source]¶ Calculate the weighted average Hausdorff distance between two paths.
P (Q) is a
numpy.ndarray
of \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,MDAnalysis.core.AtomGroup.AtomGroup.coordinates()
). P (Q) has either shape \(N_p \times N \times 3\) (\(N_q \times N \times 3\)), or \(N_p \times (3N)\) (\(N_q \times (3N)\)) in flattened form. The nearest neighbor distances for P (to Q) and those of Q (to P) are averaged individually to get the average nearest neighbor distance for P and likewise for Q. These averages are then summed and divided by 2 to get a measure that gives equal weight to P and Q.Parameters: - *P* –
numpy.ndarray
representing a path - *Q* –
numpy.ndarray
representing a path
Returns: float, the weighted average Hausdorff distance between paths P and Q
Example
>>> from MDAnalysis import Universe >>> from MDAnalysis.tests.datafiles import PSF, DCD >>> u = Universe(PSF,DCD) >>> mid = len(u.trajectory)/2 >>> ca = u.select_atoms('name CA') >>> P = numpy.array([ ... ca.positions for _ in u.trajectory[:mid:] ... ]) # first half of trajectory >>> Q = numpy.array([ ... ca.positions for _ in u.trajectory[mid::] ... ]) # second half of trajectory >>> hausdorff_wavg(P,Q) 2.5669644353703447 >>> hausdorff_wavg(P,Q[::-1]) # weighted avg hausdorff dist w/ Q reversed 2.5669644353703447
- *P* –
-
MDAnalysis.analysis.psa.
hausdorff_avg
(P, Q)[source]¶ Calculate the average Hausdorff distance between two paths.
P (Q) is a
numpy.ndarray
of \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,MDAnalysis.core.AtomGroup.AtomGroup.coordinates()
). P (Q) has either shape \(N_p \times N \times 3\) (\(N_q \times N \times 3\)), or \(N_p \times (3N)\) (\(N_q \times (3N)\)) in flattened form. The nearest neighbor distances for P (to Q) and those of Q (to P) are all averaged together to get a mean nearest neighbor distance. This measure biases the average toward the path that has more snapshots, whereas weighted average Hausdorff gives equal weight to both paths.Parameters: - *P* –
numpy.ndarray
representing a path - *Q* –
numpy.ndarray
representing a path
Returns: float, the average Hausdorff distance between paths P and Q
Example
>>> from MDAnalysis.tests.datafiles import PSF, DCD >>> u = Universe(PSF,DCD) >>> mid = len(u.trajectory)/2 >>> ca = u.select_atoms('name CA') >>> P = numpy.array([ ... ca.positions for _ in u.trajectory[:mid:] ... ]) # first half of trajectory >>> Q = numpy.array([ ... ca.positions for _ in u.trajectory[mid::] ... ]) # second half of trajectory >>> hausdorff_avg(P,Q) 2.5669646575869005 >>> hausdorff_avg(P,Q[::-1]) # hausdorff distance w/ reversed 2nd trajectory 2.5669646575869005
- *P* –
-
MDAnalysis.analysis.psa.
hausdorff_neighbors
(P, Q)[source]¶ Calculate the Hausdorff distance between two paths.
P (Q) is a
numpy.ndarray
of \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,MDAnalysis.core.AtomGroup.AtomGroup.coordinates()
). P (Q) has either shape \(N_p \times N \times 3\) (\(N_q \times N \times 3\)), or \(N_p \times (3N)\) (\(N_q \times (3N)\)) in flattened form.Parameters: - *P* –
numpy.ndarray
representing a path - *Q* –
numpy.ndarray
representing a path
Returns: dictionary of two pairs of numpy arrays, the first pair containing the indices of (Hausdorff) nearest neighbors for P and Q, respectively, the second containing (corresponding) nearest neighbor distances for P and Q, respectively
- *P* –
-
MDAnalysis.analysis.psa.
discrete_frechet
(P, Q)[source]¶ Calculate the discrete Frechet distance between two paths.
P (Q) is a
numpy.ndarray
of \(N_p\) (\(N_q\)) time steps, \(N\) atoms, and \(3N\) coordinates (e.g.,MDAnalysis.core.AtomGroup.AtomGroup.coordinates()
). P (Q) has either shape \(N_p \times N \times 3\) (\(N_q \times N \times 3\)), or :\(N_p \times (3N)\) (\(N_q \times (3N)\)) in flattened form.Parameters: - *P* –
numpy.ndarray
representing a path - *Q* –
numpy.ndarray
representing a path
Returns: float, the discrete Frechet distance between paths P and Q
Example
>>> u = Universe(PSF,DCD) >>> mid = len(u.trajectory)/2 >>> ca = u.select_atoms('name CA') >>> P = np.array([ ... ca.positions for _ in u.trajectory[:mid:] ... ]) # first half of trajectory >>> Q = np.array([ ... ca.positions for _ in u.trajectory[mid::] ... ]) # second half of trajectory >>> discrete_frechet(P,Q) 4.7786639840135905 >>> discrete_frechet(P,Q[::-1]) # frechet distance w/ 2nd trj reversed 2nd 6.8429011177113832
- *P* –
-
MDAnalysis.analysis.psa.
dist_mat_to_vec
(N, i, j)[source]¶ Convert distance matrix indices (in the upper triangle) to the index of the corresponding distance vector.
This is a convenience function to locate distance matrix elements (and the pair generating it) in the corresponding distance vector. The row index j should be greater than i+1, corresponding to the upper triangle of the distance matrix.
Parameters: - *N* – int, size of the distance matrix (of shape N-by-N)
- *i* – int, row index (starting at 0) of the distance matrix
- *j* – int, column index (starting at 0) of the distance matrix
Returns: int, index (of the matrix element) in the corresponding distance vector
-
class
MDAnalysis.analysis.psa.
PDBToBinaryTraj
(universe, output_type='.dcd', infix='')[source]¶ -
universe
¶ MDAnalysis.Universe
object with a trajectory
-
frames
¶ MDAnalysis.Universe.trajectory
-
newname
¶ string, filename for converted trajectory, including file extension
-
-
class
MDAnalysis.analysis.psa.
Path
(universe, reference, ref_select='name CA', path_select='all', ref_frame=0)[source]¶ Pre-process a
MDAnalysis.Universe
object: (1) fit the trajectory to a reference structure, (2) convert fitted time series to anumpy.ndarray
representation ofPath.path
.The analysis is performed with
PSAnalysis.run()
and stores the result in thenumpy.ndarray
distance matrixPSAnalysis.D
.PSAnalysis.run()
also generates a fitted trajectory and path from alignment of the original trajectories to a reference structure.New in version 0.9.1.
Setting up trajectory alignment and fitted path generation.
Parameters: - *universe* –
MDAnalysis.Universe
object containing a trajectory - *reference* – reference structure;
MDAnalysis.Universe
object; ifNone
then traj is used (uses the current time step of the object) [None
] - *ref_select* –
The selection to operate on for rms fitting; can be one of:
- any valid selection string for
select_atoms()
that produces identical selections in mobile and reference; or - a dictionary
{'mobile':sel1, 'reference':sel2}
(theMDAnalysis.analysis.align.fasta2select()
function returns such a dictionary based on a ClustalW or STAMP sequence alignment); or - a tuple
(sel1, sel2)
When using 2. or 3. with sel1 and sel2 then these selections can also each be a list of selection strings (to generate an AtomGroup with defined atom order as described under Ordered selections).
- any valid selection string for
- *path_select* – atom selection composing coordinates of (fitted) path; if
None
then path_select is set to ref_select [None
]
-
u_original
¶ MDAnalysis.Universe
object with a trajectory
-
u_reference
¶ MDAnalysis.Universe
object containing a reference structure
-
ref_select
¶ string, selection for
select_atoms()
to select frame fromPath.u_reference
-
path_select
¶ string, selection for
select_atoms()
to select atoms to composePath.path
-
ref_frame
¶ int, frame index to select frame from
Path.u_reference
-
u_fitted
¶ MDAnalysis.Universe
object with the fitted trajectory
-
path
¶ numpy.ndarray
object representation of the fitted trajectory
-
fit_to_reference
(filename=None, prefix='', postfix='_fit', rmsdfile=None, targetdir='.', mass_weighted=False, tol_mass=0.1)[source]¶ Align each trajectory frame to the reference structure with
MDAnalysis.analysis.align.rms_fit_trj()
.Parameters: - *filename* – file name for the RMS-fitted trajectory or pdb; defaults to the
original trajectory filename (from
Path.u_original
) with prefix prepended - *prefix* – prefix for auto-generating the new output filename
- *rmsdfile* – file name for writing the RMSD time series [
None
] - *mass_weighted* – do a mass-weighted RMSD fit
- *tol_mass* – Reject match if the atomic masses for matched atoms differ by more than tol_mass [0.1]
Returns: MDAnalysis.Universe object containing a fitted trajectory
Return type: :class
- *filename* – file name for the RMS-fitted trajectory or pdb; defaults to the
original trajectory filename (from
-
get_num_atoms
()[source]¶ Return the number of atoms used to construct the
Path
.Must run :method:`Path.to_path()` prior to calling this method.
Returns: class:Path Return type: int, the number of atoms in the
-
run
(align=False, filename=None, postfix='_fit', rmsdfile=None, targetdir='.', mass_weighted=False, tol_mass=0.1, flat=False)[source]¶ Generate a path from a trajectory and reference structure, aligning to a reference structure if specified.
This is a convenience method to generate a fitted trajectory from an inputted universe (
Path.u_original
) and reference structure (Path.u_reference
).Path.fit_to_reference()
andPath.to_path()
are used consecutively to generate a new universe (Path.u_fitted
) containing the fitted trajectory along with the correspondingPath.path
represented as annumpy.ndarray
. The method returns a tuple of the topology name and new trajectory name, which can be fed directly into anMDAnalysis.Universe
object after unpacking the tuple using the*
operator, as inMDAnalysis.Universe(*(top_name, newtraj_name))
.Parameters: - *align* – Align trajectory to atom selection
Path.ref_select
ofPath.u_reference
. IfTrue
, a universe containing an aligned trajectory is produced withPath.fit_to_reference()
[False
] - *filename* – filename for the RMS-fitted trajectory or pdb; defaults to the
original trajectory filename (from
Path.u_original
) with prefix prepended - *prefix* – prefix for auto-generating the new output filename
- *rmsdfile* – file name for writing the RMSD time series [
None
] - *mass_weighted* – do a mass-weighted RMSD fit
- *tol_mass* – Reject match if the atomic masses for matched atoms differ by more than tol_mass [0.1]
- *flat* – represent
Path.path
with 2D (\(N_p\times 3N\))numpy.ndarray
; ifFalse
thenPath.path
is a 3D (\(N_p\times N\times 3\))numpy.ndarray
[False
]
Returns: A tuple of the topology name and new trajectory name.
- *align* – Align trajectory to atom selection
-
to_path
(fitted=False, select=None, flat=False)[source]¶ Generates a coordinate time series from the fitted universe trajectory.
Given a selection of N atoms from select, the atomic positions for each frame in the fitted universe (
Path.u_fitted
) trajectory (with \(N_p\) total frames) are appended sequentially to form a 3D or 2D (if flat isTrue
)numpy.ndarray
representation of the fitted trajectory (with dimensions \(N_p\times N\times 3\) or \(N_p\times 3N\), respectively).Parameters: - *fitted* – construct a
Path.path
from thePath.u_fitted
trajectory; ifFalse
thenPath.path
is generated with the trajectory fromPath.u_original
[False
] - *select* – the selection for constructing the coordinates of each frame in
Path.path
; ifNone
thenPath.path_select
is used, else it is overridden by select [None
] - *flat* – represent
Path.path
as a 2D (\(N_p\times 3N\))numpy.ndarray
; ifFalse
thenPath.path
is a 3D (\(N_p\times N\times 3\))numpy.ndarray
[False
]
Returns: numpy.ndarray representing a time series of atomic positions of an
MDAnalysis.core.AtomGroup.AtomGroup
selection fromPath.u_fitted.trajectory
Return type: :class
- *fitted* – construct a
- *universe* –
-
class
MDAnalysis.analysis.psa.
PSAPair
(npaths, i, j)[source]¶ Generate nearest neighbor and Hausdorff pair information between a pair of paths from an all-pairs comparison generated by
PSA
.The nearest neighbors for each path of a pair of paths is generated by
PSAPair.compute_nearest_neighbors()
and stores the result in a dictionary (nearest_neighbors
): each path has anumpy.ndarray
of the frames of its nearest neighbors, and anumpy.ndarray
of its nearest neighbor distancesPSAnalysis.D
. For example, nearest_neighbors[‘frames’] is a pair ofnumpy.ndarray
, the first being the frames of the nearest neighbors of the first path, i, the second being those of the second path, j.The Hausdorff pair for the pair of paths is found by calling
find_hausdorff_pair()
(locates the nearest neighbor pair having the largest overall distance separating them), which stores the result in a dictionary (hausdorff_pair
) containing the frames (indices) of the pair along with the corresponding (Hausdorff) distance. hausdorff_pair[‘frame’] contains a pair of frames in the first path, i, and the second path, j, respectively, that correspond to the Hausdorff distance between them.New in version 0.11.
Set up a
PSAPair
for a pair of paths that are part of aPSA
comparison of npaths total paths.Each unique pair of paths compared using
PSA
is related by their nearest neighbors (and corresponding distances) and the Hausdorff pair and distance.PSAPair
is a convenience class for calculating and encapsulating nearest neighbor and Hausdorff pair information for one pair of paths.Given npaths,
PSA
performs and all-pairs comparison among all paths for a total of :math:` ext{npaths}*( ext{npaths}-1)/2` unique comparisons. If distances between paths are computed, the all-pairs comparison can be summarized in a symmetric distance matrix whose upper triangle can be mapped to a corresponding distance vector form in a one-to-one manner. A particular comparison of a pair of paths in a given instance ofPSAPair
is thus unique identified by the row and column indices in the distance matrix representation (whether or not distances are actually computed), or a single ID (index) in the corresponding distance vector.Parameters: - *npaths* – int, total number of paths in
PSA
used to generate thisPSAPair
- *i* – int, row index (starting at 0) of the distance matrix
- *j* – int, column index (starting at 0) of the distance matrix
-
matrix_id
¶ (int, int), (row, column) indices of the location of this
PSAPair
in the corresponding pairwise distance matrix
-
pair_id
¶ int, ID of this
PSAPair
(the pair_id:math:^text{th} comparison) in the distance vector corresponding to the pairwise distance matrix
- *npaths* – int, total number of paths in
-
class
MDAnalysis.analysis.psa.
PSAnalysis
(universes, reference=None, ref_select='name CA', ref_frame=0, path_select=None, labels=None, targetdir=None)[source]¶ Perform Path Similarity Analysis (PSA) on a set of trajectories.
The analysis is performed with
PSAnalysis.run()
and stores the result in thenumpy.ndarray
distance matrixPSAnalysis.D
.PSAnalysis.run()
also generates a fitted trajectory and path from alignment of the original trajectories to a reference structure.New in version 0.8.
Setting up Path Similarity Analysis.
The mutual similarity between all unique pairs of trajectories are computed using a selected path metric.
Parameters: - *universes* – a list of universes (
MDAnalysis.Universe
object), each containing a trajectory - *reference* – reference coordinates;
MDAnalysis.Universe
object; ifNone
the first time step of the first item in trajs is used [None
] - *ref_select* –
The selection to operate on; can be one of:
- any valid selection string for
select_atoms()
that produces identical selections in mobile and reference; or - a dictionary
{'mobile':sel1, 'reference':sel2}
(theMDAnalysis.analysis.align.fasta2select()
function returns such a dictionary based on a ClustalW or STAMP sequence alignment); or - a tuple
(sel1, sel2)
When using 2. or 3. with sel1 and sel2 then these selections can also each be a list of selection strings (to generate an AtomGroup with defined atom order as described under Ordered selections).
- any valid selection string for
- *mass_weighted* – do a mass-weighted RMSD fit [
False
] - *tol_mass* – Reject match if the atomic masses for matched atoms differ by more than tol_mass [0.1]
- *ref_frame* – frame index to select frame from reference [0]
- *path_select* – atom selection composing coordinates of (fitted) path; if
None
then path_select is set to ref_select [None
] - *targetdir* – output files are saved there [.]
- *labels* – list of strings, names of trajectories to be analyzed
(
MDAnalysis.Universe
); ifNone
, defaults to trajectory names [None
]
-
universes
¶ list of
MDAnalysis.Universe
objects containing trajectories
-
u_reference
¶ MDAnalysis.Universe
object containing a reference structure
-
ref_select
¶ string, selection for
select_atoms()
to select frame fromPSAnalysis.u_reference
-
path_select
¶ string, selection for
select_atoms()
to select atoms to composePath.path
-
ref_frame
¶ int, frame index to select frame from
Path.u_reference
-
filename
¶ string, name of file to store calculated distance matrix (
PSAnalysis.D
)
-
paths
¶ list of
numpy.ndarray
objects representing the set/ensemble of fitted trajectories
-
D
¶ string, name of file to store calculated distance matrix (
PSAnalysis.D
)
-
cluster
(distArray, method='ward', count_sort=False, distance_sort=False, no_plot=False, no_labels=True, color_threshold=4)[source]¶ Cluster trajectories and optionally plot the dendrogram.
Parameters: - *method* – string, name of linkage criterion for clustering [
'ward'
] - *no_plot* – boolean, if
True
, do not render the dendrogram [False
] - *no_labels* – boolean, if
True
then do not label dendrogram [True
] - *color_threshold* – For brevity, let t be the color_threshold. Colors all the
descendent links below a cluster node k the same color if k is
the first node below the cut threshold t. All links connecting
nodes with distances greater than or equal to the threshold are
colored blue. If t is less than or equal to zero, all nodes are
colored blue. If color_threshold is None or ‘default’,
corresponding with MATLAB(TM) behavior, the threshold is set to
0.7*max(Z[:,2]). [
4
]]
Returns: list of indices representing the row-wise order of the objects after clustering
- *method* – string, name of linkage criterion for clustering [
-
generate_paths
(**kwargs)[source]¶ Generate paths, aligning each to reference structure if necessary.
Keywords: - align
Align trajectories to atom selection
PSAnalysis.ref_select
ofPSAnalysis.u_reference
[False
]- filename
strings representing base filename for fitted trajectories and paths [
None
]- infix
additional tag string that is inserted into the output filename of the fitted trajectory files [‘’]
- mass_weighted
do a mass-weighted RMSD fit
- tol_mass
Reject match if the atomic masses for matched atoms differ by more than tol_mass
- ref_frame
frame index to select frame from reference
- flat
represent
Path.path
as a 2D (\(N_p\times 3N\))numpy.ndarray
; ifFalse
thenPath.path
is a 3D (\(N_p\times N\times 3\))numpy.ndarray
[False
]- save
boolean; if
True
, pickle list of names for fitted trajectories [True
]- store
boolean; if
True
then writes each path (numpy.ndarray
) inPSAnalysis.paths
to compressed npz (numpy) files [False
]
The fitted trajectories are written to new files in the “/trj_fit” subdirectory in
PSAnalysis.targetdir
named “filename(trajectory)XXX*infix*_psa”, where “XXX” is a number between 000 and 999; the extension of each file is the same as its original. Optionally, the trajectories can also be saved in numpy compressed npz format in the “/paths” subdirectory inPSAnalysis.targetdir
for persistence and can be accessed as the attributePSAnalysis.paths
.
-
get_num_atoms
()[source]¶ Return the number of atoms used to construct the
Path`s in :class:`PSA
.Must run :method:`PSAnalysis.generate_paths()` prior to calling this method.
Returns: class:PSA‘s :class:`Path`s’ Return type: int, the number of atoms in
-
get_num_paths
()[source]¶ Return the number of paths in
PSA
.Must run :method:`PSAnalysis.generate_paths()` prior to calling this method.
Returns: class:PSA Return type: int, the number of paths in
-
get_pairwise_distances
(vectorform=False)[source]¶ Return the distance matrix (or vector) of pairwise path distances.
Must run :method:`PSAnalysis.run(store=True)` prior to calling this method.
Parameters: *vectorform* – boolean, if True
, return the distance vector instead [False
]Returns: numpy.ndarray representation of the distance matrix (or vector) Return type: :class
-
get_paths
()[source]¶ Return the paths in
PSA
.Must run :method:`PSAnalysis.generate_paths()` prior to calling this method.
Returns: class:numpy.ndarray representations of paths in PSA
Return type: list of
-
hausdorff_pairs
¶ Get the Hausdorff pair for each (unique) pairs of paths.
This method returns a list of Hausdorff pair information, where each element is a dictionary containing the pair of frames and the (Hausdorff) distance between a pair of paths. See :method:`PSAnalysis.psa_pairs` and
PSAPair.hausdorff_pair
for more information about accessing Hausdorff pair data.Must run :method:`PSAnalysis.run_pairs_analysis(hausdorff_pairs=True)` prior to calling this method.
Returns: list of all Hausdorff pairs (in distance vector order)
-
nearest_neighbors
¶ Get the nearest neighbors for each (unique) pair of paths.
This method returns a list of nearest neighbor information, where each element is a dictionary containing the nearest neighbor frames and distances between a pair of paths. See :method:`PSAnalysis.psa_pairs` and
PSAPair.nearest_neighbors
for more information about accessing nearest neighbor data.Must run :method:`PSAnalysis.run_pairs_analysis(neighbors=True)` prior to calling this method.
Returns: list of all nearest neighbors (in distance vector order)
-
plot
(filename=None, linkage='ward', count_sort=False, distance_sort=False, figsize=4.5, labelsize=12)[source]¶ Plot a clustered distance matrix using method linkage along with the corresponding dendrogram. Rows (and columns) are identified using the list of strings specified by
PSAnalysis.labels
.Parameters: - *filename* – string, save figure to filename [
None
] - *linkage* – string, name of linkage criterion for clustering [
'ward'
] - *count_sort* – boolean, see scipy.cluster.hierarchy.dendrogram [
False
] - *distance_sort* – boolean, see scipy.cluster.hierarchy.dendrogram [
False
] - *figsize* – set the vertical size of plot in inches [
4.5
] - *labelsize* – set the font size for colorbar labels; font size for path labels on
dendrogram default to 3 points smaller [
12
]
If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, ...). All other keyword arguments are passed on to
pylab.imshow()
.- *filename* – string, save figure to filename [
-
plot_annotated_heatmap
(filename=None, linkage='ward', count_sort=False, distance_sort=False, figsize=8, annot_size=6.5)[source]¶ Plot a clustered distance matrix using method linkage with annotated distances in the matrix. Rows (and columns) are identified using the list of strings specified by
PSAnalysis.labels
.Parameters: - *filename* – string, save figure to filename [
None
] - *count_sort* – boolean, see scipy.cluster.hierarchy.dendrogram [
False
] - *distance_sort* – boolean, see scipy.cluster.hierarchy.dendrogram [
False
] - *linkage* – string, name of linkage criterion for clustering [
'ward'
] - *figsize* – set the vertical size of plot in inches [
8
] - *annot_size* – float, font size of annotation labels on heat map [
6.5
]
If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, ...). All other keyword arguments are passed on to
pylab.imshow()
.- *filename* – string, save figure to filename [
-
plot_nearest_neighbors
(filename=None, idx=0, labels=('Path 1', 'Path 2'), figsize=4.5, multiplot=False, aspect_ratio=1.75, labelsize=12)[source]¶ Plot nearest neighbor distances as a function of normalized frame number (mapped to the interval [0,1]).
Parameters: - *filename* – string, save figure to filename [
None
] - *idx* – integer, index of path (pair) comparison to plot [
0
] - *labels* – (string, string), pair of names to label nearest neighbor distance
curves [
('Path 1', 'Path 2')
] - *figsize* – float, set the vertical size of plot in inches [
4.5
] - *multiplot* – boolean, set to
True
to enable plotting multiple nearest neighbor distances on the same figure [False
] - *aspect_ratio* – float, set the ratio of width to height of the plot [
1.75
] - *labelsize* – set the font size for colorbar labels; font size for path labels on
dendrogram default to 3 points smaller [
12
]
If filename is supplied then the figure is also written to file (the suffix determines the file type, e.g. pdf, png, eps, ...). All other keyword arguments are passed on to
pylab.imshow()
.- *filename* – string, save figure to filename [
-
psa_pairs
¶ Get the list of :class:`PSAPair`s for each pair of paths.
:method:`psa_pairs` is a list of
PSAPair
whose elements are pairs of paths that have been compared using :method:`PSAnalysis.run_pairs_analysis()`. EachPSAPair
contains nearest neighbor and Hausdorff pair information specific to a pair of paths. The nearest neighbor frames and distances for aPSAPair
can be accessed in the nearest neighbor dictionary using the keys ‘frames’ and ‘distances’, respectively. E.g.,PSAPair.nearest_neighbors['distances']
returns a pair ofnumpy.ndarray
corresponding to the nearest neighbor distances for each path. Similarly, Hausdorff pair information can be accessed usingPSAPair.hausdorff_pair
with the keys ‘frames’ and ‘distance’.Must run :method:`PSAnalysis.run_pairs_analysis()` prior to calling this method.
Returns: class:PSAPair objects (in distance vector order) Return type: list of all
-
run
(**kwargs)[source]¶ Perform path similarity analysis on the trajectories to compute the distance matrix.
A number of parameters can be changed from the defaults. The result is stored as the array
PSAnalysis.D
.Keywords: - metric
selection string specifying the path metric to measure pairwise distances among
PSAnalysis.paths
['hausdorff'
]- start, stop, step
start and stop frame index with step size: analyze
trajectory[start:stop:step]
[None
]- store
boolean; if
True
then writesPSAnalysis.D
to text and compressed npz (numpy) files [True
]- filename
string, filename to save
PSAnalysis.D
-
run_pairs_analysis
(**kwargs)[source]¶ Perform PSA Hausdorff (nearest neighbor) pairs analysis on all unique pairs of paths in
PSAnalysis.paths
.Partial results can be stored in separate lists, where each list is indexed according to distance vector convention (i.e., element (i,j) in distance matrix representation corresponds to element \(s=N*i+j-(i+1)*(i+2)\) in distance vector representation, which is the \(s^ ext{th}\) comparison). For each unique pair of paths, the nearest neighbors for that pair can be stored in
NN
and the Hausdorff pair inHP
.PP
stores the full information of Hausdorff pairs analysis that is available for each pair of path, including nearest neighbors lists and the Hausdorff pairs.Keywords: - start, stop, step
start and stop frame index with step size: analyze
trajectory[start:stop:step]
[None
]- neighbors
boolean; if
True
, then stores dictionary of nearest neighbor frames/distances inPSAnalysis.NN
[False
]- hausdorff_pairs
boolean; if
True
, then stores dictionary of Hausdorff pair frames/distances inPSAnalysis.HP
[False
]
-
save_paths
(filename=None)[source]¶ Save fitted
PSAnalysis.paths
to numpy compressed npz files.Parameters: *filename* – string, specifies filename [ None
]The data are saved with
numpy.savez_compressed()
in the directory specified byPSAnalysis.targetdir
.
-
save_result
(filename=None)[source]¶ Save distance matrix
PSAnalysis.D
to a numpy compressed npz file and text file.Parameters: *filename* – string, specifies filename [ None
]The data are saved with
numpy.savez_compressed()
andnumpy.savetxt()
in the directory specified byPSAnalysis.targetdir
.
- *universes* – a list of universes (