5.1. clustering

Functions to cluster seismograms by a range of constraints.

copyright:Calum Chamberlain, Chet Hopp.
license:GNU Lesser General Public License, Version 3 (https://www.gnu.org/copyleft/lesser.html)
clustering.SVD(stream_list)[source]

Function to compute the SVD of a number of templates and return the singular vectors and singular values of the templates.

Parameters:stream_list (List of Obspy.Stream) – List of the templates to be analysed
Returns:SVector(list of ndarray), SValues(list) for each channel, Uvalues(list of ndarray) for each channel, stachans, List of String (station.channel)

Note

We recommend that you align the data before computing the SVD, e.g., the P-arrival on all templates for the same channel should appear at the same time in the trace. See the stacking.align_traces function for a way to do this.

clustering.SVD_2_stream(SVectors, stachans, k, sampling_rate)[source]

Function to convert the singular vectors output by SVD to streams, one for each singular vector level, for all channels.

Parameters:
  • SVectors (List of np.ndarray) – Singular vectors
  • stachans (List of Strings) – List of station.channel Strings
  • k (int) – Number of streams to return = number of SV’s to include
  • sampling_rate (float) – Sampling rate in Hz
Returns:

SVstreams, List of Obspy.Stream, with SVStreams[0] being composed of the highest rank singular vectors.

clustering.cluster(template_list, show=True, corr_thresh=0.3, save_corrmat=False, cores=u'all', debug=1)[source]

Function to take a set of templates and cluster them, will return groups as lists of streams. Clustering is done by computing the cross-channel correlation sum of each stream in stream_list with every other stream in the list. Scipy.cluster.hierachy functions are then used to compute the complete distance matrix, where distance is 1 minus the normalised cross-correlation sum such that larger distances are less similar events. Groups are then created by clustering the distance matrix at distances less than 1 - corr_thresh.

Will compute the distance matrix in parallel, using all available cores

Parameters:
  • stream_list – List of templates to compute clustering for
  • show (bool) – plot linkage on screen if True, defaults to True
  • corr_thresh (float) – Cross-channel correlation threshold for grouping
  • save_corrmat (bool) – If True will save the distance matrix to dist_mat.npy in the local directory.
  • cores (int) – numebr of cores to use when computing the distance matrix, defaults to ‘all’ which will work out how many cpus are available and hog them.
  • debug (int) – Level of debugging from 1-5, higher is more output, currently only level 1 implimented.
Returns:

List of groups with each group a list of streams making up that group.

clustering.corr_cluster(trace_list, thresh=0.9)[source]

Group traces based on correlations above threshold with the stack - will run twice, once with a lower threshold, then again with your threshold to remove large outliers.

Parameters:
  • trace_list (list of obspy.Trace) – Traces to compute similarity between
  • thresh (float) – Correlation threshold between -1-1
Returns:

np.ndarray of bool

Note

We recommend that you align the data before computing the clutsering, e.g., the P-arrival on all templates for the same channel should appear at the same time in the trace. See the stacking.align_traces function for a way to do this

clustering.cross_chan_coherence(st1, st2, i=0)[source]

Function to determine the cross-channel coherancy between two streams of multichannel seismic data.

Parameters:
  • st1 (obspy Stream) – Stream one
  • st2 (obspy Stream) – Stream two
  • i (int) – index used for parallel async processing, returned unaltered
Returns:

cross channel coherence, float - normalized by number of channels, if i, returns tuple of (cccoh, i) where i is int, as intput.

clustering.dist_mat_km(catalog)[source]

Function to compute the distance matrix for all events in a catalog - will give physical distance in kilometers.

Parameters:catalog (List of obspy.Catalog) – Catalog for which to compute the distance matrix
Returns:ndarray - distance matrix
clustering.distance_matrix(stream_list, cores=1)[source]

Function to compute the distance matrix for all templates - will give distance as 1-abs(cccoh), e.g. a well correlated pair of templates will have small distances, and an equally well correlated reverse image will have the same distance as a positively correlated image - this is an issue.

Parameters:
  • stream_list (List of obspy.Streams) – List of the streams to compute the distance matrix for
  • cores (int) – Number of cores to parallel process using, defaults to 1.
Returns:

ndarray - distance matrix

clustering.empirical_SVD(stream_list, linear=True)[source]

Empirical subspace detector generation function. Takes a list of templates and computes the stack as the first order subspace detector, and the differential of this as the second order subspace detector following the emprical subspace method of Barrett & Beroza, 2014 - SRL.

Parameters:
  • stream_list (list of stream) – list of template streams to compute the subspace detectors from
  • linear (bool) – Set to true by default to compute the linear stack as the first subspace vector, False will use the phase-weighted stack as the first subspace vector.
Returns:

list of two streams

clustering.extract_detections(detections, templates, contbase_list, extract_len=90.0, outdir=None, extract_Z=True, additional_stations=[])[source]

Function to extract the waveforms associated with each detection in a list of detections for the template, template. Waveforms will be returned as a list of obspy.Streams containing segments of extract_len. They will also be saved if outdir is set. The default is unset. The default extract_len is 90 seconds per channel.

Parameters:
  • detections (list tuple of of :class: datetime.datetime, string) – List of datetime objects, and their associated template name.
  • templates (list of tuple of string and :class: obspy.Stream) – A list of the tuples of the template name and the template Stream used to detect detections.
  • contbase_list (list of tuple of string) – List of tuples of the form [‘path’, ‘type’, ‘network’] Where path is the path to the continuous database, type is the directory structure, which can be either Yyyyy/Rjjj.01, which is the standard IRIS Year, julian day structure, or, yyyymmdd which is a single directory for every day.
  • extract_len (float) – Length to extract around the detection (will be equally cut around the detection time) in seconds. Default is 90.0.
  • outdir (bool or str) – Default is None, with None set, no files will be saved, if set each detection will be saved into this directory with files named according to the detection time, NOT than the waveform start time. Detections will be saved into template subdirectories.
  • extract_Z (bool) – Set to True to also extract Z channels for detections delays will be the same as horizontal channels, only applies if only horizontal channels were used in the template.
  • additional_stations (list of tuple) – List of stations, chanels and networks to also extract data for using an average delay.
Returns:

list of :class: obspy.Stream

clustering.group_delays(stream_list)[source]

Function to group template waveforms according to their delays.

Parameters:stream_list (list of obspy.Stream) – List of the waveforms you want to group
Returns:list of List of obspy.Streams where each initial list is a group with the same delays.
clustering.re_thresh_csv(path, old_thresh, new_thresh, chan_thresh)[source]

Function to remove detections by changing the threshold, can only be done to remove detection by increasing threshold, threshold lowering will have no effect.

Parameters:
  • path (str) – Path to the .csv detection file
  • old_thresh (float) – Old threshold MAD multiplier
  • new_thresh (float) – New threhsold MAD multiplier
  • chan_thresh (int) – Minimum number of channels for a detection

returns: List of detections

clustering.space_cluster(catalog, d_thresh, show=True)[source]

Function to cluster a catalog by distance only - will compute the matrix of physical distances between events and utilize the scipy.clusering.hierarchy module to perform the clustering.

Parameters:
  • catalog (obspy.Catalog) – Catalog of events to clustered
  • d_thresh (float) – Maximum inter-event distance threshold
Returns:

list of Catalog classes

clustering.space_time_cluster(detections, t_thresh, d_thresh)[source]

Function to cluster detections in space and time, use to seperate repeaters from other events.

Parameters:
  • detections (list) – List of tuple of tuple of location (lat, lon, depth (km)), and time as a datetime object
  • t_thresh (float) – Maximum inter-event time threshold in seconds
  • d_thresh (float) – Maximum inter-event distance in km
Returns:

List of tuple (detections, clustered) and list of indices of clustered detections