cnnclustering - A Python module for common-nearest-neighbour clustering
cluster
The functionality of this module is primarily exposed and bundled by the cnnclustering.cluster.Clustering class. Hierarchical clusterings additionally use cnnclustering.cluster.ClusteringChild.
-
class cnnclustering.cluster.Clustering(input_data=None, neighbours_getter=None, neighbours=None, neighbour_neighbours=None, metric=None, similarity_checker=None, queue=None, fitter=None, predictor=None, labels=None)
-
evaluate(self, ax=None, clusters: Optional[Container[int]] = None, original: bool = False, plot_style: str = 'dots', parts: Optional[Tuple[Optional[int]]] = None, points: Optional[Tuple[Optional[int]]] = None, dim: Optional[Tuple[int, int]] = None, mask: Optional[Sequence[Union[bool, int]]] = None, ax_props: Optional[dict] = None, annotate: bool = True, annotate_pos: str = 'mean', annotate_props: Optional[dict] = None, plot_props: Optional[dict] = None, plot_noise_props: Optional[dict] = None, hist_props: Optional[dict] = None, free_energy: bool = True)
Returns a 2D plot of an original data set or a cluster result
- Args:
- ax:
The Axes instance to which to add the plot. If None, a new Figure with Axes will be created.
- clusters:
Cluster numbers to include in the plot. If None, consider all.
- original:
If True, plot the original data instead of a cluster result. Overrides clusters. Treated as True if no cluster result is present.
- plot_style:
The kind of plotting method to use:
“dots”: ax.plot()
“scatter”: ax.scatter()
“contour”: ax.contour()
“contourf”: ax.contourf()
- parts:
Use a slice (start, stop, stride) on the data parts before plotting. Will be applied before a slice on points.
- points:
Use a slice (start, stop, stride) on the data points before plotting.
- dim:
Use these two dimensions for plotting. If None, uses (0, 1).
- mask:
Sequence of boolean or integer values used for optional fancy indexing on the point data array. Note that this is applied after regular slicing (e.g. via points) and requires a copy of the indexed data, which may be slow and memory intensive for big data sets.
- annotate:
If there is a cluster result, plot the cluster numbers. Uses annotate_pos to determine the position of the annotations.
- annotate_pos:
Where to put the cluster number annotation. Can be one of:
“mean”, Use the cluster mean
“random”, Use a random point of the cluster
Alternatively a list of x, y positions can be passed to set a specific point for each cluster (Not yet implemented)
- annotate_props:
Dictionary of keyword arguments passed to ax.annotate().
- ax_props:
Dictionary of ax properties to apply after plotting via ax.set(**ax_props). If None, uses defaults that can also be defined in the configuration file (not yet implemented).
- plot_props:
Dictionary of keyword arguments passed to various functions (plot.plot_dots() etc.), with different meanings, to format cluster plotting. If None, uses defaults that can also be defined in the configuration file (not yet implemented).
- plot_noise_props:
Like plot_props but for formatting noise point plotting.
- hist_props:
Dictionary of keyword arguments passed to functions that involve the computing of a histogram via numpy.histogram2d.
- free_energy:
If True, converts computed histograms to pseudo free energy surfaces.
- Returns
Figure, Axes and a list of plotted elements
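The order in which points and mask are applied can be illustrated with a minimal sketch. It is not the module's implementation: plain Python lists stand in for the point data array, and the helper name select_points is hypothetical. It only shows the documented order, a (start, stop, stride) slice first and boolean or integer indexing on the sliced result afterwards.

```python
def select_points(data, points=None, mask=None):
    """Hypothetical helper: slice first, then apply an optional mask."""
    # A (start, stop, stride) tuple maps directly onto a Python slice.
    sliced = data[slice(*points)] if points is not None else list(data)
    if mask is None:
        return sliced

    mask = list(mask)
    if mask and all(isinstance(m, bool) for m in mask):
        # Boolean mask: one flag per point that survived the slice.
        if len(mask) != len(sliced):
            raise ValueError("boolean mask must match the sliced data length")
        return [p for p, keep in zip(sliced, mask) if keep]

    # Integer mask: positions refer to the sliced data, not the original.
    return [sliced[i] for i in mask]


data = list(range(10))                      # stand-in for ten data points
every_second = select_points(data, points=(0, None, 2))
masked_bool = select_points(
    data, points=(0, None, 2), mask=[True, False, True, False, True]
)
masked_int = select_points(data, points=(0, None, 2), mask=[0, 3])
```

Because the mask indexes the already sliced data, the integer mask [0, 3] picks the first and fourth of the remaining points, not points 0 and 3 of the original data set.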
-
fit(self, radius_cutoff: float, cnn_cutoff: int, member_cutoff: int = None, max_clusters: int = None, cnn_offset: int = None, sort_by_size: bool = True, info: bool = True, record: bool = True, record_time: bool = True, v: bool = True, purge: bool = False) → None
Execute clustering procedure
- Parameters
radius_cutoff – Neighbour search radius.
cnn_cutoff – Similarity criterion.
member_cutoff – Valid clusters need to have at least this many members. Passed on to Labels.sort_by_size() if sort_by_size is True. Otherwise it has no effect and valid clusters have at least one member.
max_clusters – Keep only the largest max_clusters clusters. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise.
cnn_offset – Exists for compatibility reasons and is subtracted from cnn_cutoff. If cnn_offset = 0, two points need to share at least cnn_cutoff neighbours, not counting either of the two points, to be part of the same cluster. Former versions of the clustering included self-counting, so cnn_cutoff = 2 in those versions is equivalent to cnn_cutoff = 0 in this version.
sort_by_size – Whether to sort (and trim) the created Labels instance. See also Labels.sort_by_size().
info – Whether to modify Labels.meta information for this clustering.
record – Whether to create a Record instance for this clustering, which is appended to the Summary.
record_time – Whether to time clustering execution.
v – Be chatty.
purge – If True, force reinitialisation of cluster label assignments.
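The effect of member_cutoff and max_clusters when sort_by_size is True can be sketched as follows. This is an illustration under stated assumptions, not the code of Labels.sort_by_size(): the function name relabel_by_size is hypothetical, and the sketch assumes that 0 marks noise, that clusters are renumbered from 1 in order of decreasing size, and that discarded clusters become noise.

```python
from collections import Counter


def relabel_by_size(labels, member_cutoff=None, max_clusters=None, noise=0):
    """Hypothetical sketch: renumber clusters by size and trim small ones."""
    if member_cutoff is None:
        member_cutoff = 1  # every non-empty cluster is valid

    # Count members per cluster, ignoring noise (assumed to be label 0).
    counts = Counter(label for label in labels if label != noise)

    # Largest cluster first; ties are broken by the old label for determinism.
    ranked = [
        label
        for label, size in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        if size >= member_cutoff
    ]
    if max_clusters is not None:
        ranked = ranked[:max_clusters]

    # Surviving clusters get new labels 1, 2, ...; everything else is noise.
    mapping = {old: new for new, old in enumerate(ranked, start=1)}
    return [mapping.get(label, noise) for label in labels]


labels = [1, 1, 2, 2, 2, 3, 0, 4, 4, 4, 4]
sorted_only = relabel_by_size(labels)
trimmed = relabel_by_size(labels, member_cutoff=2, max_clusters=2)
```

In this example the cluster with four members becomes cluster 1. With member_cutoff=2 and max_clusters=2, the single-member cluster and the third-largest cluster are both reassigned to noise.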
-
property hierarchy_level
-
property input_data
-
isolate(self, purge: bool = True)
Split input data into children based on cluster labels
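The idea of splitting data by cluster label can be sketched in a few lines. This is not what isolate does internally (the method creates child clusterings, and the handling of noise points is not described here). The helper name split_by_label is hypothetical, and the sketch simply keeps every label, including 0, as its own group.

```python
from collections import defaultdict


def split_by_label(points, labels):
    """Hypothetical sketch: group points by their cluster label."""
    if len(points) != len(labels):
        raise ValueError("need exactly one label per point")

    children = defaultdict(list)
    for point, label in zip(points, labels):
        children[label].append(point)
    return dict(children)


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (9.0, 9.0)]
labels = [1, 1, 2, 2, 0]
children = split_by_label(points, labels)
```

Each entry of children holds the subset of the input data that a child of the corresponding cluster would work on in a hierarchical clustering.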
-
property labels
-
pie(self, ax=None, pie_props=None)
-
predict(self, other: Type['Clustering'], radius_cutoff: float, cnn_cutoff: int, clusters: Optional[Sequence[int]] = None, cnn_offset: Optional[int] = None, info: bool = True, record: bool = True, record_time: bool = True, v: bool = True, purge: bool = False)
Execute prediction procedure
- Parameters
other – cnnclustering.cluster.Clustering instance for which cluster labels should be predicted.
radius_cutoff – Neighbour search radius.
cnn_cutoff – Similarity criterion.
clusters – Sequence of cluster labels that should be included in the prediction.
cnn_offset – Exists for compatibility reasons and is subtracted from cnn_cutoff. If cnn_offset = 0, two points need to share at least cnn_cutoff neighbours, not counting either of the two points, to be part of the same cluster. Former versions of the clustering included self-counting, so cnn_cutoff = 2 in those versions is equivalent to cnn_cutoff = 0 in this version.
purge – If True, force re-initialisation of predicted cluster labels.
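The similarity criterion controlled by radius_cutoff, cnn_cutoff and cnn_offset can be made concrete with a brute-force sketch. It is an illustration only, not the module's neighbour search or similarity checker. The function names are hypothetical, Euclidean distance and an inclusive radius are assumptions, and any further conditions the real implementation may impose are left out.

```python
import math


def neighbours_of(points, index, radius_cutoff):
    """Indices of all points within radius_cutoff of points[index]."""
    centre = points[index]
    return {
        i
        for i, point in enumerate(points)
        # The point itself is never counted as its own neighbour.
        if i != index and math.dist(centre, point) <= radius_cutoff
    }


def share_enough_neighbours(points, a, b, radius_cutoff, cnn_cutoff, cnn_offset=0):
    """Hypothetical check: do points a and b share enough common neighbours?"""
    # cnn_offset is subtracted from cnn_cutoff, as described above.
    required = cnn_cutoff - cnn_offset
    common = neighbours_of(points, a, radius_cutoff) & neighbours_of(
        points, b, radius_cutoff
    )
    # Neither of the two points counts towards the shared neighbours.
    common -= {a, b}
    return len(common) >= required


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
similar = share_enough_neighbours(points, 0, 1, radius_cutoff=1.5, cnn_cutoff=2)
too_strict = share_enough_neighbours(points, 0, 1, radius_cutoff=1.5, cnn_cutoff=3)
with_offset = share_enough_neighbours(
    points, 0, 1, radius_cutoff=1.5, cnn_cutoff=4, cnn_offset=2
)
```

Points 0 and 1 share exactly two neighbours here (points 2 and 3), so the check passes for cnn_cutoff=2, fails for cnn_cutoff=3, and passes again for cnn_cutoff=4 combined with cnn_offset=2, which mirrors the compatibility note on self-counting.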
-
reel(self, depth: Optional[int] = None) → None
Wrap up label assignments of lower hierarchy levels
- Parameters
depth – How many lower levels to consider. If None, consider all.
-
summarize(self, ax=None, quantity: str = 'execution_time', treat_nan: Optional[Any] = None, ax_props: Optional[dict] = None, contour_props: Optional[dict] = None)
Generate a 2D plot of record values
Record values (“time”, “clusters”, “largest”, “noise”) are plotted against cluster parameters (radius cutoff r and cnn cutoff c).
- Parameters
ax – Matplotlib Axes to plot on. If None, a new Figure with Axes will be created.
quantity – Record value to visualise: “time”, “clusters”, “largest”, or “noise”.
treat_nan – If not None, use this value to pad nan-values.
ax_props – Used to style ax.
contour_props – Passed on to contour.
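How record values relate to the two cluster parameters can be pictured as a grid with one axis per parameter. The following sketch is not the module's code: the function name record_grid and the dictionary keys "r" and "c" are hypothetical stand-ins for however records are stored. It only illustrates arranging one value per (r, c) pair and padding missing combinations the way treat_nan is described.

```python
import math


def record_grid(records, quantity, treat_nan=None):
    """Hypothetical sketch: arrange one record value per (r, c) pair."""
    radii = sorted({record["r"] for record in records})
    cutoffs = sorted({record["c"] for record in records})
    lookup = {(record["r"], record["c"]): record[quantity] for record in records}

    grid = []
    for c in cutoffs:            # one row per cnn cutoff
        row = []
        for r in radii:          # one column per radius cutoff
            value = lookup.get((r, c), math.nan)
            is_nan = isinstance(value, float) and math.isnan(value)
            if is_nan and treat_nan is not None:
                value = treat_nan  # pad parameter pairs without a value
            row.append(value)
        grid.append(row)
    return radii, cutoffs, grid


records = [
    {"r": 1.0, "c": 2, "clusters": 3},
    {"r": 2.0, "c": 2, "clusters": 1},
    {"r": 1.0, "c": 5, "clusters": 0},
]
radii, cutoffs, grid = record_grid(records, "clusters", treat_nan=0)
_, _, raw_grid = record_grid(records, "clusters")
```

A grid like this is the kind of input a contour plot expects, and padding nan-values avoids holes where a parameter combination has not been run.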
-
property summary
-