cnnclustering - A Python module for common-nearest-neighbour clustering

cluster

The functionality of this module is exposed primarily through the cnnclustering.cluster.Clustering class. Hierarchical clusterings additionally use cnnclustering.cluster.ClusteringChild.

class cnnclustering.cluster.Clustering(input_data=None, neighbours_getter=None, neighbours=None, neighbour_neighbours=None, metric=None, similarity_checker=None, queue=None, fitter=None, predictor=None, labels=None)
evaluate(self, ax=None, clusters: Optional[Container[int]] = None, original: bool = False, unicode plot_style: str = u'dots', parts: Optional[Tuple[Optional[int]]] = None, points: Optional[Tuple[Optional[int]]] = None, dim: Optional[Tuple[int, int]] = None, mask: Optional[Sequence[Union[bool, int]]] = None, ax_props: Optional[dict] = None, annotate: bool = True, unicode annotate_pos: str = u'mean', annotate_props: Optional[dict] = None, plot_props: Optional[dict] = None, plot_noise_props: Optional[dict] = None, hist_props: Optional[dict] = None, free_energy: bool = True)

Returns a 2D plot of an original data set or a cluster result

Args:

ax:

The Axes instance to which to add the plot. If None, a new Figure with Axes will be created.

clusters:

Cluster numbers to include in the plot. If None, consider all.

original:

Plot the original data instead of a cluster result. Overrides clusters. Treated as True if no cluster result is present.

plot_style:

The kind of plotting method to use.

  • “dots”, ax.plot()

  • “scatter”, ax.scatter()

  • “contour”, ax.contour()

  • “contourf”, ax.contourf()

parts:

Use a slice (start, stop, stride) on the data parts before plotting. Will be applied before a slice on points.

points:

Use a slice (start, stop, stride) on the data points before plotting.

dim:

Use these two dimensions for plotting. If None, uses (0, 1).

mask:

Sequence of boolean or integer values used for optional fancy indexing on the point data array. Note that this is applied after regular slicing (e.g. via points) and requires a copy of the indexed data, which may be slow and memory intensive for big data sets.

annotate:

If there is a cluster result, plot the cluster numbers. Uses annotate_pos to determine the position of the annotations.

annotate_pos:

Where to put the cluster number annotation. Can be one of:

  • “mean”, Use the cluster mean

  • “random”, Use a random point of the cluster

Alternatively, a list of (x, y) positions can be passed to set a specific point for each cluster (not yet implemented).

annotate_props:

Dictionary of keyword arguments passed to ax.annotate().

ax_props:

Dictionary of ax properties to apply after plotting via ax.set(**ax_props). If None, uses defaults that can also be defined in the configuration file (not yet implemented).

plot_props:

Dictionary of keyword arguments used to format cluster plotting. It is passed to various functions (plot.plot_dots() etc.), where the entries can have different meanings. If None, uses defaults that can also be defined in the configuration file (not yet implemented).

plot_noise_props:

Like plot_props but for formatting noise point plotting.

hist_props:

Dictionary of keyword arguments passed to functions that involve the computing of a histogram via numpy.histogram2d.

free_energy:

If True, converts computed histograms to pseudo free energy surfaces.

Returns

Figure, Axes and a list of plotted elements
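The order in which points and mask are applied can be illustrated with plain Python lists. This is only a minimal sketch of the documented behaviour (regular slicing first, then fancy indexing on a copy), not the module's implementation; the helper names as_slice and select_points are hypothetical and do not exist in cnnclustering.

```python
from typing import Optional, Sequence, Tuple, Union


def as_slice(spec: Optional[Tuple[Optional[int], ...]]) -> slice:
    """Turn a (start, stop, stride) tuple into a slice; None selects everything."""
    if spec is None:
        return slice(None)
    return slice(*spec)


def select_points(
    points: Sequence,
    points_spec: Optional[Tuple[Optional[int], ...]] = None,
    mask: Optional[Sequence[Union[bool, int]]] = None,
) -> list:
    """Hypothetical helper: apply a regular slice, then an optional mask."""
    # Regular slicing comes first.
    selected = list(points[as_slice(points_spec)])
    if mask is None:
        return selected
    # Boolean mask: keep the entries flagged True. bool is tested explicitly
    # because bool is a subclass of int in Python.
    if all(isinstance(m, bool) for m in mask):
        return [p for p, keep in zip(selected, mask) if keep]
    # Integer mask: pick entries by position, which builds a new list (a copy).
    return [selected[i] for i in mask]


data = [(float(i), float(i * i)) for i in range(10)]
every_second = select_points(data, points_spec=(0, None, 2))
by_bool = select_points(data, (0, None, 2), mask=[True, False, True, False, False])
by_index = select_points(data, (0, None, 2), mask=[4, 0])
```

Because the mask indexes the already sliced data, the integer 4 in the last call refers to the fifth remaining point, not to the fifth point of the original data set.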

fit(self, double radius_cutoff: float, cnn_cutoff: int, member_cutoff: int = None, max_clusters: int = None, cnn_offset: int = None, sort_by_size: bool = True, info: bool = True, record: bool = True, record_time: bool = True, v: bool = True, purge: bool = False) → None

Execute clustering procedure

Parameters
  • radius_cutoff – Neighbour search radius.

  • cnn_cutoff – Similarity criterion.

  • member_cutoff – Valid clusters need to have at least this many members. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise and valid clusters have at least one member.

  • max_clusters – Keep only the largest max_clusters clusters. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise.

  • cnn_offset – Exists for compatibility reasons and is subtracted from cnn_cutoff. If cnn_offset = 0, two points need to share at least cnn_cutoff neighbours, not counting the two points themselves, to be part of the same cluster. Former versions of the clustering included self-counting, so cnn_cutoff = 2 in those versions is equivalent to cnn_cutoff = 0 in this version.

  • sort_by_size – Whether to sort (and trim) the created Labels instance. See also Labels.sort_by_size().

  • info – Whether to modify Labels.meta information for this clustering.

  • record – Whether to create a Record instance for this clustering which is appended to the Summary.

  • record_time – Whether to time clustering execution.

  • v – Be chatty.

  • purge – If True, force reinitialisation of cluster label assignments.
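The interplay of radius_cutoff, cnn_cutoff and cnn_offset can be made concrete with a small pure-Python sketch. It only illustrates the similarity criterion as described in the parameter list above and is not the module's implementation: the function names neighbours_of and are_similar are hypothetical, and treating a point at exactly radius_cutoff as a neighbour is an assumption.

```python
import math


def neighbours_of(points, index, radius_cutoff):
    """Indices of all points within radius_cutoff of points[index], excluding itself."""
    centre = points[index]
    return {
        j
        for j, other in enumerate(points)
        # Assumption: a distance equal to the cutoff still counts as a neighbour.
        if j != index and math.dist(centre, other) <= radius_cutoff
    }


def are_similar(points, a, b, radius_cutoff, cnn_cutoff, cnn_offset=0):
    """Hypothetical check: do points a and b share enough common neighbours?"""
    # cnn_offset is subtracted from cnn_cutoff, as described for the parameter.
    effective_cutoff = cnn_cutoff - cnn_offset
    shared = neighbours_of(points, a, radius_cutoff) & neighbours_of(points, b, radius_cutoff)
    # Neither of the two points themselves is counted as a common neighbour.
    shared -= {a, b}
    return len(shared) >= effective_cutoff


points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.5, -0.5), (5.0, 5.0)]

# Points 0 and 1 share the neighbours 2 and 3.
two_shared = are_similar(points, 0, 1, radius_cutoff=1.0, cnn_cutoff=2)
three_needed = are_similar(points, 0, 1, radius_cutoff=1.0, cnn_cutoff=3)
# A cutoff of 4 reduced by an offset of 2 behaves like a cutoff of 2.
with_offset = are_similar(points, 0, 1, radius_cutoff=1.0, cnn_cutoff=4, cnn_offset=2)
# Point 4 is isolated and shares no neighbours with point 0.
isolated = are_similar(points, 0, 4, radius_cutoff=1.0, cnn_cutoff=1)
```

The brute-force neighbour search here scales quadratically with the number of points; it is chosen for readability only.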

property hierarchy_level
property input_data
isolate(self, bool purge: bool = True)

Split input data into children based on cluster labels
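Conceptually, splitting by cluster labels amounts to grouping points by their label. The following is a minimal sketch of that idea using plain dictionaries; the function split_by_label is hypothetical, and the actual method creates child clustering objects rather than lists. How noise points are handled is not specified here, so the sketch simply treats every distinct label as its own group.

```python
from collections import defaultdict


def split_by_label(points, labels):
    """Hypothetical helper: group points by their cluster label."""
    children = defaultdict(list)
    for point, label in zip(points, labels):
        children[label].append(point)
    return dict(children)


points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.0, 0.0)]
labels = [1, 1, 2, 2, 3]
children = split_by_label(points, labels)
```

Each group could then be clustered again with different parameters, which is the use case for hierarchical clusterings.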

property labels
pie(self, ax=None, pie_props=None)
predict(self, other: Type[u'Clustering'], double radius_cutoff: float, cnn_cutoff: int, clusters: Optional[Sequence[int]] = None, cnn_offset: Optional[int] = None, info: bool = True, record: bool = True, record_time: bool = True, v: bool = True, purge: bool = False)

Execute prediction procedure

Parameters
  • other – cnnclustering.cluster.Clustering instance for which cluster labels should be predicted.

  • radius_cutoff – Neighbour search radius.

  • cnn_cutoff – Similarity criterion.

  • clusters – Sequence of cluster labels that should be included in the prediction.

  • cnn_offset – Exists for compatibility reasons and is subtracted from cnn_cutoff. If cnn_offset = 0, two points need to share at least cnn_cutoff neighbours, not counting the two points themselves, to be part of the same cluster. Former versions of the clustering included self-counting, so cnn_cutoff = 2 in those versions is equivalent to cnn_cutoff = 0 in this version.

  • purge – If True, force re-initialisation of predicted cluster labels.

reel(self, depth: Optional[int] = None) → None

Wrap up label assignments of lower hierarchy levels

Parameters
  • depth – How many lower levels to consider. If None, consider all.

summarize(self, ax=None, unicode quantity: str = u'execution_time', treat_nan: Optional[Any] = None, ax_props: Optional[dict] = None, contour_props: Optional[dict] = None)

Generate a 2D plot of record values

Record values (“time”, “clusters”, “largest”, “noise”) are plotted against cluster parameters (radius cutoff r and cnn cutoff c).

Parameters
  • ax – Matplotlib Axes to plot on. If None, a new Figure with Axes will be created.

  • quantity

    Record value to visualise:

    • “time”

    • “clusters”

    • “largest”

    • “noise”

  • treat_nan – If not None, use this value to pad nan-values.

  • ax_props – Used to style ax.

  • contour_props – Passed on to contour.
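Plotting a record value against r and c requires a value for every (r, c) combination, which is where treat_nan comes in. The sketch below shows one way such a grid could be assembled and padded; it is an illustration under assumptions, not the module's code. The function record_grid and the (r, c, value) record layout are hypothetical.

```python
import math


def record_grid(records, treat_nan=None):
    """Hypothetical helper: arrange (r, c, value) records on a c-by-r grid."""
    r_values = sorted({r for r, _, _ in records})
    c_values = sorted({c for _, c, _ in records})
    lookup = {(r, c): value for r, c, value in records}

    grid = []
    for c in c_values:
        row = []
        for r in r_values:
            # Parameter combinations without a record become NaN.
            value = lookup.get((r, c), math.nan)
            # Pad NaN entries only if a replacement value was given.
            if treat_nan is not None and math.isnan(value):
                value = treat_nan
            row.append(value)
        grid.append(row)
    return r_values, c_values, grid


# Number of clusters found for three parameter combinations;
# no record exists for r = 1.0, c = 1.
records = [(0.5, 0, 3), (0.5, 1, 2), (1.0, 0, 1)]
r_values, c_values, padded = record_grid(records, treat_nan=0)
_, _, unpadded = record_grid(records)
```

A grid of this shape is what a contour-style plot expects: one row per c value and one column per r value.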

property summary
class cnnclustering.cluster.ClusteringChild(parent, *args, **kwargs)

Clustering subclass.

Increments the hierarchy level of the parent object when instantiated.

parent

Weak reference to parent