Creating dendrogram objects

idendrogram.idendrogram

__init__(link_factory=lambda x: ClusterLink(**x), node_factory=lambda x: ClusterNode(**x), axis_label_factory=lambda x: AxisLabel(**x))

Initializes the idendrogram object, optionally with different formatting defaults.

Parameters:

Name | Type | Description | Default
link_factory | Callable[[Dict], ClusterLink] | idendrogram.ClusterLink factory that can be used to override link formatting defaults. | lambda x: ClusterLink(**x)
node_factory | Callable[[Dict], ClusterNode] | idendrogram.ClusterNode factory that can be used to override node formatting defaults. | lambda x: ClusterNode(**x)
axis_label_factory | Callable[[Dict], AxisLabel] | idendrogram.AxisLabel factory that can be used to override axis label formatting defaults. | lambda x: AxisLabel(**x)

Example

Customizing the Dendrogram to show smaller nodes and dashed link lines:

from dataclasses import dataclass, field
from typing import List

import idendrogram
from idendrogram import ClusterLink, ClusterNode

# define a subclass of `ClusterNode` and redefine radius and text label sizes
@dataclass
class SmallClusterNode(ClusterNode):
    radius: float = 3
    labelsize: float = 3

# define a subclass of `ClusterLink` and redefine stroke dash pattern
@dataclass
class DashedLink(ClusterLink):
    strokedash: List = field(default_factory=lambda: [1, 5, 5, 1])

# instantiate the idendrogram object with the factories for links and nodes
dd = idendrogram.idendrogram(
    link_factory=lambda x: DashedLink(**x),
    node_factory=lambda x: SmallClusterNode(**x),
)

#proceed as usual
cdata = idendrogram.ClusteringData(
    linkage_matrix=model, 
    cluster_assignments=cluster_assignments, 
    threshold=threshold
)
dd.set_cluster_info(cdata)
dendrogram = dd.create_dendrogram().to_altair()
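
The factory mechanism can be illustrated without idendrogram installed. The sketch below uses a hypothetical stand-in class, `StandInNode`, whose field names and default values are assumptions rather than the real `ClusterNode` definition. It only shows the general idea: a factory receives a dictionary of attributes, and a subclass changes the defaults for anything the dictionary omits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StandInNode:
    """Hypothetical stand-in for idendrogram.ClusterNode; fields and defaults are assumed."""
    x: float
    y: float
    radius: float = 7.0
    labelsize: float = 10.0


@dataclass
class SmallStandInNode(StandInNode):
    # Only the defaults change; the fields themselves are inherited.
    radius: float = 3.0
    labelsize: float = 3.0


def build_nodes(raw_nodes: List[Dict], node_factory: Callable[[Dict], StandInNode]):
    """Mimics a library that builds every node through the supplied factory."""
    return [node_factory(raw) for raw in raw_nodes]


# The second node sets its radius explicitly, so the subclass default is not used for it.
raw = [{"x": 0.0, "y": 1.0}, {"x": 5.0, "y": 2.0, "radius": 12.0}]
default_nodes = build_nodes(raw, lambda d: StandInNode(**d))
small_nodes = build_nodes(raw, lambda d: SmallStandInNode(**d))
```

Values present in the dictionary still win over the subclass defaults, which is why overriding defaults in a subclass is a low-risk way to restyle a dendrogram.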

convert_scipy_dendrogram(R, compute_nodes=True, node_label_func=callbacks.cluster_id_if_cluster, node_hover_func=callbacks.default_hover, link_color_func=callbacks.link_painter())

Converts a dictionary representing a dendrogram generated by SciPy to idendrogram.Dendrogram object.

Parameters:

Name | Type | Description | Default
R | ScipyDendrogram | Dictionary as generated by SciPy's dendrogram(..., no_plot=True) or equivalent. | required
compute_nodes | bool | Whether to compute nodes (requires idendrogram.ClusteringData to be set via idendrogram.idendrogram.set_cluster_info and can be computationally expensive on large datasets). | True
node_label_func | Callable[[], str] | Callback function to generate dendrogram node labels. See idendrogram.idendrogram.create_dendrogram for usage details. | callbacks.cluster_id_if_cluster
node_hover_func | Callable[[], Union[Dict, str]] | Callback function to generate dendrogram hover text labels. See idendrogram.idendrogram.create_dendrogram for usage details. | callbacks.default_hover
link_color_func | Callable[[ClusteringData, int], str] | Callback function to determine link and node colors. See idendrogram.idendrogram.create_dendrogram for usage details. | callbacks.link_painter()

Returns:

Name | Type | Description
Dendrogram | Dendrogram | idendrogram.Dendrogram object

Example
import scipy.cluster.hierarchy
import idendrogram

# your clustering workflow (`...` stands for your own arguments)
Z = scipy.cluster.hierarchy.linkage(...)
threshold = 42
cluster_assignments = scipy.cluster.hierarchy.fcluster(Z, t=threshold, ...)
R = scipy.cluster.hierarchy.dendrogram(Z, no_plot=True, get_leaves=True, ...)

# render SciPy's dendrogram in Plotly without any additional modifications
dd = idendrogram.idendrogram()
dendrogram = dd.convert_scipy_dendrogram(R, compute_nodes=False)
dendrogram.to_plotly()

create_dendrogram(truncate_mode='level', p=4, sort_criteria='distance', sort_descending=False, link_color_func=callbacks.link_painter(), leaf_label_func=callbacks.counts, compute_nodes=True, node_label_func=callbacks.cluster_id_if_cluster, node_hover_func=callbacks.default_hover)

Creates an idendrogram dendrogram object.

Parameters:

Name | Type | Description | Default
truncate_mode | level | lastp | None | Truncation mode used to condense the dendrogram. See SciPy's dendrogram() for details. | 'level'
p | int | Parameter that accompanies truncate_mode. See SciPy's dendrogram() for details. | 4
sort_criteria | count | distance | Node order criteria: count sorts by the number of original observations in the node, distance by the distance between the node's direct descendants. | 'distance'
sort_descending | bool | Accompanying parameter to sort_criteria to indicate whether sorting should be descending. | False
link_color_func | Callable[[ClusteringData, int], str] | A callable function that determines colors of nodes and links. See below for details. | callbacks.link_painter()
leaf_label_func | Callable[[ClusteringData, int], str] | A callable function that determines leaf node labels. See below for details. | callbacks.counts
compute_nodes | bool | Whether nodes should be computed (can be computationally expensive on large datasets). | True
node_label_func | Callable[[ClusteringData, int], str] | A callable function that determines node text labels. See below for details. | callbacks.cluster_id_if_cluster
node_hover_func | Callable[[ClusteringData, int], Dict[str, str]] | A callable function that determines node hover text. See below for details. | callbacks.default_hover

Returns:

Name | Type | Description
Dendrogram | Dendrogram | idendrogram.Dendrogram object

Usage notes

For how-to examples, see the How-to Guide.

SciPy's dendrogram parameters

idendrogram uses SciPy to generate the initial dendrogram structure and passes a few parameters directly to scipy.cluster.hierarchy.dendrogram:

  • truncate_mode and p are passed without modifications
  • sort_criteria and sort_descending map to count_sort and distance_sort
  • link_color_func and leaf_label_func are passed on with an additional wrapper that enables access to the linkage matrix (see below for details)

To fully understand these parameters, it is easiest to explore scipy's documentation directly.
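
How the two sort parameters could translate into SciPy's arguments can be sketched as follows. The helper `scipy_sort_kwargs` is hypothetical and is not idendrogram's actual code. It assumes the natural mapping in which `sort_criteria` selects which of SciPy's `count_sort` and `distance_sort` arguments is active, and `sort_descending` selects the direction.

```python
from typing import Dict, Union


def scipy_sort_kwargs(sort_criteria: str, sort_descending: bool) -> Dict[str, Union[str, bool]]:
    """Hypothetical mapping from idendrogram-style sort options to SciPy keyword arguments.

    SciPy's dendrogram() accepts False, 'ascending' or 'descending' for both
    count_sort and distance_sort, and only one of them may be active at a time.
    """
    order = "descending" if sort_descending else "ascending"
    if sort_criteria == "count":
        return {"count_sort": order, "distance_sort": False}
    if sort_criteria == "distance":
        return {"count_sort": False, "distance_sort": order}
    raise ValueError(f"unknown sort_criteria: {sort_criteria!r}")


# The defaults documented above: sort by distance, ascending.
default_kwargs = scipy_sort_kwargs("distance", False)
count_desc_kwargs = scipy_sort_kwargs("count", True)
```

A dictionary like this could then be unpacked into a call such as scipy.cluster.hierarchy.dendrogram(Z, no_plot=True, truncate_mode='level', p=4, **default_kwargs).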

Callback functions

idendrogram uses callbacks to customize link/node colors (link_color_func), leaf axis labels (leaf_label_func), node labels (node_label_func) and tooltips (node_hover_func). Every callback is called with two parameters:

  • an idendrogram.ClusteringData instance that provides access to the linkage matrix and other clustering information
  • the linkage ID (integer) of the link or node in question

The return types should be:

  • link_color_func should return the color for the link/node represented by the linkage ID.
  • leaf_label_func should return the text label to be used for the axis label of the leaf node represented by the linkage ID.
  • node_label_func should return the text label to be used for the node represented by the linkage ID.
  • node_hover_func should return a dictionary of key:value pairs that will be displayed in a tooltip of the node represented by the linkage ID.

This setup allows extensive customization; examples are provided in the How-to Guide.
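
The two-argument callback shape described above can be sketched without the library. In the example below, `StandInClusteringData` is a hypothetical stand-in for idendrogram.ClusteringData, and its attribute names are assumptions; the real class may expose the linkage matrix and threshold through different accessors. The linkage matrix is written by hand in SciPy's four-column format (left ID, right ID, distance, observation count), where leaves have IDs 0 to n-1 and row i describes the cluster with ID n+i.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StandInClusteringData:
    """Hypothetical stand-in for idendrogram.ClusteringData; attribute names are assumed."""
    linkage_matrix: List[List[float]]  # rows: [left id, right id, distance, observation count]
    threshold: float


def merge_row(data: StandInClusteringData, linkage_id: int) -> Optional[List[float]]:
    """Return the linkage-matrix row that created `linkage_id`, or None for a leaf."""
    n_leaves = len(data.linkage_matrix) + 1
    if linkage_id < n_leaves:
        return None
    return data.linkage_matrix[linkage_id - n_leaves]


def color_below_threshold(data: StandInClusteringData, linkage_id: int) -> str:
    """link_color_func-style callback: highlight merges at or below the threshold."""
    row = merge_row(data, linkage_id)
    if row is not None and row[2] <= data.threshold:
        return "steelblue"
    return "grey"


def hover_summary(data: StandInClusteringData, linkage_id: int) -> Dict[str, str]:
    """node_hover_func-style callback: key/value pairs for a tooltip."""
    row = merge_row(data, linkage_id)
    if row is None:
        return {"observations": "1", "distance": "0.00"}
    return {"observations": str(int(row[3])), "distance": f"{row[2]:.2f}"}


# Hand-written single-linkage result for four points located at 0, 1, 10 and 11.
Z = [[0, 1, 1.0, 2], [2, 3, 1.0, 2], [4, 5, 9.0, 4]]
data = StandInClusteringData(linkage_matrix=Z, threshold=5.0)
```

With the real library, functions of this shape would be passed as link_color_func and node_hover_func to create_dendrogram, after replacing the stand-in attribute access with whatever ClusteringData actually provides.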

set_cluster_info(cluster_data)

Sets the clustering data (linkage matrix and other parameters) that are required for some of the dendrogram generation features.

Parameters:

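Name | Type | Description | Default
cluster_data | ClusteringData | idendrogram.ClusteringData object holding the linkage matrix and other clustering parameters. | required

Example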
import scipy.cluster.hierarchy
import idendrogram

# your clustering workflow (`...` stands for your own arguments)
Z = scipy.cluster.hierarchy.linkage(...)
threshold = 42
cluster_assignments = scipy.cluster.hierarchy.fcluster(Z, t=threshold, ...)

#dendrogram creation
dd = idendrogram.idendrogram()
cdata = idendrogram.ClusteringData(
    linkage_matrix=Z, 
    cluster_assignments=cluster_assignments, 
    threshold=threshold 
)
dd.set_cluster_info(cdata)