Visualization Module Details

The Visualization class provides a collection of plotting tools tailored to dimensionality reduction results such as PCA or UMAP. It helps reveal data structures, clusters, and relationships between variables through intuitive 2D and 3D visualizations.

Overview

This module is designed to make sense of high-dimensional data by projecting it into interpretable low-dimensional spaces and offering multiple visualization options. It enables analysis of principal components, cluster distributions, and variable relationships through correlation plots.

Key Capabilities

  • Principal Plane Visualization: Display a 2D projection of the first two components, with or without cluster labels.

  • Correlation Circle Plot: Analyze how original variables relate to principal components.

  • 2D Cluster Scatter Plot: Quickly visualize cluster distributions in two selected dimensions.

  • 3D Cluster Scatter Plot: Explore spatial groupings of data points in three dimensions with optional customization.

Use Cases

Use the Visualization module when:

  • You want to interpret results from PCA, UMAP, or any other dimensionality reduction method.

  • You need to visualize how data clusters or groups behave in reduced space.

  • You want to analyze relationships between variables and components using intuitive plots.

Usage Example

Below is a minimal example of how to use the Visualization class. It assumes you have already computed principal components and optionally cluster labels:

import pandas as pd
import numpy as np
from riemannian_stats.visualization import Visualization

# Load data (you can use a processed dataset or raw input)
df = pd.read_csv("path/to/data.csv", sep=",", decimal=".")

# Dummy principal components (e.g., from PCA or UMAP)
components = np.random.rand(len(df), 2)

# Optional cluster labels
clusters = df["cluster"].values if "cluster" in df.columns else None

# Initialize the visualization object
viz = Visualization(data=df, components=components, explained_inertia=78.3, clusters=clusters)

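# Plot the principal plane, colored by cluster when labels are available
if clusters is not None:
    viz.plot_principal_plane_with_clusters(title="Sample Data")
else:
    viz.plot_principal_plane(title="Sample Data")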

For comprehensive usage examples with real datasets, refer to the “How to Use Riemannian STATS” section, available both on the homepage (index) and in the sidebar navigation of the documentation.
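
The usage example above fills components with random numbers and hard-codes explained_inertia. As a hedged illustration of where such values could come from, the sketch below runs an ordinary (Euclidean) PCA on standardized data using NumPy's SVD. The helper name pca_components and the synthetic data are assumptions made for this illustration; this is not the Riemannian procedure implemented by the library.

```python
import numpy as np


def pca_components(X, n_components=2):
    """Plain PCA via SVD (hypothetical helper, not part of riemannian_stats).

    Returns the component scores and the percentage of total variance
    explained by the retained components.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    Z = centered / std

    # Rows of Vt are the principal axes; U * S gives the scores.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]

    variances = S ** 2
    explained = 100.0 * variances[:n_components].sum() / variances.sum()
    return scores, float(explained)


# Synthetic data standing in for a real dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
components, explained_inertia = pca_components(X)
```

The returned array has one row per observation, so it could be passed as components, and the percentage as explained_inertia, in place of the dummy values.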

API Documentation

class riemannian_stats.visualization.Visualization(data: DataFrame, components: ndarray | None = None, explained_inertia: float = 0.0, clusters: ndarray | None = None)

Bases: object

A class for generating visualizations of UMAP or PCA results, including projections, clusters, and correlation circles.

This class is designed to assist in interpreting dimensionality-reduced data through visual inspection.

Parameters:
  • data (pd.DataFrame) – The dataset to be visualized. It must have an index and may optionally include cluster columns.

  • components (np.ndarray, optional) – Principal component matrix with at least two dimensions.

  • explained_inertia (float, optional) – The total inertia (variance) explained by the components, as a percentage. Defaults to 0.0.

  • clusters (np.ndarray, optional) – Array of cluster labels corresponding to the data rows.
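
The parameter descriptions above imply that components and clusters should line up with the rows of data. The sketch below shows one way to check this before constructing the object. The helper check_alignment is hypothetical, and reading "at least two dimensions" as "at least two columns" is an interpretation, not something this page confirms; whether the library performs similar checks is also not stated here.

```python
import numpy as np


def check_alignment(n_rows, components=None, clusters=None):
    """Return a list of problems; an empty list means the inputs line up.

    Hypothetical pre-flight check, not part of riemannian_stats.
    """
    problems = []
    if components is not None:
        comps = np.asarray(components)
        if comps.ndim != 2 or comps.shape[1] < 2:
            problems.append("components must be a 2D array with at least two columns")
        elif comps.shape[0] != n_rows:
            problems.append("components must have one row per data row")
    if clusters is not None and len(clusters) != n_rows:
        problems.append("clusters must have one label per data row")
    return problems


# Four data rows: the first call is consistent, the second is not
ok = check_alignment(4, np.zeros((4, 2)), np.array([0, 0, 1, 1]))
bad = check_alignment(4, np.zeros((3, 2)), np.array([0, 1]))
```

In practice n_rows would be len(df) for the DataFrame passed as data.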

data

Read-only access to the input dataset.

Type:

pd.DataFrame

components

Read-only access to the PCA/UMAP component matrix.

Type:

np.ndarray or None

explained_inertia

Read-only access to the explained inertia percentage.

Type:

float

clusters

Read-only access to cluster labels.

Type:

np.ndarray or None

Methods:

__get_adaptive_colormap(n_clusters):

Returns a suitable colormap based on number of clusters.

plot_principal_plane(title="", figsize=(10, 8)):

2D projection of the components without clusters.

plot_principal_plane_with_clusters(title="", figsize=(10, 8)):

2D projection of the components colored by clusters.

plot_correlation_circle(correlations, title="", scale=1, draw_circle=True, figsize=(8, 8)):

Visualizes correlation of variables with the principal components.

plot_2d_scatter_with_clusters(x_col, y_col, cluster_col, title="", figsize=(10, 8)):

2D scatter plot of selected features colored by clusters.

plot_3d_scatter_with_clusters(x_col, y_col, z_col, cluster_col, title="", figsize=(12, 8), s=50, alpha=0.7):

3D scatter plot of selected features colored by clusters.
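
The method summary mentions a private helper that picks a colormap based on the number of clusters, but this page does not describe its rule. The sketch below shows one plausible strategy: use a small qualitative palette while it has enough distinct colors, then fall back to a continuous one. The function name pick_colormap_name and the thresholds are assumptions for illustration; "tab10", "tab20", and "hsv" are standard Matplotlib colormap names, but their use by the library is not confirmed here.

```python
def pick_colormap_name(n_clusters):
    """Choose a Matplotlib colormap name for a given number of clusters.

    Hypothetical rule, not taken from the riemannian_stats source.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    if n_clusters <= 10:
        return "tab10"  # 10 distinct qualitative colors
    if n_clusters <= 20:
        return "tab20"  # 20 distinct qualitative colors
    return "hsv"        # continuous map, sampled at n_clusters points


names = [pick_colormap_name(n) for n in (3, 15, 40)]
```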

property clusters: ndarray | None

Returns the cluster labels for each data point.

property components: ndarray | None

Returns the matrix of principal components.

property data: DataFrame

Returns the data used for visualization.

property explained_inertia: float

Returns the explained inertia percentage.
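
The four properties above are described as read-only. A minimal sketch of what that usually means in Python follows: a property defined without a setter can be read but not assigned. The class ReadOnlyExample is a hypothetical stand-in written for this illustration; it assumes, without confirmation from this page, that the library's properties are defined the same way.

```python
class ReadOnlyExample:
    """Hypothetical stand-in showing a getter-only property."""

    def __init__(self, explained_inertia=0.0):
        # Value is stored on a private attribute
        self._explained_inertia = explained_inertia

    @property
    def explained_inertia(self):
        # No setter is defined, so assignment raises AttributeError
        return self._explained_inertia


example = ReadOnlyExample(explained_inertia=78.3)

try:
    example.explained_inertia = 50.0
    assignment_rejected = False
except AttributeError:
    assignment_rejected = True
```

Under this assumption, changing the inputs would mean constructing a new object rather than mutating an existing one.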

plot_2d_scatter_with_clusters(x_col: str, y_col: str, cluster_col: str, title: str = '', figsize: Tuple[int, int] = (10, 8)) → None

Generates a 2D scatter plot colored by cluster using a wide color palette.

Parameters:
  • x_col (str) – Name of the column for the x-axis.

  • y_col (str) – Name of the column for the y-axis.

  • cluster_col (str) – Name of the column containing cluster labels.

  • title (str, optional) – Custom title to add above the default title.

  • figsize (tuple, optional) – Figure size. Defaults to (10, 8).

plot_3d_scatter_with_clusters(x_col: str, y_col: str, z_col: str, cluster_col: str, title: str = '', figsize: Tuple[int, int] = (12, 8), s: int = 50, alpha: float = 0.7) → None

Creates a 3D scatter plot colored by cluster using a wide colormap for better differentiation.

Parameters:
  • x_col (str) – Name of the column for the x-axis.

  • y_col (str) – Name of the column for the y-axis.

  • z_col (str) – Name of the column for the z-axis.

  • cluster_col (str) – Name of the column containing cluster labels.

  • title (str, optional) – Custom title to add above the default title.

  • figsize (tuple, optional) – Figure size. Defaults to (12, 8).

  • s (int, optional) – Size of the points. Defaults to 50.

  • alpha (float, optional) – Transparency of the points. Defaults to 0.7.

plot_correlation_circle(correlations: DataFrame, title: str = '', scale: float = 1, draw_circle: bool = True, figsize: Tuple[int, int] = (8, 8)) → None

Generates a correlation circle for the principal components.

Parameters:
  • correlations (pandas.DataFrame) – DataFrame containing the correlations for each variable.

  • title (str, optional) – Custom title to add above the default title.

  • scale (float, optional) – Scaling factor for the arrows. Defaults to 1.

  • draw_circle (bool, optional) – Whether to draw the unit circle. Defaults to True.

  • figsize (tuple, optional) – Figure size. Defaults to (8, 8).
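
The correlations argument holds, for each original variable, its correlation with the components. As a hedged sketch of how such values could be obtained, the code below computes Pearson correlations between each variable and the first two components with NumPy. The helper name variable_component_correlations and the synthetic data are assumptions. The exact column layout the method expects is not specified on this page, so wrapping the result in a pandas DataFrame is left out.

```python
import numpy as np


def variable_component_correlations(X, components):
    """Pearson correlation of each column of X with the first two components.

    Hypothetical helper, not part of riemannian_stats. Assumes no column
    of X or of components is constant (otherwise a norm would be zero).
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(components, dtype=float)[:, :2]
    Xc = X - X.mean(axis=0)
    Cc = C - C.mean(axis=0)
    cov = Xc.T @ Cc
    norms = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Cc, axis=0))
    return cov / norms  # shape: (n_variables, 2)


# Synthetic data and stand-in components from a plain SVD of centered data
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
centered = X - X.mean(axis=0)
U, S, _ = np.linalg.svd(centered, full_matrices=False)
components = U[:, :2] * S[:2]

corr = variable_component_correlations(X, components)
# Distance of each variable's arrow tip from the origin
radii = np.sqrt((corr ** 2).sum(axis=1))
```

When the two components are uncorrelated, as they are here, every variable falls inside or on the unit circle, which is why drawing the circle (draw_circle=True) gives a useful reference: variables near the circle are well represented by the two components.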

plot_principal_plane(title: str = '', figsize: Tuple[int, int] = (10, 8)) → None

Generates a plot of the principal plane using the principal components.

Parameters:
  • title (str, optional) – Custom title to add above the default title.

  • figsize (tuple, optional) – Figure size. Defaults to (10, 8).

plot_principal_plane_with_clusters(title: str = '', figsize: Tuple[int, int] = (10, 8)) → None

Generates a plot of the principal plane with points colored according to clusters, using a color map that supports many distinct clusters.

Parameters:
  • title (str, optional) – Custom title to add above the default title.

  • figsize (tuple, optional) – Figure size. Defaults to (10, 8).