RiemannianAnalysis Module Details

The RiemannianAnalysis class combines Uniform Manifold Approximation and Projection (UMAP) with Riemannian geometry to support the exploration of high-dimensional data.

Overview

This module extends UMAP with Riemannian-based weighting of pairwise differences, so that covariance, correlation, and PCA reflect the intrinsic geometry of the data. It is most relevant in contexts where non-Euclidean distance structures are important.

Key Capabilities

  • UMAP Dimensionality Reduction: Applies UMAP for nonlinear dimensionality reduction with customizable parameters.

  • Riemannian Distance Weighting: Integrates Riemannian weights to enhance pairwise similarity computation.

  • Custom Covariance and Correlation: Computes covariance and correlation matrices adapted to the Riemannian structure of the dataset.

  • Riemannian PCA: Performs principal component analysis using geometry-aware transformations.

  • Correlation with Components: Computes variable-to-component correlations in Riemannian space.

Use Cases

This module is especially useful in the following scenarios:

  • High-dimensional datasets where traditional methods fail to capture intrinsic structures.

  • Applications in neuroscience, biomechanics, and other fields that benefit from non-Euclidean analysis.

  • Scenarios requiring geometry-informed PCA and correlation analysis.

Usage Example

The following example shows how to use the RiemannianAnalysis class with a dataset loaded using pandas:

import pandas as pd
from riemannian_stats.riemannian_analysis import RiemannianAnalysis

# Load your high-dimensional dataset
df = pd.read_csv("path/to/data.csv", sep=",", decimal=".")

# Create an analysis instance
analysis = RiemannianAnalysis(df, n_neighbors=5, min_dist=0.1, metric="euclidean")

# Compute the Riemannian correlation matrix
corr_matrix = analysis.riemannian_correlation_matrix()

# Extract principal components using the correlation matrix
components = analysis.riemannian_components_from_data_and_correlation(corr_matrix)

# Optionally, compute variable-component correlations
variable_corr = analysis.riemannian_correlation_variables_components(components)

For full usage examples and real-world datasets, refer to the “How to Use Riemannian STATS” section, available both on the homepage (Home) and in the sidebar navigation.

API Documentation

class riemannian_stats.riemannian_analysis.RiemannianAnalysis(data: ndarray | DataFrame, n_neighbors: int = 3, min_dist: float = 0.1, metric: str = 'euclidean')

Bases: object

A class to perform UMAP-based analysis combined with Riemannian geometry.

This class allows dimensionality reduction, similarity graph analysis, and custom covariance/correlation computations using a Riemannian-weighted framework, enhancing traditional UMAP with structure-aware geometry.

Parameters:
  • data (np.ndarray or pd.DataFrame) – Input dataset.

  • n_neighbors (int) – Number of neighbors for UMAP KNN graph construction. Default is 3.

  • min_dist (float) – Minimum distance parameter for UMAP, controlling cluster tightness. Default is 0.1.

  • metric (str) – Distance metric for UMAP (e.g., “euclidean”, “manhattan”). Default is “euclidean”.

Properties:

  • data (np.ndarray or pd.DataFrame) – The input data. Setting this triggers automatic recomputation of all derived matrices.

  • n_neighbors (int) – Number of neighbors for UMAP. Setting this re-triggers internal recomputations.

  • min_dist (float) – Minimum distance used in UMAP embedding. Automatically recomputes internal matrices on change.

  • metric (str) – UMAP distance metric. Triggers recomputation if modified.

  • umap_similarities (np.ndarray) – Matrix of similarity values from the UMAP fuzzy graph.

  • rho (np.ndarray) – Matrix computed as (1 - UMAP similarity), used to weight vector differences.

  • riemannian_diff (np.ndarray) – 3D array of weighted pairwise vector differences between observations.

  • umap_distance_matrix (np.ndarray) – Pairwise distance matrix computed from Riemannian differences.
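The relationship between the derived properties described above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the library's implementation: the similarity matrix is hand-written rather than produced by UMAP, the helper name derived_matrices is hypothetical, and the use of a Euclidean norm for the distance matrix is an assumption.

```python
import numpy as np

def derived_matrices(data, similarities):
    """Hypothetical helper: rho, weighted differences, and distances."""
    rho = 1.0 - similarities                      # shape (n, n)
    # Plain pairwise differences between observations, shape (n, n, p).
    diff = data[:, None, :] - data[None, :, :]
    # Weight each difference vector by the matching rho[i, j] entry.
    riemannian_diff = rho[:, :, None] * diff
    # Assumption: distances are Euclidean norms of the weighted differences.
    distance = np.linalg.norm(riemannian_diff, axis=2)
    return rho, riemannian_diff, distance

# Toy inputs: three observations, two variables, made-up similarities.
data = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
similarities = np.array([[1.0, 0.5, 0.0],
                         [0.5, 1.0, 0.5],
                         [0.0, 0.5, 1.0]])
rho, riemannian_diff, distance = derived_matrices(data, similarities)
```

In this toy case, highly similar pairs (small rho) have their differences shrunk, while dissimilar pairs keep close to their full Euclidean separation.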

Methods:

  • riemannian_correlation_matrix() -> np.ndarray: Computes the correlation matrix based on the Riemannian covariance structure.

  • riemannian_components(corr_matrix: np.ndarray) -> np.ndarray: Performs Riemannian PCA using the supplied correlation matrix.

  • riemannian_components_from_data_and_correlation(corr_matrix: np.ndarray) -> np.ndarray: Like riemannian_components, but uses both data and a given correlation matrix.

  • riemannian_correlation_variables_components(components: np.ndarray) -> pd.DataFrame: Calculates Riemannian correlations between original features and the first two components.

Notes

  • Setting data, n_neighbors, min_dist, or metric automatically recalculates:
    • UMAP similarities

    • Rho matrix

    • Riemannian differences

    • UMAP distance matrix

  • Internal methods (prefixed with double underscores) are used for computing intermediate matrices and are not intended for external use.
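The recomputation behaviour described in these notes follows a common Python pattern: a property setter that stores the new value and then rebuilds the derived state. The sketch below uses a hypothetical stand-in class (RecomputingExample, with an invented scale parameter) purely to illustrate the pattern; it is not the RiemannianAnalysis source.

```python
import numpy as np

class RecomputingExample:
    """Hypothetical stand-in showing a setter that triggers recomputation."""

    def __init__(self, data, scale=1.0):
        self._data = np.asarray(data, dtype=float)
        self._scale = scale
        self.recompute_count = 0
        self._recompute()

    def _recompute(self):
        # Stand-in for rebuilding similarities, rho, differences, distances.
        self.derived = self._data * self._scale
        self.recompute_count += 1

    @property
    def scale(self):
        return self._scale

    @scale.setter
    def scale(self, value):
        # Store the new parameter, then refresh everything that depends on it.
        self._scale = value
        self._recompute()

example = RecomputingExample([1.0, 2.0], scale=2.0)
example.scale = 3.0  # assignment alone refreshes example.derived
```

One consequence of this design is that each assignment pays the full recomputation cost, so changing several parameters in sequence recomputes several times.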

property data
property metric
property min_dist
property n_neighbors
property rho: ndarray | None

Returns the Rho matrix (1 - UMAP similarities).

riemannian_components(corr_matrix: ndarray) -> ndarray

Performs Riemannian principal component analysis (PCA) using the supplied correlation matrix.

Parameters:

corr_matrix (numpy.ndarray) – Riemannian correlation matrix.

Returns:

Matrix of principal components.

Return type:

numpy.ndarray

Raises:

ValueError – If the correlation matrix is not square or if its size does not match the number of data columns.
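For orientation, the classical (Euclidean) analogue of PCA from a correlation matrix, including the two shape checks listed above, might look like the sketch below. It is an assumption-laden illustration: the function name components_from_correlation is hypothetical, the data are standardized with ordinary means and standard deviations, and the library's Riemannian-weighted steps are not reproduced.

```python
import numpy as np

def components_from_correlation(data, corr_matrix):
    """Hypothetical classical PCA scores from a correlation matrix."""
    data = np.asarray(data, dtype=float)
    corr_matrix = np.asarray(corr_matrix, dtype=float)
    if corr_matrix.ndim != 2 or corr_matrix.shape[0] != corr_matrix.shape[1]:
        raise ValueError("Correlation matrix must be square.")
    if corr_matrix.shape[0] != data.shape[1]:
        raise ValueError("Correlation matrix size must match the number of data columns.")
    # Assumption: ordinary standardization (the library may center differently).
    standardized = (data - data.mean(axis=0)) / data.std(axis=0)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(corr_matrix)
    order = np.argsort(eigenvalues)[::-1]  # largest variance first
    return standardized @ eigenvectors[:, order]

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))
corr = np.corrcoef(data, rowvar=False)
components = components_from_correlation(data, corr)
```

In this classical setting the variance of each component column equals the corresponding eigenvalue of the correlation matrix, which gives a quick sanity check on the ordering.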

riemannian_components_from_data_and_correlation(corr_matrix: ndarray) -> ndarray

Performs Riemannian principal component analysis (PCA) using the data and the provided correlation matrix.

Parameters:

corr_matrix (numpy.ndarray) – Correlation matrix of the variables.

Returns:

Matrix of principal components.

Return type:

numpy.ndarray

Raises:

ValueError – If the correlation matrix is not square or if its size does not match the number of data columns.

riemannian_correlation_matrix() -> ndarray

Calculates the Riemannian correlation matrix from the Riemannian covariance matrix.

Returns:

Riemannian correlation matrix.

Return type:

numpy.ndarray
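Deriving a correlation matrix from a covariance matrix is the standard normalization by the product of standard deviations. The sketch below shows that step alone, assuming a covariance matrix is already available; the function name correlation_from_covariance is hypothetical, and how the library builds its Riemannian covariance is not shown here.

```python
import numpy as np

def correlation_from_covariance(cov):
    """Hypothetical helper: corr[i, j] = cov[i, j] / (std[i] * std[j])."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))      # per-variable standard deviations
    return cov / np.outer(std, std)  # unit diagonal by construction

# Made-up covariance matrix with variances 4 and 9.
cov = np.array([[4.0, 2.0],
                [2.0, 9.0]])
corr = correlation_from_covariance(cov)
```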

riemannian_correlation_variables_components(components: ndarray) -> DataFrame

Calculates the Riemannian correlation between the original variables and the first two components.

Parameters:

components (numpy.ndarray) – Matrix of components (at least two columns are expected).

Returns:

DataFrame with the correlation of each original variable with the first and second components.

Return type:

pandas.DataFrame
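As a rough point of comparison, the ordinary Pearson version of this calculation can be sketched with NumPy. The function name variable_component_correlations is hypothetical, the result is a plain array rather than a DataFrame, raising on fewer than two columns is this sketch's own choice, and the Riemannian weighting used by the library is not reproduced.

```python
import numpy as np

def variable_component_correlations(data, components):
    """Hypothetical Pearson analogue: variables vs. first two components."""
    data = np.asarray(data, dtype=float)
    components = np.asarray(components, dtype=float)
    if components.ndim != 2 or components.shape[1] < 2:
        raise ValueError("At least two component columns are expected.")
    n_vars = data.shape[1]
    # Correlate all columns at once, then keep the variable-vs-component block.
    stacked = np.hstack([data, components[:, :2]])
    full = np.corrcoef(stacked, rowvar=False)
    return full[:n_vars, n_vars:]  # shape (n_vars, 2)

# Toy data; the "components" are built by hand so the result is easy to read.
data = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
components = np.column_stack([2.0 * data[:, 0], -data[:, 1]])
var_corr = variable_component_correlations(data, components)
```

Each row corresponds to an original variable and each column to a component, which mirrors the layout described for the returned DataFrame.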

property riemannian_diff: ndarray | None

Returns the 3D array of weighted Riemannian differences.

property umap_distance_matrix: ndarray | None

Returns the UMAP distance matrix.

property umap_similarities: ndarray | None

Returns the UMAP similarity matrix.