RiemannianAnalysis Module Details
The RiemannianAnalysis class combines Uniform Manifold Approximation and Projection (UMAP) with Riemannian geometry to support the exploration of high-dimensional data.
Overview
This module extends UMAP by incorporating Riemannian-based weighting to better capture the intrinsic geometry of the data. It enables more meaningful representations, particularly in contexts where non-Euclidean distance structures are important.
Key Capabilities
UMAP Dimensionality Reduction: Applies UMAP for nonlinear dimensionality reduction with customizable parameters.
Riemannian Distance Weighting: Integrates Riemannian weights to enhance pairwise similarity computation.
Custom Covariance and Correlation: Computes covariance and correlation matrices adapted to the Riemannian structure of the dataset.
Riemannian PCA: Performs principal component analysis using geometry-aware transformations.
Correlation with Components: Computes variable-to-component correlations in Riemannian space.
Use Cases
This module is especially useful in the following scenarios:
High-dimensional datasets where traditional methods fail to capture intrinsic structures.
Applications in neuroscience, biomechanics, and other fields that benefit from non-Euclidean analysis.
Scenarios requiring geometry-informed PCA and correlation analysis.
Usage Example
Here’s a simple example to demonstrate how to use the RiemannianAnalysis class with a dataset loaded using pandas:
import pandas as pd
from riemannian_stats.riemannian_analysis import RiemannianAnalysis
# Load your high-dimensional dataset
df = pd.read_csv("path/to/data.csv", sep=",", decimal=".")
# Create an analysis instance
analysis = RiemannianAnalysis(df, n_neighbors=5, min_dist=0.1, metric="euclidean")
# Compute the Riemannian correlation matrix
corr_matrix = analysis.riemannian_correlation_matrix()
# Extract principal components using the correlation matrix
components = analysis.riemannian_components_from_data_and_correlation(corr_matrix)
# Optionally, compute variable-component correlations
variable_corr = analysis.riemannian_correlation_variables_components(components)
For full usage examples and real-world datasets, refer to the “How to Use Riemannian STATS” section, available both on the homepage (Home) and in the sidebar navigation.
API Documentation
- class riemannian_stats.riemannian_analysis.RiemannianAnalysis(data: ndarray | DataFrame, n_neighbors: int = 3, min_dist: float = 0.1, metric: str = 'euclidean')
Bases: object
A class to perform UMAP-based analysis combined with Riemannian geometry.
This class allows dimensionality reduction, similarity graph analysis, and custom covariance/correlation computations using a Riemannian-weighted framework, enhancing traditional UMAP with structure-aware geometry.
- Parameters:
data (np.ndarray or pd.DataFrame) – Input dataset.
n_neighbors (int) – Number of neighbors for UMAP KNN graph construction. Default is 3.
min_dist (float) – Minimum distance parameter for UMAP, controlling cluster tightness. Default is 0.1.
metric (str) – Distance metric for UMAP (e.g., “euclidean”, “manhattan”). Default is “euclidean”.
- Properties:
data (np.ndarray or pd.DataFrame): The input data. Setting this triggers automatic recomputation of all derived matrices.
n_neighbors (int): Number of neighbors for UMAP. Setting this re-triggers internal recomputations.
min_dist (float): Minimum distance used in UMAP embedding. Automatically recomputes internal matrices on change.
metric (str): UMAP distance metric. Triggers recomputation if modified.
umap_similarities (np.ndarray): Matrix of similarity values from the UMAP fuzzy graph.
rho (np.ndarray): Matrix computed as (1 - UMAP similarity), used to weight vector differences.
riemannian_diff (np.ndarray): 3D array of weighted pairwise vector differences between observations.
umap_distance_matrix (np.ndarray): Pairwise distance matrix computed from Riemannian differences.
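The relationship between the derived properties listed above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the library's implementation: the function name `weighted_differences` is hypothetical, the similarity matrix is supplied by hand rather than taken from UMAP, and the use of the Euclidean norm to collapse the weighted differences into distances is an assumption.

```python
import numpy as np

def weighted_differences(data, similarities):
    """Hypothetical helper: rho-weighted pairwise differences and their norms."""
    data = np.asarray(data, dtype=float)                # shape (n, p)
    rho = 1.0 - np.asarray(similarities, dtype=float)   # shape (n, n)
    # diffs[i, j, :] holds the vector data[i] - data[j]
    diffs = data[:, None, :] - data[None, :, :]         # shape (n, n, p)
    # Scale each pairwise difference vector by the matching rho entry
    weighted = rho[:, :, None] * diffs
    # Assumption: a distance is the Euclidean norm of each weighted difference
    distances = np.linalg.norm(weighted, axis=2)        # shape (n, n)
    return rho, weighted, distances

# Toy input: three observations and a hand-written similarity matrix
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
S = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
rho, weighted, distances = weighted_differences(X, S)
print(weighted.shape)
print(distances)
```

In this toy setup, pairs with high similarity contribute shrunken differences, while pairs with zero similarity keep their full Euclidean difference.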
- riemannian_correlation_matrix() -> np.ndarray: Computes the correlation matrix based on the Riemannian covariance structure.
- riemannian_components(corr_matrix: np.ndarray) -> np.ndarray: Performs Riemannian PCA using the supplied correlation matrix.
- riemannian_components_from_data_and_correlation(corr_matrix: np.ndarray) -> np.ndarray: Like riemannian_components, but uses both data and a given correlation matrix.
- riemannian_correlation_variables_components(components: np.ndarray) -> pd.DataFrame: Calculates Riemannian correlations between original features and the first two components.
Notes
- Setting data, n_neighbors, min_dist, or metric automatically recalculates:
UMAP similarities
Rho matrix
Riemannian differences
UMAP distance matrix
Internal methods (prefixed with double underscores) are used for computing intermediate matrices and are not intended for external use.
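The recompute-on-assignment behaviour described in these notes is a common Python pattern built on property setters. The sketch below shows the general idea only; `RecomputingExample`, its `scale` parameter, and its `derived` attribute are hypothetical stand-ins and do not appear in riemannian_stats.

```python
import numpy as np

class RecomputingExample:
    """Hypothetical class illustrating setters that refresh derived state."""

    def __init__(self, data, scale=1.0):
        self._data = np.asarray(data, dtype=float)
        self._scale = scale
        self.__recompute()

    def __recompute(self):
        # Stand-in for the derived matrices: centered data times a scale factor.
        # The double underscore triggers name mangling, marking it as internal.
        self._derived = self._scale * (self._data - self._data.mean(axis=0))

    @property
    def scale(self):
        return self._scale

    @scale.setter
    def scale(self, value):
        # Assigning a new value refreshes everything that depends on it
        self._scale = value
        self.__recompute()

    @property
    def derived(self):
        return self._derived

example = RecomputingExample([[1.0, 2.0], [3.0, 4.0]])
print(example.derived)
example.scale = 2.0   # triggers recomputation through the setter
print(example.derived)
```

A practical consequence of this design is that changing several parameters one after another repeats the full recomputation each time.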
- property data
- property metric
- property min_dist
- property n_neighbors
- property rho: ndarray | None
Returns the Rho matrix (1 - UMAP similarities).
- riemannian_components(corr_matrix: ndarray) -> ndarray
Performs Riemannian principal component analysis (PCA) using the supplied correlation matrix.
- Parameters:
corr_matrix (numpy.ndarray) – Riemannian correlation matrix.
- Returns:
Matrix of principal components.
- Return type:
numpy.ndarray
- Raises:
ValueError – If the correlation matrix is not square or if its size does not match the number of data columns.
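The two conditions that raise ValueError can be checked before calling the method. The helper below is a minimal sketch: `check_corr_matrix` is a hypothetical name, and the error messages are illustrative rather than the ones the library emits.

```python
import numpy as np

def check_corr_matrix(corr_matrix, n_columns):
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    corr_matrix = np.asarray(corr_matrix)
    if corr_matrix.ndim != 2 or corr_matrix.shape[0] != corr_matrix.shape[1]:
        raise ValueError("Correlation matrix must be square.")
    if corr_matrix.shape[0] != n_columns:
        raise ValueError(
            "Correlation matrix size must match the number of data columns."
        )
    return corr_matrix

# A 3 x 3 matrix is accepted for a dataset with three columns
check_corr_matrix(np.eye(3), n_columns=3)

# A non-square matrix is rejected
try:
    check_corr_matrix(np.ones((2, 3)), n_columns=3)
except ValueError as error:
    print(error)
```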
- riemannian_components_from_data_and_correlation(corr_matrix: ndarray) -> ndarray
Performs Riemannian principal component analysis (PCA) using the data and the provided correlation matrix.
- Parameters:
corr_matrix (numpy.ndarray) – Correlation matrix of the variables.
- Returns:
Matrix of principal components.
- Return type:
numpy.ndarray
- Raises:
ValueError – If the correlation matrix is not square or if its size does not match the number of data columns.
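For orientation, classical correlation-based PCA computes components from the data and a correlation matrix as sketched below. This is the textbook Euclidean procedure, shown here as an analogy; the library's Riemannian variant may center, weight, or scale the data differently. The function name `components_from_correlation` is hypothetical, and an ordinary Pearson correlation matrix stands in for the Riemannian one.

```python
import numpy as np

def components_from_correlation(data, corr_matrix):
    """Hypothetical sketch of classical PCA driven by a correlation matrix."""
    data = np.asarray(data, dtype=float)
    # Standardize each column (zero mean, unit variance)
    standardized = (data - data.mean(axis=0)) / data.std(axis=0)
    # eigh suits symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(corr_matrix)
    order = np.argsort(eigenvalues)[::-1]   # largest eigenvalue first
    # Project the standardized data onto the ordered eigenvectors
    return standardized @ eigenvectors[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
R = np.corrcoef(X, rowvar=False)   # Pearson stand-in for the Riemannian matrix
scores = components_from_correlation(X, R)
print(scores.shape)
```

In the classical case the resulting columns are mutually uncorrelated and ordered by decreasing variance.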
- riemannian_correlation_matrix() -> ndarray
Calculates the Riemannian correlation matrix from the Riemannian covariance matrix.
- Returns:
Riemannian correlation matrix.
- Return type:
numpy.ndarray
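Deriving a correlation matrix from a covariance matrix follows the standard normalization: each entry is divided by the product of the two corresponding standard deviations. The sketch below applies that formula to a hand-written covariance matrix; `correlation_from_covariance` is a hypothetical name, and whether the library performs exactly this step internally is an assumption.

```python
import numpy as np

def correlation_from_covariance(cov):
    """Hypothetical helper: corr[i, j] = cov[i, j] / (std[i] * std[j])."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))          # standard deviations from the diagonal
    return cov / np.outer(std, std)

cov = np.array([[4.0, 2.0],
                [2.0, 9.0]])
corr = correlation_from_covariance(cov)
print(corr)
```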
- riemannian_correlation_variables_components(components: ndarray) -> DataFrame
Calculates the Riemannian correlation between the original variables and the first two components.
- Parameters:
components (numpy.ndarray) – Matrix of components (at least two columns are expected).
- Returns:
DataFrame with the correlation of each original variable with the first and second components.
- Return type:
pandas.DataFrame
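The shape of the result can be illustrated with ordinary Pearson correlations between each variable and the first two components. This is a Euclidean stand-in for the Riemannian correlation the method computes; the function name `variable_component_correlations` and the column labels "Component 1" and "Component 2" are hypothetical.

```python
import numpy as np
import pandas as pd

def variable_component_correlations(data, components):
    """Hypothetical sketch: Pearson correlation of each variable with two components."""
    data = pd.DataFrame(data)
    components = np.asarray(components, dtype=float)
    if components.ndim != 2 or components.shape[1] < 2:
        raise ValueError("At least two component columns are expected.")
    rows = {}
    for name in data.columns:
        column = data[name].to_numpy(dtype=float)
        # np.corrcoef returns a 2 x 2 matrix; the off-diagonal entry is the correlation
        rows[name] = [np.corrcoef(column, components[:, k])[0, 1] for k in range(2)]
    return pd.DataFrame.from_dict(
        rows, orient="index", columns=["Component 1", "Component 2"]
    )

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [4.0, 3.0, 2.0, 1.0]})
comps = np.column_stack([[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]])
result = variable_component_correlations(df, comps)
print(result)
```

A table of this form, with one row per original variable and one column per component, is the usual input for a correlation circle plot.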
- property riemannian_diff: ndarray | None
Returns the 3D array of weighted Riemannian differences.
- property umap_distance_matrix: ndarray | None
Returns the UMAP distance matrix.
- property umap_similarities: ndarray | None
Returns the UMAP similarity matrix.