This document demonstrates the use of the riemannian_stats package to perform Riemannian Principal Component Analysis (R-PCA) on the classical Iris dataset. R-PCA is a novel extension of standard PCA that leverages the local geometry of data using a Riemannian manifold structure induced via UMAP.
We showcase the full pipeline: loading and preprocessing the dataset, computing manifold-based metrics, extracting Riemannian principal components, and visualizing both the structure and correlation of the data.
from riemannian_stats import riemannian_analysis, visualization, data_processing, utilities
data = data_processing.load_data("./data/iris.csv", separator=";", decimal=".")
n_neighbors = int(len(data) / 3)
if 'tipo' in data.columns:
    clusters = data['tipo']
    data_with_clusters = data.copy()
    data = data.iloc[:, :-1]
else:
    clusters = None
    data_with_clusters = data
We load the Iris dataset, specifying the field separator and decimal symbol explicitly. If a tipo column is present, it is kept as the cluster label and dropped from the features used in the analysis.
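For readers without the package at hand, the loading step can be approximated with plain pandas. This is a sketch of what load_data plausibly does, not its actual implementation. The inline sample rows, the column name p.ancho, and the label values are illustrative assumptions; only tipo, s.largo, s.ancho and p.largo appear in this document.

```python
import io

import pandas as pd

# Illustrative stand-in for ./data/iris.csv: semicolon-separated, "." as decimal mark.
# The column name "p.ancho" and the label values are assumptions for this sketch.
sample_csv = io.StringIO(
    "s.largo;s.ancho;p.largo;p.ancho;tipo\n"
    "5.1;3.5;1.4;0.2;setosa\n"
    "4.9;3.0;1.4;0.2;setosa\n"
    "4.7;3.2;1.3;0.2;setosa\n"
)

df = pd.read_csv(sample_csv, sep=";", decimal=".")

if "tipo" in df.columns:
    labels = df["tipo"]                  # keep the labels for coloring plots
    features = df.drop(columns="tipo")   # numeric features only
else:
    labels = None
    features = df

print(features.shape)
```

Dropping the label column by name, rather than by position, keeps the split correct even if the label is not the last column.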
analysis = riemannian_analysis(data, n_neighbors=n_neighbors)
We initialize the Riemannian analysis instance, setting the number of neighbors to one third of the number of observations.
umap_similarities = analysis.umap_similarities
print("UMAP Similarities Matrix:\n", umap_similarities)
## UMAP Similarities Matrix:
## [[0. 0.01766587 0.05554996 ... 0. 0. 0. ]
## [0.01766587 0. 0.37440622 ... 0. 0. 0. ]
## [0.05554996 0.37440622 0. ... 0. 0. 0. ]
## ...
## [0. 0. 0. ... 0. 0.257874 0.10819354]
## [0. 0. 0. ... 0.257874 0. 0.12052599]
## [0. 0. 0. ... 0.10819354 0.12052599 0. ]]
rho = analysis.rho
print("Rho Matrix:\n", rho)
## Rho Matrix:
## [[1. 0.98233414 0.94445 ... 1. 1. 1. ]
## [0.98233414 1. 0.6255938 ... 1. 1. 1. ]
## [0.94445 0.6255938 1. ... 1. 1. 1. ]
## ...
## [1. 1. 1. ... 1. 0.742126 0.8918065 ]
## [1. 1. 1. ... 0.742126 1. 0.87947404]
## [1. 1. 1. ... 0.8918065 0.87947404 1. ]]
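UMAP similarities define the local neighborhood structure: an entry is nonzero only for pairs of points that UMAP treats as neighbors. The rho matrix is the complement of the similarity matrix (in the output above, each entry equals one minus the corresponding similarity), so it is 1 for unrelated pairs and smaller for close neighbors. These pairwise weights form the foundation of the Riemannian metric.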
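The relationship between the two printed matrices can be checked by hand. The sketch below uses the similarities among the first three observations, copied from the output above, and assumes that rho is simply one minus the similarity. That rule is inferred from the printed values, not taken from the package source.

```python
import numpy as np

# Similarities among the first three observations, copied from the printed matrix.
similarities = np.array([
    [0.0,        0.01766587, 0.05554996],
    [0.01766587, 0.0,        0.37440622],
    [0.05554996, 0.37440622, 0.0],
])

# Assumption inferred from the printed output: rho is the complement of the similarity.
rho = 1.0 - similarities

print(np.round(rho, 8))
```

The result agrees with the top-left corner of the printed rho matrix up to rounding in the last digit.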
riemannian_diff = analysis.riemannian_diff
print("Riemannian Vector Differences:\n", riemannian_diff)
## Riemannian Vector Differences:
## [[[ 0. 0. 0. 0. ]
## [ 0.19646683 0.49116707 0. 0. ]
## [ 0.37778001 0.28333501 0.094445 0. ]
## ...
## [-1.4 0.5 -3.8 -1.8 ]
## [-1.1 0.1 -4. -2.1 ]
## [-0.8 0.5 -3.7 -1.6 ]]
##
## [[-0.19646683 -0.49116707 0. 0. ]
## [ 0. 0. 0. 0. ]
## [ 0.12511876 -0.12511876 0.06255938 0. ]
## ...
## [-1.6 0. -3.8 -1.8 ]
## [-1.3 -0.4 -4. -2.1 ]
## [-1. 0. -3.7 -1.6 ]]
##
## [[-0.37778001 -0.28333501 -0.094445 0. ]
## [-0.12511876 0.12511876 -0.06255938 0. ]
## [ 0. 0. 0. 0. ]
## ...
## [-1.8 0.2 -3.9 -1.8 ]
## [-1.5 -0.2 -4.1 -2.1 ]
## [-1.2 0.2 -3.8 -1.6 ]]
##
## ...
##
## [[ 1.4 -0.5 3.8 1.8 ]
## [ 1.6 0. 3.8 1.8 ]
## [ 1.8 -0.2 3.9 1.8 ]
## ...
## [ 0. 0. 0. 0. ]
## [ 0.2226378 -0.2968504 -0.1484252 -0.2226378 ]
## [ 0.53508389 0. 0.08918065 0.1783613 ]]
##
## [[ 1.1 -0.1 4. 2.1 ]
## [ 1.3 0.4 4. 2.1 ]
## [ 1.5 0.2 4.1 2.1 ]
## ...
## [-0.2226378 0.2968504 0.1484252 0.2226378 ]
## [ 0. 0. 0. 0. ]
## [ 0.26384221 0.35178962 0.26384221 0.43973702]]
##
## [[ 0.8 -0.5 3.7 1.6 ]
## [ 1. 0. 3.7 1.6 ]
## [ 1.2 -0.2 3.8 1.6 ]
## ...
## [-0.53508389 0. -0.08918065 -0.1783613 ]
## [-0.26384221 -0.35178962 -0.26384221 -0.43973702]
## [ 0. 0. 0. 0. ]]]
umap_distance_matrix = analysis.umap_distance_matrix
print("UMAP Distance Matrix:\n", umap_distance_matrix)
## UMAP Distance Matrix:
## [[0. 0.52900312 0.48157691 ... 4.45982062 4.65080638 4.14004831]
## [0.52900312 0. 0.18767813 ... 4.49888875 4.71805044 4.15331193]
## [0.48157691 0.18767813 0. ... 4.66154481 4.84871117 4.29883705]
## ...
## [4.45982062 4.49888875 4.66154481 ... 0. 0.45747718 0.57103477]
## [4.65080638 4.71805044 4.84871117 ... 0.45747718 0. 0.67553683]
## [4.14004831 4.15331193 4.29883705 ... 0.57103477 0.67553683 0. ]]
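We compute the Riemannian vector differences and the UMAP-induced distances to quantify local geometric deviation between samples. In the output above, each Riemannian difference is the ordinary difference between two observations scaled by the corresponding rho entry, and each distance is the Euclidean norm of that scaled difference.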
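The printed differences and distances can be reproduced for a few points with NumPy. The sketch assumes the first three rows of the standard Iris data and takes the rho values from the printed matrix. The formulas (difference scaled by rho, then Euclidean norm) are inferred from the output, not from the package source.

```python
import numpy as np

# First three Iris observations (assumed to match the rows behind the printed output).
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
])

# Rho values for the same three points, copied from the printed rho matrix.
rho = np.array([
    [1.0,        0.98233414, 0.94445],
    [0.98233414, 1.0,        0.6255938],
    [0.94445,    0.6255938,  1.0],
])

# diff[i, j] = rho[i, j] * (x_i - x_j), computed for all pairs via broadcasting.
diff = rho[:, :, None] * (X[:, None, :] - X[None, :, :])

# dist[i, j] = Euclidean norm of the scaled difference vector.
dist = np.linalg.norm(diff, axis=2)

print(np.round(diff[0], 8))
print(np.round(dist, 8))
```

Under these assumptions, diff[0] and the 3 x 3 block of dist line up with the first rows of the printed Riemannian differences and the top-left corner of the printed distance matrix.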
riemann_corr = analysis.riemannian_correlation_matrix()
print("Riemannian Correlation Matrix:\n", riemann_corr)
## Riemannian Correlation Matrix:
## [[ 1. -0.12818677 0.86553676 0.78974045]
## [-0.12818677 1. -0.44643239 -0.39361534]
## [ 0.86553676 -0.44643239 1. 0.96096511]
## [ 0.78974045 -0.39361534 0.96096511 1. ]]
riemann_components = analysis.riemannian_components_from_data_and_correlation(riemann_corr)
print("Riemannian Principal Components:\n", riemann_components)
## Riemannian Principal Components:
## [[-2.63097358e+00 -5.26465635e-01 -2.31377990e-01 1.00902430e-02]
## [-2.42220256e+00 6.39026525e-01 -3.69619363e-01 8.66049256e-02]
## [-2.71312994e+00 3.12321472e-01 -8.14538421e-02 4.08231981e-03]
## [-2.64472373e+00 5.70817575e-01 -4.41041657e-02 -9.00116698e-02]
## [-2.76038267e+00 -6.90832808e-01 -8.38861258e-02 -5.33614916e-02]
## [-2.47600196e+00 -1.54771985e+00 -6.09838268e-02 -1.11707930e-02]
## [-2.80822531e+00 -7.45565981e-02 2.14340545e-01 -7.03377631e-02]
## [-2.59467656e+00 -2.65940241e-01 -2.00846520e-01 -3.96228831e-02]
## [-2.66845727e+00 1.09795624e+00 -3.49584529e-03 -5.62953317e-02]
## [-2.52589481e+00 4.30556981e-01 -3.80506873e-01 -4.93079250e-02]
## [-2.54462940e+00 -1.10269806e+00 -3.57588874e-01 1.07659036e-02]
## [-2.68778864e+00 -1.69782018e-01 -2.28231860e-02 -1.52787744e-01]
## [-2.55381618e+00 6.95140958e-01 -3.63611816e-01 -1.02593242e-02]
## [-2.96319693e+00 9.46697617e-01 4.39463823e-02 -4.90767271e-02]
## [-2.59090982e+00 -1.93336800e+00 -5.34785847e-01 1.94297281e-01]
## [-2.68638074e+00 -2.75824718e+00 -2.19814297e-02 3.54682501e-02]
## [-2.60443874e+00 -1.53960268e+00 -8.82566526e-02 1.66352661e-01]
## [-2.56197060e+00 -5.33486312e-01 -1.51782973e-01 7.25624942e-02]
## [-2.29037459e+00 -1.47451952e+00 -4.59275835e-01 5.85926878e-02]
## [-2.73025678e+00 -1.17589839e+00 4.07031342e-02 -5.89975772e-02]
## [-2.28001564e+00 -4.66373851e-01 -5.29620361e-01 9.18338459e-03]
## [-2.59445534e+00 -9.69458138e-01 5.84088508e-02 3.25344099e-02]
## [-3.13926199e+00 -4.86340614e-01 2.31251303e-01 -1.34060323e-02]
## [-2.19404013e+00 -1.26693680e-01 -9.59169201e-02 1.22483878e-01]
## [-2.59146105e+00 -1.75869893e-01 -2.36856659e-03 -2.85930334e-01]
## [-2.29537354e+00 5.85874185e-01 -4.41585514e-01 3.22351973e-02]
## [-2.42456140e+00 -2.82010885e-01 -3.48382796e-02 4.09407557e-02]
## [-2.53625375e+00 -5.77588684e-01 -3.10162347e-01 1.01378242e-04]
## [-2.50156449e+00 -3.62098463e-01 -3.78869853e-01 7.35419776e-02]
## [-2.61680236e+00 3.06233597e-01 -6.09992227e-02 -1.29060271e-01]
## [-2.48739326e+00 4.70600770e-01 -2.08491087e-01 -6.56085359e-02]
## [-2.20622806e+00 -4.76356621e-01 -3.84066741e-01 2.22889614e-01]
## [-3.00604748e+00 -1.85133358e+00 -1.84215625e-02 -2.36729289e-01]
## [-2.84812025e+00 -2.21706717e+00 -1.40563143e-01 -5.57599141e-02]
## [-2.52589481e+00 4.30556981e-01 -3.80506873e-01 -4.93079250e-02]
## [-2.55740723e+00 1.67069493e-01 -3.45079739e-01 1.51639179e-01]
## [-2.41264024e+00 -7.20811371e-01 -5.80606451e-01 1.92039101e-01]
## [-2.52589481e+00 4.30556981e-01 -3.80506873e-01 -4.93079250e-02]
## [-2.76736492e+00 8.86524601e-01 5.15752484e-02 -4.09742042e-02]
## [-2.53206593e+00 -3.15033997e-01 -2.86449083e-01 -5.23088444e-03]
## [-2.65669043e+00 -4.82363263e-01 -7.29986156e-02 8.25513590e-02]
## [-2.16816210e+00 2.32463667e+00 -3.87657400e-01 2.59308197e-01]
## [-2.90096184e+00 4.59602742e-01 1.75353849e-01 -9.90936761e-02]
## [-2.35335389e+00 -5.09513167e-01 1.86241054e-01 1.36825522e-01]
## [-2.53281702e+00 -1.19103623e+00 1.47570977e-01 -1.74048780e-01]
## [-2.41581021e+00 6.81099606e-01 -2.04421783e-01 1.14685178e-01]
## [-2.76715057e+00 -1.17090701e+00 -3.20736761e-02 -1.65850692e-01]
## [-2.74363138e+00 3.59385937e-01 1.09669280e-02 -7.46905423e-02]
## [-2.60724004e+00 -1.05360430e+00 -2.71986310e-01 -2.36260950e-02]
## [-2.55998730e+00 -5.04500197e-02 -2.69554026e-01 3.38177163e-02]
## [ 6.34573720e-01 -9.51943400e-01 -8.47213243e-01 3.51463679e-02]
## [ 2.46974569e-01 -6.07602467e-01 -2.54135068e-01 -1.72857128e-02]
## [ 7.84072241e-01 -7.18603539e-01 -7.46408891e-01 4.19420632e-03]
## [ 1.41511623e-02 1.62039950e+00 -2.50016885e-01 2.81342361e-02]
## [ 6.16344769e-01 1.20082015e-01 -5.89691798e-01 8.40366340e-02]
## [-2.29055442e-02 3.66639384e-01 -6.28020396e-02 -1.84846735e-01]
## [ 2.58915602e-01 -7.90763820e-01 -3.98716165e-02 -1.00977820e-01]
## [-8.15123345e-01 1.73121650e+00 2.40472314e-02 -7.82527732e-02]
## [ 4.88546099e-01 -1.22526230e-01 -7.81889214e-01 -3.22055753e-02]
## [-3.78947859e-01 9.37147664e-01 2.94636586e-01 -7.79065518e-02]
## [-4.50097231e-01 2.54002229e+00 -2.84566818e-01 -1.99523422e-02]
## [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
## [ 1.84921583e-01 1.67204576e+00 -9.77301182e-01 4.26150949e-02]
## [ 2.38531263e-01 9.43379210e-02 -2.34355322e-01 -1.56910308e-01]
## [-4.06903431e-01 3.48587466e-01 -4.66713273e-03 6.05582798e-02]
## [ 4.03016136e-01 -5.66312354e-01 -6.46709539e-01 8.90351445e-02]
## [-4.64074896e-02 7.16078879e-02 1.40616835e-01 -1.19851745e-01]
## [-2.11639815e-01 6.42516274e-01 -4.46429629e-01 -1.90288147e-01]
## [ 7.56218247e-01 1.43596353e+00 -6.75769320e-01 1.87090915e-01]
## [-2.18351357e-01 1.20553863e+00 -3.77925099e-01 -7.20410983e-02]
## [ 2.41680536e-01 -4.17983232e-01 3.66492167e-01 -1.24432132e-01]
## [ 1.75780153e-02 1.07726556e-01 -1.45301609e-01 2.81778027e-02]
## [ 8.06703756e-01 8.54082590e-01 -6.02266281e-01 -2.77136414e-02]
## [ 1.88650634e-01 3.01735821e-01 -4.43900227e-01 -2.50892177e-01]
## [ 2.19964824e-01 -1.60550949e-02 -5.15696921e-01 2.54481531e-02]
## [ 4.03494861e-01 -3.17120142e-01 -6.19150192e-01 8.33164390e-02]
## [ 8.18292287e-01 -2.00170323e-02 -9.30099834e-01 3.87738183e-02]
## [ 8.95764592e-01 -4.23025740e-01 -4.71084824e-01 4.50335838e-02]
## [ 3.78138574e-02 2.65138609e-02 -2.12804067e-02 -1.16741335e-02]
## [-4.13043911e-01 9.66785316e-01 -5.04039486e-01 4.41806532e-02]
## [-2.45176365e-01 1.46019122e+00 -3.61423062e-01 -3.43257136e-02]
## [-3.40566381e-01 1.45961853e+00 -4.41983840e-01 -5.14537143e-02]
## [-1.31574000e-01 5.69546285e-01 -2.85561372e-01 -1.18177190e-03]
## [ 6.19865773e-01 5.64978466e-01 -1.30075312e-01 -2.14993784e-01]
## [-1.64382711e-01 1.81566391e-01 3.40155004e-01 -2.31415175e-01]
## [-2.31311518e-02 -5.39816019e-01 1.54396286e-01 -8.98767868e-02]
## [ 5.93290399e-01 -6.14966219e-01 -5.87511065e-01 2.41173758e-02]
## [ 5.90174661e-01 1.19807160e+00 -8.43876317e-01 1.16597691e-01]
## [-1.34853310e-01 6.15413528e-02 3.42078698e-02 -6.92222413e-02]
## [-1.12206837e-01 1.21192264e+00 -1.32240178e-01 -2.69014517e-02]
## [-1.13670771e-01 9.62493464e-01 -1.17509408e-01 -2.68312500e-01]
## [ 1.24415846e-01 -6.68950160e-02 -1.50606976e-01 -1.15358059e-01]
## [-5.55208414e-02 7.73159735e-01 -3.45214293e-01 -1.40027137e-02]
## [-6.95979785e-01 1.89697255e+00 -1.16464121e-01 -1.81582279e-02]
## [-9.09679807e-02 5.81378126e-01 -6.38519894e-02 -1.02207347e-01]
## [-1.08174921e-01 3.88279246e-02 -2.20041779e-02 -8.34195519e-02]
## [-1.17080675e-08 1.94126246e-08 -2.97273436e-09 -9.81502316e-09]
## [ 6.01933666e-02 3.16446537e-02 -1.90644350e-01 -1.50103952e-02]
## [-7.95485150e-01 1.45271747e+00 -2.38435059e-02 1.46803866e-01]
## [-5.87253407e-02 1.96494847e-01 -4.30553511e-02 -3.31236624e-02]
## [ 1.31804249e+00 -9.43491827e-01 7.61935530e-01 -1.23744244e-01]
## [ 6.99649565e-01 6.40266009e-01 2.78433029e-01 -9.63206704e-02]
## [ 1.71120181e+00 -6.65747097e-01 -4.33751153e-01 3.30628118e-02]
## [ 9.73778662e-01 -3.23862089e-02 -7.00596138e-02 -2.67287604e-01]
## [ 1.37243179e+00 -3.76175941e-01 1.52641039e-01 -6.64360655e-02]
## [ 2.24901934e+00 -9.25420922e-01 -8.14036526e-01 -1.05643239e-01]
## [-5.54407357e-02 1.47582860e+00 6.96816064e-01 -1.98444445e-01]
## [ 1.82464936e+00 -5.37528818e-01 -8.78357805e-01 -2.34033662e-01]
## [ 1.55563342e+00 6.21023898e-01 -6.46390656e-01 -1.02242393e-01]
## [ 1.71325202e+00 -2.02774772e+00 1.83998565e-01 5.42236723e-02]
## [ 8.76064552e-01 -7.74851406e-01 6.95021608e-02 6.11660045e-02]
## [ 1.14266161e+00 3.44509091e-01 -2.20300380e-01 2.08387076e-02]
## [ 1.39493313e+00 -5.10348660e-01 -2.04216288e-01 1.07410270e-01]
## [ 7.94758362e-01 1.09230025e+00 3.08114755e-01 3.31654723e-02]
## [ 9.78602390e-01 3.93603220e-01 7.37738594e-01 1.86094807e-01]
## [ 1.08468126e+00 -7.50878261e-01 4.07526188e-01 1.25429032e-01]
## [ 1.00009228e+00 -3.42005360e-01 -1.86193647e-01 -1.83182480e-01]
## [ 1.87835448e+00 -2.69125208e+00 -3.18111465e-01 -2.85637740e-01]
## [ 2.81315735e+00 -1.40800189e-01 -9.67551637e-01 3.67896153e-02]
## [ 8.42270194e-01 1.62004442e+00 -5.19022162e-01 -8.69789497e-02]
## [ 1.52617120e+00 -1.00446421e+00 6.78619552e-03 1.19865572e-01]
## [ 5.03903470e-01 5.13042757e-01 5.65664051e-01 -4.23822284e-02]
## [ 2.40833310e+00 -5.42601435e-01 -1.09619450e+00 -1.19984883e-01]
## [ 8.80966491e-01 4.07979011e-01 -2.41115507e-01 1.01308772e-01]
## [ 1.19614551e+00 -1.10569627e+00 8.06905893e-02 -1.02922664e-01]
## [ 1.46531577e+00 -1.12272997e+00 -6.27541960e-01 -2.22462278e-01]
## [ 6.94335387e-01 2.37259558e-01 -9.71482094e-02 7.93684260e-02]
## [ 5.37888186e-01 -1.28876738e-01 1.11352020e-01 -5.25969895e-02]
## [ 1.31019670e+00 1.10918934e-01 2.12335729e-02 -1.64191164e-02]
## [ 1.39668833e+00 -6.77708180e-01 -9.24147007e-01 -2.00525582e-01]
## [ 1.95884305e+00 -3.76123739e-01 -9.59891065e-01 -1.93479495e-02]
## [ 1.76924220e+00 -2.76931037e+00 -6.68961245e-01 -2.08655655e-01]
## [ 1.37919969e+00 1.03898258e-01 1.00828590e-01 4.60531347e-02]
## [ 6.72545028e-01 2.12132699e-01 -4.04537977e-01 -2.03595853e-01]
## [ 7.72940835e-01 7.34266799e-01 -4.02902454e-01 -4.98781398e-01]
## [ 2.28908997e+00 -9.78409574e-01 -7.74540088e-01 2.75597579e-01]
## [ 1.05380427e+00 -1.14181491e+00 7.16956988e-01 -3.77527775e-02]
## [ 8.70683186e-01 -5.06372533e-01 -3.87017835e-02 -2.46634214e-01]
## [ 4.27569541e-01 -7.61505076e-02 1.79520318e-01 -4.11373674e-02]
## [ 1.35863611e+00 -7.70874055e-01 -2.34747758e-01 1.57123396e-01]
## [ 1.50464218e+00 -6.97807154e-01 1.88878833e-01 1.86994425e-01]
## [ 1.40031450e+00 -7.78827533e-01 -9.60123434e-02 4.15210488e-01]
## [ 6.99283037e-01 6.39930591e-01 2.78287165e-01 -9.62702105e-02]
## [ 1.52777896e+00 -9.59429038e-01 1.06025172e-01 -3.28815380e-03]
## [ 1.47215744e+00 -1.13377898e+00 3.99070657e-01 1.46966341e-01]
## [ 1.37400088e+00 -4.69208382e-01 2.01216901e-02 3.31105364e-01]
## [ 1.11732031e+00 8.26612581e-01 -2.78931036e-01 1.77708771e-01]
## [ 1.04177066e+00 -3.49958839e-01 -4.74582331e-02 7.49046130e-02]
## [ 8.57972266e-01 -1.08164190e+00 7.09328122e-01 -4.58553003e-02]
## [ 4.93417375e-01 -3.91215431e-02 2.98591060e-01 -2.10910627e-01]]
Principal components are derived from the Riemannian correlation matrix, capturing variance in the intrinsic geometry of the data.
comp1, comp2 = 0, 1
inertia = utilities.pca_inertia_by_components(riemann_corr, comp1, comp2) * 100
print(f"Explained Inertia (PC1 & PC2): {inertia:.2f}%")
## Explained Inertia (PC1 & PC2): 95.38%
The explained inertia quantifies the proportion of Riemannian variance captured by the first two components.
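In classical PCA on a correlation matrix, the inertia explained by a set of components is the sum of their eigenvalues divided by the sum of all eigenvalues. The sketch below applies that definition to the printed Riemannian correlation matrix. It assumes pca_inertia_by_components follows the same convention, which this document does not state.

```python
import numpy as np

# Riemannian correlation matrix, copied from the printed output.
corr = np.array([
    [ 1.0,        -0.12818677,  0.86553676,  0.78974045],
    [-0.12818677,  1.0,        -0.44643239, -0.39361534],
    [ 0.86553676, -0.44643239,  1.0,         0.96096511],
    [ 0.78974045, -0.39361534,  0.96096511,  1.0],
])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order,
# so reverse them to put the largest first.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Share of the total inertia carried by the first two components, as a percentage.
inertia_pc12 = eigenvalues[:2].sum() / eigenvalues.sum() * 100
print(f"Explained inertia (PC1 & PC2): {inertia_pc12:.2f}%")
```

Because the diagonal of a correlation matrix is all ones, the eigenvalues sum to the number of variables (here 4), which gives a quick sanity check on the decomposition.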
correlations = analysis.riemannian_correlation_variables_components(riemann_components)
print("Correlations Variables vs Components:\n", correlations)
## Correlations Variables vs Components:
## Component_1 Component_2
## feature_1 0.876449 -0.383029
## feature_2 -0.488327 -0.865365
## feature_3 0.992478 -0.033786
## feature_4 0.96051 -0.051085
We compute how strongly each original variable is correlated with the Riemannian principal components.
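For comparison, classical PCA gives the correlation between variable j and component k as the j-th entry of the k-th eigenvector multiplied by the square root of the k-th eigenvalue. The sketch below computes these classical loadings from the printed Riemannian correlation matrix. It is an analogue for intuition only: riemannian_correlation_variables_components may compute its values differently.

```python
import numpy as np

# Riemannian correlation matrix, copied from the printed output.
corr = np.array([
    [ 1.0,        -0.12818677,  0.86553676,  0.78974045],
    [-0.12818677,  1.0,        -0.44643239, -0.39361534],
    [ 0.86553676, -0.44643239,  1.0,         0.96096511],
    [ 0.78974045, -0.39361534,  0.96096511,  1.0],
])

# Eigen-decomposition of the symmetric matrix, reordered so the largest eigenvalue is first.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Classical loadings: loadings[j, k] = eigenvectors[j, k] * sqrt(eigenvalues[k]).
loadings = eigenvectors[:, :2] * np.sqrt(eigenvalues[:2])
print(np.round(loadings, 3))
```

Eigenvectors are only defined up to sign, so a whole column may come out with the opposite sign to the table above. The magnitudes are what should be compared.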
if clusters is not None:
    viz = visualization(data=data_with_clusters,
                        components=riemann_components,
                        explained_inertia=inertia,
                        clusters=clusters)
    viz.plot_2d_scatter_with_clusters(x_col="s.largo", y_col="s.ancho", cluster_col="tipo", title="iris.csv")
    viz.plot_principal_plane_with_clusters(title="iris.csv")
    viz.plot_3d_scatter_with_clusters(x_col="s.largo", y_col="s.ancho", z_col="p.largo", cluster_col="tipo",
                                      title="iris.csv", figsize=(12, 8))
else:
    viz = visualization(data=data,
                        components=riemann_components,
                        explained_inertia=inertia)
    viz.plot_principal_plane(title="iris.csv")
viz.plot_correlation_circle(correlations=correlations, title="iris.csv")
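We visualize the structure of the dataset via a 2D scatter plot of two original variables colored by cluster, the principal plane spanned by the first two Riemannian components, a 3D scatter plot of three original variables, and the correlation circle of the variables against the components.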
This analysis demonstrates how the Riemannian STATS package enables geometric analysis of tabular datasets by transforming Euclidean features into a Riemannian manifold via UMAP. The result is a more faithful low-dimensional representation that respects the dataset’s intrinsic structure.