Multi-output Operator-valued kernel RegressionΒΆ

An example to illustrate multi-output regression with operator-valued kernels.

We compare Operator-valued kernel (OVK) - without prior (ID), - with covariance constraint prior (covariance), - with multioutput decision tree.

Setting the operator A as a good prior (here a covariance constraint) helps to achieve a better score on a noisy test dataset. OVK methods can generalise better than decision trees but are slower to train.

../_images/sphx_glr_plot_ovk_regression_multiple_outputs_001.png

Out:

  seed 123
Creating dataset...
Fitting...
Leaning time DPeriodic ID:  0.26281785965
Leaning time DPeriodic covariance:  0.9061460495
Leaning time Trees:  0.00524687767029
Predicting...
Test time DPeriodic ID:  0.103090047836
Test time DPeriodic covariance:  0.0985400676727
Test time Trees:  0.000425815582275
plotting
Done

# Author: Romain Brault <romain.brault@telecom-paristech.fr> with help from
#         the scikit-learn community.
# License: MIT

import operalib as ovk
from sklearn.tree import DecisionTreeRegressor

import numpy as np
import matplotlib.pyplot as plt
import time

print(__doc__)

seed = 123
np.random.seed(seed)
print("seed", seed)

# Create a random dataset
print("Creating dataset...")
X = 200 * np.random.rand(1000, 1) - 100
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T
Tr = 2 * np.random.rand(2, 2) - 1
Tr = np.dot(Tr, Tr.T)
Tr = Tr / np.linalg.norm(Tr, 2)
U = np.linalg.cholesky(Tr)
y = np.dot(y, U)

# Add some noise
Sigma = 2 * np.random.rand(2, 2) - 1
Sigma = np.dot(Sigma, Sigma.T)
Sigma = 1. * Sigma / np.linalg.norm(Sigma, 2)
Cov = np.linalg.cholesky(Sigma)
y += np.dot(np.random.randn(y.shape[0], y.shape[1]), Cov)

# Fit
# real period is 2 * pi \approx 6.28, but we set the period to 6 for
# demonstration purpose
print("Fitting...")
start = time.time()
A = np.eye(2)
regr_1 = ovk.Ridge('DPeriodic', lbda=0.01, period=6, theta=.995, A=A)
regr_1.fit(X, y)
print("Leaning time DPeriodic ID: ", time.time() - start)

start = time.time()
A = np.cov(y.T)
regr_2 = ovk.Ridge('DPeriodic', lbda=0.01, period=6, theta=.995, A=A)
regr_2.fit(X, y)
print("Leaning time DPeriodic covariance: ", time.time() - start)

start = time.time()
regr_3 = DecisionTreeRegressor(max_depth=100)
regr_3.fit(X, y)
print("Leaning time Trees: ", time.time() - start)


# Predict
print("Predicting...")
X_test = np.arange(-100.0, 100.0, .5)[:, np.newaxis]
start = time.time()
y_1 = regr_1.predict(X_test)
print("Test time DPeriodic ID: ", time.time() - start)
start = time.time()
# regr_2.linop_.A = np.eye(2)
y_2 = regr_2.predict(X_test)
print("Test time DPeriodic covariance: ", time.time() - start)
start = time.time()
y_3 = regr_3.predict(X_test)
print("Test time Trees: ", time.time() - start)

# Ground truth
y_t = np.dot(np.array([np.pi * np.sin(X_test).ravel(),
                       np.pi * np.cos(X_test).ravel()]).T, U)

# Plot
print("plotting")
plt.figure()
plt.scatter(y_1[::1, 0], y_1[::1, 1], c="g", label="OVK Id", s=5., lw=0)
plt.scatter(y_2[::1, 0], y_2[::1, 1], c="b", label="OVK Cov", s=5., lw=0)
plt.scatter(y_3[::1, 0], y_3[::1, 1], c="r", label="D TREE", s=5., lw=0)
# plt.scatter(y[::1, 0], y[::1, 1], c="k", label="Data", s=5., lw = 0)
plt.scatter(y_t[::1, 0], y_t[::1, 1], c="k", label="True", s=5., lw=0)
plt.xlim([-4, 4])
plt.ylim([-4, 4])
plt.xlabel("data")
plt.ylabel("target")
plt.title("Multi-output OVK Regression")
plt.legend()
plt.show()

print("Done")

Total running time of the script: (0 minutes 1.942 seconds)

Download Python source code: plot_ovk_regression_multiple_outputs.py
Download IPython notebook: plot_ovk_regression_multiple_outputs.ipynb