Multivariate time series classification with sktime
In this notebook, we will use sktime for multivariate time series classification.
For the simpler univariate time series classification setting, take a look at this notebook.
Preliminaries
[1]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sktime.classification.compose import (
    ColumnEnsembleClassifier,
    TimeSeriesForestClassifier,
)
from sktime.classification.dictionary_based import BOSSEnsemble
from sktime.classification.shapelet_based import MrSEQLClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator
Load multivariate time series/panel data
The data set used in this notebook was generated as part of a student project in which four students performed four activities while wearing a smart watch. The watch records 3D accelerometer and 3D gyroscope data, giving six time series columns per recording. There are four classes: walking, resting (labelled 'standing' in the data), running and badminton. Participants were required to record motion a total of five times, and the data is sampled once every tenth of a second for a ten-second period.
[2]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(60, 6) (60,) (20, 6) (20,)
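The printed shapes can be cross-checked against the data set description with some simple arithmetic. The sketch below is only a sanity check under stated assumptions: it reads "five times" as five recordings per student per activity (an interpretation that happens to match the 80 instances in total), and it relies on train_test_split holding out 25% of the instances when no test_size is given. The variable names are illustrative and not part of sktime.

```python
import math

# Figures taken from the data set description above.
n_students = 4
n_activities = 4
n_recordings = 5          # assumed: five recordings per student per activity
n_sensors, n_axes = 2, 3  # accelerometer and gyroscope, three axes each
sample_rate_hz = 10       # one sample every tenth of a second
duration_s = 10           # each recording lasts ten seconds

n_instances = n_students * n_activities * n_recordings
n_dimensions = n_sensors * n_axes
n_timepoints = sample_rate_hz * duration_s

# train_test_split uses test_size=0.25 by default and rounds the test set up.
n_test = math.ceil(0.25 * n_instances)
n_train = n_instances - n_test

print((n_train, n_dimensions), (n_test, n_dimensions))
# (60, 6) (20, 6)
print(n_timepoints)
# 100
```

Under these assumptions, each of the six cells in a row of X would hold a series of 100 observations.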
[3]:
# multivariate input data
X_train.head()
[3]:
| | dim_0 | dim_1 | dim_2 | dim_3 | dim_4 | dim_5 |
---|---|---|---|---|---|---|
9 | 0 -0.407421 1 -0.407421 2 2.355158 3... | 0 1.413374 1 1.413374 2 -3.928032 3... | 0 0.092782 1 0.092782 2 -0.211622 3... | 0 -0.066584 1 -0.066584 2 -3.630177 3... | 0 0.223723 1 0.223723 2 -0.026634 3... | 0 0.135832 1 0.135832 2 -1.946925 3... |
24 | 0 0.383922 1 0.383922 2 -0.272575 3... | 0 0.302612 1 0.302612 2 -1.381236 3... | 0 -0.398075 1 -0.398075 2 -0.681258 3... | 0 0.071911 1 0.071911 2 -0.761725 3... | 0 0.175783 1 0.175783 2 -0.114525 3... | 0 -0.087891 1 -0.087891 2 -0.503377 3... |
5 | 0 -0.357300 1 -0.357300 2 -0.005055 3... | 0 -0.584885 1 -0.584885 2 0.295037 3... | 0 -0.792751 1 -0.792751 2 0.213664 3... | 0 0.074574 1 0.074574 2 -0.157139 3... | 0 0.159802 1 0.159802 2 -0.306288 3... | 0 0.023970 1 0.023970 2 1.230478 3... |
7 | 0 -0.352746 1 -0.352746 2 -1.354561 3... | 0 0.316845 1 0.316845 2 0.490525 3... | 0 -0.473779 1 -0.473779 2 1.454261 3... | 0 -0.327595 1 -0.327595 2 -0.269001 3... | 0 0.106535 1 0.106535 2 0.021307 3... | 0 0.197090 1 0.197090 2 0.460763 3... |
34 | 0 0.052231 1 0.052231 2 -0.54804... | 0 -0.730486 1 -0.730486 2 0.70700... | 0 -0.518104 1 -0.518104 2 -1.179430 3... | 0 -0.159802 1 -0.159802 2 -0.239704 3... | 0 -0.045277 1 -0.045277 2 0.023970 3... | 0 -0.029297 1 -0.029297 2 0.29829... |
[4]:
# multi-class target variable
np.unique(y_train)
[4]:
array(['badminton', 'running', 'standing', 'walking'], dtype=object)
Multivariate classification
sktime offers three main ways of solving multivariate time series classification problems:
1. Concatenation: join the time series columns into a single long time series column via ColumnConcatenator, then apply a classifier to the concatenated data.
2. Column-wise ensembling via ColumnEnsembleClassifier, in which one classifier is fitted for each time series column and their predictions are aggregated.
3. Bespoke estimator-specific methods for handling multivariate time series data, e.g. finding shapelets in multidimensional spaces (still work in progress).
Time series concatenation
We can concatenate the columns of multivariate time series/panel data into a single long univariate time series per instance and then apply a classifier to the resulting univariate data.
[5]:
steps = [
    ("concatenate", ColumnConcatenator()),
    ("classify", TimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
[5]:
1.0
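To make the concatenation step concrete, here is a minimal sketch of the idea on plain Python lists. It is not sktime's implementation of ColumnConcatenator: the helper concatenate_columns and the toy values are hypothetical, and nested lists stand in for the nested DataFrame so that the example runs without sktime.

```python
def concatenate_columns(panel):
    """Join the series of all dimensions end to end, instance by instance.

    `panel` is a list of instances; each instance is a list of dimensions;
    each dimension is a list of observations.
    """
    return [
        [value for series in instance for value in series]
        for instance in panel
    ]


# Two toy instances with three dimensions of four time points each
# (made-up values, chosen so the origin of every number is easy to see).
toy_panel = [
    [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]],
    [[5, 6, 7, 8], [50, 60, 70, 80], [500, 600, 700, 800]],
]

flat = concatenate_columns(toy_panel)
print(flat[0])
# [1, 2, 3, 4, 10, 20, 30, 40, 100, 200, 300, 400]
print(len(flat), len(flat[0]))
# 2 12
```

The number of instances is unchanged, while each instance becomes one series whose length is the sum of the lengths of its original columns. A univariate classifier can then be fitted on the result, at the cost of treating the boundary between two dimensions like any other pair of neighbouring time points.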
Column ensembling
We can also fit one classifier for each time series column and then aggregate their predictions. The interface is similar to the familiar ColumnTransformer from sklearn.
[6]:
clf = ColumnEnsembleClassifier(
    estimators=[
        ("TSF0", TimeSeriesForestClassifier(n_estimators=100), [0]),
        ("BOSSEnsemble3", BOSSEnsemble(max_ensemble_size=5), [3]),
    ]
)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
[6]:
0.9
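The aggregation step can be illustrated with a small sketch. It assumes that the per-column classifiers output class probabilities and that these are combined by a simple unweighted average; this notebook does not state which aggregation rule ColumnEnsembleClassifier actually applies, so treat the rule, the helper names and the probability values below as hypothetical.

```python
def average_probabilities(per_column_probas):
    """Average class probabilities over the per-column classifiers.

    `per_column_probas[k][i][c]` is classifier k's probability that
    instance i belongs to class c.
    """
    n_classifiers = len(per_column_probas)
    n_instances = len(per_column_probas[0])
    n_classes = len(per_column_probas[0][0])
    return [
        [
            sum(probas[i][c] for probas in per_column_probas) / n_classifiers
            for c in range(n_classes)
        ]
        for i in range(n_instances)
    ]


def predict_labels(avg_probas, classes):
    """Pick the class with the highest averaged probability per instance."""
    return [
        classes[max(range(len(row)), key=row.__getitem__)]
        for row in avg_probas
    ]


classes = ["badminton", "running", "standing", "walking"]

# Made-up probabilities for two instances from two per-column classifiers,
# mirroring the two estimators fitted on columns 0 and 3 above.
probas_dim_0 = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1]]
probas_dim_3 = [[0.5, 0.3, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]

avg = average_probabilities([probas_dim_0, probas_dim_3])
print(predict_labels(avg, classes))
# ['badminton', 'walking']
```

For the second instance the classifier on column 0 alone would have predicted 'running', but the more confident classifier on column 3 tips the averaged prediction to 'walking'. This is the sense in which the ensemble combines evidence from several columns.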
Bespoke classification algorithms
Another approach is to use bespoke (or classifier-specific) methods for multivariate time series data. Here, we try out the MrSEQL algorithm in multidimensional space.
[7]:
clf = MrSEQLClassifier()
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
[7]:
1.0