Univariate time series classification with sktime¶
In this notebook, we will use sktime for univariate time series classification. Here, we have a single time series variable and an associated label for multiple instances. The goal is to find a classifier that can learn the relationship between time series and label and accurately predict the label of new series.
When you have multiple time series variables and want to learn the relationship between them and a label, you can take a look at our multivariate time series classification notebook.
Preliminaries¶
[1]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sktime.classification.compose import ComposableTimeSeriesForestClassifier
from sktime.datasets import load_arrow_head
from sktime.utils.slope_and_trend import _slope
Load data¶
In this notebook, we use the ArrowHead problem.
The ArrowHead dataset consists of outlines of images of arrowheads. The classification of projectile points is an important topic in anthropology. The classes are based on shape distinctions, such as the presence and location of a notch in the arrowhead.
The shapes of the projectile points are converted into a sequence using the angle-based method as described in this blog post about converting images into time series for data mining.
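As a rough illustration of the idea (a sketch of one common variant of the angle-based conversion, not necessarily the exact procedure used to build this dataset), an outline given as contour coordinates can be turned into a fixed-length, z-normalised series of centroid-to-contour distances:
import numpy as np

def outline_to_series(xs, ys, n_points=251):
    # hypothetical helper: xs, ys are the 2D coordinates of a closed outline
    cx, cy = xs.mean(), ys.mean()                    # centroid of the shape
    angles = np.arctan2(ys - cy, xs - cx)            # angle of each contour point
    radii = np.hypot(xs - cx, ys - cy)               # distance of each point to the centroid
    order = np.argsort(angles)                       # walk around the shape once
    grid = np.linspace(-np.pi, np.pi, n_points)      # resample to a fixed length
    series = np.interp(grid, angles[order], radii[order])
    return (series - series.mean()) / series.std()   # z-normalise the resulting series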
Data representation¶
Throughout sktime, the expected data format is a pd.DataFrame, but in a slightly unusual shape: a single column can contain not only primitives (floats, integers or strings), but also entire time series in the form of a pd.Series or np.array.
For more details on our choice of data container, see this wiki entry.
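For illustration, here is a minimal, hand-constructed example of this nested format (a toy frame, not part of the ArrowHead data): each row is one instance, and the single column holds a full series per cell.
import numpy as np
import pandas as pd

t = np.linspace(0, 2 * np.pi, 50)
X_toy = pd.DataFrame(
    {
        "dim_0": [
            pd.Series(np.sin(t)),      # instance 0: a full time series in one cell
            pd.Series(np.cos(t)),      # instance 1
            pd.Series(np.sin(2 * t)),  # instance 2
        ]
    }
)
X_toy.shape  # (3, 1): one row per instance, one column per time series variable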
[2]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(158, 1) (158,) (53, 1) (53,)
[3]:
# univariate time series input data
X_train.head()
[3]:
 | dim_0 |
---|---|
25 | 0 -1.6320 1 -1.6301 2 -1.6075 3 ... |
105 | 0 -1.6758 1 -1.6742 2 -1.6674 3 ... |
18 | 0 -2.1138 1 -2.0918 2 -2.0488 3 ... |
167 | 0 -1.7471 1 -1.7295 2 -1.7300 3 ... |
174 | 0 -1.6307 1 -1.6299 2 -1.6206 3 ... |
[4]:
# target variable (three classes)
labels, counts = np.unique(y_train, return_counts=True)
print(labels, counts)
['0' '1' '2'] [60 54 44]
[5]:
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
for label in labels:
X_train.loc[y_train == label, "dim_0"].iloc[0].plot(ax=ax, label=f"class {label}")
plt.legend()
ax.set(title="Example time series", xlabel="Time");
[5]:
[Text(0.5, 1.0, 'Example time series'), Text(0.5, 0, 'Time')]

Why not just use scikit-learn?¶
We can still use scikit-learn, but doing so comes with some implicit modelling choices.
Reduction: from time-series classification to tabular classification¶
To use scikit-learn, we have to convert the data into the required tabular format. There are different ways we can do that:
Treating time points as separate features (tabularisation)¶
In the simplest case, we treat each time point as a separate feature (tabularisation). Alternatively, we could bin and aggregate observations in time bins of different lengths, as sketched below after the tabularised data.
[6]:
from sklearn.ensemble import RandomForestClassifier
from sktime.utils.data_processing import from_nested_to_2d_array
X_train_tab = from_nested_to_2d_array(X_train)
X_test_tab = from_nested_to_2d_array(X_test)
X_train_tab.head()
[6]:
 | dim_0__0 | dim_0__1 | dim_0__2 | dim_0__3 | dim_0__4 | dim_0__5 | dim_0__6 | dim_0__7 | dim_0__8 | dim_0__9 | ... | dim_0__241 | dim_0__242 | dim_0__243 | dim_0__244 | dim_0__245 | dim_0__246 | dim_0__247 | dim_0__248 | dim_0__249 | dim_0__250 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
25 | -1.6320 | -1.6301 | -1.6075 | -1.6038 | -1.5762 | -1.5718 | -1.5191 | -1.4908 | -1.4612 | -1.3982 | ... | -1.4812 | -1.5110 | -1.5461 | -1.5718 | -1.5762 | -1.6238 | -1.6250 | -1.6302 | -1.6316 | -1.6323 |
105 | -1.6758 | -1.6742 | -1.6674 | -1.6630 | -1.5811 | -1.5492 | -1.5304 | -1.4950 | -1.4438 | -1.4327 | ... | -1.5237 | -1.5602 | -1.5713 | -1.6083 | -1.6610 | -1.6692 | -1.6723 | -1.7225 | -1.7234 | -1.6776 |
18 | -2.1138 | -2.0918 | -2.0488 | -2.0003 | -1.9664 | -1.9380 | -1.8775 | -1.8314 | -1.7594 | -1.7012 | ... | -1.5559 | -1.6216 | -1.7201 | -1.7719 | -1.8578 | -1.9386 | -1.9868 | -2.0597 | -2.1274 | -2.1308 |
167 | -1.7471 | -1.7295 | -1.7300 | -1.7044 | -1.6897 | -1.6581 | -1.6404 | -1.6003 | -1.5548 | -1.5033 | ... | -1.5668 | -1.5898 | -1.6347 | -1.6714 | -1.6739 | -1.7063 | -1.7189 | -1.7455 | -1.7590 | -1.7612 |
174 | -1.6307 | -1.6299 | -1.6206 | -1.6078 | -1.5797 | -1.5626 | -1.5270 | -1.5071 | -1.4648 | -1.4175 | ... | -1.2916 | -1.3501 | -1.4040 | -1.4513 | -1.4723 | -1.5136 | -1.5504 | -1.5816 | -1.5953 | -1.6208 |
5 rows × 251 columns
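As a sketch of the binning alternative mentioned above (an illustrative snippet, not part of the original notebook), we could aggregate each series into a small number of equal-width time bins and use the bin means as features:
import numpy as np
import pandas as pd

def bin_series(s, n_bins=10):
    # split one series into n_bins contiguous chunks and take the mean of each chunk
    chunks = np.array_split(s.to_numpy(), n_bins)
    return pd.Series([chunk.mean() for chunk in chunks])

X_train_binned = pd.DataFrame([bin_series(s) for s in X_train["dim_0"]])
X_test_binned = pd.DataFrame([bin_series(s) for s in X_test["dim_0"]])
X_train_binned.shape  # (n_instances, n_bins)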
[7]:
# let's get a baseline for comparison
from sklearn.dummy import DummyClassifier
classifier = DummyClassifier(strategy="prior")
classifier.fit(X_train_tab, y_train)
classifier.score(X_test_tab, y_test)
[7]:
0.39622641509433965
[8]:
# now we can apply any scikit-learn classifier
classifier = RandomForestClassifier(n_estimators=100)
classifier.fit(X_train_tab, y_train)
y_pred = classifier.predict(X_test_tab)
accuracy_score(y_test, y_pred)
[8]:
0.8679245283018868
[9]:
from sklearn.pipeline import make_pipeline
# with sktime, we can write this as a pipeline
from sktime.transformations.panel.reduce import Tabularizer
classifier = make_pipeline(Tabularizer(), RandomForestClassifier())
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
[9]:
0.8490566037735849
What’s the implicit modelling choice here?
We treat each observation as a separate feature and thus ignore that the observations are ordered in time. A tabular algorithm cannot make use of the fact that the features are ordered in time, i.e. if we changed the order of the features, the fitted model and predictions wouldn't change. Sometimes this works well, sometimes it doesn't.
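To make this concrete, here is a small check (an illustrative sketch, not part of the original notebook): permuting the feature columns in the same way in the training and test data leaves the tabular learning problem unchanged, so the accuracy should be essentially the same up to randomness in the forest.
rng = np.random.RandomState(0)
perm = rng.permutation(X_train_tab.shape[1])  # a fixed, arbitrary reordering of the time points

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train_tab.iloc[:, perm], y_train)
clf.score(X_test_tab.iloc[:, perm], y_test)  # comparable to the unpermuted score above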
Feature extraction¶
Another modelling choice: we could extract features from the time series and then use the features to fit our tabular classifier. Here we use tsfresh for automatic feature extraction.
[10]:
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor
transformer = TSFreshFeatureExtractor(default_fc_parameters="minimal")
extracted_features = transformer.fit_transform(X_train)
extracted_features.head()
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:00<00:00, 48.63it/s]
[10]:
 | dim_0__sum_values | dim_0__median | dim_0__mean | dim_0__length | dim_0__standard_deviation | dim_0__variance | dim_0__maximum | dim_0__minimum |
---|---|---|---|---|---|---|---|---|
0 | 0.000142 | 0.050745 | 5.657371e-07 | 251.0 | 0.998008 | 0.996020 | 1.4954 | -1.6323 |
1 | -0.000077 | 0.010407 | -3.067729e-07 | 251.0 | 0.998007 | 0.996018 | 1.7772 | -1.7234 |
2 | 0.000480 | -0.070497 | 1.912351e-06 | 251.0 | 0.998008 | 0.996020 | 1.4992 | -2.1308 |
3 | 0.000015 | 0.193190 | 5.976096e-08 | 251.0 | 0.998005 | 0.996013 | 1.2339 | -1.7612 |
4 | 0.000347 | 0.145480 | 1.382470e-06 | 251.0 | 0.998008 | 0.996019 | 1.3426 | -1.6307 |
[11]:
classifier = make_pipeline(
TSFreshFeatureExtractor(show_warnings=False), RandomForestClassifier()
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00, 2.10s/it]
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00, 1.40it/s]
[11]:
0.8490566037735849
What’s the implicit modelling choice here?
Instead of working in the domain of the time series, we extract features from time series and choose to work in the domain of the features. Again, sometimes this works well, sometimes it doesn’t. The main difficulty is finding discriminative features for the classification problem.
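For comparison, a hand-rolled version of this idea (an illustrative sketch, not part of the original notebook) extracts a few summary statistics per series, roughly corresponding to tsfresh's "minimal" settings, and feeds them to a tabular classifier:
import pandas as pd

def summary_features(s):
    # a handful of simple, order-invariant summary statistics per series
    return pd.Series(
        {"mean": s.mean(), "std": s.std(), "min": s.min(), "max": s.max(), "median": s.median()}
    )

X_train_feat = pd.DataFrame([summary_features(s) for s in X_train["dim_0"]])
X_test_feat = pd.DataFrame([summary_features(s) for s in X_test["dim_0"]])

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train_feat, y_train)
clf.score(X_test_feat, y_test)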
Time series classification with sktime¶
sktime has a number of specialised time series algorithms.
Time series forest¶
Time series forest is a modification of the random forest algorithm to the time series setting:
Split the series into multiple random intervals,
Extract features (mean, standard deviation and slope) from each interval,
Train a decision tree on the extracted features,
Ensemble steps 1 - 3.
For more details, take a look at the paper.
In sktime, we can write:
[12]:
from sktime.transformations.panel.summarize import RandomIntervalFeatureExtractor
steps = [
(
"extract",
RandomIntervalFeatureExtractor(
n_intervals="sqrt", features=[np.mean, np.std, _slope]
),
),
("clf", DecisionTreeClassifier()),
]
time_series_tree = Pipeline(steps)
We can directly fit and evaluate the single time series tree (which is simply a pipeline).
[13]:
time_series_tree.fit(X_train, y_train)
time_series_tree.score(X_test, y_test)
[13]:
0.6981132075471698
For time series forest, we can simply use the single tree as the base estimator in the forest ensemble.
[14]:
tsf = ComposableTimeSeriesForestClassifier(
estimator=time_series_tree,
n_estimators=100,
criterion="entropy",
bootstrap=True,
oob_score=True,
random_state=1,
n_jobs=-1,
)
Fit and obtain the out-of-bag score:
[15]:
tsf.fit(X_train, y_train)
if tsf.oob_score:
print(tsf.oob_score_)
0.8354430379746836
[16]:
tsf = ComposableTimeSeriesForestClassifier()
tsf.fit(X_train, y_train)
tsf.score(X_test, y_test)
[16]:
0.8490566037735849
We can also obtain feature importances for the different features and intervals that the algorithm looked at, and plot them as a feature importance graph over time.
[17]:
fi = tsf.feature_importances_
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
fi.plot(ax=ax)
ax.set(xlabel="Time", ylabel="Feature importance");
/Users/mloning/.conda/envs/sktime-dev/lib/python3.7/site-packages/pandas/plotting/_matplotlib/core.py:584: UserWarning: The handle <matplotlib.lines.Line2D object at 0x7f983bf78550> has a label of '_slope' which cannot be automatically added to the legend.
ax.legend(handles, labels, loc="best", title=title)
[17]:
[Text(0.5, 0, 'Time'), Text(0, 0.5, 'Feature importance')]

More about feature importances¶
The feature importances method is based on the example showcased in this paper.
In addition to the feature importances method available in scikit-learn, our method collects the feature importance values from each estimator for their respective intervals, sums the values at each time point, and normalises them first by the number of estimators and then by the number of intervals.
As a result, the temporal importance curves can be plotted, as shown in the previous example.
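Conceptually, the aggregation looks roughly like the following sketch (an assumption for illustration, not the actual sktime implementation; the interval and importance bookkeeping is hypothetical):
import numpy as np
import pandas as pd

def temporal_importance(intervals_per_estimator, importances_per_estimator,
                        series_length, feature_names):
    # intervals_per_estimator[i] is a list of (start, end) intervals used by estimator i
    # importances_per_estimator[i][j, k] is the importance of feature k on interval j
    curve = pd.DataFrame(0.0, index=np.arange(series_length), columns=feature_names)
    n_estimators = len(intervals_per_estimator)
    for intervals, importances in zip(intervals_per_estimator, importances_per_estimator):
        n_intervals = len(intervals)
        for (start, end), interval_importance in zip(intervals, importances):
            # spread each interval's importance over the time points it covers,
            # normalising by the number of intervals of that estimator
            curve.loc[start:end - 1] += interval_importance / n_intervals
    # finally normalise by the number of estimators to obtain the temporal importance curve
    return curve / n_estimators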
Please note that this method currently supports only one particular structure of the time series forest, where RandomIntervalFeatureExtractor() is used in the pipeline, or simply the default ComposableTimeSeriesForestClassifier() setting. The two supported approaches are shown below:
[18]:
# Method 1: Default time-series forest classifier
tsf1 = ComposableTimeSeriesForestClassifier()
tsf1.fit(X_train, y_train)
fi1 = tsf1.feature_importances_
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
fi1.plot(ax=ax)
# Method 2: Pipeline
features = [np.mean, np.std, _slope]
steps = [
("transform", RandomIntervalFeatureExtractor(features=features)),
("clf", DecisionTreeClassifier()),
]
base_estimator = Pipeline(steps)
tsf2 = ComposableTimeSeriesForestClassifier(estimator=base_estimator)
tsf2.fit(X_train, y_train)
fi2 = tsf2.feature_importances_
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
fi2.plot(ax=ax);
/Users/mloning/.conda/envs/sktime-dev/lib/python3.7/site-packages/pandas/plotting/_matplotlib/core.py:584: UserWarning: The handle <matplotlib.lines.Line2D object at 0x7f983c6cd250> has a label of '_slope' which cannot be automatically added to the legend.
ax.legend(handles, labels, loc="best", title=title)
/Users/mloning/.conda/envs/sktime-dev/lib/python3.7/site-packages/pandas/plotting/_matplotlib/core.py:584: UserWarning: The handle <matplotlib.lines.Line2D object at 0x7f983c5a6e50> has a label of '_slope' which cannot be automatically added to the legend.
ax.legend(handles, labels, loc="best", title=title)
[18]:
<AxesSubplot:>


RISE¶
Another popular variant of time series forest is the so-called Random Interval Spectral Ensemble (RISE), which makes use of several series-to-series feature extraction transformers, including:
Fitted auto-regressive coefficients,
Estimated autocorrelation coefficients,
Power spectrum coefficients.
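As a rough illustration of these feature types (a simplified sketch, not the exact RISE implementation), the three transforms could be computed on a single interval as follows:
import numpy as np

def rise_features(x, order=8):
    # x: 1D numpy array holding one interval of a series (assumed longer than `order`)
    n = len(x)
    x_centred = x - x.mean()
    # (1) autocorrelation coefficients at lags 1..order
    acf = np.array([
        np.sum(x_centred[: n - lag] * x_centred[lag:]) / np.sum(x_centred ** 2)
        for lag in range(1, order + 1)
    ])
    # (2) autoregressive coefficients via least squares on lagged values
    X_lagged = np.column_stack([x[order - lag: n - lag] for lag in range(1, order + 1)])
    ar, *_ = np.linalg.lstsq(X_lagged, x[order:], rcond=None)
    # (3) power spectrum coefficients from the discrete Fourier transform
    ps = np.abs(np.fft.rfft(x_centred)) ** 2
    return np.concatenate([acf, ar, ps])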
[19]:
from sktime.classification.interval_based import RandomIntervalSpectralForest
rise = RandomIntervalSpectralForest(n_estimators=10)
rise.fit(X_train, y_train)
rise.score(X_test, y_test)
[19]:
0.8301886792452831
K-nearest-neighbours classifier for time series¶
For time series, the most popular k-nearest-neighbours algorithm is based on the dynamic time warping (DTW) distance measure.
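For intuition, here is a minimal DTW distance sketch (illustrative only; sktime's classifier uses an optimised implementation):
import numpy as np

def dtw_distance(a, b):
    # classic dynamic-programming formulation of DTW between two 1D series
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])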
Here we look at the BasicMotions data set. The data was generated as part of a student project in which four students performed four activities while wearing a smart watch. The watch collects 3D accelerometer and 3D gyroscope data. The data set consists of four classes: walking, standing, running and badminton. Each participant recorded each motion a total of five times, and the data is sampled once every tenth of a second for a ten-second period.
[20]:
from sktime.datasets import load_basic_motions
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X.iloc[:, [0]], y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(60, 1) (60,) (20, 1) (20,)
[21]:
labels, counts = np.unique(y_train, return_counts=True)
print(labels, counts)
['badminton' 'running' 'standing' 'walking'] [14 16 14 16]
[22]:
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
for label in labels:
X_train.loc[y_train == label, "dim_0"].iloc[0].plot(ax=ax, label=label)
plt.legend()
ax.set(title="Example time series", xlabel="Time");
[22]:
[Text(0.5, 1.0, 'Example time series'), Text(0.5, 0, 'Time')]

[23]:
for label in labels[:2]:
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
for instance in X_train.loc[y_train == label, "dim_0"]:
ax.plot(instance)
ax.set(title=f"Instances of {label}")


# for comparison: a tabular 1-nearest-neighbour classifier with Euclidean distance
from sklearn.neighbors import KNeighborsClassifier

knn = make_pipeline(Tabularizer(), KNeighborsClassifier(n_neighbors=1, metric="euclidean"))
knn.fit(X_train, y_train)
knn.score(X_test, y_test)
[24]:
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="dtw")
knn.fit(X_train, y_train)
knn.score(X_test, y_test)
[24]:
0.55
Other classifiers¶
To find out what other algorithms we have implemented in sktime, you can use our utility function:
[25]:
from sktime.utils import all_estimators
all_estimators(estimator_types="classifier")
[25]:
[('BOSSEnsemble', sktime.classification.dictionary_based._boss.BOSSEnsemble),
('ColumnEnsembleClassifier',
sktime.classification.compose._column_ensemble.ColumnEnsembleClassifier),
('ContractableBOSS',
sktime.classification.dictionary_based._cboss.ContractableBOSS),
('ElasticEnsemble',
sktime.classification.distance_based._elastic_ensemble.ElasticEnsemble),
('IndividualBOSS',
sktime.classification.dictionary_based._boss.IndividualBOSS),
('IndividualTDE', sktime.classification.dictionary_based._tde.IndividualTDE),
('KNeighborsTimeSeriesClassifier',
sktime.classification.distance_based._time_series_neighbors.KNeighborsTimeSeriesClassifier),
('MUSE', sktime.classification.dictionary_based._muse.MUSE),
('MrSEQLClassifier',
sktime.classification.shapelet_based.mrseql.mrseql.MrSEQLClassifier),
('ProximityForest',
sktime.classification.distance_based._proximity_forest.ProximityForest),
('ProximityStump',
sktime.classification.distance_based._proximity_forest.ProximityStump),
('ProximityTree',
sktime.classification.distance_based._proximity_forest.ProximityTree),
('RandomIntervalSpectralForest',
sktime.classification.frequency_based._rise.RandomIntervalSpectralForest),
('ShapeletTransformClassifier',
sktime.classification.shapelet_based._stc.ShapeletTransformClassifier),
('TemporalDictionaryEnsemble',
sktime.classification.dictionary_based._tde.TemporalDictionaryEnsemble),
('TimeSeriesForest',
sktime.classification.interval_based._tsf.TimeSeriesForest),
('TimeSeriesForestClassifier',
sktime.classification.compose._ensemble.TimeSeriesForestClassifier),
('WEASEL', sktime.classification.dictionary_based._weasel.WEASEL)]