Feature extraction with the tsfresh transformer
In this tutorial, we show how to use sktime with tsfresh to extract features from time series, so that the resulting tabular features can be passed to any scikit-learn estimator.
Preliminaries
This tutorial requires tsfresh. If it is not installed yet, uncomment and run the cell below:
[1]:
# !pip install --upgrade tsfresh
[2]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sktime.datasets import load_arrow_head, load_basic_motions
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor
Univariate time series classification data
For more details on the data set, see the univariate time series classification notebook.
[3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(158, 1) (158,) (53, 1) (53,)
[4]:
X_train.head()
[4]:
dim_0 | |
---|---|
173 | 0 -1.7390 1 -1.7415 2 -1.7329 3 ... |
17 | 0 -2.1788 1 -2.1751 2 -2.1550 3 ... |
145 | 0 -1.7427 1 -1.7399 2 -1.7102 3 ... |
59 | 0 -1.9969 1 -2.0076 2 -2.0010 3 ... |
95 | 0 -1.8284 1 -1.8393 2 -1.8025 3 ... |
[5]:
# multiclass classification task (three classes)
np.unique(y_train)
[5]:
array(['0', '1', '2'], dtype=object)
Using tsfresh to extract features
[6]:
# extract the "efficient" tsfresh feature set (skips computationally expensive features)
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:11<00:00, 2.32s/it]
[6]:
dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_0__fourier_entropy__bins_2 | dim_0__fourier_entropy__bins_3 | dim_0__fourier_entropy__bins_5 | dim_0__fourier_entropy__bins_10 | dim_0__fourier_entropy__bins_100 | dim_0__permutation_entropy__dimension_3__tau_1 | dim_0__permutation_entropy__dimension_4__tau_1 | dim_0__permutation_entropy__dimension_5__tau_1 | dim_0__permutation_entropy__dimension_6__tau_1 | dim_0__permutation_entropy__dimension_7__tau_1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.000161 | 250.000831 | 0.315017 | 0.005540 | -0.000152 | 0.166000 | ... | 0.08151 | 0.092513 | 0.173767 | 0.219798 | 1.219806 | 1.447748 | 2.089695 | 2.619112 | 3.055134 | 3.411670 |
1 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000050 | 250.000350 | 0.351749 | 0.004856 | -0.000231 | -0.057820 | ... | 0.08151 | 0.081510 | 0.138673 | 0.250609 | 1.340724 | 1.568692 | 2.482612 | 3.225589 | 3.789130 | 4.198932 |
2 | 0.0 | 0.0 | 0.0 | 1.0 | 0.000655 | 249.999869 | 0.323647 | 0.005567 | -0.000038 | 0.176560 | ... | 0.08151 | 0.092513 | 0.173767 | 0.285506 | 1.292960 | 1.439692 | 2.121711 | 2.705458 | 3.143189 | 3.489565 |
3 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000223 | 249.999003 | 0.341776 | 0.004884 | 0.000024 | -0.194770 | ... | 0.08151 | 0.081510 | 0.127671 | 0.184769 | 1.226987 | 1.535460 | 2.355170 | 2.990719 | 3.530660 | 3.961005 |
4 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000167 | 250.001244 | 0.365629 | 0.006513 | -0.000238 | 0.009969 | ... | 0.08151 | 0.081510 | 0.092513 | 0.173767 | 1.159755 | 1.568399 | 2.464715 | 3.270374 | 3.890812 | 4.321230 |
5 rows × 773 columns
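To make the column names above more concrete, here is a minimal sketch in plain Python of how a handful of these features could be computed for a single series. The formulas follow the feature definitions documented by tsfresh as we understand them; the helper functions and the toy `series` are illustrative assumptions, not tsfresh's own implementation.

```python
from statistics import median


def sum_values(x):
    # Sum of all observations.
    return sum(x)


def abs_energy(x):
    # Sum of squared observations.
    return sum(v * v for v in x)


def mean_abs_change(x):
    # Mean absolute difference between consecutive observations.
    diffs = [abs(b - a) for a, b in zip(x, x[1:])]
    return sum(diffs) / len(diffs)


def mean_change(x):
    # Mean signed difference between consecutive observations,
    # which telescopes to (last - first) / (n - 1).
    return (x[-1] - x[0]) / (len(x) - 1)


# Toy series standing in for one cell of the "dim_0" column (assumption).
series = [1.0, 3.0, 2.0, 5.0]

# One output row: column names combine the dimension and the feature name.
features = {
    "dim_0__sum_values": sum_values(series),
    "dim_0__abs_energy": abs_energy(series),
    "dim_0__mean_abs_change": mean_abs_change(series),
    "dim_0__mean_change": mean_change(series),
    "dim_0__median": median(series),
}
print(features)
```

Each input series is reduced to one row of scalar features, which is why the transformed data can be passed to any tabular scikit-learn estimator.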
Using tsfresh with sktime
[7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:11<00:00, 2.26s/it]
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00, 1.30it/s]
[7]:
0.7358490566037735
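Conceptually, the pipeline fits the transformer on the training series, turns them into a feature table, and trains the classifier on that table; at prediction time the same transformer is applied to the new series first. This likely explains why two feature-extraction progress bars appear above: one for `fit` and one for `score`. The sketch below imitates that flow in plain Python. `MeanFeature`, `NearestCentroid` and `TinyPipeline` are hypothetical stand-ins for illustration only, not part of sktime or scikit-learn.

```python
class MeanFeature:
    """Toy transformer: reduces each series to a single feature, its mean."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[sum(series) / len(series)] for series in X]


class NearestCentroid:
    """Toy classifier: predicts the class whose mean feature value is closest."""

    def fit(self, Xt, y):
        totals = {}
        for row, label in zip(Xt, y):
            total, count = totals.get(label, (0.0, 0))
            totals[label] = (total + row[0], count + 1)
        self.centroids_ = {k: t / c for k, (t, c) in totals.items()}
        return self

    def predict(self, Xt):
        return [
            min(self.centroids_, key=lambda k: abs(self.centroids_[k] - row[0]))
            for row in Xt
        ]


class TinyPipeline:
    """Chains one transformer and one estimator, like a two-step pipeline."""

    def __init__(self, transformer, estimator):
        self.transformer = transformer
        self.estimator = estimator

    def fit(self, X, y):
        # Features are extracted from the training series here ...
        Xt = self.transformer.fit(X, y).transform(X)
        self.estimator.fit(Xt, y)
        return self

    def predict(self, X):
        # ... and extracted again from the new series here.
        return self.estimator.predict(self.transformer.transform(X))

    def score(self, X, y):
        # Accuracy: fraction of correct predictions.
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


X_train_toy = [[0, 1, 0], [1, 0, 1], [9, 10, 9], [10, 9, 10]]
y_train_toy = ["low", "low", "high", "high"]
X_test_toy = [[1, 1, 1], [8, 9, 10]]
y_test_toy = ["low", "high"]

pipeline = TinyPipeline(MeanFeature(), NearestCentroid())
pipeline.fit(X_train_toy, y_train_toy)
accuracy = pipeline.score(X_test_toy, y_test_toy)
print(accuracy)
```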
Multivariate time series classification data
[8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(60, 6) (60,) (20, 6) (20,)
[9]:
# multivariate input data
X_train.head()
[9]:
dim_0 | dim_1 | dim_2 | dim_3 | dim_4 | dim_5 | |
---|---|---|---|---|---|---|
21 | 0 0.648833 1 0.648833 2 0.076985 3... | 0 -0.996722 1 -0.996722 2 -0.897264 3... | 0 -0.644136 1 -0.644136 2 0.970515 3... | 0 -0.101208 1 -0.101208 2 -0.407496 3... | 0 0.055931 1 0.055931 2 -0.157139 3... | 0 -0.031960 1 -0.031960 2 -0.343575 3... |
13 | 0 1.463566 1 1.463566 2 6.16934... | 0 1.782945 1 1.782945 2 8.09897... | 0 -0.817491 1 -0.817491 2 -5.628303 3... | 0 0.082565 1 0.082565 2 -2.671363 3... | 0 0.159802 1 0.159802 2 0.282318 3... | 0 0.095881 1 0.095881 2 -1.502142 3... |
17 | 0 3.789469 1 3.789469 2 1.78594... | 0 -1.353556 1 -1.353556 2 -10.69460... | 0 -0.685072 1 -0.685072 2 -4.465480 3... | 0 -0.021307 1 -0.021307 2 2.753927 3... | 0 -0.159802 1 -0.159802 2 -0.820319 3... | 0 0.133169 1 0.133169 2 2.974987 3... |
26 | 0 -0.761604 1 -0.761604 2 0.121078 3... | 0 0.260125 1 0.260125 2 -1.423255 3... | 0 -0.064487 1 -0.064487 2 0.075600 3... | 0 0.069248 1 0.069248 2 -0.282318 3... | 0 0.242367 1 0.242367 2 -0.332922 3... | 0 -0.007990 1 -0.007990 2 0.239704 3... |
11 | 0 -0.193013 1 -0.193013 2 2.40398... | 0 -0.106266 1 -0.106266 2 0.52392... | 0 -0.636563 1 -0.636563 2 -1.166243 3... | 0 -0.087891 1 -0.087891 2 -2.716640 3... | 0 0.010653 1 0.010653 2 1.297062 3... | 0 0.205080 1 0.205080 2 -0.609912 3... |
[10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
"tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:21<00:00, 4.24s/it]
[10]:
dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_5__fourier_entropy__bins_2 | dim_5__fourier_entropy__bins_3 | dim_5__fourier_entropy__bins_5 | dim_5__fourier_entropy__bins_10 | dim_5__fourier_entropy__bins_100 | dim_5__permutation_entropy__dimension_3__tau_1 | dim_5__permutation_entropy__dimension_4__tau_1 | dim_5__permutation_entropy__dimension_5__tau_1 | dim_5__permutation_entropy__dimension_6__tau_1 | dim_5__permutation_entropy__dimension_7__tau_1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 1.0 | 0.0 | 1.0 | 57.045746 | 172.027276 | 0.807892 | 0.001584 | 0.003131 | 0.422100 | ... | 0.165443 | 0.165443 | 0.165443 | 0.165443 | 1.241657 | 1.494736 | 2.333086 | 3.047524 | 3.577109 | 3.928619 |
1 | 1.0 | 1.0 | 0.0 | 1.0 | 575.369181 | 18681.476663 | 9.290810 | -0.150685 | -0.004437 | 11.136336 | ... | 0.165443 | 0.165443 | 0.165443 | 0.192626 | 1.343990 | 1.540222 | 2.478743 | 3.332544 | 3.891606 | 4.245651 |
2 | 1.0 | 0.0 | 0.0 | 1.0 | 456.363177 | 14668.442452 | 8.609941 | -0.103845 | 0.003627 | 10.290202 | ... | 0.165443 | 0.192626 | 0.192626 | 0.356468 | 1.923853 | 1.538814 | 2.523494 | 3.444948 | 4.027225 | 4.375502 |
3 | 1.0 | 0.0 | 0.0 | 1.0 | 73.888480 | 220.949429 | 1.057349 | -0.002087 | -0.003908 | 0.613719 | ... | 0.165443 | 0.192626 | 0.192626 | 0.192626 | 1.064807 | 1.530752 | 2.427612 | 3.185985 | 3.780048 | 4.133971 |
4 | 1.0 | 0.0 | 1.0 | 1.0 | 450.805329 | 13280.972257 | 7.340858 | -0.079199 | 0.020523 | 9.844745 | ... | 0.165443 | 0.192626 | 0.192626 | 0.288342 | 1.524120 | 1.597675 | 2.690962 | 3.511981 | 4.075170 | 4.366321 |
5 rows × 4638 columns
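The column count is consistent with the features being computed separately for each dimension: 6 dimensions × 773 features = 4638 columns, with each column name prefixed by its dimension. A minimal sketch of that naming scheme in plain Python follows; `extract_features`, `feature_funcs` and the toy `instance` are hypothetical names used for illustration, not tsfresh or sktime internals.

```python
def extract_features(instance, feature_funcs):
    """Apply every feature function to every dimension of one instance."""
    row = {}
    for dim_name, series in instance.items():
        for feat_name, func in feature_funcs.items():
            # Column name pattern seen in the output above: <dimension>__<feature>
            row[f"{dim_name}__{feat_name}"] = func(series)
    return row


# Three simple features (tsfresh offers features with these names,
# but here they are just Python built-ins).
feature_funcs = {"sum_values": sum, "maximum": max, "minimum": min}

# One toy instance with two dimensions (assumption for illustration).
instance = {"dim_0": [1, 2, 3], "dim_1": [-4, 0, 4]}

row = extract_features(instance, feature_funcs)
print(row)
print(len(row))  # number of dimensions times number of features
```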
Using tsfresh for forecasting
You can also use tsfresh features for univariate forecasting, by reducing the forecasting task to time series regression on sliding windows of the series. To find out more about forecasting, check out our forecasting tutorial notebook.
[11]:
from sklearn.ensemble import RandomForestRegressor
from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ReducedTimeSeriesRegressionForecaster
from sktime.forecasting.model_selection import temporal_train_test_split

# load the airline series and split it chronologically into train and test parts
y = load_airline()
y_train, y_test = temporal_train_test_split(y)

# regressor: tsfresh features followed by a random forest
regressor = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False, disable_progressbar=True),
    RandomForestRegressor(),
)

# reduce forecasting to regression on sliding windows of 12 observations
forecaster = ReducedTimeSeriesRegressionForecaster(regressor, window_length=12)
forecaster.fit(y_train)

# absolute forecasting horizon covering the test period
fh = ForecastingHorizon(y_test.index, is_relative=False)
y_pred = forecaster.predict(fh)
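The general idea behind this kind of reduction can be sketched in plain Python: the series is cut into sliding windows, each window becomes one training instance whose target is the observation that follows it, and multi-step forecasts are produced by feeding predictions back into the window. This is a simplified illustration of one common (recursive) strategy, not sktime's implementation; `sliding_windows`, `recursive_forecast` and the trend-based `predict_next` are hypothetical helpers.

```python
def sliding_windows(y, window_length):
    """Turn a series into (window, next value) training pairs."""
    X, targets = [], []
    for start in range(len(y) - window_length):
        X.append(y[start:start + window_length])
        targets.append(y[start + window_length])
    return X, targets


def recursive_forecast(y, window_length, predict_next, steps):
    """Forecast several steps ahead by reusing earlier predictions as inputs."""
    window = list(y[-window_length:])
    preds = []
    for _ in range(steps):
        nxt = predict_next(window)
        preds.append(nxt)
        # Slide the window forward by one step, appending the prediction.
        window = window[1:] + [nxt]
    return preds


def predict_next(window):
    # Stand-in for a fitted regressor (assumption): continue the average
    # trend observed within the window.
    return window[-1] + (window[-1] - window[0]) / (len(window) - 1)


y_toy = list(range(1, 11))  # 1, 2, ..., 10
X_windows, targets = sliding_windows(y_toy, window_length=3)
print(X_windows[0], targets[0])

preds = recursive_forecast(y_toy, window_length=3, predict_next=predict_next, steps=3)
print(preds)
```

In the pipeline above, each window is itself a short time series, which is why a feature extractor such as tsfresh can sit in front of the tabular regressor.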