sktime.classification.dictionary_based
-
class sktime.classification.dictionary_based.BOSSEnsemble(threshold=0.92, max_ensemble_size=500, max_win_len_prop=1, min_window=10, n_jobs=1, random_state=None)
Bases: sktime.classification.base.BaseClassifier
Bag of SFA Symbols (BOSS) ensemble: an implementation of BOSS from [1].
Overview: the input is n series of length m. BOSS performs a grid search over a set of parameter values, evaluating each with leave-one-out cross-validation (LOOCV). It then retains all ensemble members within 92% of the best. There are three primary parameters:
alpha: alphabet size
w: window length
l: word length
For any combination, a single BOSS slides a window of length w along the series. Each window of length w is shortened to a word of length l by taking a Fourier transform and keeping the first l/2 complex coefficients. These l values (the real and imaginary parts of the l/2 coefficients) are then discretised into alpha possible values to form a word of length l. A histogram of words is formed and stored for each series. fit involves finding n histograms.
predict uses 1-nearest-neighbour with a bespoke distance function.
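The windowing, Fourier and discretisation steps described above can be sketched in plain Python. This is a simplified illustration, not the sktime implementation: the function names are hypothetical, and a single fixed set of breakpoints is assumed for every coefficient, whereas a real BOSS learns its breakpoints from the training data.

```python
import cmath
from collections import Counter


def first_dft_coefficients(window, n_coeffs):
    """Return the first n_coeffs complex DFT coefficients of a window."""
    n = len(window)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(window))
        for k in range(n_coeffs)
    ]


def window_to_word(window, word_length, breakpoints):
    """Shorten one window to a word (tuple) of word_length symbols."""
    coeffs = first_dft_coefficients(window, word_length // 2)
    # l/2 complex coefficients give l real values (real and imaginary parts).
    values = []
    for c in coeffs:
        values.extend((c.real, c.imag))
    # Symbol = number of breakpoints the value exceeds (0 .. alpha - 1).
    return tuple(sum(v > b for b in breakpoints) for v in values)


def word_histogram(series, window_length, word_length, breakpoints):
    """Slide a window along the series and count the resulting words."""
    n_windows = len(series) - window_length + 1
    return Counter(
        window_to_word(series[i:i + window_length], word_length, breakpoints)
        for i in range(n_windows)
    )


# Toy usage: alphabet size 4 needs three breakpoints (assumed values).
series = [0.0, 1.0, 0.0, -1.0] * 3
hist = word_histogram(series, window_length=4, word_length=4,
                      breakpoints=[-1.0, 0.0, 1.0])
```

Each key of `hist` is a word of length 4 over the symbols 0 to 3, and the counts sum to the number of windows (9 for this toy series).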
- Parameters
threshold (double in [0, 1], optional, default = 0.92) – Retain all classifiers within threshold% of the best one.
max_ensemble_size (int or None, optional, default = 500) – Retain a maximum number of classifiers, even if within threshold.
max_win_len_prop (default = 1) – Maximum window length as a proportion of series length.
min_window (default = 10) – Minimum window size.
n_jobs (int, optional, default = 1) – The number of jobs to run in parallel for both fit and predict. -1 means using all processors.
random_state (optional, default to no seed) – Seed for random number generation.
The final number of classifiers in the ensemble is <= max_ensemble_size.
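The interaction between threshold and max_ensemble_size can be illustrated with a short sketch. It assumes that "within threshold of the best" means an accuracy of at least threshold times the best accuracy, and that the cap keeps the most accurate members; `retain_members` is a hypothetical helper, not part of sktime.

```python
def retain_members(candidates, threshold=0.92, max_ensemble_size=500):
    """Keep candidates whose accuracy is within `threshold` of the best.

    candidates: list of (params, loocv_accuracy) pairs.
    Assumed rule: keep a member if accuracy >= best_accuracy * threshold,
    then cap the ensemble at max_ensemble_size (None means no cap).
    """
    if not candidates:
        return []
    best = max(acc for _, acc in candidates)
    kept = [(p, acc) for p, acc in candidates if acc >= best * threshold]
    # Most accurate first, so the cap drops the weakest members.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    if max_ensemble_size is not None:
        kept = kept[:max_ensemble_size]
    return kept


candidates = [("a", 0.90), ("b", 0.80), ("c", 0.85), ("d", 0.83)]
kept_all = retain_members(candidates)                       # a, c, d
kept_two = retain_members(candidates, max_ensemble_size=2)  # a, c
```

With a best accuracy of 0.90 the cut-off is 0.828, so "b" is dropped while "d" is kept unless the size cap removes it.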
Notes
[1] Patrick Schäfer, “The BOSS is concerned with time series classification in the presence of noise”, Data Mining and Knowledge Discovery, 29(6), 2015. https://link.springer.com/article/10.1007/s10618-014-0377-7
For the Java version, see https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/dictionary_based/BOSS.java
-
fit(X, y)
Build an ensemble of BOSS classifiers from the training set (X, y) by creating a variable-size ensemble of those within a threshold of the best.
- Parameters
X (pd.DataFrame of shape [n_instances, 1]) – Nested dataframe with univariate time series in cells.
y (array-like, shape = [n_instances]) – The class labels.
- Returns
self
-
class sktime.classification.dictionary_based.ContractableBOSS(n_parameter_samples=250, max_ensemble_size=50, max_win_len_prop=1, time_limit=0.0, min_window=10, n_jobs=1, random_state=None)
Bases: sktime.classification.base.BaseClassifier
Contractable Bag of SFA Symbols (cBOSS): an implementation of BOSS from [1] with the refinements described in [2].
Overview: the input is n series of length m. cBOSS randomly samples n_parameter_samples parameter sets, evaluating each with leave-one-out cross-validation (LOOCV). It then retains the max_ensemble_size classifiers with the highest accuracy. There are three primary parameters:
alpha: alphabet size
w: window length
l: word length
For any combination, a single BOSS slides a window of length w along the series. Each window of length w is shortened to a word of length l by taking a Fourier transform and keeping the first l/2 complex coefficients. These l values (the real and imaginary parts of the l/2 coefficients) are then discretised into alpha possible values to form a word of length l. A histogram of words is formed and stored for each series. fit involves finding n histograms.
predict uses 1-nearest-neighbour with a bespoke distance function.
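The page does not spell out the bespoke distance. The sketch below assumes the asymmetric BOSS distance from the cited paper, which sums squared count differences only over the words that occur in the query histogram. Histograms are plain dicts mapping word to count, and both function names are hypothetical.

```python
def boss_distance(query_hist, train_hist):
    """Asymmetric distance: only words present in the query contribute."""
    return sum(
        (count - train_hist.get(word, 0)) ** 2
        for word, count in query_hist.items()
    )


def predict_1nn(query_hist, train_hists, train_labels):
    """Return the label of the training histogram closest to the query."""
    best_label, best_dist = None, float("inf")
    for hist, label in zip(train_hists, train_labels):
        dist = boss_distance(query_hist, hist)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


query = {"ab": 2, "ba": 1}
train_a = {"ab": 2, "cc": 5}   # "cc" is ignored: it is not in the query
train_b = {"ba": 1}
label = predict_1nn(query, [train_a, train_b], ["A", "B"])
```

Because words that appear only in the training histogram are ignored, `boss_distance(query, train_a)` and `boss_distance(train_a, query)` generally differ.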
- Parameters
n_parameter_samples (int, default = 250) – If search is randomised, the number of parameter combos to try.
max_ensemble_size (int or None, optional, default = 50) – Retain a maximum number of classifiers, even if within threshold.
max_win_len_prop (default = 1) – Maximum window length as a proportion of series length.
time_limit (default = 0, no limit) – Time contract to limit build time, in minutes.
min_window (default = 10) – Minimum window size.
n_jobs (int, optional, default = 1) – The number of jobs to run in parallel for both fit and predict. -1 means using all processors.
random_state (optional, default to no seed) – Seed for random number generation.
The final number of classifiers in the ensemble is <= max_ensemble_size.
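How n_parameter_samples and max_ensemble_size work together can be sketched as random sampling followed by a top-k selection. The parameter grid, the sampling without replacement and the `toy_accuracy` stand-in for a LOOCV estimate are all assumptions made for illustration; none of these names come from sktime.

```python
import heapq
import itertools
import random


def sample_parameter_sets(alphabet_sizes, window_lengths, word_lengths,
                          n_parameter_samples, rng):
    """Draw distinct (alpha, w, l) combinations at random from a grid."""
    grid = list(itertools.product(alphabet_sizes, window_lengths, word_lengths))
    return rng.sample(grid, min(n_parameter_samples, len(grid)))


def keep_best(param_sets, evaluate, max_ensemble_size):
    """Score every parameter set and keep the most accurate ones."""
    scored = [(evaluate(params), params) for params in param_sets]
    return heapq.nlargest(max_ensemble_size, scored, key=lambda item: item[0])


def toy_accuracy(params):
    """Hypothetical stand-in for the LOOCV accuracy of one BOSS classifier."""
    _, w, l = params
    return 1.0 / (1.0 + abs(w - 20) + abs(l - 10))


rng = random.Random(0)
candidates = sample_parameter_sets([4], range(10, 40, 2), [8, 10, 12, 14, 16],
                                   n_parameter_samples=25, rng=rng)
ensemble = keep_best(candidates, toy_accuracy, max_ensemble_size=5)
```

Unlike the threshold rule of BOSSEnsemble, the ensemble size here is fixed in advance, which makes the build cost easier to predict.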
Notes
[1] Patrick Schäfer, “The BOSS is concerned with time series classification in the presence of noise”, Data Mining and Knowledge Discovery, 29(6), 2015.
[2] Matthew Middlehurst, William Vickers and Anthony Bagnall, “Scalable Dictionary Classifiers for Time Series Classification”, in Proc. 20th International Conference on Intelligent Data Engineering and Automated Learning, LNCS, volume 11871.
For the Java version, see https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/dictionary_based/cBOSS.java
-
fit(X, y)
Build an ensemble of BOSS classifiers from the training set (X, y) by randomly sampling the parameter space and keeping a fixed-size ensemble of the best.
- Parameters
X (nested pandas DataFrame of shape [n_instances, 1]) – Nested dataframe with univariate time series in cells.
y (array-like, shape = [n_instances]) – The class labels.
- Returns
self
-
class sktime.classification.dictionary_based.IndividualBOSS(window_size=10, word_length=8, norm=False, alphabet_size=4, save_words=True, n_jobs=1, random_state=None)
Bases: sktime.classification.base.BaseClassifier
Single Bag of SFA Symbols (BOSS) classifier.
Implementation of BOSS from Schäfer.
-
class sktime.classification.dictionary_based.IndividualTDE(window_size=10, word_length=8, norm=False, levels=1, igb=False, alphabet_size=4, bigrams=True, dim_threshold=0.85, max_dims=20, n_jobs=1, random_state=None)
Bases: sktime.classification.base.BaseClassifier
Single TDE classifier, based on the Bag of SFA Symbols (BOSS) model.
-
class sktime.classification.dictionary_based.MUSE(anova=True, bigrams=True, window_inc=2, p_threshold=0.05, use_first_order_differences=True, random_state=None)
Bases: sktime.classification.base.BaseClassifier
WEASEL+MUSE (MUltivariate Symbolic Extension): an implementation of the multivariate version of WEASEL, referred to as just MUSE, from [1].
Overview: the input is n series of length m. WEASEL+MUSE is a multivariate dictionary classifier that builds a bag-of-patterns using SFA for different window lengths and learns a logistic regression classifier on this bag.
There are these primary parameters:
alphabet_size: alphabet size
chi2-threshold: used for feature selection to select the best words
anova: select the best l/2 Fourier coefficients other than the first ones
bigrams: use bigrams of SFA words
binning_strategy: the binning strategy used to discretise into SFA words
- Parameters
anova (boolean, default = True) – If True, the Fourier coefficient selection is done via a one-way ANOVA test. If False, the first Fourier coefficients are selected. Only applicable if labels are given.
bigrams (boolean, default = True) – Whether to create bigrams of SFA words.
window_inc (int, default = 2) – WEASEL creates a bag-of-patterns (BoP) model for each window size. This is the increment used to determine the next window size.
p_threshold (float, default = 0.05) – Feature selection is applied based on the chi-squared test. This is the p-value threshold to use for the chi-squared test on the bag-of-words (lower means more strict). 1 indicates that the test should not be performed.
use_first_order_differences (boolean, default = True) – If set to True, the first-order differences of each dimension are added to the data.
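The effect of use_first_order_differences can be pictured with a minimal sketch. It assumes one multivariate instance is stored as a list of dimensions (each a list of values) and that the differenced series are appended as extra dimensions. The helper name is hypothetical, and the actual implementation may handle series length differently, for example by padding.

```python
def add_first_order_differences(instance):
    """Append the first-order differences of every dimension.

    instance: list of dimensions, each a list of numbers.
    Returns the original dimensions followed by one differenced
    dimension per original dimension (each one element shorter).
    """
    diffs = [[b - a for a, b in zip(dim, dim[1:])] for dim in instance]
    return instance + diffs


instance = [[1, 3, 6], [2, 2, 5]]
augmented = add_first_order_differences(instance)
# augmented == [[1, 3, 6], [2, 2, 5], [2, 3], [0, 3]]
```

Doubling the number of dimensions this way lets the bag-of-patterns capture changes between consecutive values as well as the raw values.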
Notes
[1] Patrick Schäfer and Ulf Leser, “Multivariate time series classification with WEASEL+MUSE”, in Proc. 3rd ECML/PKDD Workshop on AALTD, 2018. https://arxiv.org/abs/1711.11343
For the Java version, see https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/multivariate/WEASEL_MUSE.java
-
fit(X, y)
Build a WEASEL+MUSE classifier from the training set (X, y).
- Parameters
X (nested pandas DataFrame of shape [n_instances, 1]) – Nested dataframe with univariate time series in cells.
y (array-like, shape = [n_instances]) – The class labels.
- Returns
self
-
class sktime.classification.dictionary_based.TemporalDictionaryEnsemble(n_parameter_samples=250, max_ensemble_size=50, time_limit=0.0, max_win_len_prop=1, min_window=10, randomly_selected_params=50, bigrams=None, dim_threshold=0.85, max_dims=20, n_jobs=1, random_state=None)
Bases: sktime.classification.base.BaseClassifier
Temporal Dictionary Ensemble (TDE) as described in [1].
Overview: the input is n series of length m with d dimensions. TDE searches k parameter values selected using a Gaussian process regressor, evaluating each with leave-one-out cross-validation (LOOCV). It then retains s ensemble members. There are six primary parameters for individual classifiers:
alpha: alphabet size
w: window length
l: word length
p: normalise/no normalise
h: levels
b: MCB/IGB
For any combination, an individual TDE classifier slides a window of length w along the series. Each window of length w is shortened to a word of length l by taking a Fourier transform and keeping the first l/2 complex coefficients. These l values (the real and imaginary parts of the l/2 coefficients) are then discretised into alpha possible values, using breakpoints found using b, to form a word of length l. A histogram of words is formed and stored for each series, using a spatial pyramid of h levels. For multivariate series, accuracy from a reduced histogram is used to select dimensions.
fit involves finding n histograms. predict uses 1-nearest-neighbour with the histogram intersection distance function.
For the original Java version, see https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/dictionary_based/TDE.java
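Histogram intersection, the measure named in the overview, can be sketched for histograms stored as plain dicts mapping word to count. Intersection is a similarity, so this sketch treats the nearest neighbour as the training histogram with the largest intersection. The function names are hypothetical, and the spatial pyramid and bigram handling of a full TDE are left out.

```python
def histogram_intersection(a, b):
    """Sum of the per-word minimum counts shared by two histograms."""
    return sum(min(count, b.get(word, 0)) for word, count in a.items())


def predict_1nn(query_hist, train_hists, train_labels):
    """Return the label of the training histogram with the largest overlap."""
    best_label, best_sim = None, -1
    for hist, label in zip(train_hists, train_labels):
        sim = histogram_intersection(query_hist, hist)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


query = {"x": 3, "y": 1}
train_b = {"x": 1, "y": 4}   # overlap: min(3, 1) + min(1, 4) = 2
train_c = {"x": 3}           # overlap: min(3, 3) + min(1, 0) = 3
label = predict_1nn(query, [train_b, train_c], ["B", "C"])
```

Unlike a distance that sums over the query words only, the intersection gives the same value whichever histogram is passed first.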
- Parameters
n_parameter_samples (int, default=250) – Number of parameter combos to try.
max_ensemble_size (int, default=50) – Maximum number of classifiers.
time_limit (int, default=0, no limit) – Time contract to limit build time, in minutes.
max_win_len_prop (float between 0 and 1, default=1) – Maximum window length as a proportion of series length.
min_window (int, default=10) – Minimum window size.
randomly_selected_params (int, default=50) – Number of parameters randomly selected before GP is used.
bigrams (boolean or None, default=None) – Whether to use bigrams. None means true for univariate and false for multivariate.
dim_threshold (float between 0 and 1, default=0.85) – Dimension accuracy threshold for multivariate.
max_dims (int, default=20) – Maximum number of dimensions for multivariate.
n_jobs (int, optional, default=1) – The number of jobs to run in parallel for both fit and predict. -1 means using all processors.
random_state (optional, default to no seed) – Seed for random number generation.
The final number of classifiers in the ensemble is <= max_ensemble_size.
Notes
[1] Matthew Middlehurst, James Large, Gavin Cawley and Anthony Bagnall, “The Temporal Dictionary Ensemble (TDE) Classifier for Time Series Classification”, in Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2020.
Java version: https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/dictionary_based/TDE.java
-
fit(X, y)
Build an ensemble of individual TDE classifiers from the training set (X, y) by randomly sampling the parameter space a set number of times and then selecting new parameters using Gaussian processes.
- Parameters
X (nested pandas DataFrame of shape [n_instances, 1]) – Nested dataframe with univariate time series in cells.
y (array-like, shape = [n_instances]) – The class labels.
- Returns
self
-
class sktime.classification.dictionary_based.WEASEL(anova=True, bigrams=True, binning_strategy='information-gain', window_inc=2, p_threshold=0.05, n_jobs=1, random_state=None)
Bases: sktime.classification.base.BaseClassifier
Word ExtrAction for time SEries cLassification (WEASEL) from [1].
Overview: the input is n series of length m. WEASEL is a dictionary classifier that builds a bag-of-patterns using SFA for different window lengths and learns a logistic regression classifier on this bag.
There are these primary parameters:
alphabet_size: alphabet size
chi2-threshold: used for feature selection to select the best words
anova: select the best l/2 Fourier coefficients other than the first ones
bigrams: use bigrams of SFA words
binning_strategy: the binning strategy used to discretise into SFA words
WEASEL slides a window of length w along the series. Each window of length w is shortened to a word of length l by taking a Fourier transform and keeping the best l/2 complex coefficients, selected using a one-way ANOVA test. These l values are then discretised into alpha possible symbols to form a word of length l. A histogram of words is formed and stored for each series. For each window length a bag is created, and all words are joined into one bag-of-patterns. Words from different window lengths are distinguished by different prefixes.
fit involves training a logistic regression classifier on the single bag-of-patterns.
predict uses the logistic regression classifier.
For the Java version, see https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/dictionary_based/WEASEL.java
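The step that joins the per-window-length bags into one bag-of-patterns can be sketched as follows. The sketch assumes each bag is a Counter of words and uses the window length itself as the distinguishing prefix, stored as the first element of a tuple key. `merge_bags` is a hypothetical name, and the real implementation may encode prefixes differently.

```python
from collections import Counter


def merge_bags(bags_by_window):
    """Join per-window-length bags into one bag-of-patterns.

    bags_by_window: dict mapping window length -> Counter of words.
    Keys of the result are (window_length, word), so the same word
    found with two window lengths stays two separate features.
    """
    merged = Counter()
    for window_length, bag in bags_by_window.items():
        for word, count in bag.items():
            merged[(window_length, word)] += count
    return merged


bags = {
    8: Counter({"ab": 2}),
    12: Counter({"ab": 1, "ba": 3}),
}
bag_of_patterns = merge_bags(bags)
```

The merged counts for all training series then form the feature matrix on which the logistic regression classifier is trained.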
- Parameters
anova (boolean, default = True) – If True, the Fourier coefficient selection is done via a one-way ANOVA test. If False, the first Fourier coefficients are selected. Only applicable if labels are given.
bigrams (boolean, default = True) – Whether to create bigrams of SFA words.
binning_strategy ({"equi-depth", "equi-width", "information-gain"}, default = "information-gain") – The binning method used to derive the breakpoints.
window_inc (int, default = 2) – WEASEL creates a bag-of-patterns (BoP) model for each window size. This is the increment used to determine the next window size.
p_threshold (float, default = 0.05) – Feature selection is applied based on the chi-squared test. This is the p-value threshold to use for the chi-squared test on the bag-of-words (lower means more strict). 1 indicates that the test should not be performed.
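As a rough illustration of window_inc and bigrams, the sketch below enumerates candidate window sizes and adds bigrams to a bag of words. Two assumptions are made that this page does not confirm: the smallest and largest window sizes are passed in explicitly, and a bigram pairs each word with the word that starts one window length earlier (its non-overlapping predecessor). Both helper names are hypothetical.

```python
from collections import Counter


def candidate_window_sizes(min_window, max_window, window_inc):
    """Window sizes from min_window to max_window in steps of window_inc."""
    return list(range(min_window, max_window + 1, window_inc))


def unigrams_and_bigrams(words, window_length):
    """Count single words plus (previous word, word) pairs.

    words: list of words ordered by window start position.
    The previous word is taken window_length positions earlier,
    so the two windows of a bigram do not overlap (assumed rule).
    """
    bag = Counter(words)
    for j in range(window_length, len(words)):
        bag[(words[j - window_length], words[j])] += 1
    return bag


sizes = candidate_window_sizes(4, 12, window_inc=2)   # [4, 6, 8, 10, 12]
bag = unigrams_and_bigrams(["a", "b", "c", "d", "e"], window_length=2)
```

A larger window_inc means fewer window sizes and therefore a smaller bag-of-patterns, at the cost of a coarser search over pattern lengths.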
Notes
[1] Patrick Schäfer and Ulf Leser, “Fast and Accurate Time Series Classification with WEASEL”, in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 637–646, 2017. https://dl.acm.org/doi/10.1145/3132847.3132980
-
fit(X, y)
Build a WEASEL classifier from the training set (X, y).
- Parameters
X (nested pandas DataFrame of shape [n_instances, 1]) – Nested dataframe with univariate time series in cells.
y (array-like, shape = [n_instances]) – The class labels.
- Returns
self