WEASEL
class sktime.classification.dictionary_based.WEASEL(anova=True, bigrams=True, binning_strategy='information-gain', window_inc=2, p_threshold=0.05, n_jobs=1, random_state=None)
Word ExtrAction for time SEries cLassification (WEASEL) from [1].
Overview: the input is n series, each of length m. WEASEL is a dictionary classifier that builds a bag-of-patterns using SFA (Symbolic Fourier Approximation) for different window lengths and learns a logistic regression classifier on this bag.

The primary parameters are:
alphabet_size: the alphabet size, i.e. the number of possible symbols (alpha)
chi2-threshold (exposed as p_threshold): used for feature selection to keep the best words
anova: select the best l/2 Fourier coefficients rather than the first ones
bigrams: also use bigrams of SFA words
binning_strategy: the binning strategy used to discretise values into SFA words

WEASEL slides a window of length w along the series. Each window is shortened to a word of length l by taking a Fourier transform and keeping the best l/2 complex coefficients, selected with a one-way ANOVA test (when anova=True). The real and imaginary parts of these l/2 coefficients give l values, which are discretised into alpha possible symbols to form a word of length l. A histogram of words is formed and stored for each series. A bag is created for each window length, and all bags are joined into one bag-of-patterns; words from different window lengths are distinguished by different prefixes.

fit trains a logistic regression classifier on the single bag-of-patterns.

predict transforms new series into the same bag-of-patterns representation and applies the trained logistic regression classifier.
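The word-extraction and bag-building steps can be illustrated with a minimal pure-Python sketch. It is not sktime's implementation, and all function names are hypothetical. Assumptions: each window is z-normalised and the constant (DC) Fourier term is skipped; the first l/2 coefficients are kept, as with anova=False; breakpoints use equi-depth (quantile) binning and, for brevity, are fitted on the same series being transformed; window lengths run from min_window to max_window in steps of window_inc; the window length serves as the word prefix.

```python
import cmath
import math
from collections import Counter

SYMBOLS = "abcdefghijklmnopqrstuvwxyz"


def znorm(window):
    """Z-normalise one window; a constant window becomes all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    return [(v - mean) / std if std > 0 else 0.0 for v in window]


def dft_features(window, word_length):
    """Real and imaginary parts of Fourier coefficients 1..word_length // 2.

    Coefficient 0 (the mean) is skipped because it is always 0 after
    z-normalisation, so word_length // 2 complex values give word_length
    real values (for even word_length).
    """
    w = len(window)
    feats = []
    for k in range(1, word_length // 2 + 1):
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / w) for t, x in enumerate(window))
        feats.extend([coef.real, coef.imag])
    return feats


def equi_depth_breakpoints(feature_rows, alphabet_size):
    """Per-feature quantile breakpoints, i.e. "equi-depth" binning."""
    breakpoints = []
    for column in zip(*feature_rows):
        values = sorted(column)
        n = len(values)
        breakpoints.append([values[n * i // alphabet_size] for i in range(1, alphabet_size)])
    return breakpoints


def to_word(feats, breakpoints):
    """Symbol for each value = number of its breakpoints the value lies above."""
    return "".join(SYMBOLS[sum(v > b for b in bps)] for v, bps in zip(feats, breakpoints))


def bag_of_patterns(series_list, word_length=4, alphabet_size=4,
                    min_window=8, max_window=None, window_inc=2):
    """One Counter per series, keyed by (window_length, word).

    The window length acts as the prefix that keeps identical words from
    different window lengths apart.
    """
    if max_window is None:
        max_window = min(len(s) for s in series_list)
    bags = [Counter() for _ in series_list]
    for w in range(min_window, max_window + 1, window_inc):
        # Fourier features of every sliding window of every series.
        per_series = [
            [dft_features(znorm(s[i:i + w]), word_length) for i in range(len(s) - w + 1)]
            for s in series_list
        ]
        # Simplifying assumption: breakpoints are fitted on the series being
        # transformed, separately for each window length.
        breakpoints = equi_depth_breakpoints(
            [feats for rows in per_series for feats in rows], alphabet_size)
        for bag, rows in zip(bags, per_series):
            for feats in rows:
                bag[(w, to_word(feats, breakpoints))] += 1
    return bags


# Usage: two toy series with different frequencies.
series = [[math.sin(0.3 * t) for t in range(32)],
          [math.sin(0.9 * t) for t in range(32)]]
bags = bag_of_patterns(series, max_window=16)
print(bags[0].most_common(3))
```

Keying the counters by (window_length, word) is one simple way to realise the per-window-length prefixes; a real implementation would learn the breakpoints in fit and reuse them in predict.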
For the Java version, see https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/dictionary_based/WEASEL.java
Parameters
anova (boolean, default = True) – If True, the Fourier coefficient selection is done via a one-way ANOVA test. If False, the first Fourier coefficients are selected. Only applicable if labels are given.
bigrams (boolean, default = True) – Whether to create bigrams of SFA words.
binning_strategy ({"equi-depth", "equi-width", "information-gain"}, default = "information-gain") – The binning method used to derive the breakpoints.
window_inc (int, default = 2) – WEASEL creates a bag-of-patterns (BoP) model for each window size. This is the increment used to determine the next window size.
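p_threshold (float, default = 0.05) – Feature selection is applied based on the chi-squared test. This is the p-value threshold used for the chi-squared test on the bag-of-words (lower means more strict). A value of 1 indicates that the test should not be performed.
n_jobs (int, default = 1) – The number of jobs to run in parallel.
random_state (int or None, default = None) – Seed for random number generation.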
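The anova and p_threshold parameters are both supervised feature-selection steps. The sketch below shows one plausible pure-Python version of each; it is not sktime's code, the function names are hypothetical, and it assumes at least two classes with more samples than classes. select_by_anova ranks real-valued Fourier features by a one-way ANOVA F statistic. chi2_filter sums each word's counts per class, compares them with the counts expected from the class priors (the same statistic scikit-learn's sklearn.feature_selection.chi2 computes), and keeps the word if its p-value with n_classes - 1 degrees of freedom is below p_threshold.

```python
import math


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across the classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ssb = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ssw = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ssw == 0:
        return math.inf if ssb > 0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))


def select_by_anova(feature_rows, labels, n_keep):
    """Indices of the n_keep feature columns with the largest F statistic."""
    scores = [anova_f(column, labels) for column in zip(*feature_rows)]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:n_keep]


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), via the series for the regularised
    lower incomplete gamma function (adequate for moderate x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_filter(bags, labels, p_threshold=0.05):
    """Words whose chi-squared p-value is below p_threshold.

    p_threshold >= 1 skips the test and keeps every word.
    """
    vocab = sorted({word for bag in bags for word in bag})
    if p_threshold >= 1:
        return vocab
    classes = sorted(set(labels))
    class_prob = {c: labels.count(c) / len(labels) for c in classes}
    kept = []
    for word in vocab:
        observed = {c: 0 for c in classes}
        for bag, y in zip(bags, labels):
            observed[y] += bag.get(word, 0)
        total = sum(observed.values())
        if total == 0:
            continue
        stat = sum((observed[c] - class_prob[c] * total) ** 2 / (class_prob[c] * total)
                   for c in classes)
        if chi2_sf(stat, len(classes) - 1) < p_threshold:
            kept.append(word)
    return kept


labels = ["x", "x", "y", "y"]
rows = [[0.0, 1.0, 0.2], [0.2, -1.0, 0.1], [1.0, 1.0, 0.3], [1.2, -1.0, 0.0]]
print(select_by_anova(rows, labels, n_keep=1))  # [0]

bags = [{"ab": 5, "cd": 1, "ee": 3}, {"ab": 4, "cd": 1, "ee": 3},
        {"ab": 1, "cd": 5, "ee": 3}, {"ab": 1, "cd": 4, "ee": 3}]
print(chi2_filter(bags, labels))  # ['ab', 'cd']
```

Here "ee" occurs equally often in both classes, so its statistic is 0 and its p-value is 1; lowering p_threshold to 0.01 would also drop "ab" and "cd" (p ≈ 0.035).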
Notes
[1] Patrick Schäfer and Ulf Leser. Fast and Accurate Time Series Classification with WEASEL. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 637–646, 2017.