SFA
-
class sktime.transformations.panel.dictionary_based.SFA(word_length=8, alphabet_size=4, window_size=12, norm=False, binning_method='equi-depth', anova=False, bigrams=False, skip_grams=False, remove_repeat_words=False, levels=1, lower_bounding=True, save_words=False, save_binning_dft=False, return_pandas_data_series=False, n_jobs=1)
SFA (Symbolic Fourier Approximation) Transformer, as described in
- @inproceedings{schafer2012sfa,
title={SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets},
author={Sch{\"a}fer, Patrick and H{\"o}gqvist, Mikael},
booktitle={Proceedings of the 15th International Conference on Extending Database Technology},
pages={516--527},
year={2012},
organization={ACM}
}
- Overview: for each series:
run a sliding window across the series
for each window:
    shorten the series with DFT
    discretise the shortened series into bins set by MCB (Multiple Coefficient Binning)
    form a word from these discrete values
If the window spans the whole series, SFA produces a single word per series; if a shorter sliding window is used, it forms a histogram of word counts.
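The steps in the overview can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not sktime's implementation: all function names are hypothetical, breakpoints are learned per feature position with equi-depth binning, and each complex Fourier coefficient is assumed to contribute its real and imaginary parts as two separate values.

```python
import cmath
import math
import string
from collections import Counter


def sliding_windows(series, window_size):
    """Yield every contiguous window of the given size."""
    for start in range(len(series) - window_size + 1):
        yield series[start:start + window_size]


def dft_features(window, word_length, drop_first=False):
    """First word_length real-valued DFT features of a window.

    Each complex coefficient contributes (real, imaginary). Skipping
    coefficient 0 (drop_first=True) removes the window mean.
    """
    n = len(window)
    feats = []
    k = 1 if drop_first else 0
    while len(feats) < word_length:
        c = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(window))
        feats.extend([c.real, c.imag])
        k += 1
    return feats[:word_length]


def equi_depth_breakpoints(values, alphabet_size):
    """Breakpoints that put roughly equal numbers of values in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // alphabet_size] for i in range(1, alphabet_size)]


def fit_breakpoints(train_series, window_size, word_length, alphabet_size):
    """Learn one set of breakpoints per feature position (hypothetical helper)."""
    columns = [[] for _ in range(word_length)]
    for series in train_series:
        for window in sliding_windows(series, window_size):
            for column, f in zip(columns, dft_features(window, word_length)):
                column.append(f)
    return [equi_depth_breakpoints(col, alphabet_size) for col in columns]


def sfa_histogram(series, window_size, word_length, breakpoints):
    """Count the words produced by sliding a window across one series."""
    letters = string.ascii_lowercase  # assumes alphabet_size <= 26
    counts = Counter()
    for window in sliding_windows(series, window_size):
        feats = dft_features(window, word_length)
        word = "".join(
            letters[sum(f >= b for b in bps)]  # index of the bin f falls into
            for f, bps in zip(feats, breakpoints)
        )
        counts[word] += 1
    return counts


# Toy data: two synthetic series of length 40.
train = [
    [math.sin(0.3 * t) for t in range(40)],
    [math.cos(0.7 * t) for t in range(40)],
]
breakpoints = fit_breakpoints(train, window_size=12, word_length=4, alphabet_size=4)
hist = sfa_histogram(train[0], window_size=12, word_length=4, breakpoints=breakpoints)
```

With a window as long as the series, the loop in `sfa_histogram` runs once and the histogram holds a single word.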
- Parameters
word_length (int, default = 8) – length of the word each window is shortened to (using DFT)
alphabet_size (int, default = 4) – number of symbols each value can be discretised to
window_size (int, default = 12) – size of the sliding window. Set it to the input series length to transform the whole series as a single window
norm (boolean, default = False) – mean-normalise words by dropping the first Fourier coefficient
binning_method ({"equi-depth", "equi-width", "information-gain", "kmeans"}, default = "equi-depth") – the binning method used to derive the breakpoints
anova (boolean, default = False) – If True, the Fourier coefficient selection is done via a one-way ANOVA test. If False, the first Fourier coefficients are selected. Only applicable if labels are given
bigrams (boolean, default = False) – whether to create bigrams of SFA words
skip_grams (boolean, default = False) – whether to create skip-grams of SFA words
remove_repeat_words (boolean, default = False) – whether to use numerosity reduction
levels (int, default = 1) – Number of spatial pyramid levels
save_words (boolean, default = False) – whether to save the words generated for each series
return_pandas_data_series (boolean, default = False) – set to True to return a pandas Series as the result of transform. Setting it to True reduces speed significantly but is required for automatic tests.
n_jobs (int, optional, default = 1) – the number of jobs to run in parallel for transform. -1 means using all processors.
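To make the difference between two of the `binning_method` options concrete, the sketch below derives breakpoints for one column of values with equi-depth and equi-width binning. It is a plain-Python illustration with hypothetical helper names, not the code sktime runs, and the sample values are made up to include one outlier.

```python
def equi_depth_breakpoints(values, alphabet_size):
    """Breakpoints chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // alphabet_size] for i in range(1, alphabet_size)]


def equi_width_breakpoints(values, alphabet_size):
    """Breakpoints that split the value range into equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / alphabet_size
    return [lo + i * width for i in range(1, alphabet_size)]


def bin_counts(values, breakpoints):
    """Number of values that fall into each bin."""
    counts = [0] * (len(breakpoints) + 1)
    for v in values:
        counts[sum(v >= b for b in breakpoints)] += 1
    return counts


# Made-up sample with a single outlier at 10.0.
values = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 10.0]

depth = equi_depth_breakpoints(values, alphabet_size=4)
width = equi_width_breakpoints(values, alphabet_size=4)

print(depth, bin_counts(values, depth))  # [0.2, 0.4, 0.6] [2, 2, 2, 2]
print(width, bin_counts(values, width))  # [2.5, 5.0, 7.5] [7, 0, 0, 1]
```

On this sample the equi-depth bins stay balanced, while the outlier stretches the equi-width bins so that most values share one symbol.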
-
num_insts = 0
-
num_atts = 0
-
__init__(word_length=8, alphabet_size=4, window_size=12, norm=False, binning_method='equi-depth', anova=False, bigrams=False, skip_grams=False, remove_repeat_words=False, levels=1, lower_bounding=True, save_words=False, save_binning_dft=False, return_pandas_data_series=False, n_jobs=1)
Initialize self. See help(type(self)) for accurate signature.