sktime.transformations.panel.dictionary_based
class sktime.transformations.panel.dictionary_based.PAA(num_intervals=8)
Bases: sktime.transformations.base._PanelToPanelTransformer
PAA (Piecewise Aggregate Approximation) Transformer, as described in Eamonn Keogh, Kaushik Chakrabarti, Michael Pazzani, and Sharad Mehrotra, "Dimensionality reduction for fast similarity search in large time series databases", Knowledge and Information Systems, 3(3), 263-286, 2001. For each series, the dimensionality is reduced to num_intervals values, where each value is the mean of the values in its interval.
- TO DO: pythonise it to make it more efficient. Maybe check vs this version.
Could have: tune the interval size in fit somehow?
- Parameters
num_intervals (int, default = 8) – dimension of the transformed data (number of intervals)
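As an illustration of the transform, the following is a simplified NumPy sketch, not the transformer's exact implementation; the function name paa_transform is hypothetical, and edge handling for series lengths that do not divide evenly may differ.

import numpy as np

def paa_transform(series, num_intervals=8):
    # Simplified PAA: split the series into num_intervals contiguous
    # frames and replace each frame by its mean value
    series = np.asarray(series, dtype=float)
    # np.array_split tolerates lengths that are not an exact multiple
    # of num_intervals by making some frames one element longer
    frames = np.array_split(series, num_intervals)
    return np.array([frame.mean() for frame in frames])

# A length-16 series reduced to 4 interval means
x = np.arange(16.0)
print(paa_transform(x, num_intervals=4))  # [ 1.5  5.5  9.5 13.5]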
class sktime.transformations.panel.dictionary_based.SAX(word_length=8, alphabet_size=4, window_size=12, remove_repeat_words=False, save_words=False, return_pandas_data_series=True)
Bases: sktime.transformations.base._PanelToPanelTransformer
SAX (Symbolic Aggregate approXimation) Transformer, as described in Jessica Lin, Eamonn Keogh, Li Wei and Stefano Lonardi, "Experiencing SAX: a novel symbolic representation of time series", Data Mining and Knowledge Discovery, 15(2):107-144.
Overview: for each series:
    run a sliding window across the series
    for each window:
        shorten the window with PAA (Piecewise Aggregate Approximation)
        discretise the shortened window into fixed bins
        form a word from these discrete values
By default SAX produces a single word per series (window_size=0). SAX returns a pandas DataFrame where column 0 is the histogram (sparse pd.Series) of each series.
- Parameters
word_length (int, default = 8) – length of word to shorten window to (using PAA)
alphabet_size (int, default = 4) – number of values to discretise each value to
window_size (int, default = 12) – size of window for sliding. Input series length for whole series transform
remove_repeat_words (boolean, default = False) – whether to use numerosity reduction
save_words (boolean, default = False) – whether to save the words generated for each series
return_pandas_data_series (boolean, default = True) – set to True to return a pandas Series as a result of transform. Setting to True reduces speed significantly but is required for the automatic tests.
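The word-forming step for the single-word case (window_size=0) can be sketched as follows. This is an illustrative sketch, not the transformer's code: the function name sax_word is hypothetical, the Gaussian equiprobable breakpoints follow the SAX paper, and the transformer's exact normalisation may differ.

import numpy as np
from scipy.stats import norm

def sax_word(series, word_length=8, alphabet_size=4):
    # z-normalise, shorten with PAA, then map each PAA mean to a letter
    series = np.asarray(series, dtype=float)
    series = (series - series.mean()) / (series.std() + 1e-8)
    paa = np.array([f.mean() for f in np.array_split(series, word_length)])
    # breakpoints that split N(0, 1) into alphabet_size equal-probability bins
    breakpoints = norm.ppf(np.linspace(0, 1, alphabet_size + 1)[1:-1])
    symbols = np.digitize(paa, breakpoints)  # integers in 0..alphabet_size-1
    return "".join(chr(ord("a") + int(s)) for s in symbols)

rng = np.random.default_rng(0)
print(sax_word(rng.normal(size=96)))  # an 8-letter word over a..d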
class sktime.transformations.panel.dictionary_based.SFA(word_length=8, alphabet_size=4, window_size=12, norm=False, binning_method='equi-depth', anova=False, bigrams=False, skip_grams=False, remove_repeat_words=False, levels=1, lower_bounding=True, save_words=False, save_binning_dft=False, return_pandas_data_series=False, n_jobs=1)
Bases: sktime.transformations.base._PanelToPanelTransformer
SFA (Symbolic Fourier Approximation) Transformer, as described in
- @inproceedings{schafer2012sfa,
    title={SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets},
    author={Sch{\"a}fer, Patrick and H{\"o}gqvist, Mikael},
    booktitle={Proceedings of the 15th International Conference on Extending Database Technology},
    pages={516--527},
    year={2012},
    organization={ACM}
  }
- Overview: for each series:
    run a sliding window across the series
    for each window:
        shorten the window with the DFT
        discretise the shortened window into bins set by MCB
        form a word from these discrete values
By default SFA produces a single word per series (window_size=0). If a window is used, it forms a histogram of counts of words.
- Parameters
word_length (int, default = 8) – length of word to shorten window to (using the DFT)
alphabet_size (int, default = 4) – number of values to discretise each value to
window_size (int, default = 12) – size of window for sliding. Input series length for whole series transform
norm (boolean, default = False) – mean normalise words by dropping the first Fourier coefficient
binning_method ({"equi-depth", "equi-width", "information-gain", "kmeans"}, default = "equi-depth") – the binning method used to derive the breakpoints
anova (boolean, default = False) – if True, the Fourier coefficient selection is done via a one-way ANOVA test; if False, the first Fourier coefficients are selected. Only applicable if labels are given.
bigrams (boolean, default = False) – whether to create bigrams of SFA words
skip_grams (boolean, default = False) – whether to create skip-grams of SFA words
remove_repeat_words (boolean, default = False) – whether to use numerosity reduction
levels (int, default = 1) – number of spatial pyramid levels
save_words (boolean, default = False) – whether to save the words generated for each series
return_pandas_data_series (boolean, default = False) – set to True to return a pandas Series as a result of transform. Setting to True reduces speed significantly but is required for the automatic tests.
n_jobs (int, optional, default = 1) – the number of jobs to run in parallel for transform. -1 means using all processors.
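The per-window word formation can be sketched as follows (a hedged sketch assuming norm=False so the first Fourier coefficient is kept; the function name sfa_word, the real/imaginary interleaving order and the breakpoint layout are illustrative assumptions, not the transformer's internals).

import numpy as np

def sfa_word(window, breakpoints, word_length=8):
    # Shorten the window with the DFT, then discretise each kept
    # coefficient using its own breakpoints (one array per coefficient,
    # as learned from the training data in fit)
    dft = np.fft.rfft(np.asarray(window, dtype=float))
    # interleave real and imaginary parts, keep the first word_length values
    coeffs = np.column_stack([dft.real, dft.imag]).ravel()[:word_length]
    return tuple(int(np.digitize(c, bp)) for c, bp in zip(coeffs, breakpoints))

# Hypothetical breakpoints: one array of (alphabet_size - 1) cut points
# per coefficient, here drawn from quantiles of random training values
rng = np.random.default_rng(0)
breakpoints = [np.quantile(rng.normal(size=500), [0.25, 0.5, 0.75])
               for _ in range(8)]
print(sfa_word(rng.normal(size=32), breakpoints))  # tuple of 8 symbols in 0..3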
num_insts = 0
num_atts = 0
fit(X, y=None)
Calculate word breakpoints using _mcb (Multiple Coefficient Binning).
- Parameters
X (nested pandas DataFrame of shape [n_instances, 1]) – Nested dataframe with univariate time-series in cells.
y (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The class labels.
- Returns
self
- Return type
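With the default binning_method="equi-depth", the breakpoints learned in fit can be pictured with the sketch below. This is an illustrative approximation of equi-depth binning, not the _mcb implementation; equi_depth_breakpoints and the variable names are hypothetical.

import numpy as np

def equi_depth_breakpoints(train_coeffs, alphabet_size=4):
    # For each kept Fourier coefficient (columns of train_coeffs, shape
    # [n_windows, word_length]), place breakpoints at quantiles so every
    # bin receives roughly the same number of training values
    qs = np.linspace(0, 1, alphabet_size + 1)[1:-1]  # interior quantiles
    return [np.quantile(train_coeffs[:, j], qs)
            for j in range(train_coeffs.shape[1])]

# 100 hypothetical training windows, word_length = 8
rng = np.random.default_rng(0)
train_coeffs = rng.normal(size=(100, 8))
breakpoints = equi_depth_breakpoints(train_coeffs, alphabet_size=4)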