sktime.transformations.panel.dictionary_based

class sktime.transformations.panel.dictionary_based.PAA(num_intervals=8)[source]

Bases: sktime.transformations.base._PanelToPanelTransformer

PAA (Piecewise Aggregate Approximation) Transformer, as described in Eamonn Keogh, Kaushik Chakrabarti, Michael Pazzani, and Sharad Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3), 263-286, 2001. For each series, reduce the dimensionality to num_intervals, where each output value is the mean of the values in the corresponding interval.
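The interval-mean reduction described above can be sketched in a few lines of dependency-free Python. This is only an illustration, not the sktime implementation: the function name `paa` and the rule used to place interval boundaries when the series length is not divisible by num_intervals are assumptions.

```python
from statistics import mean


def paa(series, num_intervals=8):
    """Reduce `series` to `num_intervals` values, each the mean of one interval."""
    n = len(series)
    if not 1 <= num_intervals <= n:
        raise ValueError("num_intervals must be between 1 and len(series)")
    # Assumed boundary rule: interval i covers indices [bounds[i], bounds[i + 1]).
    bounds = [i * n // num_intervals for i in range(num_intervals + 1)]
    return [mean(series[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


series = [1, 2, 3, 4, 5, 6, 7, 8]
print(paa(series, num_intervals=4))  # [1.5, 3.5, 5.5, 7.5]
```

A series of length 8 reduced to 4 intervals averages each consecutive pair of values, so the output has exactly num_intervals entries.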

TO DO: pythonise it to make it more efficient. Maybe check against this version: http://vigne.sh/posts/piecewise-aggregate-approx/

Could have: Tune the interval size in fit somehow?

Parameters

num_intervals (int, default = 8) – dimension of the transformed data

set_num_intervals(n)[source]
transform(X, y=None)[source]
Parameters

X (nested pandas DataFrame of shape [n_instances, n_dims]) – Nested dataframe with multivariate time-series in cells.

Returns

dims – Pandas data frame with first dimension in column zero, second in column one, etc.

class sktime.transformations.panel.dictionary_based.SAX(word_length=8, alphabet_size=4, window_size=12, remove_repeat_words=False, save_words=False, return_pandas_data_series=True)[source]

Bases: sktime.transformations.base._PanelToPanelTransformer

SAX (Symbolic Aggregate approXimation) Transformer, as described in Jessica Lin, Eamonn Keogh, Li Wei and Stefano Lonardi, “Experiencing SAX: a novel symbolic representation of time series”, Data Mining and Knowledge Discovery, 15(2):107-144, 2007.

Overview: for each series:

  • run a sliding window across the series

  • for each window:

      • shorten the series with PAA (Piecewise Aggregate Approximation)

      • discretise the shortened series into fixed bins

      • form a word from these discrete values

By default, SAX produces a single word per series (window_size=0). SAX returns a pandas data frame where column 0 is the histogram (sparse pd.Series) of each series.
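The overview above can be sketched as follows. This is a minimal illustration rather than the sktime code: the helper names are hypothetical, and the per-window z-normalisation and equal-probability Gaussian breakpoints follow the usual formulation in the SAX paper, which this page only summarises as "fixed bins".

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def paa(values, word_length):
    """Shorten `values` to `word_length` interval means."""
    n = len(values)
    bounds = [i * n // word_length for i in range(word_length + 1)]
    return [mean(values[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


def sax_words(series, word_length=4, alphabet_size=4, window_size=8):
    """Return one SAX word per sliding window of `series`."""
    # Assumption: breakpoints cut a standard normal into equal-probability bins.
    breakpoints = [
        NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)
    ]
    words = []
    for start in range(len(series) - window_size + 1):
        window = series[start:start + window_size]
        mu, sigma = mean(window), pstdev(window)
        # z-normalise the window; a constant window maps to all zeros.
        z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in window]
        reduced = paa(z, word_length)
        # Each symbol is the number of breakpoints strictly below the value.
        word = "".join(LETTERS[sum(v > b for b in breakpoints)] for v in reduced)
        words.append(word)
    return words


# A linear ramp looks the same in every window once z-normalised.
histogram = Counter(sax_words(list(range(10)), window_size=8))
print(histogram)  # Counter({'abcd': 3})
```

The `Counter` plays the role of the per-series word histogram; the transformer itself returns that histogram inside a pandas data frame.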

Parameters
  • word_length (int, default = 8) – length of word to shorten window to (using PAA)

  • alphabet_size (int, default = 4) – number of values to discretise each value to

  • window_size (int, default = 12) – size of window for sliding. Input series length for whole series transform

  • remove_repeat_words (boolean, default = False) – whether to use numerosity reduction

  • save_words (boolean, default = False) – whether to save the words generated for each series

  • return_pandas_data_series (boolean, default = True) – set to True to return Pandas Series as a result of transform. Setting to True reduces speed significantly but is required for automatic test.

words[source]

History of words; initially [].

transform(X, y=None)[source]
Parameters

X (nested pandas DataFrame of shape [n_instances, 1]) – Nested dataframe with univariate time-series in cells.

Returns

dims – Pandas data frame with first dimension in column zero

class sktime.transformations.panel.dictionary_based.SFA(word_length=8, alphabet_size=4, window_size=12, norm=False, binning_method='equi-depth', anova=False, bigrams=False, skip_grams=False, remove_repeat_words=False, levels=1, lower_bounding=True, save_words=False, save_binning_dft=False, return_pandas_data_series=False, n_jobs=1)[source]

Bases: sktime.transformations.base._PanelToPanelTransformer

SFA (Symbolic Fourier Approximation) Transformer, as described in

@inproceedings{schafer2012sfa,
  title={SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets},
  author={Sch{\"a}fer, Patrick and H{\"o}gqvist, Mikael},
  booktitle={Proceedings of the 15th International Conference on Extending Database Technology},
  pages={516--527},
  year={2012},
  organization={ACM}
}

Overview: for each series:

  • run a sliding window across the series

  • for each window:

      • shorten the series with DFT

      • discretise the shortened series into bins set by MCB

      • form a word from these discrete values

By default, SFA produces a single word per series (window_size=0). If a window is used, it forms a histogram of counts of words.
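As a rough illustration of the steps above, the sketch below treats each whole series as one window, keeps the first real and imaginary DFT values, learns equi-depth breakpoints per value from a training panel, and maps each series to a word of integer symbols. All function names are hypothetical, and the actual sktime code may differ in normalisation, coefficient selection and windowing.

```python
import cmath


def dft_features(window, word_length):
    """First `word_length` real/imaginary DFT values of `window`."""
    n = len(window)
    feats, k = [], 0
    while len(feats) < word_length:
        coeff = sum(
            v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(window)
        )
        feats.extend([coeff.real, coeff.imag])
        k += 1
    return feats[:word_length]


def equi_depth_breakpoints(column, alphabet_size):
    """Breakpoints placing roughly equal numbers of training values in each bin."""
    ordered = sorted(column)
    n = len(ordered)
    return [ordered[k * n // alphabet_size] for k in range(1, alphabet_size)]


def fit_breakpoints(panel, word_length, alphabet_size):
    """Learn one set of breakpoints per Fourier value from all training series."""
    feats = [dft_features(s, word_length) for s in panel]
    return [
        equi_depth_breakpoints([f[j] for f in feats], alphabet_size)
        for j in range(word_length)
    ]


def sfa_word(series, breakpoints, word_length):
    """Discretise the Fourier values of `series` into a tuple of symbols."""
    feats = dft_features(series, word_length)
    # Symbol = number of breakpoints less than or equal to the value.
    return tuple(sum(v >= b for b in bps) for v, bps in zip(feats, breakpoints))


panel = [[1, 1, 1, 1], [2, 1, 3, 2], [3, 5, 1, 3], [4, 2, 6, 4]]
breakpoints = fit_breakpoints(panel, word_length=4, alphabet_size=4)
words = [sfa_word(s, breakpoints, word_length=4) for s in panel]
print([w[0] for w in words])  # [0, 1, 2, 3]
```

The first symbol of each word comes from the zeroth Fourier coefficient (the sum of the series), which is why the four series with increasing sums land in four different bins.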

Parameters
  • word_length (int, default = 8) – length of word to shorten window to (using DFT)

  • alphabet_size (int, default = 4) – number of values to discretise each value to

  • window_size (int, default = 12) – size of window for sliding. Input series length for whole series transform

  • norm (boolean, default = False) – mean normalise words by dropping the first Fourier coefficient

  • binning_method ({"equi-depth", "equi-width", "information-gain", "kmeans"}, default = "equi-depth") – the binning method used to derive the breakpoints.

  • anova (boolean, default = False) – If True, the Fourier coefficient selection is done via a one-way ANOVA test. If False, the first Fourier coefficients are selected. Only applicable if labels are given

  • bigrams (boolean, default = False) – whether to create bigrams of SFA words

  • skip_grams (boolean, default = False) – whether to create skip-grams of SFA words

  • remove_repeat_words (boolean, default = False) – whether to use numerosity reduction

  • levels (int, default = 1) – Number of spatial pyramid levels

  • save_words (boolean, default = False) – whether to save the words generated for each series

  • return_pandas_data_series (boolean, default = False) – set to True to return Pandas Series as a result of transform. Setting to True reduces speed significantly but is required for automatic test.

  • n_jobs (int, optional, default = 1) – The number of jobs to run in parallel for transform. -1 means using all processors.
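To make the remove_repeat_words and bigrams options more concrete, here is a small sketch of numerosity reduction and bigram formation on a sequence of window words. It is an assumption-laden illustration: the helper names are hypothetical, and pairing each retained word with its immediate successor is only one possible reading of how bigrams are built; the library may pair words from windows further apart.

```python
from collections import Counter


def drop_repeat_words(words):
    """Numerosity reduction: keep a word only if it differs from the previous one."""
    kept = []
    for word in words:
        if not kept or kept[-1] != word:
            kept.append(word)
    return kept


def adjacent_bigrams(words):
    """Pair each word with the word that follows it (assumed pairing rule)."""
    return list(zip(words, words[1:]))


window_words = ["ab", "ab", "ab", "ba", "ba", "ab"]
reduced = drop_repeat_words(window_words)
print(reduced)                    # ['ab', 'ba', 'ab']
print(Counter(reduced))           # Counter({'ab': 2, 'ba': 1})
print(adjacent_bigrams(reduced))  # [('ab', 'ba'), ('ba', 'ab')]
```

Without numerosity reduction the histogram would count 'ab' four times and 'ba' twice, so long flat stretches of a series would dominate the word counts.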

words[source]

Initially [].

breakpoints[source]

Initially [].

num_insts = 0
num_atts = 0
static create_bigram_word(word, other_word, length)[source]
fit(X, y=None)[source]

Calculate word breakpoints using _mcb (Multiple Coefficient Binning).

Parameters
  • X (nested pandas DataFrame of shape [n_instances, 1]) – Nested dataframe with univariate time-series in cells.

  • y (array-like, shape = [n_samples] or [n_samples, n_outputs]) – The class labels.

Returns

self

Return type

object

static right_shift(left, right)[source]
classmethod shorten_word(word, amount)[source]
transform(X, y=None, supplied_dft=None)[source]

Transform data. Returns a transformed version of X.

classmethod word_list(word, length)[source]