sktime.classification.interval_based

class sktime.classification.interval_based.RandomIntervalSpectralForest(n_estimators=200, min_interval=16, acf_lag=100, acf_min_values=4, n_jobs=None, random_state=None)[source]

Bases: sklearn.ensemble._forest.ForestClassifier, sktime.classification.base.BaseClassifier

Random Interval Spectral Forest (RISE) from [1]

Overview:

Input: n series of length m
for each tree
    sample a random interval
    take the ACF and PS over this interval, and concatenate features
    build tree on new features
ensemble the trees through averaging probabilities.

Each tree needs a minimum interval width (see min_interval). This is from the Python GitHub.
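The per-tree feature step in the overview can be sketched as below. This is a minimal, dependency-free illustration and not the sktime implementation: the helper names (acf, power_spectrum, rise_features) are hypothetical, the interval is drawn uniformly as an assumption, and the power spectrum uses a naive DFT rather than an FFT.

```python
import cmath
import math
import random


def acf(x, max_lag, min_values=4):
    """Autocorrelation at lags 1..max_lag, never using fewer than min_values terms."""
    max_lag = min(max_lag, len(x) - min_values)
    out = []
    for lag in range(1, max_lag + 1):
        a, b = x[:-lag], x[lag:]  # the window and its lagged copy
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
        var_a = sum((u - mean_a) ** 2 for u in a)
        var_b = sum((v - mean_b) ** 2 for v in b)
        denom = math.sqrt(var_a * var_b)
        out.append(cov / denom if denom > 0 else 0.0)
    return out


def power_spectrum(x):
    """Squared DFT magnitude for the first half of the frequencies (naive O(n^2))."""
    n = len(x)
    ps = []
    for k in range(n // 2):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        ps.append(abs(s) ** 2)
    return ps


def rise_features(series, min_interval=16, acf_lag=100, acf_min_values=4, rng=None):
    """Draw one random interval and return ((start, end), ACF + PS features)."""
    rng = rng or random.Random()
    m = len(series)  # assumed to be at least min_interval
    length = rng.randint(min_interval, m)
    start = rng.randint(0, m - length)
    window = series[start:start + length]
    feats = acf(window, acf_lag, acf_min_values) + power_spectrum(window)
    return (start, start + length), feats
```

In this sketch each tree would call rise_features once per series with the same interval; the number of features therefore depends on the interval width drawn for that tree.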

Parameters
  • n_estimators (int, optional (default=200)) – The number of trees in the forest.

  • min_interval (int, optional (default=16)) – The minimum width of an interval.

  • acf_lag (int, optional (default=100)) – The maximum number of autocorrelation terms to use.

  • acf_min_values (int, optional (default=4)) – Never use fewer than this number of terms to find a correlation.

  • n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

n_classes[source]

The number of classes, extracted from the data.

Type

int

classifiers[source]
Type

array of shape = [n_estimators] of DecisionTree classifiers

intervals[source]

Stores indexes of start and end points for all classifiers.

Type

array of shape = [n_estimators][2]

Notes

[1] Jason Lines, Sarah Taylor and Anthony Bagnall, “Time Series Classification with HIVE-COTE: The Hierarchical Vote Collective of Transformation-Based Ensembles”, ACM Transactions on Knowledge Discovery from Data, 12(5), 2018. https://dl.acm.org/doi/10.1145/3182382

Java implementation: https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/frequency_based/RISE.java

capabilities = {'missing_values': False, 'multivariate': False, 'unequal_length': False}[source]
property feature_importances_[source]

The impurity-based feature importances.

The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

Returns

feature_importances_ – The values of this array sum to 1, unless all trees are single node trees consisting of only the root node, in which case it will be an array of zeros.

Return type

ndarray of shape (n_features,)
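The normalisation described above can be illustrated with a small sketch. It assumes the forest-level value is the mean of the per-tree importances over trees that actually split, rescaled to sum to 1. The function name forest_importances is hypothetical, and "has any non-zero importance" stands in for "has more than a root node".

```python
def forest_importances(per_tree, n_features):
    """Average per-tree importances and rescale so the result sums to 1."""
    # Trees that never split contribute nothing and are skipped.
    informative = [imp for imp in per_tree if any(v > 0 for v in imp)]
    if not informative:
        # All trees are single-node trees: return an array of zeros.
        return [0.0] * n_features
    mean = [sum(col) / len(informative) for col in zip(*informative)]
    total = sum(mean)
    return [v / total for v in mean]
```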

fit(X, y)[source]

Build a forest of trees from the training set (X, y) using random intervals and spectral features.

Parameters
  • X (array-like or sparse matrix of shape = [n_instances, series_length] or shape = [n_instances, n_columns]) – The training input samples. If a Pandas data frame is passed it must have a single column (i.e., univariate classification). RISE has no bespoke method for multivariate classification as yet.

  • y (array-like, shape = [n_instances]) – The class labels.

Returns

self

Return type

object

predict(X)[source]

Find predictions for all cases in X. Built on top of predict_proba.

Parameters

X (array-like or sparse matrix of shape = [n_instances, n_columns]) – The input samples. If a Pandas data frame is passed it must have a single column (i.e., univariate classification). RISE has no bespoke method for multivariate classification as yet.

Returns

y – The predicted classes.

Return type

array of shape = [n_instances]
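"Built on top of predict_proba" can be read as taking, for each case, the class with the highest estimated probability. A minimal sketch under that assumption follows; predict_from_proba is a hypothetical helper, and ties go to the first class here.

```python
def predict_from_proba(proba, classes):
    """Map each row of class probabilities to the label with the highest value."""
    labels = []
    for row in proba:
        best = max(range(len(row)), key=row.__getitem__)  # index of the largest entry
        labels.append(classes[best])
    return labels
```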

predict_proba(X)[source]

Find probability estimates for each class for all cases in X.

Parameters
  • X (array-like or sparse matrix of shape = [n_instances, n_columns]) – The input samples. If a Pandas data frame is passed it must have a single column (i.e., univariate classification). RISE has no bespoke method for multivariate classification as yet.

Local variables
  • n_instances (int) – Number of cases to classify.

  • n_columns (int) – Number of attributes in X, must match series_length determined in fit.

Returns

output – The class probabilities of all cases.

Return type

array of shape = [n_instances, n_classes]
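The ensemble step ("averaging probabilities" in the overview) can be sketched as follows. The nested-list layout per_tree_proba[tree][instance][class] and the function name are assumptions made for illustration only.

```python
def average_probabilities(per_tree_proba):
    """Average class probabilities over trees; returns [n_instances][n_classes]."""
    n_trees = len(per_tree_proba)
    n_instances = len(per_tree_proba[0])
    n_classes = len(per_tree_proba[0][0])
    return [
        [
            sum(per_tree_proba[t][i][c] for t in range(n_trees)) / n_trees
            for c in range(n_classes)
        ]
        for i in range(n_instances)
    ]
```

If every tree returns rows that sum to 1, each averaged row also sums to 1.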

class sktime.classification.interval_based.SupervisedTimeSeriesForest(n_estimators=500, n_jobs=1, random_state=None)[source]

Bases: sklearn.ensemble._forest.ForestClassifier, sktime.classification.base.BaseClassifier

Supervised Time Series Forest (STSF) classifier as described in [1].

A time series forest is an ensemble of decision trees built on intervals selected through a supervised process.

Overview:

Input: n series of length m
for each tree
    sample X using class-balanced bagging
    sample intervals for all 3 representations and 7 features using supervised method
    find mean, median, std, slope, iqr, min and max using their corresponding interval for each representation, concatenate to form new data set
    build decision tree on new data set
ensemble the trees with averaged probability estimates
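The seven summary features named in the overview can be sketched for a single, already chosen interval as below. The supervised interval search itself is not shown. The helper names are hypothetical, and the population standard deviation and the "inclusive" quartile method are assumptions that may differ from the sktime implementation.

```python
import statistics


def slope(values):
    """Least-squares slope of values against their positions 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def iqr(values):
    """Interquartile range (third quartile minus first quartile)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def interval_features(series, start, end):
    """Seven summary statistics of series[start:end] (needs at least two points)."""
    window = series[start:end]
    return {
        "mean": statistics.fmean(window),
        "median": statistics.median(window),
        "std": statistics.pstdev(window),
        "slope": slope(window),
        "iqr": iqr(window),
        "min": min(window),
        "max": max(window),
    }
```

In STSF each feature is computed on its own set of intervals, so a fuller sketch would call one statistic per selected interval rather than all seven on the same window.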

Parameters
  • n_estimators (int, optional (default=500)) – The ensemble size, i.e. the number of trees in the forest.

  • n_jobs (int, optional (default=1)) – The number of jobs to run in parallel for both fit and predict. -1 means using all processors.

  • random_state (int, optional (default=None)) – Seed for the random number generator.

Attributes
  • n_classes (int) – The number of classes, extracted from the data.

  • classifiers (array of shape = [n_estimators] of DecisionTree classifiers)

  • intervals (array of shape = [n_estimators][3][7][n_intervals][2]) – Stores indexes of all start and end points for all classifiers for each representation and feature.

[1] Cabello, Nestor, et al. “Fast and Accurate Time Series Classification Through Supervised Interval Search.” IEEE ICDM 2020

Java implementation: https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/interval_based/STSF.java

capabilities = {'missing_values': False, 'multivariate': False, 'unequal_length': False}[source]
fit(X, y)[source]

Build a forest of trees from the training set (X, y) using supervised intervals and summary features.

Parameters
  • X (array-like or sparse matrix of shape = [n_instances, series_length] or shape = [n_instances, n_columns]) – The training input samples. If a Pandas data frame is passed it must have a single column (i.e., univariate classification). STSF has no bespoke method for multivariate classification as yet.

  • y (array-like, shape = [n_instances]) – The class labels.

Returns

self

Return type

object

predict(X)[source]

Find predictions for all cases in X. Built on top of predict_proba.

Parameters

X (array-like or pandas data frame) – The input samples. If a Pandas data frame is passed, a check is performed that it only has one column. If not, an exception is thrown, since this classifier does not yet have multivariate capability.

Returns

output

Return type

array of shape = [n_test_instances]

predict_proba(X)[source]

Find probability estimates for each class for all cases in X.

Parameters

X (array-like or sparse matrix of shape = [n_test_instances, series_length]) – The input samples. If a Pandas data frame is passed (sktime format) a check is performed that it only has one column. If not, an exception is thrown, since this classifier does not yet have multivariate capability.

Returns

output – Predicted probabilities

Return type

ndarray of shape = (n_instances, n_classes)

class sktime.classification.interval_based.TimeSeriesForestClassifier(min_interval=3, n_estimators=200, n_jobs=1, random_state=None)[source]

Bases: sktime.series_as_features.base.estimators.interval_based._tsf.BaseTimeSeriesForest, sklearn.ensemble._forest.ForestClassifier, sktime.classification.base.BaseClassifier

Time series forest classifier.

A time series forest is an ensemble of decision trees built on random intervals.

Overview: Input n series of length m. For each tree

  • sample sqrt(m) intervals,

  • find mean, std and slope for each interval, concatenate to form new data set,

  • build decision tree on new data set.

Ensemble the trees with averaged probability estimates.
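The interval sampling and feature steps above can be sketched as below. This is an illustrative approximation, not the sktime code: the function names are hypothetical, start points and lengths are drawn uniformly as an assumption, and the population standard deviation is used.

```python
import math
import random
import statistics


def sample_intervals(series_length, min_interval=3, rng=None):
    """Draw sqrt(m) random (start, end) intervals, with replacement."""
    rng = rng or random.Random()
    n_intervals = int(math.sqrt(series_length))
    intervals = []
    for _ in range(n_intervals):
        start = rng.randint(0, series_length - min_interval)
        length = rng.randint(min_interval, series_length - start)
        intervals.append((start, start + length))
    return intervals


def slope(values):
    """Least-squares slope of values against their positions 0..n-1."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def tsf_features(series, intervals):
    """Concatenate mean, std and slope of every interval into one feature vector."""
    feats = []
    for start, end in intervals:
        window = series[start:end]
        feats.extend([statistics.fmean(window), statistics.pstdev(window), slope(window)])
    return feats
```

Each tree would draw its own intervals once and apply them to every series, giving 3 * sqrt(m) features per case for the decision tree to split on.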

This implementation deviates from the original in minor ways. It samples intervals with replacement and does not use the splitting criteria tiny refinement described in [1]. This is an intentionally stripped-down, non-configurable version for use as a HIVE-COTE component. For a configurable tree-based ensemble, see sktime.classifiers.ensemble.TimeSeriesForestClassifier.

Parameters
  • n_estimators (int, optional (default=200)) – The ensemble size.

  • min_interval (int, optional (default=3)) – The minimum width of an interval.

  • n_jobs (int, optional (default=1)) – The number of jobs to run in parallel for both fit and predict. -1 means using all processors.

  • random_state (int, optional (default=None)) – Seed for the random number generator.

Attributes
  • n_classes (int)

  • n_intervals (int)

  • classes_ – List of classes for a given problem.

[1] H. Deng, G. Runger, E. Tuv and M. Vladimir, “A time series forest for classification and feature extraction”, Information Sciences, 239, 2013.

Java implementation: https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/interval_based/TSF.java

Arxiv version of the paper: https://arxiv.org/abs/1302.2277