sktime.classification.interval_based¶
-
class
sktime.classification.interval_based.
RandomIntervalSpectralForest
(n_estimators=200, min_interval=16, acf_lag=100, acf_min_values=4, n_jobs=None, random_state=None)[source]¶ Bases:
sklearn.ensemble._forest.ForestClassifier
,sktime.classification.base.BaseClassifier
Random Interval Spectral Forest (RISE) from [1]
Overview:
Input: n series of length m. For each tree:
- sample a random interval,
- take the autocorrelation function (ACF) and power spectrum (PS) over this interval, and concatenate the features,
- build a tree on the new features.
Ensemble the trees by averaging their probabilities.
Each tree needs a minimum interval width (see min_interval). This is taken from the Python GitHub repository.
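The per-tree feature extraction described in the overview can be sketched as follows. This is a dependency-free illustration, not the library's code: the helper names (`autocorrelation`, `power_spectrum`, `rise_features`), the ACF normalisation and the naive DFT are assumptions, and `acf_min_values` is left out for brevity.

```python
import cmath
import math
import random


def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag (capped at len(x) - 1)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    lags = range(1, min(max_lag, n - 1) + 1)
    if var == 0:  # constant interval: define every coefficient as 0
        return [0.0 for _ in lags]
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / var
        for k in lags
    ]


def power_spectrum(x):
    """Squared magnitude of a naive O(n^2) DFT, first half of the spectrum."""
    n = len(x)
    spectrum = []
    for f in range(n // 2):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def rise_features(series, rng, min_interval=16, acf_lag=100):
    """Pick one random interval and concatenate its ACF and PS features."""
    m = len(series)
    length = rng.randint(min_interval, m)  # interval width, bounds inclusive
    start = rng.randint(0, m - length)
    segment = series[start:start + length]
    features = autocorrelation(segment, acf_lag) + power_spectrum(segment)
    return (start, start + length), features


rng = random.Random(0)
series = [math.sin(0.3 * t) for t in range(64)]
interval, features = rise_features(series, rng)
```

In the sketch each tree would call `rise_features` once per training series with the same interval; here the interval is drawn inside the function only to keep the example short.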
- Parameters
n_estimators (int, optional (default=200)) – The number of trees in the forest.
min_interval (int, optional (default=16)) – The minimum width of an interval.
acf_lag (int, optional (default=100)) – The maximum number of autocorrelation terms to use.
acf_min_values (int, optional (default=4)) – Never use fewer than this number of terms to find a correlation.
n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
-
intervals
[source]¶ Stores indexes of start and end points for all classifiers.
- Type
array of shape = [n_estimators][2]
Notes
[1] Jason Lines, Sarah Taylor and Anthony Bagnall, “Time Series Classification with HIVE-COTE: The Hierarchical Vote Collective of Transformation-Based Ensembles”, ACM Transactions on Knowledge Discovery from Data, 12(5), 2018. https://dl.acm.org/doi/10.1145/3182382
Java implementation: https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/frequency_based/RISE.java
-
property
feature_importances_
[source]¶ The impurity-based feature importances.
The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.
- Returns
feature_importances_ – The values of this array sum to 1, unless all trees are single node trees consisting of only the root node, in which case it will be an array of zeros.
- Return type
ndarray of shape (n_features,)
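The "sums to 1, or all zeros" behaviour described above can be illustrated with a small sketch. It assumes that per-tree importance vectors are averaged over the trees that contain at least one split and then renormalised; the function name `average_importances` is hypothetical and this is not the library's implementation.

```python
def average_importances(per_tree):
    """Average importance vectors over trees that have at least one split."""
    n_features = len(per_tree[0])
    # A root-only tree has no splits, so its importance vector is all zeros.
    useful = [imp for imp in per_tree if sum(imp) > 0]
    if not useful:
        return [0.0] * n_features
    mean = [sum(column) / len(useful) for column in zip(*useful)]
    total = sum(mean)
    return [value / total for value in mean]


mixed = average_importances([[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]])
stumps_only = average_importances([[0.0, 0.0], [0.0, 0.0]])
```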
-
fit
(X, y)[source]¶ Build a forest of trees from the training set (X, y) using random intervals and spectral features.
- Parameters
X (array-like or sparse matrix of shape = [n_instances, series_length] or shape = [n_instances, n_columns]) – The training input samples. If a Pandas data frame is passed it must have a single column (i.e., univariate classification). RISE has no bespoke method for multivariate classification as yet.
y (array-like, shape = [n_instances]) – The class labels.
- Returns
self
- Return type
-
predict
(X)[source]¶ Find predictions for all cases in X. Built on top of predict_proba.
- Parameters
X (array-like or sparse matrix of shape = [n_instances, n_columns]) – The input samples. If a Pandas data frame is passed it must have a single column (i.e., univariate classification). RISE has no bespoke method for multivariate classification as yet.
- Returns
y – The predicted classes.
- Return type
array of shape = [n_instances]
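Since predict is built on top of predict_proba, the relationship can be sketched as picking the most probable class for each case. The helper name `predict_from_proba` and the tie-breaking rule (first maximum wins) are assumptions made for illustration.

```python
def predict_from_proba(proba, classes):
    """Return, for each case, the class label with the highest probability."""
    labels = []
    for row in proba:
        best = max(range(len(row)), key=row.__getitem__)  # first max on ties
        labels.append(classes[best])
    return labels


classes = ["a", "b", "c"]
proba = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
labels = predict_from_proba(proba, classes)
```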
-
predict_proba
(X)[source]¶ Find probability estimates for each class for all cases in X.
- Parameters
X (array-like or sparse matrix of shape = [n_instances, n_columns]) – The input samples. If a Pandas data frame is passed it must have a single column (i.e., univariate classification). RISE has no bespoke method for multivariate classification as yet.
Local variables:
n_instances (int) – Number of cases to classify.
n_columns (int) – Number of attributes in X, must match series_length determined in fit.
- Returns
output – The class probabilities of all cases.
- Return type
array of shape = [n_instances, n_classes]
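The ensemble step (averaging the trees' probability estimates) can be sketched as below. The nested-list layout and the name `average_probabilities` are assumptions for illustration; the library works on arrays.

```python
def average_probabilities(per_tree_proba):
    """Average class probabilities over trees.

    per_tree_proba[t][i][c] is tree t's probability of class c for case i.
    """
    n_trees = len(per_tree_proba)
    n_cases = len(per_tree_proba[0])
    n_classes = len(per_tree_proba[0][0])
    return [
        [
            sum(tree[i][c] for tree in per_tree_proba) / n_trees
            for c in range(n_classes)
        ]
        for i in range(n_cases)
    ]


tree_1 = [[1.0, 0.0], [0.5, 0.5]]
tree_2 = [[0.0, 1.0], [0.5, 0.5]]
output = average_probabilities([tree_1, tree_2])
```

Each row of `output` still sums to 1 because every tree's row does.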
-
class
sktime.classification.interval_based.
SupervisedTimeSeriesForest
(n_estimators=500, n_jobs=1, random_state=None)[source]¶ Bases:
sklearn.ensemble._forest.ForestClassifier
,sktime.classification.base.BaseClassifier
Supervised Time Series Forest (STSF) classifier as described in [1].
A time series forest is an ensemble of decision trees built on intervals selected through a supervised process.
Overview: Input n series of length m. For each tree:
- sample X using class-balanced bagging,
- sample intervals for all 3 representations and 7 features using the supervised method,
- find mean, median, std, slope, iqr, min and max using their corresponding interval for each representation, and concatenate to form a new data set,
- build a decision tree on the new data set.
Ensemble the trees with averaged probability estimates.
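The seven summary features named above can be sketched for a single interval as follows. The helper names are hypothetical, and the choice of population standard deviation, least-squares slope against the time index and "inclusive" quartiles are assumptions; the library's definitions may differ.

```python
import statistics


def slope(values):
    """Least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def summary_features(series, start, end):
    """Seven summary statistics of series[start:end] (needs >= 2 points)."""
    segment = series[start:end]
    q1, _, q3 = statistics.quantiles(segment, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(segment),
        "median": statistics.median(segment),
        "std": statistics.pstdev(segment),  # population std (assumption)
        "slope": slope(segment),
        "iqr": q3 - q1,
        "min": min(segment),
        "max": max(segment),
    }


features = summary_features([1.0, 2.0, 3.0, 4.0, 5.0], 0, 5)
```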
n_estimators : int, ensemble size, optional (default = 500)
n_jobs : int, optional (default=1). The number of jobs to run in parallel for both fit and predict. -1 means using all processors.
random_state : int, seed for random, optional (default = None)
n_classes : int, extracted from the data
classifiers : array of shape = [n_estimators] of DecisionTree classifiers
intervals : array of shape = [n_estimators][3][7][n_intervals][2], stores indexes of all start and end points for all classifiers for each representation and feature
[1] Cabello, Nestor, et al. “Fast and Accurate Time Series Classification Through Supervised Interval Search.” IEEE ICDM 2020.
Java implementation: https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/interval_based/STSF.java
-
fit
(X, y)[source]¶ Build a forest of trees from the training set (X, y) using supervised intervals and summary features.
- Parameters
X (array-like or sparse matrix of shape = [n_instances, series_length] or shape = [n_instances, n_columns]) – The training input samples. If a Pandas data frame is passed it must have a single column (i.e., univariate classification). STSF has no bespoke method for multivariate classification as yet.
y (array-like, shape = [n_instances]) – The class labels.
- Returns
self
- Return type
-
predict
(X)[source]¶ Find predictions for all cases in X. Built on top of predict_proba.
- Parameters
X (array-like or pandas data frame) – The input samples. If a Pandas data frame is passed, a check is performed that it only has one column. If not, an exception is thrown, since this classifier does not yet have multivariate capability.
- Returns
output
- Return type
array of shape = [n_test_instances]
-
predict_proba
(X)[source]¶ Find probability estimates for each class for all cases in X.
- Parameters
X (array-like or sparse matrix of shape = [n_test_instances, series_length]) – The input samples. If a Pandas data frame is passed (sktime format) a check is performed that it only has one column. If not, an exception is thrown, since this classifier does not yet have multivariate capability.
- Returns
output – Predicted probabilities
- Return type
ndarray of shape = (n_instances, n_classes)
-
-
class
sktime.classification.interval_based.
TimeSeriesForestClassifier
(min_interval=3, n_estimators=200, n_jobs=1, random_state=None)[source]¶ Bases:
sktime.series_as_features.base.estimators.interval_based._tsf.BaseTimeSeriesForest
,sklearn.ensemble._forest.ForestClassifier
,sktime.classification.base.BaseClassifier
Time series forest classifier.
A time series forest is an ensemble of decision trees built on random intervals.
Overview: Input n series of length m. For each tree:
- sample sqrt(m) intervals,
- find mean, std and slope for each interval, and concatenate to form a new data set,
- build a decision tree on the new data set.
Ensemble the trees with averaged probability estimates.
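The interval sampling and feature extraction steps can be sketched for one tree as follows. This is an illustration under assumptions, not the library's code: the helper names are hypothetical, intervals are half-open [start, end), and std is the population standard deviation.

```python
import math
import random
import statistics


def slope(values):
    """Least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def sample_intervals(m, rng, min_interval=3):
    """Sample int(sqrt(m)) random [start, end) intervals, with replacement."""
    intervals = []
    for _ in range(int(math.sqrt(m))):
        start = rng.randint(0, m - min_interval)
        length = rng.randint(min_interval, m - start)
        intervals.append((start, start + length))
    return intervals


def interval_features(series, intervals):
    """Concatenate mean, std and slope of every interval into one vector."""
    features = []
    for start, end in intervals:
        segment = series[start:end]
        features.extend(
            [statistics.fmean(segment), statistics.pstdev(segment), slope(segment)]
        )
    return features


rng = random.Random(42)
series = [float(t) for t in range(100)]  # a straight line with slope 1
intervals = sample_intervals(len(series), rng)
features = interval_features(series, intervals)
```

A decision tree would then be trained on one such feature vector per training series, using the same intervals for every series.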
This implementation deviates from the original in minor ways. It samples intervals with replacement and does not use the tiny refinement of the splitting criterion described in [1]. This is an intentionally stripped-down, non-configurable version for use as a HIVE-COTE component. For a configurable tree-based ensemble, see sktime.classifiers.ensemble.TimeSeriesForestClassifier.
n_estimators : int, ensemble size, optional (default = 200)
min_interval : int, minimum width of an interval, optional (default = 3)
n_jobs : int, optional (default=1). The number of jobs to run in parallel for both fit and predict. -1 means using all processors.
random_state : int, seed for random, optional (default = None)
n_classes : int
n_intervals : int
classes_ : list of classes for a given problem
[1] H. Deng, G. Runger, E. Tuv and M. Vladimir, “A time series forest for classification and feature extraction”, Information Sciences, 239, 2013.
Java implementation: https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/interval_based/TSF.java
arXiv version of the paper: https://arxiv.org/abs/1302.2277