sktime.classification.distance_based
class sktime.classification.distance_based.ElasticEnsemble(distance_measures='all', proportion_of_param_options=1.0, proportion_train_in_param_finding=1.0, proportion_train_for_test=1.0, n_jobs=None, random_state=0, verbose=0)
Bases: sktime.classification.base.BaseClassifier
The Elastic Ensemble (EE) as described in Jason Lines and Anthony Bagnall, “Time Series Classification with Ensembles of Elastic Distance Measures”, Data Mining and Knowledge Discovery, 29(3), 2015.
https://link.springer.com/article/10.1007/s10618-014-0361-2
Overview:
Input: n series, each of length m.
EE is an ensemble of 1-nearest-neighbor classifiers, each built on a different elastic distance measure.
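A minimal sketch of the ensemble idea, assuming only two constituent measures (Euclidean distance and full-window DTW) and assuming each constituent's vote is weighted by its leave-one-out 1-NN accuracy on the training set. The helper names (`nn_predict`, `fit_weights`, `ensemble_predict`) are hypothetical; this illustrates the technique and is not the sktime implementation.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    # Lock-step distance: compares points with the same index only.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    # Full-window dynamic time warping over squared point costs, O(len(a) * len(b)).
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def nn_predict(train_X, train_y, query, dist, skip=None):
    # 1-NN prediction; `skip` excludes one training index (used for leave-one-out).
    best_label, best_d = None, float("inf")
    for idx, (x, label) in enumerate(zip(train_X, train_y)):
        if idx == skip:
            continue
        d = dist(x, query)
        if d < best_d:
            best_d, best_label = d, label
    return best_label

def fit_weights(train_X, train_y, measures):
    # Weight each measure by its leave-one-out 1-NN accuracy on the training set.
    weights = {}
    for name, dist in measures.items():
        correct = sum(
            nn_predict(train_X, train_y, x, dist, skip=i) == y
            for i, (x, y) in enumerate(zip(train_X, train_y))
        )
        weights[name] = correct / len(train_X)
    return weights

def ensemble_predict(train_X, train_y, measures, weights, query):
    # Each constituent 1-NN casts one vote, weighted by its training accuracy.
    votes = defaultdict(float)
    for name, dist in measures.items():
        votes[nn_predict(train_X, train_y, query, dist)] += weights[name]
    return max(votes, key=votes.get)

measures = {"euclidean": euclidean, "dtw": dtw}
train_X = [[0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0], [3, 3, 3, 3, 3, 3], [3, 3, 2, 3, 3, 3]]
train_y = ["bump", "bump", "flat", "flat"]
weights = fit_weights(train_X, train_y, measures)
print(ensemble_predict(train_X, train_y, measures, weights, [0, 0, 0, 1, 2, 1]))  # bump
```

Weighting by training accuracy lets a measure that suits the data dominate the vote without discarding the others.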
Note
For the original Java version, see ElasticEnsemble.
Parameters
distance_measures (list of str, optional (default="all")) – A list of strings identifying which distance measures to include.
proportion_of_param_options (float, optional (default=1.0)) – The proportion of the parameter grid space to search.
proportion_train_in_param_finding (float, optional (default=1.0)) – The proportion of the train set to use in the parameter search.
proportion_train_for_test (float, optional (default=1.0)) – The proportion of the train set to use when classifying new cases.
n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.
random_state (int, default=0) – The random seed.
verbose (int, default=0) – If >0, prints out debug information.
fit(X, y)
Build an ensemble of 1-NN classifiers from the training set (X, y).
Parameters
X – The training input samples. If a pandas DataFrame is passed, it must have a single column; this classifier is not configured to handle multivariate data.
y (array-like, shape = [n_instances]) – The class labels.
Returns
self
class sktime.classification.distance_based.KNeighborsTimeSeriesClassifier(n_neighbors=1, weights='uniform', distance='dtw', distance_params=None, **kwargs)
Bases: sklearn.neighbors._classification.KNeighborsClassifier, sktime.classification.base.BaseClassifier
An adapted version of the scikit-learn KNeighborsClassifier to work with time series data.
Necessary changes required for time series data:
- Calls to X.shape in kneighbors, predict and predict_proba. In the base class, these methods contain:
  n_samples, _ = X.shape
  This, however, assumes that the data must be 2d (a set of multivariate time series is 3d). Therefore, these methods were overridden to change this call to the following, which supports 3d data:
  n_samples = X.shape[0]
- check_array has been disabled. This method allows nd data via an argument in the method header. However, there seems to be no way to set this in the classifier and have it propagate down to the method. Therefore, this method has been temporarily disabled (and then re-enabled). It is unclear how to fix this issue without either writing a new classifier from scratch or changing the scikit-learn implementation. TODO: find a permanent resolution to this issue (raise it as an issue on the sklearn GitHub?).
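The X.shape unpacking problem described above can be reproduced without scikit-learn. In this sketch a plain tuple stands in for X.shape, and the 3d shape (10, 1, 50) is a made-up example describing 10 cases; the two helper functions are hypothetical.

```python
def n_samples_2d_style(shape):
    # Mirrors the base-class pattern `n_samples, _ = X.shape`,
    # which only works when X has exactly two dimensions.
    n_samples, _ = shape
    return n_samples

def n_samples_nd_style(shape):
    # Mirrors the override `n_samples = X.shape[0]`, valid for any ndim >= 1.
    return shape[0]

shape_3d = (10, 1, 50)  # hypothetical 3d shape: 10 cases

try:
    n_samples_2d_style(shape_3d)
    unpack_failed = False
except ValueError:
    # Raised as "too many values to unpack" for a 3-element shape.
    unpack_failed = True

print(unpack_failed, n_samples_nd_style(shape_3d))  # True 10
```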
Parameters
n_neighbors (int, default=1) – Sets k for k-NN.
weights (str or callable, default='uniform') – Mechanism for weighting a vote: 'uniform', 'distance', or a callable function.
algorithm (str, default='brute') – Search method for neighbors: {'auto', 'ball_tree', 'kd_tree', 'brute'}.
distance (str, default='dtw') – Distance measure for time series: {'dtw', 'ddtw', 'wdtw', 'lcss', 'erp', 'msm', 'twe'}.
distance_params (dict, default=None) – Dictionary of metric parameters.
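For reference, a minimal pure-Python sketch of two of the listed measures, 'dtw' and 'ddtw', assuming squared point-wise costs and an optional Sakoe-Chiba band. The `window` argument is a hypothetical stand-in for whatever distance_params key the library actually uses, and the derivative estimate is the one commonly associated with DDTW; none of this is sktime's code.

```python
import math

def dtw_distance(a, b, window=None):
    """DTW between two univariate series with an optional Sakoe-Chiba band.

    `window` (hypothetical) is the maximum allowed |i - j|; None means
    unconstrained warping. The band is widened to |n - m| if needed.
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band around the diagonal are filled.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def derivative(x):
    # Average of the left difference and half the centred difference.
    # Endpoints copy their neighbours so the length is preserved (assumes len(x) >= 3).
    d = [((x[i] - x[i - 1]) + (x[i + 1] - x[i - 1]) / 2) / 2 for i in range(1, len(x) - 1)]
    return [d[0]] + d + [d[-1]]

def ddtw_distance(a, b, window=None):
    # DDTW: DTW applied to estimated derivatives instead of raw values.
    return dtw_distance(derivative(a), derivative(b), window)

a = [0, 1, 2, 3, 2, 1, 0]
b = [0, 0, 1, 2, 3, 2, 1]  # a shifted right by one step
print(dtw_distance(a, b))            # 1.0: warping absorbs the shift
print(dtw_distance(a, b, window=0))  # ~2.449 (= sqrt(6)): lock-step, no warping
```

With window=0 the band collapses to the diagonal, so DTW reduces to Euclidean distance; this is why a narrow window trades flexibility for speed.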
fit(X, y)
Fit the model using X as training data and y as target values.
Parameters
X (sktime-format pandas DataFrame with shape [n_cases, n_dimensions], or numpy ndarray with shape [n_cases, n_readings, n_dimensions]) – Training data.
y ({array-like, sparse matrix}) – Target values of shape = [n_samples].
kneighbors(X, n_neighbors=None, return_distance=True)
Find the k nearest neighbors of each point. Returns indices of and distances to the neighbors of each point.
Parameters
X (sktime-format pandas DataFrame with shape [n_cases, n_dimensions], or numpy ndarray with shape [n_cases, n_readings, n_dimensions]) – The query points.
n_neighbors (int) – Number of neighbors to get (default is the value passed to the constructor).
return_distance (bool, optional (default=True)) – If False, distances will not be returned.
Returns
dist (array) – Array representing the lengths to points; only present if return_distance=True.
ind (array) – Indices of the nearest points in the population matrix.
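A brute-force sketch of what kneighbors computes, assuming series are plain Python lists and using a simple stand-in distance (sum of absolute differences) where an elastic measure would normally go. The `kneighbors` function here is a hypothetical free-standing helper, not the method itself.

```python
import heapq

def kneighbors(train_X, X, dist, n_neighbors=1, return_distance=True):
    """Brute-force k-NN search over a list of series (hypothetical helper).

    Row q of the result holds the n_neighbors closest training cases to
    query q, nearest first.
    """
    all_dist, all_ind = [], []
    for query in X:
        # Distance from this query to every training case.
        scored = [(dist(x, query), i) for i, x in enumerate(train_X)]
        nearest = heapq.nsmallest(n_neighbors, scored)  # ties broken by index
        all_dist.append([d for d, _ in nearest])
        all_ind.append([i for _, i in nearest])
    return (all_dist, all_ind) if return_distance else all_ind

def abs_diff(a, b):
    # Stand-in distance; any elastic measure could be passed instead.
    return sum(abs(x - y) for x, y in zip(a, b))

train_X = [[0, 0, 0], [1, 1, 1], [5, 5, 5]]
distances, indices = kneighbors(train_X, [[1, 1, 2]], abs_diff, n_neighbors=2)
print(distances, indices)  # [[1, 4]] [[1, 0]]
```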
predict(X)
Predict the class labels for the provided data.
Parameters
X (sktime-format pandas DataFrame or array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed') – Test samples.
Returns
y – Class labels for each data sample.
Return type
array of shape [n_samples] or [n_samples, n_outputs]
predict_proba(X)
Return probability estimates for the test data X.
Parameters
X (sktime-format pandas DataFrame or array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed') – Test samples.
Returns
p – The class probabilities of the input samples, or a list of n_outputs such arrays if n_outputs > 1. Classes are ordered by lexicographic order.
Return type
array of shape = [n_samples, n_classes], or a list of n_outputs such arrays
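The 'uniform' and 'distance' options listed under weights can be sketched for a single query as follows. The helper is hypothetical, and the zero-distance rule (only exact matches count when one exists) is an assumption modelled on common k-NN practice rather than a statement about this class.

```python
from collections import defaultdict

def predict_proba_from_neighbors(neighbor_labels, neighbor_dists, classes, weights="uniform"):
    """Turn one query's k neighbors into class probabilities (sketch).

    weights="uniform": every neighbor counts 1.
    weights="distance": each neighbor counts 1 / distance; if some neighbors
    are at distance 0, only those count (avoids division by zero).
    """
    if weights == "uniform":
        w = [1.0] * len(neighbor_labels)
    elif any(d == 0 for d in neighbor_dists):
        w = [1.0 if d == 0 else 0.0 for d in neighbor_dists]
    else:
        w = [1.0 / d for d in neighbor_dists]
    totals = defaultdict(float)
    for label, weight in zip(neighbor_labels, w):
        totals[label] += weight
    norm = sum(totals.values())
    # Probabilities follow the sorted (lexicographic) class order.
    return [totals[c] / norm for c in sorted(classes)]

labels, dists = ["b", "a", "a"], [1.0, 2.0, 4.0]
print([round(p, 3) for p in predict_proba_from_neighbors(labels, dists, {"a", "b"})])              # [0.667, 0.333]
print([round(p, 3) for p in predict_proba_from_neighbors(labels, dists, {"a", "b"}, "distance")])  # [0.429, 0.571]
```

Distance weighting flips the decision here: two farther 'a' neighbors lose to one much closer 'b' neighbor.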
class sktime.classification.distance_based.ProximityForest(random_state=None, n_estimators=100, distance_measure=None, get_distance_measure=None, get_exemplars=<function get_one_exemplar_per_class_proximity>, get_gain=<function gini_gain>, verbosity=0, max_depth=inf, is_leaf=<function pure>, n_jobs=1, n_stump_evaluations=5, find_stump=None, setup_distance_measure_getter=<function setup_all_distance_measure_getter>)
Bases: sktime.classification.base.BaseClassifier
Proximity Forest class to model a decision tree forest which uses distance measures to partition data, see [1].
Parameters
random_state (default=None) – Seed for reproducibility.
n_estimators (int, default=100) – The number of trees in the forest.
distance_measure (default=None)
get_distance_measure (default=None) – Distance measure getters.
get_exemplars (default=get_one_exemplar_per_class_proximity)
get_gain (default=gini_gain) – Function to score the quality of a split.
verbosity (default=0) – Logging verbosity.
max_depth (default=np.math.inf)
is_leaf (default=pure)
n_jobs (int, default=1) – Number of jobs to run in parallel across threads.
n_stump_evaluations (int, default=5)
find_stump (default=None) – Function to find the best split of data.
setup_distance_measure_getter (function, default=setup_all_distance_measure_getter) – Function to set up the distance measure getter.
Notes
[1] Ben Lucas et al., “Proximity Forest: an effective and scalable distance-based classifier for time series”, Data Mining and Knowledge Discovery, 33(3): 607-635, 2019. https://arxiv.org/abs/1808.10594
Java wrapper of the authors' original: https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/distance_based/ProximityForestWrapper.java
Java version: https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/distance_based/proximity/ProximityForest.java
fit(X, y)
Parameters
X (array-like or sparse matrix of shape = [n_instances, n_columns]) – The training input samples. If a pandas DataFrame is passed, column 0 is extracted.
y (array-like, shape = [n_instances]) – The class labels.
Returns
self
predict_proba(X)
Find probability estimates for each class for all cases in X.
Parameters
X – The input samples. If a pandas DataFrame (sktime format) is passed, a check is performed that it has only one column. If not, an exception is thrown, since this classifier does not yet have multivariate capability.
Returns
output
Return type
array of shape = [n_instances, n_classes] of probabilities
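Forest-level probabilities can be pictured as a combination of the individual trees' outputs. The sketch below assumes simple averaging of per-tree probability arrays; that is an illustrative choice (majority voting over trees is another), not a description of this class's internals, and the tree outputs are made up.

```python
def forest_predict_proba(tree_probas):
    """Average per-tree probability rows into forest probabilities (sketch).

    tree_probas[t][i][c] is tree t's probability that case i has class c.
    """
    n_trees = len(tree_probas)
    n_cases = len(tree_probas[0])
    n_classes = len(tree_probas[0][0])
    return [
        [sum(tree_probas[t][i][c] for t in range(n_trees)) / n_trees for c in range(n_classes)]
        for i in range(n_cases)
    ]

# Three hypothetical trees, two cases, two classes.
trees = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
proba = forest_predict_proba(trees)
# The predicted class index is the column with the highest averaged probability.
predictions = [max(range(len(row)), key=row.__getitem__) for row in proba]
print(predictions)  # [0, 1]
```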
class sktime.classification.distance_based.ProximityStump(random_state=None, get_exemplars=<function get_one_exemplar_per_class_proximity>, setup_distance_measure=<function setup_all_distance_measure_getter>, get_distance_measure=None, distance_measure=None, get_gain=<function gini_gain>, verbosity=0, n_jobs=1)
Bases: sktime.classification.base.BaseClassifier
Proximity Stump class to model a decision stump which uses a distance measure to partition data.
distance_to_exemplars(X)
Find the distance from each instance to each exemplar.
Parameters
X – The dataset containing a list of instances.
Returns
2d numpy array of distances from each instance to each exemplar (instance by exemplar).
find_closest_exemplar_indices(X)
Find the index of the closest exemplar for each instance in a dataframe.
Parameters
X – The dataframe containing instances.
Returns
1d numpy array of indices, one for each instance, reflecting the index of the closest exemplar.
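The stump's split can be sketched as: choose one exemplar per class, send each instance down the branch of its closest exemplar, and score the partition with Gini gain (the default get_gain is gini_gain). The free functions below only mirror the method names; their signatures, the random exemplar choice and the stand-in distance are assumptions for illustration.

```python
import random
from collections import Counter

def one_exemplar_per_class(X, y, rng):
    # Pick one random training case per class to act as that branch's exemplar.
    exemplars, labels = [], []
    for label in sorted(set(y)):
        idx = rng.choice([i for i, c in enumerate(y) if c == label])
        exemplars.append(X[idx])
        labels.append(label)
    return exemplars, labels

def distance_to_exemplars(X, exemplars, dist):
    # Instance-by-exemplar distance matrix.
    return [[dist(x, e) for e in exemplars] for x in X]

def find_closest_exemplar_indices(X, exemplars, dist):
    # Index of the nearest exemplar for each instance (ties go to the lower index).
    return [row.index(min(row)) for row in distance_to_exemplars(X, exemplars, dist)]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values()) if n else 0.0

def gini_gain(y, branch_of):
    # Parent impurity minus the size-weighted impurity of each branch.
    branches = {}
    for label, b in zip(y, branch_of):
        branches.setdefault(b, []).append(label)
    weighted = sum(len(ys) / len(y) * gini(ys) for ys in branches.values())
    return gini(y) - weighted

def abs_diff(a, b):
    # Stand-in distance; an elastic measure would normally be used.
    return sum(abs(p - q) for p, q in zip(a, b))

X = [[0, 0, 1], [0, 1, 1], [4, 5, 4], [5, 5, 5]]
y = ["low", "low", "high", "high"]
exemplars, labels = one_exemplar_per_class(X, y, random.Random(0))
branch_of = find_closest_exemplar_indices(X, exemplars, abs_diff)
print(gini_gain(y, branch_of))  # 0.5: each branch is pure
```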
fit(X, y)
Parameters
X (array-like or sparse matrix of shape = [n_instances, n_columns]) – The training input samples. If a pandas DataFrame is passed, column 0 is extracted.
y (array-like, shape = [n_instances]) – The class labels.
Returns
self
predict_proba(X)
Find probability estimates for each class for all cases in X.
Parameters
X – The input samples. If a pandas DataFrame (sktime format) is passed, a check is performed that it has only one column. If not, an exception is thrown, since this classifier does not yet have multivariate capability.
Returns
output
Return type
array of shape = [n_instances, n_classes] of probabilities
class sktime.classification.distance_based.ProximityTree(random_state=None, get_exemplars=<function get_one_exemplar_per_class_proximity>, distance_measure=None, get_distance_measure=None, setup_distance_measure=<function setup_all_distance_measure_getter>, get_gain=<function gini_gain>, max_depth=inf, is_leaf=<function pure>, verbosity=0, n_jobs=1, n_stump_evaluations=5, find_stump=None)
Bases: sktime.classification.base.BaseClassifier
Proximity Tree class to model a decision tree which uses distance measures to partition data.
@article{lucas19proximity,
  title={Proximity Forest: an effective and scalable distance-based classifier for time series},
  author={B. Lucas and A. Shifaz and C. Pelletier and L. O’Neill and N. Zaidi and B. Goethals and F. Petitjean and G. Webb},
  journal={Data Mining and Knowledge Discovery},
  volume={33},
  number={3},
  pages={607–635},
  year={2019}
}
https://arxiv.org/abs/1808.10594
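A compact sketch of how such a tree could be grown, assuming: each candidate stump picks one random exemplar per class, the best of n_stump_evaluations candidates is kept by weighted Gini impurity, and growth stops when is_leaf (here a pure() test) holds or max_depth is reached. All helper names are hypothetical and the distance is a stand-in; the sketch mirrors the constructor parameters, not the library's code.

```python
import random
from collections import Counter

def abs_diff(a, b):
    # Stand-in distance; an elastic measure such as DTW could be used instead.
    return sum(abs(p - q) for p, q in zip(a, b))

def gini(labels):
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def pure(y):
    # Leaf test: every remaining case has the same class.
    return len(set(y)) == 1

def nearest(x, exemplars):
    # Index of the closest exemplar (ties go to the lower index).
    return min(range(len(exemplars)), key=lambda e: abs_diff(x, exemplars[e]))

def random_stump(X, y, rng):
    # One randomly chosen exemplar per class; each case follows its nearest exemplar.
    exemplars = [X[rng.choice([i for i, c in enumerate(y) if c == k])] for k in sorted(set(y))]
    return exemplars, [nearest(x, exemplars) for x in X]

def weighted_impurity(y, branch_of):
    # Size-weighted Gini impurity of the branches (lower means higher Gini gain).
    return sum(
        branch_of.count(b) / len(y) * gini([c for c, bb in zip(y, branch_of) if bb == b])
        for b in set(branch_of)
    )

def build_tree(X, y, rng, depth=0, max_depth=float("inf"), n_stump_evaluations=5, is_leaf=pure):
    if is_leaf(y) or depth >= max_depth:
        return {"leaf": Counter(y)}
    # Keep the best of several random candidate stumps.
    candidates = [random_stump(X, y, rng) for _ in range(n_stump_evaluations)]
    exemplars, branch_of = min(candidates, key=lambda s: weighted_impurity(y, s[1]))
    if len(set(branch_of)) == 1:  # the split separated nothing: stop here
        return {"leaf": Counter(y)}
    children = []
    for b in range(len(exemplars)):
        idx = [i for i, bb in enumerate(branch_of) if bb == b]
        if not idx:  # empty branch (possible with duplicate exemplars)
            children.append({"leaf": Counter(y)})
            continue
        children.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng,
                                   depth + 1, max_depth, n_stump_evaluations, is_leaf))
    return {"exemplars": exemplars, "children": children}

def tree_predict_proba(node, x, classes):
    # Descend to a leaf, then report class frequencies in the order of `classes`.
    while "leaf" not in node:
        node = node["children"][nearest(x, node["exemplars"])]
    counts = node["leaf"]
    total = sum(counts.values())
    return [counts[c] / total for c in classes]

X = [[0, 0, 1], [0, 1, 1], [4, 5, 4], [5, 5, 5], [9, 9, 9]]
y = ["a", "a", "b", "b", "c"]
tree = build_tree(X, y, random.Random(1))
print(tree_predict_proba(tree, [5, 4, 5], ["a", "b", "c"]))  # [0.0, 1.0, 0.0]
```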
fit(X, y)
Parameters
X (array-like or sparse matrix of shape = [n_instances, n_columns]) – The training input samples. If a pandas DataFrame is passed, column 0 is extracted.
y (array-like, shape = [n_instances]) – The class labels.
Returns
self
predict_proba(X)
Find probability estimates for each class for all cases in X.
Parameters
X – The input samples. If a pandas DataFrame (sktime format) is passed, a check is performed that it has only one column. If not, an exception is thrown, since this classifier does not yet have multivariate capability.
Returns
output
Return type
array of shape = [n_instances, n_classes] of probabilities
class sktime.classification.distance_based.ShapeDTW(n_neighbors=1, subsequence_length=30, shape_descriptor_function='raw', shape_descriptor_functions=['raw', 'derivative'], metric_params=None)
Bases: sktime.classification.base.BaseClassifier
The ShapeDTW classifier works by initially extracting a set of subsequences describing local neighbourhoods around each data point in a time series. These subsequences are then passed into a shape descriptor function that transforms these local neighbourhoods into a new representation. This new representation is then sent into DTW with 1-NN.
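The pipeline described above (local subsequences, a shape descriptor, then DTW inside 1-NN) can be sketched in plain Python. Edge padding by repeating the end values and an odd subsequence length are assumptions for this illustration, the function names are hypothetical, and only the identity ('raw') descriptor is used.

```python
import math

def subsequences(x, length):
    """One subsequence per time point, centred on it (sketch).

    Ends are padded by repeating the first/last value (an assumption);
    `length` is assumed odd so each window has a centre.
    """
    half = length // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [padded[i:i + length] for i in range(len(x))]

def vector_dtw(A, B):
    # DTW over sequences of descriptor vectors; local cost is Euclidean distance.
    n, m = len(A), len(B)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(A[i - 1], B[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def shape_dtw(a, b, length=3, descriptor=lambda s: s):
    # Describe every local neighbourhood, then align the descriptor sequences.
    da = [descriptor(s) for s in subsequences(a, length)]
    db = [descriptor(s) for s in subsequences(b, length)]
    return vector_dtw(da, db)

def one_nn(train_X, train_y, query, **kwargs):
    # ShapeDTW used as the distance of a 1-NN classifier.
    return min(zip(train_X, train_y), key=lambda xy: shape_dtw(xy[0], query, **kwargs))[1]

peak = [0, 0, 1, 2, 1, 0, 0]
flat = [3, 3, 3, 3, 3, 3, 3]
query = [0, 1, 2, 1, 0, 0, 0]  # the peak, shifted one step earlier
print(one_nn([peak, flat], ["peak", "flat"], query))  # peak
```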
Parameters
The possible shape descriptor functions are as follows:
- 'raw': use the raw subsequence as the shape descriptor function.
  params = None
- 'paa': use PAA (Piecewise Aggregate Approximation) as the shape descriptor function.
  params = num_intervals_paa (default=8)
- 'dwt': use DWT (Discrete Wavelet Transform) as the shape descriptor function.
  params = num_levels_dwt (default=3)
- 'slope': use the gradient of each subsequence fitted by a total least squares regression as the shape descriptor function.
  params = num_intervals_slope (default=8)
- 'derivative': use the derivative of each subsequence as the shape descriptor function.
  params = None
- 'hog1d': use a histogram of gradients in one dimension as the shape descriptor function.
  params = num_intervals_hog1d (default=2), num_bins_hog1d (default=8), scaling_factor_hog1d (default=0.1)
- 'compound': use a combination of two shape descriptors simultaneously.
  params = weighting_factor (default=None). Defines how to scale values of a shape descriptor. If a value is not given, this value is tuned by 10-fold cross-validation on the training data.
shape_descriptor_functions (list of str, default=['raw', 'derivative']) – Only applicable when shape_descriptor_function is set to 'compound'. Uses a list of shape descriptor functions at the same time.
metric_params (dict, default=None) – Dictionary for metric parameters.
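Three of the descriptors above can be sketched on a single subsequence. The segment boundaries for 'paa', the first-difference form of 'derivative', and concatenation-with-scaling for 'compound' are assumptions chosen for illustration; 'dwt', 'slope' and 'hog1d' are omitted, and the function names are hypothetical.

```python
def paa(sub, num_intervals=8):
    """Piecewise Aggregate Approximation: mean of each of num_intervals segments.

    Boundaries use integer division, so segments may differ in length by one
    when len(sub) is not a multiple of num_intervals (assumes num_intervals <= len(sub)).
    """
    n = len(sub)
    bounds = [k * n // num_intervals for k in range(num_intervals + 1)]
    return [sum(sub[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]

def derivative(sub):
    # First differences: a simple stand-in for the 'derivative' descriptor.
    return [b - a for a, b in zip(sub, sub[1:])]

def compound(sub, weighting_factor=1.0, parts=(list, derivative)):
    # Concatenate two descriptors, scaling the second by weighting_factor.
    # (In the class, weighting_factor=None means it is tuned by cross-validation;
    # here it is simply a fixed number.)
    first, second = parts
    return first(sub) + [weighting_factor * v for v in second(sub)]

sub = [1, 3, 5, 7, 9, 11]
print(paa(sub, num_intervals=3))            # [2.0, 6.0, 10.0]
print(derivative(sub))                      # [2, 2, 2, 2, 2]
print(compound(sub, weighting_factor=0.5))  # [1, 3, 5, 7, 9, 11, 1.0, 1.0, 1.0, 1.0, 1.0]
```

The weighting factor matters because raw values and derivatives live on different scales; without it, one descriptor can dominate the Euclidean cost inside DTW.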
Notes
[1] Jiaping Zhao and Laurent Itti, “shapeDTW: Shape Dynamic Time Warping”, Pattern Recognition, 74, pp. 171-184, 2018. http://www.sciencedirect.com/science/article/pii/S0031320317303710
fit(X, y)
Method to perform training on the classifier.
Parameters
X (pandas DataFrame) – Training data of shape [n_instances, 1].
y (list) – Class labels of shape [n_instances].
Returns
self
Return type
the ShapeDTW object
predict(X)
Find predictions for all cases in X. (This could be written as a wrapper around predict_proba, but this will do for now.)
Parameters
X – The testing input samples of shape [n_instances, 1].
Returns
output
Return type
numpy array of shape = [n_instances]
predict_proba(X)
Perform predictions on the testing data X, returning the probabilities for each class.
Parameters
X (pandas DataFrame) – Testing data of shape [n_instances, 1].
Returns
output – Probabilities for each class.
Return type
numpy array of shape = [n_instances, num_classes]