sktime.classification.distance_based

class sktime.classification.distance_based.ElasticEnsemble(distance_measures='all', proportion_of_param_options=1.0, proportion_train_in_param_finding=1.0, proportion_train_for_test=1.0, n_jobs=None, random_state=0, verbose=0)[source]

Bases: sktime.classification.base.BaseClassifier

The Elastic Ensemble (EE) as described in Jason Lines and Anthony Bagnall, “Time Series Classification with Ensembles of Elastic Distance Measures”, Data Mining and Knowledge Discovery, 29(3), 2015.

https://link.springer.com/article/10.1007/s10618-014-0361-2

Overview:

  • Input: n series of length m

  • EE is an ensemble of elastic nearest neighbor classifiers
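As a rough illustration of one elastic nearest neighbor classifier, the following pure-Python sketch pairs a dynamic time warping (DTW) distance with a 1-NN rule. The function names and the `window` parameter are illustrative assumptions and are not taken from the sktime implementation.

```python
import math


def dtw_distance(a, b, window=None):
    """Dynamic time warping cost between two univariate series (lists of numbers).

    `window` is an optional band half-width, in time steps, limiting the warping.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    window = max(window, abs(n - m))  # the band must be able to reach the final cell
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def predict_1nn(train_X, train_y, query, distance=dtw_distance):
    """Return the label of the training series closest to `query`."""
    best = min(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    return train_y[best]


train_X = [[0, 0, 1, 2, 1, 0], [2, 2, 1, 0, 1, 2]]
train_y = ["peak", "valley"]
# A time-shifted peak can still be aligned with the "peak" series under DTW.
print(predict_1nn(train_X, train_y, [0, 1, 2, 1, 0, 0]))
```

An ensemble in this style would hold several such classifiers, one per distance measure.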

Note

For the original Java version, see ElasticEnsemble.

Parameters
  • distance_measures (list of strings, optional (default="all")) – A list of strings identifying which distance measures to include.

  • proportion_of_param_options (float, optional (default=1)) – The proportion of the parameter grid space to search.

  • proportion_train_in_param_finding (float, optional (default=1)) – The proportion of the train set to use in the parameter search.

  • proportion_train_for_test (float, optional (default=1)) – The proportion of the train set to use in classifying new cases.

  • n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • random_state (int, default=0) – The random seed.

  • verbose (int, default=0) – If >0, then prints out debug information.

estimators_[source]

A list storing all classifiers

Type

list

train_accs_by_classifier[source]

Store the train accuracies of the classifiers

Type

ndarray

train_preds_by_classifier[source]

Store the train predictions of each classifier

Type

list
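This page does not say how the stored train accuracies are used. One plausible reading, assumed here and in line with the proportional voting scheme of the cited paper, is that each classifier's vote is weighted by its train accuracy. The sketch below uses hypothetical names and made-up accuracies.

```python
from collections import defaultdict


def weighted_vote(predictions, train_accuracies):
    """Combine one predicted label per classifier, weighting each vote by
    that classifier's train accuracy. Returns (label, normalised scores)."""
    scores = defaultdict(float)
    for label, acc in zip(predictions, train_accuracies):
        scores[label] += acc
    total = sum(scores.values())  # assumes at least one accuracy is positive
    probs = {label: s / total for label, s in scores.items()}
    winner = max(sorted(probs), key=probs.get)
    return winner, probs


# Three hypothetical classifiers: two weaker ones agree, a stronger one disagrees.
label, probs = weighted_vote(["a", "a", "b"], [0.55, 0.60, 0.95])
print(label, probs)
```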

capabilities = {'missing_values': False, 'multivariate': False, 'unequal_length': False}[source]
fit(X, y)[source]

Build an ensemble of 1-NN classifiers from the training set (X, y).

Parameters
  • X – The training input samples. If a Pandas data frame is passed, it must have a single column, since this classifier is not configured to handle multivariate data.

  • y (array-like, shape = [n_instances]) – The class labels.

Returns

self

Return type

object

get_metric_params()[source]
get_train_probs(X=None)[source]
predict(X, return_preds_and_probas=False)[source]
Parameters
  • X (pandas dataframe) – instances of the dataset

Returns

predictions – array of predictions of each instance (class value)

Return type

1d numpy array

predict_proba(X)[source]
write_constituent_train_files(output_file_path, dataset_name, actual_y)[source]
class sktime.classification.distance_based.KNeighborsTimeSeriesClassifier(n_neighbors=1, weights='uniform', distance='dtw', distance_params=None, **kwargs)[source]

Bases: sklearn.neighbors._classification.KNeighborsClassifier, sktime.classification.base.BaseClassifier

An adapted version of the scikit-learn KNeighborsClassifier to work with time series data.

Necessary changes required for time series data:
  • calls to X.shape in kneighbors, predict and predict_proba. In the base class, these methods contain:

    n_samples, _ = X.shape

    This however assumes that data must be 2d (a set of multivariate time series is 3d). Therefore these methods needed to be overridden to change this call to the following to support 3d data:

    n_samples = X.shape[0]

  • check array has been disabled. This method allows nd data via an argument in the method header. However, there seems to be no way to set this in the classifier and allow it to propagate down to the method. Therefore, this method has been temporarily disabled (and then re-enabled). It is unclear how to fix this issue without either writing a new classifier from scratch or changing the scikit-learn implementation. TO-DO: find permanent resolution to this issue (raise as an issue on sklearn GitHub?)
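The shape issue described in the first point can be reproduced without any library. In this small sketch, plain tuples stand in for the `.shape` attribute of an array, and the axis order of the 3d shape is only illustrative.

```python
# Plain tuples stand in for the .shape attribute of an array.
shape_2d = (100, 30)      # (n_samples, n_features): tabular data
shape_3d = (100, 3, 30)   # a set of multivariate time series (illustrative axis order)

n_samples, _ = shape_2d   # works: exactly two axes to unpack

try:
    n_samples, _ = shape_3d   # three values cannot be unpacked into two names
    unpack_failed = False
except ValueError:
    unpack_failed = True

n_samples_3d = shape_3d[0]    # indexing works for any number of axes
print(unpack_failed, n_samples_3d)
```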

Parameters
  • n_neighbors (int, default=1) – The value of k for knn.

  • weights ('uniform', 'distance', or a callable function) – Mechanism for weighting a vote.

  • algorithm ({‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default='brute') – Search method for neighbours.

  • distance ({'dtw', 'ddtw', 'wdtw', 'lcss', 'erp', 'msm', 'twe'}, default='dtw') – Distance measure for time series.

  • distance_params (dictionary, default=None) – Dictionary for metric parameters.

capabilities = {'missing_values': False, 'multivariate': True, 'unequal_length': False}[source]
fit(X, y)[source]

Fit the model using X as training data and y as target values

Parameters
  • X (sktime-format pandas dataframe with shape([n_cases,n_dimensions]), or numpy ndarray with shape([n_cases,n_readings,n_dimensions])) – Training data.

  • y ({array-like, sparse matrix}) – Target values of shape = [n_samples]

kneighbors(X, n_neighbors=None, return_distance=True)[source]

Finds the K-neighbors of a point. Returns indices of and distances to the neighbors of each point.

Parameters
  • X (sktime-format pandas dataframe with shape([n_cases,n_dimensions]), or numpy ndarray with shape([n_cases,n_readings,n_dimensions])) – The query points.

  • n_neighbors (int) – Number of neighbors to get (default is the value passed to the constructor).

  • return_distance (boolean, optional. Defaults to True.) – If False, distances will not be returned

Returns

  • dist (array) – Array representing the lengths to points, only present if return_distance=True

  • ind (array) – Indices of the nearest points in the population matrix.
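The return convention above (distances first, then indices, with distances optional) can be mimicked by a brute-force search. This is a minimal sketch with a hypothetical stand-in distance and nested lists instead of arrays. It is not the sktime or scikit-learn code.

```python
def kneighbors(train_X, queries, distance, n_neighbors=1, return_distance=True):
    """Brute-force neighbour search with a user-supplied distance callable."""
    all_dist, all_ind = [], []
    for q in queries:
        # Sort (distance, index) pairs so the closest training cases come first.
        ranked = sorted((distance(x, q), i) for i, x in enumerate(train_X))
        nearest = ranked[:n_neighbors]
        all_dist.append([d for d, _ in nearest])
        all_ind.append([i for _, i in nearest])
    return (all_dist, all_ind) if return_distance else all_ind


def squared_euclidean(a, b):
    # Hypothetical stand-in for a time series distance measure.
    return sum((x - y) ** 2 for x, y in zip(a, b))


train_X = [[0, 0, 0], [1, 1, 1], [5, 5, 5]]
dist, ind = kneighbors(train_X, [[1, 1, 2]], squared_euclidean, n_neighbors=2)
print(dist, ind)
```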

predict(X)[source]

Predict the class labels for the provided data

Parameters
  • X (sktime-format pandas dataframe or array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed') – Test samples.

Returns

y – Class labels for each data sample.

Return type

array of shape [n_samples] or [n_samples, n_outputs]

predict_proba(X)[source]

Return probability estimates for the test data X.

Parameters
  • X (sktime-format pandas dataframe or array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed') – Test samples.

Returns

p – The class probabilities of the input samples. Classes are ordered by lexicographic order.

Return type

array of shape = [n_samples, n_classes], or a list of n_outputs of such arrays if n_outputs > 1
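To make the link between neighbour votes and class probabilities concrete, here is a small sketch of both weighting mechanisms. The function name and the small constant that guards against zero distances are assumptions made for this illustration.

```python
def neighbour_probabilities(neighbour_labels, neighbour_dists, classes, weights="uniform"):
    """Turn the labels of the k nearest neighbours into class probabilities.

    Columns follow sorted(classes), which is lexicographic order for strings.
    """
    classes = sorted(classes)
    votes = dict.fromkeys(classes, 0.0)
    for label, d in zip(neighbour_labels, neighbour_dists):
        if weights == "uniform":
            w = 1.0
        elif weights == "distance":
            w = 1.0 / (d + 1e-12)  # assumption: tiny constant avoids division by zero
        else:
            raise ValueError(f"unknown weights: {weights!r}")
        votes[label] += w
    total = sum(votes.values())
    return [votes[c] / total for c in classes]


labels, dists = ["b", "a", "b"], [0.5, 0.1, 2.0]
p_uniform = neighbour_probabilities(labels, dists, {"a", "b"})
p_distance = neighbour_probabilities(labels, dists, {"a", "b"}, weights="distance")
print(p_uniform, p_distance)
```

With uniform weights the majority class "b" wins. With distance weights the single very close "a" neighbour outweighs the two "b" neighbours.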

class sktime.classification.distance_based.ProximityForest(random_state=None, n_estimators=100, distance_measure=None, get_distance_measure=None, get_exemplars=<function get_one_exemplar_per_class_proximity>, get_gain=<function gini_gain>, verbosity=0, max_depth=inf, is_leaf=<function pure>, n_jobs=1, n_stump_evaluations=5, find_stump=None, setup_distance_measure_getter=<function setup_all_distance_measure_getter>)[source]

Bases: sktime.classification.base.BaseClassifier

Proximity Forest class to model a decision tree forest which uses distance measures to partition data, see [1].

Parameters
  • random_state (random, default=None) – seed for reproducibility

  • n_estimators (int, default=100) – The number of trees in the forest.

  • distance_measure (default=None) – distance measures

  • get_distance_measure (default=None) – distance measure getters

  • get_exemplars (default=get_one_exemplar_per_class_proximity) – function to extract exemplars from a dataframe and class value list

  • get_gain (default=gini_gain) – function to score the quality of a split

  • verbosity (default=0) – logging verbosity

  • max_depth (default=np.math.inf) – max tree depth

  • is_leaf (default=pure) –

  • n_jobs (int, default=1) – number of jobs to run in parallel across threads

  • n_stump_evaluations (int, default=5) –

  • find_stump (default=None) – function to find the best split of data

  • setup_distance_measure_getter (default=setup_all_distance_measure_getter) – function to set up the distance measure getters
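The default `get_gain` is named `gini_gain`, but its definition is not shown on this page. The sketch below shows the textbook Gini gain under the hypothetical name `gini_gain_sketch`. Whether the library's function matches this form exactly is an assumption.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain_sketch(parent_labels, branch_labels):
    """Impurity of the parent minus the size-weighted impurity of its branches."""
    n = len(parent_labels)
    weighted = sum(len(b) / n * gini(b) for b in branch_labels)
    return gini(parent_labels) - weighted


parent = ["a", "a", "b", "b"]
perfect = gini_gain_sketch(parent, [["a", "a"], ["b", "b"]])        # classes fully separated
uninformative = gini_gain_sketch(parent, [["a", "b"], ["a", "b"]])  # classes still mixed
print(perfect, uninformative)
```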

label_encoder[source]

label encoder to change string labels to numeric indices

classes_[source]

unique list of classes

get_exemplars[source]

function to extract exemplars from a dataframe and class value list

max_depth[source]

max tree depth

X[source]

train data

y[source]

train data labels

trees[source]

list of trees in the forest

Notes

[1] Ben Lucas et al., “Proximity Forest: an effective and scalable distance-based classifier for time series”, Data Mining and Knowledge Discovery, 33(3): 607-635, 2019. https://arxiv.org/abs/1808.10594

Java wrapper of authors' original: https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/distance_based/ProximityForestWrapper.java

Java version: https://github.com/uea-machine-learning/tsml/blob/master/src/main/java/tsml/classifiers/distance_based/proximity/ProximityForest.java

capabilities = {'missing_values': False, 'multivariate': False, 'unequal_length': False}[source]
fit(X, y)[source]

Parameters
  • X (array-like or sparse matrix of shape = [n_instances, n_columns]) – The training input samples. If a Pandas data frame is passed, column 0 is extracted.

  • y (array-like, shape = [n_instances]) – The class labels.

Returns

self

Return type

object

predict_proba(X)[source]

Find probability estimates for each class for all cases in X.

Parameters
  • X – The input samples. If a Pandas data frame is passed (sktime format), a check is performed that it only has one column. If not, an exception is thrown, since this classifier does not yet have multivariate capability.

Returns

output

Return type

array of shape = [n_instances, n_classes] of probabilities

class sktime.classification.distance_based.ProximityStump(random_state=None, get_exemplars=<function get_one_exemplar_per_class_proximity>, setup_distance_measure=<function setup_all_distance_measure_getter>, get_distance_measure=None, distance_measure=None, get_gain=<function gini_gain>, verbosity=0, n_jobs=1)[source]

Bases: sktime.classification.base.BaseClassifier

Proximity Stump class to model a decision stump which uses a distance measure to partition data.

label_encoder[source]

label encoder to change string labels to numeric indices

y_exemplar[source]

class label list of the exemplar instances

X_exemplar[source]

dataframe of the exemplar instances

X_branches[source]

dataframes for each branch, one per exemplar

y_branches[source]

class label list for each branch, one per exemplar

classes_[source]

unique list of classes

entropy[source]

the gain associated with the split of data

random_state[source]

the random state

get_exemplars[source]

function to extract exemplars from a dataframe and class value list

setup_distance_measure[source]

function to setup the distance measure getters from dataframe and class value list

get_distance_measure[source]

distance measure getters

distance_measure[source]

distance measures

get_gain[source]

function to score the quality of a split

verbosity[source]

logging verbosity

n_jobs[source]

number of jobs to run in parallel across threads

distance_to_exemplars(X)[source]

Find the distance from each instance to each exemplar.

Parameters
  • X – the dataset containing a list of instances

Returns

2d numpy array of distances from each instance to each exemplar (instance by exemplar)

find_closest_exemplar_indices(X)[source]

Find the closest exemplar index for each instance in a dataframe.

Parameters
  • X – the dataframe containing instances

Returns

1d numpy array of indices, one for each instance, reflecting the index of the closest exemplar
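The two methods above can be pictured with the following sketch, which builds an instance-by-exemplar distance table and then takes the position of the smallest entry in each row. It uses nested lists rather than numpy arrays, a hypothetical stand-in distance, and lowest-index tie-breaking. All three are assumptions for this illustration.

```python
def distance_to_exemplars(X, exemplars, distance):
    """Instance-by-exemplar table of distances, as a list of rows."""
    return [[distance(x, e) for e in exemplars] for x in X]


def find_closest_exemplar_indices(X, exemplars, distance):
    """Index of the nearest exemplar for each instance (ties go to the lowest index)."""
    table = distance_to_exemplars(X, exemplars, distance)
    return [min(range(len(row)), key=row.__getitem__) for row in table]


def abs_diff(a, b):
    # Hypothetical stand-in for an elastic distance measure.
    return sum(abs(p - q) for p, q in zip(a, b))


exemplars = [[0, 0, 0], [4, 4, 4]]   # for example, one exemplar per class
X = [[1, 0, 1], [3, 4, 5], [0, 1, 0]]
closest = find_closest_exemplar_indices(X, exemplars, abs_diff)
print(closest)
```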

fit(X, y)[source]

Parameters
  • X (array-like or sparse matrix of shape = [n_instances, n_columns]) – The training input samples. If a Pandas data frame is passed, column 0 is extracted.

  • y (array-like, shape = [n_instances]) – The class labels.

Returns

self

Return type

object

grow()[source]

Grow the stump, creating branches for each exemplar.

Returns

self
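Growing a stump amounts to routing every training instance to the branch of its closest exemplar. A minimal sketch, assuming the closest-exemplar indices have already been computed and using plain lists in place of dataframes (the function name is hypothetical):

```python
def grow_branches(X, y, closest_indices, n_exemplars):
    """Split (X, y) into one (X_branch, y_branch) pair per exemplar."""
    X_branches = [[] for _ in range(n_exemplars)]
    y_branches = [[] for _ in range(n_exemplars)]
    for x, label, idx in zip(X, y, closest_indices):
        X_branches[idx].append(x)
        y_branches[idx].append(label)
    return X_branches, y_branches


X = [[1, 0, 1], [3, 4, 5], [0, 1, 0]]
y = ["low", "high", "low"]
X_branches, y_branches = grow_branches(X, y, closest_indices=[0, 1, 0], n_exemplars=2)
print(y_branches)
```

The resulting lists play the role of the X_branches and y_branches attributes described above.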

predict_proba(X)[source]

Find probability estimates for each class for all cases in X.

Parameters
  • X – The input samples. If a Pandas data frame is passed (sktime format), a check is performed that it only has one column. If not, an exception is thrown, since this classifier does not yet have multivariate capability.

Returns

output

Return type

array of shape = [n_instances, n_classes] of probabilities

class sktime.classification.distance_based.ProximityTree(random_state=None, get_exemplars=<function get_one_exemplar_per_class_proximity>, distance_measure=None, get_distance_measure=None, setup_distance_measure=<function setup_all_distance_measure_getter>, get_gain=<function gini_gain>, max_depth=inf, is_leaf=<function pure>, verbosity=0, n_jobs=1, n_stump_evaluations=5, find_stump=None)[source]

Bases: sktime.classification.base.BaseClassifier

Proximity Tree class to model a decision tree which uses distance measures to partition data.

@article{lucas19proximity,
  title={Proximity Forest: an effective and scalable distance-based classifier for time series},
  author={B. Lucas and A. Shifaz and C. Pelletier and L. O’Neill and N. Zaidi and B. Goethals and F. Petitjean and G. Webb},
  journal={Data Mining and Knowledge Discovery},
  volume={33},
  number={3},
  pages={607–635},
  year={2019}
}

https://arxiv.org/abs/1808.10594

label_encoder[source]

label encoder to change string labels to numeric indices

classes_[source]

unique list of classes

random_state[source]

the random state

get_exemplars[source]

function to extract exemplars from a dataframe and class value list

setup_distance_measure[source]

function to setup the distance measure getters from dataframe and class value list

get_distance_measure[source]

distance measure getters

distance_measure[source]

distance measures

get_gain[source]

function to score the quality of a split

verbosity[source]

logging verbosity

n_jobs[source]

number of jobs to run in parallel across threads

find_stump[source]

function to find the best split of data

max_depth[source]

max tree depth

depth[source]

current depth of tree, as each node is a tree itself, therefore can have a depth of >=0

X[source]

train data

y[source]

train data labels

stump[source]

the stump used to split data at this node

branches[source]

the partitions of data driven by the stump
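The page does not document how `find_stump` and `n_stump_evaluations` interact. One reading, assumed here, is that several candidate stumps are generated at random and the one with the best gain is kept. The sketch below shows that selection loop in generic form. Every name in it is hypothetical, and the toy "stump" is a simple threshold rather than an exemplar-based split.

```python
import random


def find_best_candidate(make_candidate, score, n_evaluations=5, rng=None):
    """Draw `n_evaluations` random candidates and keep the highest-scoring one."""
    rng = rng or random.Random(0)
    best, best_score = None, float("-inf")
    for _ in range(n_evaluations):
        candidate = make_candidate(rng)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


# Toy data: the first value of each series decides the class.
X = [[0.1], [0.4], [0.6], [0.9]]
y = [0, 0, 1, 1]


def random_threshold(rng):
    return rng.random()


def n_correct(threshold):
    # Number of training cases a threshold split classifies correctly.
    return sum((x[0] > threshold) == bool(label) for x, label in zip(X, y))


threshold, hits = find_best_candidate(random_threshold, n_correct, n_evaluations=20)
print(threshold, hits)
```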

fit(X, y)[source]

Parameters
  • X (array-like or sparse matrix of shape = [n_instances, n_columns]) – The training input samples. If a Pandas data frame is passed, column 0 is extracted.

  • y (array-like, shape = [n_instances]) – The class labels.

Returns

self

Return type

object

predict_proba(X)[source]

Find probability estimates for each class for all cases in X.

Parameters
  • X – The input samples. If a Pandas data frame is passed (sktime format), a check is performed that it only has one column. If not, an exception is thrown, since this classifier does not yet have multivariate capability.

Returns

output

Return type

array of shape = [n_instances, n_classes] of probabilities

class sktime.classification.distance_based.ShapeDTW(n_neighbors=1, subsequence_length=30, shape_descriptor_function='raw', shape_descriptor_functions=['raw', 'derivative'], metric_params=None)[source]

Bases: sktime.classification.base.BaseClassifier

The ShapeDTW classifier works by initially extracting a set of subsequences describing local neighbourhoods around each data point in a time series. These subsequences are then passed into a shape descriptor function that transforms these local neighbourhoods into a new representation. This new representation is then sent into DTW with 1-NN.
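The first step, extracting a local neighbourhood around every data point, might look like the sketch below. Padding the ends by repeating the first and last values is an assumption made so that every point gets a full-length window. The function name is hypothetical.

```python
def extract_subsequences(series, subsequence_length):
    """Return one window of `subsequence_length` values around each point.

    The series is padded by repeating its first and last values so that
    points near the ends still get full-length windows (an assumption).
    """
    left = subsequence_length // 2
    right = subsequence_length - left - 1
    padded = [series[0]] * left + list(series) + [series[-1]] * right
    return [padded[i:i + subsequence_length] for i in range(len(series))]


series = [1, 2, 3, 4, 5]
windows = extract_subsequences(series, 3)
for window in windows:
    print(window)
```

Each window would then be passed to the chosen shape descriptor function before the DTW comparison.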

Parameters
  • n_neighbors (int, default=1) – The value of k for knn.

  • subsequence_length (int, default=30) – Defines the length of the subsequences.

  • shape_descriptor_function (string, default=‘raw’) – Defines the function to describe the set of subsequences.

The possible shape descriptor functions are as follows:

  • ‘raw’: use the raw subsequence as the shape descriptor function.
    • params = None

  • ‘paa’: use PAA (Piecewise Aggregate Approximation) as the shape descriptor function.
    • params = num_intervals_paa (default=8)

  • ‘dwt’: use DWT (Discrete Wavelet Transform) as the shape descriptor function.
    • params = num_levels_dwt (default=3)

  • ‘slope’: use the gradient of each subsequence fitted by a total least squares regression as the shape descriptor function.
    • params = num_intervals_slope (default=8)

  • ‘derivative’: use the derivative of each subsequence as the shape descriptor function.
    • params = None

  • ‘hog1d’: use a histogram of gradients in one dimension as the shape descriptor function.
    • params = num_intervals_hog1d (default=2), num_bins_hog1d (default=8), scaling_factor_hog1d (default=0.1)

  • ‘compound’: use a combination of two shape descriptors simultaneously.
    • params = weighting_factor (default=None). Defines how to scale values of a shape descriptor. If a value is not given, this value is tuned by 10-fold cross-validation on the training data.

  • shape_descriptor_functions (list of strings, default=[‘raw’, ‘derivative’]) – Only applicable when shape_descriptor_function is set to ‘compound’. Use a list of shape descriptor functions at the same time.

  • metric_params (dictionary, default=None) – Dictionary for metric parameters.
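To give a feel for what a shape descriptor does to a subsequence, the sketch below implements a plain PAA, a first-difference derivative, and a compound descriptor that concatenates raw values with a scaled derivative. These are simplified stand-ins written for this page. The exact formulas used by the library (in particular for ‘derivative’ and ‘compound’) are not given above and may differ.

```python
def paa(subsequence, num_intervals):
    """Piecewise aggregate approximation: the mean of each of `num_intervals` chunks.

    Assumes num_intervals <= len(subsequence) so that no chunk is empty.
    """
    n = len(subsequence)
    bounds = [round(k * n / num_intervals) for k in range(num_intervals + 1)]
    return [
        sum(subsequence[lo:hi]) / (hi - lo)
        for lo, hi in zip(bounds, bounds[1:])
    ]


def first_difference(subsequence):
    """A simple derivative estimate: differences between consecutive values."""
    return [b - a for a, b in zip(subsequence, subsequence[1:])]


def compound(subsequence, weighting_factor=1.0):
    """Concatenate the raw values with a scaled derivative descriptor."""
    scaled = [weighting_factor * d for d in first_difference(subsequence)]
    return list(subsequence) + scaled


window = [1, 3, 5, 7, 6, 4, 2, 0]
print(paa(window, 4))
print(compound(window[:4], weighting_factor=0.5))
```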

Notes

[1] Jiaping Zhao and Laurent Itti, “shapeDTW: Shape Dynamic Time Warping”, Pattern Recognition, 74, pp 171-184, 2018. http://www.sciencedirect.com/science/article/pii/S0031320317303710

capabilities = {'missing_values': False, 'multivariate': False, 'unequal_length': False}[source]
fit(X, y)[source]

Method to perform training on the classifier.

Parameters
  • X (pandas dataframe of shape [n_instances, 1]) – training data

  • y (list of shape [n_instances]) – class labels

Returns

self

Return type

the shapeDTW object

predict(X)[source]

Find predictions for all cases in X. Could do a wrap function for predict_proba, but this will do for now.

Parameters
  • X – The testing input samples of shape [n_instances, 1].

Returns

output

Return type

numpy array of shape = [n_instances]

predict_proba(X)[source]

Function to perform predictions on the testing data X. This function returns the probabilities for each class.

Parameters
  • X (pandas dataframe of shape [n_instances, 1]) – testing data

Returns

output

Return type

numpy array of shape = [n_instances, num_classes] of probabilities