sktime.forecasting.model_selection

class sktime.forecasting.model_selection.CutoffSplitter(cutoffs, fh=1, window_length=10)[source]

Bases: sktime.forecasting.model_selection._split.BaseSplitter

Manual window splitter to split time series at given cutoff points.

Parameters
  • cutoffs (np.array) – Cutoff points: positive integer positions, usable with pandas .iloc[] indexing

  • fh (int, list or np.array) – Forecasting horizon

  • window_length (int) – Length of the training window

get_cutoffs(y=None)[source]

Return the cutoff points

get_n_splits(y=None)[source]

Return the number of splits
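Examples

The cutoff-based scheme can be sketched in plain Python (illustrative only; `cutoff_splits` is not part of the sktime API, and fold construction is simplified):

```python
# Sketch of cutoff-based splitting: each cutoff c yields a training
# window of `window_length` points ending at c (inclusive) and a test
# window at the horizon steps fh relative to c.

def cutoff_splits(cutoffs, fh, window_length):
    """Yield (train_positions, test_positions) for each cutoff point."""
    for c in cutoffs:
        train = list(range(c - window_length + 1, c + 1))
        test = [c + h for h in fh]
        yield train, test

folds = list(cutoff_splits(cutoffs=[9, 14], fh=[1, 2, 3], window_length=5))
print(folds[0])  # ([5, 6, 7, 8, 9], [10, 11, 12])
```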

class sktime.forecasting.model_selection.ExpandingWindowSplitter(fh=1, window_length=10, step_length=1, initial_window=None, start_with_window=False)[source]

Bases: sktime.forecasting.model_selection._split.BaseWindowSplitter

Expanding window splitter

Parameters
  • fh (int, list or np.array) – Forecasting horizon

  • window_length (int) – Length of the first training window

  • step_length (int) – Number of time points to step forward between folds

  • initial_window (int) – Length of an initial training window, if any

  • start_with_window (bool, optional (default=False)) – Whether the first fold starts with a full training window

Examples

For window_length = 5, step_length = 1 and fh = 3, the folds can be represented as:

|-----------------------|
| * * * * * x x x - - - |
| * * * * * * x x x - - |
| * * * * * * * x x x - |
| * * * * * * * * x x x |

* = training fold.

x = test fold.
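The diagram can be reproduced with a small sketch (illustrative only, using an integer fh for simplicity; `expanding_folds` is not part of the sktime API):

```python
# Expanding-window folds: the training window always starts at the
# beginning of the series and grows by step_length each fold.

def expanding_folds(n_timepoints, window_length, step_length, fh):
    folds = []
    end = window_length  # exclusive end of the first training window
    while end + fh <= n_timepoints:
        train = list(range(0, end))
        test = list(range(end, end + fh))
        folds.append((train, test))
        end += step_length
    return folds

folds = expanding_folds(n_timepoints=11, window_length=5, step_length=1, fh=3)
print(len(folds))  # 4 folds, matching the four rows of the diagram
print(folds[0])    # ([0, 1, 2, 3, 4], [5, 6, 7])
```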

class sktime.forecasting.model_selection.ForecastingGridSearchCV(forecaster, cv, param_grid, scoring=None, n_jobs=None, refit=True, verbose=0, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)[source]

Bases: sktime.forecasting.model_selection._tune.BaseGridSearch

Performs grid-search cross-validation to find optimal model parameters. The forecaster is fitted on the initial window, and then temporal cross-validation is used to find the optimal parameters.

Grid-search cross-validation is performed based on a cross-validation iterator encoding the cross-validation scheme, the parameter grid to search over, and (optionally) the evaluation metric for comparing model performance. As in scikit-learn, tuning works through the common hyper-parameter interface, which allows the same forecaster to be repeatedly fitted and evaluated with different hyper-parameters.

Parameters
  • forecaster (estimator object) – The estimator should implement the sktime or scikit-learn estimator interface. Either the estimator must contain a “score” function, or a scoring function must be passed.

  • cv (cross-validation generator or an iterable) – e.g. SlidingWindowSplitter()

  • param_grid (dict or list of dictionaries) – Model tuning parameters of the forecaster to evaluate

  • scoring (function, optional (default=None)) – Function to score models for evaluation of optimal parameters

  • n_jobs (int, optional (default=None)) – Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • refit (bool, optional (default=True)) – Refit the forecaster with the best parameters on all the data

  • verbose (int, optional (default=0)) – Controls the verbosity: the higher, the more messages

  • pre_dispatch (str, optional (default='2*n_jobs')) – Controls the number of jobs that get dispatched during parallel execution

  • error_score (numeric value or the str 'raise', optional (default=np.nan)) – The test score returned when a forecaster fails to be fitted.

  • return_train_score (bool, optional (default=False)) – Whether to include training scores in cv_results_

best_index_[source]

Index of the best parameter combination in cv_results_

Type

int

best_score_[source]

Score of the best model

Type

float

best_params_[source]

Best parameter values across the parameter grid

Type

dict

best_forecaster_[source]

Fitted estimator with the best parameters

Type

estimator

cv_results_[source]

Results from grid search cross validation

Type

dict

n_splits_[source]

Number of splits in the data for cross validation

Type

int

refit_time_[source]

Time (seconds) to refit the best forecaster

Type

float

scorer_[source]

Function used to score model

Type

function
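The overall tuning loop can be sketched as follows (a simplified stand-in: ForecastingGridSearchCV additionally handles refitting, parallelism, and cv_results_ bookkeeping, and `grid_search`, `fit`, and `score` here are hypothetical toy callables, not sktime API):

```python
from itertools import product

def grid_search(folds, param_grid, fit, score):
    """Evaluate every parameter combination on every fold; lowest mean score wins."""
    candidates = [dict(zip(param_grid, values))
                  for values in product(*param_grid.values())]
    results = []
    for params in candidates:
        fold_scores = [score(fit(train, params), test) for train, test in folds]
        results.append((sum(fold_scores) / len(fold_scores), params))
    return min(results, key=lambda r: r[0])

# Toy example: the "model" is a scaled naive forecast; the score is the
# absolute error on a one-point test fold.
folds = [([1.0, 2.0, 3.0], [4.0]), ([2.0, 3.0, 4.0], [5.0])]
fit = lambda train, params: train[-1] * params["scale"]
score = lambda pred, test: abs(pred - test[0])
best_score, best_params = grid_search(folds, {"scale": [1.0, 1.25, 1.5]}, fit, score)
print(best_params)  # {'scale': 1.25}
```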

class sktime.forecasting.model_selection.ForecastingRandomizedSearchCV(forecaster, cv, param_distributions, n_iter=10, scoring=None, n_jobs=None, refit=True, verbose=0, random_state=None, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)[source]

Bases: sktime.forecasting.model_selection._tune.BaseGridSearch

Performs randomized-search cross-validation to find optimal model parameters. The forecaster is fitted on the initial window, and then temporal cross-validation is used to find the optimal parameters.

Randomized cross-validation is performed based on a cross-validation iterator encoding the cross-validation scheme, the parameter distributions to search over, and (optionally) the evaluation metric for comparing model performance. As in scikit-learn, tuning works through the common hyper-parameter interface, which allows the same forecaster to be repeatedly fitted and evaluated with different hyper-parameters.

Parameters
  • forecaster (estimator object) – The estimator should implement the sktime or scikit-learn estimator interface. Either the estimator must contain a “score” function, or a scoring function must be passed.

  • cv (cross-validation generator or an iterable) – e.g. SlidingWindowSplitter()

  • param_distributions (dict or list of dicts) – Dictionary with parameters names (str) as keys and distributions or lists of parameters to try. Distributions must provide a rvs method for sampling (such as those from scipy.stats.distributions). If a list is given, it is sampled uniformly. If a list of dicts is given, first a dict is sampled uniformly, and then a parameter is sampled using that dict as above.

  • n_iter (int, default=10) – Number of parameter settings that are sampled. n_iter trades off runtime vs quality of the solution.

  • scoring (function, optional (default=None)) – Function to score models for evaluation of optimal parameters

  • n_jobs (int, optional (default=None)) – Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • refit (bool, optional (default=True)) – Refit the forecaster with the best parameters on all the data

  • verbose (int, optional (default=0)) – Controls the verbosity: the higher, the more messages

  • random_state (int, RandomState instance or None, default=None) – Pseudo random number generator state used for random uniform sampling from lists of possible values instead of scipy.stats distributions. Pass an int for reproducible output across multiple function calls.

  • pre_dispatch (str, optional (default='2*n_jobs')) – Controls the number of jobs that get dispatched during parallel execution

  • error_score (numeric value or the str 'raise', optional (default=np.nan)) – The test score returned when a forecaster fails to be fitted.

  • return_train_score (bool, optional (default=False)) – Whether to include training scores in cv_results_

best_index_[source]

Index of the best parameter combination in cv_results_

Type

int

best_score_[source]

Score of the best model

Type

float

best_params_[source]

Best parameter values across the parameter grid

Type

dict

best_forecaster_[source]

Fitted estimator with the best parameters

Type

estimator

cv_results_[source]

Results from grid search cross validation

Type

dict

n_splits_[source]

Number of splits in the data for cross validation

Type

int

refit_time_[source]

Time (seconds) to refit the best forecaster

Type

float

scorer_[source]

Function used to score model

Type

function
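How a single parameter setting is drawn from param_distributions can be sketched like this (illustrative only; `sample_params` is not part of the sktime API):

```python
import random

# List entries are sampled uniformly; objects exposing an rvs() method
# (e.g. scipy.stats frozen distributions) are sampled by calling rvs().
def sample_params(param_distributions, rng):
    params = {}
    for name, dist in param_distributions.items():
        if hasattr(dist, "rvs"):   # scipy-style distribution object
            params[name] = dist.rvs()
        else:                      # plain list: uniform random choice
            params[name] = rng.choice(dist)
    return params

rng = random.Random(0)  # seeded for reproducibility, like random_state
setting = sample_params({"window_length": [5, 10, 15], "sp": [1, 12]}, rng)
print(setting)
```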

class sktime.forecasting.model_selection.SingleWindowSplitter(fh, window_length=None)[source]

Bases: sktime.forecasting.model_selection._split.BaseWindowSplitter

Single window splitter

Split time series once into a training and test window.

Parameters
  • fh (int, list or np.array) – Forecasting horizon

  • window_length (int) – Length of the training window

get_cutoffs(y=None)[source]

Get the cutoff time points.

Parameters

y (pd.Series or pd.Index, optional (default=None)) –

Returns

cutoffs

Return type

np.array

get_n_splits(y=None)[source]

Return number of splits

Parameters

y (pd.Series, optional (default=None)) –

Returns

n_splits

Return type

int

split_initial(y)[source]

Split initial window

This is useful during forecasting model selection, where we want to fit the forecaster on some part of the data before doing temporal cross-validation.

Parameters

y (pd.Series) –

Returns

  • initial_training_window (np.array)

  • initial_test_window (np.array)
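The initial split can be sketched as follows (illustrative only; `split_initial_positions` is a hypothetical helper, not sktime API):

```python
# The first `initial_window` positions form the initial training window;
# everything after it is left for temporal cross-validation.

def split_initial_positions(n_timepoints, initial_window):
    train = list(range(0, initial_window))
    test = list(range(initial_window, n_timepoints))
    return train, test

train, test = split_initial_positions(n_timepoints=10, initial_window=6)
print(train, test)  # [0, 1, 2, 3, 4, 5] [6, 7, 8, 9]
```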

class sktime.forecasting.model_selection.SlidingWindowSplitter(fh=1, window_length=10, step_length=1, initial_window=None, start_with_window=False)[source]

Bases: sktime.forecasting.model_selection._split.BaseWindowSplitter

Sliding window splitter

Parameters
  • fh (int, list or np.array) – Forecasting horizon

  • window_length (int) – Length of the training window

  • step_length (int) – Number of time points to step forward between folds

  • initial_window (int) – Length of an initial training window, if any

  • start_with_window (bool, optional (default=False)) – Whether the first fold starts with a full training window

Examples

For window_length = 5, step_length = 1 and fh = 3, the folds can be represented as:

|-----------------------|
| * * * * * x x x - - - |
| - * * * * * x x x - - |
| - - * * * * * x x x - |
| - - - * * * * * x x x |

* = training fold.

x = test fold.
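The diagram can be reproduced with a small sketch (illustrative only, using an integer fh for simplicity; `sliding_folds` is not part of the sktime API):

```python
# Sliding-window folds: a fixed-length training window moves forward
# by step_length each fold, followed by an fh-step test window.

def sliding_folds(n_timepoints, window_length, step_length, fh):
    folds = []
    start = 0
    while start + window_length + fh <= n_timepoints:
        train = list(range(start, start + window_length))
        test = list(range(start + window_length, start + window_length + fh))
        folds.append((train, test))
        start += step_length
    return folds

folds = sliding_folds(n_timepoints=11, window_length=5, step_length=1, fh=3)
print(len(folds))  # 4 folds, matching the four rows of the diagram
print(folds[1])    # ([1, 2, 3, 4, 5], [6, 7, 8])
```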

sktime.forecasting.model_selection.temporal_train_test_split(y, X=None, test_size=None, train_size=None, fh=None)[source]

Split arrays or matrices into sequential train and test subsets. Creates train/test splits over an endogenous array and optional exogenous arrays. This is a wrapper of scikit-learn’s train_test_split that does not shuffle the data.

Parameters
  • y (pd.Series) – Endogenous time series to split

  • X (pd.DataFrame, optional (default=None)) – Exogenous data, split at the same points as y

  • test_size (float, int or None, optional (default=None)) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.

  • train_size (float, int, or None, optional (default=None)) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

  • fh (ForecastingHorizon, optional (default=None)) – If passed, the test window is determined by the forecasting horizon rather than by test_size and train_size

Returns

splitting – List containing train-test split of inputs.

Return type

list, length=2 * len(arrays)
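The train/test size arithmetic can be sketched as follows (an approximation of scikit-learn’s behavior; `temporal_split_sizes` is a hypothetical helper, not sktime API, and rounding details may differ):

```python
# Compute (n_train, n_test) for a non-shuffled temporal split.
def temporal_split_sizes(n_samples, test_size=None, train_size=None):
    if test_size is None and train_size is None:
        test_size = 0.25  # default when neither size is given
    if isinstance(test_size, float):
        n_test = round(n_samples * test_size)      # proportion of the data
    elif test_size is not None:
        n_test = test_size                         # absolute number of samples
    else:                                          # derive from train_size
        n_train = (train_size if isinstance(train_size, int)
                   else round(n_samples * train_size))
        n_test = n_samples - n_train
    return n_samples - n_test, n_test

print(temporal_split_sizes(100))                # (75, 25)
print(temporal_split_sizes(100, test_size=10))  # (90, 10)
```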

References

.. [1] Adapted from https://github.com/alkaline-ml/pmdarima/