sktime.forecasting.model_selection
-
class sktime.forecasting.model_selection.CutoffSplitter(cutoffs, fh=1, window_length=10)[source]
Bases: sktime.forecasting.model_selection._split.BaseSplitter
Manual window splitter to split time series at given cutoff points.
- Parameters
cutoffs (np.array) – Cutoff points at which to split the series.
fh (int, list or np.array, optional (default=1)) – Forecasting horizon.
window_length (int, optional (default=10)) – Length of the training window preceding each cutoff.
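A minimal usage sketch (not part of the original documentation; the toy series and cutoff positions are illustrative):

    import numpy as np
    import pandas as pd
    from sktime.forecasting.model_selection import CutoffSplitter

    y = pd.Series(np.arange(20))  # toy series of length 20

    # split at manually chosen cutoff points: each fold trains on the 5
    # observations before the cutoff and tests on the next 2 observations
    cv = CutoffSplitter(cutoffs=np.array([10, 13, 16]), fh=[1, 2], window_length=5)
    for train_idx, test_idx in cv.split(y):
        print("train:", train_idx, "test:", test_idx)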
-
class sktime.forecasting.model_selection.ExpandingWindowSplitter(fh=1, window_length=10, step_length=1, initial_window=None, start_with_window=False)[source]
Bases: sktime.forecasting.model_selection._split.BaseWindowSplitter
Expanding window splitter.
- Parameters
fh (int, list or np.array, optional (default=1)) – Forecasting horizon.
window_length (int, optional (default=10)) – Length of the (initial) training window.
step_length (int, optional (default=1)) – Step length between windows.
initial_window (int, optional (default=None)) – Length of the initial window.
start_with_window (bool, optional (default=False)) – Whether the first fold starts with a full training window.
Examples
For example, for window_length = 5, step_length = 1 and fh = 3, here is a representation of the folds:

|-----------------------|
| * * * * * x x x - - - |
| * * * * * * x x x - - |
| * * * * * * * x x x - |
| * * * * * * * * x x x |

* = training fold, x = test fold.
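A sketch of iterating over folds like those drawn above (not from the original docs; parameter semantics are assumed to follow the diagram, and the toy series is illustrative):

    import numpy as np
    import pandas as pd
    from sktime.forecasting.model_selection import ExpandingWindowSplitter

    y = pd.Series(np.arange(12))

    # the training window starts at 5 observations and grows by step_length=1
    # per fold; the test window covers the next 3 observations (fh=[1, 2, 3])
    cv = ExpandingWindowSplitter(fh=[1, 2, 3], window_length=5, step_length=1,
                                 start_with_window=True)
    print(cv.get_n_splits(y))
    for train_idx, test_idx in cv.split(y):
        print("train:", train_idx, "test:", test_idx)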
-
class sktime.forecasting.model_selection.ForecastingGridSearchCV(forecaster, cv, param_grid, scoring=None, n_jobs=None, refit=True, verbose=0, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)[source]
Bases: sktime.forecasting.model_selection._tune.BaseGridSearch
Performs grid-search cross-validation to find optimal model parameters. The forecaster is fit on the initial window and then temporal cross-validation is used to find the optimal parameters.
Grid-search cross-validation is performed based on a cross-validation iterator encoding the cross-validation scheme, the parameter grid to search over, and (optionally) the evaluation metric for comparing model performance. As in scikit-learn, tuning works through the common hyper-parameter interface, which allows the same forecaster to be fitted and evaluated repeatedly with different hyper-parameters. A usage sketch follows the parameter list below.
- Parameters
forecaster (estimator object) – The estimator should implement the sktime or scikit-learn estimator interface. Either the estimator must contain a “score” function, or a scoring function must be passed.
cv (cross-validation generator or an iterable) – e.g. SlidingWindowSplitter()
param_grid (dict or list of dictionaries) – Model tuning parameters of the forecaster to evaluate
scoring (function, optional (default=None)) – Function to score models for evaluation of optimal parameters
n_jobs (int, optional (default=None)) – Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
refit (bool, optional (default=True)) – Refit the forecaster with the best parameters on all the data
verbose (int, optional (default=0)) –
pre_dispatch (str, optional (default='2*n_jobs')) –
error_score (numeric value or the str 'raise', optional (default=np.nan)) – The test score returned when a forecaster fails to be fitted.
return_train_score (bool, optional (default=False)) –
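A minimal tuning sketch (not from the original docs): it assumes sktime's NaiveForecaster and the airline dataset from sktime.datasets; the grid and splitter settings are illustrative.

    from sktime.datasets import load_airline
    from sktime.forecasting.naive import NaiveForecaster
    from sktime.forecasting.model_selection import (
        ForecastingGridSearchCV,
        SlidingWindowSplitter,
    )

    y = load_airline()

    # temporal CV scheme: initial training window of 72 observations, then
    # 24-observation sliding windows, each evaluated 3 steps ahead
    cv = SlidingWindowSplitter(initial_window=72, window_length=24, fh=[1, 2, 3])
    param_grid = {"strategy": ["last", "mean"]}

    gscv = ForecastingGridSearchCV(
        forecaster=NaiveForecaster(), cv=cv, param_grid=param_grid
    )
    gscv.fit(y)  # evaluates each grid point via temporal CV, then refits the best one
    print(gscv.best_params_)
    y_pred = gscv.predict(fh=[1, 2, 3])  # forecast 3 steps beyond the training data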
-
class sktime.forecasting.model_selection.ForecastingRandomizedSearchCV(forecaster, cv, param_distributions, n_iter=10, scoring=None, n_jobs=None, refit=True, verbose=0, random_state=None, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)[source]
Bases: sktime.forecasting.model_selection._tune.BaseGridSearch
Performs randomized-search cross-validation to find optimal model parameters. The forecaster is fit on the initial window and then temporal cross-validation is used to find the optimal parameters.
Randomized cross-validation is performed based on a cross-validation iterator encoding the cross-validation scheme, the parameter distributions to search over, and (optionally) the evaluation metric for comparing model performance. As in scikit-learn, tuning works through the common hyper-parameter interface, which allows the same forecaster to be fitted and evaluated repeatedly with different hyper-parameters. A usage sketch follows the parameter list below.
- Parameters
forecaster (estimator object) – The estimator should implement the sktime or scikit-learn estimator interface. Either the estimator must contain a “score” function, or a scoring function must be passed.
cv (cross-validation generator or an iterable) – e.g. SlidingWindowSplitter()
param_distributions (dict or list of dicts) – Dictionary with parameter names (str) as keys and distributions or lists of parameters to try. Distributions must provide an rvs method for sampling (such as those from scipy.stats.distributions). If a list is given, it is sampled uniformly. If a list of dicts is given, first a dict is sampled uniformly, and then a parameter is sampled using that dict as above.
n_iter (int, default=10) – Number of parameter settings that are sampled. n_iter trades off runtime vs quality of the solution.
scoring (function, optional (default=None)) – Function to score models for evaluation of optimal parameters
n_jobs (int, optional (default=None)) – Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
refit (bool, optional (default=True)) – Refit the forecaster with the best parameters on all the data
verbose (int, optional (default=0)) –
random_state (int, RandomState instance or None, default=None) – Pseudo random number generator state used for random uniform sampling from lists of possible values instead of scipy.stats distributions. Pass an int for reproducible output across multiple function calls.
pre_dispatch (str, optional (default='2*n_jobs')) –
error_score (numeric value or the str 'raise', optional (default=np.nan)) – The test score returned when a forecaster fails to be fitted.
return_train_score (bool, optional (default=False)) –
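A sketch analogous to the grid search above, sampling parameter settings from lists (not from the original docs; the forecaster and values are illustrative):

    from sktime.datasets import load_airline
    from sktime.forecasting.trend import PolynomialTrendForecaster
    from sktime.forecasting.model_selection import (
        ForecastingRandomizedSearchCV,
        SlidingWindowSplitter,
    )

    y = load_airline()

    cv = SlidingWindowSplitter(initial_window=72, window_length=24, fh=[1, 2, 3])
    # lists are sampled uniformly; scipy.stats distributions could be used instead
    param_distributions = {"degree": [1, 2, 3]}

    rscv = ForecastingRandomizedSearchCV(
        forecaster=PolynomialTrendForecaster(),
        cv=cv,
        param_distributions=param_distributions,
        n_iter=3,
        random_state=42,
    )
    rscv.fit(y)
    print(rscv.best_params_)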
-
class sktime.forecasting.model_selection.SingleWindowSplitter(fh, window_length=None)[source]
Bases: sktime.forecasting.model_selection._split.BaseWindowSplitter
Single window splitter.
Split time series once into a training and test window. A usage sketch follows the method list below.
-
get_cutoffs(y=None)[source]
Get the cutoff time points.
- Parameters
y (pd.Series or pd.Index, optional (default=None)) –
- Returns
cutoffs
- Return type
np.array
-
get_n_splits(y=None)[source]
Return number of splits.
- Parameters
y (pd.Series, optional (default=None)) –
- Returns
n_splits
- Return type
int
-
split_initial(y)[source]
Split the initial window.
This is useful during forecasting model selection, where we want to fit the forecaster on some part of the data first before doing temporal cross-validation.
- Parameters
y (pd.Series) –
- Returns
initial_training_window (np.array)
initial_test_window (np.array)
-
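A sketch exercising the splitter and the methods listed above (not from the original docs; toy data, exact indices depend on window_length and fh):

    import numpy as np
    import pandas as pd
    from sktime.forecasting.model_selection import SingleWindowSplitter

    y = pd.Series(np.arange(20))

    cv = SingleWindowSplitter(fh=[1, 2, 3], window_length=10)
    print(cv.get_n_splits(y))   # 1: the series is split exactly once
    print(cv.get_cutoffs(y))    # the single cutoff position, as np.array

    train_idx, test_idx = next(cv.split(y))

    # split off an initial window, e.g. to fit a forecaster before temporal CV
    initial_train, initial_test = cv.split_initial(y)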
-
class sktime.forecasting.model_selection.SlidingWindowSplitter(fh=1, window_length=10, step_length=1, initial_window=None, start_with_window=False)[source]
Bases: sktime.forecasting.model_selection._split.BaseWindowSplitter
Sliding window splitter.
- Parameters
fh (int, list or np.array, optional (default=1)) – Forecasting horizon.
window_length (int, optional (default=10)) – Length of the sliding training window.
step_length (int, optional (default=1)) – Step length between windows.
initial_window (int, optional (default=None)) – Length of the initial window.
start_with_window (bool, optional (default=False)) – Whether the first fold starts with a full training window.
Examples
For example, for window_length = 5, step_length = 1 and fh = 3, here is a representation of the folds:

|-----------------------|
| * * * * * x x x - - - |
| - * * * * * x x x - - |
| - - * * * * * x x x - |
| - - - * * * * * x x x |

* = training fold, x = test fold.
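A sketch mirroring the sliding folds drawn above (not from the original docs; parameter semantics are assumed to follow the diagram):

    import numpy as np
    import pandas as pd
    from sktime.forecasting.model_selection import SlidingWindowSplitter

    y = pd.Series(np.arange(12))

    # a 5-observation training window slides forward by step_length=1;
    # each fold is tested on the following 3 observations (fh=[1, 2, 3])
    cv = SlidingWindowSplitter(fh=[1, 2, 3], window_length=5, step_length=1,
                               start_with_window=True)
    for train_idx, test_idx in cv.split(y):
        print("train:", train_idx, "test:", test_idx)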
-
sktime.forecasting.model_selection.temporal_train_test_split(y, X=None, test_size=None, train_size=None, fh=None)[source]
Split arrays or matrices into sequential train and test subsets. Creates train/test splits over an endogenous array and optional exogenous arrays. This is a wrapper of scikit-learn's train_test_split that does not shuffle.
- Parameters
*series (sequence of pd.Series with same length / shape[0]) –
test_size (float, int or None, optional (default=None)) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the relative number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.
train_size (float, int or None, optional (default=None)) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the relative number of train samples. If None, the value is automatically set to the complement of the test size.
fh (ForecastingHorizon) –
- Returns
splitting – List containing train-test split of inputs.
- Return type
list, length=2 * len(arrays)
References
[1] Adapted from https://github.com/alkaline-ml/pmdarima/
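A minimal sketch using the airline dataset from sktime.datasets (not from the original docs; any univariate pd.Series works the same way):

    from sktime.datasets import load_airline
    from sktime.forecasting.model_selection import temporal_train_test_split

    y = load_airline()  # 144 monthly observations

    # keep the last 36 observations as the test set; no shuffling is performed
    y_train, y_test = temporal_train_test_split(y, test_size=36)
    print(len(y_train), len(y_test))  # 108 36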