sktime.benchmarking.evaluation

class sktime.benchmarking.evaluation.Evaluator(results)[source]

Bases: object

Analyze results of machine learning experiments.

evaluate(metric, train_or_test='test', cv_fold='all')[source]

Calculates the average prediction error per estimator as well as the prediction error achieved by each estimator on individual datasets.
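A minimal usage sketch, assuming that results comes from a previously run sktime benchmarking experiment (e.g. via an orchestrator) and that a metric wrapper such as PairwiseMetric from sktime.benchmarking.metrics is available; both of these are assumptions, not part of the signature above.

    # Hypothetical usage sketch: `results` is assumed to come from a completed
    # benchmarking run; PairwiseMetric is an assumed wrapper around a plain
    # sklearn scoring function.
    from sklearn.metrics import accuracy_score

    from sktime.benchmarking.evaluation import Evaluator
    from sktime.benchmarking.metrics import PairwiseMetric  # assumed import path

    evaluator = Evaluator(results)

    # Average prediction error per estimator, plus per-dataset errors,
    # computed on the test folds across all cv folds (the defaults).
    metrics_by_strategy = evaluator.evaluate(
        metric=PairwiseMetric(func=accuracy_score, name="accuracy"),
        train_or_test="test",
        cv_fold="all",
    )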

fit_runtime(unit='s', train_or_test='test', cv_fold='all')[source]

Calculates the average time taken to fit each strategy.

Parameters

unit (string, one of 's' for seconds, 'm' for minutes or 'h' for hours) – the unit in which run times are reported

Returns

run_times – average run times per estimator and strategy

Return type

pandas DataFrame
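Continuing the sketch above, fit times can be inspected like so (reporting in minutes rather than the default seconds):

    # Average fit times per estimator, reported in minutes.
    run_times = evaluator.fit_runtime(unit="m", train_or_test="test", cv_fold="all")
    print(run_times)  # pandas DataFrame of average run times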

friedman_test(metric_name=None)[source]

The Friedman test is a non-parametric statistical test used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row (or block) together, then considering the values of the ranks by column. The implementation used is scipy.stats.
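As a standalone illustration of the underlying test (the Evaluator method is assumed to wrap an equivalent scipy call), scores are compared across estimators with one column per estimator and one row per dataset:

    # Friedman test on a blocks-by-groups table of scores:
    # rows = datasets (blocks), columns = estimators (treatments).
    import numpy as np
    from scipy.stats import friedmanchisquare

    rng = np.random.default_rng(0)
    scores = rng.uniform(0.6, 0.9, size=(10, 3))  # 10 datasets, 3 estimators

    statistic, p_value = friedmanchisquare(scores[:, 0], scores[:, 1], scores[:, 2])
    print(f"Friedman chi-square = {statistic:.3f}, p = {p_value:.3f}")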

property metric_names[source]
property metrics[source]
property metrics_by_strategy[source]
property metrics_by_strategy_dataset[source]
nemenyi(metric_name=None)[source]

Post-hoc test, run if the friedman_test reveals statistical significance. For more information, see the Nemenyi test. The implementation used is scikit-posthocs.
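A sketch of the post-hoc step using scikit-posthocs directly; posthoc_nemenyi_friedman expects the same blocks-by-groups layout as the Friedman test above. How Evaluator.nemenyi wires this up internally is not shown here, so treat this as illustrative.

    # Pairwise Nemenyi comparisons after a significant Friedman test.
    import numpy as np
    import pandas as pd
    import scikit_posthocs as sp

    rng = np.random.default_rng(0)
    scores = pd.DataFrame(
        rng.uniform(0.6, 0.9, size=(10, 3)),
        columns=["strategy_a", "strategy_b", "strategy_c"],
    )  # rows = datasets, columns = estimators

    pairwise_pvalues = sp.posthoc_nemenyi_friedman(scores)
    print(pairwise_pvalues)  # symmetric DataFrame of pairwise p-values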

plot_boxplots(metric_name=None, **kwargs)[source]

Box plot of the given metric.

plot_critical_difference_diagram(metric_name=None, alpha=0.1)[source]

Plot critical difference diagrams.

Original implementation by Aaron Bostrom, modified by Markus Löning.
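Continuing the usage sketch from evaluate above, both plotting helpers take the metric name used during evaluation ("accuracy" here is only illustrative):

    # Illustrative plotting calls; "accuracy" is the example metric name
    # passed to evaluate() earlier.
    evaluator.plot_boxplots(metric_name="accuracy")
    evaluator.plot_critical_difference_diagram(metric_name="accuracy", alpha=0.05)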

rank(metric_name=None, ascending=False)[source]

Calculates the average ranks based on the performance of each estimator on each dataset.
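The averaging of ranks can be reproduced with plain pandas, which gives a reasonable mental model of the computation (assuming that with ascending=False the best score on each dataset gets rank 1):

    # Average ranks: rank estimators within each dataset, then average
    # the ranks over datasets.
    import pandas as pd

    scores = pd.DataFrame(
        {"strategy_a": [0.81, 0.75, 0.90], "strategy_b": [0.79, 0.80, 0.88]},
        index=["dataset_1", "dataset_2", "dataset_3"],
    )
    # ascending=False: the highest score on each dataset gets rank 1.
    avg_ranks = scores.rank(axis=1, ascending=False).mean(axis=0)
    print(avg_ranks)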

ranksum_test(metric_name=None)[source]

Non-parametric test for consistent differences between pairs of observations. The test counts the number of observations that are greater than, smaller than, or equal to the mean. See http://en.wikipedia.org/wiki/Wilcoxon_rank-sum_test for details.
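A standalone illustration with scipy.stats.ranksums on two estimators' per-dataset scores; the Evaluator method presumably applies such a test per pair of estimators.

    # Wilcoxon rank-sum test between two estimators' score vectors.
    import numpy as np
    from scipy.stats import ranksums

    scores_a = np.array([0.81, 0.75, 0.90, 0.85, 0.78])
    scores_b = np.array([0.79, 0.70, 0.88, 0.80, 0.74])

    statistic, p_value = ranksums(scores_a, scores_b)
    print(f"rank-sum statistic = {statistic:.3f}, p = {p_value:.3f}")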

sign_test(metric_name=None)[source]

Non-parametric test for consistent differences between pairs of observations. See https://en.wikipedia.org/wiki/Sign_test for details about the test and https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.binom_test.html for details about the scipy implementation.
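A sketch of the sign test via the binomial test referenced above: count the positive paired differences and test that count against Binomial(n, 0.5). The linked docs refer to scipy.stats.binom_test; newer scipy versions expose binomtest instead, which is used here.

    # Sign test sketch: under H0, positive and negative paired differences
    # are equally likely, so the number of positives is Binomial(n, 0.5).
    import numpy as np
    from scipy.stats import binomtest  # older scipy: scipy.stats.binom_test

    scores_a = np.array([0.81, 0.75, 0.90, 0.85, 0.78])
    scores_b = np.array([0.79, 0.70, 0.88, 0.80, 0.74])

    diffs = scores_a - scores_b
    diffs = diffs[diffs != 0]            # drop ties
    n_positive = int((diffs > 0).sum())

    result = binomtest(n_positive, n=len(diffs), p=0.5)
    print(f"sign test p-value = {result.pvalue:.3f}")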

t_test(metric_name=None)[source]

Runs a t-test on all possible pairwise combinations of estimators.
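A sketch of pairwise t-tests over all estimator combinations; whether Evaluator uses the independent or the paired variant is not stated here, so scipy.stats.ttest_ind is only an illustrative choice.

    # Pairwise t-tests over all combinations of estimators.
    from itertools import combinations

    import numpy as np
    from scipy.stats import ttest_ind

    scores = {
        "strategy_a": np.array([0.81, 0.75, 0.90, 0.85, 0.78]),
        "strategy_b": np.array([0.79, 0.70, 0.88, 0.80, 0.74]),
        "strategy_c": np.array([0.83, 0.77, 0.86, 0.82, 0.80]),
    }

    for (name_x, x), (name_y, y) in combinations(scores.items(), 2):
        statistic, p_value = ttest_ind(x, y)
        print(f"{name_x} vs {name_y}: t = {statistic:.3f}, p = {p_value:.3f}")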

t_test_with_bonferroni_correction(metric_name=None, alpha=0.05)[source]

T-test with a Bonferroni correction, which is used to counteract the multiple comparisons problem. See https://en.wikipedia.org/wiki/Bonferroni_correction for details.
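The correction itself is simple: with m pairwise comparisons, each p-value is compared against alpha / m (equivalently, p-values are multiplied by m and capped at 1). A minimal sketch:

    # Bonferroni correction over m pairwise comparisons.
    alpha = 0.05
    p_values = [0.012, 0.030, 0.200]   # e.g. from the pairwise t-tests above
    m = len(p_values)

    adjusted = [min(p * m, 1.0) for p in p_values]
    significant = [p < alpha / m for p in p_values]
    print(adjusted, significant)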

wilcoxon_test(metric_name=None)[source]

Wilcoxon signed-rank test. Tests whether two related paired samples come from the same distribution; in particular, it tests whether the distribution of the differences x - y is symmetric about zero. See http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test for details.
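A standalone illustration with scipy.stats.wilcoxon on paired score vectors; the Evaluator method is assumed to apply it per pair of estimators.

    # Wilcoxon signed-rank test on paired per-dataset scores.
    import numpy as np
    from scipy.stats import wilcoxon

    scores_a = np.array([0.81, 0.75, 0.90, 0.85, 0.78, 0.83])
    scores_b = np.array([0.79, 0.70, 0.88, 0.80, 0.74, 0.84])

    statistic, p_value = wilcoxon(scores_a, scores_b)
    print(f"Wilcoxon statistic = {statistic:.3f}, p = {p_value:.3f}")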