sktime.benchmarking.evaluation

class sktime.benchmarking.evaluation.Evaluator(results)

Bases: object

Analyze results of machine learning experiments.
evaluate(metric, train_or_test='test', cv_fold='all')

Calculates the average prediction error per estimator, as well as the prediction error achieved by each estimator on each individual dataset.
fit_runtime(unit='s', train_or_test='test', cv_fold='all')

Calculates the average time taken to fit each strategy.

Parameters
    unit (string) – the unit in which the runtime is reported: 's' for seconds, 'm' for minutes or 'h' for hours

Returns
    run_times – average runtimes per estimator and strategy

Return type
    pandas DataFrame
friedman_test(metric_name=None)

The Friedman test is a non-parametric statistical test used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row (or block) together, then considering the values of ranks by columns. Implementation used: scipy.stats.
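The mechanics of the test (not the Evaluator API itself) can be illustrated with scipy directly. The scores below are hypothetical error values, one row per dataset and one column per estimator:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical error scores: rows = datasets, columns = estimators.
scores = np.array([
    [0.21, 0.25, 0.30],
    [0.10, 0.14, 0.12],
    [0.35, 0.40, 0.41],
    [0.18, 0.22, 0.26],
    [0.27, 0.31, 0.29],
])

# The Friedman test ranks the estimators within each dataset (row)
# and checks whether the mean ranks differ across estimators (columns).
statistic, p_value = friedmanchisquare(*scores.T)
print(f"statistic={statistic:.3f}, p-value={p_value:.4f}")
```

Note that scipy requires at least three groups, which is why three estimator columns are passed.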
nemenyi(metric_name=None)

Post-hoc test, run if the Friedman test reveals statistical significance. For more information, see the Nemenyi test. Implementation used: scikit-posthocs.
plot_critical_difference_diagram(metric_name=None, alpha=0.1)

Plots a critical difference diagram. Original implementation by Aaron Bostrom, modified by Markus Löning.
rank(metric_name=None, ascending=False)

Calculates the average rank of each estimator based on its performance on each dataset.
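The rank-averaging idea can be sketched with pandas directly; the estimator names and accuracy scores below are hypothetical, not part of the Evaluator API:

```python
import pandas as pd

# Hypothetical accuracy scores: rows = datasets, columns = estimators.
scores = pd.DataFrame(
    {
        "knn":    [0.80, 0.75, 0.90],
        "forest": [0.85, 0.70, 0.95],
        "boss":   [0.82, 0.78, 0.91],
    },
    index=["dataset_1", "dataset_2", "dataset_3"],
)

# Rank estimators within each dataset (ascending=False: best score = rank 1),
# then average the ranks across datasets.
avg_ranks = scores.rank(axis=1, ascending=False).mean(axis=0)
print(avg_ranks.sort_values())
```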
ranksum_test(metric_name=None)

Non-parametric test for consistent differences between two samples of observations. The test pools the observations from both samples, ranks them, and compares the rank sums of the two samples. See http://en.wikipedia.org/wiki/Wilcoxon_rank-sum_test.
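A minimal sketch of the underlying test using scipy directly, with hypothetical error scores for two estimators:

```python
from scipy.stats import ranksums

# Hypothetical error scores of two estimators over independent runs.
errors_a = [0.12, 0.15, 0.11, 0.19, 0.14, 0.13]
errors_b = [0.22, 0.25, 0.21, 0.27, 0.24, 0.23]

# The rank-sum test pools both samples, ranks the pooled values,
# and compares the sum of ranks assigned to each sample.
statistic, p_value = ranksums(errors_a, errors_b)
print(f"statistic={statistic:.3f}, p-value={p_value:.4f}")
```

Here every value in `errors_a` is below every value in `errors_b`, so the test reports a clear difference.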
sign_test(metric_name=None)

Non-parametric test for consistent differences between pairs of observations. See https://en.wikipedia.org/wiki/Sign_test for details about the test and https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.binom_test.html for details about the scipy implementation.
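The idea can be sketched with scipy directly (using scipy.stats.binomtest, the current replacement for the deprecated binom_test linked above); the paired error scores below are hypothetical:

```python
from scipy.stats import binomtest

# Hypothetical paired error scores on the same datasets.
errors_a = [0.10, 0.12, 0.09, 0.15, 0.11, 0.13, 0.10, 0.14]
errors_b = [0.14, 0.13, 0.12, 0.16, 0.10, 0.17, 0.12, 0.18]

# Count datasets on which estimator A wins (lower error); under the null
# hypothesis of no difference, wins follow a Binomial(n, 0.5) distribution.
wins = sum(a < b for a, b in zip(errors_a, errors_b))
n = len(errors_a)
result = binomtest(wins, n, p=0.5)
print(f"wins={wins}/{n}, p-value={result.pvalue:.4f}")
```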
t_test_with_bonferroni_correction(metric_name=None, alpha=0.05)

Pairwise t-tests with a Bonferroni correction, used to counteract the problem of multiple comparisons. See https://en.wikipedia.org/wiki/Bonferroni_correction.
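The correction can be sketched with scipy directly; the estimator names and paired error scores below are hypothetical:

```python
from itertools import combinations
from scipy.stats import ttest_rel

# Hypothetical paired error scores of three estimators on the same datasets.
errors = {
    "knn":    [0.20, 0.22, 0.19, 0.25, 0.21, 0.23],
    "forest": [0.15, 0.18, 0.14, 0.19, 0.17, 0.18],
    "boss":   [0.21, 0.23, 0.20, 0.24, 0.22, 0.24],
}

alpha = 0.05
pairs = list(combinations(errors, 2))
# Bonferroni correction: divide the significance level by the number
# of pairwise comparisons to control the family-wise error rate.
corrected_alpha = alpha / len(pairs)

results = {}
for a, b in pairs:
    t, p = ttest_rel(errors[a], errors[b])
    results[(a, b)] = p
    print(f"{a} vs {b}: p={p:.4f}, significant={p < corrected_alpha}")
```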
wilcoxon_test(metric_name=None)

Wilcoxon signed-rank test. Tests whether two related paired samples come from the same distribution; in particular, it tests whether the distribution of the differences x - y is symmetric about zero. See http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test.
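The underlying test can be illustrated with scipy directly, using hypothetical paired error scores:

```python
from scipy.stats import wilcoxon

# Hypothetical paired error scores of two estimators on the same datasets.
errors_a = [0.10, 0.12, 0.09, 0.15, 0.11, 0.13, 0.10, 0.14]
errors_b = [0.11, 0.14, 0.12, 0.19, 0.16, 0.19, 0.17, 0.22]

# The signed-rank test ranks the absolute paired differences and checks
# whether the distribution of the differences is symmetric about zero.
statistic, p_value = wilcoxon(errors_a, errors_b)
print(f"statistic={statistic:.1f}, p-value={p_value:.4f}")
```

Here every difference is negative, so the statistic (the smaller rank sum) is zero and the p-value is small.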