---
title: M5 dataset
keywords: fastai
sidebar: home_sidebar
summary: "Download and evaluate the M5 dataset."
description: "Download and evaluate the M5 dataset."
nb_path: "nbs/data_datasets__m5.ipynb"
---
```python
# import path assumed from this page's nb_path (nbs/data_datasets__m5.ipynb)
from nixtlats.data.datasets.m5 import M5, M5Evaluation

# download the M5 dataset and load the target (Y), temporal exogenous (X),
# and static (S) DataFrames
Y_df, X_df, S_df = M5.load('./data')

# the dataset contains 30,490 bottom-level series
n_series = 30_490
assert Y_df['unique_id'].unique().size == n_series
assert X_df['unique_id'].unique().size == n_series
assert S_df.shape[0] == n_series
```
```python
Y_df.head()
```

```python
X_df.head()
```

```python
S_df.head()
```
The `evaluate` method of the `M5Evaluation` class can receive the URL of a submission to the M5 competition. The reference scores used to check the on-the-fly evaluation were obtained from the official competition evaluation.
```python
from fastcore.test import test_close

# evaluate the winning submission (YJ_STU) directly from its URL
m5_winner_url = 'https://github.com/Nixtla/m5-forecasts/raw/main/forecasts/0001 YJ_STU.zip'
winner_evaluation = M5Evaluation.evaluate('data', m5_winner_url)

# the on-the-fly evaluation should match the official score
test_close(winner_evaluation.loc['Total'].item(), 0.520, eps=1e-3)
winner_evaluation
```
The `evaluate` method can also receive a pandas DataFrame of forecasts.
```python
# download the second-place submission (Matthias) as a DataFrame
m5_second_place_url = 'https://github.com/Nixtla/m5-forecasts/raw/main/forecasts/0002 Matthias.zip'
m5_second_place_forecasts = M5Evaluation.load_benchmark('data', m5_second_place_url)

# evaluate the DataFrame of forecasts
second_place_evaluation = M5Evaluation.evaluate('data', m5_second_place_forecasts)

# the on-the-fly evaluation should match the official score
test_close(second_place_evaluation.loc['Total'].item(), 0.528, eps=1e-3)
second_place_evaluation
```
The evaluation metric of the Favorita Kaggle competition was the normalized weighted root mean squared logarithmic error (NWRMSLE). Perishable items have a score weight of 1.25; otherwise, the weight is 1.0.
{% raw %} $$ NWRMSLE = \sqrt{\frac{\sum^{n}_{i=1} w_{i}\left(\log(\hat{y}_{i}+1) - \log(y_{i}+1)\right)^{2}}{\sum^{n}_{i=1} w_{i}}} $$ {% endraw %}
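The formula translates directly into code. Below is a minimal NumPy sketch of NWRMSLE, not part of the library; the arrays `y`, `y_hat`, and `w` are hypothetical actuals, forecasts, and per-item score weights (1.25 for perishable items, 1.0 otherwise).

```python
import numpy as np

def nwrmsle(y, y_hat, w):
    """Normalized weighted root mean squared logarithmic error (sketch)."""
    # squared error in log space; log1p(x) computes log(x + 1)
    log_sq_err = (np.log1p(y_hat) - np.log1p(y)) ** 2
    # weighted mean of the squared log errors, then the square root
    return np.sqrt(np.sum(w * log_sq_err) / np.sum(w))

# hypothetical example: two perishable items (weight 1.25) and two others
y = np.array([3.0, 0.0, 12.0, 7.0])
y_hat = np.array([2.5, 1.0, 10.0, 8.0])
w = np.array([1.25, 1.25, 1.0, 1.0])
nwrmsle(y, y_hat, w)
```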
| Kaggle Competition Forecasting Methods | 16-day-ahead NWRMSLE |
|---|---|
| LGBM [1] | 0.5091 |
| Seq2Seq WaveNet [2] | 0.5129 |