---
title: GEFCom2012
keywords: fastai
sidebar: home_sidebar
summary: "Download the GEFCom2012 dataset."
description: "Download the GEFCom2012 dataset."
nb_path: "nbs/data_datasets__gefcom2012.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
%%html
<style> table {float:left} </style>
{% endraw %} {% raw %}
{% endraw %} {% raw %}
import matplotlib.pyplot as plt
import pandas as pd  # used by the plotting cells to format the date range

plt.rcParams['font.family'] = 'serif'
FONTSIZE = 22
{% endraw %} {% raw %}

class GEFCom2012[source]

GEFCom2012()

{% endraw %} {% raw %}
{% endraw %}

GEFCom2012-L

The GEFCom2012-L dataset was made available as part of a Kaggle competition.

The competition asked for hierarchical forecasts for 20 zones and for the system as a whole. The hierarchy requires that the zonal loads sum to the system load. Forecasts were evaluated with the Weighted Root Mean Square Error (WRMSE).
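As a rough illustration of the metric, the sketch below computes a weighted RMSE on a toy two-zone hierarchy. The normalization by the sum of the weights, the helper name `weighted_rmse`, and the weight values are assumptions made for the example, not the competition's official definition.

```python
import numpy as np

def weighted_rmse(y_true, y_pred, weights):
    """Weighted RMSE, assumed here to be sqrt(sum(w * err^2) / sum(w))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)
    sq_err = (y_true - y_pred) ** 2
    return float(np.sqrt(np.sum(weights * sq_err) / np.sum(weights)))

# Toy data: two zones over two hours, shape (zones, hours).
zone_true = np.array([[10.0, 12.0], [20.0, 18.0]])
zone_pred = np.array([[11.0, 12.0], [19.0, 20.0]])

# The system series is the sum of the zonal series, so summing the
# zonal forecasts yields a system forecast that respects the hierarchy.
system_true = zone_true.sum(axis=0)
system_pred = zone_pred.sum(axis=0)

y_true = np.vstack([zone_true, system_true])
y_pred = np.vstack([zone_pred, system_pred])

# Hypothetical weights: 1 per zonal observation, 2 per system observation.
weights = np.array([[1.0, 1.0], [1.0, 1.0], [2.0, 2.0]])

score = weighted_rmse(y_true, y_pred, weights)
print(f"WRMSE on the toy hierarchy: {score:.4f}")
```

With all weights equal, this reduces to the ordinary RMSE.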

The task was to forecast hourly loads for the twenty zones and the system. The dataset contains:

  • Hourly electricity load history for twenty zones from 2004-01-01 to 2008-06-30 for train.
  • Hourly temperature history for eleven weather stations from 2004-01-01 to 2008-06-30 for train.
  • List of US holidays.
  • Hourly forecast benchmark from 2008-07-01 to 2008-07-07 with evaluation weights.
  • Hourly electricity load test history for twenty zones from 2011-01-01 to 2012-07-07.
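A system-level series can be derived from zonal data by summing the zones at each timestamp. The sketch below does this on a toy frame in the long `unique_id`, `ds`, `y` layout; the values for zone 2 and the `'system'` label are made up for illustration, and it is an assumption that the loaded data needs this step at all.

```python
import pandas as pd

# Toy frame in the long (unique_id, ds, y) layout; zone 2 values are made up.
Y_toy = pd.DataFrame({
    'unique_id': [1, 1, 2, 2],
    'ds': pd.to_datetime(['2004-01-01 01:00', '2004-01-01 02:00'] * 2),
    'y': [16853.0, 16450.0, 100.0, 200.0],
})

# System load at each timestamp is the sum of the zonal loads.
system_df = Y_toy.groupby('ds', as_index=False)['y'].sum()

# Hypothetical identifier for the aggregated series.
system_df.insert(0, 'unique_id', 'system')
print(system_df)
```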
{% raw %}

class GEFCom2012_L[source]

GEFCom2012_L()

{% endraw %} {% raw %}
{% endraw %} {% raw %}
Y_df, X_df, benchmark_df = GEFCom2012_L.load('data')
Y_df.head()
100%|██████████| 11.6M/11.6M [00:00<00:00, 14.1MiB/s]
INFO:nixtla.data.datasets.utils:Successfully downloaded GEFCom2012.zip?dl=1, 11630887, bytes.
INFO:nixtla.data.datasets.utils:Decompressing zip file...
INFO:nixtla.data.datasets.utils:Successfully decompressed data/gefcom2012/GEFCom2012.zip?dl=1
   unique_id                  ds        y
0          1 2004-01-01 01:00:00  16853.0
1          1 2004-01-01 02:00:00  16450.0
2          1 2004-01-01 03:00:00  16517.0
3          1 2004-01-01 04:00:00  16873.0
4          1 2004-01-01 05:00:00  17064.0
{% endraw %} {% raw %}
Y_df, X_df, benchmark_df = GEFCom2012_L.load(directory='data')
Y_df = Y_df[Y_df.unique_id==1]  # keep zone 1 only

# Plot the last 365 hourly observations.
ds = Y_df.ds.values[-365:]
y_true = Y_df.y.values[-365:]

# Build the x-axis label from the date range that is actually plotted.
x_plot_min = pd.to_datetime(ds.min()).strftime('%B %d, %Y')
x_plot_max = pd.to_datetime(ds.max()).strftime('%B %d, %Y')
x_axis_str = f'Hours [{x_plot_min}  to  {x_plot_max}]'
y_axis_str = 'Load (MW)'

fig = plt.figure(figsize=(20, 4))
ax0 = plt.subplot2grid((1,1),(0, 0))
axs = [ax0]

axs[0].plot(ds, y_true, color='#628793', linewidth=0.4, label='true')
axs[0].tick_params(labelsize=FONTSIZE-4)
axs[0].set_xlabel(x_axis_str, fontsize=FONTSIZE)
axs[0].set_ylabel(y_axis_str, fontsize=FONTSIZE)
plt.title('GEFCom2012-L Zone=1', fontsize=FONTSIZE)
plt.grid()
fig.tight_layout()  # called after the axes exist so the layout is adjusted
plt.show()
{% endraw %}

GEFCom2012-W

The GEFCom2012-W dataset was made available as part of a Kaggle competition.

The task was to provide two-day-ahead hourly forecasts for the power generation of seven wind farms. The dataset contains:

  • Hourly wind power history for the seven farms from 2009-07-01 to 2010-12-31 for train.
  • Two days ahead wind forecasts for the seven farms from 2009-07-01 to 2012-06-26 for train.
  • Hourly wind power history for the seven farms from 2011-01-01 to 2012-06-28 for test.
  • Naive forecast benchmark from 2011-01-01 to 2012-06-28.
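For intuition about what a naive baseline looks like, the sketch below implements a persistence forecast, which repeats the last observed value over a 48-hour horizon, and scores it with RMSE. Persistence is one common naive baseline; that it matches the benchmark shipped with the dataset is an assumption, and the input values are made up.

```python
import numpy as np

def persistence_forecast(history, horizon=48):
    """Repeat the last observed value for every step of the horizon."""
    history = np.asarray(history, dtype=float)
    return np.full(horizon, history[-1])

def rmse(y_true, y_pred):
    """Root mean square error between two equally sized arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up wind power values for illustration only.
history = np.array([0.20, 0.35, 0.50])
future = np.full(48, 0.60)

forecast = persistence_forecast(history, horizon=48)
error = rmse(future, forecast)
print(f"Persistence RMSE on the toy series: {error:.3f}")
```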
{% raw %}

class GEFCom2012_W[source]

GEFCom2012_W()

{% endraw %} {% raw %}
{% endraw %} {% raw %}
Y_df, X_df, benchmark_df = GEFCom2012_W.load(directory='data')
Y_df = Y_df[:168]
X_df = X_df[:168]

fig = plt.figure(figsize=(15, 4))

x_plot = Y_df.ds.values
x_plot_min = pd.to_datetime(x_plot.min()).strftime('%B %d, %Y')
x_plot_max = pd.to_datetime(x_plot.max()).strftime('%B %d, %Y')
x_axis_str = f'Hours [{x_plot_min}  to  {x_plot_max}]'
y_axis_str = 'U Wind Component'

plt.plot(x_plot, X_df.u_lead12, label='12 lead')
plt.plot(x_plot, X_df.u_lead24, label='24 lead')
plt.plot(x_plot, X_df.u_lead36, label='36 lead')
plt.plot(x_plot, X_df.u_lead48, label='48 lead')
plt.xlabel(x_axis_str, fontsize=FONTSIZE)
plt.ylabel(y_axis_str, fontsize=FONTSIZE)
plt.title('GEFCom2012-W', fontsize=FONTSIZE)
plt.legend()
plt.grid()
plt.show()
{% endraw %}

GEFCom2012-W references

| Forecasting Method | 48H ahead RMSE |
|---|---|
| ALL-CF [3] | 0.14564 |
| GBM + K-Means + LR [2] | 0.14567 |
| KNN [1] | 0.1472 |
| SGCRF [4] | 0.1488 |
| LSBRT [5] | 0.1518 |
| SDAE-m-m [6] | 0.154 |
| S-GP-ENV [7] | 0.1604 |
| GP + NN [8] | 0.1752 |
| Naive Forecast | 0.361 |
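To put these scores in context, the sketch below computes each method's relative RMSE reduction against the naive forecast, using only the values in the table above. The `1 - RMSE / RMSE_naive` measure is a common convention chosen for this illustration, not something the referenced works necessarily report.

```python
# RMSE values copied from the table above.
rmse_by_method = {
    'ALL-CF': 0.14564,
    'GBM + K-Means + LR': 0.14567,
    'KNN': 0.1472,
    'SGCRF': 0.1488,
    'LSBRT': 0.1518,
    'SDAE-m-m': 0.154,
    'S-GP-ENV': 0.1604,
    'GP + NN': 0.1752,
}
naive_rmse = 0.361

# Relative improvement over the naive forecast: 1 - RMSE / RMSE_naive.
improvement = {
    name: 1.0 - value / naive_rmse
    for name, value in rmse_by_method.items()
}

# Method with the lowest RMSE in the table.
best = min(rmse_by_method, key=rmse_by_method.get)

for name, gain in sorted(improvement.items(), key=lambda kv: -kv[1]):
    print(f"{name:<20} {100 * gain:5.1f} % lower RMSE than naive")
```

On these numbers the best method, ALL-CF, reduces the naive RMSE by about 60%.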