---
title: Dataset Utils
keywords: fastai
sidebar: home_sidebar
summary: "Utils for dataset processing"
description: "Utils for dataset processing"
nb_path: "nbs/data_datasets__utils.ipynb"
---
{% raw %}
{% endraw %} {% raw %}
{% endraw %}

Download Utils

{% raw %}

download_file[source]

download_file(directory:str, source_url:str, decompress:bool=False)

Download data from source_url into directory.

Parameters

directory: str, Path
    Custom directory where data will be downloaded.
source_url: str
    URL where data is hosted.
decompress: bool
    Whether to decompress the downloaded file. Default False.
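The behaviour described above can be illustrated with a minimal sketch that uses only the standard library. `download_file_sketch` is a hypothetical stand-in, not the library's implementation; in particular, treating `decompress` as "unpack a zip archive into the same directory" and returning the downloaded path are assumptions made for illustration.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download_file_sketch(directory, source_url, decompress=False):
    """Hypothetical stand-in for download_file; not the library's code."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    # Name the local file after the last component of the URL.
    filename = source_url.split('/')[-1]
    filepath = directory / filename

    # Stream the response to disk (works for http(s):// and file:// URLs).
    with urllib.request.urlopen(source_url) as response, open(filepath, 'wb') as out:
        shutil.copyfileobj(response, out)

    # Assumption: "decompress" means unpacking a zip archive in place.
    if decompress and zipfile.is_zipfile(filepath):
        with zipfile.ZipFile(filepath) as archive:
            archive.extractall(directory)

    return filepath
```

Called as `download_file_sketch('data', source_url, decompress=True)`, this would leave both the archive and its extracted contents under `data/`.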

{% endraw %} {% raw %}
{% endraw %} {% raw %}

class Info[source]

Info(groups:Tuple[str], class_groups:Tuple[dataclass])

Info Dataclass of datasets.

Args:
    groups (Tuple): Tuple of str groups.
    class_groups (Tuple): Tuple of dataclasses.

{% endraw %} {% raw %}
{% endraw %} {% raw %}

class TimeSeriesDataclass[source]

TimeSeriesDataclass(S:DataFrame, X:DataFrame, Y:DataFrame, idx_categorical_static:Optional[List[T]]=None, group:Union[str, List[str]]=None)

Args:
    S (pd.DataFrame): DataFrame of static features of shape (n_time_series, n_features).
    X (pd.DataFrame): DataFrame of exogenous variables of shape (sum n_periods_i for i=1..n_time_series, n_exogenous).
    Y (pd.DataFrame): DataFrame of target variable of shape (sum n_periods_i for i=1..n_time_series, 1).
    idx_categorical_static (list, optional): List of categorical indexes of S.
    group (str, optional): Group name, if applicable. Example: 'Yearly'.
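To make the documented shapes concrete, the sketch below builds toy `S`, `X`, and `Y` frames for two series with 3 and 2 periods. `TimeSeriesSketch` is a hypothetical stand-in that only mirrors the documented fields. The index layout (`unique_id`, `ds`) and the column names `category`, `promo`, and `y` are assumptions for illustration, not requirements stated in the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

import pandas as pd


@dataclass
class TimeSeriesSketch:
    """Hypothetical stand-in mirroring the documented fields."""
    S: pd.DataFrame
    X: pd.DataFrame
    Y: pd.DataFrame
    idx_categorical_static: Optional[List[int]] = None
    group: Union[str, List[str], None] = None


# Two series, 'a' with 3 periods and 'b' with 2, so X and Y have 3 + 2 = 5 rows.
ids = ['a'] * 3 + ['b'] * 2
dates = (list(pd.date_range('2020-01-01', periods=3))
         + list(pd.date_range('2020-01-01', periods=2)))
index = pd.MultiIndex.from_arrays([ids, dates], names=['unique_id', 'ds'])

# S: one row per series -> shape (n_time_series, n_features) = (2, 1).
S = pd.DataFrame({'category': [0, 1]}, index=['a', 'b'])
# X: one row per (series, period) -> shape (5, n_exogenous) = (5, 1).
X = pd.DataFrame({'promo': [0, 1, 0, 0, 1]}, index=index)
# Y: one row per (series, period) -> shape (5, 1).
Y = pd.DataFrame({'y': [1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)

# Column 0 of S ('category') is flagged as categorical.
data = TimeSeriesSketch(S=S, X=X, Y=Y, idx_categorical_static=[0], group='Yearly')
```

The row count of `X` and `Y` is the sum of the per-series period counts, while `S` has exactly one row per series.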

{% endraw %} {% raw %}
{% endraw %}
{% raw %}

get_holiday_dates[source]

get_holiday_dates(holiday, dates)

{% endraw %} {% raw %}

holiday_kernel[source]

holiday_kernel(holiday, dates)

{% endraw %} {% raw %}

create_calendar_variables[source]

create_calendar_variables(X_df:DataFrame)

{% endraw %} {% raw %}

create_us_holiday_distance_variables[source]

create_us_holiday_distance_variables(X_df:DataFrame)

{% endraw %} {% raw %}
{% endraw %} {% raw %}
import numpy as np
import pandas as pd  # needed for pd.date_range below
import matplotlib.pyplot as plt

ds = pd.date_range(start='2010-01-01', end='2012-12-31')
holiday_dist_new_year = holiday_kernel(holiday='new_year', dates=ds)
holiday_dist_independence = holiday_kernel(holiday='independence', dates=ds)

fig = plt.figure(figsize=(10,4))
plt.plot(ds, holiday_dist_new_year, label='new_year')
plt.plot(ds, holiday_dist_independence, label='independence')
plt.plot(ds, np.zeros(len(ds)))
plt.title('Holiday Kernels')
plt.grid()
plt.legend()
plt.show()
{% endraw %}