scale#

Forecasting at Scale

Inversion of control of the Dask client that connects to, authenticates with and uses for all Divina pipeline computations remote Dask clusters on AWS, GCP, Azure and more via Dask Cloud provider is enabled through the provision of the dask_configuration argument to a Divina pipeline’s fit and predict methods.

Below is an example of a pipeline running on AWS.

import os

import dask.dataframe as dd
import pandas as pd

from divina import Divina
from divina.divina.pipeline.utils import DaskConfiguration

os.environ['AWS_ACCESS_KEY_ID'] = 'your access key'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'your secret key'

dask_configuration = DaskConfiguration(destination='aws', num_workers=3)

example_data = pd.DataFrame(
    data=[
        ["2011-01-01", 3, 6],
        ["2011-01-02", 2, 4],
        ["2011-01-03", 8, 6],
        ["2011-01-04", 1, 1],
        ["2011-01-05", 2, 3],
    ],
    columns=["a", "b", "c"],
)

example_data_dask = dd.from_pandas(example_data, npartitions=1)

example_pipeline = Divina(target="c", time_index="a", frequency="D")

y_hat_insample = example_pipeline.fit(
    example_data_dask,
    dask_configuration=dask_configuration)[0].causal_validation.predictions

y_hat_out_of_sample = example_pipeline.predict(
    example_data_dask.drop(columns="c"),
    dask_configuration=dask_configuration
).causal_predictions.predictions