divina

Date: Feb 07, 2023


divina is an open-source, BSD-3-licensed Python library that provides scalable and hyper-interpretable causal forecasting capabilities and can also be consumed via a CLI.

The aim of divina is to deliver performance-oriented and hyper-interpretable exogenous time series forecasting models by producing accurate, bootstrapped predictions; local, overridable factor summaries; and easily configurable feature engineering and experiment management capabilities.
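As an illustration of what "bootstrapped predictions" can mean, the sketch below refits a simple model on rows resampled with replacement and reads a prediction interval off the spread of the refit predictions. It is a generic, pure-Python sketch, not divina's implementation: the one-feature linear model, the helper names `fit_line` and `bootstrap_predictions`, and the 200 resamples are all assumptions chosen for illustration. The data reuse columns `b` (feature) and `c` (target) from the Getting Started example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (hypothetical helper)."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # A resample can contain a single repeated x value; fall back to a flat line.
    slope = sxy / sxx if sxx else 0.0
    return y_mean - slope * x_mean, slope


def bootstrap_predictions(xs, ys, x_new, n_boot=200, seed=0):
    """Refit on row resamples (drawn with replacement) and collect predictions at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        intercept, slope = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(intercept + slope * x_new)
    return preds


xs = [3, 2, 8, 1, 2]  # feature "b" from the Getting Started data
ys = [6, 4, 6, 1, 3]  # target "c"

preds = bootstrap_predictions(xs, ys, x_new=4)
point = statistics.median(preds)
cuts = statistics.quantiles(preds, n=20)  # 19 cut points at 5%, 10%, ..., 95%
lower, upper = cuts[0], cuts[-1]  # 5th and 95th percentiles form a 90% interval
print(f"median={point:.2f}, 90% interval=({lower:.2f}, {upper:.2f})")
```

With only five rows the interval is wide; the point of the sketch is the mechanism (refit, predict, take percentiles), not the numbers.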

Ensemble Architecture

divina is essentially a convenience wrapper for training, predicting with, validating and deploying an ensemble that consists of a causal, interpretable model boosted by an endogenous time-series model. This design allows for high levels of automation and accuracy while still emphasizing, and relying on, the causal relationships discovered by the user. The ensemble is delivered with swappable model types so that it can suit many different kinds of forecasting problems. divina is also fully integrated with both Dask and Prefect, so distributed compute and pipeline orchestration can be enabled with the flip of a switch. For more information on divina's features, check out the quickstart page.
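One common way to realize a causal model "boosted" by an endogenous time-series model is residual stacking: fit an interpretable model on the exogenous features, then fit an autoregressive model to the residuals it leaves behind and add its correction to the forecast. The pure-Python sketch below assumes a one-feature linear regression as the causal model and an AR(1) model as the endogenous booster; these model choices and all helper names are hypothetical and are neither divina's API nor its documented internals.

```python
import statistics


def fit_line(xs, ys):
    """Causal stage (hypothetical): OLS fit of y = intercept + slope * x."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return y_mean - slope * x_mean, slope


def fit_ar1(residuals):
    """Endogenous stage (hypothetical): least-squares AR(1), r_t ~ phi * r_{t-1}."""
    num = sum(cur * prev for cur, prev in zip(residuals[1:], residuals[:-1]))
    den = sum(prev * prev for prev in residuals[:-1])
    return num / den if den else 0.0


xs = [3, 2, 8, 1, 2]  # exogenous feature, in time order
ys = [6, 4, 6, 1, 3]  # target, in time order

# Stage 1: interpretable model on the exogenous feature.
intercept, slope = fit_line(xs, ys)
causal_fit = [intercept + slope * x for x in xs]

# Stage 2: autoregressive model on what the causal model missed.
residuals = [y - f for y, f in zip(ys, causal_fit)]
phi = fit_ar1(residuals)


def forecast_next(x_next, last_residual):
    """Return (causal-only forecast, causal forecast plus AR(1) residual correction)."""
    causal = intercept + slope * x_next
    return causal, causal + phi * last_residual


causal_only, boosted = forecast_next(4, residuals[-1])
print(f"causal={causal_only:.3f}, boosted={boosted:.3f}, phi={phi:.3f}")
```

Because the correction is added on top of the causal prediction, the causal coefficient (`slope`) stays directly readable, which is what keeps an ensemble of this shape interpretable.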

Installation

divina is available on PyPI and can be installed with the Python package manager pip:

pip install divina

Getting Started

To train and predict with a divina pipeline, we first create a pandas DataFrame of dummy data, convert it to a Dask DataFrame, and call the pipeline's fit() method. Once fit, the pipeline can be used to predict on out-of-sample feature sets.

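import dask.dataframe as dd
import pandas as pd

from divina import Divina

# Five days of dummy data: "a" is the date, "b" an exogenous feature, "c" the target.
example_data = pd.DataFrame(
    data=[
        ["2011-01-01", 3, 6],
        ["2011-01-02", 2, 4],
        ["2011-01-03", 8, 6],
        ["2011-01-04", 1, 1],
        ["2011-01-05", 2, 3],
    ],
    columns=["a", "b", "c"],
)

# Convert to a Dask DataFrame; one partition is enough for this toy dataset.
example_data_dask = dd.from_pandas(example_data, npartitions=1)

# Forecast target "c", indexed by the date column "a" at daily ("D") frequency.
example_pipeline = Divina(target="c", time_index="a", frequency="D")

# Fit the pipeline and read the causal model's validation predictions from the first result.
y_hat_insample = example_pipeline.fit(example_data_dask)[
    0
].causal_validation.predictions

# Predict from features alone: the target column "c" is dropped before calling predict().
# The training rows are reused here for illustration.
y_hat_out_of_sample = example_pipeline.predict(
    example_data_dask.drop(columns="c")
).causal_predictions.predictions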
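The example above predicts on the training rows with the target dropped. For genuinely out-of-sample prediction, the feature set would cover dates after the training window. The sketch below builds such a frame with pandas; the three future dates follow from the data, but the values for feature `b` are hypothetical, and the sketch assumes that future values of exogenous features are known or planned in advance.

```python
import pandas as pd

# In-sample data, mirroring the Getting Started example (dates stored as strings).
example_data = pd.DataFrame(
    data=[
        ["2011-01-01", 3, 6],
        ["2011-01-02", 2, 4],
        ["2011-01-03", 8, 6],
        ["2011-01-04", 1, 1],
        ["2011-01-05", 2, 3],
    ],
    columns=["a", "b", "c"],
)

# The next three daily timestamps after the last training date.
last_date = pd.to_datetime(example_data["a"]).max()
future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=3, freq="D")

# Hypothetical planned values for feature "b"; the target "c" is unknown and omitted.
future_features = pd.DataFrame(
    {
        "a": future_dates.strftime("%Y-%m-%d"),  # keep the same string date format
        "b": [4, 5, 2],
    }
)
print(future_features)
```

The resulting frame could then be converted with `dd.from_pandas(future_features, npartitions=1)` and passed to `example_pipeline.predict()` in the same way as above; whether divina needs any further configuration to forecast dates beyond the training range is not covered here.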