kedro.runner.ParallelRunner

class kedro.runner.ParallelRunner[source]

Bases: kedro.runner.runner.AbstractRunner

ParallelRunner is an AbstractRunner implementation. It can be used to run the Pipeline in parallel groups formed by toposort.

Methods

ParallelRunner.__init__() Instantiates the runner by creating a Manager.
ParallelRunner.create_default_data_set(…) Factory method for creating the default data set for the runner.
ParallelRunner.run(pipeline, catalog) Run the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.
ParallelRunner.run_only_missing(pipeline, …) Run only the missing outputs from the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.
__init__()[source]

Instantiates the runner by creating a Manager.

create_default_data_set(ds_name, max_loads)[source]

Factory method for creating the default data set for the runner.

Parameters:
  • ds_name (str) – Name of the missing data set
  • max_loads (int) – Maximum number of times load method of the default data set is allowed to be invoked. Any number of calls is allowed if the argument is not set.
Return type:

AbstractDataSet

Returns:

An instance of an implementation of AbstractDataSet to be used for all unregistered data sets.

run(pipeline, catalog)

Run the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.

Parameters:
  • pipeline (Pipeline) – The Pipeline to run.
  • catalog (DataCatalog) – The DataCatalog from which to fetch data.
Raises:

ValueError – Raised when Pipeline inputs cannot be satisfied.

Return type:

Dict[str, Any]

Returns:

Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.

run_only_missing(pipeline, catalog)

Run only the missing outputs from the Pipeline using the DataSet``s provided by ``catalog and save results back to the same objects.

Parameters:
  • pipeline (Pipeline) – The Pipeline to run.
  • catalog (DataCatalog) – The DataCatalog from which to fetch data.
Raises:

ValueError – Raised when Pipeline inputs cannot be satisfied.

Return type:

Dict[str, Any]

Returns:

Any node outputs that cannot be processed by the DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.