kedro.pipeline.Pipeline

class kedro.pipeline.Pipeline(nodes, name=None)

A Pipeline defined as a collection of Node objects. This class treats nodes as part of a graph representation and provides inputs, outputs and execution order.

__init__(nodes, name=None)

Initialise Pipeline with a list of Node instances.

Parameters:
  • nodes (Iterable[Union[Node, Pipeline]]) – The list of nodes the Pipeline will be made of. If you provide pipelines among the list of nodes, those pipelines will be expanded and all their nodes will become part of this new pipeline.
  • name (Optional[str]) – The name of the pipeline. If specified, this name will be used to tag all of the nodes in the pipeline.
Raises:
  • ValueError – When an empty list of nodes is provided, or when not all nodes have unique names.
  • CircularDependencyError – When visiting all the nodes is not possible due to the existence of a circular dependency.
  • OutputNotUniqueError – When multiple Node instances produce the same output.
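The conditions above can be illustrated with a minimal standalone sketch. It does not use Kedro and is not Kedro's implementation: nodes are represented by hypothetical (name, inputs, outputs) tuples, and the helper validate is an assumption introduced only for illustration. It reports problems as strings rather than raising the exceptions listed above.

```python
from collections import Counter
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def validate(nodes):
    """Return a list of problem descriptions; an empty list means all checks pass.

    Each node is a hypothetical (name, inputs, outputs) tuple, not a Kedro Node.
    """
    problems = []
    if not nodes:
        problems.append("empty list of nodes")

    # Every node name must occur exactly once.
    name_counts = Counter(name for name, _, _ in nodes)
    if any(count > 1 for count in name_counts.values()):
        problems.append("node names are not unique")

    # Every output must be produced by exactly one node.
    output_counts = Counter(out for _, _, outs in nodes for out in outs)
    if any(count > 1 for count in output_counts.values()):
        problems.append("an output is produced by more than one node")

    # Map each node to the nodes whose outputs it consumes, then look for a cycle.
    producer = {out: name for name, _, outs in nodes for out in outs}
    graph = {
        name: {producer[i] for i in ins if i in producer}
        for name, ins, _ in nodes
    }
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        problems.append("circular dependency")
    return problems


valid = [("a", ["x"], ["y"]), ("b", ["y"], ["z"])]
cyclic = [("a", ["z"], ["y"]), ("b", ["y"], ["z"])]
clashing = [("a", ["x"], ["y"]), ("b", ["x"], ["y"])]
print(validate(valid), validate(cyclic), validate(clashing))
```

In the cyclic case each node consumes the other's output, so no node can be visited first, which is the situation CircularDependencyError describes.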

Example:

from kedro.pipeline import Pipeline
from kedro.pipeline import node

# In the following scenario, first_ds and second_ds
# are data sets provided by io. The Pipeline passes these
# data sets to the first_node function and provides its result
# to second_node as input.

def first_node(first_ds, second_ds):
    return dict(third_ds=first_ds+second_ds)

def second_node(third_ds):
    return third_ds

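pipeline = Pipeline([
    # first_node returns a dict, so its outputs are declared as a dict
    # mapping the returned key to the data set name.
    node(first_node, ['first_ds', 'second_ds'], dict(third_ds='third_ds')),
    node(second_node, dict(third_ds='third_ds'), 'fourth_ds')])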

pipeline.describe()

Methods

__init__(nodes[, name]) Initialise Pipeline with a list of Node instances.
all_inputs() All inputs for all nodes in the pipeline.
all_outputs() All outputs of all nodes in the pipeline.
data_sets() The names of all data sets used by the Pipeline, including inputs and outputs.
decorate(*decorators) Create a new Pipeline by applying the provided decorators to all the nodes in the pipeline.
describe([names_only]) Obtain the order of execution and expected free input variables in a loggable pre-formatted string.
from_inputs(*inputs) Create a new Pipeline object with the nodes which depend directly or transitively on the provided inputs.
from_nodes(*node_names) Create a new Pipeline object with the nodes which depend directly or transitively on the provided nodes.
inputs() The names of free inputs that must be provided at runtime so that the pipeline is runnable.
only_nodes(*node_names) Create a new Pipeline which will contain only the specified nodes by name.
only_nodes_with_inputs(*inputs) Create a new Pipeline object with the nodes which depend directly on the provided inputs.
only_nodes_with_outputs(*outputs) Create a new Pipeline object with the nodes which are directly required to produce the provided outputs.
only_nodes_with_tags(*tags) Create a new Pipeline object with the nodes which contain any of the provided tags.
outputs() The names of outputs produced when the whole pipeline is run.
to_json() Return a JSON representation of the pipeline.
to_nodes(*node_names) Create a new Pipeline object with the nodes required directly or transitively by the provided nodes.
to_outputs(*outputs) Create a new Pipeline object with the nodes which are directly or transitively required to produce the provided outputs.
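The set relationships behind inputs(), outputs() and from_inputs() can be sketched without Kedro. The following is a minimal illustration, not the library's code: nodes are hypothetical (name, inputs, outputs) tuples mirroring the example pipeline, and free_inputs, final_outputs and downstream_of are made-up helper names. It assumes that free inputs are the inputs no node produces and that final outputs are the outputs no node consumes.

```python
# Hypothetical stand-ins for nodes: (name, inputs, outputs). Not Kedro objects.
NODES = [
    ("first_node", {"first_ds", "second_ds"}, {"third_ds"}),
    ("second_node", {"third_ds"}, {"fourth_ds"}),
]


def all_inputs(nodes):
    """Union of the inputs of every node."""
    return set().union(*(ins for _, ins, _ in nodes))


def all_outputs(nodes):
    """Union of the outputs of every node."""
    return set().union(*(outs for _, _, outs in nodes))


def free_inputs(nodes):
    """Inputs that no node produces, so they must be supplied at runtime."""
    return all_inputs(nodes) - all_outputs(nodes)


def final_outputs(nodes):
    """Outputs that no node consumes."""
    return all_outputs(nodes) - all_inputs(nodes)


def downstream_of(nodes, *inputs):
    """Names of nodes that depend directly or transitively on the given inputs."""
    reached = set(inputs)
    selected = []
    changed = True
    while changed:
        changed = False
        for name, ins, outs in nodes:
            if name not in selected and ins & reached:
                selected.append(name)
                reached |= outs  # outputs of a selected node become reachable
                changed = True
    return selected


print(free_inputs(NODES), final_outputs(NODES), downstream_of(NODES, "third_ds"))
```

Restricting downstream_of to a single pass over the nodes, without adding outputs to the reached set, would give the direct-only behaviour described for only_nodes_with_inputs.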

Attributes

grouped_nodes Return a list of the pipeline nodes in topologically ordered groups, i.e. if node A needs to be run before node B, it will appear in an earlier group.
name Get the pipeline name.
node_dependencies All pairs of nodes where the first Node has a direct dependency on the second Node.
nodes Return a list of the pipeline nodes in topological order, i.e. if node A needs to be run before node B, it will appear earlier in the list.
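To make "topologically ordered groups" concrete, here is a small sketch built on the standard library's graphlib module rather than on Kedro. The graph and the helper grouped are hypothetical: the graph maps each node name to the names it directly depends on, in the spirit of node_dependencies.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency graph: node name -> names of nodes it directly depends on.
GRAPH = {
    "clean": set(),
    "features": {"clean"},
    "report": {"clean"},
    "train": {"features"},
}


def grouped(graph):
    """Return node names in groups; each group depends only on earlier groups."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose dependencies are all done
        groups.append(ready)
        sorter.done(*ready)
    return groups


groups = grouped(GRAPH)
# Flattening the groups gives one valid topological order of the nodes.
flat = [name for group in groups for name in group]
print(groups)
print(flat)
```

Nodes in the same group do not depend on each other, which is why a grouping like this is useful when deciding what could run side by side.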