Kedro architecture overview¶
Note: This documentation is based onKedro 0.17.0
, if you spot anything that is incorrect then please create an issue or pull request.
Here is a diagram showing high-level architecture of Kedro library:
Kedro architecture diagram
Note: The arrowA -> B
on the diagram above should be read as: “A is aware of existence of B or directly uses B, while B does not necessarily know about A”.
Note: Internally we use import-linter to enforce this structure and generally want the lower layer to be unaware of the above layers.
Building blocks¶
The architecture diagram above is formed of the following building blocks:
Project¶
This section contains the building blocks that help accessing core components of the Kedro project (e.g., the data catalog or the pipeline).
cli.py
¶
A Python file that contains project specific CLI commands (e.g., kedro run
, kedro test
, etc.). This file must be located at the root of the project.
run.py
¶
A Python file located in src/<python_package>/run.py
. This file serves as the main entry point of the project and is used to run the project in package mode.
pyproject.toml
¶
pyproject.yml
identifies the project root, which is used by other Kedro components and contains the following metadata entries:
package_name
: A valid Python package name for your project packageproject_name
: A human readable name for your projectproject_version
: Kedro version with which the project was generatedsource_dir
: (Optional) The directory of the source path relative to the project root path. Default directory issrc/
and when customised the path should be separated by a forward slash (e.gsrc/<path_to_src>/
)
pyproject.toml
must be located at the root of the project. We use the pyproject.toml
for all configuration needed to create the Python package and allow the Kedro CLI to find where the source code is.
settings.py
¶
We use the settings.py
for all project settings, which will not change at run time, but at development time. settings.py
contains the following configuration entries:
DISABLE_HOOKS_FOR_PLUGINS
: (Optional,tuple
) A list of the installed plugins for which to disable auto-registryHOOKS
: (Optional,tuple
) A list of instances of Hooks implementations to be registered with the project (default is an instance of<python_project>.hooks.ProjectHooks
)SESSION_STORE_CLASS
: (Optional,class
) Define a store class to be used to save data from aKedroSession
SESSION_STORE_ARGS
: (Optional,dict
) Define keyword arguments to be passed toSESSION_STORE_CLASS
constructorCONTEXT_CLASS
: (Optional,class
) Define a project context class to be used at runtime. Defaults toKedroContext
class
00-kedro-init.py
¶
This script is automatically invoked at IPython kernel startup when calling kedro jupyter notebook
, kedro jupyter lab
and kedro ipython
CLI commands. 00-kedro-init.py
creates an instance of KedroContext
object, which can be used to interact with the current project right away.
Framework¶
This section contains the building blocks that help running native Kedro and plugin CLI commands.
kedro cli
¶
Any Kedro CLI command (e.g., kedro new
, kedro run
) executed by a user.
kedro/cli/cli.py
¶
A Python file that contains Kedro global CLI commands, i.e. the ones that do not require a Kedro project in the current working directory to work (e.g., kedro new
, kedro info
).
plugins
¶
Any CLI command that is implemented by a Kedro plugin (e.g., Kedro-Docker, Kedro-Airflow, Kedro-Viz).
get_project_context()
¶
A python function that instantiates the project context by calling load_context()
. get_project_context()
also preserves backwards compatible interface to ensure old versions of the plugins continue working even if breaking changes are introduced in load_context()
.
Note: This function is intended for plugin use only. It is now deprecated and will be removed in version 0.18.0. To instantiate the project context outside of a plugin, you should callKedroSession.load_context()
.
load_context()
¶
Note: This function is now deprecated in favour ofKedroSession.load_context()
and will be removed in Kedro 0.18.0.
A Python function that locates Kedro project based on pyproject.toml
and instantiates the project context.
KedroSession
¶
KedroSession
is the object that is responsible for managing the lifecycle of a Kedro run.
KedroContext
¶
The base class for project context implementations. It holds the configuration and Kedro’s main functionality, and also serves the purpose of the main entry point for interactions with the core project components.
Library¶
This section lists core Kedro components. These library components can be used both in conjunction and separately.
ConfigLoader
¶
Helper class that enables loading the project configuration in a consistent way.
Pipeline
¶
A collection of Node
objects with the preserved execution order.
AbstractRunner
¶
A base class for all Pipeline
runner implementations.
DataCatalog
¶
A dataset store providing load
and save
capabilities for the underlying datasets.
AbstractDataSet
¶
A base class for all dataset implementations.