Kedro 0.15

Introduction

  • Introduction
    • What is Kedro?
    • Learning about Kedro
    • Assumptions
      • Official Python programming language website
      • List of free programming books and tutorials

Getting Started

  • Installation prerequisites
    • macOS / Linux
    • Windows
    • Python virtual environments
      • Using conda
        • Create an environment with conda
        • Activate an environment with conda
        • Other conda commands
      • Alternatives to conda
  • Installation guide
  • Creating a new project
    • Create a new project interactively
    • Create a new project from a configuration file
    • Starting with an existing project
    • Install project dependencies
  • A “Hello World” example
    • Project directory structure
    • Project source code
      • Writing code
    • Project components
    • Data
    • Example pipeline
    • Configuration
      • Project-specific configuration
      • Sensitive or personal configuration
    • Running the example
    • Summary
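
The "Hello World" example above walks through nodes, pipelines, runners and the Data Catalog. As a quick illustration of how those components fit together, here is a minimal sketch, assuming Kedro 0.15's Python API (the greet function and the dataset names are illustrative, not taken from the docs):

    # A minimal "Hello World" pipeline: one node wired to one input.
    from kedro.io import DataCatalog, MemoryDataSet
    from kedro.pipeline import Pipeline, node
    from kedro.runner import SequentialRunner

    def greet(name):
        # Any plain Python function can become a node.
        return "Hello, {}!".format(name)

    # Register the node's input; the output is left unregistered,
    # so the runner returns it.
    catalog = DataCatalog({"name": MemoryDataSet(data="Kedro")})
    pipeline = Pipeline([node(greet, inputs="name", outputs="greeting")])

    print(SequentialRunner().run(pipeline, catalog))
    # {'greeting': 'Hello, Kedro!'}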

Tutorial

  • Typical Kedro workflow
    • Development workflow
      • 1. Set up the project template
      • 2. Set up the data
      • 3. Create the pipeline
      • 4. Package the project
    • Git workflow
      • Creating a project repository
      • Submitting your changes to GitHub
  • Kedro Spaceflights tutorial
    • Creating the tutorial project
      • Install project dependencies
      • Project configuration
  • Setting up the data
    • Adding your datasets to data
      • reviews.csv
      • companies.csv
      • shuttles.xlsx
    • Reference all datasets
    • Creating custom datasets
      • Contributing a custom dataset implementation
  • Creating a pipeline
    • Node basics
    • Assemble nodes into a pipeline
    • Persisting pre-processed data
    • Creating a master table
      • Working in a Jupyter notebook
      • Extending the project’s code
    • Working with multiple pipelines
    • Partial pipeline runs
      • Using pipeline name
      • Using tags
    • Using decorators for nodes and pipelines
      • Decorating the nodes
      • Decorating the pipeline
    • Kedro runners
  • Packaging a project
    • Add documentation to your project
    • Package your project
    • Manage project dependencies
    • Extend your project
    • What is next?
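
The Spaceflights tutorial above registers reviews.csv, companies.csv and shuttles.xlsx in the Data Catalog. In a project those entries live in conf/base/catalog.yml; the sketch below builds the same catalog programmatically, assuming Kedro 0.15's DataCatalog.from_config and the local dataset types listed in the API docs (file paths follow the tutorial's data/01_raw layout):

    from kedro.io import DataCatalog

    config = {
        "companies": {"type": "CSVLocalDataSet",
                      "filepath": "data/01_raw/companies.csv"},
        "reviews": {"type": "CSVLocalDataSet",
                    "filepath": "data/01_raw/reviews.csv"},
        "shuttles": {"type": "ExcelLocalDataSet",
                     "filepath": "data/01_raw/shuttles.xlsx"},
    }
    catalog = DataCatalog.from_config(config)

    # Returns a pandas.DataFrame, provided the file exists on disk.
    companies = catalog.load("companies")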

User Guide

  • Setting up Visual Studio Code
    • Advanced: For those using venv / virtualenv
    • Setting up tasks
    • Debugging
      • Advanced: Remote Interpreter / Debugging
  • Setting up PyCharm
    • Set up Run configurations
    • Debugging
    • Advanced: Remote SSH interpreter
  • Configuration
    • Local and base configuration
    • Loading
    • Additional configuration environments
    • Templating configuration
    • Parameters
      • Loading parameters
      • Specifying parameters at runtime
      • Using parameters
    • Credentials
      • AWS credentials
    • Configuring kedro run arguments
  • The Data Catalog
    • Using the Data Catalog within Kedro configuration
    • Specifying the location of the dataset
    • Using the Data Catalog with the YAML API
    • Adding parameters
    • Feeding in credentials
    • Loading multiple datasets that have similar configuration
    • Transcoding datasets
      • A typical example of transcoding
      • How does transcoding work?
    • Transforming datasets
      • Applying built-in transformers
      • Developing your own transformer
    • Versioning datasets and ML models
    • Using the Data Catalog with the Code API
    • Configuring a Data Catalog
    • Loading datasets
      • Behind the scenes
      • Viewing the available data sources
    • Saving data
      • Saving data to memory
      • Saving data to a SQL database for querying
      • Saving data in parquet
      • Creating your own dataset
  • Nodes
    • Creating a pipeline node
      • Node definition syntax
      • Syntax for input variables
      • Syntax for output variables
    • Tagging nodes
    • Running nodes
      • Applying decorators to nodes
      • Applying multiple decorators to nodes
  • Pipelines
    • Building pipelines
      • Tagging pipeline nodes
      • Merging pipelines
      • Fetching pipeline nodes
    • Developing modular pipelines
      • What are modular pipelines?
      • How do we construct modular pipelines?
        • Structure
        • Ease of use and portability
      • A modular pipeline example template
      • Configuration
      • Datasets
    • Connecting existing pipelines
    • Using a modular pipeline twice
    • Bad pipelines
      • Pipeline with bad nodes
      • Pipeline with circular dependencies
    • Running pipelines
      • Runners
      • Running a pipeline by name
      • Modifying a kedro run
      • Applying decorators on pipelines
    • Running pipelines with IO
    • Outputting to a file
    • Partial pipelines
      • Partial pipeline starting from inputs
      • Partial pipeline starting from nodes
      • Partial pipeline ending at nodes
      • Partial pipeline from nodes with tags
      • Running only some nodes
      • Recreating missing outputs
  • Logging
    • Configure logging
    • Use logging
    • Logging for anyconfig
  • Advanced IO
    • Error handling
    • AbstractDataSet
    • Versioning
      • version namedtuple
      • Versioning using the YAML API
      • Versioning using the Code API
      • Supported datasets
    • Partitioned dataset
      • Partitioned dataset definition
        • Dataset definition
        • Partitioned dataset credentials
      • Partitioned dataset load
      • Partitioned dataset save
      • Incremental loads with IncrementalDataSet
        • Incremental dataset load
        • Incremental dataset save
        • Incremental dataset confirm
        • Checkpoint configuration
        • Special checkpoint config keys
  • Working with PySpark
    • Initialising a SparkSession
    • Creating a SparkDataSet
      • Code API
      • YAML API
    • Working with PySpark and Kedro pipelines
  • Developing Kedro plugins
    • Overview
    • Initialisation
    • global and project commands
      • Suggested command convention
    • Working with click
    • Contributing process
    • Example of a simple plugin
    • Supported plugins
    • Community-developed plugins
  • Working with IPython and Jupyter Notebooks / Lab
    • Startup script
    • Working with context
      • Additional parameters for context.run()
    • Adding global variables
    • Working with IPython
      • Loading DataCatalog in IPython
    • Working from Jupyter
      • Idle notebooks
      • What if I cannot run kedro jupyter notebook?
      • Loading DataCatalog in Jupyter
      • Saving DataCatalog in Jupyter
      • Using parameters
      • Running the pipeline
      • Converting functions from Jupyter Notebooks into Kedro nodes
    • Extras
      • IPython loader
        • Installation
        • Prerequisites
  • Working with Databricks
    • Databricks Connect (recommended)
    • GitHub workflow with Databricks
  • Journal
    • Overview
      • Context journal record
      • Dataset journal record
    • Steps to manually reproduce your code and run the previous pipeline
  • Creating a new dataset
    • Scenario
    • Project setup
    • Problem
    • The anatomy of a dataset
    • Implement the _load method with fsspec
    • Implement the _save method with fsspec
    • Implement the _describe method
    • Bringing it all together
    • Integrating with PartitionedDataSet
    • Adding versioning
    • Thread-safety consideration
    • Handling credentials and different filesystems
    • Contribute your dataset to Kedro
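
The "Creating a new dataset" guide that closes the User Guide above describes the anatomy of a dataset: subclass AbstractDataSet and implement _load, _save and _describe. Here is a bare-bones skeleton, assuming Kedro 0.15's kedro.io.AbstractDataSet (the ImageDataSet name is illustrative, and the fsspec plumbing covered in the guide is omitted for brevity):

    from pathlib import Path
    from typing import Any, Dict

    from kedro.io import AbstractDataSet

    class ImageDataSet(AbstractDataSet):
        def __init__(self, filepath: str):
            self._filepath = Path(filepath)

        def _load(self) -> bytes:
            # Read and return the underlying data.
            return self._filepath.read_bytes()

        def _save(self, data: bytes) -> None:
            # Persist the data to the configured location.
            self._filepath.write_bytes(data)

        def _describe(self) -> Dict[str, Any]:
            # Used when the dataset is printed or logged.
            return dict(filepath=str(self._filepath))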

Resources

  • Frequently asked questions
    • What is Kedro?
    • What are the primary advantages of Kedro?
    • How does Kedro compare to other projects?
      • Kedro vs workflow schedulers
      • Kedro vs other ETL frameworks
    • How can I find out more about Kedro?
      • Articles, podcasts and talks
      • Kedro used on real-world use cases
      • Community interaction
    • What is the data engineering convention?
    • What version of Python does Kedro use?
    • How do I upgrade Kedro?
    • What best practice should I follow to avoid leaking confidential data?
    • What is the philosophy behind Kedro?
    • Where do I store my custom editor configuration?
    • How do I look up an API function?
    • How do I build documentation for my project?
    • How do I build documentation about Kedro?
  • Kedro architecture overview
    • Building blocks
      • Project
        • kedro_cli.py
        • run.py
        • .kedro.yml
        • 00-kedro-init.py
        • ProjectContext
      • Framework
        • kedro cli
        • kedro/cli/cli.py
        • plugins
        • get_project_context()
        • load_context()
        • KedroContext
      • Library
        • ConfigLoader
        • Pipeline
        • AbstractRunner
        • DataCatalog
        • AbstractDataSet
  • Guide to CLI commands
    • Autocomplete
    • Global Kedro commands
    • Project-specific Kedro commands
      • kedro run
      • kedro install
      • kedro test
      • kedro package
      • kedro build-docs
      • kedro jupyter notebook, kedro jupyter lab, kedro ipython
      • kedro jupyter convert
      • kedro lint
      • kedro activate-nbstripout
    • Using Python
  • Linting your Kedro project
  • Images and icons
    • White background
      • Icon
      • Icon with text
    • Black background
      • Icon
      • Icon with text
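
The architecture overview above distinguishes project, framework and library levels, with load_context() as the framework-level programmatic entry point into a project. A minimal sketch, assuming Kedro 0.15's kedro.context API (the project path is illustrative):

    from kedro.context import load_context

    # Instantiate the ProjectContext defined in the project's run.py.
    context = load_context("path/to/my-project")

    context.catalog.list()   # datasets registered in the Data Catalog
    context.run()            # run the project's default pipeline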

API Docs

  • kedro
    • kedro.config
      • kedro.config.ConfigLoader
      • kedro.config.TemplatedConfigLoader
    • kedro.io
      • Data Catalog
        • kedro.io.DataCatalog
      • Data Sets
        • kedro.io.CSVLocalDataSet
        • kedro.io.CSVHTTPDataSet
        • kedro.io.CSVS3DataSet
        • kedro.io.HDFLocalDataSet
        • kedro.io.HDFS3DataSet
        • kedro.io.JSONLocalDataSet
        • kedro.io.JSONDataSet
        • kedro.io.LambdaDataSet
        • kedro.io.MemoryDataSet
        • kedro.io.ParquetLocalDataSet
        • kedro.io.PartitionedDataSet
        • kedro.io.IncrementalDataSet
        • kedro.io.PickleLocalDataSet
        • kedro.io.PickleS3DataSet
        • kedro.io.SQLTableDataSet
        • kedro.io.SQLQueryDataSet
        • kedro.io.TextLocalDataSet
        • kedro.io.ExcelLocalDataSet
        • kedro.io.CachedDataSet
        • kedro.io.DataCatalogWithDefault
      • Errors
        • kedro.io.DataSetAlreadyExistsError
        • kedro.io.DataSetError
        • kedro.io.DataSetNotFoundError
      • Base Classes
        • kedro.io.AbstractDataSet
        • kedro.io.AbstractVersionedDataSet
        • kedro.io.AbstractTransformer
        • kedro.io.Version
    • kedro.pipeline
      • kedro.pipeline.Pipeline
      • kedro.pipeline.node.Node
      • kedro.pipeline.node
      • kedro.pipeline.decorators.log_time
    • kedro.runner
      • kedro.runner.AbstractRunner
      • kedro.runner.SequentialRunner
      • kedro.runner.ParallelRunner
    • kedro.context
      • Base Classes
        • kedro.context.KedroContext
      • Functions
        • kedro.context.load_context
      • Errors
        • kedro.context.KedroContextError
    • kedro.contrib
      • kedro.contrib.io
        • kedro.contrib.io.catalog_with_default.DataCatalogWithDefault
        • kedro.contrib.io.azure.CSVBlobDataSet
        • kedro.contrib.io.azure.JSONBlobDataSet
        • kedro.contrib.io.bioinformatics.BioSequenceLocalDataSet
        • kedro.contrib.io.cached.CachedDataSet
        • kedro.contrib.io.feather.FeatherLocalDataSet
        • kedro.contrib.io.matplotlib.MatplotlibLocalWriter
        • kedro.contrib.io.matplotlib.MatplotlibS3Writer
        • kedro.contrib.io.parquet.ParquetS3DataSet
        • kedro.contrib.io.pyspark.SparkDataSet
        • kedro.contrib.io.pyspark.SparkHiveDataSet
        • kedro.contrib.io.pyspark.SparkJDBCDataSet
        • kedro.contrib.io.yaml_local.YAMLLocalDataSet
        • kedro.contrib.io.gcs.CSVGCSDataSet
        • kedro.contrib.io.gcs.JSONGCSDataSet
        • kedro.contrib.io.gcs.ParquetGCSDataSet
        • kedro.contrib.io.networkx.NetworkXLocalDataSet
        • kedro.contrib.io.transformers.ProfileTimeTransformer
      • kedro.contrib.config.templated_config.TemplatedConfigLoader
      • kedro.contrib.colors.logging.ColorHandler
      • kedro.contrib.decorators.pyspark.pandas_to_spark
      • kedro.contrib.decorators.pyspark.spark_to_pandas
      • kedro.contrib.decorators.retry.retry
      • kedro.contrib.decorators.memory_profiler.mem_profile
    • kedro.cli
      • kedro.cli.get_project_context
    • kedro.versioning
      • Base Classes
        • kedro.versioning.Journal
      • Modules
      • Errors
    • kedro.extras.datasets
      • Data Sets
        • kedro.extras.datasets.biosequence.BioSequenceDataSet
        • kedro.extras.datasets.dask.ParquetDataSet
        • kedro.extras.datasets.matplotlib.MatplotlibWriter
        • kedro.extras.datasets.networkx.NetworkXDataSet
        • kedro.extras.datasets.pandas.CSVBlobDataSet
        • kedro.extras.datasets.pandas.CSVDataSet
        • kedro.extras.datasets.pandas.ExcelDataSet
        • kedro.extras.datasets.pandas.FeatherDataSet
        • kedro.extras.datasets.pandas.GBQTableDataSet
        • kedro.extras.datasets.pandas.HDFDataSet
        • kedro.extras.datasets.pandas.JSONBlobDataSet
        • kedro.extras.datasets.pandas.JSONDataSet
        • kedro.extras.datasets.pandas.ParquetDataSet
        • kedro.extras.datasets.pandas.SQLQueryDataSet
        • kedro.extras.datasets.pandas.SQLTableDataSet
        • kedro.extras.datasets.spark.SparkDataSet
        • kedro.extras.datasets.spark.SparkHiveDataSet
        • kedro.extras.datasets.spark.SparkJDBCDataSet
        • kedro.extras.datasets.text.TextDataSet
        • kedro.extras.datasets.yaml.YAMLDataSet
    • kedro.extras.decorators
      • kedro.extras.decorators.retry_node.retry
      • kedro.extras.decorators.memory_profiler.mem_profile
    • kedro.extras.transformers
      • kedro.extras.transformers.time_profiler.ProfileTimeTransformer
    • kedro.extras.logging
      • kedro.extras.logging.ColorHandler
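
As a taste of the Code API documented above, here is a minimal round trip through kedro.io.DataCatalog with a MemoryDataSet, assuming Kedro 0.15 (the dataset name is illustrative):

    from kedro.io import DataCatalog, MemoryDataSet

    catalog = DataCatalog({"example": MemoryDataSet()})

    catalog.save("example", [1, 2, 3])    # write to the dataset
    assert catalog.load("example") == [1, 2, 3]
    print(catalog.list())                 # ['example']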


© Copyright 2020, QuantumBlack Visual Analytics Limited
