Kedro
0.14

Introduction

  • Introduction
    • What is Kedro?
    • Learning about Kedro
    • Assumptions
      • Official Python programming language website
      • List of free programming books and tutorials

Getting Started

  • Installation prerequisites
    • macOS / Linux
    • Windows
    • Python virtual environments
      • Using conda
        • Create an environment with conda
        • Activate an environment with conda
        • Other conda commands
      • Alternatives to conda
  • Installation guide
  • Creating a new project
    • Create a new project interactively
    • Create a new project from a configuration file
    • Starting with an existing project
  • A “Hello World” example
    • Project directory structure
    • Project source code
      • Writing code
    • Project components
    • Data
    • Example pipeline
    • Configuration
      • Project-specific configuration
      • Sensitive or personal configuration
    • Running the example
    • Summary

Tutorial

  • Typical Kedro workflow
    • Development workflow
      • 1. Set up the project template
      • 2. Set up the data
      • 3. Create the pipeline
      • 4. Package the project
    • Git workflow
      • Creating a project repository
      • Submitting your changes to GitHub
  • Kedro Spaceflights tutorial
    • Creating the tutorial project
      • Project configuration
  • Setting up the data
    • Adding your datasets to data
      • reviews.csv
      • companies.csv
      • shuttles.xlsx
    • Reference all datasets
    • Creating custom datasets
      • Contributing a custom dataset implementation
  • Creating a pipeline
    • Node basics
    • Assemble nodes into a pipeline
    • Persisting pre-processed data
    • Working with Kedro projects from Jupyter
    • Creating a master table
      • Working in a Jupyter notebook
      • Extending the project’s code
    • Working with multiple pipelines
    • Partial pipeline runs
    • Using decorators for nodes and pipelines
      • Decorating the nodes
      • Decorating the pipeline
    • Kedro runners
  • Packaging a project
    • Add documentation to your project
    • Package your project
    • What is next?

User Guide

  • Setting up Visual Studio Code
    • Advanced: For those using venv / virtualenv
    • Setting up tasks
    • Debugging
      • Advanced: Remote Interpreter / Debugging
  • Setting up PyCharm
    • Set up Run configurations
    • Debugging
    • Advanced: Remote SSH interpreter
  • Configuration
    • Local and base configuration
    • Loading
    • Additional configuration environments
  • The Data Catalog
    • Using the Data Catalog within Kedro configuration
    • Using the Data Catalog with the YAML API
    • Loading multiple datasets that have similar configuration
      • Versioning datasets and ML models
    • Using the Data Catalog with the Code API
      • Configuring a data catalog
      • Loading datasets
        • Behind the scenes
        • Viewing the available data sources
      • Saving data
        • Saving data to memory
        • Saving data to a SQL database for querying
        • Saving data in parquet
        • Creating your own dataset
  • Nodes and pipelines
    • Nodes
    • Creating a pipeline node
      • Node definition syntax
      • Syntax for input variables
      • Syntax for output variables
    • Running nodes
      • Applying decorators to nodes
      • Applying multiple decorators to nodes
    • Building pipelines
      • Merging pipelines
      • Fetching pipeline nodes
    • Bad pipelines
      • Pipeline with bad nodes
      • Pipeline with circular dependencies
    • Running pipelines
      • Runners
      • Applying decorators on pipelines
    • Running pipelines with IO
    • Outputting to a file
    • Partial pipelines
      • Partial pipeline starting from inputs
      • Partial pipeline starting from nodes
      • Partial pipeline from nodes with tags
      • Running only some nodes
      • Recreating missing outputs
  • Logging
    • Configure logging
    • Use logging
  • Advanced IO
    • AbstractDataSet
    • DataSetError
    • Error handling
    • Versioning
      • version namedtuple
      • Versioning using the YAML API
      • Versioning using the Code API
      • Supported datasets
  • Working with PySpark
    • Initialising a SparkSession
    • Creating a SparkDataSet
      • Code API
      • YAML API
    • Working with PySpark and Kedro pipelines
  • Developing Kedro plugins
    • Overview
    • Initialisation
    • global and project commands
    • Working with click
    • Example of a simple plugin

Resources

  • Frequently asked questions
    • What is Kedro?
      • Philosophy
      • Principles
    • What version of Python does Kedro use?
    • What are the primary advantages of Kedro?
    • How does Kedro compare to other projects?
      • Kedro vs workflow schedulers
      • Kedro vs other ETL frameworks
    • What is data engineering convention?
    • What best practice should I follow to avoid leaking confidential data?
    • Where do I store my custom editor configuration?
    • How do I look up an API function?
    • How do I build documentation for my project?
    • How do I build documentation about Kedro?
    • How can I find out more about Kedro?
  • Guide to CLI commands
    • Global Kedro commands
    • Project-specific Kedro commands
      • kedro run
      • kedro install
      • kedro test
      • kedro package
      • kedro build-docs
      • kedro jupyter notebook, kedro jupyter lab, kedro ipython
      • kedro activate-nbstripout
    • Using Python
  • Working with Kedro and IPython
    • Loading DataCatalog in IPython
  • Linting your Kedro project

API Docs

  • kedro
    • kedro.io
      • Data Catalog
        • kedro.io.DataCatalog
      • Data Sets
        • kedro.io.CSVLocalDataSet
        • kedro.io.CSVS3DataSet
        • kedro.io.HDFLocalDataSet
        • kedro.io.JSONLocalDataSet
        • kedro.io.LambdaDataSet
        • kedro.io.MemoryDataSet
        • kedro.io.ParquetLocalDataSet
        • kedro.io.PickleLocalDataSet
        • kedro.io.PickleS3DataSet
        • kedro.io.SQLTableDataSet
        • kedro.io.SQLQueryDataSet
        • kedro.io.TextLocalDataSet
        • kedro.io.ExcelLocalDataSet
      • Errors
        • kedro.io.DataSetAlreadyExistsError
        • kedro.io.DataSetError
        • kedro.io.DataSetNotFoundError
      • Base Classes
        • kedro.io.AbstractDataSet
        • kedro.io.ExistsMixin
        • kedro.io.FilepathVersionMixIn
        • kedro.io.S3PathVersionMixIn
        • kedro.io.Version
    • kedro.config
      • kedro.config.ConfigLoader
    • kedro.pipeline
      • kedro.pipeline.Pipeline
      • kedro.pipeline.node.Node
      • kedro.pipeline.node
    • kedro.runner
      • kedro.runner.AbstractRunner
      • kedro.runner.SequentialRunner
      • kedro.runner.ParallelRunner
    • kedro.contrib
      • kedro.contrib.io
        • kedro.contrib.io.catalog_with_default package
        • kedro.contrib.io.azure package
        • kedro.contrib.io.bioinformatics package
        • kedro.contrib.io.pyspark package
      • kedro.contrib.colors package
        • Subpackages
      • kedro.contrib.decorators package
        • Submodules
        • kedro.contrib.decorators.decorators module

Python Module Index

  • kedro
  • kedro.config
  • kedro.contrib
  • kedro.contrib.colors
  • kedro.contrib.colors.logging
  • kedro.contrib.colors.logging.color_logger
  • kedro.contrib.decorators
  • kedro.contrib.decorators.decorators
  • kedro.contrib.io
  • kedro.contrib.io.azure
  • kedro.contrib.io.azure.csv_blob
  • kedro.contrib.io.bioinformatics
  • kedro.contrib.io.bioinformatics.sequence_dataset
  • kedro.contrib.io.catalog_with_default
  • kedro.contrib.io.catalog_with_default.data_catalog_with_default
  • kedro.contrib.io.pyspark
  • kedro.contrib.io.pyspark.spark_data_set
  • kedro.contrib.io.pyspark.spark_jdbc
  • kedro.io
  • kedro.pipeline
  • kedro.runner

© Copyright 2018-2019, QuantumBlack Visual Analytics Limited

Built with Sphinx using a theme provided by Read the Docs.