kedro.contrib.decorators package

This module contains function decorators, which can be used as Node decorators. See kedro.pipeline.node.decorate
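
For example, a decorator from this module can be attached to an existing node via the Node.decorate API referenced above. A minimal sketch, assuming that API; the preprocess function and its dataset names are hypothetical placeholders:

from kedro.pipeline import node

from kedro.contrib.decorators import retry


def preprocess(data):
    # hypothetical node function
    return data.dropna()


# decorate() returns a new node whose function is wrapped by the decorator
preprocess_node = node(preprocess, "raw_data", "clean_data").decorate(
    retry(n_times=3, delay_sec=1)
)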

kedro.contrib.decorators.pandas_to_spark(spark)[source]

Inspects the decorated function’s inputs and converts all pandas DataFrame inputs to Spark DataFrames.

Note that in the example below we have enabled spark.sql.execution.arrow.enabled. For this to work, you should first pip install pyarrow and add pyarrow to requirements.txt. Enabling this option makes the conversion between pandas and PySpark DataFrames much faster.

Parameters: spark (SparkSession) –

The Spark session singleton object to use for creating the PySpark DataFrames. A possible pattern you can use here is the following:

spark.py

from pyspark.sql import SparkSession

def get_spark():
    return (
        SparkSession.builder
        .master("local[*]")
        .appName("kedro")
        .config("spark.driver.memory", "4g")
        .config("spark.driver.maxResultSize", "3g")
        .config("spark.sql.execution.arrow.enabled", "true")
        .getOrCreate()
    )

nodes.py

from spark import get_spark

from kedro.contrib.decorators import pandas_to_spark


@pandas_to_spark(get_spark())
def node_1(data):
    data.show()  # data is a pyspark.sql.DataFrame

Return type: Callable
Returns: The original function with any pandas DataFrame inputs converted to Spark DataFrames.
kedro.contrib.decorators.retry(exceptions=<class 'Exception'>, n_times=1, delay_sec=0)[source]

Catches exceptions from the wrapped function up to n_times, then bundles them and propagates them.

Make sure your function does not mutate its arguments, since each retry invokes it again with the same ones.

Parameters:
  • exceptions (Exception) – The superclass of exceptions to catch. By default, all exceptions are caught.
  • n_times (int) – Let the function fail at most n_times, then bundle the errors and propagate them. By default, retry only once.
  • delay_sec (float) – The delay in seconds between a failure and the next retry.
Return type: Callable
Returns: The original function with retry functionality.
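
A minimal usage sketch, assuming the requests library; the fetch function and its URL handling are illustrative:

import requests

from kedro.contrib.decorators import retry


# Retry up to 3 times on connection errors, waiting 5 seconds between attempts.
@retry(exceptions=requests.ConnectionError, n_times=3, delay_sec=5)
def fetch(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()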

kedro.contrib.decorators.spark_to_pandas()[source]

Inspects the decorated function’s inputs and converts all PySpark DataFrame inputs to pandas DataFrames.

Return type: Callable
Returns: The original function with any PySpark DataFrame inputs converted to pandas.
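
A minimal usage sketch; the summarise function below is illustrative:

from kedro.contrib.decorators import spark_to_pandas


@spark_to_pandas()
def summarise(data):
    # data arrives as a pandas.DataFrame, even if a Spark DataFrame was passed in
    return data.describe()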

Submodules

kedro.contrib.decorators.decorators module

This module contains function decorators, which can be used as Node decorators. See kedro.pipeline.node.decorate

kedro.contrib.decorators.decorators.pandas_to_spark(spark)[source]

Inspects the decorated function’s inputs and converts all pandas DataFrame inputs to Spark DataFrames.

Note that in the example below we have enabled spark.sql.execution.arrow.enabled. For this to work, you should first pip install pyarrow and add pyarrow to requirements.txt. Enabling this option makes the conversion between pandas and PySpark DataFrames much faster.

Parameters: spark (SparkSession) –

The Spark session singleton object to use for creating the PySpark DataFrames. A possible pattern you can use here is the following:

spark.py

from pyspark.sql import SparkSession

def get_spark():
    return (
        SparkSession.builder
        .master("local[*]")
        .appName("kedro")
        .config("spark.driver.memory", "4g")
        .config("spark.driver.maxResultSize", "3g")
        .config("spark.sql.execution.arrow.enabled", "true")
        .getOrCreate()
    )

nodes.py

from spark import get_spark

from kedro.contrib.decorators import pandas_to_spark


@pandas_to_spark(get_spark())
def node_1(data):
    data.show()  # data is a pyspark.sql.DataFrame

Return type: Callable
Returns: The original function with any pandas DataFrame inputs converted to Spark DataFrames.
kedro.contrib.decorators.decorators.retry(exceptions=<class 'Exception'>, n_times=1, delay_sec=0)[source]

Catches exceptions from the wrapped function up to n_times, then bundles them and propagates them.

Make sure your function does not mutate its arguments, since each retry invokes it again with the same ones.

Parameters:
  • exceptions (Exception) – The superclass of exceptions to catch. By default, all exceptions are caught.
  • n_times (int) – Let the function fail at most n_times, then bundle the errors and propagate them. By default, retry only once.
  • delay_sec (float) – The delay in seconds between a failure and the next retry.
Return type: Callable
Returns: The original function with retry functionality.

kedro.contrib.decorators.decorators.spark_to_pandas()[source]

Inspects the decorated function’s inputs and converts all PySpark DataFrame inputs to pandas DataFrames.

Return type: Callable
Returns: The original function with any PySpark DataFrame inputs converted to pandas.