kedro.contrib.decorators package

This module contains function decorators, which can be used as Node
decorators. See kedro.pipeline.node.decorate.
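For instance, a decorator from this module can be attached to an existing node through that method. The snippet below is a minimal sketch: the clean function and the dataset names are hypothetical, and it assumes node.decorate accepts decorators as positional arguments.

    from kedro.pipeline import node
    from kedro.contrib.decorators import retry

    def clean(df):  # hypothetical node function
        return df.dropna()

    clean_node = node(clean, inputs="raw_data", outputs="clean_data")

    # decorate returns a new node whose function is wrapped by the decorator.
    decorated = clean_node.decorate(retry(n_times=3, delay_sec=1.0))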
kedro.contrib.decorators.pandas_to_spark(spark)

Inspects the decorated function's inputs and converts all pandas DataFrame inputs to Spark DataFrames.
Note that in the example below we have enabled spark.sql.execution.arrow.enabled. For this to work, you should first pip install pyarrow and add pyarrow to requirements.txt. Enabling this option makes the conversion between pandas and pySpark DataFrames much faster.

Parameters: spark (SparkSession) – The Spark session singleton object to use for the creation of the pySpark DataFrames. A possible pattern you can use here is the following:
spark.py

    from pyspark.sql import SparkSession

    def get_spark():
        return (
            SparkSession.builder
            .master("local[*]")
            .appName("kedro")
            .config("spark.driver.memory", "4g")
            .config("spark.driver.maxResultSize", "3g")
            .config("spark.sql.execution.arrow.enabled", "true")
            .getOrCreate()
        )
nodes.py

    from spark import get_spark

    @pandas_to_spark(get_spark())
    def node_1(data):
        data.show()  # data is a pyspark.sql.DataFrame
Return type: Callable

Returns: The original function with any pandas DataFrame inputs translated to Spark.
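The conversion happens at call time: each argument is inspected, and pandas DataFrames are replaced with Spark equivalents before the wrapped function runs. A minimal sketch of this idea (an illustration, not the library's actual implementation) might look like:

    from functools import wraps

    import pandas as pd

    def pandas_to_spark_sketch(spark):
        # Illustrative only: swap pandas DataFrame arguments for Spark ones.
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                args = [
                    spark.createDataFrame(a) if isinstance(a, pd.DataFrame) else a
                    for a in args
                ]
                kwargs = {
                    k: spark.createDataFrame(v) if isinstance(v, pd.DataFrame) else v
                    for k, v in kwargs.items()
                }
                return func(*args, **kwargs)
            return wrapper
        return decorator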
kedro.contrib.decorators.retry(exceptions=<class 'Exception'>, n_times=1, delay_sec=0)

Catches exceptions from the wrapped function at most n_times, then bundles and propagates them.

Make sure your function does not mutate its arguments.
Parameters:

- exceptions (Exception) – The superclass of exceptions to catch. By default, catch all exceptions.
- n_times (int) – At most let the function fail n_times, then bundle the errors and propagate them. By default, retry only once.
- delay_sec (float) – Delay between a failure and the next retry, in seconds.

Return type: Callable

Returns: The original function with retry functionality.
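For example, a flaky network call can be retried on connection errors only; fetch_rates and its URL below are hypothetical names used for illustration:

    import requests

    from kedro.contrib.decorators import retry

    # Retry up to 3 times on connection errors, waiting 5 seconds
    # between attempts; any other exception propagates immediately.
    @retry(exceptions=requests.ConnectionError, n_times=3, delay_sec=5)
    def fetch_rates():
        return requests.get("https://example.com/rates.json").json()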
kedro.contrib.decorators.spark_to_pandas()

Inspects the decorated function's inputs and converts all pySpark DataFrame inputs to pandas DataFrames.

Return type: Callable

Returns: The original function with any pySpark DataFrame inputs translated to pandas.
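This is the inverse of pandas_to_spark and is useful when a node relies on pandas-only APIs. A sketch of its use, assuming the zero-argument signature above (describe_data is a hypothetical node function):

    from kedro.contrib.decorators import spark_to_pandas

    @spark_to_pandas()
    def describe_data(data):
        # data arrives as a pyspark.sql.DataFrame from an upstream node,
        # but inside the function it is a pandas DataFrame.
        return data.describe()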
Submodules

kedro.contrib.decorators.decorators module

This module contains function decorators, which can be used as Node
decorators. See kedro.pipeline.node.decorate.
The functions in this module are re-exported at the package level; see the package-level entries above for pandas_to_spark and retry.