kedro.io.PickleS3DataSet

class kedro.io.PickleS3DataSet(filepath, bucket_name, credentials=None, load_args=None, save_args=None, version=None)

PickleS3DataSet loads a Python object from, and saves it to, a pickle file on S3. The underlying functionality is provided by the pickle library, so it supports all options that pickle allows for loading and saving.

Example:

from kedro.io import PickleS3DataSet
import pandas as pd

dummy_data = pd.DataFrame({'col1': [1, 2],
                           'col2': [4, 5],
                           'col3': [5, 6]})
data_set = PickleS3DataSet(filepath="data.pkl",
                           bucket_name="test_bucket",
                           load_args=None,
                           save_args=None)
data_set.save(dummy_data)
reloaded = data_set.load()

__init__(filepath, bucket_name, credentials=None, load_args=None, save_args=None, version=None)

Creates a new instance of PickleS3DataSet pointing to a concrete file on S3. PickleS3DataSet uses the pickle backend to serialise objects before they are written to S3:

pickle.dumps: https://docs.python.org/3/library/pickle.html#pickle.dumps

and to load serialised objects into memory:

pickle.loads: https://docs.python.org/3/library/pickle.html#pickle.loads
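The round trip through pickle.dumps and pickle.loads can be sketched with the standard library alone. This is a minimal illustration, not the data set's implementation: the upload to and download from S3 is left out, the option values are arbitrary examples, and passing the two dictionaries as keyword arguments is an assumption about how load_args and save_args are used.

```python
import pickle

# Illustrative option dictionaries, shaped like the save_args / load_args
# parameters of this class. The specific values are examples only.
save_args = {"protocol": pickle.HIGHEST_PROTOCOL}
load_args = {"encoding": "utf-8"}

data = {"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]}

# Serialise the object to bytes. Writing these bytes to S3 is omitted here.
payload = pickle.dumps(data, **save_args)

# Deserialise the bytes back into an equivalent Python object.
reloaded = pickle.loads(payload, **load_args)
```

Because pickle.dumps returns bytes rather than writing to a file handle, the serialised payload can be sent to any storage backend without a temporary local file.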

Parameters:
  • filepath (str) – path to a pkl file.
  • bucket_name (str) – S3 bucket name.
  • credentials (Optional[Dict[str, Any]]) – Credentials to access the S3 bucket, such as aws_access_key_id, aws_secret_access_key.
  • load_args (Optional[Dict[str, Any]]) – Options for loading pickle files. See the pickle.loads documentation for the available options.
  • save_args (Optional[Dict[str, Any]]) – Options for saving pickle files. See the pickle.dumps documentation for the available options.
  • version (Optional[Version]) – If specified, should be an instance of kedro.io.core.Version. If its load attribute is None, the latest version will be loaded. If its save attribute is None, save version will be autogenerated.
Return type: None
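The rule described for the version parameter (load the latest version when load is None, autogenerate a save version when save is None) can be sketched as follows. Everything here is hypothetical: the namedtuple is only a stand-in for kedro.io.core.Version, resolve_version is not a kedro function, and the timestamp format and the "latest means lexicographically greatest" rule are assumptions made for the illustration.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Stand-in for kedro.io.core.Version, assumed to carry load and save fields.
Version = namedtuple("Version", ["load", "save"])


def resolve_version(version, available, now=None):
    """Hypothetical helper returning (load_version, save_version).

    available is a non-empty list of existing version strings; the latest
    one is assumed to be the lexicographically greatest.
    """
    now = now or datetime.now(timezone.utc)
    # No explicit load version: fall back to the latest available one.
    load_version = version.load if version.load is not None else max(available)
    # No explicit save version: generate one from the current time.
    if version.save is not None:
        save_version = version.save
    else:
        save_version = now.strftime("%Y-%m-%dT%H.%M.%S")
    return load_version, save_version


available = ["2019-01-01T00.00.00", "2019-02-01T00.00.00"]
fixed_now = datetime(2019, 3, 1, tzinfo=timezone.utc)

# Both attributes left as None: latest is loaded, save version is generated.
auto_load, auto_save = resolve_version(Version(None, None), available, fixed_now)

# Both attributes pinned: the given values are used unchanged.
pinned = resolve_version(Version("2019-01-01T00.00.00", "manual"), available)
```

Pinning the load attribute makes a run reproducible against a known snapshot, while leaving it as None always picks up the most recent save.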

Methods

__init__(filepath, bucket_name[, …]) – Creates a new instance of PickleS3DataSet pointing to a concrete file on S3.
exists() – Checks whether a data set’s output already exists by calling the provided _exists() method.
from_config(name, config[, load_version, …]) – Create a data set instance using the configuration provided.
load() – Loads data by delegation to the provided load method.
save(data) – Saves data by delegation to the provided save method.