kedro.contrib.io.azure.CSVBlobDataSet

class kedro.contrib.io.azure.CSVBlobDataSet(filepath, container_name, credentials, blob_to_text_args=None, blob_from_text_args=None, load_args=None, save_args=None)

Bases: kedro.io.core.AbstractDataSet

CSVBlobDataSet loads and saves CSV files in Microsoft’s Azure Blob Storage. It uses the Azure Storage SDK to read from and write to Azure, and pandas to handle the CSV data locally.

Example:

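import pandas as pd
from kedro.contrib.io.azure import CSVBlobDataSet

data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
                     'col3': [5, 6]})

# Placeholder credentials; the key names are an assumption, so check
# them against your own Azure storage account setup.
credentials = {"account_name": "<account>", "account_key": "<key>"}
data_set = CSVBlobDataSet(filepath="test.csv",
                           container_name="test_container",
                           credentials=credentials,
                           load_args=None,
                           save_args={"index": False})
data_set.save(data)
reloaded = data_set.load()

assert data.equals(reloaded)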
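
The class description says pandas handles the CSV data locally while the Azure SDK moves it to and from storage. The following is a minimal sketch of the pandas half only, assuming the data travels as CSV text. The dictionary `fake_container` and the helpers `save_csv_text` and `load_csv_text` are hypothetical stand-ins for blob storage and are not part of Kedro or the Azure SDK.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for a blob container: blob name -> CSV text.
fake_container = {}


def save_csv_text(name, frame, **save_args):
    """Serialise the frame to CSV text with pandas and store the text."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **save_args)
    fake_container[name] = buffer.getvalue()


def load_csv_text(name, **load_args):
    """Fetch the stored text and parse it back into a DataFrame."""
    return pd.read_csv(io.StringIO(fake_container[name]), **load_args)


data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})
save_csv_text("test.csv", data, index=False)  # mirrors save_args={"index": False}
reloaded = load_csv_text("test.csv")
```

Passing `index=False` on save keeps the row index out of the CSV text, which is why the reloaded frame can match the original column for column.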

Methods

  • CSVBlobDataSet.__init__(filepath, …[, …]) – Creates a new instance of CSVBlobDataSet pointing to a concrete csv file on Azure blob storage.
  • CSVBlobDataSet.from_config(name, config[, …]) – Create a data set instance using the configuration provided.
  • CSVBlobDataSet.load() – Loads data by delegation to the provided load method.
  • CSVBlobDataSet.save(data) – Saves data by delegation to the provided save method.

__init__(filepath, container_name, credentials, blob_to_text_args=None, blob_from_text_args=None, load_args=None, save_args=None)

Creates a new instance of CSVBlobDataSet pointing to a concrete csv file on Azure blob storage.

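Parameters:
  • filepath (str) – Path to the csv blob inside the container.
  • container_name (str) – Name of the Azure container.
  • credentials (Dict[str, Any]) – Credentials used to access the Azure blob storage.
  • blob_to_text_args (Optional[Dict[str, Any]]) – Additional arguments passed to the Azure SDK when reading the blob as text.
  • blob_from_text_args (Optional[Dict[str, Any]]) – Additional arguments passed to the Azure SDK when writing the blob from text.
  • load_args (Optional[Dict[str, Any]]) – Additional arguments passed to pandas when reading the csv.
  • save_args (Optional[Dict[str, Any]]) – Additional arguments passed to pandas when writing the csv.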
Return type:

None

classmethod from_config(name, config, load_version=None, save_version=None)

Create a data set instance using the configuration provided.

Parameters:
  • name (str) – Data set name.
  • config (Dict[str, Any]) – Data set config dictionary.
  • load_version (Optional[str]) – Version string to be used for load operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.
  • save_version (Optional[str]) – Version string to be used for save operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.
Return type:

AbstractDataSet

Returns:

An instance of an AbstractDataSet subclass.

Raises:

DataSetError – When the function fails to create the data set from its config.
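
To illustrate the idea of building a data set from a config dictionary, here is a simplified sketch, not Kedro's actual implementation. It assumes the config carries a `type` entry naming the class and that every other entry is passed to the constructor. `REGISTRY`, `InMemoryCSV`, and `from_config_sketch` are hypothetical names, and `DataSetError` is a local stand-in for Kedro's exception.

```python
from typing import Any, Dict


class DataSetError(Exception):
    """Local stand-in for Kedro's DataSetError."""


class InMemoryCSV:
    """Hypothetical data set class, present only to make the sketch runnable."""

    def __init__(self, filepath: str, container_name: str):
        self.filepath = filepath
        self.container_name = container_name


# Hypothetical lookup table from a type name to a data set class.
REGISTRY = {"InMemoryCSV": InMemoryCSV}


def from_config_sketch(name: str, config: Dict[str, Any]):
    """Build a data set from config, wrapping any failure in DataSetError."""
    config = dict(config)  # copy so the caller's dictionary is not mutated
    try:
        class_obj = REGISTRY[config.pop("type")]
        # Remaining entries become constructor keyword arguments.
        return class_obj(**config)
    except Exception as exc:
        raise DataSetError(
            f"Failed to create data set '{name}' from its config: {exc}"
        ) from exc


data_set = from_config_sketch(
    "cars",
    {"type": "InMemoryCSV", "filepath": "test.csv", "container_name": "test_container"},
)
```

A config with an unknown type or an unexpected keyword (for example `bucket_name` instead of `container_name`) fails inside the `try` block and surfaces as a single `DataSetError`.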

load()

Loads data by delegation to the provided load method.

Return type: Any
Returns: Data returned by the provided load method.
Raises: DataSetError – When the underlying load method raises an error.

save(data)

Saves data by delegation to the provided save method.

Parameters: data (Any) – The value to be saved by the provided save method.
Raises: DataSetError – When the underlying save method raises an error.
Return type: None
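
The load and save descriptions above amount to a delegation pattern: the public method calls an underlying implementation and reports any failure as a DataSetError. The sketch below shows that pattern in isolation. It is an illustration under stated assumptions rather than Kedro's source: `DelegatingDataSet`, `MemoryBackedDataSet`, and the `_load`/`_save` hook names are hypothetical, and `DataSetError` is a local stand-in.

```python
from typing import Any


class DataSetError(Exception):
    """Local stand-in for Kedro's DataSetError."""


class DelegatingDataSet:
    """Hypothetical base class whose public methods delegate to hooks."""

    def load(self) -> Any:
        try:
            return self._load()
        except Exception as exc:
            raise DataSetError(f"Failed while loading data: {exc}") from exc

    def save(self, data: Any) -> None:
        try:
            self._save(data)
        except Exception as exc:
            raise DataSetError(f"Failed while saving data: {exc}") from exc

    def _load(self) -> Any:
        raise NotImplementedError

    def _save(self, data: Any) -> None:
        raise NotImplementedError


class MemoryBackedDataSet(DelegatingDataSet):
    """Hypothetical subclass that keeps the saved value in memory."""

    def __init__(self) -> None:
        self._data = None

    def _load(self) -> Any:
        if self._data is None:
            raise ValueError("nothing has been saved yet")
        return self._data

    def _save(self, data: Any) -> None:
        self._data = data


data_set = MemoryBackedDataSet()
data_set.save([1, 2, 3])
loaded = data_set.load()
```

With this shape, callers only ever need to catch one exception type, whatever the storage backend raises internally.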