kedro.contrib.io.catalog_with_default.DataCatalogWithDefault
class kedro.contrib.io.catalog_with_default.DataCatalogWithDefault(data_sets=None, default=None, remember=False)

Bases: kedro.io.data_catalog.DataCatalog

A DataCatalog with a default DataSet implementation for any data set which is not registered in the catalog.

Methods
- DataCatalogWithDefault.__init__([data_sets, …]) – A DataCatalog with a default DataSet implementation for any data set which is not registered in the catalog.
- DataCatalogWithDefault.add(data_set_name, …) – Adds a new AbstractDataSet object to the DataCatalog.
- DataCatalogWithDefault.add_all(data_sets[, …]) – Adds a group of new data sets to the DataCatalog.
- DataCatalogWithDefault.add_feed_dict(feed_dict) – Adds instances of MemoryDataSet, containing the data provided through feed_dict.
- DataCatalogWithDefault.add_transformer(…) – Adds a DataSet Transformer to the DataCatalog.
- DataCatalogWithDefault.confirm(name) – Confirms a DataSet by its name.
- DataCatalogWithDefault.exists(name) – Checks whether a registered data set exists by calling its exists() method.
- DataCatalogWithDefault.from_config(catalog) – To create a DataCatalogWithDefault from configuration, please use DataCatalogWithDefault.from_data_catalog(DataCatalog.from_config(catalog, credentials), default).
- DataCatalogWithDefault.from_data_catalog(…) – Convenience factory method to create a DataCatalogWithDefault from a DataCatalog.
- DataCatalogWithDefault.list() – List of DataSet names registered in the catalog.
- DataCatalogWithDefault.load(name[, version]) – Loads a registered data set.
- DataCatalogWithDefault.release(name) – Releases any cached data associated with a data set.
- DataCatalogWithDefault.save(name, data) – Saves data to a registered data set.
- DataCatalogWithDefault.shallow_copy() – Returns a shallow copy of the current object.

__init__(data_sets=None, default=None, remember=False)

A DataCatalog with a default DataSet implementation for any data set which is not registered in the catalog.

Parameters:
- data_sets (Optional[Dict[str, AbstractDataSet]]) – A dictionary of data set names and data set instances.
- default (Optional[Callable[[str], AbstractDataSet]]) – A callable which accepts a single argument of type string, the key of the data set, and returns an AbstractDataSet. load and save calls on data sets which are not registered to the catalog will be delegated to this AbstractDataSet.
- remember (bool) – If True, then store in the catalog any AbstractDataSets provided by the default callable argument. Useful when one wants to transition from a DataCatalogWithDefault to a DataCatalog: just call DataCatalogWithDefault.to_yaml after all required data sets have been saved/loaded, and use the generated YAML file with a new DataCatalog.

Raises: TypeError – If default is not a callable.

Example:

```python
from kedro.contrib.io.catalog_with_default import DataCatalogWithDefault
from kedro.io import CSVLocalDataSet


def default_data_set(name):
    return CSVLocalDataSet(filepath='data/01_raw/' + name)


io = DataCatalogWithDefault(data_sets={}, default=default_data_set)

# load the file in data/01_raw/cars.csv
df = io.load("cars.csv")
```
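
To make the interplay of default and remember concrete, here is a minimal stand-alone sketch in plain Python. It does not import Kedro: MiniDataSet and MiniCatalogWithDefault are hypothetical stand-ins, and the lookup order shown (registered data sets first, then the factory) is an assumption drawn from the parameter descriptions above rather than a copy of Kedro's source.

```python
from typing import Any, Callable, Dict, List, Optional


class MiniDataSet:
    """Hypothetical in-memory stand-in for an AbstractDataSet."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Any = None

    def load(self) -> Any:
        return self._data

    def save(self, data: Any) -> None:
        self._data = data


class MiniCatalogWithDefault:
    """Hypothetical catalog that falls back to a default data set factory."""

    def __init__(
        self,
        data_sets: Optional[Dict[str, MiniDataSet]] = None,
        default: Optional[Callable[[str], MiniDataSet]] = None,
        remember: bool = False,
    ) -> None:
        # Mirror the documented contract: a non-callable default is a TypeError.
        if not callable(default):
            raise TypeError("default must be a callable taking a data set name")
        self._data_sets: Dict[str, MiniDataSet] = dict(data_sets or {})
        self._default = default
        self._remember = remember

    def _get(self, name: str) -> MiniDataSet:
        # Registered data sets take precedence over the default factory.
        if name in self._data_sets:
            return self._data_sets[name]
        data_set = self._default(name)
        if self._remember:
            # Store the generated data set so later calls reuse the same object.
            self._data_sets[name] = data_set
        return data_set

    def load(self, name: str) -> Any:
        return self._get(name).load()

    def save(self, name: str, data: Any) -> None:
        self._get(name).save(data)

    def list(self) -> List[str]:
        return list(self._data_sets)


# With remember=True the generated data set is kept, so the saved value survives.
remembering = MiniCatalogWithDefault(default=MiniDataSet, remember=True)
remembering.save("cars", [1, 2, 3])
print(remembering.load("cars"), remembering.list())

# With remember=False each call asks the factory again and gets a fresh, empty data set.
forgetful = MiniCatalogWithDefault(default=MiniDataSet)
forgetful.save("cars", [1, 2, 3])
print(forgetful.load("cars"), forgetful.list())
```

The empty result in the second case comes from the sketch's purely in-memory MiniDataSet; a file-backed data set such as the CSVLocalDataSet in the example above would still find the saved file on disk even when a fresh instance is created.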

add(data_set_name, data_set, replace=False)

Adds a new AbstractDataSet object to the DataCatalog.

Parameters:
- data_set_name (str) – A unique data set name which has not been registered yet.
- data_set (AbstractDataSet) – A data set object to be associated with the given data set name.
- replace (bool) – Specifies whether replacing an existing DataSet with the same name is allowed.

Raises: DataSetAlreadyExistsError – When a data set with the same name has already been registered.

Example:

```python
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.io import DataCatalog

io = DataCatalog(data_sets={
    'cars': CSVDataSet(filepath="cars.csv")
})

io.add("boats", CSVDataSet(filepath="boats.csv"))
```

Return type: None
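
The replace flag can be sketched with a plain dictionary. MiniCatalog and the local DataSetAlreadyExistsError class below are hypothetical stand-ins (the real exception lives in Kedro), and strings play the role of data set objects.

```python
from typing import Any, Dict


class DataSetAlreadyExistsError(Exception):
    """Hypothetical local stand-in for Kedro's exception of the same name."""


class MiniCatalog:
    """Hypothetical sketch of add() bookkeeping; not Kedro's implementation."""

    def __init__(self) -> None:
        self._data_sets: Dict[str, Any] = {}

    def add(self, data_set_name: str, data_set: Any, replace: bool = False) -> None:
        # Refuse to overwrite an existing entry unless replace=True.
        if data_set_name in self._data_sets and not replace:
            raise DataSetAlreadyExistsError(
                f"DataSet '{data_set_name}' has already been registered"
            )
        self._data_sets[data_set_name] = data_set


catalog = MiniCatalog()
catalog.add("boats", "boats.csv")
catalog.add("boats", "boats_v2.csv", replace=True)  # allowed: replace=True
try:
    catalog.add("boats", "boats_v3.csv")  # rejected: replace defaults to False
except DataSetAlreadyExistsError as err:
    print(err)
```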

add_all(data_sets, replace=False)

Adds a group of new data sets to the DataCatalog.

Parameters:
- data_sets (Dict[str, AbstractDataSet]) – A dictionary of DataSet names and data set instances.
- replace (bool) – Specifies whether replacing an existing DataSet with the same name is allowed.

Raises: DataSetAlreadyExistsError – When a data set with the same name has already been registered.

Example:

```python
from kedro.extras.datasets.pandas import CSVDataSet, ParquetDataSet
from kedro.io import DataCatalog

io = DataCatalog(data_sets={
    "cars": CSVDataSet(filepath="cars.csv")
})
additional = {
    "planes": ParquetDataSet("planes.parq"),
    "boats": CSVDataSet(filepath="boats.csv")
}

io.add_all(additional)

assert io.list() == ["cars", "planes", "boats"]
```

Return type: None
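
A compact way to picture add_all is as a loop over add with the same replace flag. This is a hypothetical sketch, not Kedro's implementation; in particular, the source does not say whether the real method keeps entries registered before a clash part-way through, while this sketch does.

```python
from typing import Any, Dict, List, Optional


class DataSetAlreadyExistsError(Exception):
    """Hypothetical local stand-in for Kedro's exception of the same name."""


class MiniCatalog:
    """Hypothetical catalog; data sets are plain placeholder values here."""

    def __init__(self, data_sets: Optional[Dict[str, Any]] = None) -> None:
        self._data_sets: Dict[str, Any] = dict(data_sets or {})

    def add(self, name: str, data_set: Any, replace: bool = False) -> None:
        if name in self._data_sets and not replace:
            raise DataSetAlreadyExistsError(name)
        self._data_sets[name] = data_set

    def add_all(self, data_sets: Dict[str, Any], replace: bool = False) -> None:
        # Register entries one by one in dictionary order, reusing add()'s checks.
        # In this sketch a clash part-way through leaves earlier entries registered.
        for name, data_set in data_sets.items():
            self.add(name, data_set, replace=replace)

    def list(self) -> List[str]:
        # Dicts preserve insertion order (Python 3.7+), so names come back in the order added.
        return list(self._data_sets)


io = MiniCatalog({"cars": "cars.csv"})
io.add_all({"planes": "planes.parq", "boats": "boats.csv"})
assert io.list() == ["cars", "planes", "boats"]
```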

add_feed_dict(feed_dict, replace=False)

Adds instances of MemoryDataSet, containing the data provided through feed_dict.

Parameters:
- feed_dict (Dict[str, Any]) – A feed dict with data to be added in memory.
- replace (bool) – Specifies whether replacing an existing DataSet with the same name is allowed.

Example:

```python
import pandas as pd
from kedro.io import DataCatalog

df = pd.DataFrame({'col1': [1, 2],
                   'col2': [4, 5],
                   'col3': [5, 6]})

io = DataCatalog()
io.add_feed_dict({
    'data': df
}, replace=True)

assert io.load("data").equals(df)
```

Return type: None
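
The wrapping that add_feed_dict performs can be sketched as follows. MiniMemoryDataSet and MiniCatalog are hypothetical stand-ins for MemoryDataSet and DataCatalog, and ValueError stands in for the catalog's own error on a name clash when replace is False.

```python
from typing import Any, Dict


class MiniMemoryDataSet:
    """Hypothetical stand-in for MemoryDataSet: keeps one object in memory."""

    def __init__(self, data: Any = None) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data

    def save(self, data: Any) -> None:
        self._data = data


class MiniCatalog:
    """Hypothetical catalog illustrating how a feed dict becomes data sets."""

    def __init__(self) -> None:
        self._data_sets: Dict[str, MiniMemoryDataSet] = {}

    def add_feed_dict(self, feed_dict: Dict[str, Any], replace: bool = False) -> None:
        for name, value in feed_dict.items():
            if name in self._data_sets and not replace:
                raise ValueError(f"data set {name!r} is already registered")
            # Each raw value is wrapped so it can be load()-ed like any other data set.
            self._data_sets[name] = MiniMemoryDataSet(data=value)

    def load(self, name: str) -> Any:
        return self._data_sets[name].load()


io = MiniCatalog()
io.add_feed_dict({"params": {"learning_rate": 0.01}})
io.add_feed_dict({"params": {"learning_rate": 0.1}}, replace=True)
print(io.load("params"))
```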

add_transformer(transformer, data_set_names=None)

Adds a DataSet Transformer to the DataCatalog. Transformers can modify the way data sets are loaded and saved.

Parameters:
- transformer (AbstractTransformer) – The transformer instance to add.
- data_set_names (Union[str, Iterable[str], None]) – The data sets to add the transformer to, or None to add the transformer to all data sets.

Raises:
- DataSetNotFoundError – When a transformer is being added to a non-existent data set.
- TypeError – When transformer isn't an instance of AbstractTransformer.
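
Transformers are easiest to understand as wrappers around a data set's load and save calls. The sketch below assumes hook methods shaped like load(data_set_name, load) and save(data_set_name, save, data); CountingTransformer and MiniCatalog are hypothetical, KeyError stands in for DataSetNotFoundError, and the type check against AbstractTransformer is omitted.

```python
from functools import partial
from typing import Any, Callable, Dict, Iterable, List, Optional, Set, Tuple, Union


class CountingTransformer:
    """Hypothetical transformer that records each call, then delegates."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def load(self, data_set_name: str, load: Callable[[], Any]) -> Any:
        self.calls.append(f"load:{data_set_name}")
        return load()

    def save(self, data_set_name: str, save: Callable[[Any], None], data: Any) -> None:
        self.calls.append(f"save:{data_set_name}")
        save(data)


class MiniCatalog:
    """Hypothetical catalog that routes load/save through registered transformers."""

    def __init__(self, names: Iterable[str]) -> None:
        self._store: Dict[str, Any] = {name: None for name in names}
        self._transformers: List[Tuple[Any, Optional[Set[str]]]] = []

    def add_transformer(
        self, transformer: Any, data_set_names: Union[str, Iterable[str], None] = None
    ) -> None:
        if isinstance(data_set_names, str):
            data_set_names = [data_set_names]
        names = None if data_set_names is None else set(data_set_names)
        # KeyError stands in for DataSetNotFoundError in this sketch.
        if names is not None and not names.issubset(self._store):
            raise KeyError(f"unknown data sets: {sorted(names.difference(self._store))}")
        self._transformers.append((transformer, names))

    def _applicable(self, name: str) -> List[Any]:
        # None means "applies to every data set".
        return [t for t, names in self._transformers if names is None or name in names]

    def load(self, name: str) -> Any:
        load: Callable[[], Any] = lambda: self._store[name]
        for transformer in self._applicable(name):
            # Each transformer wraps the previous callable, like nested middleware.
            load = partial(transformer.load, name, load)
        return load()

    def save(self, name: str, data: Any) -> None:
        def raw_save(value: Any) -> None:
            self._store[name] = value

        save: Callable[[Any], None] = raw_save
        for transformer in self._applicable(name):
            save = partial(transformer.save, name, save)
        save(data)


counter = CountingTransformer()
catalog = MiniCatalog(["cars", "boats"])
catalog.add_transformer(counter, "cars")  # only "cars" is transformed
catalog.save("cars", [1, 2])
catalog.save("boats", [3])
print(catalog.load("cars"), counter.calls)
```

Because each transformer wraps whatever callable it receives, the most recently added transformer ends up outermost in this sketch.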

confirm(name)

Confirms a DataSet by its name.

Return type: None

exists(name)

Checks whether a registered data set exists by calling its exists() method. Raises a warning and returns False if exists() is not implemented.

Parameters: name (str) – The name of the data set to be checked.

Return type: bool

Returns: Whether the data set output exists.

Raises: DataSetNotFoundError – When a data set with the given name has not yet been registered.
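
The documented exists() contract (delegate to the data set, warn and return False when exists() is not implemented, raise for unknown names) can be sketched as a plain function. All classes here are hypothetical, DataSetNotFoundError is a local stand-in, and signalling "not implemented" through NotImplementedError is an assumption of this sketch.

```python
import warnings
from typing import Any, Dict


class DataSetNotFoundError(KeyError):
    """Hypothetical local stand-in for Kedro's exception of the same name."""


class FileBackedDataSet:
    """Hypothetical data set that knows whether its output exists."""

    def __init__(self, present: bool) -> None:
        self._present = present

    def exists(self) -> bool:
        return self._present


class OpaqueDataSet:
    """Hypothetical data set that does not implement exists()."""

    def exists(self) -> bool:
        raise NotImplementedError


def catalog_exists(data_sets: Dict[str, Any], name: str) -> bool:
    """Sketch of the documented exists() contract; not Kedro's source."""
    if name not in data_sets:
        raise DataSetNotFoundError(name)
    try:
        return data_sets[name].exists()
    except NotImplementedError:
        # Unknown existence is reported as False, with a warning rather than an error.
        warnings.warn(f"exists() is not implemented for data set '{name}'")
        return False


registered = {"cars": FileBackedDataSet(present=True), "blob": OpaqueDataSet()}
print(catalog_exists(registered, "cars"))
print(catalog_exists(registered, "blob"))  # warns, then returns False
```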

classmethod from_config(catalog, credentials=None, load_versions=None, save_version=None, journal=None)

To create a DataCatalogWithDefault from configuration, please use:

```python
DataCatalogWithDefault.from_data_catalog(
    DataCatalog.from_config(catalog, credentials), default)
```

Parameters:
- catalog (Optional[Dict[str, Dict[str, Any]]]) – See DataCatalog.from_config.
- credentials (Optional[Dict[str, Dict[str, Any]]]) – See DataCatalog.from_config.
- load_versions (Optional[Dict[str, str]]) – See DataCatalog.from_config.
- save_version (Optional[str]) – See DataCatalog.from_config.
- journal (Optional[Journal]) – See DataCatalog.from_config.

Raises: ValueError – If you try to instantiate a DataCatalogWithDefault directly with this method.
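
Why from_config refuses to build the subclass can be illustrated with two small hypothetical classes. The assumption in this sketch is that configuration alone has no way to supply the default callable, so the subclass blocks the inherited route and points callers at the two-step alternative; MiniCatalog and MiniCatalogWithDefault are not Kedro classes.

```python
from typing import Any, Dict, Optional


class MiniCatalog:
    """Hypothetical base catalog with a config-driven constructor."""

    def __init__(self, data_sets: Optional[Dict[str, Any]] = None) -> None:
        self._data_sets: Dict[str, Any] = dict(data_sets or {})

    @classmethod
    def from_config(
        cls, catalog: Dict[str, Dict[str, Any]], credentials: Optional[Dict[str, Any]] = None
    ) -> "MiniCatalog":
        # Illustrative only: each config entry is stored as-is instead of being
        # turned into a real data set object.
        return cls(data_sets=dict(catalog))


class MiniCatalogWithDefault(MiniCatalog):
    """Hypothetical subclass that blocks the inherited from_config route."""

    @classmethod
    def from_config(
        cls, catalog: Dict[str, Dict[str, Any]], credentials: Optional[Dict[str, Any]] = None
    ) -> "MiniCatalogWithDefault":
        # Assumed motivation: configuration alone cannot supply the `default` callable.
        raise ValueError(
            "Cannot instantiate a MiniCatalogWithDefault directly with from_config; "
            "build a MiniCatalog first and convert it."
        )


config = {"cars": {"type": "CSVLocalDataSet", "filepath": "data/01_raw/cars.csv"}}
base = MiniCatalog.from_config(config)  # works
try:
    MiniCatalogWithDefault.from_config(config)
except ValueError as err:
    print(err)
```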

classmethod from_data_catalog(data_catalog, default)

Convenience factory method to create a DataCatalogWithDefault from a DataCatalog: the result is a DataCatalog with a default DataSet implementation for any data set which is not registered in the catalog.

Parameters:
- data_catalog (DataCatalog) – The DataCatalog to convert to a DataCatalogWithDefault.
- default (Callable[[str], AbstractDataSet]) – A callable which accepts a single argument of type string, the key of the data set, and returns an AbstractDataSet. load and save calls on data sets which are not registered to the catalog will be delegated to this AbstractDataSet.

Return type: DataCatalogWithDefault

Returns: A new DataCatalogWithDefault which contains all the AbstractDataSets from the provided data catalog.
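
A minimal sketch of the conversion, assuming the new catalog copies the registered entries and attaches the fallback factory. MiniCatalog and MiniCatalogWithDefault are hypothetical, plain values stand in for data set objects, and copying (rather than sharing) the underlying dictionary is a choice made for this sketch only.

```python
from typing import Any, Callable, Dict, Optional


class MiniCatalog:
    """Hypothetical plain catalog."""

    def __init__(self, data_sets: Optional[Dict[str, Any]] = None) -> None:
        # Copy so that later registrations do not leak between catalogs (sketch choice).
        self._data_sets: Dict[str, Any] = dict(data_sets or {})

    def load(self, name: str) -> Any:
        return self._data_sets[name]


class MiniCatalogWithDefault(MiniCatalog):
    """Hypothetical catalog with a fallback factory for unregistered names."""

    def __init__(
        self,
        data_sets: Optional[Dict[str, Any]] = None,
        default: Optional[Callable[[str], Any]] = None,
    ) -> None:
        if not callable(default):
            raise TypeError("default must be callable")
        super().__init__(data_sets)
        self._default = default

    @classmethod
    def from_data_catalog(
        cls, data_catalog: MiniCatalog, default: Callable[[str], Any]
    ) -> "MiniCatalogWithDefault":
        # Carry over every registered entry, then attach the fallback factory.
        return cls(data_sets=data_catalog._data_sets, default=default)

    def load(self, name: str) -> Any:
        if name in self._data_sets:
            return self._data_sets[name]
        return self._default(name)


plain = MiniCatalog({"cars": [1, 2]})
with_default = MiniCatalogWithDefault.from_data_catalog(
    plain, default=lambda name: f"<default for {name}>"
)
print(with_default.load("cars"), with_default.load("boats"))
```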

list()

List of DataSet names registered in the catalog.

Return type: List[str]

Returns: A list of DataSet names, corresponding to the entries that are registered in the current catalog object.

load(name, version=None)

Loads a registered data set.

Parameters:
- name (str) – A data set to be loaded.
- version (Optional[str]) – Optional version to be loaded.

Return type: Any

Returns: The loaded data as configured.

Raises: DataSetNotFoundError – When a data set with the given name has not yet been registered.

release(name)

Releases any cached data associated with a data set.

Parameters: name (str) – The name of the data set whose cached data should be released.

Raises: DataSetNotFoundError – When a data set with the given name has not yet been registered.
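
What "cached data" means depends on the data set. The sketch below assumes a hypothetical MiniCachedDataSet that keeps its first load() result in memory until release() is called, plus a hypothetical catalog-level release() helper that looks the data set up by name; KeyError stands in for DataSetNotFoundError.

```python
from typing import Any, Callable, Dict


class MiniCachedDataSet:
    """Hypothetical data set that caches the first load() result in memory."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._cache: Any = None
        self._cached = False
        self.source_reads = 0  # how often the underlying loader actually ran

    def load(self) -> Any:
        if not self._cached:
            self._cache = self._loader()
            self._cached = True
            self.source_reads += 1
        return self._cache

    def release(self) -> None:
        # Drop the cached object so the next load() reads from the source again.
        self._cache = None
        self._cached = False


def release(data_sets: Dict[str, Any], name: str) -> None:
    """Catalog-level sketch: look the data set up by name and release it."""
    if name not in data_sets:
        # KeyError stands in for DataSetNotFoundError in this sketch.
        raise KeyError(name)
    data_sets[name].release()


data_sets = {"cars": MiniCachedDataSet(lambda: [1, 2, 3])}
data_sets["cars"].load()
data_sets["cars"].load()  # served from the cache
release(data_sets, "cars")
data_sets["cars"].load()  # reads the source again
print(data_sets["cars"].source_reads)
```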