matminer.datasets package
Submodules
matminer.datasets.convenience_loaders module
matminer.datasets.dataset_retrieval module
matminer.datasets.dataset_retrieval.available_datasets(print_datasets=True, print_descriptions=True, sort_method='alphabetical')
Function for retrieving the datasets available within matminer.
Args:
- print_datasets (bool): Whether to print information on each dataset in addition to returning the list of dataset names.
- print_descriptions (bool): Whether to print each dataset's description along with its name. Ignored if print_datasets is False.
- sort_method (str): The metric by which to sort the datasets when retrieving their information. 'alphabetical' sorts by dataset name; 'num_entries' sorts by number of dataset entries.
Returns: (list) of dataset names
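The interaction between the three arguments can be sketched in plain Python. This is a minimal illustration, not matminer's implementation: the function name list_datasets, the EXAMPLE_METADATA dictionary, and its dataset names and entry counts are all hypothetical, and the choice to sort 'num_entries' from largest to smallest is an assumption, since the documentation above does not state the direction.

```python
# Hypothetical metadata; names, counts and descriptions are made up.
EXAMPLE_METADATA = {
    "dataset_b": {"num_entries": 1200, "description": "Second example."},
    "dataset_a": {"num_entries": 50, "description": "First example."},
    "dataset_c": {"num_entries": 8000, "description": "Third example."},
}


def list_datasets(metadata, print_datasets=True, print_descriptions=True,
                  sort_method="alphabetical"):
    """Return dataset names sorted by the requested metric (a sketch)."""
    if sort_method == "alphabetical":
        names = sorted(metadata)
    elif sort_method == "num_entries":
        # Assumption: largest datasets first; the source does not say.
        names = sorted(metadata,
                       key=lambda n: metadata[n]["num_entries"],
                       reverse=True)
    else:
        raise ValueError("Unknown sort_method: {}".format(sort_method))

    if print_datasets:
        for name in names:
            line = name
            # Descriptions are only printed when names are printed too.
            if print_descriptions:
                line += ": " + metadata[name]["description"]
            print(line)
    return names


names = list_datasets(EXAMPLE_METADATA, print_datasets=False,
                      sort_method="num_entries")
```

Note that print_descriptions has no effect once print_datasets is False, which mirrors the "Ignored if print_datasets is False" rule above.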
matminer.datasets.dataset_retrieval.load_dataset(name, data_home=None, download_if_missing=True, include_metadata=False)
Loads a dataframe containing the dataset specified by the name argument.
The dataset file is stored in and loaded from data_home if specified; otherwise from the location given by the MATMINER_DATA environment variable if it is set, or from matminer/datasets by default.
Args:
- name (str): Keyword specifying which dataset to load. Run matminer.datasets.available_datasets() for options.
- data_home (str): Path to the folder in which to look for the dataset file.
- download_if_missing (bool): Whether to download the dataset if it is not found on disk.
- include_metadata (bool): Optional argument for some datasets with metadata fields.
Returns: (pd.DataFrame)
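The lookup order for the dataset location (data_home, then MATMINER_DATA, then the default) can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: resolve_data_home is a hypothetical helper, and DEFAULT_DATA_DIR is a stand-in relative path for the matminer/datasets default named above.

```python
import os

# Stand-in for the "matminer/datasets" default; the real location
# depends on where the package is installed.
DEFAULT_DATA_DIR = os.path.join("matminer", "datasets")


def resolve_data_home(data_home=None, environ=None):
    """Pick the dataset folder using the precedence described above."""
    env = os.environ if environ is None else environ
    # 1. An explicit data_home argument always wins.
    if data_home is not None:
        return data_home
    # 2. Otherwise fall back to the MATMINER_DATA environment variable.
    if "MATMINER_DATA" in env:
        return env["MATMINER_DATA"]
    # 3. Otherwise use the default location.
    return DEFAULT_DATA_DIR


# The environ parameter lets the precedence be checked without
# touching the real process environment.
explicit = resolve_data_home("/data/custom", {"MATMINER_DATA": "/data/env"})
from_env = resolve_data_home(None, {"MATMINER_DATA": "/data/env"})
fallback = resolve_data_home(None, {})
```

Here explicit resolves to the argument, from_env to the environment variable's value, and fallback to the default directory.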