matminer.data_retrieval package
Subpackages
- matminer.data_retrieval.tests package
- Submodules
- matminer.data_retrieval.tests.base module
- matminer.data_retrieval.tests.test_retrieve_AFLOW module
- matminer.data_retrieval.tests.test_retrieve_Citrine module
- matminer.data_retrieval.tests.test_retrieve_MDF module
- matminer.data_retrieval.tests.test_retrieve_MP module
- matminer.data_retrieval.tests.test_retrieve_MPDS module
- matminer.data_retrieval.tests.test_retrieve_MongoDB module
- Module contents
Submodules
matminer.data_retrieval.retrieve_AFLOW module
matminer.data_retrieval.retrieve_Citrine module
matminer.data_retrieval.retrieve_MDF module
matminer.data_retrieval.retrieve_MP module
class matminer.data_retrieval.retrieve_MP.MPDataRetrieval(api_key=None)
Bases: matminer.data_retrieval.retrieve_base.BaseDataRetrieval
Retrieves data from the Materials Project database.
If you use this data retrieval class, please additionally cite:
Ong, S.P., Cholia, S., Jain, A., Brafman, M., Gunter, D., Ceder, G., Persson, K.A., 2015. The Materials Application Programming Interface (API): A simple, flexible and efficient API for materials data based on REpresentational State Transfer (REST) principles. Computational Materials Science 97, 209–215. https://doi.org/10.1016/j.commatsci.2014.10.037
__init__(api_key=None)
- Args:
- api_key: (str) Your Materials Project API key, or None if you’ve set up your pymatgen config.
api_link()
The link to comprehensive API documentation or data source.
- Returns:
(str): A link to the API documentation for this DataRetrieval class.
citations()
Retrieve a list of BibTeX-formatted citation strings which should be cited when using a data retrieval method.
- Returns:
([str]): BibTeX-formatted entries
get_data(criteria, properties, mp_decode=True, index_mpid=True)
- Args:
- criteria: (str/dict) see MPRester.query() for a description of this parameter. String examples: “mp-1234”, “Fe2O3”, “Li-Fe-O”, “*2O3”. Dict example: {“band_gap”: {“$gt”: 1}}
- properties: (list) see MPRester.query() for a description of this parameter. Example: [“formula”, “formation_energy_per_atom”]
- mp_decode: (bool) see MPRester.query() for a description of this parameter. Whether to decode to a pymatgen object where possible.
- index_mpid: (bool) Whether to set the material_id as the dataframe index.
- Returns ([dict]):
A list of JSON-like dicts that match the criteria and contain the requested properties.
get_dataframe(criteria, properties, index_mpid=True, **kwargs)
Gets data from MP in a dataframe format. See api_link for more details.
- Args:
- criteria (dict): the same as in get_data.
- properties ([str]): the same properties supported as in get_data, plus: “structure”, “initial_structure”, “final_structure”, “bandstructure” (line mode), “bandstructure_uniform”, “phonon_bandstructure”, “phonon_ddb”, “phonon_dos”. Note that for a long list of compounds, it may take a long time to retrieve some of these objects.
- index_mpid (bool): the same as in get_data.
- kwargs (dict): the same keyword arguments as in get_data.
- Returns (pandas.DataFrame):
try_get_prop_by_material_id(prop, material_id_list, **kwargs)
Calls the relevant get_*_by_material_id method. “prop” is a property, such as bandstructure, that is not among the properties supported by get_data but is available through a dedicated method, for example get_bandstructure_by_material_id.
- Args:
- prop (str): the name of the property. Options are: “bandstructure”, “dos”, “phonon_dos”, “phonon_bandstructure”, “phonon_ddb”.
- material_id_list ([str]): list of material_id values of the compounds.
- kwargs (dict): other keyword arguments that get_*_by_material_id may accept, e.g. line_mode in get_bandstructure_by_material_id.
- Returns ([target prop object or NaN]):
If the target property is not available for a certain material_id, NaN is returned.
matminer.data_retrieval.retrieve_MPDS module
matminer.data_retrieval.retrieve_MongoDB module
class matminer.data_retrieval.retrieve_MongoDB.MongoDataRetrieval(coll)
Bases: matminer.data_retrieval.retrieve_base.BaseDataRetrieval
__init__(coll)
Retrieves data from a MongoDB collection to a pandas.DataFrame object.
- Args:
- coll: A MongoDB collection object
api_link()
The link to comprehensive API documentation or data source.
- Returns:
(str): A link to the API documentation for this DataRetrieval class.
get_dataframe(criteria, properties=None, limit=0, sort=None, idx_field=None, strict=False)
- Args:
- criteria: (dict) - a pymongo-style query to filter data records
- properties: ([str] or None) - a list of str fields to retrieve; dot-notation is allowed (e.g. “structure.lattice.a”). Set to None to try to auto-detect the fields.
- limit: (int) - max number of entries. 0 means no limit
- sort: (tuple) - pymongo-style sort option
- idx_field: (str) - name of field to use as index (must have unique entries)
- strict: (bool) - if False, replaces missing values with NaN
- Returns (pandas.DataFrame):
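To illustrate what dot-notation properties and the strict flag imply, here is a minimal sketch of a nested lookup on a plain dict. It is not matminer's implementation: the helper name get_nested and the sample record are hypothetical, and raising KeyError when strict=True is an assumption, since the description above only states the strict=False behaviour.

```python
import math


def get_nested(record, dotted_key, strict=False):
    """Look up a dot-notation key such as "structure.lattice.a" in a nested dict.

    Hypothetical helper for illustration only. With strict=False a missing
    field yields NaN; with strict=True a KeyError is raised (an assumption).
    """
    value = record
    for part in dotted_key.split("."):
        if isinstance(value, dict) and part in value:
            value = value[part]
        elif strict:
            raise KeyError(dotted_key)
        else:
            return math.nan
    return value


# Made-up record resembling a document stored in a MongoDB collection.
doc = {"formula": "Al", "structure": {"lattice": {"a": 4.05}}}

lattice_a = get_nested(doc, "structure.lattice.a")   # field is present
lattice_b = get_nested(doc, "structure.lattice.b")   # field is missing -> NaN
```

Returning NaN rather than failing keeps every row the same width, which is convenient when the records are later assembled into a table.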
matminer.data_retrieval.retrieve_MongoDB.clean_projection(projection)
Projecting on e.g. ‘a.b’ and ‘a’ at the same time is disallowed in MongoDB, so project inclusively. See unit tests for examples of what this is doing.
- Args:
- projection: (list) - list of fields to retrieve; dot-notation is allowed.
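The idea can be sketched as follows, assuming that “project inclusively” means keeping a parent field and dropping any of its sub-fields that were also requested. The function name clean_projection_sketch is hypothetical and this is not matminer's code; the unit tests mentioned above remain the reference for the real behaviour.

```python
def clean_projection_sketch(projection):
    """Drop fields that are already covered by a requested parent field.

    Illustrative only: assumes the parent field wins over its sub-fields.
    """
    # Visit shallow fields first so parents are kept before their children.
    fields = sorted(set(projection), key=lambda f: (f.count("."), f))
    kept = []
    for field in fields:
        covered = any(field.startswith(parent + ".") for parent in kept)
        if not covered:
            kept.append(field)
    return kept


cleaned = clean_projection_sketch(["a.b", "a", "c.d"])
```

Matching on the parent name plus a trailing dot avoids treating an unrelated field such as “ab” as a child of “a”.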
matminer.data_retrieval.retrieve_MongoDB.is_int(x)
matminer.data_retrieval.retrieve_MongoDB.remove_ints(projection)
Transforms a string like “a.1.x” to “a.x” - for Mongo projection purposes
- Args:
- projection: (str) the projection to remove ints from
- Returns (str)
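A minimal sketch of the transformation described above, written independently of matminer. The names is_int_sketch and remove_ints_sketch are hypothetical, and since is_int is undocumented here, treating it as “can this string be parsed as an integer” is an assumption.

```python
def is_int_sketch(x):
    """Return True if x can be parsed as an integer (assumed behaviour)."""
    try:
        int(x)
        return True
    except ValueError:
        return False


def remove_ints_sketch(projection):
    """Strip integer path components, e.g. list indices, from a dotted key."""
    parts = projection.split(".")
    return ".".join(p for p in parts if not is_int_sketch(p))


stripped = remove_ints_sketch("a.1.x")
```

MongoDB projections address fields rather than array positions, which is presumably why the index component is removed before projecting.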
matminer.data_retrieval.retrieve_base module
class matminer.data_retrieval.retrieve_base.BaseDataRetrieval
Bases: object
Abstract class to retrieve data from various material APIs while adhering to a quasi-standard format for querying.
## Implementing a new DataRetrieval class
If you have an API which you’d like to incorporate into matminer’s data retrieval tools, using BaseDataRetrieval is the preferred way of doing so. All DataRetrieval classes should subclass BaseDataRetrieval and implement the following:
get_dataframe()
api_link()
Retrieving data should be done by the user with get_dataframe. Criteria should be a dictionary which will be used to form a query to the database. Properties should be a list which defines the columns that will be returned. While the ‘criteria’ and ‘properties’ arguments may have different valid values depending on the database, they should always have sensible formats and names if possible. For example, the user should be calling this:
- df = MyDataRetrieval().get_dataframe(criteria={'band_gap': 0.0},
properties=['structure'])
…or this:
- df = MyDataRetrieval().get_dataframe(criteria={'band_gap': [0.0, 0.15]},
properties=["density of states"])
NOT this:
- df = MyDataRetrieval().get_dataframe(criteria={'query.bg[0] && band_gap': 0.0},
properties=['Struct.page[Value]'])
The implemented DataRetrieval class should handle the conversion from a ‘sensible’ query to a query fit for the individual API and database.
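As an illustration of such a conversion, the sketch below turns a ‘sensible’ criteria dict into a MongoDB-style query. The function name to_api_query is hypothetical, the choice of a MongoDB-style target is only an example, and reading a two-element list as an inclusive [min, max] range is an assumption; a real DataRetrieval class would document its own conventions.

```python
def to_api_query(criteria):
    """Translate user-facing criteria into a MongoDB-style query dict.

    Assumptions for this sketch: a two-element list or tuple is an
    inclusive (min, max) range; any other value is an exact match.
    """
    query = {}
    for name, value in criteria.items():
        if isinstance(value, (list, tuple)) and len(value) == 2:
            low, high = value
            query[name] = {"$gte": low, "$lte": high}
        else:
            query[name] = value
    return query


exact = to_api_query({"band_gap": 0.0})
ranged = to_api_query({"band_gap": [0.0, 0.15]})
```

Keeping this translation inside the class means users only ever see the simple criteria format, while the API-specific operators stay an implementation detail.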
There may be cases where a ‘sensible’ query is not sufficient to define a query to the API; in this case, use the get_dataframe kwargs sparingly to augment the criteria, properties, or form of the underlying API query.
A method for accessing raw DB data with an API-native query may be provided by overriding get_data. The link to the original API documentation must be provided by overriding api_link().
## Documenting a DataRetrieval class
The class documentation for each DataRetrieval class must contain a brief description of the possible data that can be retrieved with the API source. It should also detail the form of the criteria and properties that can be retrieved with the class, and/or should link to a web page showing this information. The options of the class must all be defined in the __init__ function of the class, and we recommend documenting them using the [Google style](https://google.github.io/styleguide/pyguide.html).
api_link()
The link to comprehensive API documentation or data source.
- Returns:
(str): A link to the API documentation for this DataRetrieval class.
citations()
Retrieve a list of BibTeX-formatted citation strings which should be cited when using a data retrieval method.
- Returns:
([str]): BibTeX-formatted entries
get_dataframe(criteria, properties, **kwargs)
Retrieve a dataframe of properties from the database which satisfy criteria.
- Args:
- criteria (dict): The name of each criterion is the key; the value or range of the criterion is the value.
- properties (list): Properties to return from the query matching the criteria. For example, [‘structure’, ‘formula’]
- Returns:
- (pandas DataFrame) The dataframe containing properties as columns and samples as rows.