matminer.data_retrieval package

Submodules

matminer.data_retrieval.retrieve_Citrine module

class matminer.data_retrieval.retrieve_Citrine.CitrineDataRetrieval(api_key=None)

Bases: object

CitrineDataRetrieval is used to retrieve data from the Citrination database. See API client docs at http://citrineinformatics.github.io/api-documentation/

__init__(api_key=None)
Args:
api_key: (str) Your Citrine API key, or None if you’ve set the CITRINE_KEY environment variable

Returns: None
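
The key lookup described above (explicit argument first, then the CITRINE_KEY environment variable) can be sketched roughly as follows. The helper name resolve_api_key is hypothetical and not part of matminer, and raising an error when no key is found is an assumption of this sketch.

```python
import os


def resolve_api_key(api_key=None, env_var="CITRINE_KEY"):
    """Return the explicit key if given, else the value of the environment variable."""
    if api_key is not None:
        return api_key
    key = os.environ.get(env_var)
    if key is None:
        # Raising here is an assumption of this sketch, not documented matminer behavior.
        raise ValueError(f"No API key given and {env_var} is not set")
    return key
```

With CITRINE_KEY exported in the shell, calling resolve_api_key() with no arguments returns the exported value, which mirrors constructing CitrineDataRetrieval() without an api_key.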

get_api_data(formula=None, prop=None, data_type=None, reference=None, min_measurement=None, max_measurement=None, from_record=None, data_set_id=None, max_results=None)

Gets raw API data from Citrine in JSON format. See client docs at http://citrineinformatics.github.io/api-documentation/ for more details on these parameters.

Args:
formula: (str) filter for the chemical formula field; only those results that have chemical formulas that contain this string will be returned
prop: (str) name of the property to search for
data_type: (str) ‘EXPERIMENTAL’/‘COMPUTATIONAL’/‘MACHINE_LEARNING’; filter for properties obtained from experimental work, computational methods, or machine learning.
reference: (str) filter for the reference field; only those results that have contributors that contain this string will be returned
min_measurement: (str/num) minimum of the property value range
max_measurement: (str/num) maximum of the property value range
from_record: (int) index of first record to return (indexed from 0)
data_set_id: (int) id of the particular data set to search on
max_results: (int) number of records to limit the results to

Returns: (list) JSON records (PIFs) returned by Citrine’s API

get_dataframe(formula=None, prop=None, data_type=None, reference=None, min_measurement=None, max_measurement=None, from_record=None, data_set_id=None, max_results=None, show_columns=None)

Gets a Pandas dataframe object from data retrieved from the Citrine API. See client docs at http://citrineinformatics.github.io/api-documentation/ for more details on input parameters.

Args:
formula: (str) filter for the chemical formula field; only those results that have chemical formulas that contain this string will be returned
prop: (str) name of the property to search for
data_type: (str) ‘EXPERIMENTAL’/‘COMPUTATIONAL’/‘MACHINE_LEARNING’; filter for properties obtained from experimental work, computational methods, or machine learning.
reference: (str) filter for the reference field; only those results that have contributors that contain this string will be returned
min_measurement: (str/num) minimum of the property value range
max_measurement: (str/num) maximum of the property value range
from_record: (int) index of first record to return (indexed from 0)
data_set_id: (int) id of the particular data set to search on
max_results: (int) number of records to limit the results to
show_columns: (list) list of columns to show from the resulting dataframe

Returns: (object) Pandas dataframe object containing the results
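
As a rough illustration of how the keyword arguments above fit together, the sketch below builds a query and wraps the call in a function. The values (formula, property name, column names) are illustrative assumptions, and the wrapper fetch_citrine_dataframe is hypothetical. Running it requires matminer, network access, and a valid Citrine API key.

```python
# Keyword arguments taken from the get_dataframe signature documented above.
# The concrete values, including the column names, are illustrative assumptions.
query = {
    "formula": "Si",
    "prop": "band gap",
    "data_type": "EXPERIMENTAL",
    "max_results": 100,
    "show_columns": ["chemicalFormula", "band gap"],
}


def fetch_citrine_dataframe(api_key=None, **kwargs):
    """Hypothetical wrapper: run the query and return the resulting dataframe."""
    # Imported lazily so that building the query does not require matminer.
    from matminer.data_retrieval.retrieve_Citrine import CitrineDataRetrieval

    return CitrineDataRetrieval(api_key=api_key).get_dataframe(**kwargs)
```

A call would then look like fetch_citrine_dataframe(**query), with the API key taken from the CITRINE_KEY environment variable.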

matminer.data_retrieval.retrieve_Citrine.get_value(dict_item)
matminer.data_retrieval.retrieve_Citrine.parse_scalars(scalars)

matminer.data_retrieval.retrieve_MDF module

class matminer.data_retrieval.retrieve_MDF.MDFDataRetrieval(anonymous=False, **kwargs)

Bases: object

MDFDataRetrieval is used to retrieve data from the Materials Data Facility database and convert them into a Pandas dataframe. Note that invocation with full access to MDF will require authentication via https://materialsdatafacility.org/, but an anonymous mode is supported, which can be used with anonymous=True as a keyword arg.

Examples:

>>> mdf_dr = MDFDataRetrieval(anonymous=True)
>>> results = mdf_dr.get_dataframe(elements=["Ag", "Be"], sources=["oqmd"])

>>> results = mdf_dr.get_dataframe(sources=["oqmd"],
...     match_ranges={"oqmd.band_gap.value": [4.0, "*"]})

__init__(anonymous=False, **kwargs)
Args:
anonymous (bool): whether to use anonymous login (i.e. no Globus authentication)
**kwargs: kwargs for Forge, including index (Globus search index to search on), local_ep, anonymous
get_dataframe(sources=None, elements=None, titles=None, tags=None, resource_types=None, match_fields=None, exclude_fields=None, match_ranges=None, exclude_ranges=None, unwind_arrays=True)

Retrieves data from the MDF API and formats it as a Pandas Dataframe

Args:
sources ([str]): source names to include, e.g. ["oqmd"]
elements ([str]): elements to include, e.g. ["Ag", "Si"]
titles ([str]): titles to include, e.g. ["Coarsening of a semisolid Al-Cu alloy"]
tags ([str]): tags to include, e.g. ["outcar"]
resource_types ([str]): resources to include, e.g. ["record"]
match_fields ({}): field-value mappings to include, e.g. {"oqmd.converged": True}
exclude_fields ({}): field-value mappings to exclude, e.g. {"oqmd.converged": False}
match_ranges ({}): field-range mappings to include, e.g. {"oqmd.band_gap.value": [1, 5]}; use "*" for no lower or upper bound, e.g. {"oqmd.band_gap.value": [1, "*"]}
exclude_ranges ({}): field-range mappings to exclude, e.g. {"oqmd.band_gap.value": [3, "*"]} to exclude all results with a band gap higher than 3
raw (bool): whether or not to return raw (non-dataframe) output, defaults to False
unwind_arrays (bool): whether or not to unwind arrays in flattening docs for dataframe
Returns:
DataFrame corresponding to all documents from aggregated query
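
The range convention used by match_ranges and exclude_ranges, where "*" leaves one side of the range open, can be illustrated with a small local check. This is only a sketch of the semantics, not the MDF or Forge implementation. The function name within_range is hypothetical, and treating both bounds as inclusive is an assumption.

```python
def within_range(value, bounds):
    """Return True if value lies inside [low, high]; "*" leaves that side unbounded."""
    low, high = bounds
    above_low = low == "*" or value >= low
    below_high = high == "*" or value <= high
    return above_low and below_high
```

Under these assumptions, a band gap of 4.5 satisfies [4.0, "*"], while a band gap of 3.0 does not.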
get_dataframe_by_query(query, unwind_arrays=True, **kwargs)

Gets a dataframe from the MDF API from an explicit string query (rather than input args like get_dataframe).

Args:
query (str): string for explicit query
unwind_arrays (bool): whether or not to unwind arrays in flattening docs for dataframe
**kwargs: kwargs for query

Returns:
dataframe corresponding to query
matminer.data_retrieval.retrieve_MDF.make_dataframe(docs, unwind_arrays=True)

Formats raw docs returned from MDF API search into a dataframe

Args:
docs ([{}]): list of documents from Forge search or aggregation

Returns: DataFrame corresponding to formatted docs
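
One plausible reading of "flattening docs" with unwind_arrays is sketched below: nested dictionaries become dotted keys, and list entries optionally get their own indexed keys. This is an illustrative assumption about the behavior, not matminer's actual implementation. The function flatten_doc and the field name mdf.elements are hypothetical.

```python
def flatten_doc(doc, unwind_arrays=True, prefix=""):
    """Flatten a nested dict into a single-level dict with dotted keys."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the dotted prefix.
            flat.update(flatten_doc(value, unwind_arrays, prefix=name + "."))
        elif isinstance(value, list) and unwind_arrays:
            # Give every list element its own indexed key.
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    flat.update(flatten_doc(item, unwind_arrays, prefix=f"{name}.{i}."))
                else:
                    flat[f"{name}.{i}"] = item
        else:
            # Scalars, and lists when unwind_arrays is False, are kept as-is.
            flat[name] = value
    return flat
```

Each flattened dict then corresponds to one row, and its dotted keys (for example oqmd.band_gap.value) correspond to column names.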

matminer.data_retrieval.retrieve_MP module

class matminer.data_retrieval.retrieve_MP.MPDataRetrieval(api_key=None)

Bases: object

MPDataRetrieval is used to retrieve data from the Materials Project database, print the results, and convert them into an indexed Pandas dataframe.

__init__(api_key=None)
Args:
api_key: (str) Your Materials Project API key, or None if you’ve set up your pymatgen config.
get_dataframe(criteria, properties, mp_decode=False, index_mpid=True)

Gets data from MP in a dataframe format. See API docs at https://materialsproject.org/wiki/index.php/The_Materials_API for more details.

Args:
criteria: (str/dict) see MPRester.query() for a description of this parameter. String examples: "mp-1234", "Fe2O3", "Li-Fe-O", "*2O3". Dict example: {"band_gap": {"$gt": 1}}
properties: (list) see MPRester.query() for a description of this parameter. Example: ["formula", "formation_energy_per_atom"]
mp_decode: (bool) see MPRester.query() for a description of this parameter. Whether to decode to a pymatgen object where possible.
index_mpid: (bool) whether to set the materials_id as the dataframe index.

Returns: A pandas Dataframe object
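
A minimal sketch of a query built from the example values given above. The wrapper fetch_mp_dataframe is hypothetical. Running it requires matminer, network access, and a Materials Project API key (or a configured pymatgen setup).

```python
# Example values copied from the argument descriptions above.
criteria = {"band_gap": {"$gt": 1}}  # materials with a band gap greater than 1
properties = ["formula", "formation_energy_per_atom"]


def fetch_mp_dataframe(api_key=None):
    """Hypothetical wrapper: query Materials Project and return a dataframe."""
    # Imported lazily so that defining the query does not require matminer.
    from matminer.data_retrieval.retrieve_MP import MPDataRetrieval

    mp_dr = MPDataRetrieval(api_key=api_key)
    return mp_dr.get_dataframe(criteria=criteria, properties=properties,
                               index_mpid=True)
```

A string criterion such as "Li-Fe-O" could be passed in place of the dict to search a chemical system instead.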

matminer.data_retrieval.retrieve_MPDS module

matminer.data_retrieval.retrieve_MongoDB module

class matminer.data_retrieval.retrieve_MongoDB.MongoDataRetrieval(coll)

Bases: object

__init__(coll)

Tool to retrieve data from a MongoDB collection and put into a pandas Dataframe object

Args:
coll: A MongoDB collection object
get_dataframe(projection, query=None, sort=None, limit=None, idx_field=None, strict=False)
Args:
projection: (list) - a list of str fields to retrieve; dot-notation is allowed. Set to None to try to auto-detect the fields.
query: (JSON) - a pymongo-style query to filter data records
sort: (tuple) - pymongo-style sort option
limit: (int) - max number of entries
idx_field: (str) - name of field to use as index (must have unique entries)
strict: (bool) - if False, replaces missing values with NaN
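
The effect of strict on dot-notation fields can be sketched with a small lookup helper. The function get_nested is hypothetical and not part of matminer. The source documents only the strict=False behavior, so raising a KeyError when strict is True is an assumption of this sketch.

```python
def get_nested(doc, dotted_field, strict=False):
    """Look up a dot-notation field in a nested dict, e.g. "a.b" in {"a": {"b": 1}}."""
    current = doc
    for part in dotted_field.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif strict:
            # Assumed behavior for strict=True; not stated in the documentation.
            raise KeyError(dotted_field)
        else:
            # Missing values are replaced with NaN when strict is False.
            return float("nan")
    return current
```

For a document {"a": {"b": 1}}, looking up "a.b" yields 1, while looking up "a.c" yields NaN unless strict is set.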

matminer.data_retrieval.retrieve_MongoDB.clean_projection(projection)

Projecting on e.g. 'a.b' and 'a' is disallowed in MongoDB, so project inclusively. See unit tests for examples of what this is doing.

Args:
projection: (list) - list of fields to retrieve; dot-notation is allowed.
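
"Projecting inclusively" can be read as follows: when a parent field and one of its sub-fields are both requested, keep only the parent, since it already contains the sub-field. The sketch below illustrates that idea. The function project_inclusively is hypothetical and is not matminer's clean_projection, and returning the fields in sorted order is a choice made for this example.

```python
def project_inclusively(fields):
    """Drop any field whose dotted parent is also requested."""
    wanted = set(fields)
    kept = []
    for field in sorted(wanted):
        parts = field.split(".")
        # All proper dotted prefixes, e.g. "a.b.c" -> {"a", "a.b"}.
        prefixes = {".".join(parts[:i]) for i in range(1, len(parts))}
        if not (prefixes & wanted):
            kept.append(field)
    return kept
```

Given ["a.b", "a", "c.d"], this keeps "a" and "c.d" and drops "a.b", which avoids the conflicting projection.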
matminer.data_retrieval.retrieve_MongoDB.is_int(x)
matminer.data_retrieval.retrieve_MongoDB.remove_ints(projection)

Transforms a string like "a.1.x" to "a.x" - for Mongo projection purposes.

Args:
projection: (str) the projection to remove ints from

Returns:
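
The transformation described for remove_ints can be sketched as below. The function strip_int_components is a hypothetical stand-in, not matminer's code, and using str.isdigit() to detect integer components is an assumption (it does not treat negative numbers as integers).

```python
def strip_int_components(projection):
    """Remove purely numeric components from a dot-notation string."""
    parts = [part for part in projection.split(".") if not part.isdigit()]
    return ".".join(parts)
```

Array indices such as the "1" in "a.1.x" are dropped because a Mongo projection names fields, not positions within an array.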

Module contents