matminer.data_retrieval package

Submodules

matminer.data_retrieval.retrieve_Citrine module

class matminer.data_retrieval.retrieve_Citrine.CitrineDataRetrieval(api_key=None)
__init__(api_key=None)
Args:
api_key: (str) Your Citrine API key, or None if you’ve set the CITRINE_KEY environment variable

Returns: None

get_dataframe(formula=None, property=None, data_type=None, reference=None, min_measurement=None, max_measurement=None, from_record=None, data_set_id=None, max_results=None, show_columns=None)

Gets data from Citrine in a dataframe format. See client docs at http://citrineinformatics.github.io/api-documentation/ for more details on these parameters.

Args:
formula: (str) filter for the chemical formula field; only those results that have chemical formulas that contain this string will be returned
property: (str) name of the property to search for
data_type: (str) ‘EXPERIMENTAL’/‘COMPUTATIONAL’/‘MACHINE_LEARNING’; filter for properties obtained from experimental work, computational methods, or machine learning
reference: (str) filter for the reference field; only those results that have contributors that contain this string will be returned
min_measurement: (str/num) minimum of the property value range
max_measurement: (str/num) maximum of the property value range
from_record: (int) index of the first record to return (indexed from 0)
data_set_id: (int) id of the particular data set to search on
max_results: (int) number of records to limit the results to
show_columns: (list) list of columns to show from the resulting dataframe

Returns: (object) Pandas dataframe object containing the results
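
A minimal usage sketch of the call described above, using only the constructor and get_dataframe parameters listed in this section. The filter values (formula "Si", property "band gap", 10 results) and the helper name fetch_citrine_dataframe are illustrative assumptions, not values from the Citrine documentation. The request needs a valid API key and network access, so the call is wrapped in a function instead of running on import.

```python
# Illustrative filter values; these are assumptions, not recommended settings.
CITRINE_FILTERS = {
    "formula": "Si",
    "property": "band gap",
    "data_type": "EXPERIMENTAL",
    "max_results": 10,
}


def fetch_citrine_dataframe(api_key=None, **filters):
    """Query Citrine and return whatever get_dataframe returns.

    With api_key=None the key is expected in the CITRINE_KEY
    environment variable, as described for __init__ above.
    """
    # Imported lazily so this sketch can be loaded without matminer installed.
    from matminer.data_retrieval.retrieve_Citrine import CitrineDataRetrieval

    retriever = CitrineDataRetrieval(api_key=api_key)
    return retriever.get_dataframe(**filters)
```

With a key configured, the call would look like fetch_citrine_dataframe(**CITRINE_FILTERS).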

get_value(dict_item)

Extract values from ‘Property’ objects

Args:
dict_item: ‘Property’ object
Returns:
  • if ‘value’, returns string/float
  • if ‘minimum’ or ‘maximum’, returns string
parse_matrix(matrix_column)

Parse matrix/(array of array) value items from a ‘Property’ column

Args:
matrix_column: ‘Property’ column with array of array objects

Returns: column with extracted array of arrays

parse_scalars(scalar_column)

Parse scalar/single value items from a ‘Property’ column

Args:
scalar_column: ‘Property’ column with scalar objects

Returns: column with extracted values

parse_vectors(vector_column)

Parse vector/array value items from a ‘Property’ column

Args:
vector_column: ‘Property’ column with vector objects

Returns: column with extracted arrays

matminer.data_retrieval.retrieve_MP module

class matminer.data_retrieval.retrieve_MP.MPDataRetrieval(api_key=None)

MPDataRetrieval is used to retrieve data from the Materials Project database, print the results, and convert them into an indexed Pandas dataframe.

__init__(api_key=None)
Args:
api_key: (str) Your Materials Project API key, or None if you’ve set up your pymatgen config.
get_dataframe(criteria, properties, mp_decode=False, index_mpid=True)

Gets data from MP in a dataframe format. See API docs at https://materialsproject.org/wiki/index.php/The_Materials_API for more details.

Args:
criteria: (str/dict) see MPRester.query() for a description of this parameter. String examples: “mp-1234”, “Fe2O3”, “Li-Fe-O”, “*2O3”. Dict example: {“band_gap”: {“$gt”: 1}}
properties: (list) see MPRester.query() for a description of this
parameter. Example: [“formula”, “formation_energy_per_atom”]
mp_decode: (bool) see MPRester.query() for a description of this
parameter. Whether to decode to a Pymatgen object where possible.
index_mpid: (bool) Whether to set the material_id as the dataframe index.

Returns: A pandas DataFrame object
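
A minimal usage sketch, assuming a Materials Project API key is available (passed in or set in the pymatgen config). The criteria and properties values are the examples given in the argument descriptions above; the helper name fetch_mp_dataframe is hypothetical. The query needs network access, so it is wrapped in a function instead of running on import.

```python
# Example values taken from the argument descriptions above.
CRITERIA = {"band_gap": {"$gt": 1}}
PROPERTIES = ["formula", "formation_energy_per_atom"]


def fetch_mp_dataframe(criteria, properties, api_key=None):
    """Run one Materials Project query and return the resulting dataframe."""
    # Imported lazily so this sketch can be loaded without matminer installed.
    from matminer.data_retrieval.retrieve_MP import MPDataRetrieval

    retriever = MPDataRetrieval(api_key=api_key)
    # mp_decode and index_mpid are left at their documented defaults.
    return retriever.get_dataframe(criteria=criteria, properties=properties)
```

With a key configured, the call would look like fetch_mp_dataframe(CRITERIA, PROPERTIES).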

matminer.data_retrieval.retrieve_MPDS module

matminer.data_retrieval.retrieve_MongoDB module

class matminer.data_retrieval.retrieve_MongoDB.MongoDataRetrieval(coll)
__init__(coll)

Tool to retrieve data from a MongoDB collection and put it into a pandas DataFrame object

Args:
coll: A MongoDB collection object
get_dataframe(projection, query=None, sort=None, limit=None, idx_field=None, strict=False)
Args:
projection: (list) - a list of str fields to grab; dot-notation is allowed. Set to “None” to try to auto-detect the fields.
query: (JSON) - a pymongo-style query to filter data records
sort: (tuple) - pymongo-style sort option
limit: (int) - max number of entries
idx_field: (str) - name of field to use as index (must have unique entries)
strict: (bool) - if False, replaces missing values with NaN
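
To illustrate what dot-notation fields and the strict flag mean for a single record, here is a small standalone sketch. It is not the library's implementation: get_nested is a hypothetical helper, the document is toy data, and raising KeyError when strict is True is an assumption (the text above only states what happens when strict is False).

```python
import math


def get_nested(document, dotted_field, strict=False):
    """Resolve a dot-notation field such as "output.energy" in a nested dict."""
    current = document
    for key in dotted_field.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif strict:
            # Assumed behavior for strict=True; not confirmed by the docs above.
            raise KeyError(dotted_field)
        else:
            # strict=False: a missing value becomes NaN.
            return math.nan
    return current


# Toy record standing in for one MongoDB document.
doc = {"formula": "Fe2O3", "output": {"energy": -1.5}}
fields = ["formula", "output.energy", "output.volume"]

# One "row" of the resulting table, keyed by the projected field names.
row = {field: get_nested(doc, field) for field in fields}
```

Here "output.volume" is absent from the record, so its entry in row is NaN instead of an error.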

matminer.data_retrieval.retrieve_MongoDB.clean_projection(projection)

Projecting on e.g. ‘a.b.’ and ‘a’ is disallowed in MongoDB, so project inclusively. See unit tests for examples of what this is doing.

Args:
projection: (list) - list of fields to grab; dot-notation is allowed.
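
One way to read "project inclusively" is that a field is dropped whenever one of its parent paths is also requested, since the parent already covers it. The sketch below implements that reading only; clean_projection_sketch is a hypothetical name, the sorted return order is a choice made here for determinism, and the library's actual function may differ in details.

```python
def clean_projection_sketch(projection):
    """Drop any field whose parent path is also in the projection."""
    fields = set(projection)
    kept = []
    for field in sorted(fields):
        parts = field.split(".")
        # All proper parent paths, e.g. "a.b.c" -> {"a", "a.b"}.
        ancestors = {".".join(parts[:i]) for i in range(1, len(parts))}
        if not (ancestors & fields):
            kept.append(field)
    return kept


cleaned = clean_projection_sketch(["a.b", "a", "c.d"])
```

In this example "a.b" is removed because "a" is also requested, while "c.d" stays because "c" is not.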
matminer.data_retrieval.retrieve_MongoDB.is_int(x)
matminer.data_retrieval.retrieve_MongoDB.remove_ints(projection)

Transforms a string like “a.1.x” to “a.x” - for Mongo projection purposes

Args:
projection: (str) the projection to remove ints from

Returns:
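
The transformation described for remove_ints can be sketched in a few lines. This is an illustration, not the library's code: remove_ints_sketch is a hypothetical name, and str.isdigit() is used here as a stand-in for whatever check is_int(x) performs.

```python
def remove_ints_sketch(projection):
    """Remove purely numeric path segments from a dot-notation field."""
    parts = projection.split(".")
    # Keep only the segments that are not array indices such as "1".
    return ".".join(part for part in parts if not part.isdigit())


result = remove_ints_sketch("a.1.x")
```

For the input "a.1.x" from the description above, result is "a.x"; a field with no numeric segments is returned unchanged.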

Module contents