matminer.featurizers package
Subpackages
Submodules
matminer.featurizers.bandstructure module
-
matminer.featurizers.bandstructure.
absolute_band_positions_bpe
(bs, target_gap=None, **kwargs) Absolute VBM and CBM positions with respect to the branch point energy
- Args:
- bs (BandStructure): band structure object
target_gap (float): if a more accurate band gap is known, shift the band positions to match this gap
**kwargs: arguments passed on to branch_point_energy
- Returns:
- (vbm, cbm) - tuple of floats
-
matminer.featurizers.bandstructure.
branch_point_energy
(bs, n_vb=1, n_cb=1) Get the branch point energy as defined by Schleife, Fuchs, Rödl, Furthmüller, Bechstedt, APL 94, 012104 (2009)
- Args:
- bs (BandStructure): uniform-mesh band structure object
n_vb (int): number of valence bands to include
n_cb (int): number of conduction bands to include
Returns: (float) branch point energy on the same energy scale as the band structure eigenvalues
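As we read the cited reference, the branch point energy is the midpoint between the mean of the n_vb highest valence-band energies and the mean of the n_cb lowest conduction-band energies, averaged over all k-points of a uniform mesh. The sketch below computes that quantity from plain arrays. It assumes equal k-point weights and takes hypothetical `vb_energies`/`cb_energies` arrays instead of a pymatgen BandStructure, so it is an illustration, not matminer's implementation.

```python
import numpy as np


def branch_point_energy_sketch(vb_energies, cb_energies, n_vb=1, n_cb=1):
    """Branch point energy from band energies on a uniform k-point mesh (hypothetical helper).

    vb_energies: array of shape (n_kpoints, n_valence_bands)
    cb_energies: array of shape (n_kpoints, n_conduction_bands)
    Assumption: every k-point carries the same weight.
    """
    vb = np.sort(np.asarray(vb_energies, dtype=float), axis=1)  # ascending at each k-point
    cb = np.sort(np.asarray(cb_energies, dtype=float), axis=1)
    top_vb = vb[:, -n_vb:].mean(axis=1)    # mean of the n_vb highest valence bands
    bottom_cb = cb[:, :n_cb].mean(axis=1)  # mean of the n_cb lowest conduction bands
    # Midpoint at each k-point, then average over the mesh
    return float(np.mean(0.5 * (top_vb + bottom_cb)))


vb = [[-2.0, -1.0], [-1.5, -0.5]]  # two k-points, two valence bands (toy numbers)
cb = [[1.0, 3.0], [1.5, 2.5]]      # two k-points, two conduction bands
print(branch_point_energy_sketch(vb, cb))  # 0.25
```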
matminer.featurizers.base module
-
class
matminer.featurizers.base.
BaseFeaturizer
Bases:
object
Abstract class to calculate attributes for compounds
-
citations
() Citation / reference for the feature
- Returns:
- array - each element should be a citation string, ideally in BibTeX format
-
feature_labels
() Generate attribute names
- Returns:
- list of strings for attribute labels
-
featurize
(x) Main featurizer function. Only defined in feature subclasses.
- Args:
- x: input data to featurize (type depends on featurizer)
- Returns:
- list of one or more features
-
featurize_dataframe
(df, col_id) Compute features for all entries contained in the input dataframe
- Args:
- df (Pandas dataframe): Dataframe containing input data
col_id (string): column label containing objects to featurize
- Returns:
- updated Dataframe
-
implementors
() List of implementors of the feature.
- Returns:
- array - each element should be either a str with the author name (e.g., “Anubhav Jain”) or
- a dict with the required key “name” and optional keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).
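The contract above (featurize returns a list of values, feature_labels returns one label per value, featurize_dataframe adds those values to a table) can be illustrated with a small stand-in. The sketch avoids pandas: `BaseFeaturizerSketch`, `featurize_rows` (a list-of-dicts analogue of featurize_dataframe) and `NumElementsSketch` are hypothetical names, not matminer classes.

```python
class BaseFeaturizerSketch:
    """Hypothetical stand-in mirroring the documented BaseFeaturizer interface."""

    def featurize(self, x):
        raise NotImplementedError("featurize() is defined in subclasses")

    def feature_labels(self):
        raise NotImplementedError("feature_labels() is defined in subclasses")

    def citations(self):
        return []  # ideally BibTeX strings

    def implementors(self):
        return []  # author names or {"name": ...} dicts

    def featurize_rows(self, rows, col_id):
        """Plain-Python analogue of featurize_dataframe: rows is a list of dicts."""
        labels = self.feature_labels()
        for row in rows:
            values = self.featurize(row[col_id])
            if len(values) != len(labels):
                raise ValueError("featurize() must return one value per feature label")
            row.update(zip(labels, values))  # one new "column" per label
        return rows


class NumElementsSketch(BaseFeaturizerSketch):
    """Hypothetical featurizer: number of distinct elements in a {symbol: amount} dict."""

    def featurize(self, comp):
        return [len(comp)]

    def feature_labels(self):
        return ["n_elements"]


rows = [{"comp": {"Na": 1, "Cl": 1}}, {"comp": {"Li": 1, "Fe": 1, "P": 1, "O": 4}}]
NumElementsSketch().featurize_rows(rows, "comp")
print([r["n_elements"] for r in rows])  # [2, 4]
```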
-
matminer.featurizers.composition module
-
class
matminer.featurizers.composition.
BandCenter
Bases:
matminer.featurizers.base.BaseFeaturizer
-
feature_labels
()
-
featurize
(comp) Estimate the absolute position of the band center using the geometric mean of electronegativity. Ref: Butler, M. A. & Ginley, D. S. Prediction of Flatband Potentials at Semiconductor-Electrolyte Interfaces from Atomic Electronegativities. J. Electrochem. Soc. 125, 228 (1978).
- Args:
- comp: (Composition)
Returns: (float) band center
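BandCenter is described as using a geometric mean of elemental electronegativities. The sketch below computes only the composition-weighted geometric mean from a caller-supplied electronegativity table. The Pauling values are illustrative, and any sign convention or scaling that matminer applies to turn this into a band-center estimate is an assumption not covered here; `geometric_mean_electronegativity` is a hypothetical name.

```python
import math


def geometric_mean_electronegativity(amounts, electronegativity):
    """Composition-weighted geometric mean of electronegativity (hypothetical helper).

    amounts: {element symbol: number of atoms}
    electronegativity: {element symbol: electronegativity value}, supplied by the caller
    """
    total_atoms = sum(amounts.values())
    # Geometric mean = exp(weighted mean of logs); each element weighted by its atom count
    log_sum = sum(n * math.log(electronegativity[el]) for el, n in amounts.items())
    return math.exp(log_sum / total_atoms)


chi = {"Na": 0.93, "Cl": 3.16}  # Pauling electronegativities, used only for illustration
print(geometric_mean_electronegativity({"Na": 1, "Cl": 1}, chi))  # sqrt(0.93 * 3.16), about 1.714
```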
-
implementors
()
-
-
class
matminer.featurizers.composition.
CohesiveEnergy
Bases:
matminer.featurizers.base.BaseFeaturizer
-
feature_labels
()
-
featurize
(comp) Get the cohesive energy of a compound by subtracting the elemental cohesive energies from the formation energy of the compound. Elemental cohesive energies are taken from http://www.knowledgedoor.com/2/elements_handbook/cohesive_energy.html. Most of them are taken from “Charles Kittel: Introduction to Solid State Physics, 8th edition. Hoboken, NJ: John Wiley & Sons, Inc, 2005, p. 50.”
- Args:
- comp: (str) compound composition, e.g. “NaCl”
Returns: (float) cohesive energy of compound
-
implementors
()
-
-
class
matminer.featurizers.composition.
ElectronAffinity
(data_source=<matminer.featurizers.data.DemlData object>) Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate average electron affinity times formal charge of anion elements
- Parameters:
- data_source (data class): source from which to retrieve element data
Generates average (electron affinity*formal charge) of anions
-
__init__
(data_source=<matminer.featurizers.data.DemlData object>)
-
citations
()
-
feature_labels
()
-
featurize
(comp) - Args:
- comp: Pymatgen Composition object
- Returns:
- avg_anion_affin (single-element list): average electron affinity*formal charge of anions
-
implementors
()
-
class
matminer.featurizers.composition.
ElectronegativityDiff
(data_source=<matminer.featurizers.data.DemlData object>, stats=None) Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate average electronegativity difference
- Parameters:
- data_source (data class): source from which to retrieve element data
stats: property statistics to compute
Generates average electronegativity difference between cations and anions
-
__init__
(data_source=<matminer.featurizers.data.DemlData object>, stats=None)
-
citations
()
-
feature_labels
()
-
featurize
(comp) - Args:
- comp: Pymatgen Composition object
- Returns:
- en_diff_stats (list of floats): Property stats of electronegativity difference
-
implementors
()
-
class
matminer.featurizers.composition.
ElementFraction
Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate the atomic fraction of each element in a composition.
Generates: vector where each index represents an element in atomic number order.
-
__init__
()
-
feature_labels
()
-
featurize
(comp) - Args:
- comp: Pymatgen Composition object
- Returns:
- vector (list of floats): fraction of each element in a composition
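The vector described above places each element's atomic fraction at the index of its atomic number. The sketch below shows that layout for a plain {element: amount} dict. The `ATOMIC_NUMBER` table is a hand-written illustrative subset, the default vector length of 103 is an assumption rather than a value stated in this section, and `element_fraction_vector` is a hypothetical name.

```python
# Illustrative subset of atomic numbers; a real implementation needs the full periodic table.
ATOMIC_NUMBER = {"H": 1, "Li": 3, "C": 6, "O": 8, "Na": 11, "P": 15, "Cl": 17, "Fe": 26}


def element_fraction_vector(amounts, n_elements=103):
    """Atomic fraction of each element, indexed by atomic number (index 0 -> Z = 1)."""
    total_atoms = sum(amounts.values())
    vector = [0.0] * n_elements  # assumption: vector length fixed at n_elements
    for el, n in amounts.items():
        vector[ATOMIC_NUMBER[el] - 1] = n / total_atoms
    return vector


vec = element_fraction_vector({"Fe": 2, "O": 3})
print(vec[25], vec[7])  # 0.4 0.6
```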
-
implementors
()
-
-
class
matminer.featurizers.composition.
ElementProperty
(method='magpie', stats=None, attributes=None, data_source=None) Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate elemental property attributes
- Parameters:
- method (string): pre-packaged set of elemental properties to compute (e.g., 'magpie')
stats: property statistics to compute
attributes (list of strings): list of elemental properties to use
data_source (data object): source from which to retrieve element property data
-
__init__
(method='magpie', stats=None, attributes=None, data_source=None)
-
citations
()
-
feature_labels
()
-
featurize
(comp) Get elemental property attributes
- Args:
- comp: Pymatgen composition object
- Returns:
- all_attributes: Specified property statistics of descriptors
-
implementors
()
-
class
matminer.featurizers.composition.
FERECorrection
(data_source=<matminer.featurizers.data.DemlData object>, stats=None) Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate difference between fitted elemental-phase reference energy (FERE) and GGA+U energy
- Parameters:
- data_source (data class): source from which to retrieve element data
stats: property statistics to compute
Generates: Property statistics of difference between FERE and GGA+U energy
-
__init__
(data_source=<matminer.featurizers.data.DemlData object>, stats=None)
-
citations
()
-
feature_labels
()
-
featurize
(comp) - Args:
- comp: Pymatgen Composition object
- Returns:
- fere_corr_stats (list of floats): Property stats of FERE Correction
-
implementors
()
-
class
matminer.featurizers.composition.
IonProperty
(data_source=<matminer.featurizers.data.MagpieData object>) Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate ionic property attributes
- Parameters:
- data_source (data class): source from which to retrieve element data
-
__init__
(data_source=<matminer.featurizers.data.MagpieData object>)
-
citations
()
-
feature_labels
()
-
featurize
(comp) Ionic character attributes
- Args:
- comp: Pymatgen composition object
- Returns:
- cpd_possible (bool): Indicates if a neutral ionic compound is possible
max_ionic_char (float): Maximum ionic character between two atoms
avg_ionic_char (float): Average ionic character
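IonProperty reports maximum and average ionic character. A common choice for the ionic character of an A-B pair is Pauling's 1 - exp(-(χ_A - χ_B)²/4), with the average taken over all element pairs weighted by the product of their atomic fractions. The sketch assumes that formula; whether this version of IonProperty uses exactly this expression is not stated here, and cpd_possible (the charge-neutrality check) is not sketched. `ionic_character` and `ionic_character_stats` are hypothetical names, and the electronegativities are caller-supplied.

```python
import math
from itertools import combinations


def ionic_character(chi_a, chi_b):
    """Pauling-style fractional ionic character of an A-B bond (assumed formula)."""
    return 1.0 - math.exp(-0.25 * (chi_a - chi_b) ** 2)


def ionic_character_stats(fractions, electronegativity):
    """Return (max_ionic_char, avg_ionic_char) for {element: atomic fraction}."""
    elements = list(fractions)
    max_ic = max(
        (ionic_character(electronegativity[a], electronegativity[b])
         for a, b in combinations(elements, 2)),
        default=0.0,  # a single-element composition has no A-B pairs
    )
    # Average over all ordered pairs, weighted by the product of atomic fractions
    avg_ic = sum(
        fractions[a] * fractions[b] * ionic_character(electronegativity[a], electronegativity[b])
        for a in elements
        for b in elements
    )
    return max_ic, avg_ic


chi = {"Na": 0.93, "Cl": 3.16}  # Pauling values, illustrative only
print(ionic_character_stats({"Na": 0.5, "Cl": 0.5}, chi))
```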
-
implementors
()
-
class
matminer.featurizers.composition.
Stoichiometry
(p_list=[0, 2, 3, 5, 7, 10], num_atoms=False) Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate stoichiometric attributes.
- Parameters:
- p_list (list of ints): list of norms to calculate
num_atoms (bool): whether to return the number of atoms
-
__init__
(p_list=[0, 2, 3, 5, 7, 10], num_atoms=False)
-
citations
()
-
feature_labels
()
-
featurize
(comp) Get stoichiometric attributes
- Args:
- comp: Pymatgen composition object
- Returns:
- p_norm (list of floats): Lp norm-based stoichiometric attributes.
- Returns the number of atoms if no p-values are specified.
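Lp-norm stoichiometric attributes are conventionally taken over the vector of atomic fractions, with p = 0 counting the elements present. The sketch below follows that convention, which is an assumption about this class's exact definition, and uses a plain {element: amount} dict instead of a pymatgen Composition; `stoichiometric_p_norm` is a hypothetical name.

```python
def stoichiometric_p_norm(amounts, p):
    """Lp norm of the atomic-fraction vector of a composition (hypothetical helper).

    amounts: {element symbol: number of atoms}; p = 0 is treated as the number of elements.
    """
    total_atoms = sum(amounts.values())
    fractions = [n / total_atoms for n in amounts.values() if n > 0]
    if p == 0:
        return float(len(fractions))  # "L0 norm": count of elements present
    return sum(f ** p for f in fractions) ** (1.0 / p)


nacl = {"Na": 1, "Cl": 1}
print([stoichiometric_p_norm(nacl, p) for p in [0, 2, 3, 5, 7, 10]])
```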
-
implementors
()
-
class
matminer.featurizers.composition.
TMetalFraction
Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate fraction of magnetic transition metals in a composition.
- Parameters:
- data_source (data class): source from which to retrieve element data
Generates: Fraction of magnetic transition metal atoms in a compound
-
__init__
()
-
citations
()
-
feature_labels
()
-
featurize
(comp) - Args:
- comp: Pymatgen Composition object
- Returns:
- frac_magn_atoms (single-element list): fraction of magnetic transition metal atoms in a compound
-
implementors
()
-
class
matminer.featurizers.composition.
ValenceOrbital
(data_source=<matminer.featurizers.data.MagpieData object>, orbitals=['s', 'p', 'd', 'f'], props=['avg', 'frac']) Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate valence orbital attributes
- Parameters:
data_source (data object): source from which to retrieve element data
orbitals (list): orbitals to calculate
props (list): specifies whether to return the average number of electrons in each orbital, the fraction of electrons in each orbital, or both
-
__init__
(data_source=<matminer.featurizers.data.MagpieData object>, orbitals=['s', 'p', 'd', 'f'], props=['avg', 'frac'])
-
citations
()
-
feature_labels
()
-
featurize
(comp) Weighted fraction of valence electrons in each orbital
- Args:
- comp: Pymatgen composition object
- Returns:
- valence_attributes (list of floats): Average number and/or fraction of valence electrons in specified orbitals
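The averages above can be read as composition-weighted electron counts per orbital, with each fraction obtained by dividing by the average total number of valence electrons. This sketch assumes that definition and takes a hypothetical per-element table of valence electrons by orbital (Na: one s electron; Cl: two s and five p electrons) rather than the MagpieData source; `valence_orbital_attributes` is a hypothetical name.

```python
def valence_orbital_attributes(amounts, valence_electrons, orbitals=("s", "p", "d", "f")):
    """Average number and fraction of valence electrons in each requested orbital.

    amounts: {element symbol: number of atoms}
    valence_electrons: {element symbol: {"s": n_s, "p": n_p, "d": n_d, "f": n_f}}
    """
    total_atoms = sum(amounts.values())
    # Composition-weighted average electron count per orbital
    avg = {
        orb: sum(n * valence_electrons[el][orb] for el, n in amounts.items()) / total_atoms
        for orb in orbitals
    }
    # Denominator uses all valence electrons, not only the requested orbitals (assumption)
    avg_total = sum(
        n * sum(valence_electrons[el].values()) for el, n in amounts.items()
    ) / total_atoms
    frac = {orb: avg[orb] / avg_total for orb in orbitals}
    return avg, frac


valence = {"Na": {"s": 1, "p": 0, "d": 0, "f": 0}, "Cl": {"s": 2, "p": 5, "d": 0, "f": 0}}
avg, frac = valence_orbital_attributes({"Na": 1, "Cl": 1}, valence)
print(avg["s"], avg["p"], frac["s"], frac["p"])  # 1.5 2.5 0.375 0.625
```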
-
implementors
()
matminer.featurizers.data module
-
class
matminer.featurizers.data.
AbstractData
Bases:
object
-
get_property
(comp, property_name, return_per_element=True) Gets data for a composition object.
- Args:
- comp (Composition/str): composition
property_name (str): name of descriptor
return_per_element (bool): if True, returns one value per element rather than per atom
- Returns:
- (list): list of values for each atom in comp. Note: the returned values are sorted by the corresponding element’s electronegativity. This is done for the sake of consistency.
-
-
class
matminer.featurizers.data.
PymatgenData
Bases:
matminer.featurizers.data.AbstractData
-
static
get_composition_oxidation_state
(formula) Returns the composition and oxidation states from the given formula. Formula examples: “NaCl”, “Na+1Cl-1”, “Fe2+3O3-2” or “Fe2 +3 O3 -2”
- Args:
- formula (str): chemical formula, optionally including oxidation states
- Returns:
- pymatgen.core.composition.Composition, dict of oxidation states as strings
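The formula formats listed above (element symbol, optional amount, optional signed oxidation state, optional spaces) can be tokenized with a regular expression. This is a hypothetical sketch, not matminer's parser: it returns plain dicts instead of a pymatgen Composition, and it returns oxidation states as ints whereas the documented method returns them as strings.

```python
import re

# Element symbol, optional amount, optional signed oxidation state; whitespace allowed between parts
_TOKEN = re.compile(r"([A-Z][a-z]?)\s*(\d*\.?\d*)\s*([+-]\d+)?")


def parse_formula_with_oxidation_states(formula):
    """Return ({element: amount}, {element: oxidation state}) for formulas like "Fe2+3O3-2"."""
    amounts, oxidation_states = {}, {}
    for symbol, amount, ox in _TOKEN.findall(formula):
        # A missing amount means one atom of that element
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
        if ox:
            oxidation_states[symbol] = int(ox)
    return amounts, oxidation_states


for f in ["NaCl", "Na+1Cl-1", "Fe2+3O3-2", "Fe2 +3 O3 -2"]:
    print(f, parse_formula_with_oxidation_states(f))
```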
-
get_property
(comp, property_name, return_per_element=True) Get descriptor data for elements in a compound from pymatgen.
- Args:
- comp (str/Composition): Either a pymatgen Composition object or a string formula,
e.g.: “NaCl”, “Na+1Cl-1”, “Fe2+3O3-2” or “Fe2 +3 O3 -2”. Notes:
- For the ‘ionic_radii’ property, the Composition object must be made of oxidation
state-decorated Specie objects, not plain Element objects, e.g. fe2o3 = Composition({Specie("Fe", 3): 2, Specie("O", -2): 3})
- For a string formula, the sign (+ or -) of each oxidation state must be specified explicitly,
e.g. “Fe2+3O3-2”
- property_name (str): pymatgen element attribute name, as defined in the Element class at
- http://pymatgen.org/_modules/pymatgen/core/periodic_table.html
- Returns:
- (list) of descriptor values (floats) for each atom in the compound (sorted by the
- electronegativity of the constituent atoms)
matminer.featurizers.stats module
File containing general methods for computing property statistics
-
class
matminer.featurizers.stats.
PropertyStats
Bases:
object
-
static
avg_dev
(data_lst, weights=None) Average absolute deviation of a list of element data
- Args:
- data_lst (list of floats): Value of a property for each atom in a compound
weights (list of floats): Atomic fractions
- Returns:
- average absolute deviation
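Average absolute deviation is commonly defined as the weighted mean of |x_i - x̄| taken about the weighted mean x̄. The sketch below assumes that definition, with weights normalized by their sum; matminer's exact normalization is not stated in this section, and `avg_dev_sketch` is a hypothetical name.

```python
def avg_dev_sketch(data_lst, weights=None):
    """Weighted mean absolute deviation from the weighted mean (assumed definition)."""
    if weights is None:
        weights = [1.0] * len(data_lst)  # unweighted case: equal weights
    total_w = sum(weights)
    mean = sum(w * x for w, x in zip(weights, data_lst)) / total_w
    return sum(w * abs(x - mean) for w, x in zip(weights, data_lst)) / total_w


print(avg_dev_sketch([1.0, 3.0], weights=[0.5, 0.5]))    # 1.0
print(avg_dev_sketch([1.0, 3.0], weights=[0.75, 0.25]))  # 0.75
```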
-
static
calc_stat
(stat, data_lst, weights=None) Compute a property statistic
- Args:
- stat (str): Name of the statistic to compute. If the statistics function takes arguments, they
- should be appended to the name, separated by two underscores. For example, the 2nd Hölder mean would be “holder_mean__2”
data_lst (list of floats): list of values
weights (list of floats): (Optional) weights for each element in data_lst
- Returns:
- float - Desired statistic
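The naming convention above (extra arguments appended after a double underscore) can be dispatched by splitting the name on "__". The sketch below is a hypothetical stand-in: the `STATS` table, `calc_stat_sketch`, and the simplified `mean`/`maximum`/`holder_mean` functions are illustrative, not matminer's PropertyStats code, and converting the appended arguments to float is an assumption.

```python
import math


def mean(data_lst, weights=None):
    if weights is None:
        weights = [1.0] * len(data_lst)
    return sum(w * x for w, x in zip(weights, data_lst)) / sum(weights)


def maximum(data_lst, weights=None):
    return max(data_lst)  # weights ignored


def holder_mean(data_lst, weights=None, power=1):
    if weights is None:
        weights = [1.0] * len(data_lst)
    total_w = sum(weights)
    if power == 0:  # the p -> 0 limit is the weighted geometric mean
        return math.exp(sum(w * math.log(x) for w, x in zip(weights, data_lst)) / total_w)
    return (sum(w * x ** power for w, x in zip(weights, data_lst)) / total_w) ** (1.0 / power)


STATS = {"mean": mean, "maximum": maximum, "holder_mean": holder_mean}


def calc_stat_sketch(stat, data_lst, weights=None):
    """Dispatch on a name like "holder_mean__2": text after "__" becomes extra arguments."""
    name, *args = stat.split("__")
    return STATS[name](data_lst, weights, *[float(a) for a in args])


print(calc_stat_sketch("mean", [1.0, 2.0, 3.0]))       # 2.0
print(calc_stat_sketch("holder_mean__2", [3.0, 4.0]))  # sqrt(12.5), about 3.536
```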
-
static
holder_mean
(data_lst, weights=None, power=1) Get the Hölder mean
- Args:
- data_lst: (list/array) of values
weights: (list/array) of weights
power: (int/float/str) which Hölder mean to compute
- Returns:
- Hölder mean
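For reference, the weighted Hölder (power) mean of values x_i with weights w_i is conventionally defined as below, with the p → 0 limit giving the weighted geometric mean (p = 1 is the arithmetic mean, p = -1 the harmonic mean). This is the standard textbook definition; how holder_mean handles str values of power is not covered here.

```latex
M_p(x; w) = \left( \frac{\sum_i w_i \, x_i^{\,p}}{\sum_i w_i} \right)^{1/p},
\qquad
M_0(x; w) = \lim_{p \to 0} M_p(x; w) = \exp\!\left( \frac{\sum_i w_i \ln x_i}{\sum_i w_i} \right)
```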
-
static
maximum
(data_lst, weights=None) Maximum value in a list of element data
- Args:
- data_lst (list of floats): Value of a property for each atom in a compound
weights (ignored)
- Returns:
- maximum value
-
static
mean
(data_lst, weights=None, **kwargs) Mean of a list of element data
- Args:
- data_lst (list of floats): Value of a property for each atom or element in a compound
weights (list of floats): Weights for each value
- Returns:
- mean value
-
static
minimum
(data_lst, weights=None) Minimum value in a list of element data
- Args:
- data_lst (list of floats): Value of a property for each atom in a compound
weights (ignored)
- Returns:
- minimum value
-
static
mode
(data_lst, weights=None) Mode of a list of element data. If multiple values occur equally frequently (or have the same total weight, if weights are provided), this function returns the average of those values
- Args:
- data_lst (list of floats): Value of a property for each atom in a compound
weights (list of floats): (Optional) weights for each value
- Returns:
- mode
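The tie-breaking rule above (average all values that share the highest count or total weight) can be sketched as follows. `mode_sketch` is a hypothetical name, and ties are detected by exact equality of the accumulated weights, so floating-point weights that are only approximately equal would not count as tied.

```python
from collections import defaultdict


def mode_sketch(data_lst, weights=None):
    """Most frequent (or most heavily weighted) value; ties are averaged."""
    if weights is None:
        weights = [1.0] * len(data_lst)
    totals = defaultdict(float)
    for x, w in zip(data_lst, weights):
        totals[x] += w  # accumulate count or weight per distinct value
    best = max(totals.values())
    tied = [x for x, t in totals.items() if t == best]
    return sum(tied) / len(tied)


print(mode_sketch([1.0, 2.0, 2.0, 3.0]))              # 2.0
print(mode_sketch([1.0, 3.0]))                        # 2.0 (tie -> average)
print(mode_sketch([1.0, 3.0], weights=[0.25, 0.75]))  # 3.0
```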
-
static
range
(data_lst, weights=None) Range of a list of element data
- Args:
- data_lst (list of floats): Value of a property for each atom in a compound
weights (ignored)
- Returns:
- range
-
static
std_dev
(data_lst, weights=None) Standard deviation of a list of element data
- Args:
- data_lst (list of floats): Value of a property for each atom in a compound
weights (list of floats): Atomic fractions
- Returns:
- standard deviation
matminer.featurizers.structure module
-
matminer.featurizers.structure.
get_coulomb_matrix
(struct, diag_elems=False) This function generates the Coulomb matrix, M, of the input structure (or molecule). The Coulomb matrix was put forward by Rupp et al. (Phys. Rev. Lett. 108, 058301, 2012) and is defined by off-diagonal elements M_ij = Z_i*Z_j/|R_i-R_j| and diagonal elements 0.5*Z_i^2.4, where Z_i and R_i denote the nuclear charge and the position of atom i, respectively.
- Args:
- struct (Structure or Molecule): input structure (or molecule).
diag_elems (bool): flag indicating whether (True) to use the original definition of the diagonal elements; if set to False (default), the diagonal elements are set to zero.
- Returns:
- (Nsites x Nsites matrix) Coulomb matrix.
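The definition above translates directly into array operations. The sketch takes nuclear charges and Cartesian positions for a finite set of atoms and ignores periodic images, which is a simplification; how get_coulomb_matrix measures distances in a periodic Structure is not described here. `coulomb_matrix_sketch` is a hypothetical name.

```python
import numpy as np


def coulomb_matrix_sketch(charges, positions, diag_elems=False):
    """Coulomb matrix for a finite set of atoms (no periodic images).

    charges: nuclear charges Z_i; positions: Cartesian coordinates R_i (one length unit throughout).
    """
    z = np.asarray(charges, dtype=float)
    r = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)  # |R_i - R_j|
    with np.errstate(divide="ignore", invalid="ignore"):
        m = np.outer(z, z) / dist  # off-diagonal Z_i * Z_j / |R_i - R_j|; diagonal overwritten below
    # Diagonal: 0.5 * Z_i**2.4 (original definition) or zero (default)
    np.fill_diagonal(m, 0.5 * z ** 2.4 if diag_elems else 0.0)
    return m


# Toy two-atom molecule: two unit charges 0.74 length units apart
m = coulomb_matrix_sketch([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]])
print(m)
```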
-
matminer.featurizers.structure.
get_density
(s)
-
matminer.featurizers.structure.
get_min_relative_distances
(struct, cutoff=10.0) This function determines the relative distance of each site to its closest neighbor. We use the relative distance, f_ij = r_ij / (r^atom_i + r^atom_j), as a measure rather than the absolute distances, r_ij, to account for the fact that different atoms/species have different sizes. The function uses the valence-ionic radius estimator implemented in pymatgen.
- Args:
- struct (Structure): input structure.
cutoff (float): (absolute) distance up to which tentative closest neighbors (on the basis of relative distances) are to be determined.
- Returns:
- ([float]) list of all minimum relative distances (i.e., for all sites).
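The relative distance f_ij = r_ij / (r^atom_i + r^atom_j) and its per-site minimum can be sketched with NumPy. The radii are supplied by the caller (the documented function obtains them from pymatgen's valence-ionic radius estimator), and the sketch ignores the cutoff and periodic images; `min_relative_distances_sketch` is a hypothetical name.

```python
import numpy as np


def min_relative_distances_sketch(positions, radii):
    """For each site, min over j != i of r_ij / (radius_i + radius_j); no periodic images."""
    r = np.asarray(positions, dtype=float)
    rad = np.asarray(radii, dtype=float)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)  # r_ij
    rel = dist / (rad[:, None] + rad[None, :])                    # f_ij
    np.fill_diagonal(rel, np.inf)  # exclude each site's distance to itself
    return rel.min(axis=1).tolist()


print(min_relative_distances_sketch([[0, 0, 0], [2, 0, 0], [5, 0, 0]], [1.0, 1.0, 0.5]))  # [1.0, 1.0, 2.0]
```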
-
matminer.featurizers.structure.
get_neighbors_of_site_with_index
(struct, n, p=None) Determine the neighbors around the site that has index n in the input Structure object struct, given the approach defined by parameters p. All supported neighbor-finding approaches are listed and explained below. All approaches start by creating a tentative list of neighbors using a large cutoff radius defined in parameter dictionary p via key “cutoff”.
- “min_dist”: find the nearest neighbor and its distance d_nn; consider all neighbors which are within a distance of d_nn * (1 + delta), where delta is an additional parameter provided in the dictionary p via key “delta”.
- “scaled_VIRE”: compute the radii, r_i, of all sites on the basis of the valence-ionic radius evaluator (VIRE); consider all neighbors for which the distance to the central site is less than the sum of the radii multiplied by an a priori chosen parameter, delta (i.e., dist < delta * (r_central + r_neighbor)).
- “min_relative_VIRE”: same approach as “min_dist”, except that we use relative distances (i.e., distances divided by the sum of the atom radii from VIRE).
- “min_relative_OKeeffe”: same approach as “min_relative_VIRE”, except that we use the bond valence parameters from O’Keeffe’s bond valence method (J. Am. Chem. Soc. 1991, 3226-3229) to calculate relative distances.
- Args:
- struct (Structure): input structure.
n (int): index of site in Structure object for which neighbors are to be determined.
p (dict): specification (via “approach” key; default is “min_dist”) and parameters of neighbor-finding approach. Default cutoff radius is 6 Angstrom (key: “cutoff”). Other default parameters are as follows. min_dist: “delta”: 0.15; min_relative_OKeeffe: “delta”: 0.05; min_relative_VIRE: “delta”: 0.05; scaled_VIRE: “delta”: 2.
- Returns: ([site]) list of sites that are considered to be nearest neighbors to the site with index n in Structure object struct.
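The “min_dist” rule reduces to two steps: keep candidates within the cutoff, then keep those within d_nn * (1 + delta) of the central site. The sketch below uses the documented defaults (cutoff 6, delta 0.15) on a hypothetical {label: distance} mapping rather than a Structure; whether the original comparisons are inclusive (<=) or strict is an assumption, and `min_dist_neighbors` is a hypothetical name.

```python
def min_dist_neighbors(distances, cutoff=6.0, delta=0.15):
    """Neighbors selected by the "min_dist" rule.

    distances: {neighbor label: distance to the central site} (hypothetical input format).
    """
    tentative = {k: d for k, d in distances.items() if d <= cutoff}  # large-cutoff candidate list
    if not tentative:
        return []
    d_nn = min(tentative.values())  # nearest-neighbor distance
    return sorted(k for k, d in tentative.items() if d <= d_nn * (1.0 + delta))


print(min_dist_neighbors({"a": 2.0, "b": 2.2, "c": 2.4, "d": 7.0}))  # ['a', 'b']
```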
-
matminer.featurizers.structure.
get_okeeffe_distance_prediction
(el1, el2) Returns an estimate of the bond valence parameter (bond length) using the derived parameters from ‘Atom Sizes and Bond Lengths in Molecules and Crystals’ (O’Keeffe & Brese, 1991). The estimate is based on two experimental parameters: r and c. The value for r is based on the atomic radius, while c is (usually) the Allred-Rochow electronegativity. Values used are not generated from pymatgen, and are found in ‘okeeffe_params.json’.
- Args:
- el1, el2 (Element): two Element objects
- Returns:
- a float value of the predicted bond length
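As we understand the O'Keeffe and Brese relation, the two parameters combine as R_12 = r_1 + r_2 - r_1 r_2 (√c_1 - √c_2)² / (c_1 r_1 + c_2 r_2): the sum of the size parameters minus an electronegativity correction. The sketch below implements that relation; whether this function applies exactly this expression is an assumption. The parameter dictionaries mimic the ‘r’/‘c’ keys returned by get_okeeffe_params, but the numbers in the usage line are made up, not taken from ‘okeeffe_params.json’, and `okeeffe_bond_length` is a hypothetical name.

```python
import math


def okeeffe_bond_length(params1, params2):
    """Bond-valence parameter R_12 from {'r': ..., 'c': ...} dicts (assumed O'Keeffe-Brese form)."""
    r1, c1 = params1["r"], params1["c"]
    r2, c2 = params2["r"], params2["c"]
    # Electronegativity correction: shortens the bond between dissimilar atoms
    en_correction = r1 * r2 * (math.sqrt(c1) - math.sqrt(c2)) ** 2 / (c1 * r1 + c2 * r2)
    return r1 + r2 - en_correction


print(okeeffe_bond_length({"r": 1.0, "c": 1.0}, {"r": 1.0, "c": 4.0}))  # about 1.8 (made-up parameters)
```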
-
matminer.featurizers.structure.
get_okeeffe_params
(el_symbol) Returns the elemental parameters related to atom size and electronegativity which are used for estimating bond-valence parameters (bond length) of pairs of atoms on the basis of data provided in ‘Atom Sizes and Bond Lengths in Molecules and Crystals’ (O’Keeffe & Brese, 1991).
- Args:
- el_symbol (str): element symbol.
- Returns:
- (dict): atom-size (‘r’) and electronegativity-related (‘c’) parameters.
-
matminer.featurizers.structure.
get_order_parameter_feature_vectors_difference
(struct1, struct2, pneighs=None, convert_none_to_zero=True, delta_op=0.01, ignore_op_types=None) Determine the difference vector between two order parameter-statistics feature vectors resulting from two input structures.
- Args:
- struct1 (Structure): first input structure.
struct2 (Structure): second input structure.
pneighs (dict): specification and parameters of neighbor-finding approach (see get_neighbors_of_site_with_index function for more details).
convert_none_to_zero (bool): flag indicating whether or not to convert None values in OPs to zero (cf., get_order_parameters function).
delta_op (float): bin size of histogram that is computed in order to identify peak locations (cf., get_order_parameter_stats function).
ignore_op_types ([str]): list of OP types to be ignored in output dictionary (cf., get_order_parameter_stats function).
- Returns: ([float]) difference vector between order parameter-statistics feature vectors obtained from the two input structures (structure 1 - structure 2).
-
matminer.featurizers.structure.
get_order_parameter_stats
(struct, pneighs=None, convert_none_to_zero=True, delta_op=0.01, ignore_op_types=None) Determine the order parameter statistics accumulated across all sites in Structure object struct using the get_order_parameters function.
- Args:
- struct (Structure): input structure.
pneighs (dict): specification and parameters of neighbor-finding approach (see get_neighbors_of_site_with_index function for more details).
convert_none_to_zero (bool): flag indicating whether or not to convert None values in OPs to zero (cf., get_order_parameters function).
delta_op (float): bin size of histogram that is computed in order to identify peak locations.
ignore_op_types ([str]): list of OP types to be ignored in output dictionary (e.g., [“cn”, “bent”]). Default (None) will consider all OPs.
- Returns: ({}) dictionary, the keys of which represent the order parameter type (e.g., “bent5”, “tet”, “sq_pyr”) and the values of which are dictionaries carrying the statistics (“min”, “max”, “mean”, “std”, “peak1”, “peak2”).
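One way to obtain the statistics listed above for a single OP type is to histogram the per-site values with bin width delta_op and report the centers of the two most populated bins as peak1 and peak2. The binning scheme, the population standard deviation, and the fallback used when only one bin is occupied are all assumptions, and `op_statistics_sketch` is a hypothetical name.

```python
import math
from collections import Counter


def op_statistics_sketch(values, delta_op=0.01):
    """min/max/mean/std plus two histogram peak locations for one order-parameter type."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population std (assumption)
    counts = Counter(math.floor(v / delta_op) for v in values)  # histogram bin index of each value
    ranked = sorted(counts, key=lambda b: (-counts[b], b))      # most populated bins first
    centers = [(b + 0.5) * delta_op for b in ranked]            # bin centers
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": std,
        "peak1": centers[0],
        "peak2": centers[1] if len(centers) > 1 else centers[0],  # fallback is an assumption
    }


print(op_statistics_sketch([0.1, 0.2, 0.2, 0.9], delta_op=0.5))
```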
-
matminer.featurizers.structure.
get_order_parameters
(struct, pneighs=None, convert_none_to_zero=True) Calculate all order parameters (OPs) for all sites in Structure object struct.
- Args:
- struct (Structure): input structure.
pneighs (dict): specification and parameters of neighbor-finding approach (see get_neighbors_of_site_with_index function for more details).
convert_none_to_zero (bool): flag indicating whether or not to convert None values in OPs to zero.
- Returns: ([[float]]) matrix of all sites’ (1st dimension) order parameters (2nd dimension). 46 order parameters are computed per site: q_cn (coordination number), q_lin, 35 x q_bent (target angles from 5 degrees to 175 degrees in 5-degree steps), q_tet, q_oct, q_bcc, q_2, q_4, q_6, q_reg_tri, q_sq, q_sq_pyr.
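The count of 46 follows from the list above: 2 (q_cn, q_lin) + 35 bent targets + 9 others. A quick check, using label strings patterned after the keys quoted for get_order_parameter_stats (“bent5”, “tet”, “sq_pyr”); the remaining label strings are hypothetical.

```python
# Hypothetical label strings; only "bent5", "tet" and "sq_pyr" appear in the documentation above
bent = [f"bent{angle}" for angle in range(5, 180, 5)]  # 5, 10, ..., 175 degrees
op_types = ["cn", "lin"] + bent + ["tet", "oct", "bcc", "q2", "q4", "q6", "reg_tri", "sq", "sq_pyr"]
print(len(bent), len(op_types))  # 35 46
```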
-
matminer.featurizers.structure.
get_packing_fraction
(s)
-
matminer.featurizers.structure.
get_prdf
(structure, cutoff=20.0, bin_size=0.1) Compute the partial radial distribution function for a structure
The partial radial distribution function is the radial distribution function broken down for each pair of atom types
The PRDF was proposed as a structural descriptor by Schütt et al. (https://journals.aps.org/prb/abstract/10.1103/PhysRevB.89.205118)
- Args:
- structure: pymatgen structure object
cutoff: (int/float) distance to calculate the RDF up to
bin_size: (int/float) size of bin to obtain the RDF for
- Returns: (tuple) First element is a dict where keys are tuples of element names
- and values are PRDFs,
-
matminer.featurizers.structure.
get_rdf
(structure, cutoff=20.0, bin_size=0.1) Calculate the RDF fingerprint of a given structure
- Args:
- structure: pymatgen structure object
cutoff: (int/float) distance to calculate the RDF up to
bin_size: (int/float) size of bin to obtain the RDF for
Returns: (tuple of ndarray) first element is the normalized RDF, second is the inner radius of the RDF bin
-
matminer.featurizers.structure.
get_rdf_peaks
(rdf, rdf_bins, n_peaks=2) Get the locations of the highest peaks in the RDF of a structure.
- Args:
- rdf: (ndarray) as output by the function “get_rdf”
rdf_bins: (ndarray) inner radius of the RDF bin
n_peaks: (int) number of top peaks to return
Returns: (ndarray) of the distances of the highest peaks, listed by descending height
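Reading “highest peaks” as the bins with the largest RDF values, the lookup amounts to sorting bin indices by RDF value and returning the matching bin radii. The sketch below makes that simplifying assumption (it does not detect local maxima) and uses a hypothetical name, `rdf_peaks_sketch`.

```python
import numpy as np


def rdf_peaks_sketch(rdf, rdf_bins, n_peaks=2):
    """Bin positions of the n_peaks largest RDF values, highest first."""
    rdf = np.asarray(rdf, dtype=float)
    rdf_bins = np.asarray(rdf_bins, dtype=float)
    order = np.argsort(rdf)[::-1]  # bin indices sorted by descending RDF value
    return rdf_bins[order[:n_peaks]]


rdf = [0.0, 0.2, 1.5, 0.4, 0.9, 0.1]
bins = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
print(rdf_peaks_sketch(rdf, bins))  # [0.2 0.4]
```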
-
matminer.featurizers.structure.
get_redf
(struct, cutoff=None, dr=0.05) This function permits the calculation of the crystal structure-inherent electronic radial distribution function (ReDF) according to Willighagen et al., Acta Cryst., 2005, B61, 29-36. The ReDF is a structure-integral RDF (i.e., summed over all sites) in which the positions of neighboring sites are weighted by electrostatic interactions inferred from atomic partial charges. Atomic charges are obtained from the ValenceIonicRadiusEvaluator class.
- Args:
- struct (Structure): input Structure object.
cutoff (float): distance up to which the ReDF is to be calculated (default: longest diagonal in primitive cell).
dr (float): width of bins (“x”-axis) of ReDF (default: 0.05 A).
- Returns:
- (dict) a copy of the electronic radial distribution functions (ReDF) as a dictionary. The distance list (“x”-axis values of ReDF) can be accessed via key ‘distances’; the ReDF itself via key ‘redf’.
-
matminer.featurizers.structure.
get_vol_per_site
(s)
-
matminer.featurizers.structure.
site_is_of_motif_type
(struct, n, pneighs=None, thresh=None) Returns the motif type of the site with index n in structure struct; currently featuring “tetrahedral”, “octahedral”, “bcc”, and “cp” (close-packed: fcc and hcp). If the site is not recognized or if it has been recognized as two different motif types, the function labels the site as “unrecognized”.
- Args:
- struct (Structure): input structure.
n (int): index of site in Structure object for which the motif type is to be determined.
pneighs (dict): specification and parameters of neighbor-finding approach (cf., function get_neighbors_of_site_with_index).
thresh (dict): thresholds for motif criteria (currently, required keys and their default values are “qtet”: 0.5, “qoct”: 0.5, “qbcc”: 0.5, “q6”: 0.4).
Returns: motif type (str).