matminer.featurizers package¶
Submodules¶
matminer.featurizers.bandstructure module¶
-
class
matminer.featurizers.bandstructure.
BandFeaturizer
(kpoints=None, find_method='nearest', nbands=2)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Featurizes a pymatgen band structure object.
- Args:
- kpoints ([1x3 numpy array]): list of fractional coordinates of k-points at which energy is extracted.
- find_method (str): the method for finding or interpolating the energy at the given kpoints. It does nothing if kpoints is None. Options are:
- ‘nearest’: the energy of the nearest available k-point to the input k-point is returned.
- ‘linear’: the result of linear interpolation is returned (see the documentation for scipy.interpolate.griddata).
- nbands (int): the number of valence/conduction bands to be featurized
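As a rough illustration of the ‘nearest’ option, the sketch below picks the energy of the sampled k-point closest to a requested k-point. It is not matminer's implementation: the helper name `nearest_kpoint_energy` is hypothetical, and measuring distance as a plain Euclidean distance in fractional coordinates is an assumption of this sketch.

```python
import math

def nearest_kpoint_energy(target, kpoints, energies):
    """Return the energy at the sampled k-point closest to `target`.

    Assumption: closeness is the Euclidean distance between fractional
    coordinates; reciprocal-lattice geometry is ignored here.
    """
    best = min(range(len(kpoints)), key=lambda i: math.dist(target, kpoints[i]))
    return energies[best]

# Hypothetical sampled k-points and first-conduction-band energies (eV)
kpts = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.5, 0.5, 0.0]]
cb_energies = [1.8, 1.2, 1.5]
energy = nearest_kpoint_energy([0.4, 0.1, 0.0], kpts, cb_energies)
```

Here the requested point lies closest to [0.5, 0.0, 0.0], so the energy sampled there is returned.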
-
__init__
(kpoints=None, find_method='nearest', nbands=2)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(bs)¶ - Args:
- bs (pymatgen BandStructure or BandStructureSymmLine or their dict):
- The band structure to featurize. To obtain all features, bs should include the structure attribute.
- Returns:
- ([float]): a list of band structure features. If not bs.structure,
- features that require the structure will be returned as NaN.
- List of currently supported features:
- band_gap (eV): the difference between the CBM and VBM energy
- is_gap_direct (0.0|1.0): whether the band gap is direct or not
- direct_gap (eV): the minimum direct distance of the last valence band and the first conduction band
- p_ex1_norm (float): k-space distance between Gamma point and k-point of VBM
- n_ex1_norm (float): k-space distance between Gamma point and k-point of CBM
- p_ex1_degen: degeneracy of VBM
- n_ex1_degen: degeneracy of CBM
- if kpoints is provided (e.g. for kpoints == [[0.0, 0.0, 0.0]]):
- n_0.0;0.0;0.0_en: (energy of the first conduction band at [0.0, 0.0, 0.0] - CBM energy)
- p_0.0;0.0;0.0_en: (energy of the last valence band at [0.0, 0.0, 0.0] - VBM energy)
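The first three features in the list can be sketched from two arrays holding, per k-point, the energy of the last valence band and of the first conduction band. This is a simplified illustration, not the library code: the function name `gap_features` and the sample energies are hypothetical, and degeneracies and spin channels are ignored.

```python
def gap_features(vb, cb):
    """vb[k], cb[k]: energies (eV) of the last valence band and the
    first conduction band at k-point index k."""
    k_vbm = max(range(len(vb)), key=vb.__getitem__)  # k-point index of the VBM
    k_cbm = min(range(len(cb)), key=cb.__getitem__)  # k-point index of the CBM
    band_gap = cb[k_cbm] - vb[k_vbm]
    # Smallest vertical (same k-point) separation between the two bands
    direct_gap = min(c - v for v, c in zip(vb, cb))
    is_gap_direct = 1.0 if k_vbm == k_cbm else 0.0
    return band_gap, is_gap_direct, direct_gap

# Hypothetical band energies at three k-points
vb = [0.0, -0.3, -0.5]
cb = [1.5, 1.1, 1.4]
band_gap, is_gap_direct, direct_gap = gap_features(vb, cb)
```

In this example the VBM and CBM sit at different k-points, so the gap is indirect and direct_gap is larger than band_gap.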
-
static
get_bindex_bspin
(extremum, is_cbm)¶ Returns the band index and spin of band extremum
- Args:
- extremum (dict): dictionary containing the CBM/VBM, i.e. output of Bandstructure.get_cbm()
- is_cbm (bool): whether the extremum is the CBM or not
-
implementors
()¶
-
class
matminer.featurizers.bandstructure.
BranchPointEnergy
(n_vb=1, n_cb=1, calculate_band_edges=True)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
-
__init__
(n_vb=1, n_cb=1, calculate_band_edges=True)¶ Calculates the branch point energy and (optionally) an absolute band edge position assuming the branch point energy is the center of the gap
- Args:
- n_vb: (int) number of valence bands to include in BPE calc
- n_cb: (int) number of conduction bands to include in BPE calc
- calculate_band_edges: (bool) whether to also return band edge positions
-
citations
()¶
-
feature_labels
()¶
-
featurize
(bs, target_gap=None)¶ - Args:
- bs: (BandStructure) Uniform (not symm line) band structure
- Returns:
- (int) branch point energy on same energy scale as BS eigenvalues
-
implementors
()¶
-
-
class
matminer.featurizers.bandstructure.
DOSFeaturizer
(contributors=1, significance_threshold=0.1, coordination_features=True, energy_cutoff=0.5, sampling_resolution=100, gaussian_smear=0.1)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Featurizes a pymatgen dos object.
-
__init__
(contributors=1, significance_threshold=0.1, coordination_features=True, energy_cutoff=0.5, sampling_resolution=100, gaussian_smear=0.1)¶ - Args:
- contributors (int):
- Sets the number of top contributors to the DOS that are returned as features. (i.e. contributors=1 will only return the main cb and main vb orbital)
- significance_threshold (float):
- Sets the significance threshold for orbitals in the DOS. Does not impact the number of contributors returned. Only determines the feature value xbm_significant_contributors. The threshold is a fractional value between 0 and 1.
- coordination_features (bool):
- If true, the coordination environment of the PDOS contributors will also be returned. Only limited environments are currently supported. If the environment is neither, “unrecognized” will be returned.
- energy_cutoff (float in eV):
- The extent (into the bands) to sample the DOS
- sampling_resolution (int):
- Number of points to sample DOS
- gaussian_smear (float in eV):
- Gaussian smearing (sigma) around each sampled point in the DOS
-
feature_labels
()¶
-
featurize
(dos)¶ - Args:
- dos (pymatgen CompleteDos or their dict):
- The density of states to featurize. Must be a complete DOS (i.e. contains the PDOS and the structure in addition to the total DOS).
- Returns:
- xbm_score_i (float): fraction of the ith contributor orbital
- xbm_location_i (str): cartesian coordinate of the ith contributor. For example, ‘0.0;0.0;0.0’ if Gamma
- xbm_specie_i: (str) elemental specie of the ith contributor (ex: ‘Ti’)
- xbm_character_i: (str) orbital character of the ith contributor (s, p, d, or f)
- xbm_coordination_i: (str) the coordination geometry that the ith contributor orbital resides in (the coordination environment of the site the orbital is associated with)
- xbm_nsignificant: (int) the number of orbitals with contributions above the significance_threshold
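To make the roles of contributors and significance_threshold concrete, here is a minimal sketch that ranks fractional orbital contributions and counts those above the threshold. The function name `top_contributors`, the orbital labels, and the scores are hypothetical; the sketch only mirrors the behavior described in the text and is not the featurizer's code.

```python
def top_contributors(scores, contributors=1, significance_threshold=0.1):
    """scores: mapping of orbital label to fractional contribution (0 to 1)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # The threshold only affects the count, not how many contributors are returned
    n_significant = sum(1 for _, s in ranked if s > significance_threshold)
    return ranked[:contributors], n_significant

# Hypothetical fractional contributions near the conduction band edge
cbm_scores = {"Ti-d": 0.62, "O-p": 0.30, "Ti-s": 0.05, "O-s": 0.03}
top, n_sig = top_contributors(cbm_scores, contributors=1, significance_threshold=0.1)
```

With these numbers one contributor is returned, while two orbitals exceed the threshold.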
-
implementors
()¶
-
-
matminer.featurizers.bandstructure.
get_cbm_vbm_scores
(dos, coordination_features, energy_cutoff, sampling_resolution, gaussian_smear)¶ - Args:
- dos (pymatgen CompleteDos or their dict):
- The density of states to featurize. Must be a complete DOS, (i.e. contains PDOS and structure, in addition to total DOS)
- coordination_features (bool):
- If true, will also return the coordination environment of the PDOS features
- energy_cutoff (float in eV):
- The extent (into the bands) to sample the DOS
- sampling_resolution (int):
- Number of points to sample DOS
- gaussian_smear (float in eV):
- Gaussian smearing (sigma) around each sampled point in the DOS
- Returns:
- orbital_scores [(dict)]:
- A list of how much each orbital contributes to the partial density of states up to energy_cutoff. Dictionary items are:
- cbm_score: (float) fractional contribution to conduction band
- vbm_score: (float) fractional contribution to valence band
- species: (pymatgen Specie) the Specie of the orbital
- character: (str) is the orbital character s, p, d, or f
- location: [(float)] cartesian coordinates of the orbital
- coordination: (str) optional; coordination environment from op site feature vector
matminer.featurizers.base module¶
-
class
matminer.featurizers.base.
BaseFeaturizer
¶ Bases:
object
Abstract class to calculate attributes for compounds
-
citations
()¶ Citation / reference for feature
- Returns:
- array - each element should be str citation, ideally in BibTeX
- format
-
feature_labels
()¶ Generate attribute names
- Returns:
- list of strings for attribute labels
-
featurize
(*x)¶ Main featurizer function. Only defined in feature subclasses.
- Args:
- x: input data to featurize (type depends on featurizer)
- Returns:
- list of one or more features
-
featurize_dataframe
(df, col_id, ignore_errors=False)¶ Compute features for all entries contained in input dataframe
- Args:
- df (Pandas dataframe): Dataframe containing input data
- col_id (str or list of str): column label containing objects to featurize. Can be multiple labels if the featurize function requires multiple inputs
- ignore_errors (bool): Returns NaN for dataframe rows where exceptions are thrown if True. If False, exceptions are thrown as normal.
- Returns:
- updated Dataframe
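The ignore_errors behavior can be sketched without pandas by letting a list of dicts stand in for the dataframe rows. The helper `featurize_rows` and the toy featurizer `inverse` are hypothetical names used only for illustration; the real method operates on a DataFrame.

```python
import math

def featurize_rows(rows, col_id, featurize, labels, ignore_errors=False):
    """Apply `featurize` to row[col_id] for every row and append the features.

    Rows are plain dicts standing in for dataframe rows (an assumption of
    this sketch).
    """
    out = []
    for row in rows:
        try:
            values = featurize(row[col_id])
        except Exception:
            if not ignore_errors:
                raise  # propagate the exception as normal
            values = [float("nan")] * len(labels)  # fill the failed row with NaN
        out.append({**row, **dict(zip(labels, values))})
    return out

def inverse(x):
    return [1.0 / x]

rows = [{"x": 2.0}, {"x": 0.0}]
result = featurize_rows(rows, "x", inverse, ["inv_x"], ignore_errors=True)
```

The second row triggers a ZeroDivisionError, which is replaced by NaN because ignore_errors is True; with the default False the exception would propagate.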
-
implementors
()¶ List of implementors of the feature
- Returns:
- array - each element should either be str with author name (e.g.,
- “Anubhav Jain”) or dict with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).
-
matminer.featurizers.composition module¶
-
class
matminer.featurizers.composition.
BandCenter
¶ Bases:
matminer.featurizers.base.BaseFeaturizer
-
citations
()¶
-
feature_labels
()¶
-
featurize
(comp)¶ (Rough) estimation of absolute position of band center using geometric mean of electronegativity.
- Args:
- comp: (Composition)
Returns: (float) band center
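A composition-weighted geometric mean can be sketched as below. The text does not say which electronegativity scale is used or what sign convention the band center follows, so this sketch only computes the geometric mean itself; the function name and the electronegativity values are placeholders, not real element data.

```python
import math

def geometric_mean_en(amounts, electronegativity):
    """Composition-weighted geometric mean of elemental electronegativities.

    amounts: mapping of element label to number of atoms in the formula.
    """
    total = sum(amounts.values())
    log_sum = sum(n * math.log(electronegativity[el]) for el, n in amounts.items())
    return math.exp(log_sum / total)

# Placeholder electronegativities for two hypothetical elements "A" and "B"
en = {"A": 2.0, "B": 8.0}
value = geometric_mean_en({"A": 1, "B": 1}, en)
```

For a 1:1 compound this reduces to the square root of the product of the two values.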
-
implementors
()¶
-
-
class
matminer.featurizers.composition.
CohesiveEnergy
(mapi_key=None)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
-
__init__
(mapi_key=None)¶ Class to get the cohesive energy per atom of a compound by combining known elemental cohesive energies with the formation energy of the compound.
- Parameters:
- mapi_key (str): Materials API key for looking up formation energy
- by composition alone (if you don’t set the formation energy yourself).
-
citations
()¶
-
feature_labels
()¶
-
featurize
(comp, formation_energy_per_atom=None)¶ - Args:
- comp: (str) compound composition, e.g. “NaCl”
- formation_energy_per_atom: (float) the formation energy per atom of your compound. If not set, will look up the most stable formation energy from the Materials Project database.
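One way to combine the two ingredients is sketched below, under the assumption that cohesive energies are reported as positive numbers for bound solids, so that the compound value is the fraction-weighted elemental cohesive energy minus the formation energy per atom. The sign convention, the function name, and all numbers are assumptions of this sketch and are not taken from the source.

```python
def cohesive_energy_per_atom(fractions, formation_energy_per_atom, elemental_cohesive):
    """fractions: atomic fractions summing to 1.
    elemental_cohesive: cohesive energy per atom of each pure element (eV),
    taken as positive for bound solids (assumed convention).
    """
    elemental = sum(x * elemental_cohesive[el] for el, x in fractions.items())
    # A negative (stabilizing) formation energy increases the cohesive energy
    return elemental - formation_energy_per_atom

# Placeholder data for a hypothetical 1:1 compound "AB"
e_coh = cohesive_energy_per_atom(
    {"A": 0.5, "B": 0.5},
    formation_energy_per_atom=-2.0,
    elemental_cohesive={"A": 1.0, "B": 3.0},
)
```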
-
implementors
()¶
-
-
class
matminer.featurizers.composition.
ElectronAffinity
(data_source=<matminer.utils.data.DemlData object>)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate average electron affinity times formal charge of anion elements
- Parameters:
- data_source (data class): source from which to retrieve element data
Generates average (electron affinity*formal charge) of anions
-
__init__
(data_source=<matminer.utils.data.DemlData object>)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(comp)¶ - Args:
- comp: Pymatgen Composition object
- Returns:
- avg_anion_affin (single-element list): average electron affinity*formal charge of anions
-
implementors
()¶
-
class
matminer.featurizers.composition.
ElectronegativityDiff
(data_source=<matminer.utils.data.DemlData object>, stats=None)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Calculate electronegativity difference between cations and anions (average, max, range, etc.)
- Parameters:
- data_source (data class): source from which to retrieve element data
- stats: Property statistics to compute
Generates average electronegativity difference between cations and anions
-
__init__
(data_source=<matminer.utils.data.DemlData object>, stats=None)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(comp)¶ - Args:
- comp: Pymatgen Composition object
- Returns:
- en_diff_stats (list of floats): Property stats of electronegativity difference
-
implementors
()¶
-
class
matminer.featurizers.composition.
ElementFraction
¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate the atomic fraction of each element in a composition.
Generates: vector where each index represents an element in atomic number order.
-
__init__
()¶
-
feature_labels
()¶
-
featurize
(comp)¶ - Args:
- comp: Pymatgen Composition object
- Returns:
- vector (list of floats): fraction of each element in a composition
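The returned vector can be pictured with a truncated element table. The sketch below covers only the first eight elements to stay short; the real featurizer spans the full periodic table, and the names here are illustrative.

```python
# Truncated table in atomic-number order (only Z = 1..8 for this sketch)
ELEMENTS = ["H", "He", "Li", "Be", "B", "C", "N", "O"]

def element_fractions(amounts):
    """Return the atomic fraction of each element, indexed by atomic number order."""
    total = sum(amounts.values())
    return [amounts.get(el, 0) / total for el in ELEMENTS]

vector = element_fractions({"H": 2, "O": 1})  # water, H2O
```

Entries for elements absent from the composition are zero, so the vector is sparse for most compounds.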
-
implementors
()¶
-
-
class
matminer.featurizers.composition.
ElementProperty
(data_source, features, stats)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate elemental property attributes. To initialize quickly, use the from_preset() method.
- Parameters:
- data_source (AbstractData or str): source from which to retrieve
- element property data (or use str for preset: “pymatgen”, “magpie”, or “deml”)
- features (list of strings): List of elemental properties to use
- (these must be supported by data_source)
- stats (string): a list of weighted statistics to compute for each
- property (see PropertyStats for available stats)
-
__init__
(data_source, features, stats)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(comp)¶ Get elemental property attributes
- Args:
- comp: Pymatgen composition object
- Returns:
- all_attributes: Specified property statistics of features
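The idea of composition-weighted property statistics can be sketched for a single property. This is a simplified stand-in rather than the featurizer itself: the function name is hypothetical, only four statistics are shown, and weighting the mean by atomic fraction is an assumption consistent with the "weighted statistics" wording.

```python
def weighted_stats(amounts, prop):
    """Statistics of one elemental property over a composition.

    amounts: element label -> number of atoms; prop: element label -> value.
    """
    total = sum(amounts.values())
    values = [prop[el] for el in amounts]
    weights = [amounts[el] / total for el in amounts]
    return {
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "mean": sum(w * v for w, v in zip(weights, values)),
    }

# Rounded standard atomic masses (u)
atomic_mass = {"Na": 22.99, "Cl": 35.45}
stats = weighted_stats({"Na": 1, "Cl": 1}, atomic_mass)
```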
-
static
from_preset
(preset_name)¶ Return ElementProperty from a preset string
- Args:
- preset_name: (str) can be one of “magpie”, “deml”, or “matminer”
- Returns:
-
implementors
()¶
-
class
matminer.featurizers.composition.
FERECorrection
(data_source=<matminer.utils.data.DemlData object>, stats=None)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate features related to the difference between fitted elemental-phase reference energy (FERE) and GGA+U energy
- Parameters:
- data_source (data class): source from which to retrieve element data
- stats: Property statistics to compute
Generates: Property statistics of difference between FERE and GGA+U energy
-
__init__
(data_source=<matminer.utils.data.DemlData object>, stats=None)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(comp)¶ - Args:
- comp: Pymatgen Composition object
- Returns:
- fere_corr_stats (list of floats): Property stats of FERE correction
-
implementors
()¶
-
class
matminer.featurizers.composition.
IonProperty
(data_source=<matminer.utils.data.MagpieData object>)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate ionic property attributes
- Parameters:
- data_source (data class): source from which to retrieve element data
-
__init__
(data_source=<matminer.utils.data.MagpieData object>)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(comp)¶ Ionic character attributes
- Args:
- comp: Pymatgen composition object
- Returns:
- cpd_possible (bool): Indicates if a neutral ionic compound is possible
- max_ionic_char (float): Maximum ionic character between two atoms
- avg_ionic_char (float): Average ionic character
-
implementors
()¶
-
class
matminer.featurizers.composition.
Miedema
(struct='inter', dataset='Miedema')¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate the formation enthalpies of the intermetallic compound, solid solution and amorphous phase of a given composition, based on the semi-empirical Miedema model for transition metals (uses the original formulation from the 1980s; see citations).
Currently only elemental or binary compositions are supported; this may be extended to ternary or more later.
- Parameters:
- struct (String): one target structure or a list of target structures separated by ‘|’
- ‘inter’: intermetallic compound (default)
- ‘ss’: solid solution
- ‘amor’: amorphous phase
- ‘inter|ss’: intermetallic compound and solid solution, as an example
- ‘all’: same as ‘inter|ss|amor’
- For ‘ss’, one can designate the lattice type: if entering ‘ss, bcc’, ‘ss, fcc’, or ‘ss, hcp’, the lattice type of ss is fixed; if not, the minimum formation enthalpy of the possible lattice types is returned
- dataset (String): source of parameters:
- ‘Miedema’: the original parameterization by Miedema et al. in 1989
- ‘MP’: extract some features from MP to replace the original ones in ‘Miedema’
- ‘Citrine’: extract some features from Citrine to replace the original ones in ‘Miedema’
- (Currently not done yet.)
-
__init__
(struct='inter', dataset='Miedema')¶
-
citations
()¶
-
data_dir
= '/Users/ajain/Documents/code_matgen/matminer/matminer/featurizers/../utils/data_files'¶
-
delta_H_chem
(elements, fracs, struct)¶
-
delta_H_elast
(elements, fracs)¶
-
delta_H_struct
(elements, fracs, lattice)¶
-
delta_S
(fracs)¶
-
feature_labels
()¶
-
featurize
(comp)¶ Get Miedema formation enthalpy of target structures
- Args:
- comp: Pymatgen composition object
- Returns:
- delta_H_inter: formation enthalpy of intermetallic compound
- delta_H_ss: formation enthalpy of solid solution
- delta_H_amor: formation enthalpy of amorphous phase
-
implementors
()¶
-
class
matminer.featurizers.composition.
Stoichiometry
(p_list=[0, 2, 3, 5, 7, 10], num_atoms=False)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate stoichiometric attributes.
- Parameters:
- p_list (list of ints): list of norms to calculate
- num_atoms (bool): whether to return number of atoms
-
__init__
(p_list=[0, 2, 3, 5, 7, 10], num_atoms=False)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(comp)¶ Get stoichiometric attributes
- Args:
- comp: Pymatgen composition object
- p_list (list of ints)
- Returns:
- p_norm (list of floats): Lp norm-based stoichiometric attributes. Returns number of atoms if no p-values specified.
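The Lp norms of the atomic fractions can be sketched as follows. Treating p = 0 as the number of distinct elements is an assumption of this sketch (the text only says the attributes are "Lp norm-based"), and the function name is illustrative.

```python
def stoichiometric_norms(amounts, p_list=(0, 2, 3, 5, 7, 10)):
    """Lp norms of the atomic fractions of a composition.

    amounts: element label -> number of atoms in the formula.
    """
    total = sum(amounts.values())
    fracs = [n / total for n in amounts.values() if n > 0]
    norms = []
    for p in p_list:
        if p == 0:
            # Assumption: the "0-norm" counts the elements present
            norms.append(float(len(fracs)))
        else:
            norms.append(sum(f ** p for f in fracs) ** (1.0 / p))
    return norms

norms = stoichiometric_norms({"Fe": 2, "O": 3}, p_list=(0, 2))
```

For Fe2O3 the fractions are 0.4 and 0.6, so the 2-norm is the square root of 0.52.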
-
implementors
()¶
-
class
matminer.featurizers.composition.
TMetalFraction
¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate fraction of magnetic transition metals in a composition.
- Parameters:
- data_source (data class): source from which to retrieve element data
Generates: Fraction of magnetic transition metal atoms in a compound
-
__init__
()¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(comp)¶ - Args:
- comp: Pymatgen Composition object
- Returns:
- frac_magn_atoms (single-element list): fraction of magnetic transition metal atoms in a compound
-
implementors
()¶
-
class
matminer.featurizers.composition.
ValenceOrbital
(data_source=<matminer.utils.data.MagpieData object>, orbitals=['s', 'p', 'd', 'f'], props=['avg', 'frac'])¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Class to calculate valence orbital attributes
- Parameters:
- data_source (data object): source from which to retrieve element data
- orbitals (list): orbitals to calculate
- props (list): specifies whether to return average number of electrons in each orbital, fraction of electrons in each orbital, or both
-
__init__
(data_source=<matminer.utils.data.MagpieData object>, orbitals=['s', 'p', 'd', 'f'], props=['avg', 'frac'])¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(comp)¶ Weighted fraction of valence electrons in each orbital
- Args:
- comp: Pymatgen composition object
- Returns:
- valence_attributes (list of floats): Average number and/or fraction of valence electrons in specified orbitals
-
implementors
()¶
matminer.featurizers.site module¶
-
class
matminer.featurizers.site.
AGNIFingerprints
(directions=(None, 'x', 'y', 'z'), etas=None, cutoff=8)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Integral of the product of the radial distribution function and a Gaussian window function. Originally used by [Botu et al] (http://pubs.acs.org/doi/abs/10.1021/acs.jpcc.6b10908) to fit empirical potentials. These features come in two forms: atomic fingerprints and direction-resolved fingerprints.
Atomic fingerprints describe the local environment of an atom and are computed using the function:
:math:`A_i(\eta) = \sum\limits_{i \ne j} e^{-(\frac{r_{ij}}{\eta})^2} f(r_{ij})`
where :math:`i` is the index of the atom, :math:`j` is the index of a neighboring atom, :math:`\eta` is a scaling function, :math:`r_{ij}` is the distance between atoms :math:`i` and :math:`j`, and :math:`f(r)` is a cutoff function where :math:`f(r) = 0.5[\cos(\frac{\pi r_{ij}}{R_c}) + 1]` if :math:`r < R_c` and 0 otherwise.
The direction-resolved fingerprints are computed using
:math:`V_i^k(\eta) = \sum\limits_{i \ne j} \frac{r_{ij}^k}{r_{ij}} e^{-(\frac{r_{ij}}{\eta})^2} f(r_{ij})`
where :math:`r_{ij}^k` is the :math:`k^{th}` component of :math:`\mathbf{r}_i - \mathbf{r}_j`.
Parameters:
TODO: Differentiate between different atom types (maybe as another class)
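The two fingerprint formulas can be evaluated directly for an isolated cluster of points, as in the sketch below. It is not the featurizer's code: the function names are hypothetical, periodic images are ignored, and the displacement is taken as r_i - r_j, following the definition given for the direction-resolved form.

```python
import math

def cutoff_fn(r, r_c):
    """Cosine cutoff: 0.5 * [cos(pi * r / r_c) + 1] inside r_c, zero outside."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0

def agni(center, neighbors, eta, r_c=8.0, direction=None):
    """Atomic (direction=None) or direction-resolved ('x', 'y', 'z') fingerprint
    of the atom at `center`, summed over the listed neighbor positions."""
    axis = {"x": 0, "y": 1, "z": 2}
    total = 0.0
    for pos in neighbors:
        d = [c - p for c, p in zip(center, pos)]  # r_i - r_j
        r = math.sqrt(sum(x * x for x in d))
        if r == 0.0:
            continue  # skip the central atom itself
        term = math.exp(-((r / eta) ** 2)) * cutoff_fn(r, r_c)
        if direction is not None:
            term *= d[axis[direction]] / r
        total += term
    return total

# Two neighbors placed symmetrically along x around the central atom
center = (0.0, 0.0, 0.0)
neighbors = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
a = agni(center, neighbors, eta=1.0)
v_x = agni(center, neighbors, eta=1.0, direction="x")
```

Because the two neighbors mirror each other, their x-resolved contributions cancel while the atomic fingerprint adds them up.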
-
__init__
(directions=(None, 'x', 'y', 'z'), etas=None, cutoff=8)¶ - Args:
- directions (iterable): List of directions for the fingerprints. Can be one or more of ‘None’, ‘x’, ‘y’, or ‘z’
- etas (iterable of floats): List of which window widths to compute
- cutoff (float): Cutoff distance (Angstroms)
-
citations
()¶
-
feature_labels
()¶
-
featurize
(struct, idx)¶
-
implementors
()¶
-
class
matminer.featurizers.site.
ChemEnvSiteFingerprint
(cetypes, strategy, geom_finder, max_csm=8)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Site fingerprint computed from pymatgen’s ChemEnv package that provides resemblance percentages of a given site to ideal environments.
- Args:
- cetypes ([str]): chemical environments (CEs) to be considered.
- strategy (ChemenvStrategy): ChemEnv neighbor-finding strategy.
- geom_finder (LocalGeometryFinder): ChemEnv local geometry finder.
- max_csm (float): maximum continuous symmetry measure (CSM; default of 8 taken from chemenv). Note that any CSM larger than max_csm will be set to max_csm in order to avoid negative values (i.e., all features are constrained to be between 0 and 1).
-
__init__
(cetypes, strategy, geom_finder, max_csm=8)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(struct, idx)¶ Get ChemEnv fingerprint of site with given index in input structure.
- Args:
- struct (Structure): Pymatgen Structure object.
- idx (int): index of target site in structure struct.
- Returns:
- (numpy array): resemblance fraction of target site to ideal local environments.
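The class description notes that CSM values above max_csm are clipped so that every feature stays between 0 and 1. One mapping consistent with that statement is sketched below; the exact formula is an assumption of this sketch, and the function name `resemblance` is hypothetical.

```python
def resemblance(csm, max_csm=8.0):
    """Map a continuous symmetry measure to a 0-1 resemblance value.

    Assumption: a CSM of 0 (perfect match) maps to 1, and anything at or
    beyond max_csm maps to 0.
    """
    clipped = min(csm, max_csm)  # avoids negative feature values
    return 1.0 - clipped / max_csm

values = [resemblance(c) for c in (0.0, 4.0, 20.0)]
```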
-
static
from_preset
(preset)¶ Use a standard collection of CE types and choose your ChemEnv neighbor-finding strategy.
- Args:
- preset (str): preset types (“simple” or “multi_weights”).
- Returns:
- ChemEnvSiteFingerprint object from a preset.
-
implementors
()¶
-
class
matminer.featurizers.site.
CrystalSiteFingerprint
(optypes, override_cn1=True, cutoff_radius=8, tol=0.01, cation_anion=False)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
A site fingerprint intended for periodic crystals. The fingerprint represents the value of various order parameters for the site; each value is the product of two quantities: (i) the value of the order parameter itself and (ii) a factor that describes how consistent the number of neighbors is with that order parameter. Note that we can include only factor (ii) using the “wt” order parameter, which is always set to 1.
-
__init__
(optypes, override_cn1=True, cutoff_radius=8, tol=0.01, cation_anion=False)¶ Initialize the CrystalSiteFingerprint. Use the from_preset() function to use default params.
- Args:
- optypes (dict): a dict of coordination number (int) to a list of str representing the order parameter types
- override_cn1 (bool): whether to use a special function for the single neighbor case. Suggest to keep True.
- cutoff_radius (int): radius in Angstroms for neighbor finding
- tol (float): numerical tolerance (in case your site distances are not perfect or to correct for float tolerances)
- cation_anion (bool): whether to only consider cation<->anion bonds (bonds with zero charge are also allowed)
-
citations
()¶
-
feature_labels
()¶
-
featurize
(struct, idx)¶ Get crystal fingerprint of site with given index in input structure.
- Args:
- struct (Structure): Pymatgen Structure object.
- idx (int): index of target site in structure.
- Returns:
- list of weighted order parameters of target site.
-
static
from_preset
(preset, cation_anion=False)¶ Use preset parameters to get the fingerprint
- Args:
- preset (str): name of preset (“cn” or “ops”)
- cation_anion (bool): whether to only consider cation<->anion bonds (bonds with zero charge are also allowed)
-
implementors
()¶
-
-
class
matminer.featurizers.site.
OPSiteFingerprint
(optypes=None, dr=0.1, ddr=0.01, ndr=1, dop=0.001, dist_exp=2, zero_ops=True)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Local structure order parameters computed from the neighbor environment of a site. For each order parameter, we determine the neighbor shell that complies with the expected coordination number. For example, we find the 4 nearest neighbors for the tetrahedral OP, the 6 nearest for the octahedral OP, and the 8 nearest neighbors for the bcc OP. If we don’t find such a shell, the OP is either set to zero or evaluated with the shell of the next largest observed coordination number.
- Args:
- dr (float): width for binning neighbors in unit of relative distances (= distance/nearest neighbor distance). The binning is necessary to make the neighbor-finding step robust against small numerical variations in neighbor distances (default: 0.1).
- ddr (float): variation of width for finding stable OP values.
- ndr (int): number of width variations for each variation direction (e.g., ndr = 0 only uses the input dr, whereas ndr = 1 tests dr - ddr, dr, and dr + ddr).
- dop (float): binning width to compute histogram for each OP if ndr > 0.
- dist_exp (int): exponent for distance factor to multiply order parameters with that penalizes (large) variations in distances in a given motif. 0 will switch the option off (default: 2).
- zero_ops (boolean): set an OP to zero if there is no neighbor shell that complies with the expected coordination number of a given OP (e.g., CN=4 for tetrahedron; default: True).
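The set of binning widths implied by dr, ddr, and ndr can be enumerated as in this sketch. The helper name `tested_widths` is hypothetical, and extending the pattern symmetrically for ndr greater than 1 is an assumption based on the phrase "for each variation direction".

```python
def tested_widths(dr=0.1, ddr=0.01, ndr=1):
    """Widths tried when searching for stable order-parameter values:
    ndr steps of size ddr below and above the input dr."""
    return [dr + k * ddr for k in range(-ndr, ndr + 1)]

widths = tested_widths()            # dr - ddr, dr, dr + ddr
only_input = tested_widths(ndr=0)   # just the input dr
```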
-
__init__
(optypes=None, dr=0.1, ddr=0.01, ndr=1, dop=0.001, dist_exp=2, zero_ops=True)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(struct, idx)¶ Get OP fingerprint of site with given index in input structure.
- Args:
- struct (Structure): Pymatgen Structure object.
- idx (int): index of target site in structure.
- Returns:
- opvals (numpy array): order parameters of target site.
-
implementors
()¶
-
class
matminer.featurizers.site.
VoronoiIndex
(cutoff=6.0)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
The Voronoi indices n_i and the fractional Voronoi indices n_i/sum(n_i) that reflect the i-fold symmetry in the local sites. n_i denotes the number of the i-edged faces, and i is in the range of 3-10 here. For example:
- for bcc lattice, the Voronoi indices are [0,6,0,8,0,0…]
- for fcc/hcp lattice, the Voronoi indices are [0,12,0,0,…]
- for icosahedra, the Voronoi indices are [0,0,12,0,…]
-
__init__
(cutoff=6.0)¶ - Args:
- cutoff (float): cutoff distance in determining the potential
- neighbors for Voronoi tessellation analysis
-
citations
()¶
-
feature_labels
()¶
-
featurize
(struct, idx)¶ - Args:
- struct (Structure): Pymatgen Structure object.
- idx (int): index of target site in structure.
- Returns:
- list including Voronoi indices, sum of Voronoi indices, and fractional Voronoi indices
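Given the number of edges of each face of a site's Voronoi polyhedron, the indices, their sum, and the fractional indices follow by counting. The sketch below assumes the face list is already available (the tessellation itself is not shown), and the function name is illustrative.

```python
def voronoi_indices(face_edge_counts, i_min=3, i_max=10):
    """Count faces by number of edges for i in [i_min, i_max].

    Returns (indices, sum of indices, fractional indices).
    """
    indices = [0] * (i_max - i_min + 1)
    for n_edges in face_edge_counts:
        if i_min <= n_edges <= i_max:
            indices[n_edges - i_min] += 1
    total = sum(indices)
    fractions = [n / total for n in indices] if total else [0.0] * len(indices)
    return indices, total, fractions

# bcc Voronoi cell: 6 four-edged faces and 8 six-edged faces
bcc_faces = [4] * 6 + [6] * 8
indices, total, fractions = voronoi_indices(bcc_faces)
```

The resulting indices reproduce the bcc pattern [0, 6, 0, 8, 0, ...] quoted in the class description.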
-
implementors
()¶
-
matminer.featurizers.stats module¶
-
class
matminer.featurizers.stats.
PropertyStats
¶ Bases:
object
This class contains statistical operations that are commonly employed when computing features.
The primary way for interacting with this class is to call the
calc_stat
function, which takes the name of the statistic you would like to compute and the weights/values of data to be assessed. For example, computing the mean of a list looks like:
x = [1, 2, 3]
PropertyStats.calc_stat(x, 'mean') # Result is 2
PropertyStats.calc_stat(x, 'mean', weights=[0, 0, 1]) # Result is 3
Some of the statistics functions take options (e.g., Holder means). You can pass them to the statistics functions by adding them after the name and two colons. For example, the 0th Holder mean would be:
PropertyStats.calc_stat(x, 'holder_mean::0')
You can, of course, call the statistical functions directly. All take at least two arguments. The first is the data being assessed and the second, optional, argument is the weights.
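The "name::option" convention can be sketched with a small dispatcher that splits the statistic string on two colons and forwards any options to the named function. This is a stand-alone illustration with a one-entry lookup table, not the PropertyStats implementation; the names `STATS` and the module-level `calc_stat` are hypothetical.

```python
def mean(data, weights=None):
    """Weighted arithmetic mean; equal weights when none are given."""
    if weights is None:
        weights = [1.0] * len(data)
    return sum(w * x for w, x in zip(weights, data)) / sum(weights)

# Lookup table of available statistics (only one entry in this sketch)
STATS = {"mean": mean}

def calc_stat(data, stat, weights=None):
    """Split 'name::arg1::arg2' and call the named statistic with the options."""
    name, *args = stat.split("::")
    return STATS[name](data, weights, *args)

x = [1, 2, 3]
plain = calc_stat(x, "mean")
weighted = calc_stat(x, "mean", weights=[0, 0, 1])
```

Options arrive as strings, so a statistic that takes a numeric option would need to convert it before use.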
-
static
avg_dev
(data_lst, weights=None)¶ Mean absolute deviation of list of element data.
This is computed by first calculating the mean of the list, and then computing the average absolute difference between each value and the mean.
- Args:
- data_lst (list of floats): List of values to be assessed
- weights (list of floats): Weights for each value
- Returns:
- mean absolute deviation
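The two-step recipe described above (mean first, then the average absolute difference from it) can be written out as follows. Applying the same weights in both steps is an assumption of this sketch.

```python
def avg_dev(data, weights=None):
    """Weighted mean absolute deviation about the weighted mean."""
    if weights is None:
        weights = [1.0] * len(data)
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, data)) / total
    return sum(w * abs(x - mean) for w, x in zip(weights, data)) / total

unweighted = avg_dev([1, 2, 3])
single = avg_dev([1, 2, 3], weights=[0, 0, 1])  # all weight on one value
```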
-
static
calc_stat
(data_lst, stat, weights=None)¶ Compute a property statistic
- Args:
- data_lst (list of floats): list of values
- stat (str): Name of the property to compute. If there are arguments to the statistics function, these should be added after the name and separated by two colons. For example, the 2nd Holder mean would be “holder_mean::2”
- weights (list of floats): (Optional) weights for each element in data_lst
- Returns:
- float - Desired statistic
-
static
eigenvalues
(data_lst, symm=False, sort=False)¶ Return the eigenvalues of a matrix as a numpy array
- Args:
- data_lst: (matrix-like) of values
- symm: whether to assume the matrix is symmetric
- sort: whether to sort the eigenvalues
- Returns: eigenvalues
-
static
flatten
(data_lst)¶ Returns a flattened copy of data_lst as a numpy array
-
static
geom_std_dev
(data_lst, weights=None)¶ Geometric standard deviation
- Args:
- data_lst (list of floats): List of values to be assessed
- weights (list of floats): Weights for each value
- Returns:
- geometric standard deviation
-
static
holder_mean
(data_lst, weights=None, power=1)¶ Get Holder mean
- Args:
- data_lst: (list/array) of values
- weights: (list/array) of weights
- power: (int/float/str) which holder mean to compute
- Returns: Holder mean
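For reference, the weighted Holder (power) mean can be sketched with the standard definition, using the geometric mean as the power-0 limit. Converting the power with float() reflects that the argument may arrive as a string; the rest is a generic implementation and not necessarily how the library computes it.

```python
import math

def holder_mean(data, weights=None, power=1):
    """Weighted Holder (power) mean of positive values."""
    if weights is None:
        weights = [1.0] * len(data)
    power = float(power)  # the option may be passed as a string, e.g. "0"
    total = sum(weights)
    if power == 0:
        # Limit of the power mean as power -> 0 is the geometric mean
        return math.exp(sum(w * math.log(x) for w, x in zip(weights, data)) / total)
    return (sum(w * x ** power for w, x in zip(weights, data)) / total) ** (1.0 / power)

geometric = holder_mean([1.0, 4.0], power="0")
arithmetic = holder_mean([1.0, 4.0], power=1)
harmonic = holder_mean([1.0, 4.0], power=-1)
```

Powers of 1 and -1 recover the arithmetic and harmonic means, respectively.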
-
static
inverse_mean
(data_lst, weights=None)¶ Mean of the inverse of each entry
- Args:
- data_lst (list of floats): List of values to be assessed
- weights (list of floats): Weights for each value
- Returns:
- inverse mean
-
static
maximum
(data_lst, weights=None)¶ Maximum value in a list
- Args:
- data_lst (list of floats): List of values to be assessed
- weights: (ignored)
- Returns:
- maximum value
-
static
mean
(data_lst, weights=None)¶ Arithmetic mean of list
- Args:
- data_lst (list of floats): List of values to be assessed
- weights (list of floats): Weights for each value
- Returns:
- mean value
-
static
minimum
(data_lst, weights=None)¶ Minimum value in a list
- Args:
- data_lst (list of floats): List of values to be assessed
- weights: (ignored)
- Returns:
- minimum value
-
static
mode
(data_lst, weights=None)¶ Mode of a list of data.
If multiple elements occur equally-frequently (or same weight, if weights are provided), this function will return the minimum of those values.
- Args:
- data_lst (list of floats): List of values to be assessed
- weights (list of floats): Weights for each value
- Returns:
- mode
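The tie-breaking rule (return the minimum of the equally frequent or equally weighted values) can be sketched like this. Ties are detected by exact equality of accumulated weights, which is a simplification of this sketch.

```python
from collections import defaultdict

def mode(data, weights=None):
    """Value with the largest total weight; ties resolve to the smallest value."""
    if weights is None:
        weights = [1.0] * len(data)
    totals = defaultdict(float)
    for x, w in zip(data, weights):
        totals[x] += w
    best = max(totals.values())
    return min(x for x, t in totals.items() if t == best)

tie = mode([3, 1, 3, 1, 2])                  # 1 and 3 both occur twice
weighted = mode([1, 2, 3], weights=[1, 1, 5])
```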
-
static
range
(data_lst, weights=None)¶ Range of a list
- Args:
- data_lst (list of floats): List of values to be assessed
- weights: (ignored)
- Returns:
- range
-
static
sorted
(data_lst)¶ Returns the sorted data_lst
-
static
std_dev
(data_lst, weights=None)¶ Standard deviation of a list of element data
- Args:
- data_lst (list of floats): List of values to be assessed
- weights (list of floats): Weights for each value
- Returns:
- standard deviation
-
static
matminer.featurizers.structure module¶
-
class
matminer.featurizers.structure.
CoulombMatrix
(diag_elems=True)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Generate the Coulomb matrix, M, of the input structure (or molecule). The Coulomb matrix was put forward by Rupp et al. (Phys. Rev. Lett. 108, 058301, 2012) and is defined by off-diagonal elements M_ij = Z_i*Z_j/|R_i-R_j| and diagonal elements 0.5*Z_i^2.4, where Z_i and R_i denote the nuclear charge and the position of atom i, respectively.
- Args:
- diag_elems: (bool) flag indicating whether (True, default) to use
- the original definition of the diagonal elements; if set to False, the diagonal elements are set to zero.
-
__init__
(diag_elems=True)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(s)¶ Get Coulomb matrix of input structure.
- Args:
- s: input Structure (or Molecule) object.
- Returns:
- m: (Nsites x Nsites matrix) Coulomb matrix.
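The matrix definition given in the class description translates directly into a double loop for a finite set of atoms. The sketch below uses plain lists, ignores periodic images, and leaves units as whatever the inputs use; the function name and the two-atom example are illustrative only.

```python
import math

def coulomb_matrix(charges, positions, diag_elems=True):
    """M_ij = Z_i * Z_j / |R_i - R_j| off the diagonal and 0.5 * Z_i ** 2.4
    on the diagonal (or zero when diag_elems is False)."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                if diag_elems:
                    m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return m

# Two atoms with nuclear charge 1 separated by 0.74 length units
charges = [1, 1]
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)]
m = coulomb_matrix(charges, positions)
m_no_diag = coulomb_matrix(charges, positions, diag_elems=False)
```

The matrix is symmetric by construction, since both the charge product and the distance are symmetric in i and j.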
-
implementors
()¶
-
class
matminer.featurizers.structure.
DensityFeatures
(desired_features=None)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
-
__init__
(desired_features=None)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(s)¶
-
implementors
()¶
-
-
class
matminer.featurizers.structure.
ElectronicRadialDistributionFunction
(cutoff=None, dr=0.05)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Calculate the crystal structure-inherent electronic radial distribution function (ReDF) according to Willighagen et al., Acta Cryst., 2005, B61, 29-36. The ReDF is a structure-integral RDF (i.e., summed over all sites) in which the positions of neighboring sites are weighted by electrostatic interactions inferred from atomic partial charges. Atomic charges are obtained from the ValenceIonicRadiusEvaluator class. Args:
- cutoff: (float) distance up to which the ReDF is to be calculated (default: longest diagonal in primitive cell).
- dr: (float) width of bins (“x”-axis) of ReDF (default: 0.05 A).
-
__init__
(cutoff=None, dr=0.05)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(s)¶ Get ReDF of input structure.
- Args:
- s: input Structure object.
- Returns: (dict) a copy of the electronic radial distribution
- functions (ReDF) as a dictionary. The distance list (“x”-axis values of ReDF) can be accessed via key ‘distances’; the ReDF itself is accessible via key ‘redf’.
-
implementors
()¶
-
class
matminer.featurizers.structure.
GlobalSymmetryFeatures
(desired_features=None)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
-
__init__
(desired_features=None)¶
-
citations
()¶
-
crystal_idx
= {'triclinic': 7, 'monoclinic': 6, 'orthorhombic': 5, 'tetragonal': 4, 'trigonal': 3, 'hexagonal': 2, 'cubic': 1}¶
-
feature_labels
()¶
-
featurize
(s)¶
-
implementors
()¶
-
-
class
matminer.featurizers.structure.
MinimumRelativeDistances
(cutoff=10.0)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Determines the relative distance of each site to its closest neighbor. We use the relative distance, f_ij = r_ij / (r^atom_i + r^atom_j), as a measure rather than the absolute distances, r_ij, to account for the fact that different atoms/species have different sizes. The function uses the valence-ionic radius estimator implemented in Pymatgen. Args:
- cutoff: (float) (absolute) distance up to which tentative
- closest neighbors (on the basis of relative distances) are to be determined.
-
__init__
(cutoff=10.0)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(s, cutoff=10.0)¶ Get minimum relative distances of all sites of the input structure.
- Args:
- s: Pymatgen Structure object.
- Returns:
- min_rel_dists: (list of floats) list of all minimum relative
- distances (i.e., for all sites).
-
implementors
()¶
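The relative distance f_ij = r_ij / (r^atom_i + r^atom_j) described above can be sketched in pure Python for a finite set of sites. This is only an illustration under stated assumptions: the function min_relative_distances is hypothetical, the radii are supplied by the caller rather than estimated by Pymatgen, and periodic images are ignored.

```python
import math


def min_relative_distances(coords, radii, cutoff=10.0):
    """Smallest f_ij = r_ij / (r_i + r_j) for each site (non-periodic sketch).

    coords: list of Cartesian (x, y, z) tuples.
    radii: list of atomic radii, one per site (assumed to be given).
    cutoff: absolute distance beyond which neighbors are not considered.
    """
    result = []
    for i, (c_i, rad_i) in enumerate(zip(coords, radii)):
        best = math.nan  # stays NaN if no neighbor lies within the cutoff
        for j, (c_j, rad_j) in enumerate(zip(coords, radii)):
            if i == j:
                continue
            r_ij = math.dist(c_i, c_j)
            if r_ij > cutoff:
                continue
            f_ij = r_ij / (rad_i + rad_j)
            if math.isnan(best) or f_ij < best:
                best = f_ij
        result.append(best)
    return result


# Toy example: three sites with hand-picked radii.
min_rel = min_relative_distances(
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)],
    [1.0, 1.0, 0.5],
)
```

A value near 1 means the two atoms roughly touch according to their radii; larger values indicate a loosely packed site.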
-
class
matminer.featurizers.structure.
OPStructureFingerprint
(op_site_fp=None, stats=('mean', 'std_dev', 'minimum', 'maximum'), min_oxi=None, max_oxi=None)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Calculates all order parameters (OPs) for all sites in a crystal structure. Args:
- op_site_fp (OPSiteFingerprint): defines the types of order
- parameters to be calculated.
- stats ([str]): list of weighted statistics to compute for each feature.
- If stats is None, for each order parameter, a list is returned that contains the calculated parameter for each site in the structure. Note: for the nth mode, stat must be ‘n*_mode’; e.g., stat=‘2nd_mode’.
- min_oxi (int): minimum site oxidation state for inclusion (e.g.,
- zero means metals/cations only)
max_oxi (int): maximum site oxidation state for inclusion
-
__init__
(op_site_fp=None, stats=('mean', 'std_dev', 'minimum', 'maximum'), min_oxi=None, max_oxi=None)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(s)¶ Calculate all sites’ local structure order parameters (LSOPs).
- Args:
s: Pymatgen Structure object.
- Returns:
- opvals: (2D array of floats) LSOP values of all sites’ (1st dimension) order parameters (2nd dimension). 46 order parameters are computed per site: q_cn (coordination number), q_lin, 35 x q_bent (target angles from 5 to 175 degrees in steps of 5 degrees), q_tet, q_oct, q_bcc, q_2, q_4, q_6, q_reg_tri, q_sq, q_sq_pyr.
-
implementors
()¶
-
static
n_numerical_modes
(data_lst, n=2, dl=0.1)¶ - Returns the first n modes of a data set that are obtained with
- a finite bin size for the underlying frequency distribution.
- Args:
- data_lst ([float]): data values.
- n (integer): number of most frequent elements to be determined.
- dl (float): bin size of underlying (coarsened) distribution.
- Returns:
- ([float]): first n most frequent entries (or nan if not found).
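The idea of modes taken from a coarsened distribution can be sketched as follows. This is a minimal illustration, not the library implementation: the function n_modes_binned is hypothetical, bins are anchored at zero, and reporting bin centers (rather than, say, bin edges) is an assumption.

```python
import math
from collections import Counter


def n_modes_binned(data_lst, n=2, dl=0.1):
    """Return the n most populated bins of width dl, padded with NaN.

    Each value is assigned to bin floor(x / dl); the bin center is reported.
    Ties are broken in favor of the smaller bin index.
    """
    counts = Counter(math.floor(x / dl) for x in data_lst)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    modes = [(idx + 0.5) * dl for idx, _ in ranked[:n]]
    # Pad with NaN when fewer than n distinct bins are populated.
    modes += [math.nan] * (n - len(modes))
    return modes


data = [0.52, 0.55, 0.58, 1.23, 1.27, 2.01]
top_two = n_modes_binned(data, n=2, dl=0.1)
top_five = n_modes_binned(data, n=5, dl=0.1)
```

Binning first makes the "most frequent value" meaningful for floating-point data, where exact repeats are rare.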
-
class
matminer.featurizers.structure.
OrbitalFieldMatrix
(period_tag=False)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
This function generates an orbital field matrix (OFM) as developed by Pham et al (arXiv, May 2017). Each atom is described by a 32-element vector (or 39-element vector, see period tag for details) uniquely representing the valence subshell. A 32x32 (39x39) matrix is formed by multiplying two atomic vectors. An OFM for an atomic environment is the sum of these matrices over each atom that the center atom coordinates with, each multiplied by a distance function (in this case, 1/r times the weight of the coordinating atom in the Voronoi polyhedra method). The OFM of a structure or molecule is the average of the OFMs for all the sites in the structure.
- Args:
- period_tag (bool): In the original OFM, an element is represented
- by a vector of length 32, where each element is 1 or 0, which represents the valence subshell of the element. With period_tag=True, the vector size is increased to 39, where the 7 extra elements represent the period of the element. Note lanthanides are treated as period 6, actinides as period 7. Default False as in the original paper.
- Attributes:
- size: Either 32 or 39, the size of the vectors used to describe elements.
-
__init__
(period_tag=False)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(s)¶ Makes a supercell for structure s (to protect sites from coordinating with themselves), and then finds the mean of the orbital field matrices of each site to characterize a structure
- Args:
- s (Structure): structure to characterize
- Returns:
- mean_ofm (size X size matrix): orbital field matrix
- characterizing s
-
get_atom_ofms
(struct, symm=False)¶ Calls get_single_ofm for every site in struct. If symm=True, get_single_ofm is called for symmetrically distinct sites, and counts is constructed such that ofms[i] occurs counts[i] times in the structure
- Args:
- struct (Structure): structure to find ofms for
- symm (bool): whether to calculate ofm for only symmetrically distinct sites
- Returns:
- ofms ([size X size matrix] X len(struct)): ofms for struct
- if symm:
- ofms ([size X size matrix] X number of symmetrically distinct sites): ofms for struct
- counts: number of identical sites for each ofm
-
get_mean_ofm
(ofms, counts)¶ Averages a list of ofms, weights by counts
-
get_ohv
(sp, period_tag)¶ Get the “one-hot-vector” for pymatgen Element sp. This 32 or 39-length vector represents the valence shell of the given element. Args:
- sp (Element): element whose ohv should be returned
- period_tag (bool): If true, the vector contains items corresponding to the period of the element
- Returns:
- my_ohv (numpy array length 39 if period_tag, else 32): ohv for sp
-
get_single_ofm
(site, site_dict)¶ Gets the orbital field matrix for a single chemical environment, where site is the center atom whose environment is characterized and site_dict is a dictionary of site : weight, where the weights are the Voronoi Polyhedra weights of the corresponding coordinating sites.
- Args:
- site (Site): center atom
- site_dict (dict of Site:float): chemical environment
- Returns:
- atom_ofm (size X size numpy matrix): ofm for site
-
get_structure_ofm
(struct)¶ Calls get_mean_ofm on the results of get_atom_ofms to give a size X size matrix characterizing a structure
-
implementors
()¶
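The way a per-site OFM and the structure-level mean fit together can be sketched with toy vectors. This is an illustration under stated assumptions, not the matminer code: the helpers outer, single_ofm and mean_ofm are hypothetical, 3-element vectors stand in for the 32- or 39-element one-hot vectors, and the orientation of the outer product (center vector as rows) is assumed.

```python
def outer(u, v):
    """Outer product of two vectors as a nested list."""
    return [[a * b for b in v] for a in u]


def single_ofm(center_vec, neighbors):
    """Sum of outer products weighted by (Voronoi weight / distance).

    neighbors: list of (neighbor_vector, voronoi_weight, distance) tuples.
    """
    size = len(center_vec)
    ofm = [[0.0] * size for _ in range(size)]
    for vec, weight, dist in neighbors:
        contrib = outer(center_vec, vec)
        scale = weight / dist  # distance function 1/r times Voronoi weight
        for p in range(size):
            for q in range(size):
                ofm[p][q] += scale * contrib[p][q]
    return ofm


def mean_ofm(ofms, counts):
    """Average of the matrices in ofms, weighted by counts."""
    total = sum(counts)
    size = len(ofms[0])
    return [
        [sum(c * m[p][q] for m, c in zip(ofms, counts)) / total
         for q in range(size)]
        for p in range(size)
    ]


# Two symmetrically distinct toy sites occurring 1 and 3 times.
ofm_a = single_ofm([1, 0, 1], [([0, 1, 0], 0.5, 2.0)])
ofm_b = single_ofm([0, 1, 0], [([0, 1, 0], 1.0, 1.0)])
structure_ofm = mean_ofm([ofm_a, ofm_b], [1, 3])
```

Weighting by counts gives the same mean as listing every site explicitly, which is why evaluating only symmetrically distinct sites is sufficient.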
-
class
matminer.featurizers.structure.
PartialRadialDistributionFunction
(cutoff=20.0, bin_size=0.1)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Compute the partial radial distribution function (PRDF) of a crystal structure, which is the radial distribution function broken down for each pair of atom types. The PRDF was proposed as a structural descriptor by Schutt et al. (https://journals.aps.org/prb/abstract/10.1103/PhysRevB.89.205118). Args:
- cutoff: (float) distance up to which to calculate the RDF.
- bin_size: (float) size of each bin of the (discrete) RDF.
-
__init__
(cutoff=20.0, bin_size=0.1)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(s)¶ Get PRDF of the input structure. Args:
- s: Pymatgen Structure object.
- Returns:
- prdf, dist: (tuple of arrays) the first element is a
- dictionary where keys are tuples of element names and values are PRDFs.
-
implementors
()¶
-
-
class
matminer.featurizers.structure.
RadialDistributionFunction
(cutoff=20.0, bin_size=0.1)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Calculate the radial distribution function (RDF) of a crystal structure. Args:
- cutoff: (float) distance up to which to calculate the RDF.
- bin_size: (float) size of each bin of the (discrete) RDF.
-
__init__
(cutoff=20.0, bin_size=0.1)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(s)¶ Get RDF of the input structure. Args:
- s: Pymatgen Structure object.
- Returns:
- rdf, dist: (tuple of arrays) the first element is the
- normalized RDF, whereas the second element is the inner radius of the RDF bin.
-
implementors
()¶
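How a binned, normalized RDF and its inner bin radii relate to cutoff and bin_size can be sketched in pure Python. This is only an illustration: the function radial_distribution is hypothetical, the caller supplies a volume for the number density, periodic images are ignored, and normalization by the ideal-gas shell count is an assumed convention.

```python
import math


def radial_distribution(coords, volume, cutoff=20.0, bin_size=0.1):
    """Return (rdf, dist) for a finite set of points (non-periodic sketch).

    rdf[k] is the pair count in bin k divided by the count expected for a
    uniform density; dist[k] is the inner radius of bin k.
    """
    n_bins = int(round(cutoff / bin_size))
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            k = int(math.dist(coords[i], coords[j]) / bin_size)
            if k < n_bins:
                counts[k] += 1
    density = n / volume
    dist = [k * bin_size for k in range(n_bins)]
    rdf = []
    for k, c in enumerate(counts):
        r_in, r_out = k * bin_size, (k + 1) * bin_size
        shell_volume = 4.0 / 3.0 * math.pi * (r_out ** 3 - r_in ** 3)
        rdf.append(c / (n * density * shell_volume))
    return rdf, dist


# Two points 1.05 apart in a box of volume 8.
rdf, dist = radial_distribution(
    [(0.0, 0.0, 0.0), (1.05, 0.0, 0.0)], volume=8.0, cutoff=2.0, bin_size=0.1
)
```

With cutoff=2.0 and bin_size=0.1 there are 20 bins, and the only nonzero entry falls in the bin whose inner radius is about 1.0.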
-
-
class
matminer.featurizers.structure.
RadialDistributionFunctionPeaks
(n_peaks=2)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
Determine the location of the highest peaks in the radial distribution function (RDF) of a structure. Args:
- n_peaks: (int) number of the top peaks to return.
-
__init__
(n_peaks=2)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(rdf)¶ Get location of highest peaks in RDF.
- Args:
- rdf: (ndarray) RDF as obtained from the
- RadialDistributionFunction class.
- Returns: (ndarray) distances of highest peaks in descending order
- of the peak height
-
implementors
()¶
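Selecting the highest peaks from a binned RDF can be sketched as below. The function highest_peaks is hypothetical, and two choices are assumptions: a peak is taken to be a strict local maximum, and the RDF values and bin distances are passed as two separate lists.

```python
def highest_peaks(rdf, dist, n_peaks=2):
    """Distances of the n_peaks highest local maxima, tallest first."""
    peaks = []
    for k in range(len(rdf)):
        # Treat positions beyond the ends of the list as minus infinity.
        left = rdf[k - 1] if k > 0 else float("-inf")
        right = rdf[k + 1] if k < len(rdf) - 1 else float("-inf")
        if rdf[k] > left and rdf[k] > right:
            peaks.append(k)
    # Order the peak indices by descending peak height.
    peaks.sort(key=lambda k: rdf[k], reverse=True)
    return [dist[k] for k in peaks[:n_peaks]]


rdf_values = [0.0, 1.0, 0.2, 3.0, 0.5, 2.0, 0.1]
bin_radii = [0.1 * k for k in range(len(rdf_values))]
top_peaks = highest_peaks(rdf_values, bin_radii, n_peaks=2)
```

Flat-topped peaks are not detected by the strict comparison; a production version would need an explicit rule for plateaus.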
-
-
class
matminer.featurizers.structure.
SineCoulombMatrix
(diag_elems=True)¶ Bases:
matminer.featurizers.base.BaseFeaturizer
This function generates a variant of the Coulomb matrix developed for periodic crystals by Faber et al. (Inter. J. Quantum Chem. 115, 16, 2015). It is identical to the Coulomb matrix, except that the inverse distance function is replaced by the inverse of a sin**2 function of the vector between the sites which is periodic in the dimensions of the structure lattice. See paper for details.
- Args:
- diag_elems (bool): flag indicating whether (True, default) to use
- the original definition of the diagonal elements; if set to False, the diagonal elements are set to 0
-
__init__
(diag_elems=True)¶
-
citations
()¶
-
feature_labels
()¶
-
featurize
(s)¶ - Args:
- s (Structure or Molecule): input structure (or molecule)
- Returns:
- (Nsites x Nsites matrix) Sine matrix.
-
implementors
()¶
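The periodic sin**2 distance can be sketched in pure Python. This is an illustration under stated assumptions: the function sine_coulomb_matrix is hypothetical, the lattice is given as three row vectors, the diagonal uses the commonly cited 0.5 * Z ** 2.4 form, and no unit conversion is applied.

```python
import math


def sine_coulomb_matrix(atomic_numbers, frac_coords, lattice, diag_elems=True):
    """Sketch of a sine matrix from fractional coordinates.

    lattice: 3x3 nested list whose rows are the lattice vectors.
    Off-diagonal: Z_i * Z_j / |sin^2(pi * dfrac) . lattice|.
    """
    n = len(atomic_numbers)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * atomic_numbers[i] ** 2.4 if diag_elems else 0.0
                continue
            # Elementwise sin^2 of the fractional difference (lattice-periodic).
            s = [
                math.sin(math.pi * (frac_coords[i][k] - frac_coords[j][k])) ** 2
                for k in range(3)
            ]
            # Map the periodic vector to Cartesian space and take its norm.
            vec = [sum(s[k] * lattice[k][c] for k in range(3)) for c in range(3)]
            d = math.sqrt(sum(x * x for x in vec))
            m[i][j] = atomic_numbers[i] * atomic_numbers[j] / d
    return m


cubic = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
sm = sine_coulomb_matrix([11, 17], [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)], cubic)
# Shifting a site by one full lattice vector should leave the matrix unchanged.
sm_shifted = sine_coulomb_matrix([11, 17], [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], cubic)
```

Because sin**2 has period 1 in fractional coordinates, the matrix does not depend on which periodic image of a site is used.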
-
matminer.featurizers.structure.
get_op_stats_vector_diff
(s1, s2, max_dr=0.2, ddr=0.01, ddist=0.01)¶ Determine the difference vector between two order parameter-statistics feature vectors resulting from two input structures.
- Args:
- s1 (Structure): first input structure.
- s2 (Structure): second input structure.
- max_dr (float): maximum neighbor-finding parameter to be tested.
- ddr (float): step size for increasing neighbor-finding parameter.
- ddist (float): bin size for histogramming distances of varying dr.
- Returns: (float, [float]) optimal neighbor-finding parameter
- and difference vector between order parameter-statistics feature vectors obtained from the two input structures (s1 - s2).