matminer.featurizers package

Submodules

matminer.featurizers.bandstructure module

class matminer.featurizers.bandstructure.BandFeaturizer

Bases: matminer.featurizers.base.BaseFeaturizer

Featurizes a pymatgen band structure object.

__init__()
citations()
feature_labels()
featurize(bs)
Args:
bs (pymatgen BandStructure or BandStructureSymmLine or their dict):
The band structure to featurize.
Returns ([float]):
a list of band structure features. If not bs.structure, the
features that require the structure will be returned as NaN.
List of currently supported features:

band_gap (eV): the difference between the CBM and VBM energies
is_gap_direct (0.0|1.0): whether the band gap is direct or not
direct_gap (eV): the minimum direct (same k-point) energy difference between the last valence band and the first conduction band
{n,p}_ex{#}_en (eV): for example, p_ex2_en is the absolute value of the energy of the second valence (p) band extremum w.r.t. the VBM
{n,p}_ex{#}_norm (float): e.g., n_ex1_norm is the norm of the fractional coordinates of the k-point of the 1st conduction (n) band extremum, i.e., the CBM
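The first three features can be illustrated on plain lists of band energies. The following is a toy sketch, not matminer's implementation: band_gap_features is a hypothetical helper, and it assumes one valence band and one conduction band sampled on the same ordered list of k-points (matminer itself works on a pymatgen BandStructure).

```python
def band_gap_features(vb, cb):
    """Toy band-gap features (hypothetical helper, not matminer code).

    vb, cb: energies (eV) of the last valence band and the first conduction
    band, sampled on the same k-points (same list index = same k-point).
    """
    i_vbm = max(range(len(vb)), key=lambda k: vb[k])  # k-index of the VBM
    i_cbm = min(range(len(cb)), key=lambda k: cb[k])  # k-index of the CBM
    band_gap = cb[i_cbm] - vb[i_vbm]
    # smallest vertical (same k-point) separation between the two bands
    direct_gap = min(c - v for v, c in zip(vb, cb))
    # simplification: direct if VBM and CBM sit at the same k-point index
    is_gap_direct = 1.0 if i_vbm == i_cbm else 0.0
    return {"band_gap": band_gap, "is_gap_direct": is_gap_direct,
            "direct_gap": direct_gap}


# Indirect-gap toy example: VBM at k-index 0, CBM at k-index 2
print(band_gap_features([0.0, -0.5, -1.0], [2.0, 1.8, 1.5]))
```

Comparing k-point indices is a simplification; with degenerate extrema at different k-points, a real implementation needs a more careful check.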
static get_bindex_bspin(extremum, is_cbm)

Returns the band index and spin of a band extremum

Args:
extremum (dict): dictionary containing the CBM/VBM, i.e., the output of
BandStructure.get_cbm() or BandStructure.get_vbm()

is_cbm (bool): whether the extremum is the CBM or not

implementors()
class matminer.featurizers.bandstructure.BranchPointEnergy(n_vb=1, n_cb=1, calculate_band_edges=True)

Bases: matminer.featurizers.base.BaseFeaturizer

__init__(n_vb=1, n_cb=1, calculate_band_edges=True)

Calculates the branch point energy and (optionally) an absolute band edge position assuming the branch point energy is the center of the gap

Args:

n_vb: (int) number of valence bands to include in the BPE calculation
n_cb: (int) number of conduction bands to include in the BPE calculation
calculate_band_edges: (bool) whether to also return band edge positions
citations()
feature_labels()
featurize(bs, target_gap=None)
Args:
bs: (BandStructure)
Returns:
(float) branch point energy on the same energy scale as the BS eigenvalues
implementors()
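The docstring does not spell out the averaging. A commonly used definition (Schleife et al., 2009) takes, at each k-point, the midpoint between the mean of the n_vb highest valence bands and the mean of the n_cb lowest conduction bands, and averages that over the Brillouin zone. The sketch below implements that definition on plain nested lists with uniform k-point weights; both the data layout and the uniform weighting are assumptions, and it is not matminer's code (target_gap and calculate_band_edges are not modelled).

```python
def branch_point_energy(valence_bands, conduction_bands, n_vb=1, n_cb=1):
    """Toy branch point energy, assuming uniform k-point weights.

    valence_bands: list of bands (each a list of energies over the k-points),
        ordered from the highest valence band downward.
    conduction_bands: same layout, ordered from the lowest conduction band upward.
    """
    vbs = valence_bands[:n_vb]
    cbs = conduction_bands[:n_cb]
    n_k = len(vbs[0])
    total = 0.0
    for k in range(n_k):
        avg_v = sum(band[k] for band in vbs) / n_vb  # mean of top valence bands
        avg_c = sum(band[k] for band in cbs) / n_cb  # mean of bottom conduction bands
        total += 0.5 * (avg_v + avg_c)               # midpoint at this k-point
    return total / n_k                               # average over k-points


print(branch_point_energy([[0.0, -1.0]], [[2.0, 3.0]]))
```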

matminer.featurizers.base module

class matminer.featurizers.base.BaseFeaturizer

Bases: object

Abstract class to calculate attributes for compounds

citations()

Citation / reference for feature

Returns:
array - each element should be str citation, ideally in BibTeX format
feature_labels()

Generate attribute names

Returns:
list of strings for attribute labels
featurize(*x)

Main featurizer function. Only defined in feature subclasses.

Args:
x: input data to featurize (type depends on featurizer)
Returns:
list of one or more features
featurize_dataframe(df, col_id)

Compute features for all entries contained in input dataframe

Args:

df (Pandas dataframe): Dataframe containing input data
col_id (str or list of str): column label containing objects to featurize. Can be multiple labels if the featurize function requires multiple inputs
Returns:
updated Dataframe
implementors()

List of implementors of the feature.

Returns:
array - each element should either be str with author name (e.g., “Anubhav Jain”) or
dict with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).
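The contract above (featurize returns a list whose entries line up with feature_labels) can be illustrated without pandas or matminer installed. The sketch uses a hypothetical stand-in base class, SketchFeaturizer, and a hypothetical featurize_rows method that plays the role of featurize_dataframe on a list of dicts; it is not matminer's code. A list col_id is unpacked into multiple featurize arguments, mirroring the multi-input case described above.

```python
class SketchFeaturizer:
    """Hypothetical stand-in that mirrors the BaseFeaturizer interface."""

    def featurize(self, *x):
        raise NotImplementedError("defined in subclasses")

    def feature_labels(self):
        raise NotImplementedError("defined in subclasses")

    def citations(self):
        return []

    def implementors(self):
        return []

    def featurize_rows(self, rows, col_id):
        """Plain-Python analogue of featurize_dataframe for a list of dicts."""
        cols = [col_id] if isinstance(col_id, str) else list(col_id)
        labels = self.feature_labels()
        for row in rows:
            # one or more columns are unpacked into featurize's positional args
            values = self.featurize(*(row[c] for c in cols))
            row.update(zip(labels, values))
        return rows


class NumElements(SketchFeaturizer):
    """Toy featurizer: number of distinct elements in an {element: amount} dict."""

    def featurize(self, comp):
        return [len(comp)]

    def feature_labels(self):
        return ["n_elements"]


rows = [{"comp": {"Na": 1, "Cl": 1}}, {"comp": {"Fe": 2, "O": 3, "Li": 1}}]
NumElements().featurize_rows(rows, "comp")
print(rows)
```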

matminer.featurizers.composition module

class matminer.featurizers.composition.BandCenter

Bases: matminer.featurizers.base.BaseFeaturizer

citations()
feature_labels()
featurize(comp)

(Rough) estimate of the absolute position of the band center using the geometric mean of electronegativity.

Args:
comp: (Composition)

Returns: (float) band center

implementors()
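BandCenter's docstring mentions a geometric mean of electronegativity. The sketch below computes only the composition-weighted geometric mean, using a small hardcoded table of Pauling electronegativities (an assumption for illustration); the electronegativity data source, sign convention, and any scaling matminer applies to turn this into an absolute band-center position are not shown here.

```python
import math

# Pauling electronegativities for a few elements (illustrative subset only)
PAULING_X = {"Na": 0.93, "Cl": 3.16, "O": 3.44, "Fe": 1.83}


def geometric_mean_electronegativity(comp):
    """Composition-weighted geometric mean of electronegativity.

    comp: {element: amount}, e.g. {"Fe": 2, "O": 3}
    """
    n_atoms = sum(comp.values())
    # weighted mean of logs, then exponentiate = weighted geometric mean
    log_sum = sum(amt * math.log(PAULING_X[el]) for el, amt in comp.items())
    return math.exp(log_sum / n_atoms)


print(geometric_mean_electronegativity({"Na": 1, "Cl": 1}))
```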
class matminer.featurizers.composition.CohesiveEnergy(mapi_key=None)

Bases: matminer.featurizers.base.BaseFeaturizer

__init__(mapi_key=None)

Class to get the cohesive energy per atom of a compound by combining known elemental cohesive energies with the formation energy of the compound.

Parameters:
mapi_key (str): Materials API key for looking up formation energy
by composition alone (if you don’t set the formation energy yourself).
citations()
feature_labels()
featurize(comp, formation_energy_per_atom=None)
Args:

comp: (str) compound composition, e.g., “NaCl”
formation_energy_per_atom: (float) the formation energy per atom of
your compound. If not set, the most stable formation energy will be looked up in the Materials Project database.
implementors()
class matminer.featurizers.composition.ElectronAffinity(data_source=<matminer.featurizers.data.DemlData object>)

Bases: matminer.featurizers.base.BaseFeaturizer

Class to calculate average electron affinity times formal charge of anion elements

Parameters:
data_source (data class): source from which to retrieve element data

Generates average (electron affinity*formal charge) of anions

__init__(data_source=<matminer.featurizers.data.DemlData object>)
citations()
feature_labels()
featurize(comp)
Args:
comp: Pymatgen Composition object
Returns:
avg_anion_affin (single-element list): average electron affinity*formal charge of anions
implementors()
class matminer.featurizers.composition.ElectronegativityDiff(data_source=<matminer.featurizers.data.DemlData object>, stats=None)

Bases: matminer.featurizers.base.BaseFeaturizer

Calculate electronegativity difference between cations and anions (average, max, range, etc.)

Parameters:
data_source (data class): source from which to retrieve element data
stats: Property statistics to compute

Generates average electronegativity difference between cations and anions

__init__(data_source=<matminer.featurizers.data.DemlData object>, stats=None)
citations()
feature_labels()
featurize(comp)
Args:
comp: Pymatgen Composition object
Returns:
en_diff_stats (list of floats): Property stats of electronegativity difference
implementors()
class matminer.featurizers.composition.ElementFraction

Bases: matminer.featurizers.base.BaseFeaturizer

Class to calculate the atomic fraction of each element in a composition.

Generates: vector where each index represents an element in atomic number order.

__init__()
feature_labels()
featurize(comp)
Args:
comp: Pymatgen Composition object
Returns:
vector (list of floats): fraction of each element in a composition
implementors()
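A minimal sketch of the ElementFraction vector, assuming an {element: amount} dict for the composition and a deliberately truncated element list (H to Ne) in place of the full periodic table; element_fraction is a hypothetical name.

```python
# Illustrative truncated element list in atomic-number order (H..Ne);
# the real featurizer covers many more elements.
ELEMENTS = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne"]


def element_fraction(comp):
    """Atomic fraction of each element, indexed in atomic-number order."""
    total = sum(comp.values())
    return [comp.get(el, 0) / total for el in ELEMENTS]


print(element_fraction({"Li": 1, "F": 1}))
```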
class matminer.featurizers.composition.ElementProperty(data_source, features, stats)

Bases: matminer.featurizers.base.BaseFeaturizer

Class to calculate elemental property attributes. To initialize quickly, use the from_preset() method.

Parameters:
data_source (AbstractData or str): source from which to retrieve
element property data (or use str for preset: “pymatgen”, “magpie”, or “deml”)
features (list of strings): List of elemental properties to use
(these must be supported by data_source)
stats (list of strings): weighted statistics to compute for each
property (see PropertyStats for available stats)
__init__(data_source, features, stats)
citations()
feature_labels()
featurize(comp)

Get elemental property attributes

Args:
comp: Pymatgen composition object
Returns:
all_attributes: Specified property statistics of features
static from_preset(preset_name)

Return an ElementProperty featurizer from a preset string

Args:
preset_name: (str) can be one of “magpie”, “deml”, or “matminer”
Returns:
(ElementProperty) the featurizer for the given preset
implementors()
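To illustrate what ElementProperty produces, the sketch computes a few weighted statistics of one elemental property, using atomic number as the property and atomic fractions as weights. The helper name and the fraction-weighted form of avg_dev are assumptions; the stat names only mirror the PropertyStats methods documented further down.

```python
def element_property_features(comp, prop_table,
                              stats=("minimum", "maximum", "range", "mean", "avg_dev")):
    """Weighted statistics of one elemental property (hypothetical helper).

    comp: {element: amount}; prop_table: {element: property value}
    """
    elements = sorted(comp)
    total = sum(comp.values())
    values = [prop_table[el] for el in elements]
    weights = [comp[el] / total for el in elements]  # atomic fractions
    mean = sum(w * v for w, v in zip(weights, values))
    all_stats = {
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "mean": mean,
        "avg_dev": sum(w * abs(v - mean) for w, v in zip(weights, values)),
    }
    return [all_stats[s] for s in stats]


ATOMIC_NUMBER = {"Fe": 26, "O": 8}
print(element_property_features({"Fe": 2, "O": 3}, ATOMIC_NUMBER))
```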
class matminer.featurizers.composition.FERECorrection(data_source=<matminer.featurizers.data.DemlData object>, stats=None)

Bases: matminer.featurizers.base.BaseFeaturizer

Class to calculate difference between fitted elemental-phase reference energy (FERE) and GGA+U energy

Parameters:
data_source (data class): source from which to retrieve element data
stats: Property statistics to compute

Generates: Property statistics of difference between FERE and GGA+U energy

__init__(data_source=<matminer.featurizers.data.DemlData object>, stats=None)
citations()
feature_labels()
featurize(comp)
Args:
comp: Pymatgen Composition object
Returns:
fere_corr_stats (list of floats): Property stats of FERE correction
implementors()
class matminer.featurizers.composition.IonProperty(data_source=<matminer.featurizers.data.MagpieData object>)

Bases: matminer.featurizers.base.BaseFeaturizer

Class to calculate ionic property attributes

Parameters:
data_source (data class): source from which to retrieve element data
__init__(data_source=<matminer.featurizers.data.MagpieData object>)
citations()
feature_labels()
featurize(comp)

Ionic character attributes

Args:
comp: Pymatgen composition object
Returns:
cpd_possible (bool): Indicates whether a neutral ionic compound is possible
max_ionic_char (float): Maximum ionic character between two atoms
avg_ionic_char (float): Average ionic character
implementors()
class matminer.featurizers.composition.Stoichiometry(p_list=[0, 2, 3, 5, 7, 10], num_atoms=False)

Bases: matminer.featurizers.base.BaseFeaturizer

Class to calculate stoichiometric attributes.

Parameters:
p_list (list of ints): list of norms to calculate
num_atoms (bool): whether to return number of atoms
__init__(p_list=[0, 2, 3, 5, 7, 10], num_atoms=False)
citations()
feature_labels()
featurize(comp)

Get stoichiometric attributes

Args:
comp: Pymatgen composition object
Returns:
p_norm (list of floats): Lp norm-based stoichiometric attributes, one for each p in the p_list given at initialization.
Returns number of atoms if no p-values specified.
implementors()
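The Lp norm behind Stoichiometry is ||x||_p = (Σ_i x_i^p)^(1/p) over the atomic fractions x_i. A minimal sketch, assuming (as in the Magpie stoichiometric attributes) that p = 0 is treated as the number of elements present; stoichiometry_norms is a hypothetical helper and the num_atoms option is not covered.

```python
def stoichiometry_norms(comp, p_list=(0, 2, 3, 5, 7, 10)):
    """Lp norms of the atomic fractions of an {element: amount} dict."""
    total = sum(comp.values())
    fracs = [amt / total for amt in comp.values()]
    norms = []
    for p in p_list:
        if p == 0:
            # the p = 0 "norm" is taken as the number of elements present
            norms.append(float(sum(1 for x in fracs if x > 0)))
        else:
            norms.append(sum(x ** p for x in fracs) ** (1.0 / p))
    return norms


print(stoichiometry_norms({"Na": 1, "Cl": 1}))
```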
class matminer.featurizers.composition.TMetalFraction

Bases: matminer.featurizers.base.BaseFeaturizer

Class to calculate fraction of magnetic transition metals in a composition.

Parameters:
data_source (data class): source from which to retrieve element data

Generates: Fraction of magnetic transition metal atoms in a compound

__init__()
citations()
feature_labels()
featurize(comp)
Args:
comp: Pymatgen Composition object
Returns:
frac_magn_atoms (single-element list): fraction of magnetic transition metal atoms in a compound
implementors()
class matminer.featurizers.composition.ValenceOrbital(data_source=<matminer.featurizers.data.MagpieData object>, orbitals=['s', 'p', 'd', 'f'], props=['avg', 'frac'])

Bases: matminer.featurizers.base.BaseFeaturizer

Class to calculate valence orbital attributes

Parameters:

data_source (data object): source from which to retrieve element data
orbitals (list): orbitals to calculate
props (list): specifies whether to return the average number of electrons in each orbital, the fraction of electrons in each orbital, or both
__init__(data_source=<matminer.featurizers.data.MagpieData object>, orbitals=['s', 'p', 'd', 'f'], props=['avg', 'frac'])
citations()
feature_labels()
featurize(comp)

Weighted fraction of valence electrons in each orbital

Args:
comp: Pymatgen composition object
Returns:
valence_attributes (list of floats): Average number and/or fraction of valence electrons in specified orbitals
implementors()
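A sketch of the two ValenceOrbital outputs ("avg" and "frac"), assuming the fraction is each orbital's average electron count divided by the total average valence electron count. The per-orbital valence counts for Na (3s1) and Cl (3s2 3p5) are hardcoded illustrative values, not read from MagpieData, and valence_orbital_features is a hypothetical name.

```python
# Illustrative valence-electron counts per orbital (not read from MagpieData)
VALENCE = {"Na": {"s": 1, "p": 0, "d": 0, "f": 0},
           "Cl": {"s": 2, "p": 5, "d": 0, "f": 0}}


def valence_orbital_features(comp, orbitals=("s", "p", "d", "f")):
    """Composition-weighted average and fraction of valence electrons per orbital."""
    total = sum(comp.values())
    avg = [sum(amt * VALENCE[el][orb] for el, amt in comp.items()) / total
           for orb in orbitals]
    total_avg = sum(avg)
    frac = [a / total_avg for a in avg]  # share of valence electrons per orbital
    return avg, frac


print(valence_orbital_features({"Na": 1, "Cl": 1}))
```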

matminer.featurizers.data module

class matminer.featurizers.data.AbstractData

Bases: object

get_property(comp, property_name, return_per_element=True)

Gets data for a composition object.

Args:
comp (Composition/str): composition
property_name (str): Name of descriptor
return_per_element (bool): If true, returns one value per element rather than one value per atom
Returns:
(list): list of values for each atom (or each element, if return_per_element) in comp. Note: the returned values are sorted by the corresponding element’s atomic number, for consistency.
class matminer.featurizers.data.PymatgenData

Bases: matminer.featurizers.data.AbstractData

static get_composition_oxidation_state(formula)

Returns the composition and oxidation states from the given formula. Formula examples: “NaCl”, “Na+1Cl-1”, “Fe2+3O3-2” or “Fe2 +3 O3 -2”

Args:
formula (str): chemical formula, optionally including explicit oxidation states (see examples above)
Returns:
pymatgen.core.composition.Composition, dict of oxidation states as strings
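The formula syntax accepted by get_composition_oxidation_state can be illustrated with a small regular-expression parser. This is a hypothetical stand-in, not pymatgen or matminer code: it handles only the simple forms listed above (no parentheses or hydrates), returns plain dicts instead of a Composition, and, like the documented return value, keeps oxidation states as strings.

```python
import re

# element symbol, optional amount, optional signed oxidation state
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)([+-]\d+)?")


def parse_formula_with_oxidation(formula):
    """Parse e.g. "Fe2+3O3-2" into ({"Fe": 2.0, "O": 3.0}, {"Fe": "+3", "O": "-2"})."""
    formula = formula.replace(" ", "")  # "Fe2 +3 O3 -2" -> "Fe2+3O3-2"
    comp, oxi = {}, {}
    for symbol, amount, charge in _TOKEN.findall(formula):
        comp[symbol] = comp.get(symbol, 0.0) + float(amount or 1)
        if charge:
            oxi[symbol] = charge
    return comp, oxi


print(parse_formula_with_oxidation("Fe2 +3 O3 -2"))
```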
get_property(comp, property_name, return_per_element=True)

Get descriptor data for elements in a compound from pymatgen.

Args:
comp (str/Composition): Either a pymatgen Composition object or a string formula,
e.g., “NaCl”, “Na+1Cl-1”, “Fe2+3O3-2” or “Fe2 +3 O3 -2”
Notes:

  • For the ‘ionic_radii’ property, the Composition object must be made of oxidation-state-decorated Specie objects, not plain Element objects, e.g., fe2o3 = Composition({Specie("Fe", 3): 2, Specie("O", -2): 3})

  • For a string formula, the oxidation state sign (+ or -) must be specified explicitly, e.g., “Fe2+3O3-2”

property_name (str): pymatgen element attribute name, as defined in the Element class at
http://pymatgen.org/_modules/pymatgen/core/periodic_table.html
Returns:
(list) of values containing descriptor floats for each atom in the compound (sorted by the
electronegativity of the constituent atoms)

matminer.featurizers.site module

Features that describe the local environment of a single atom

The featurize function takes two arguments:
strc (Structure): Object representing the structure containing the site of interest
site (int): Index of the site to be featurized

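We have to use two arguments because the Site object does not hold a pointer back to its structure. To run featurize_dataframe, you must pass the column names for both the structure and the site, in the same order as the featurize arguments; for example, featurize_dataframe(df, ["structure", "site"]) for a dataframe whose structures are in a column named "structure" and whose site indices are in a column named "site".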

class matminer.featurizers.site.AGNIFingerprints(directions=(None, 'x', 'y', 'z'), etas=array([ 0.8, 1.22730192, 1.88283751, 2.88851263, 4.43134638, 6.79824993, 10.42938152, 16. ]), cutoff=8)

Bases: matminer.featurizers.base.BaseFeaturizer

Integral of the product of the radial distribution function and a Gaussian window function.

Originally used by [Botu et al](http://pubs.acs.org/doi/abs/10.1021/acs.jpcc.6b10908) to fit empirical potentials, these features come in two forms: atomic fingerprints and direction-resolved fingerprints.

Atomic fingerprints describe the local environment of an atom and are computed using the function:

:math:`A_i(\eta) = \sum\limits_{i \neq j} e^{-(\frac{r_{ij}}{\eta})^2} f(r_{ij})`

where i is the index of the atom, j is the index of a neighboring atom, \eta is the width of the Gaussian window, r_{ij} is the distance between atoms i and j, and f(r) is a cutoff function where :math:`f(r) = 0.5[\cos(\frac{\pi r_{ij}}{R_c}) + 1]` if :math:`r < R_c` and 0 otherwise.

The direction-resolved fingerprints are computed using

:math:`V_i^k(\eta) = \sum\limits_{i \neq j} \frac{r_{ij}^k}{r_{ij}} e^{-(\frac{r_{ij}}{\eta})^2} f(r_{ij})`

where r_{ij}^k is the k^{th} component of :math:`\mathbf{r}_i - \mathbf{r}_j`.

Parameters:
directions (iterable): List of directions for the fingerprints. Can be None, ‘x’, ‘y’, or ‘z’
etas (iterable of floats): List of window widths for which to compute fingerprints
cutoff (float): Cutoff distance

TODO: Differentiate between different atom types (maybe as another class)

__init__(directions=(None, 'x', 'y', 'z'), etas=array([ 0.8, 1.22730192, 1.88283751, 2.88851263, 4.43134638, 6.79824993, 10.42938152, 16. ]), cutoff=8)
citations()
directions
feature_labels()
featurize(strc, site)
implementors()
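The two equations above can be checked with a direct, unoptimized implementation. This sketch assumes a finite cluster of Cartesian positions (no periodic images, unlike a Structure), uses a hypothetical function name, and orders features by direction and then by eta, which may differ from matminer's feature_labels().

```python
import math


def agni_fingerprints(positions, i, etas, cutoff=8.0, directions=(None, "x", "y", "z")):
    """Toy AGNI fingerprints for site i of a finite cluster (no periodic images)."""
    axis = {"x": 0, "y": 1, "z": 2}
    features = []
    for direction in directions:
        for eta in etas:
            total = 0.0
            for j, rj in enumerate(positions):
                if j == i:
                    continue
                d = [positions[i][k] - rj[k] for k in range(3)]  # r_i - r_j
                r = math.sqrt(sum(c * c for c in d))
                if r >= cutoff:
                    continue  # f(r) = 0 at and beyond the cutoff
                f_r = 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)
                term = math.exp(-(r / eta) ** 2) * f_r
                if direction is not None:
                    term *= d[axis[direction]] / r  # r_ij^k / r_ij
                total += term
            features.append(total)
    return features


print(agni_fingerprints([(0, 0, 0), (1, 0, 0)], 0, [1.0], 8.0, (None, "x")))
```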

matminer.featurizers.stats module

File containing general methods for computing property statistics

class matminer.featurizers.stats.PropertyStats

Bases: object

static avg_dev(data_lst, weights=None)

Average absolute deviation of a list of element data

Args:
data_lst (list of floats): Value of a property for each atom in a compound
weights (list of floats): Atomic fractions
Returns:
average absolute deviation
static calc_stat(data_lst, stat, weights=None)

Compute a property statistic

Args:

data_lst (list of floats): list of values
stat (str): Name of the statistic to compute. If the statistics function takes arguments, these
should be added after the name and separated by two underscores. For example, the 2nd Holder mean would be “holder_mean__2”
weights (list of floats): (Optional) weights for each element in data_lst

Returns:
float - Desired statistic
static eigenvalues(data_lst, symm=False, sort=False)

Return the eigenvalues of a matrix as a numpy array

Args:
data_lst: (matrix-like) of values
symm: whether to assume the matrix is symmetric
sort: whether to sort the eigenvalues
Returns: eigenvalues

static flatten(data_lst)

Returns a flattened copy of data_lst as a numpy array

static gaussian_kernel(arr0, arr1, SIGMA)

Returns a Gaussian kernel of the two arrays for use in KRR or other regressions using the kernel trick.

static holder_mean(data_lst, weights=None, power=1)

Get the Holder mean

Args:
data_lst: (list/array) of values
weights: (list/array) of weights
power: (int/float/str) which Holder mean to compute

Returns: Holder mean

static laplacian_kernel(arr0, arr1, SIGMA)

Returns a Laplacian kernel of the two arrays for use in KRR or other regressions using the kernel trick.

static maximum(data_lst, weights=None)

Maximum value in a list of element data

Args:
data_lst (list of floats): Value of a property for each atom in a compound
weights (ignored)
Returns:
maximum value
static mean(data_lst, weights=None, **kwargs)

Mean of a list of element data

Args:
data_lst (list of floats): Value of a property for each atom or element in a compound
weights (list of floats): Weights for each value
Returns:
mean value
static minimum(data_lst, weights=None)

Minimum value in a list of element data

Args:
data_lst (list of floats): Value of a property for each atom in a compound
weights (ignored)
Returns:
minimum value
static mode(data_lst, weights=None)

Mode of a list of element data. If multiple elements occur equally frequently (or have the same weight, if weights are provided), this function returns the average of those values

Args:
data_lst (list of floats): Value of a property for each atom in a compound
weights (list of floats): Atomic fractions
Returns:
mode
static n_numerical_modes(data_lst, n, dl=0.1)
Returns the n first modes of a data set that are obtained with
a finite bin size for the underlying frequency distribution.
Args:
data_lst ([float]): data values.
n (integer): number of most frequent elements to be determined.
dl (float): bin size of underlying (coarsened) distribution.
Returns:
([float]): first n most frequent entries (or nan if not found).
static range(data_lst, weights=None)

Range of a list of element data

Args:
data_lst (list of floats): Value of a property for each atom in a compound
weights (ignored)
Returns:
range
static sorted(data_lst)

Returns the sorted data_lst

static std_dev(data_lst, weights=None)

Standard deviation of a list of element data

Args:
data_lst (list of floats): Value of a property for each atom in a compound
weights (list of floats): Atomic fractions
Returns:
standard deviation
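The calc_stat naming convention (arguments appended after two underscores) can be sketched as follows. These are simplified stand-ins for PropertyStats.mean, PropertyStats.holder_mean, and PropertyStats.calc_stat, assuming that weights are normalized by their sum and that power = 0 gives the weighted geometric mean; only two statistics are dispatched.

```python
import math


def mean(data_lst, weights=None):
    """(Weighted) arithmetic mean."""
    if weights is None:
        return sum(data_lst) / len(data_lst)
    return sum(w * x for w, x in zip(weights, data_lst)) / sum(weights)


def holder_mean(data_lst, weights=None, power=1):
    """Weighted Holder (power) mean; power = 0 gives the geometric mean."""
    if weights is None:
        weights = [1.0] * len(data_lst)
    w_total = sum(weights)
    if power == 0:
        log_mean = sum(w * math.log(x) for w, x in zip(weights, data_lst)) / w_total
        return math.exp(log_mean)
    return (sum(w * x ** power for w, x in zip(weights, data_lst)) / w_total) ** (1.0 / power)


def calc_stat(data_lst, stat, weights=None):
    """Dispatch on a stat name such as "mean" or "holder_mean__2"."""
    name, *args = stat.split("__")  # "holder_mean__2" -> "holder_mean", ["2"]
    if name == "mean":
        return mean(data_lst, weights)
    if name == "holder_mean":
        power = float(args[0]) if args else 1
        return holder_mean(data_lst, weights, power)
    raise ValueError("unsupported stat in this sketch: " + stat)


print(calc_stat([1, 4], "holder_mean__2"))
```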

matminer.featurizers.structure module

class matminer.featurizers.structure.CoulombMatrix(diag_elems=True)

Bases: matminer.featurizers.base.BaseFeaturizer

Generate the Coulomb matrix, M, of the input structure (or molecule). The Coulomb matrix was put forward by Rupp et al. (Phys. Rev. Lett. 108, 058301, 2012) and is defined by off-diagonal elements M_ij = Z_i*Z_j/|R_i-R_j| and diagonal elements 0.5*Z_i^2.4, where Z_i and R_i denote the nuclear charge and the position of atom i, respectively.

Args:
diag_elems: (bool) flag indicating whether (True, default) to use
the original definition of the diagonal elements; if set to False, the diagonal elements are set to zero.
__init__(diag_elems=True)
citations()
feature_labels()
featurize(s)

Get Coulomb matrix of input structure.

Args:
s: input Structure (or Molecule) object.
Returns:
m: (Nsites x Nsites matrix) Coulomb matrix.
implementors()
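The Coulomb matrix definition above translates directly into code. A minimal sketch taking nuclear charges and Cartesian positions (positions are used as given, with no unit conversion), with a hypothetical function name; it requires Python 3.8+ for math.dist.

```python
import math


def coulomb_matrix(charges, positions, diag_elems=True):
    """M_ii = 0.5 * Z_i**2.4 (or 0), M_ij = Z_i * Z_j / |R_i - R_j|."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4 if diag_elems else 0.0
            else:
                dist = math.dist(positions[i], positions[j])
                m[i][j] = charges[i] * charges[j] / dist
    return m


# H2 molecule with a 0.74 bond length
print(coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)]))
```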
class matminer.featurizers.structure.DensityFeatures(desired_features=None)

Bases: matminer.featurizers.base.BaseFeaturizer

__init__(desired_features=None)
citations()
feature_labels()
featurize(s)
implementors()
class matminer.featurizers.structure.ElectronicRadialDistributionFunction(cutoff=None, dr=0.05)

Bases: matminer.featurizers.base.BaseFeaturizer

Calculate the crystal structure-inherent electronic radial distribution function (ReDF) according to Willighagen et al., Acta Cryst., 2005, B61, 29-36. The ReDF is a structure-integral RDF (i.e., summed over all sites) in which the positions of neighboring sites are weighted by electrostatic interactions inferred from atomic partial charges. Atomic charges are obtained from the ValenceIonicRadiusEvaluator class.

Args:
cutoff: (float) distance up to which the ReDF is to be
calculated (default: longest diagonal in primitive cell).

dr: (float) width of bins (“x”-axis) of ReDF (default: 0.05 Å).

__init__(cutoff=None, dr=0.05)
citations()
feature_labels()
featurize(s)

Get ReDF of input structure.

Args:
s: input Structure object.
Returns: (dict) a copy of the electronic radial distribution
functions (ReDF) as a dictionary. The distance list (“x”-axis values of ReDF) can be accessed via key ‘distances’; the ReDF itself is accessible via key ‘redf’.
implementors()
class matminer.featurizers.structure.MinimumRelativeDistances(cutoff=10.0)

Bases: matminer.featurizers.base.BaseFeaturizer

Determines the relative distance of each site to its closest neighbor. We use the relative distance, f_ij = r_ij / (r^atom_i + r^atom_j), as a measure rather than the absolute distances, r_ij, to account for the fact that different atoms/species have different sizes. The function uses the valence-ionic radius estimator implemented in Pymatgen.

Args:
cutoff: (float) (absolute) distance up to which tentative
closest neighbors (on the basis of relative distances) are to be determined.
__init__(cutoff=10.0)
citations()
feature_labels()
featurize(s, cutoff=10.0)

Get minimum relative distances of all sites of the input structure.

Args:
s: Pymatgen Structure object.
Returns:
min_rel_dists: (list of floats) list of all minimum relative
distances (i.e., for all sites).
implementors()
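A minimal sketch of the minimum relative distance, assuming a finite cluster (no periodic images) and atomic radii supplied directly instead of being estimated with pymatgen's valence-ionic radius evaluator; min_relative_distances is a hypothetical name.

```python
import math


def min_relative_distances(positions, radii, cutoff=10.0):
    """For each site, min over neighbors of r_ij / (r_i + r_j) within cutoff."""
    result = []
    for i, ri in enumerate(positions):
        best = math.inf
        for j, rj in enumerate(positions):
            if i == j:
                continue
            r_ij = math.dist(ri, rj)
            if r_ij <= cutoff:  # only tentative neighbors within the cutoff
                best = min(best, r_ij / (radii[i] + radii[j]))
        result.append(best)
    return result


print(min_relative_distances([(0, 0, 0), (2, 0, 0), (5, 0, 0)], [1.0, 1.0, 0.5]))
```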
class matminer.featurizers.structure.OrbitalFieldMatrix(period_tag=False)

Bases: matminer.featurizers.base.BaseFeaturizer

This function generates an orbital field matrix (OFM) as developed by Pham et al. (arXiv, May 2017). Each atom is described by a 32-element vector (or 39-element vector, see period_tag for details) uniquely representing the valence subshell. A 32x32 (39x39) matrix is formed by multiplying two atomic vectors. An OFM for an atomic environment is the sum of these matrices over all atoms coordinating the center atom, each multiplied by a distance function (here, 1/r times the weight of the coordinating atom in the Voronoi polyhedra method). The OFM of a structure or molecule is the average of the OFMs for all the sites in the structure.

Args:
period_tag (bool): In the original OFM, an element is represented
by a vector of length 32, where each element is 1 or 0, which represents the valence subshell of the element. With period_tag=True, the vector size is increased to 39, where the 7 extra elements represent the period of the element. Note lanthanides are treated as period 6, actinides as period 7. Default False as in the original paper.
.. attribute:: size
Either 32 or 39, the size of the vectors used to describe elements.
__init__(period_tag=False)
citations()
feature_labels()
featurize(s)

Makes a supercell for structure s (to protect sites from coordinating with themselves), and then finds the mean of the orbital field matrices of each site to characterize a structure

Args:
s (Structure): structure to characterize
Returns:
mean_ofm (size X size matrix): orbital field matrix
characterizing s
get_atom_ofms(struct, symm=False)

Calls get_single_ofm for every site in struct. If symm=True, get_single_ofm is called for symmetrically distinct sites, and counts is constructed such that ofms[i] occurs counts[i] times in the structure

Args:

struct (Structure): structure for which to find OFMs
symm (bool): whether to calculate the OFM only for symmetrically distinct sites
Returns:
ofms ([size X size matrix] X len(struct)): OFMs for struct; if symm, instead
ofms ([size X size matrix] X number of symmetrically distinct sites): OFMs for struct
counts: number of identical sites for each OFM

get_mean_ofm(ofms, counts)

Averages a list of OFMs, weighted by counts

get_ohv(sp, period_tag)

Get the “one-hot-vector” for pymatgen Element sp. This 32- or 39-length vector represents the valence shell of the given element.

Args:
sp (Element): element whose ohv should be returned
period_tag (bool): If true, the vector contains items corresponding to the period of the element
Returns:
my_ohv (numpy array length 39 if period_tag, else 32): ohv for sp
get_single_ofm(site, site_dict)

Gets the orbital field matrix for a single chemical environment, where site is the center atom whose environment is characterized and site_dict is a dictionary of site : weight, where the weights are the Voronoi Polyhedra weights of the corresponding coordinating sites.

Args:
site (Site): center atom
site_dict (dict of Site:float): chemical environment
Returns:
atom_ofm (size X size numpy matrix): ofm for site
get_structure_ofm(struct)

Calls get_mean_ofm on the results of get_atom_ofms to give a size X size matrix characterizing a structure

implementors()
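The construction described above (outer products of one-hot vectors, scaled by weight/r and summed over coordinating atoms, then averaged over sites using counts) can be sketched in plain Python. The length-3 vectors in the usage line are hypothetical stand-ins for the 32- or 39-element vectors, and the helpers follow only this docstring's description, not necessarily every detail of Pham et al.

```python
def outer(u, v):
    """Outer product of two vectors as a nested list."""
    return [[a * b for b in v] for a in u]


def single_ofm(center_vec, neighbors):
    """neighbors: list of (one_hot_vector, distance, voronoi_weight) tuples."""
    size = len(center_vec)
    ofm = [[0.0] * size for _ in range(size)]
    for vec, dist, weight in neighbors:
        scale = weight / dist  # 1/r times the coordinating atom's weight
        block = outer(center_vec, vec)
        for a in range(size):
            for b in range(size):
                ofm[a][b] += scale * block[a][b]
    return ofm


def mean_ofm(ofms, counts):
    """Average of per-site OFMs, weighting ofms[i] by counts[i]."""
    size = len(ofms[0])
    total = sum(counts)
    return [[sum(c * m[a][b] for m, c in zip(ofms, counts)) / total
             for b in range(size)] for a in range(size)]


print(single_ofm([1, 0, 0], [([0, 1, 0], 2.0, 1.0), ([0, 0, 1], 1.0, 0.5)]))
```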
class matminer.featurizers.structure.PartialRadialDistributionFunction(cutoff=20.0, bin_size=0.1)

Bases: matminer.featurizers.base.BaseFeaturizer

Compute the partial radial distribution function (PRDF) of a crystal structure, which is the radial distribution function broken down for each pair of atom types. The PRDF was proposed as a structural descriptor by [Schutt et al.](https://journals.aps.org/prb/abstract/10.1103/PhysRevB.89.205118)

Args:
cutoff: (float) distance up to which to calculate the RDF.
bin_size: (float) size of each bin of the (discrete) RDF.
__init__(cutoff=20.0, bin_size=0.1)
citations()
feature_labels()
featurize(s)

Get PRDF of the input structure.

Args:
s: Pymatgen Structure object.
Returns:
prdf, dist: (tuple of arrays) the first element is a
dictionary where keys are tuples of element names and values are PRDFs.
implementors()
class matminer.featurizers.structure.RadialDistributionFunction(cutoff=20.0, bin_size=0.1)

Bases: matminer.featurizers.base.BaseFeaturizer

Calculate the radial distribution function (RDF) of a crystal structure.

Args:
cutoff: (float) distance up to which to calculate the RDF.
bin_size: (float) size of each bin of the (discrete) RDF.
__init__(cutoff=20.0, bin_size=0.1)
citations()
feature_labels()
featurize(s)

Get RDF of the input structure.

Args:
s: Pymatgen Structure object.
Returns:
rdf, dist: (tuple of arrays) the first element is the
normalized RDF, whereas the second element is the inner radius of the RDF bin.
implementors()
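A sketch of the binning and normalization behind an RDF, assuming the neighbor distances (including periodic images) have already been collected for all sites and that each bin is normalized by its shell volume, the number density, and the number of sites. This normalization is a common convention and an assumption here, not necessarily matminer's exact one; the function name is hypothetical.

```python
import math


def radial_distribution_function(neighbor_dists, n_sites, volume, cutoff=20.0, bin_size=0.1):
    """Histogram neighbor distances and normalize by ideal-gas shell counts.

    neighbor_dists: all neighbor distances from all sites (periodic images
    already included); volume: cell volume in the same length units cubed.
    """
    n_bins = int(round(cutoff / bin_size))
    counts = [0] * n_bins
    for d in neighbor_dists:
        k = int(d / bin_size)
        if 0 <= k < n_bins:
            counts[k] += 1
    density = n_sites / volume
    dist = [k * bin_size for k in range(n_bins)]  # inner radius of each bin
    rdf = []
    for k, c in enumerate(counts):
        r_in, r_out = dist[k], dist[k] + bin_size
        shell = 4.0 / 3.0 * math.pi * (r_out ** 3 - r_in ** 3)
        rdf.append(c / (n_sites * density * shell))
    return rdf, dist
```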
class matminer.featurizers.structure.RadialDistributionFunctionPeaks(n_peaks=2)

Bases: matminer.featurizers.base.BaseFeaturizer

Determine the location of the highest peaks in the radial distribution function (RDF) of a structure.

Args:
n_peaks: (int) number of top peaks to return.
__init__(n_peaks=2)
citations()
feature_labels()
featurize(rdf)

Get location of highest peaks in RDF.

Args:
rdf: (ndarray) RDF as obtained from the
RadialDistributionFunction class.
Returns: (ndarray) distances of highest peaks in descending order
of the peak height
implementors()
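A minimal sketch of peak selection, taking the RDF values and the bin distances as two separate lists (an assumption about the input layout; rdf_peaks is a hypothetical name). Note that it returns the highest bins, which is not the same as finding local maxima.

```python
def rdf_peaks(rdf, dist, n_peaks=2):
    """Distances of the n_peaks highest RDF bins, in descending order of height."""
    order = sorted(range(len(rdf)), key=lambda k: rdf[k], reverse=True)
    return [dist[k] for k in order[:n_peaks]]


print(rdf_peaks([0.0, 3.0, 1.0, 5.0, 2.0], [0.0, 0.1, 0.2, 0.3, 0.4]))
```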
class matminer.featurizers.structure.SineCoulombMatrix(diag_elems=True)

Bases: matminer.featurizers.base.BaseFeaturizer

This function generates a variant of the Coulomb matrix developed for periodic crystals by Faber et al. (Int. J. Quantum Chem. 115, 16, 2015). It is identical to the Coulomb matrix, except that the inverse distance function is replaced by the inverse of a sin² function of the vector between the sites, which is periodic in the dimensions of the structure lattice. See the paper for details.

Args:
diag_elems (bool): flag indicating whether (True, default) to use
the original definition of the diagonal elements; if set to False, the diagonal elements are set to 0
__init__(diag_elems=True)
citations()
feature_labels()
featurize(s)
Args:
s (Structure or Molecule): input structure (or molecule)
Returns:
(Nsites x Nsites matrix) Sine matrix.
implementors()
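In Faber et al.'s sine matrix, the off-diagonal elements are M_ij = Z_i Z_j Φ_ij with Φ_ij = |B · Σ_k e_k sin²(π e_k · B^{-1}(r_i - r_j))|^{-1}, where B is the lattice matrix and e_k the unit vectors. For an orthorhombic cell (B = diag(a, b, c)), B^{-1}(r_i - r_j) is just the difference of fractional coordinates, so the formula reduces to a few lines. The restriction to orthorhombic cells and the function name are assumptions of this sketch; general lattices need a full matrix inverse.

```python
import math


def sine_coulomb_matrix(charges, frac_coords, lattice_lengths, diag_elems=True):
    """Sine matrix for an orthorhombic cell with lattice lengths (a, b, c)."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4 if diag_elems else 0.0
                continue
            # B . sum_k e_k sin^2(pi * df_k), with df = fractional difference
            vec = [length * math.sin(math.pi * (fi - fj)) ** 2
                   for length, fi, fj in zip(lattice_lengths, frac_coords[i], frac_coords[j])]
            m[i][j] = charges[i] * charges[j] / math.sqrt(sum(c * c for c in vec))
    return m


# Li (Z=3) and F (Z=9) half a cell apart in a cubic cell with a = 4
print(sine_coulomb_matrix([3, 9], [(0, 0, 0), (0.5, 0, 0)], (4.0, 4.0, 4.0)))
```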
class matminer.featurizers.structure.SitesOrderParameters(features=None, stats=None, pneighs=None, bond_angles=None)

Bases: matminer.featurizers.base.BaseFeaturizer

Calculates all order parameters (OPs) for all sites in a crystal structure.

Args:
features ([str]): list of order parameters supported by OrderParameters
stats ([str]): list of weighted statistics to compute for each feature.
If stats is None, for each order parameter, a list is returned that contains the calculated parameter for each site in the structure. Note: for the nth mode, stat must be given as the ordinal followed by ‘_mode’, e.g., stat=‘2nd_mode’
pneighs (dict): specification and parameters of neighbor-finding
approach (see get_neighbors_of_site_with_index).
bond_angles ([float]): list of bond angles for which order parameters
are calculated explicitly (in addition to features)
__init__(features=None, stats=None, pneighs=None, bond_angles=None)
citations()
feature_labels()
featurize(s)

Calculate all sites’ local structure order parameters (LSOPs).

Args:

s: Pymatgen Structure object.

Returns:
opvals: (2D array of floats) LSOP values of all sites’ (1st dimension) order parameters (2nd dimension). 46 order parameters are computed per site: q_cn (coordination number), q_lin, 35 x q_bent (target angles from 5 to 175 degrees in steps of 5 degrees), q_tet, q_oct, q_bcc, q_2, q_4, q_6, q_reg_tri, q_sq, q_sq_pyr.
static from_preset(preset_name)

Returns a SitesOrderParameters featurizer from a preset string.

Args:
preset_name (str): options are ‘matminer’
Returns:
(SitesOrderParameters) the featurizer for the given preset
implementors()
matminer.featurizers.structure.get_neighbors_of_site_with_index(struct, n, p=None)

Determine the neighbors around the site that has index n in the input Structure object struct, given the approach defined by parameters p. All supported neighbor-finding approaches are listed and explained below. All approaches start by creating a tentative list of neighbors using a large cutoff radius defined in parameter dictionary p via key “cutoff”.

“min_dist”: find the nearest neighbor and its distance d_nn; consider all neighbors that are within a distance of d_nn * (1 + delta), where delta is an additional parameter provided in the dictionary p via key “delta”.
“scaled_VIRE”: compute the radii, r_i, of all sites on the basis of the valence-ionic radius evaluator (VIRE); consider all neighbors for which the distance to the central site is less than the sum of the radii multiplied by an a priori chosen parameter, delta (i.e., dist < delta * (r_central + r_neighbor)).
“min_relative_VIRE”: same approach as “min_dist”, except that we use relative distances (i.e., distances divided by the sum of the atom radii from VIRE).
“min_relative_OKeeffe”: same approach as “min_relative_VIRE”, except that we use the bond valence parameters from O’Keeffe’s bond valence method (J. Am. Chem. Soc. 1991, 3226-3229) to calculate relative distances.
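The selection rules above can be sketched in plain Python, given the distances to the tentative neighbors (already found within the cutoff) and, for the radius-based rules, per-site radii. The function names are hypothetical, and the radii are plain inputs here: computing them with VIRE or from O’Keeffe’s bond valence parameters is not reproduced.

```python
def min_dist_neighbors(dists, delta=0.15):
    """'min_dist' rule: indices of neighbors within d_nn * (1 + delta)."""
    if not dists:
        return []
    d_nn = min(dists)
    return [i for i, d in enumerate(dists) if d <= d_nn * (1.0 + delta)]


def scaled_radii_neighbors(dists, r_central, r_neighbors, delta=2.0):
    """'scaled_VIRE'-style rule: keep neighbors with dist < delta * (r_central + r_neighbor)."""
    return [
        i
        for i, (d, r) in enumerate(zip(dists, r_neighbors))
        if d < delta * (r_central + r)
    ]


def min_relative_neighbors(dists, r_central, r_neighbors, delta=0.05):
    """'min_relative_*' rule: the min_dist rule applied to distances / (r_central + r_neighbor)."""
    relative = [d / (r_central + r) for d, r in zip(dists, r_neighbors)]
    return min_dist_neighbors(relative, delta)


# Made-up distances (Angstrom) and radii for four tentative neighbors.
print(min_dist_neighbors([2.0, 2.2, 2.4, 3.5]))  # [0, 1]
```

The relative variants change only the metric, not the selection logic, which is why min_relative_neighbors simply reuses the min_dist rule.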
Args:

struct (Structure): input structure.
n (int): index of site in Structure object for which neighbors are to be determined.
p (dict): specification (via “approach” key; default is “min_dist”) and parameters of neighbor-finding approach. Default cutoff radius is 6 Angstrom (key: “cutoff”). Other default parameters are as follows. min_dist: “delta”: 0.15; min_relative_OKeeffe: “delta”: 0.05; min_relative_VIRE: “delta”: 0.05; scaled_VIRE: “delta”: 2.

Returns: ([site]) list of sites that are considered to be nearest neighbors to the site with index n in Structure object struct.
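The default handling described for p can be mirrored by a small helper that fills in missing keys. resolve_neighbor_params is a hypothetical name; the default values are the ones listed above.

```python
DEFAULT_CUTOFF = 6.0  # Angstrom
# Default "delta" per approach, as listed in the docstring.
DEFAULT_DELTAS = {
    "min_dist": 0.15,
    "min_relative_OKeeffe": 0.05,
    "min_relative_VIRE": 0.05,
    "scaled_VIRE": 2.0,
}


def resolve_neighbor_params(p=None):
    """Return a complete parameter dict, filling in defaults for missing keys."""
    p = dict(p or {})
    approach = p.get("approach", "min_dist")
    if approach not in DEFAULT_DELTAS:
        raise ValueError(f"unknown neighbor-finding approach: {approach!r}")
    return {
        "approach": approach,
        "cutoff": p.get("cutoff", DEFAULT_CUTOFF),
        # The delta default depends on the approach, so approach is resolved first.
        "delta": p.get("delta", DEFAULT_DELTAS[approach]),
    }


print(resolve_neighbor_params({"approach": "scaled_VIRE"}))
# {'approach': 'scaled_VIRE', 'cutoff': 6.0, 'delta': 2.0}
```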
matminer.featurizers.structure.get_neighbors_of_site_with_index_future(struct, n, approach=u'min_dist', delta=0.1, cutoff=10.0)

Returns the neighbors of a given site using a specific neighbor-finding method.

Args:

struct (Structure): input structure.
n (int): index of site in Structure object for which neighbors are to be determined.
approach (str): type of neighbor-finding approach, where “min_dist” will use the MinimumDistanceNN class, “voronoi” the VoronoiNN class, “min_OKeeffe” the MinimumOKeeffeNN class, and “min_VIRE” the MinimumVIRENN class.
delta (float): tolerance involved in neighbor finding.
cutoff (float): (large) radius to find tentative neighbors.

Returns: neighbor sites.
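The approach-to-class mapping can be written as a lookup table. This sketch assumes pymatgen's pymatgen.analysis.local_env module, whose MinimumDistanceNN, VoronoiNN, MinimumOKeeffeNN and MinimumVIRENN classes accept tol and cutoff keyword arguments and provide get_nn(structure, n). Passing delta as tol mirrors the parameter description above but is an assumption about how matminer wires them together; the helper names are hypothetical.

```python
# Hypothetical lookup from approach string to pymatgen near-neighbor class name.
NN_CLASS_NAMES = {
    "min_dist": "MinimumDistanceNN",
    "voronoi": "VoronoiNN",
    "min_OKeeffe": "MinimumOKeeffeNN",
    "min_VIRE": "MinimumVIRENN",
}


def nn_class_name(approach):
    """Map an approach string to a class name, rejecting unknown approaches."""
    try:
        return NN_CLASS_NAMES[approach]
    except KeyError:
        raise ValueError(f"unknown neighbor-finding approach: {approach!r}") from None


def neighbors_of_site(struct, n, approach="min_dist", delta=0.1, cutoff=10.0):
    """Return the neighbor sites of site n using the selected pymatgen strategy."""
    # Imported lazily so the lookup above works without pymatgen installed.
    from pymatgen.analysis import local_env

    nn_class = getattr(local_env, nn_class_name(approach))
    # delta is used as the tolerance (assumption); the meaning of tol differs
    # between classes (e.g., VoronoiNN treats it as a face-weight threshold).
    strategy = nn_class(tol=delta, cutoff=cutoff)
    return strategy.get_nn(struct, n)
```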

matminer.featurizers.structure.get_order_parameter_feature_vectors_difference(struct1, struct2, pneighs=None, convert_none_to_zero=True, delta_op=0.01, ignore_op_types=None)

Determine the difference vector between the two order parameter-statistics feature vectors obtained from two input structures.

Args:

struct1 (Structure): first input structure.
struct2 (Structure): second input structure.
pneighs (dict): specification and parameters of neighbor-finding approach (see get_neighbors_of_site_with_index function for more details).
convert_none_to_zero (bool): flag indicating whether or not to convert None values in OPs to zero (cf., get_order_parameters function).
delta_op (float): bin size of histogram that is computed in order to identify peak locations (cf., get_order_parameter_stats function).
ignore_op_types ([str]): list of OP types to be ignored in output dictionary (cf., get_order_parameter_stats function).

Returns: ([float]) difference vector between order parameter-statistics feature vectors obtained from the two input structures (structure 1 - structure 2).
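One way to picture the difference vector: flatten each structure's statistics dictionary (in the format returned by get_order_parameter_stats) into a list with a fixed key order, then subtract element by element. The ordering (sorted OP types, then the six statistics) and the NaN result for unconverted None values are assumptions made for this sketch, not necessarily what matminer does; the helper names are hypothetical.

```python
# Statistic order taken from get_order_parameter_stats' documented keys.
STAT_KEYS = ("min", "max", "mean", "std", "peak1", "peak2")


def flatten_stats(stats):
    """Flatten {op_type: {stat: value}} into a list: sorted OP types, then STAT_KEYS order."""
    return [stats[op][key] for op in sorted(stats) for key in STAT_KEYS]


def stats_difference(stats1, stats2, convert_none_to_zero=True):
    """Element-wise difference (structure 1 - structure 2) of two flattened stats dicts."""
    if sorted(stats1) != sorted(stats2):
        raise ValueError("both structures must report the same OP types")
    vec1, vec2 = flatten_stats(stats1), flatten_stats(stats2)
    if convert_none_to_zero:
        vec1 = [0.0 if v is None else v for v in vec1]
        vec2 = [0.0 if v is None else v for v in vec2]
    # Without conversion, any pair involving None yields NaN (assumption) instead of raising.
    return [float("nan") if a is None or b is None else a - b for a, b in zip(vec1, vec2)]


s1 = {"tet": {"min": 0.5, "max": 1.0, "mean": 0.75, "std": 0.25, "peak1": 1.0, "peak2": None}}
s2 = {"tet": {"min": 0.25, "max": 0.5, "mean": 0.5, "std": 0.0, "peak1": 0.5, "peak2": 0.25}}
print(stats_difference(s1, s2))  # [0.25, 0.5, 0.25, 0.25, 0.5, -0.25]
```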
matminer.featurizers.structure.get_order_parameter_stats(struct, pneighs=None, convert_none_to_zero=True, delta_op=0.01, ignore_op_types=None, bond_angles=None)

Determine the order parameter statistics accumulated across all sites in Structure object struct using the get_order_parameters function.

Args:

struct (Structure): input structure.
pneighs (dict): specification and parameters of neighbor-finding approach (see get_neighbors_of_site_with_index function for more details).
convert_none_to_zero (bool): flag indicating whether or not to convert None values in LSOPs to zero (cf., get_order_parameters function).
delta_op (float): bin size of histogram that is computed in order to identify peak locations.
ignore_op_types ([str]): list of OP types to be ignored in output dictionary (e.g., [“cn”, “bent”]). Default (None) will consider all OPs.

Returns: ({}) dictionary whose keys are the order parameter types (e.g., “bent5”, “tet”, “sq_pyr”) and whose values are dictionaries carrying the statistics (“min”, “max”, “mean”, “std”, “peak1”, “peak2”).
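A minimal sketch of the statistics for a single OP type, assuming that ‘peak1’ and ‘peak2’ are the centers of the most and second-most populated histogram bins of width delta_op and that ‘std’ is the population standard deviation. Both are interpretations rather than confirmed details of matminer's implementation, and op_stats is a hypothetical name.

```python
import math
from collections import Counter


def op_stats(values, delta_op=0.01):
    """Summarize one OP type's per-site values (hypothetical helper)."""
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population std
    # Assign each value to a histogram bin of width delta_op.
    bins = Counter(math.floor(v / delta_op) for v in values)
    # Centers of the two most populated bins; ties keep first-seen order.
    peaks = [(b + 0.5) * delta_op for b, _ in bins.most_common(2)]
    peaks += [None] * (2 - len(peaks))  # fewer than two occupied bins
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": std,
        "peak1": peaks[0],
        "peak2": peaks[1],
    }


stats = op_stats([0.1, 0.2, 0.15, 0.6, 0.7, 0.9], delta_op=0.25)
print(stats["peak1"], stats["peak2"])  # 0.125 0.625
```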

Module contents