rxn_insight package

Submodules

rxn_insight.classification module

Reaction classification module

class rxn_insight.classification.ReactionClassifier(reaction: str, rxn_mapper: RXNMapper | None = None, keep_mapping: bool = False, search_template: bool = True)

Bases: object

This class handles operations related to chemical reaction classification.

balance_reaction(fgr: list[str], fgp: list[str]) → list[str]

Balances the reaction based on functional groups present in reactants and products.

Parameters:
  • fgr (list[str]) – Functional groups in reactants.

  • fgp (list[str]) – Functional groups in products.

Returns:

A list of potential by-products or missing elements in the balanced reaction.

Return type:

list[str]

check_nos() → bool

Checks if nitrogen, oxygen, or sulfur atoms are involved in the reaction center.

Returns:

True if N, O, or S atoms are involved in the reaction center; otherwise, False.

Return type:

bool

classify_reaction() → str

Classifies the reaction based on its chemical characteristics and transformation patterns.

Returns:

The classification of the reaction, such as ‘Reduction’, ‘Oxidation’, etc.

Return type:

str

get_atom_mapping_indices() → tuple[dict[int, int], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], int]

Generates a mapping from atom indices to their positions in the transformation matrix.

Returns:

A tuple containing a dictionary that maps atom mappings to matrix indices, arrays of atom numbers and atom mappings, and the matrix size.

Return type:

tuple

get_be_matrix(molecule: Mol) → ndarray[Any, dtype[Any]]

Calculates the bond-electron matrix for the given molecule.

Parameters:

molecule (Mol) – The molecule for which to calculate the bond-electron matrix.

Returns:

A matrix representing the bond-electron relationships in the molecule.

Return type:

npt.NDArray[Any]
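To illustrate what a bond-electron (BE) matrix holds, the sketch below builds one from plain Python data rather than from an RDKit Mol. It assumes the Dugundji-Ugi convention, in which off-diagonal entries are formal bond orders and diagonal entries are free (non-bonding) valence electrons. Whether get_be_matrix follows exactly this convention is not stated here, and be_matrix is a hypothetical helper, not part of rxn_insight.

```python
def be_matrix(free_electrons, bonds):
    """Build a BE matrix (hypothetical helper, Dugundji-Ugi convention assumed).

    free_electrons: non-bonding valence electron count for each atom.
    bonds: (i, j, order) tuples giving the formal bond order between atoms i and j.
    """
    n = len(free_electrons)
    matrix = [[0] * n for _ in range(n)]
    for i, count in enumerate(free_electrons):
        matrix[i][i] = count      # diagonal: free valence electrons
    for i, j, order in bonds:
        matrix[i][j] = order      # off-diagonal: formal bond order
        matrix[j][i] = order      # keep the matrix symmetric
    return matrix


# Formaldehyde, H2C=O, with atoms ordered C, O, H, H.
formaldehyde = be_matrix(
    free_electrons=[0, 4, 0, 0],
    bonds=[(0, 1, 2), (0, 2, 1), (0, 3, 1)],
)
for row in formaldehyde:
    print(row)
```

Under this convention each row sums to the valence electron count of its atom (4 for C, 6 for O, 1 for each H), which is a quick sanity check on a hand-built matrix.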

get_functional_group_smarts(molecule: Mol, matrix: ndarray[Any, dtype[Any]], map_dict: dict[int, int]) → tuple[str, ...]

Identifies and returns SMARTS strings for functional groups in the molecule based on the specified matrix and mapping.

Parameters:
  • molecule (Mol) – The RDKit molecule object.

  • matrix (npt.NDArray[Any]) – A matrix representing chemical properties or structure.

  • map_dict (dict[int, int]) – Mapping of atom indices to their corresponding mapping numbers in the reaction.

Returns:

A tuple containing SMARTS strings of the identified functional groups.

Return type:

tuple[str, …]

get_functional_groups(mol: Mol, map_dict: dict[int, int], df: DataFrame) → list[str]

Extracts functional groups from the molecule using the specified mapping and reference DataFrame.

Parameters:
  • mol (Mol) – The molecule from which to extract functional groups.

  • map_dict (dict[int, int]) – A dictionary mapping atom indices to mapping numbers.

  • df (pd.DataFrame) – DataFrame containing functional group definitions.

Returns:

A list of names of identified functional groups.

Return type:

list[str]

get_reaction_center_info(df: DataFrame) → dict[str, list[str] | str | int]

Compiles detailed information about the reaction center from the reaction.

Parameters:

df (pd.DataFrame) – DataFrame containing additional data required for analysis.

Returns:

A dictionary containing detailed information about the reaction center.

Return type:

dict[str, Union[list[str], str, int]]

get_ring_type(mol: Mol, map_dict: dict[int, int] | None = None) → list[str]

Determines the types of ring structures present in the molecule.

Parameters:
  • mol (Mol) – The molecule to analyze.

  • map_dict (Optional[dict[int, int]]) – Mapping of atom indices to their mapping numbers, if available.

Returns:

A list of ring types identified in the molecule.

Return type:

list[str]

get_template_smiles() → str | None

Generates a reaction SMILES from the reaction SMARTS template.

Returns:

The reaction SMILES of the reaction template, or None if no template is generated.

Return type:

str | None

is_acylation() → bool

Evaluates whether the reaction involves acylation, specifically focusing on the transformation around carbonyl groups.

Returns:

True if the reaction involves acylation, otherwise False.

Return type:

bool

is_aromatic_heterocycle() → bool

Assesses whether the reaction involves the formation or modification of an aromatic heterocycle.

Returns:

True if the reaction pertains to aromatic heterocycle changes, otherwise False.

Return type:

bool

is_cc_coupling() → bool

Checks if the reaction is a carbon-carbon coupling process.

Returns:

True if the reaction involves carbon-carbon coupling, otherwise False.

Return type:

bool

is_deprotection() → bool

Evaluates whether the reaction is a deprotection, that is, the removal of protecting groups from functional sites.

Returns:

True if the reaction is a deprotection process, otherwise False.

Return type:

bool

is_fga() → bool

Determines if the reaction involves the addition of functional groups to the existing molecular framework.

Returns:

True if the reaction is classified as functional group addition, otherwise False.

Return type:

bool

is_fgi() → bool

Determines if the reaction involves a functional group interconversion (FGI).

Returns:

True if the reaction is classified as a functional group interconversion, otherwise False.

Return type:

bool

is_heteroatom_alkylation() → bool

Determines if the reaction involves alkylation of heteroatoms (N, O, S).

Returns:

True if the reaction is a heteroatom alkylation, otherwise False.

Return type:

bool

is_oxidation() → bool

Checks if the reaction is an oxidation by examining changes in oxidation states and the involvement of key functional groups.

Returns:

True if the reaction involves oxidation, otherwise False.

Return type:

bool

is_protection() → bool

Determines if the reaction is a protection, that is, the addition of protecting groups to functional sites.

Returns:

True if the reaction is classified as a protection process, otherwise False.

Return type:

bool

is_reduction() → bool

Determines if the reaction is a reduction process based on the change in oxidation states and functional group transformation.

Returns:

True if the reaction can be classified as a reduction, otherwise False.

Return type:

bool

name_reaction(smirks_db: DataFrame) → str

Determines the name of the reaction from a database based on SMIRKS transformations.

Parameters:

smirks_db (pd.DataFrame) – DataFrame containing SMIRKS patterns and corresponding reaction names.

Returns:

The name of the reaction, or ‘OtherReaction’ if no specific name can be determined.

Return type:

str

remove_metals_and_halogens() → tuple[ndarray[Any, dtype[Any]], ...]

Removes metal and halogen atoms from the R-matrix, as they are generally not useful for reaction classification.

Returns:

A tuple containing the sanitized transformation matrix, reaction center atoms, and mappings.

Return type:

tuple[npt.NDArray[Any], …]

ring_changing() → int

Calculates the net change in the number of ring structures between reactants and products.

Returns:

The net change in the number of rings; positive for ring formation, negative for ring breaking.

Return type:

int

sanitize_r_matrix() → tuple[ndarray[Any, dtype[Any]], ...]

Sanitizes the R-matrix by removing all-zero rows and columns.

Returns:

A tuple containing the cleaned R-matrix and arrays for atom numbers and mappings.

Return type:

tuple[npt.NDArray[Any], …]
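The idea of dropping all-zero rows and columns can be sketched on a small square matrix. This is a minimal illustration with plain lists, not the library's implementation: drop_zero_rows_and_columns is a hypothetical name, and the sketch assumes an index is removed only when both its row and its column are entirely zero, so the result stays square.

```python
def drop_zero_rows_and_columns(matrix, atom_maps):
    """Remove indices whose row and column are all zero (hypothetical helper)."""
    n = len(matrix)
    keep = [
        i for i in range(n)
        if any(matrix[i][j] != 0 or matrix[j][i] != 0 for j in range(n))
    ]
    cleaned = [[matrix[i][j] for j in keep] for i in keep]
    # Keep the per-atom bookkeeping aligned with the surviving indices.
    return cleaned, [atom_maps[i] for i in keep]


# Toy reaction matrix: atom 3 (third row/column) takes no part in the change.
r_matrix = [
    [0, 1, 0, 0],
    [1, 0, 0, -1],
    [0, 0, 0, 0],
    [0, -1, 0, 0],
]
cleaned, maps = drop_zero_rows_and_columns(r_matrix, [1, 2, 3, 4])
print(cleaned)
print(maps)
```

Returning the filtered atom mappings together with the matrix mirrors the documented return value, where the cleaned matrix comes with matching arrays for atom numbers and mappings.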

rxn_insight.database module

class rxn_insight.database.Database(df: DataFrame | None = None)

Bases: object

A class to manage and analyze reaction datasets, providing functionalities for creating databases, analyzing reactions, and saving results.

Example

>>> from rxn_insight.database import Database
>>> import pandas as pd
>>> # Create a sample DataFrame
>>> data = {
...     "reaction": ["OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1"],
...     "solvent": ["CN(C)C=O"],
...     "reagent": ["F[Cs]"],
...     "catalyst": ["[Pd]"],
...     "yield": [85],
...     "reference": ["Ref1"]
... }
>>> df = pd.DataFrame(data)
>>> # Initialize a Database object
>>> db = Database()
>>> # Create a database from the DataFrame
>>> reaction_df = db.create_database_from_df(
...     df,
...     reaction_column="reaction",
...     solvent_column="solvent",
...     reagent_column="reagent",
...     catalyst_column="catalyst",
...     yield_column="yield",
...     ref_column="reference"
...     )

create_database_from_csv(fname: str, reaction_column: str, solvent_column: str = 'SOLVENT', reagent_column: str = 'REAGENT', catalyst_column: str = 'CATALYST', yield_column: str = 'YIELD', ref_column: str = 'REF') → DataFrame

Creates a reaction database from a CSV file.

Parameters:
  • fname – Path to the CSV file.

  • reaction_column – Name of the column containing reaction SMILES.

  • solvent_column – Name of the solvent column (default: “SOLVENT”).

  • reagent_column – Name of the reagent column (default: “REAGENT”).

  • catalyst_column – Name of the catalyst column (default: “CATALYST”).

  • yield_column – Name of the yield column (default: “YIELD”).

  • ref_column – Name of the reference column (default: “REF”).

Returns:

A DataFrame with analyzed reaction data.

create_database_from_df(df: DataFrame, reaction_column: str, solvent_column: str = 'SOLVENT', reagent_column: str = 'REAGENT', catalyst_column: str = 'CATALYST', yield_column: str = 'YIELD', ref_column: str = 'REF') → DataFrame

Creates a reaction database from a given DataFrame.

Parameters:
  • df – A DataFrame containing reaction data.

  • reaction_column – Name of the column containing reaction SMILES.

  • solvent_column – Name of the solvent column (default: “SOLVENT”).

  • reagent_column – Name of the reagent column (default: “REAGENT”).

  • catalyst_column – Name of the catalyst column (default: “CATALYST”).

  • yield_column – Name of the yield column (default: “YIELD”).

  • ref_column – Name of the reference column (default: “REF”).

Returns:

A DataFrame with analyzed reaction data.

get_class_distribution()

Retrieves the class distribution of reactions in the database.

Returns:

A DataFrame summarizing the reaction class distribution.

get_name_distribution()

Retrieves the distribution of reaction names in the database.

Returns:

A DataFrame summarizing the reaction name distribution.

save_to_csv(fname: str)

Saves the reaction database to a CSV file.

Parameters:

fname – The name of the output file (without extension).

save_to_excel(fname: str)

Saves the reaction database to an Excel file.

Parameters:

fname – The name of the output file (without extension).

save_to_parquet(fname: str)

Saves the reaction database to a Parquet file.

Parameters:

fname – The name of the output file (without extension).

rxn_insight.database.analyze_reactions(df: DataFrame) → Tuple[DataFrame, List[str]]

Analyzes a DataFrame of reactions to extract detailed information.

Parameters:

df – A DataFrame with reaction data.

Returns:

A tuple containing the updated DataFrame and a list of skipped reactions.

rxn_insight.database.calculate_class_distribution(df: DataFrame) → DataFrame

Calculates the distribution of reaction classes.

Parameters:

df – A DataFrame containing reaction data.

Returns:

A DataFrame summarizing reaction class counts.

rxn_insight.database.calculate_name_distribution(df: DataFrame) → DataFrame

Calculates the distribution of reaction names.

Parameters:

df – A DataFrame containing reaction data.

Returns:

A DataFrame summarizing reaction name counts.
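A class or name distribution is, in essence, a frequency table over one label column. The sketch below shows that idea with the standard library only. label_distribution and the "label", "count", and "fraction" keys are hypothetical, the class labels are illustrative, and the columns the library functions actually return are not specified here.

```python
from collections import Counter


def label_distribution(labels):
    """Count each label and report its share of the total (hypothetical helper)."""
    counts = Counter(labels)
    total = sum(counts.values())
    # most_common() sorts by descending count; ties keep first-seen order.
    return [
        {"label": label, "count": n, "fraction": n / total}
        for label, n in counts.most_common()
    ]


classes = ["C-C Coupling", "Reduction", "C-C Coupling", "Acylation"]
distribution = label_distribution(classes)
for entry in distribution:
    print(entry)
```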

rxn_insight.reaction module

Reaction module

class rxn_insight.reaction.Molecule(smi: str)

Bases: object

Represents a single molecule read from a SMILES string.

get_functional_groups(df: DataFrame = None) → list[str]

Identifies and returns the functional groups present in the molecule.

Parameters:

df – Optional DataFrame containing functional group patterns; loads default if not provided.

get_rings() → list[str]

Identifies and returns rings in the molecule.

search_reactions(df: DataFrame) → DataFrame

Searches for reactions involving the molecule as a product.

Parameters:

df – The DataFrame to search for reactions.

search_reactions_by_scaffold(df: DataFrame, threshold: float = 0.5, max_return: int = 100, fp: str = 'MACCS') → DataFrame

Searches for reactions based on scaffold similarity.

Parameters:
  • df – DataFrame containing reactions to search.

  • threshold – Similarity threshold to apply.

  • max_return – Maximum number of reactions to return.

  • fp – Type of fingerprint to use for similarity calculation.
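How threshold and max_return typically interact in a similarity search can be sketched as follows: keep the candidates that score at or above the threshold, rank them by score, and return at most max_return of them. This is an assumption about the general pattern, not a description of the library's code. top_matches and the score values are hypothetical, and whether the real threshold comparison is inclusive is not documented here.

```python
def top_matches(scores, threshold=0.5, max_return=100):
    """Filter by threshold, sort by descending score, cap the result length."""
    hits = [(key, score) for key, score in scores.items() if score >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:max_return]


# Hypothetical scaffold-similarity scores for four database reactions.
scores = {"rxn_a": 0.92, "rxn_b": 0.41, "rxn_c": 0.67, "rxn_d": 0.55}
best = top_matches(scores, threshold=0.5, max_return=2)
print(best)
```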

class rxn_insight.reaction.Reaction(reaction: str, solvent: str = '', reagent: str = '', catalyst: str = '', ref: str = '', rxn_mapper: RXNMapper | None = None, keep_mapping: bool = False, smirks: DataFrame = None, fg: DataFrame = None, search_template: bool = True)

Bases: object

Handles operations related to chemical reactions.

This class facilitates various operations on chemical reactions, such as parsing reaction strings, identifying components like solvents and reagents, classifying reactions, and analyzing ring structures.

reaction

The SMILES representation of the reaction.

Type:

str

solvent

Solvents used in the reaction.

Type:

str

reagent

Reagents used in the reaction.

Type:

str

catalyst

Catalysts used in the reaction.

Type:

str

reference

Reference or note associated with the reaction.

Type:

str

smirks_db

Database of SMIRKS transformations.

Type:

pd.DataFrame

fg_db

Functional group data.

Type:

pd.DataFrame

classifier

Reaction classification object.

Type:

ReactionClassifier

reactants

SMILES string of the reactants.

Type:

str

products

SMILES string of the products.

Type:

str

mapped_reaction

Reaction with atom mappings included.

Type:

str

reaction_class

Class of the reaction.

Type:

str

template

Reaction template derived from the classifier.

Type:

str

reaction_info

Additional information about the reaction.

Type:

dict

tag

Optional tag for the reaction.

Type:

str

name

Optional name of the reaction.

Type:

str

byproducts

Tuple of byproducts in the reaction.

Type:

tuple

scaffold

Molecular scaffold of the reaction.

Type:

str

neighbors

Placeholder for reaction neighborhood information.

Type:

Any

suggested_solvent

Suggested solvent for the reaction.

Type:

str

suggested_catalyst

Suggested catalyst for the reaction.

Type:

str

suggested_reagent

Suggested reagent for the reaction.

Type:

str

Example

>>> from rxn_insight.reaction import Reaction
>>> rxn = Reaction("OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1")
>>> ri = rxn.get_reaction_info()
>>> print(ri)
{'REACTION': 'Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1',
'MAPPED_REACTION': 'Br[c:5]1[cH:6][cH:7][cH:8][cH:9][cH:10]1.OB(O)[c:4]1[cH:3][cH:2][cH:1][cH:12][cH:11]1>>[cH:1]1[cH:2][cH:3][c:4](-[c:5]2[cH:6][cH:7][cH:8][cH:9][cH:10]2)[cH:11][cH:12]1',
'N_REACTANTS': 2, 'N_PRODUCTS': 1, 'FG_REACTANTS': ['Aromatic halide', 'Boronic acid'], 'FG_PRODUCTS': [],
'PARTICIPATING_RINGS_REACTANTS': ['c1ccccc1', 'c1ccccc1'], 'PARTICIPATING_RINGS_PRODUCTS': ['c1ccccc1', 'c1ccccc1'],
'ALL_RINGS_PRODUCTS': ['c1ccccc1', 'c1ccccc1'], 'BY-PRODUCTS': ['HBr', 'B'], 'CLASS': 'C-C Coupling',
'TAG': 'd79a78c79f0c392f0911481acf5c300cc98205269acdb93c24fb610a61c4c868', 'SOLVENT': [''], 'REAGENT': [''],
'CATALYST': [''], 'REF': '', 'NAME': 'Suzuki coupling with boronic acids', 'SCAFFOLD': 'c1ccc(-c2ccccc2)cc1'}

add_agents() → None

Adds agents identified by the classifier to the reagent list.

find_neighbors(df: DataFrame, fp: str = 'MACCS', concatenate: bool = True, max_return: int = 100, threshold: float = 0.3, broaden: bool = False, full_search: bool = False) → DataFrame

Finds and returns similar reactions in the database.

Parameters:
  • df – The DataFrame to search within.

  • fp – The type of fingerprint to use, ‘MACCS’ or ‘Morgan’.

  • concatenate – Whether to concatenate patterns in fingerprinting.

  • max_return – Maximum number of similar reactions to return.

  • threshold – The similarity threshold to consider for matching.

  • broaden – Whether to use broadened search criteria based on tags.

  • full_search – If true, performs an exhaustive search across the database.

Example

>>> import pandas as pd
>>> from rxn_insight.reaction import Reaction
>>> df_uspto = pd.read_parquet("uspto_rxn_insight.gzip")  # Download: https://zenodo.org/records/10171745
>>> rxn = Reaction("OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1")
>>> df_neighbors = rxn.find_neighbors(df_uspto)

get_byproducts() → list[str]

Calculates and returns byproducts of the reaction based on functional group analysis.

get_class() → str

Determines and returns the class of the reaction.

get_functional_groups() → tuple[list[str], ...]

Identifies and returns functional groups in reactants and products.

get_name() → str

Determines and returns the name of the reaction based on SMIRKS data.

get_reaction_center() → str | None

Returns the reaction center SMILES string if available.

get_reaction_info() → dict[str, list[str] | str]

Compiles all reaction-related information in a single call. Calling this method calculates the T-matrix of the reaction, assigns a class and a name, and determines the functional groups, rings, and scaffold. All information is returned as a dictionary.

get_rings_in_products() → list[str]

Identifies and returns ring structures in the reaction products.

get_rings_in_reactants() → list[str]

Identifies and returns ring structures in the reactants.

get_rings_in_reaction_center() → tuple[list[str], ...]

Identifies and returns rings in the reaction center for reactants and products.

get_scaffold() → str | None

Extracts and returns the molecular scaffold of the product.

give_broad_tag() → str

Generates a broadened tag for the reaction based on its characteristics.

read_reaction(reaction: str) → None

Processes a reaction string in SMILES format.

Parameters:

reaction (str) – Reaction string in SMILES format, with components separated by >.
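The reaction SMILES layout described above (roles separated by ">", components within a role separated by ".") can be illustrated with a few lines of string handling. split_reaction_smiles is a hypothetical helper for illustration only. The actual read_reaction method stores its results on the Reaction object and may process the string differently.

```python
def split_reaction_smiles(reaction):
    """Split 'reactants>agents>products' into three lists of component SMILES."""
    parts = reaction.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected two '>' separators, got {len(parts) - 1}")
    # Components within one role are separated by '.'; an empty role gives [].
    reactants, agents, products = (p.split(".") if p else [] for p in parts)
    return reactants, agents, products


reactants, agents, products = split_reaction_smiles(
    "OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1"
)
print(reactants)
print(agents)
print(products)
```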

suggest_conditions(df: DataFrame) → dict[str, DataFrame]

Suggests reaction conditions based on similar reactions found.

Parameters:

df – The DataFrame containing reaction data to analyze.

Example

>>> import pandas as pd
>>> from rxn_insight.reaction import Reaction
>>> df_uspto = pd.read_parquet("uspto_rxn_insight.gzip")  # Download: https://zenodo.org/records/10171745
>>> rxn = Reaction("OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1")
>>> df_conditions = rxn.suggest_conditions(df_uspto)

rxn_insight.representation module

rxn_insight.representation.get_morgan_fingerprint(mol: Mol) → ndarray[Any, dtype[Any]]

Get the ECFP4 fingerprint of a molecule.

Parameters:

mol – RDKit Mol object.

Returns:

NumPy array.

rxn_insight.representation.morgan_reaction_fingerprint(rxn: str) → ndarray[Any, dtype[Any]]

Obtain the Morgan-based fingerprint of a reaction à la Schneider: https://doi.org/10.1021/ci5006614

Parameters:

rxn – Reaction SMILES.

Returns:

NumPy array.

rxn_insight.utils module

rxn_insight.utils.atom_remover(mol: Mol, matches: list[list[int]]) → Mol

rxn_insight.utils.calculate_braycurtis_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Bray-Curtis similarity between two vectors.

The underlying Bray-Curtis distance is the sum of absolute element-wise differences divided by the sum of absolute element-wise sums.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Bray-Curtis similarity.

Return type:

float
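As a worked illustration of the Bray-Curtis formula, the sketch below computes the distance sum(|u_i - v_i|) / sum(|u_i + v_i|) in plain Python and converts it to a similarity as 1 - distance. The conversion is an assumption modeled on the other similarity functions in this module, and bray_curtis_similarity is a hypothetical name, not the library function.

```python
def bray_curtis_similarity(u, v):
    """Return 1 - Bray-Curtis distance for two equal-length numeric sequences."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(abs(a + b) for a, b in zip(u, v))
    if denominator == 0:
        return 1.0  # assumption: two all-zero vectors count as identical
    return 1.0 - numerator / denominator


# |2-1| + |0-1| + |3-3| = 2 and |2+1| + |0+1| + |3+3| = 10, so distance = 0.2.
result = bray_curtis_similarity([2, 0, 3], [1, 1, 3])
print(result)
```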

rxn_insight.utils.calculate_canberra_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Canberra similarity between two vectors.

The underlying Canberra distance sums, element by element, the absolute difference divided by the sum of the absolute values, so differences between small values weigh more heavily.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Canberra similarity (1 - Canberra distance).

Return type:

float

rxn_insight.utils.calculate_chebyshev_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Chebyshev similarity between two vectors.

The Chebyshev similarity is derived from the maximum absolute difference between elements in the two vectors.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Chebyshev similarity (1 - Chebyshev distance).

Return type:

float

rxn_insight.utils.calculate_correlation_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Correlation similarity between two vectors.

The Correlation similarity measures the linear relationship between vectors, normalized to account for scale differences.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Correlation similarity (1 - Correlation distance).

Return type:

float

rxn_insight.utils.calculate_cosine_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Cosine similarity between two vectors.

The Cosine similarity measures the cosine of the angle between two vectors, indicating their directional similarity.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Cosine similarity (1 - Cosine distance).

Return type:

float
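The cosine of the angle between two vectors is their dot product divided by the product of their norms. The sketch below computes it directly with the standard library. cosine_similarity is a hypothetical stand-in, and raising an error for zero vectors is a choice made for this example rather than documented library behavior.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# dot = 3*4 + 4*3 = 24, both norms are 5, so the result is 24 / 25.
close = cosine_similarity([3, 4], [4, 3])
# Vectors pointing in the same direction score 1 regardless of length.
parallel = cosine_similarity([1, 2], [2, 4])
print(close, parallel)
```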

rxn_insight.utils.calculate_dice_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Dice similarity between two vectors.

Dice similarity is used for binary vectors and measures overlap between sets.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Dice similarity (1 - Dice distance).

Return type:

float

rxn_insight.utils.calculate_euclidean_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Euclidean similarity between two vectors.

The Euclidean similarity is derived from the straight-line distance between two points in multidimensional space.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Euclidean similarity (1 - Euclidean distance).

Return type:

float

rxn_insight.utils.calculate_jaccard_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Jaccard similarity between two vectors.

The Jaccard similarity measures the proportion of shared elements in two sets. It is commonly used for binary vectors, indicating how similar the sets are.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Jaccard similarity (1 - Jaccard distance).

Return type:

float
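For binary fingerprints, Jaccard and Dice both measure the overlap of "on" bits but normalize it differently: Jaccard divides the shared bits by the bits set in either vector, while Dice divides twice the shared bits by the total number of set bits. The sketch below shows both on the same pair of vectors. The function names are hypothetical, and treating two all-zero vectors as identical is an assumption of this example.

```python
def jaccard_similarity(u, v):
    """Shared 'on' bits divided by bits that are 'on' in either vector."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 1.0  # assumption for all-zero input


def dice_similarity(u, v):
    """Twice the shared 'on' bits divided by the total number of 'on' bits."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    total = sum(1 for a in u if a) + sum(1 for b in v if b)
    return 2 * both / total if total else 1.0  # assumption for all-zero input


a = [1, 1, 0, 0, 1]
b = [1, 0, 0, 1, 1]
# Two bits are shared, four are set in either vector, three are set in each.
print(jaccard_similarity(a, b))
print(dice_similarity(a, b))
```

Dice is never lower than Jaccard on the same pair, so thresholds are not interchangeable between the two metrics.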

rxn_insight.utils.calculate_kulczynksi1_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Kulczynski 1 similarity between two vectors.

Kulczynski similarity measures the average overlap between two sets.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Kulczynski 1 similarity.

Return type:

float

rxn_insight.utils.calculate_manhattan_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Manhattan similarity between two vectors.

The Manhattan similarity is derived from the Manhattan (city block) distance, which is the sum of the absolute differences between elements in the vectors.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Manhattan similarity (1 - Manhattan distance).

Return type:

float

rxn_insight.utils.calculate_minkowski_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]], p: float = 3.0) → float

Calculate the Minkowski similarity between two vectors.

The Minkowski similarity generalizes the Manhattan and Euclidean similarities using a parameter p to determine the metric.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

  • p (float) – The order of the Minkowski metric (default: 3.0).

Returns:

Minkowski similarity (1 - Minkowski distance).

Return type:

float
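The role of p can be seen by evaluating the Minkowski distance, (sum |u_i - v_i|^p)^(1/p), for several orders: p = 1 gives the Manhattan distance and p = 2 the Euclidean distance. The sketch below computes the distance only. minkowski_distance is a hypothetical helper, and the final 1 - distance step documented above is left out because, for unnormalized vectors like these, it would produce a negative value.

```python
def minkowski_distance(u, v, p=3.0):
    """Minkowski distance of order p between two equal-length sequences."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


u, v = [0.0, 0.0], [3.0, 4.0]
manhattan = minkowski_distance(u, v, p=1)   # |3| + |4|
euclidean = minkowski_distance(u, v, p=2)   # sqrt(9 + 16)
cubic = minkowski_distance(u, v, p=3)       # (27 + 64) ** (1/3)
print(manhattan, euclidean, cubic)
```

As p grows, the distance approaches the largest single coordinate difference (here 4), which is the Chebyshev distance.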

rxn_insight.utils.calculate_rogerstanimoto_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Rogers-Tanimoto similarity between two vectors.

This metric evaluates similarity for binary vectors based on mismatched proportions.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Rogers-Tanimoto similarity.

Return type:

float

rxn_insight.utils.calculate_russellrao_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Russell-Rao similarity between two vectors.

Russell-Rao similarity evaluates the proportion of matching 1’s in binary vectors.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Russell-Rao similarity.

Return type:

float
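The "proportion of matching 1's" can be made concrete: count the positions where both vectors are 1 and divide by the vector length. The sketch below does exactly that. russell_rao_similarity is a hypothetical name, and the example assumes this textbook definition rather than reproducing the library's code.

```python
def russell_rao_similarity(u, v):
    """Fraction of positions where both binary vectors are 1."""
    if len(u) != len(v) or not u:
        raise ValueError("vectors must be non-empty and of equal length")
    shared_ones = sum(1 for a, b in zip(u, v) if a and b)
    return shared_ones / len(u)


# Positions 0 and 4 are 1 in both vectors, out of 5 positions in total.
score = russell_rao_similarity([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(score)
```

Because shared zeros lengthen the denominator without adding to the numerator, sparse fingerprints score low on this metric even when their set bits agree.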

rxn_insight.utils.calculate_sokalmichener_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Sokal-Michener similarity between two vectors.

This metric is used for binary data, emphasizing matching 1’s and 0’s equally.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Sokal-Michener similarity.

Return type:

float

rxn_insight.utils.calculate_sokalsneath_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Sokal-Sneath similarity between two vectors.

Sokal-Sneath similarity emphasizes 1’s in binary data.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Sokal-Sneath similarity.

Return type:

float

rxn_insight.utils.calculate_yule_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) → float

Calculate the Yule similarity between two vectors.

Parameters:
  • v1 (npt.NDArray[Any]) – First vector.

  • v2 (npt.NDArray[Any]) – Second vector.

Returns:

Yule similarity.

Return type:

float

rxn_insight.utils.check_rings(atom: Atom, mol: Mol, match: list[int]) → tuple[bool, list[int]]

rxn_insight.utils.curate_smirks(df: DataFrame) → DataFrame

Convert the SMIRKS database to the required format.

Parameters:

df – Pandas DataFrame.

Returns:

Curated SMIRKS database.

rxn_insight.utils.draw_chemical_reaction(smiles: str, highlightByReactant: bool = False, font_scale: float = 1.5) → str

rxn_insight.utils.extract_from_reaction(reaction: dict[str, str | int], radius_reactants: int = 2, radius_products: int = 1) → dict[str, str | int]

Extract the reaction template from mapped reaction SMILES. Code adapted from https://doi.org/10.1021/acs.jcim.9b00286.

Parameters:
  • reaction – Dictionary with keys ‘reactants’ and ‘products’.

  • radius_reactants – Radius of atoms around the reaction center in the reactants.

  • radius_products – Radius of atoms around the reaction center in the products.

Returns:

Dictionary with template information.

rxn_insight.utils.get_atom_mapping(rxn: str, rxn_mapper: RXNMapper | None = None) → str

Maps reactants and products using RXNMapper (https://doi.org/10.1126/sciadv.abe4166).

Parameters:
  • rxn – Reaction SMILES without atom mapping.

  • rxn_mapper – RXNMapper object.

Returns:

Reaction SMILES with atom mapping.

rxn_insight.utils.get_catalyst_ranking(df: DataFrame) → DataFrame

rxn_insight.utils.get_fp(rxn: str, fp: str = 'MACCS', concatenate: bool = True) → ndarray[Any, dtype[Any]]

rxn_insight.utils.get_map_index(mol: Mol) → dict[int, int]

rxn_insight.utils.get_reaction_template(reaction: str, radius_reactants: int = 2, radius_products: int = 2) → str | None

Get the reaction template from a mapped reaction.

Parameters:
  • reaction – Mapped reaction SMILES.

  • radius_reactants – Radius of atoms around the reaction center in the reactants.

  • radius_products – Radius of atoms around the reaction center in the products.

Returns:

Reaction template in the forward direction.

rxn_insight.utils.get_reagent_ranking(df: DataFrame) → DataFrame

rxn_insight.utils.get_ring_systems(mol: Mol, include_spiro: bool = False) → list[list[int]]

Code taken from https://gist.github.com/greglandrum/de1751a42b3cae54011041dd67ae7415

Parameters:
  • mol – RDKit Mol object.

  • include_spiro – Whether rings that share only a single (spiro) atom are merged into the same ring system.

Returns:

List with atoms that make up the ring systems.

rxn_insight.utils.get_scaffold(mol: Mol) → str | None

Get the Murcko scaffold of a molecule.

Parameters:

mol – RDKit Mol object.

Returns:

SMILES string.

rxn_insight.utils.get_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]], metric: str = 'jaccard') → float

Calculate the similarity between two fingerprints using a specified metric.

Supported metrics:
  • Binary metrics: jaccard, dice, kulczynski1, rogerstanimoto, russellrao, sokalmichener, sokalsneath, yule.

  • Distance-based metrics: braycurtis, canberra, chebyshev, manhattan, correlation, cosine, euclidean, minkowski.

Parameters:
  • v1 (npt.NDArray[Any]) – Reference fingerprint as a NumPy array.

  • v2 (npt.NDArray[Any]) – Fingerprint to compare as a NumPy array.

  • metric (str) – Metric to calculate the similarity. Default is “jaccard”.

Returns:

Calculated similarity score.

Return type:

float

Raises:

ValueError – If an unsupported metric is provided.

Example

>>> import numpy as np
>>> from rxn_insight.utils import get_similarity
>>> v1 = np.array([1, 0, 1, 1])
>>> v2 = np.array([1, 1, 1, 0])
>>> get_similarity(v1, v2, metric="jaccard")
0.5
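Selecting a metric by name and rejecting unknown names with ValueError can be sketched with a dictionary of functions. This is one common way to structure such a dispatcher, not necessarily how get_similarity is written. METRICS, similarity, and the two helper functions are hypothetical, and only two of the sixteen documented metrics are included.

```python
def _jaccard(u, v):
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 1.0


def _dice(u, v):
    both = sum(1 for a, b in zip(u, v) if a and b)
    total = sum(1 for a in u if a) + sum(1 for b in v if b)
    return 2 * both / total if total else 1.0


# Hypothetical registry: metric name -> function computing the similarity.
METRICS = {"jaccard": _jaccard, "dice": _dice}


def similarity(u, v, metric="jaccard"):
    """Look up the metric by name and apply it; unknown names raise ValueError."""
    try:
        func = METRICS[metric]
    except KeyError:
        raise ValueError(f"Unsupported metric: {metric!r}") from None
    return func(u, v)


print(similarity([1, 1, 0], [1, 0, 0]))
print(similarity([1, 1, 0], [1, 0, 0], metric="dice"))
```

A registry like this keeps the list of supported metric names in one place, so the error message and the lookup cannot drift apart.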

rxn_insight.utils.get_solvent_ranking(df: DataFrame) → DataFrame

rxn_insight.utils.maccs_fp(mol: Mol) → ndarray[Any, dtype[Any]]

rxn_insight.utils.make_rdkit_fp(rxn: str, fp: str = 'MACCS', concatenate: bool = True) → str

rxn_insight.utils.morgan_fp(mol: Mol) → ndarray[Any, dtype[Any]]

rxn_insight.utils.moveAtomMapsToNotes(m: Mol) → None

rxn_insight.utils.remove_atom_mapping(rxn: str, smarts: bool = False) → str

Removes the atom mapping from mapped reaction SMILES.

Parameters:
  • rxn – Reaction SMILES with mapping.

  • smarts – Whether the input is SMIRKS instead of reaction SMILES.

Returns:

Reaction SMILES without mapping.

rxn_insight.utils.remove_molecule_mapping(ring: Mol) → str

rxn_insight.utils.sanitize_mapped_reaction(rxn: str) → tuple[str, str, list[str]]

Remove unmapped reactants from the reaction.

Parameters:

rxn – Reaction SMILES with atom mapping.

Returns:

Mapped and unmapped reaction SMILES without reagents.

rxn_insight.utils.sanitize_ring(mol: Mol) → str

rxn_insight.utils.tag_reaction(rxn_info: dict[str, list[str] | str]) → str

Module contents