rxn_insight package
Submodules
rxn_insight.classification module
Reaction classification module
- class rxn_insight.classification.ReactionClassifier(reaction: str, rxn_mapper: RXNMapper | None = None, keep_mapping: bool = False, search_template: bool = True)[source]
Bases:
object
This class handles operations related to chemical reaction classification.
- balance_reaction(fgr: list[str], fgp: list[str]) list[str] [source]
Balances the reaction based on functional groups present in reactants and products.
- Parameters:
fgr (list[str]) – Functional groups in reactants.
fgp (list[str]) – Functional groups in products.
- Returns:
A list of potential by-products or missing elements in the balanced reaction.
- Return type:
list[str]
- check_nos() bool [source]
Checks if nitrogen, oxygen, or sulfur atoms are involved in the reaction center.
- Returns:
True if N, O, or S atoms are involved in the reaction center; otherwise, False.
- Return type:
bool
- classify_reaction() str [source]
Classifies the reaction based on its chemical characteristics and transformation patterns.
- Returns:
The classification of the reaction, such as ‘Reduction’, ‘Oxidation’, etc.
- Return type:
str
- get_atom_mapping_indices() tuple[dict[int, int], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], int] [source]
Generates a mapping from atom indices to their positions in the transformation matrix.
- Returns:
Contains a dictionary for atom mapping to indices, arrays for atom numbers and mapping, and the matrix size.
- Return type:
tuple
- get_be_matrix(molecule: Mol) ndarray[Any, dtype[Any]] [source]
Calculates the bond-electron matrix for the given molecule.
- Parameters:
molecule (Mol) – The molecule for which to calculate the bond-electron matrix.
- Returns:
A matrix representing the bond-electron relationships in the molecule.
- Return type:
npt.NDArray[Any]
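As a rough illustration of what a bond-electron (BE) matrix holds, the sketch below builds one by hand for formaldehyde. It assumes the usual Dugundji-Ugi convention (bond orders off the diagonal, free valence electrons on the diagonal); `be_matrix` and its inputs are hypothetical names, and the code neither uses RDKit nor reproduces this method's implementation.

```python
def be_matrix(free_electrons, bonds):
    """Build a bond-electron matrix as a nested list (hypothetical helper).

    free_electrons: free valence electrons per atom (diagonal entries).
    bonds: (i, j, order) triples for bonded atom pairs (off-diagonal entries).
    """
    n = len(free_electrons)
    matrix = [[0] * n for _ in range(n)]
    for i, electrons in enumerate(free_electrons):
        matrix[i][i] = electrons
    for i, j, order in bonds:
        matrix[i][j] = order
        matrix[j][i] = order  # the matrix is symmetric
    return matrix


# Formaldehyde, H2C=O, with atoms ordered C, O, H, H.
# Carbon has no free electrons; oxygen keeps two lone pairs (4 electrons).
formaldehyde = be_matrix(
    free_electrons=[0, 4, 0, 0],
    bonds=[(0, 1, 2), (0, 2, 1), (0, 3, 1)],
)

# Under this convention each row sums to the atom's valence electron count.
row_sums = [sum(row) for row in formaldehyde]
```

Here the row sums come out as 4 for carbon, 6 for oxygen, and 1 for each hydrogen, which is a quick sanity check on a hand-built matrix.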
- get_functional_group_smarts(molecule: Mol, matrix: ndarray[Any, dtype[Any]], map_dict: dict[int, int]) tuple[str, ...] [source]
Identifies and returns SMARTS strings for functional groups in the molecule based on the specified matrix and mapping.
- Parameters:
molecule (Mol) – The RDKit molecule object.
matrix (npt.NDArray[Any]) – A matrix representing chemical properties or structure.
map_dict (dict[int, int]) – Mapping of atom indices to their corresponding mapping numbers in the reaction.
- Returns:
A tuple containing SMARTS strings of the identified functional groups.
- Return type:
tuple[str, …]
- get_functional_groups(mol: Mol, map_dict: dict[int, int], df: DataFrame) list[str] [source]
Extracts functional groups from the molecule using the specified mapping and reference DataFrame.
- Parameters:
mol (Mol) – The molecule from which to extract functional groups.
map_dict (dict[int, int]) – A dictionary mapping atom indices to mapping numbers.
df (pd.DataFrame) – DataFrame containing functional group definitions.
- Returns:
A list of names of identified functional groups.
- Return type:
list[str]
- get_reaction_center_info(df: DataFrame) dict[str, list[str] | str | int] [source]
Compiles detailed information about the reaction center from the reaction.
- Parameters:
df (pd.DataFrame) – DataFrame containing additional data required for analysis.
- Returns:
A dictionary containing detailed information about the reaction center.
- Return type:
dict[str, Union[list[str], str, int]]
- get_ring_type(mol: Mol, map_dict: dict[int, int] | None = None) list[str] [source]
Determines the types of ring structures present in the molecule.
- Parameters:
mol (Mol) – The molecule to analyze.
map_dict (Optional[dict[int, int]]) – Mapping of atom indices to their mapping numbers, if available.
- Returns:
A list of ring types identified in the molecule.
- Return type:
list[str]
- get_template_smiles() str | None [source]
Generates a reaction SMILES from the reaction SMARTS template.
- Returns:
The reaction SMILES of the reaction template, or None if no template is generated.
- Return type:
str | None
- is_acylation() bool [source]
Evaluates whether the reaction involves acylation, specifically focusing on the transformation around carbonyl groups.
- Returns:
True if the reaction involves acylation, otherwise False.
- Return type:
bool
- is_aromatic_heterocycle() bool [source]
Assesses whether the reaction involves the formation or modification of an aromatic heterocycle.
- Returns:
True if the reaction pertains to aromatic heterocycle changes, otherwise False.
- Return type:
bool
- is_cc_coupling() bool [source]
Checks if the reaction is a carbon-carbon coupling process.
- Returns:
True if the reaction involves carbon-carbon coupling, otherwise False.
- Return type:
bool
- is_deprotection() bool [source]
Evaluates whether the reaction is a deprotection, which involves the removal of protective groups from functional sites.
- Returns:
True if the reaction is a deprotection process, otherwise False.
- Return type:
bool
- is_fga() bool [source]
Determines if the reaction involves the addition of functional groups to the existing molecular framework.
- Returns:
True if the reaction is classified as functional group addition, otherwise False.
- Return type:
bool
- is_fgi() bool [source]
Determines if the reaction involves a functional group interconversion (FGI).
- Returns:
True if the reaction is classified as a functional group interconversion, otherwise False.
- Return type:
bool
- is_heteroatom_alkylation() bool [source]
Determines if the reaction involves alkylation of heteroatoms (N, O, S).
- Returns:
True if the reaction is a heteroatom alkylation, otherwise False.
- Return type:
bool
- is_oxidation() bool [source]
Checks if the reaction is an oxidation by examining changes in oxidation states and the involvement of key functional groups.
- Returns:
True if the reaction involves oxidation, otherwise False.
- Return type:
bool
- is_protection() bool [source]
Determines if the reaction is a protection, which involves adding protective groups to functional sites.
- Returns:
True if the reaction is classified as a protection process, otherwise False.
- Return type:
bool
- is_reduction() bool [source]
Determines if the reaction is a reduction process based on the change in oxidation states and functional group transformation.
- Returns:
True if the reaction can be classified as a reduction, otherwise False.
- Return type:
bool
- name_reaction(smirks_db: DataFrame) str [source]
Determines the name of the reaction from a database based on SMIRKS transformations.
- Parameters:
smirks_db (pd.DataFrame) – DataFrame containing SMIRKS patterns and corresponding reaction names.
- Returns:
The name of the reaction, or ‘OtherReaction’ if no specific name can be determined.
- Return type:
str
- remove_metals_and_halogens() tuple[ndarray[Any, dtype[Any]], ...] [source]
Removes metal and halogen atoms from the R-matrix, as they are generally not useful for reaction classification.
- Returns:
A tuple containing the sanitized transformation matrix, reaction center atoms, and mappings.
- Return type:
tuple[npt.NDArray[Any], …]
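The idea of dropping atoms from a reaction matrix can be sketched in plain Python. The example assumes the Dugundji-Ugi definition of the R-matrix (BE matrix of the products minus BE matrix of the reactants) and removes the rows and columns of halogen atoms. `r_matrix`, `drop_elements`, and the hand-written heavy-atom matrices are hypothetical; metals are left out of the excluded set for brevity, and the library's own bookkeeping may differ.

```python
HALOGENS = {"F", "Cl", "Br", "I"}  # illustrative set; metals omitted for brevity


def r_matrix(be_reactants, be_products):
    """Element-wise difference: products minus reactants."""
    return [
        [p - r for r, p in zip(row_r, row_p)]
        for row_r, row_p in zip(be_reactants, be_products)
    ]


def drop_elements(matrix, symbols, excluded):
    """Remove the rows and columns that belong to excluded elements."""
    keep = [i for i, symbol in enumerate(symbols) if symbol not in excluded]
    reduced = [[matrix[i][j] for j in keep] for i in keep]
    return reduced, [symbols[i] for i in keep]


# Heavy atoms only (hydrogens omitted) for CH3Br + OH- -> CH3OH + Br-.
symbols = ["C", "Br", "O"]
be_reactants = [[0, 1, 0], [1, 6, 0], [0, 0, 6]]
be_products = [[0, 0, 1], [0, 8, 0], [1, 0, 4]]

r = r_matrix(be_reactants, be_products)
reduced, kept = drop_elements(r, symbols, HALOGENS)
```

After the halogen is dropped, only the C-O bond formation and the change on oxygen remain in the reduced matrix.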
rxn_insight.database module
- class rxn_insight.database.Database(df: DataFrame | None = None)[source]
Bases:
object
A class to manage and analyze reaction datasets, providing functionalities for creating databases, analyzing reactions, and saving results.
Example
>>> from rxn_insight.database import Database
>>> import pandas as pd
>>> # Create a sample DataFrame
>>> data = {
...     "reaction": ["OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1"],
...     "solvent": ["CN(C)C=O"],
...     "reagent": ["F[Cs]"],
...     "catalyst": ["[Pd]"],
...     "yield": [85],
...     "reference": ["Ref1"]
... }
>>> df = pd.DataFrame(data)
>>> # Initialize a Database object
>>> db = Database()
>>> # Create a database from the DataFrame
>>> reaction_df = db.create_database_from_df(
...     df,
...     reaction_column="reaction",
...     solvent_column="solvent",
...     reagent_column="reagent",
...     catalyst_column="catalyst",
...     yield_column="yield",
...     ref_column="reference"
... )
- create_database_from_csv(fname: str, reaction_column: str, solvent_column: str = 'SOLVENT', reagent_column: str = 'REAGENT', catalyst_column: str = 'CATALYST', yield_column: str = 'YIELD', ref_column: str = 'REF') DataFrame [source]
Creates a reaction database from a CSV file.
- Parameters:
fname – Path to the CSV file.
reaction_column – Name of the column containing reaction SMILES.
solvent_column – Name of the solvent column (default: “SOLVENT”).
reagent_column – Name of the reagent column (default: “REAGENT”).
catalyst_column – Name of the catalyst column (default: “CATALYST”).
yield_column – Name of the yield column (default: “YIELD”).
ref_column – Name of the reference column (default: “REF”).
- Returns:
A DataFrame with analyzed reaction data.
- create_database_from_df(df: DataFrame, reaction_column: str, solvent_column: str = 'SOLVENT', reagent_column: str = 'REAGENT', catalyst_column: str = 'CATALYST', yield_column: str = 'YIELD', ref_column: str = 'REF') DataFrame [source]
Creates a reaction database from a given DataFrame.
- Parameters:
df – A DataFrame containing reaction data.
reaction_column – Name of the column containing reaction SMILES.
solvent_column – Name of the solvent column (default: “SOLVENT”).
reagent_column – Name of the reagent column (default: “REAGENT”).
catalyst_column – Name of the catalyst column (default: “CATALYST”).
yield_column – Name of the yield column (default: “YIELD”).
ref_column – Name of the reference column (default: “REF”).
- Returns:
A DataFrame with analyzed reaction data.
- get_class_distribution()[source]
Retrieves the class distribution of reactions in the database.
- Returns:
A DataFrame summarizing the reaction class distribution.
- get_name_distribution()[source]
Retrieves the distribution of reaction names in the database.
- Returns:
A DataFrame summarizing the reaction name distribution.
- save_to_csv(fname: str)[source]
Saves the reaction database to a CSV file.
- Parameters:
fname – The name of the output file (without extension).
- rxn_insight.database.analyze_reactions(df: DataFrame) Tuple[DataFrame, List[str]] [source]
Analyzes a DataFrame of reactions to extract detailed information.
- Parameters:
df – A DataFrame with reaction data.
- Returns:
A tuple containing the updated DataFrame and a list of skipped reactions.
rxn_insight.reaction module
Reaction module
- class rxn_insight.reaction.Molecule(smi: str)[source]
Bases:
object
Represents a molecule constructed from a SMILES string.
- get_functional_groups(df: DataFrame = None) list[str] [source]
Identifies and returns the functional groups present in the molecule.
- Parameters:
df – Optional DataFrame containing functional group patterns; loads default if not provided.
- search_reactions(df: DataFrame) DataFrame [source]
Searches for reactions involving the molecule as a product.
- Parameters:
df – The DataFrame to search for reactions.
- search_reactions_by_scaffold(df: DataFrame, threshold: float = 0.5, max_return: int = 100, fp: str = 'MACCS') DataFrame [source]
Searches for reactions based on scaffold similarity.
- Parameters:
df – DataFrame containing reactions to search.
threshold – Similarity threshold to apply.
max_return – Maximum number of reactions to return.
fp – Type of fingerprint to use for similarity calculation.
- class rxn_insight.reaction.Reaction(reaction: str, solvent: str = '', reagent: str = '', catalyst: str = '', ref: str = '', rxn_mapper: RXNMapper | None = None, keep_mapping: bool = False, smirks: DataFrame = None, fg: DataFrame = None, search_template: bool = True)[source]
Bases:
object
Handles operations related to chemical reactions.
This class facilitates various operations on chemical reactions, such as parsing reaction strings, identifying components like solvents and reagents, classifying reactions, and analyzing ring structures.
- reaction
The SMILES representation of the reaction.
- Type:
str
- solvent
Solvents used in the reaction.
- Type:
str
- reagent
Reagents used in the reaction.
- Type:
str
- catalyst
Catalysts used in the reaction.
- Type:
str
- reference
Reference or note associated with the reaction.
- Type:
str
- smirks_db
Database of SMIRKS transformations.
- Type:
pd.DataFrame
- fg_db
Functional group data.
- Type:
pd.DataFrame
- classifier
Reaction classification object.
- Type:
ReactionClassifier
- reactants
SMILES string of the reactants.
- Type:
str
- products
SMILES string of the products.
- Type:
str
- mapped_reaction
Reaction with atom mappings included.
- Type:
str
- reaction_class
Class of the reaction.
- Type:
str
- template
Reaction template derived from the classifier.
- Type:
str
- reaction_info
Additional information about the reaction.
- Type:
dict
- tag
Optional tag for the reaction.
- Type:
str
- name
Optional name of the reaction.
- Type:
str
- byproducts
Tuple of byproducts in the reaction.
- Type:
tuple
- scaffold
Molecular scaffold of the reaction.
- Type:
str
- neighbors
Placeholder for reaction neighborhood information.
- Type:
Any
- suggested_solvent
Suggested solvent for the reaction.
- Type:
str
- suggested_catalyst
Suggested catalyst for the reaction.
- Type:
str
- suggested_reagent
Suggested reagent for the reaction.
- Type:
str
Example
>>> from rxn_insight.reaction import Reaction
>>> rxn = Reaction("OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1")
>>> ri = rxn.get_reaction_info()
>>> print(ri)
{'REACTION': 'Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1', 'MAPPED_REACTION': 'Br[c:5]1[cH:6][cH:7][cH:8][cH:9][cH:10]1.OB(O)[c:4]1[cH:3][cH:2][cH:1][cH:12][cH:11]1>>[cH:1]1[cH:2][cH:3][c:4](-[c:5]2[cH:6][cH:7][cH:8][cH:9][cH:10]2)[cH:11][cH:12]1', 'N_REACTANTS': 2, 'N_PRODUCTS': 1, 'FG_REACTANTS': ['Aromatic halide', 'Boronic acid'], 'FG_PRODUCTS': [], 'PARTICIPATING_RINGS_REACTANTS': ['c1ccccc1', 'c1ccccc1'], 'PARTICIPATING_RINGS_PRODUCTS': ['c1ccccc1', 'c1ccccc1'], 'ALL_RINGS_PRODUCTS': ['c1ccccc1', 'c1ccccc1'], 'BY-PRODUCTS': ['HBr', 'B'], 'CLASS': 'C-C Coupling', 'TAG': 'd79a78c79f0c392f0911481acf5c300cc98205269acdb93c24fb610a61c4c868', 'SOLVENT': [''], 'REAGENT': [''], 'CATALYST': [''], 'REF': '', 'NAME': 'Suzuki coupling with boronic acids', 'SCAFFOLD': 'c1ccc(-c2ccccc2)cc1'}
- find_neighbors(df: DataFrame, fp: str = 'MACCS', concatenate: bool = True, max_return: int = 100, threshold: float = 0.3, broaden: bool = False, full_search: bool = False) DataFrame [source]
Finds and returns similar reactions in the database.
- Parameters:
df – The DataFrame to search within.
fp – The type of fingerprint to use, ‘MACCS’ or ‘Morgan’.
concatenate – Whether to concatenate patterns in fingerprinting.
max_return – Maximum number of similar reactions to return.
threshold – The similarity threshold to consider for matching.
broaden – Whether to use broadened search criteria based on tags.
full_search – If true, performs an exhaustive search across the database.
Example
>>> import pandas as pd
>>> from rxn_insight.reaction import Reaction
>>> # Download: https://zenodo.org/records/10171745
>>> df_uspto = pd.read_parquet("uspto_rxn_insight.gzip")
>>> rxn = Reaction("OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1")
>>> df_neighbors = rxn.find_neighbors(df_uspto)
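How `threshold` and `max_return` might interact can be sketched without the library: keep candidates whose similarity reaches the threshold, rank them, and cut the list. `select_neighbors` and the score dictionary are hypothetical, and whether the library compares with `>=` or `>` is an assumption here.

```python
def select_neighbors(scores, threshold=0.3, max_return=100):
    """Filter candidates by similarity, then keep the best `max_return`."""
    hits = [(name, score) for name, score in scores.items() if score >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)  # most similar first
    return hits[:max_return]


# Hypothetical similarity scores of database reactions to a query reaction.
scores = {"rxn_a": 0.92, "rxn_b": 0.15, "rxn_c": 0.47, "rxn_d": 0.30}
top_two = select_neighbors(scores, threshold=0.3, max_return=2)
```

With these inputs `rxn_b` is filtered out by the threshold and `rxn_d` is cut by `max_return`.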
- get_byproducts() list[str] [source]
Calculates and returns byproducts of the reaction based on functional group analysis.
- get_functional_groups() tuple[list[str], ...] [source]
Identifies and returns functional groups in reactants and products.
- get_reaction_info() dict[str, list[str] | str] [source]
Compiles all reaction-related information at once. Calling this function calculates the T-matrix of the reaction, assigns a class and a name, and determines the functional groups, rings, and scaffold of the reaction. All information is returned as a dictionary.
- get_rings_in_products() list[str] [source]
Identifies and returns ring structures in the reaction products.
- get_rings_in_reactants() list[str] [source]
Identifies and returns ring structures in the reactants.
- get_rings_in_reaction_center() tuple[list[str], ...] [source]
Identifies and returns rings in the reaction center for reactants and products.
- give_broad_tag() str [source]
Generates a broadened tag for the reaction based on its characteristics.
- read_reaction(reaction: str) None [source]
Processes a reaction string in SMILES format.
- Parameters:
reaction (str) – Reaction string in SMILES format, with components separated by >.
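The `>`-separated layout of a reaction SMILES can be illustrated with a small parser. This is only a sketch of the string format (reactants, agents, and products, each a `.`-separated list); `split_reaction_smiles` is a hypothetical helper and not the parsing code used by `read_reaction`.

```python
def split_reaction_smiles(reaction):
    """Split 'reactants>agents>products' into three lists of SMILES."""
    parts = reaction.split(">")
    if len(parts) != 3:
        raise ValueError(f"Expected exactly two '>' separators, got: {reaction!r}")
    # Each part is a '.'-separated list of molecules; an empty part gives [].
    reactants, agents, products = (
        [smiles for smiles in part.split(".") if smiles] for part in parts
    )
    return reactants, agents, products


reactants, agents, products = split_reaction_smiles(
    "OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1"
)
```

For the Suzuki coupling used throughout these docs, the middle (agents) section is empty, so the string contains `>>`.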
- suggest_conditions(df: DataFrame) dict[str, DataFrame] [source]
Suggests reaction conditions based on similar reactions found.
- Parameters:
df – The DataFrame containing reaction data to analyze.
Example
>>> import pandas as pd
>>> from rxn_insight.reaction import Reaction
>>> # Download: https://zenodo.org/records/10171745
>>> df_uspto = pd.read_parquet("uspto_rxn_insight.gzip")
>>> rxn = Reaction("OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1")
>>> df_conditions = rxn.suggest_conditions(df_uspto)
rxn_insight.representation module
- rxn_insight.representation.get_morgan_fingerprint(mol: Mol) ndarray[Any, dtype[Any]] [source]
Get the ECFP4 fingerprint of a molecule.
- Parameters:
mol – RDKit Mol object.
- Returns:
NumPy array.
- rxn_insight.representation.morgan_reaction_fingerprint(rxn: str) ndarray[Any, dtype[Any]] [source]
Obtain the Morgan-based fingerprint of a reaction à la Schneider: https://doi.org/10.1021/ci5006614
- Parameters:
rxn – Reaction SMILES.
- Returns:
NumPy array.
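The core idea behind a difference-style reaction fingerprint can be sketched with feature counts: sum the features of the products and subtract those of the reactants, so unchanged substructures cancel. The feature labels and `difference_fingerprint` below are hypothetical stand-ins for hashed Morgan environments, and details of the cited method (such as agent weighting or folding to a fixed length) are not reproduced.

```python
from collections import Counter


def difference_fingerprint(reactant_features, product_features):
    """Return product feature counts minus reactant feature counts."""
    reactants = Counter()
    for features in reactant_features:
        reactants.update(features)
    products = Counter()
    for features in product_features:
        products.update(features)
    difference = {}
    for key in set(reactants) | set(products):
        delta = products[key] - reactants[key]
        if delta != 0:  # unchanged features cancel out
            difference[key] = delta
    return difference


# Hypothetical feature labels for a Suzuki-type coupling.
diff = difference_fingerprint(
    reactant_features=[["aryl", "C-Br"], ["aryl", "C-B"]],
    product_features=[["aryl", "aryl", "C-C"]],
)
```

Only the features that are created or destroyed survive, which is why such fingerprints focus on the transformation instead of the spectator parts of the molecules.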
rxn_insight.utils module
- rxn_insight.utils.calculate_braycurtis_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Bray-Curtis similarity between two vectors.
The underlying Bray-Curtis distance is the sum of absolute element-wise differences divided by the sum of absolute element-wise sums.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Bray-Curtis similarity.
- Return type:
float
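A minimal pure-Python sketch of this metric follows, assuming the similarity is defined as 1 minus the Bray-Curtis distance (the text above does not state this explicitly). `braycurtis_similarity` is a hypothetical name and the function does not guard against a zero denominator.

```python
def braycurtis_similarity(v1, v2):
    """1 - sum(|a - b|) / sum(|a + b|); assumes the denominator is non-zero."""
    numerator = sum(abs(a - b) for a, b in zip(v1, v2))
    denominator = sum(abs(a + b) for a, b in zip(v1, v2))
    return 1.0 - numerator / denominator


# Count vectors: differences sum to 4, element-wise sums total 6.
bc = braycurtis_similarity([1, 0, 2], [1, 2, 0])
```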
- rxn_insight.utils.calculate_canberra_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Canberra similarity between two vectors.
The Canberra similarity is based on the sum of absolute differences scaled by the sum of vector elements. It emphasizes differences in smaller values.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Canberra similarity (1 - Canberra distance).
- Return type:
float
- rxn_insight.utils.calculate_chebyshev_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Chebyshev similarity between two vectors.
The Chebyshev similarity is derived from the maximum absolute difference between elements in the two vectors.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Chebyshev similarity (1 - Chebyshev distance).
- Return type:
float
- rxn_insight.utils.calculate_correlation_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Correlation similarity between two vectors.
The Correlation similarity measures the linear relationship between vectors, normalized to account for scale differences.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Correlation similarity (1 - Correlation distance).
- Return type:
float
- rxn_insight.utils.calculate_cosine_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Cosine similarity between two vectors.
The Cosine similarity measures the cosine of the angle between two vectors, indicating their directional similarity.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Cosine similarity (1 - Cosine distance).
- Return type:
float
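For reference, cosine similarity can be written directly from its definition, the dot product divided by the product of the vector norms. The sketch below is a standalone illustration with a hypothetical function name; it assumes neither vector is all zeros.

```python
import math


def cosine_similarity(v1, v2):
    """Dot product divided by the product of Euclidean norms."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


parallel = cosine_similarity([1, 2, 3], [2, 4, 6])   # same direction
orthogonal = cosine_similarity([1, 0], [0, 1])       # perpendicular
```

Because only direction matters, scaling either vector leaves the result unchanged.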
- rxn_insight.utils.calculate_dice_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Dice similarity between two vectors.
Dice similarity is used for binary vectors and measures overlap between sets.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Dice similarity (1 - Dice distance).
- Return type:
float
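For binary vectors, the Dice coefficient is twice the number of shared on-bits divided by the total number of on-bits in both vectors. The following sketch uses a hypothetical function name and assumes at least one bit is set.

```python
def dice_similarity(v1, v2):
    """2 * |A and B| / (|A| + |B|) for binary vectors."""
    shared = sum(1 for a, b in zip(v1, v2) if a and b)
    total = sum(1 for a in v1 if a) + sum(1 for b in v2 if b)
    return 2 * shared / total


# Two shared on-bits, three on-bits in each vector.
dice = dice_similarity([1, 1, 0, 1], [1, 0, 1, 1])
```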
- rxn_insight.utils.calculate_euclidean_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Euclidean similarity between two vectors.
The Euclidean similarity is derived from the straight-line distance between two points in multidimensional space.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Euclidean similarity (1 - Euclidean distance).
- Return type:
float
- rxn_insight.utils.calculate_jaccard_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Jaccard similarity between two vectors.
The Jaccard similarity measures the proportion of shared elements in two sets. It is commonly used for binary vectors, indicating how similar the sets are.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Jaccard similarity (1 - Jaccard distance).
- Return type:
float
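The set-based definition of Jaccard similarity translates into a few lines of Python: count positions where both bits are set and divide by positions where at least one is set. This is an illustrative sketch with a hypothetical function name and assumes at least one bit is set.

```python
def jaccard_similarity(v1, v2):
    """|A and B| / |A or B| for binary vectors."""
    shared = sum(1 for a, b in zip(v1, v2) if a and b)
    either = sum(1 for a, b in zip(v1, v2) if a or b)
    return shared / either


# One shared on-bit out of three positions that are set in either vector.
jaccard = jaccard_similarity([1, 1, 0, 0], [1, 0, 1, 0])
```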
- rxn_insight.utils.calculate_kulczynksi1_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Kulczynski 1 similarity between two vectors.
Kulczynski similarity measures the average overlap between two sets.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Kulczynski 1 similarity.
- Return type:
float
- rxn_insight.utils.calculate_manhattan_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Manhattan similarity between two vectors.
The Manhattan similarity (or city block similarity) is the sum of the absolute differences between elements in the vectors.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Manhattan similarity (1 - Manhattan distance).
- Return type:
float
- rxn_insight.utils.calculate_minkowski_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]], p: float = 3.0) float [source]
Calculate the Minkowski similarity between two vectors.
The Minkowski similarity generalizes the Manhattan and Euclidean similarities using a parameter p to determine the metric.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
p (float) – The order of the Minkowski metric (default: 3.0).
- Returns:
Minkowski similarity (1 - Minkowski distance).
- Return type:
float
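The role of `p` is easiest to see on the distance itself: `p = 1` gives the Manhattan distance and `p = 2` the Euclidean distance. The sketch below computes only the Minkowski distance with a hypothetical helper. Note that, taking the documented "1 - distance" at face value, vectors that are far apart would yield a similarity below zero.

```python
def minkowski_distance(v1, v2, p=3.0):
    """(sum(|a - b| ** p)) ** (1 / p)."""
    return sum(abs(a - b) ** p for a, b in zip(v1, v2)) ** (1.0 / p)


v1, v2 = [0, 0], [3, 4]
manhattan = minkowski_distance(v1, v2, p=1)   # |3| + |4|
euclidean = minkowski_distance(v1, v2, p=2)   # sqrt(9 + 16)
order_three = minkowski_distance(v1, v2)      # default p = 3.0
```

As `p` grows, the result moves from the Manhattan value toward the largest single coordinate difference (the Chebyshev distance).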
- rxn_insight.utils.calculate_rogerstanimoto_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Rogers-Tanimoto similarity between two vectors.
This metric evaluates similarity for binary vectors based on mismatched proportions.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Rogers-Tanimoto similarity.
- Return type:
float
- rxn_insight.utils.calculate_russellrao_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Russell-Rao similarity between two vectors.
Russell-Rao similarity evaluates the proportion of matching 1’s in binary vectors.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Russell-Rao similarity.
- Return type:
float
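Binary metrics such as this one are usually expressed through four contingency counts. The sketch below tallies them and computes the Russell-Rao value as the share of positions where both vectors are 1, which matches the description above; the helper names are hypothetical.

```python
def contingency_counts(v1, v2):
    """Return (both_on, only_first, only_second, both_off) for binary vectors."""
    both_on = only_first = only_second = both_off = 0
    for a, b in zip(v1, v2):
        if a and b:
            both_on += 1
        elif a:
            only_first += 1
        elif b:
            only_second += 1
        else:
            both_off += 1
    return both_on, only_first, only_second, both_off


def russellrao_similarity(v1, v2):
    """Matching 1's divided by the vector length."""
    both_on, only_first, only_second, both_off = contingency_counts(v1, v2)
    return both_on / (both_on + only_first + only_second + both_off)


counts = contingency_counts([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
rr = russellrao_similarity([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
```

Unlike Jaccard, the denominator includes positions where both bits are off, so long sparse fingerprints give small values.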
- rxn_insight.utils.calculate_sokalmichener_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Sokal-Michener similarity between two vectors.
This metric is used for binary data, emphasizing matching 1’s and 0’s equally.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Sokal-Michener similarity.
- Return type:
float
- rxn_insight.utils.calculate_sokalsneath_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Sokal-Sneath similarity between two vectors.
Sokal-Sneath similarity emphasizes 1’s in binary data.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Sokal-Sneath similarity.
- Return type:
float
- rxn_insight.utils.calculate_yule_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]]) float [source]
Calculate the Yule similarity between two vectors.
- Parameters:
v1 (npt.NDArray[Any]) – First vector.
v2 (npt.NDArray[Any]) – Second vector.
- Returns:
Yule similarity.
- Return type:
float
- rxn_insight.utils.check_rings(atom: Atom, mol: Mol, match: list[int]) tuple[bool, list[int]] [source]
- rxn_insight.utils.curate_smirks(df: DataFrame) DataFrame [source]
Convert the SMIRKS database to the required format.
- Parameters:
df – Pandas DataFrame.
- Returns:
Curated SMIRKS database.
- rxn_insight.utils.draw_chemical_reaction(smiles: str, highlightByReactant: bool = False, font_scale: float = 1.5) str [source]
- rxn_insight.utils.extract_from_reaction(reaction: dict[str, str | int], radius_reactants: int = 2, radius_products: int = 1) dict[str, str | int] [source]
Extract the reaction template from mapped reaction SMILES. Code adapted from https://doi.org/10.1021/acs.jcim.9b00286.
- Parameters:
reaction – Dictionary with keys ‘reactants’ and ‘products’.
radius_reactants – Radius of atoms around the reaction center in the reactants.
radius_products – Radius of atoms around the reaction center in the products.
- Returns:
Dictionary with template information.
- rxn_insight.utils.get_atom_mapping(rxn: str, rxn_mapper: RXNMapper | None = None) str [source]
Maps reactants and products using RXNMapper (https://doi.org/10.1126/sciadv.abe4166).
- Parameters:
rxn – Reaction SMILES without atom mapping.
rxn_mapper – RXNMapper object.
- Returns:
Reaction SMILES with atom mapping.
- rxn_insight.utils.get_fp(rxn: str, fp: str = 'MACCS', concatenate: bool = True) ndarray[Any, dtype[Any]] [source]
- rxn_insight.utils.get_reaction_template(reaction: str, radius_reactants: int = 2, radius_products: int = 2) str | None [source]
Get the reaction template from a mapped reaction.
- Parameters:
reaction – Mapped reaction SMILES.
radius_reactants – Radius of atoms around the reaction center in the reactants.
radius_products – Radius of atoms around the reaction center in the products.
- Returns:
Reaction template in forward direction.
- rxn_insight.utils.get_ring_systems(mol: Mol, include_spiro: bool = False) list[list[int]] [source]
Code taken from https://gist.github.com/greglandrum/de1751a42b3cae54011041dd67ae7415
- Parameters:
mol – RDKit Mol object.
include_spiro – Whether to include spiro-linked rings.
- Returns:
List with atoms that make up the ring systems.
- rxn_insight.utils.get_scaffold(mol: Mol) str | None [source]
Get the Murcko scaffold of a molecule.
- Parameters:
mol – RDKit Mol object.
- Returns:
SMILES string.
- rxn_insight.utils.get_similarity(v1: ndarray[Any, dtype[Any]], v2: ndarray[Any, dtype[Any]], metric: str = 'jaccard') float [source]
Calculate the similarity between two fingerprints using a specified metric.
Supported metrics include: - Binary metrics: jaccard, dice, kulczynski1, rogerstanimoto, russellrao, sokalmichener, sokalsneath, yule. - Distance-based metrics: braycurtis, canberra, chebyshev, manhattan, correlation, cosine, euclidean, minkowski.
- Parameters:
v1 (npt.NDArray[Any]) – Reference fingerprint as a NumPy array.
v2 (npt.NDArray[Any]) – Fingerprint to compare as a NumPy array.
metric (str) – Metric to calculate the similarity. Default is “jaccard”.
- Returns:
Calculated similarity score.
- Return type:
float
- Raises:
ValueError – If an unsupported metric is provided.
Example
>>> import numpy as np
>>> from rxn_insight.utils import get_similarity
>>> v1 = np.array([1, 0, 1, 1])
>>> v2 = np.array([1, 1, 1, 0])
>>> get_similarity(v1, v2, metric="jaccard")
0.6666666666666667
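The metric lookup and the documented `ValueError` for unsupported names can be illustrated with a small dispatch table. This is a sketch of one possible design, not the library's code: `METRICS`, `similarity`, and the two pure-Python metric functions are hypothetical, and only two of the supported metrics are included.

```python
def jaccard(v1, v2):
    """Shared on-bits divided by on-bits present in either vector."""
    shared = sum(1 for a, b in zip(v1, v2) if a and b)
    either = sum(1 for a, b in zip(v1, v2) if a or b)
    return shared / either


def dice(v1, v2):
    """Twice the shared on-bits divided by the total on-bits."""
    shared = sum(1 for a, b in zip(v1, v2) if a and b)
    total = sum(1 for a in v1 if a) + sum(1 for b in v2 if b)
    return 2 * shared / total


# Hypothetical registry mapping metric names to functions.
METRICS = {"jaccard": jaccard, "dice": dice}


def similarity(v1, v2, metric="jaccard"):
    """Look up the metric by name and apply it to the two vectors."""
    try:
        func = METRICS[metric]
    except KeyError:
        raise ValueError(f"Unsupported metric: {metric!r}") from None
    return func(v1, v2)


default_score = similarity([1, 1, 0, 0], [1, 0, 1, 0])
dice_score = similarity([1, 1, 0, 0], [1, 0, 1, 0], metric="dice")
```

A table like this keeps the list of supported metric names in one place, so the error message and the lookup cannot drift apart.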
- rxn_insight.utils.make_rdkit_fp(rxn: str, fp: str = 'MACCS', concatenate: bool = True) str [source]
- rxn_insight.utils.remove_atom_mapping(rxn: str, smarts: bool = False) str [source]
Removes the atom mapping from mapped reaction SMILES.
- Parameters:
rxn – Reaction SMILES with mapping.
smarts – Set to True if rxn is a SMIRKS pattern instead of reaction SMILES.
- Returns:
Reaction SMILES without mapping.
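At the string level, atom-map numbers are the `:n` suffix inside bracket atoms, so a regular expression can strip them. The sketch below is a text-only approximation with a hypothetical helper name: it leaves the square brackets in place and does not canonicalize the result, which a cheminformatics toolkit would normally do.

```python
import re

# Matches ':<digits>' only when it directly precedes a closing bracket.
_MAP_NUMBER = re.compile(r":\d+(?=\])")


def strip_atom_map_numbers(rxn):
    """Remove ':n' atom-map labels from bracket atoms in a reaction string."""
    return _MAP_NUMBER.sub("", rxn)


unmapped = strip_atom_map_numbers("[CH3:1][OH:2]>>[CH2:1]=[O:2]")
```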