IsoSolve: Integrate, clarify and consolidate isotopic measurements by MS and/or NMR

1. General information

This notebook is part of the Supporting Information of the following publication:

IsoSolve: an integrative framework to improve isotopic coverage and consolidate isotopic measurements by MS and/or NMR. Millard et al., 2021, bioRxiv preprint

This notebook contains examples of IsoSolve usage and reproduces the analysis detailed in the publication, including all equations and Figures 3-6.

More information can be found at the IsoSolve git repository.


2. Prepare environment

Dependencies

The following Python packages are required:

These packages can be installed by running the following command in a terminal:

pip install --user X

where X is the package name.
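Before installing anything, it can help to check which packages are already importable. The snippet below is a minimal sketch using only the standard library; the helper `missing_packages` and the example list are hypothetical and should be replaced with the packages this notebook actually requires. Note that the name used with pip may differ from the import name for some packages.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list, for illustration only: replace with the required packages.
required = ["json", "math"]

missing = missing_packages(required)
if missing:
    print("Install with: pip install --user " + " ".join(missing))
else:
    print("All listed packages are available.")
```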

Load packages

Import Python packages.

Import IsoSolve.

Load functions to calculate metrics, create the measurements-isotopomers mapping matrix, and display a results summary.

3. Examples: Integrate specific datasets for Alanine

Path of the file containing the relationships between measurements and the isotopic space for alanine.
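To illustrate what such a mapping encodes, the sketch below enumerates the isotopomers of a three-atom backbone (alanine has three carbons) and groups them by the mass fraction they contribute to in an MS measurement of the intact molecule. This is not the format of the mapping file, which is not shown here; the function names and the '0'/'1' string notation are assumptions made for this example, and carbon is assumed to be the tracer element.

```python
from itertools import product


def isotopomers(n_atoms):
    """All labeling patterns as strings, e.g. '010' means only atom 2 is labeled."""
    return ["".join(bits) for bits in product("01", repeat=n_atoms)]


def mass_isotopologue_map(n_atoms):
    """Map each mass fraction 'M+k' to the isotopomers that contribute to it."""
    mapping = {f"M+{k}": [] for k in range(n_atoms + 1)}
    for iso in isotopomers(n_atoms):
        # The mass shift equals the number of labeled atoms in the pattern.
        mapping[f"M+{iso.count('1')}"].append(iso)
    return mapping


for name, isos in mass_isotopologue_map(3).items():
    print(name, "=", " + ".join(isos))
```

Each MS measurement is therefore a sum over several isotopomers, which is why combining it with position-specific NMR data can resolve species that neither method resolves alone.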

3.1. Integrate isotopic data collected by NMR (HSQC, Cα; TOCSY, Hα) to reproduce Equations 8-12 of the publication.

Integrate datasets (columns 1 and 4 of the measurements mapping file).

Display results.

3.2. Integrate isotopic data collected by NMR (HSQC, Cα; TOCSY, Hα) and MS (LC-MS, [M+H]+) to reproduce Equations 17-23 of the publication.

Integrate datasets (columns 1, 4 and 8 of the measurements mapping file).

Display equations.

3.3. Integrate isotopic data collected by NMR (HSQC, Cα and Cβ) and MS (LC-MS, ion [M+H]+) to reproduce Equations 31-38 of the publication.

Integrate datasets (columns 1, 2 and 8 of the measurements mapping file).

Display equations.

3.4. Integrate all datasets.

Integrate datasets (all columns of the measurements mapping file).

Display equations.

4. Analyze all combinations of datasets for Alanine

This code computes all combinations of isotopic datasets that can be collected for alanine and reproduces Figure 3 of the publication.

Combinations are evaluated based on the following metrics:

Path of the file containing the relationships between measurements and the isotopic space for alanine.

Define all combinations of methods.
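A generic way to enumerate every non-empty combination of datasets is sketched below with `itertools.combinations`. The dataset names are placeholders, and the count of eight datasets is an assumption based on the column numbers mentioned in this notebook; the ordering produced here is not necessarily the numbering used for the combinations elsewhere in the notebook.

```python
from itertools import combinations


def all_combinations(datasets):
    """Return every non-empty combination of datasets, ordered by increasing size."""
    combos = []
    for size in range(1, len(datasets) + 1):
        combos.extend(combinations(datasets, size))
    return combos


# Placeholder names (assumption): eight datasets.
datasets = [f"dataset_{i}" for i in range(1, 9)]
combos = all_combinations(datasets)

# n datasets give 2**n - 1 non-empty combinations.
print(len(combos))
```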

Integrate datasets for all combinations and calculate evaluation metrics.

Display results.

Plot results as a heatmap (Figure 3).

Number of combinations which do not bring any novel isotopic information.

Number of combinations with full isotopic coverage.

Save results to an Excel file.

5. Determine isotopic coverage for all amino acids

This code integrates isotopic measurements collected by NMR and MS for all amino acids and determines the number of isotopomers, cumomers and EMUs that can be quantified. It reproduces Figure 4 of the publication.

Experimental datasets can be easily defined in a dictionary to generate the isotopomers-measurements mapping dataframe. Dictionary keys are dataset names, and values are the measurements, which can be defined as:

List of accessible isotopic measurements for all amino acids.

Calculate isotopic coverage for all amino acids.

Notes:

Load results file and calculate isotopic coverage.

Plot results to generate Figure 4.

6. Overview of integration results from different combinations of datasets for Alanine

This code performs data integration for different combinations of isotopic measurements of Alanine collected by NMR and MS and calculates evaluation metrics for each combination. It reproduces Figure 5 of the publication.

Measurements datafile.

Define combinations of datasets.

Integrate measurements.

Display results.

Plot results.

7. Detailed analysis of integration results for Alanine

This code integrates isotopic measurements collected by NMR and MS for alanine and reproduces Figure 6 of the publication.

Measurements datafile.

Integrate datasets for combinations 41, 174 and 254.

Evaluate consistency of all datasets (combination 254).

Calculate theoretical distribution of isotopomers, cumomers and EMUs.
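The relationship between the three representations can be sketched as follows: a cumomer is the sum of all isotopomers labeled at a given set of positions, and an EMU is the mass distribution of a subset of atoms. The isotopomer fractions below are hypothetical values chosen for illustration, and the helper names `cumomer` and `emu` are assumptions for this example, not IsoSolve functions.

```python
def cumomer(iso_frac, pattern):
    """Sum the isotopomers labeled wherever pattern has '1' ('x' means any state)."""
    return sum(
        frac
        for iso, frac in iso_frac.items()
        if all(p != "1" or c == "1" for p, c in zip(pattern, iso))
    )


def emu(iso_frac, atoms):
    """Mass distribution [M+0, ..., M+n] of the atom subset `atoms` (0-based indices)."""
    dist = [0.0] * (len(atoms) + 1)
    for iso, frac in iso_frac.items():
        n_labeled = sum(iso[i] == "1" for i in atoms)
        dist[n_labeled] += frac
    return dist


# Hypothetical isotopomer fractions for a three-carbon molecule (sum to 1).
iso_frac = {
    "000": 0.40, "001": 0.10, "010": 0.10, "011": 0.05,
    "100": 0.10, "101": 0.05, "110": 0.05, "111": 0.15,
}

print("cumomer 1xx:", cumomer(iso_frac, "1xx"))
print("EMU of atoms 1-2:", emu(iso_frac, (0, 1)))
```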

Display metrics for combination 41 (HSQC + LC-MS).

8. Evaluate self-consistency of measurements

This code evaluates the self-consistency of different datasets to consolidate measurements and identify biased datasets.

Define mapping and measurements datafiles.

Integrate measurements and evaluate the consistency of the different datasets based on chi2 statistics.
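For readers unfamiliar with this kind of test, the sketch below shows a generic chi-square goodness-of-fit check. It is not IsoSolve's implementation: the exact statistic and degrees of freedom used by IsoSolve are not described here, and all numerical values, as well as the names `chi2_sf` and `consistency_test`, are hypothetical. The p-value is computed with the standard library only.

```python
import math


def chi2_sf(x, dof):
    """Survival function P(X > x) of a chi-square distribution with integer dof >= 1."""
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < dof / 2.0 - 1e-9:
        q += y**a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def consistency_test(measured, fitted, sd, n_free):
    """Return (chi2, dof, p-value) for measured vs fitted values with given SDs."""
    chi2 = sum(((m - f) / s) ** 2 for m, f, s in zip(measured, fitted, sd))
    dof = len(measured) - n_free
    return chi2, dof, chi2_sf(chi2, dof)


# Hypothetical values: four measurements, two free parameters.
fitted = [0.49, 0.31, 0.15, 0.05]
sd = [0.01] * 4
ok = consistency_test([0.50, 0.30, 0.15, 0.05], fitted, sd, n_free=2)
biased = consistency_test([0.50, 0.36, 0.15, 0.05], fitted, sd, n_free=2)
print("consistent data: p =", ok[2])
print("altered data:    p =", biased[2])
```

A p-value below the chosen threshold (0.05 in this notebook) indicates that the residuals are larger than the stated measurement errors can explain.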

Now we alter two measurements ('b' and 'c') in the measurements file.

Data integration indicates that the altered datasets are not self-consistent (p(chi2) < 0.05).

9. Generate isotopically-resolved InChIs

The IUPAC International Chemical Identifier (InChI) is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web.

The identifiers describe chemical substances in terms of layers of information. IsoSolve can generate an isotopic layer that specifies the isotopic species of the tracer element, following the extended representation proposed by the InChI Isotopologue and Isotopomer Development Team.

To generate isotopic layers of InChIs, use isosolve.main(..., inchi=True), as shown here for Alanine.

Define mapping and measurements datafiles.

Integrate measurements and display isotopic layers for isotopomers, cumomers, and EMUs.

When measurements are provided as input, the integration results are provided for each isotopic InChI.

When the system is partly undetermined, IsoSolve provides InChIs for all species and numerical values only for the identifiable species.