We are hunting for peptides that may have taken up heavy atom isotopes such
as 15N
. Since there can be anywhere from 0
to
maxIso
uptake of heavy atoms we need to search for a range
of mz
's for each peptide.
We try and fit a gaussian to the intensities of the peptide without isotopic enrichment
$$\begin{equation} \text{monoFitParams} \equiv \text{minimize}[\mu,\sigma, k, b] = \sum_i \left(k \times e^{-( \text{retention-time}_i - \mu)^2/2 \sigma^2} + b - \text{intensity}^{o}_i \right)^2 \end{equation}$$
where $`\text{intensity}^{o}_{i}`$ are the intensities found for the peptide with no isotopic enrichment. (Actually we fit to a smoothed version of the intensities)
We use quadrature to estimate the area under this curve.
$$\begin{equation} A_{\text{o}} = \int_{l_b=\max(\mu_{opt} - 2 * \sigma_{opt} , rt_0)}^{u_b=\min(\mu_{opt} + 2 * \sigma_{opt} , rt_n)} d\text{rt} \left( k_{opt} \times e^{-(\text{rt} - \mu_{opt})^2/2 \sigma_{opt}^2}+ b_{opt} \right) \end{equation}$$
which we can do analytically as: $$\begin{equation} A_o = \text{b}_\text{opt} \times (u_b - l_b) + k_\text{opt} \times \sigma_\text{opt}\times \sqrt{\frac{\pi}{2}} \left[\text{erf}\left(\frac{1}{\sqrt{2}}\frac{u_b - \mu_\text{opt}}{\sigma_\text{opt}}\right) - \text{erf}\left(\frac{1}{\sqrt{2}}\frac{l_b - \mu_\text{opt}}{\sigma_\text{opt}}\right) \right] \end{equation}$$
$`A_o`$ is stored as monoPeakArea
.
We now linearly fit the scatter plots of the heavy peptides with the base peptide.
$$\begin{equation} I^{\text{[eic]}}_{rt} \sim \alpha^{\text{[eic]}} + \beta^{\text{[eic]}} I^{\text{mono}}_{rt} \end{equation}$$
Note that the first value of isotopeEnvelopes
is based on an mz that
has negative enrichment (at least theorectically!) as a check. (See
the blue line in the plot at top.)
If we integrate this relationship on both sides over the retention time range we get an intensity relationship:
$$\begin{equation} \text{isotopeEnvelopes[eic]} = A^{\text{[eic]}} = \max(0, \alpha^{\text{[eic]}} \times( u_b - l_b) + A_{\text{o}} \times \beta^{\text{[eic]}}) \end{equation}$$
$`\alpha`$ is probably small due to the fact that there will be zero intensities on both sides at the margins.
There is of course a natural isotopic abundance for any peptide from all atom types
(not just the one we are interested in). The calculation of this is ingenious.
Since we can't measure which (say Carbon) atom in a peptide has a heavy atom, we
have a combinatorial problem. With $`a_j^{[C_n]}`$ as the abundance fraction
of isotope j
on (Carbon) atom n
we calculate the abundance
profile for all (Carbon) atoms on the peptide as (we assume zeros
outside the isotopic abundance range):
$$\begin{equation} \begin{split} \text{a}^{[C]}_i =& \sum_{i=j+k+l+\cdots+z} a^{[\text{C}_1]}_{j} a^{[\text{C}_2]}_k a^{[\text{C}_3]}_l \times \cdots \times a^{[\text{C}_N]}_z \\ \equiv& \sum_{k,l,\cdots,z} a^{[\text{C}_1]}_{i-k+l+\cdots+z} a^{[\text{C}_2]}_k a^{[\text{C}_3]}_l \times \cdots \times a^{[\text{C}_N]}_z \end{split} \end{equation}$$
(The abundances of each isotope of an element must sum to 1).
But this is a discrete convolution which, when (discrete) fourier transformed, becomes a "simple" product.
$$\begin{equation} \implies \mathcal{FFT}\{\text{a}^{[C]}\}_k = \left[\sum_j a_j^{[C]} e^{-\frac{2\pi i j k}{M}}\right]^N \end{equation}$$
So the full abundance profile (which we call ndist
) is the inverse FFT of this:
$$\begin{equation} \text{ndist}_i \approx \sum_{i=k+l+m+n+\dots} a^{[\text{C}]}_k a^{[\text{N}]}_l a^{[\text{S}]}_m a^{[\text{P}]}_n\dots \end{equation}$$
We use this a lot since we need to calculate it with elevated abundance levels for the isotope we are using in our experiment. We use the notation $`ndist_i^{E}`$ to identify the abundances found if the environmental abundance (of our heavy isotope is E).
We calculate isotopic abundance from natural abundance to (user specified) maximum labelled abundance
$$\begin{equation} a_i^\text{iso} = \text{natural-abundance} + \frac{i}{N}\times(\text{maximumLabelEnrichment} - \text{natural-abundance}) \quad\forall\; i \in [0,N] \end{equation}$$
Then we adjust abundances of the labelled element (so they always sum to 1!)
$$\begin{equation} \text{adjustedAbundance}_{k} = \left\{ \begin{array}{ll} (1-a^\text{iso}) \times \frac{\text{Abundance}_k}{\sum_{l\ne \text{iso}} \text{Abundance}_l} & \text{if}\; \text{iso} \ne k \\ a^\text{iso}\; & \text{if}\; \text{iso} = k\end{array}\right. \end{equation}$$
for each of these values we recalulate the ndist
array with these new abundances.
$$\begin{equation} \begin{array}{cc} & \begin{array}{ccc} \quad a_o & a_1 & \cdots & a_N \\ \end{array} \\ \begin{array}{r c c} \text{iso}_0 \\ \text{iso}_1 \\ \text{iso}_2 \\ \cdots \\ \text{iso}_{\text{isoMax}} \end{array} & \left[ \begin{array}{c c c} \left[\begin{array}{c} \color{red}{n} \\ \color{red}{d} \\ \color{red}{i} \\ \color{red}{s} \\ \color{red}{t} \end{array} \right] \left[\begin{array}{c} \color{green}{n} \\ \color{green}{d} \\ \color{green}{i} \\ \color{green}{s} \\ \color{green}{t} \end{array} \right] \;\cdots\; \left[\begin{array}{c} \color{blue}{n} \\ \color{blue}{d} \\ \color{blue}{i} \\ \color{blue}{s} \\ \color{blue}{t} \end{array} \right] \end{array} \right] \end{array} \begin{array}{c} \; \\ \times \mathbf{w} \end{array} \begin{array}{c} \; \\ \quad = \end{array} \begin{array}{c} \; \\ \quad \mathbf{A} \times \mathbf{w} \end{array} \begin{array}{c} \; \\ \quad \sim \end{array} \begin{array}{c} \; \\ \quad \mathbf{I}^{exp} \end{array} \end{equation}$$
With $`\mathbf{w} \ge 0`$ and $`\text{labelledEnvelopes} \equiv \mathbf{w}`$
If we turn the positive weights $`\mathbf{\hat{w}}`$ into fractions
$`f_i = w_i / \sum_{i=0}^{N} w_i`$ then
LPF
$`\equiv \text{relativeIsotopeAbundance}`$ (an estimate of the fraction of
this peptide that include some/any heavy atoms)
$$\begin{equation}
\text{relativeIsotopeAbundance} = 1 - f_0 = \sum_{i=1}^N f_i
\end{equation}$$
enrichment
is an estimate/average of the fraction of
heavy atoms (of the labelled type) in the peptide:
$$\begin{equation} \text{enrichment} = \langle a_i^{iso} \rangle = \sum_{i=0}^N f_i a_i^{iso} \ge 0 \end{equation}$$ or... $$\begin{equation} \text{enrichment} = (1-\lambda) \times \text{natural abundance} + \lambda \times \text{maximumLabelEnrichment} \quad \text{where}\; \lambda = \langle \frac{i}{N} \rangle \in [0,1] \end{equation}$$
where $`N`$ is the labelled element count of the peptide. With this
$`\text{enrichment}`$ we recalulate $`\text{ndist[enrichment]}`$ using this abundance of the isotope. Then we calculate the Pearson correlation coefficient: with $`x = \text{heavyDistribution}`$ and $`y = \text{ndist[enrichment]} \equiv \text{ndist}^E`$
$$\begin{equation} \text{heavyCor} = r = \frac{\sum (x - \bar{x}) (y - \bar{y})} {\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \end{equation}$$
$`\text{heavyDistribution}`$ is just a "normalized" version of $`\text{isotopeEnvelopes}`$: $$\begin{equation} \text{heavyDistribution} = \max(0, \text{isotopeEnvelopes} - A_o \frac{\text{ndist}}{\text{ndist[0]}}) \end{equation}$$
With E as enrichment
and relativeIsotopeAbundance
as LPF
we
calculate:
$$\begin{equation}
\text{theoreticalDist} = \mathbf{A} \times \mathbf{\hat{w}}
\end{equation}$$
naturalDist
$`A_o \times \text{ndist}`$ is plotted as orange
vertical dashed lines.theoreticalDist
is mirrored on the x-axis and plotted inverted as
.isotopeEnvelopes
is plotted as