Manual#
Conventions#
Coordinates#
Our choice of Cartesian coordinate system is the same as that of the DICOM standard: \(x\) points from patient right to patient left, \(y\) points from anterior to posterior, and \(z\) points from inferior to superior.
A positive scanner angle \(\beta\) in the DICOM standard is defined as a counterclockwise rotation of angle \(\beta\) from 12 o’clock when looking into the scanner. When compared to the standard azimuthal angle of our defined Cartesian coordinate system, it follows that \(\phi = 3 \pi / 2 - \beta\). Since \(\beta\) is defined in the range \([0, 2\pi]\), it follows that \(\phi\) is defined in the range \([-\pi/2, 3\pi/2]\).
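A quick sketch of this conversion (`beta_to_phi` is a hypothetical helper for illustration, not part of the PyTomography API):

```python
import numpy as np

def beta_to_phi(beta):
    """Convert a DICOM scanner angle beta (radians) to the azimuthal
    angle phi of the Cartesian system defined above."""
    return 3 * np.pi / 2 - beta

print(beta_to_phi(0.0))        # beta = 0 (12 o'clock) -> phi = 3*pi/2
print(beta_to_phi(np.pi / 2))  # beta = 90 degrees -> phi = pi
```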
There are two primary coordinate systems considered here:
- Cartesian: Specified by the \(x\), \(y\), and \(z\) coordinates above. Any item in this coordinate system is referred to as an Object.
- Sinogram: Specified by \(r\), \(\beta\), and \(z\). Sinogram space is used to represent a series of 2D scans (in the \(r\)-\(z\) plane) at different angles \(\beta\). Any item in this coordinate system is referred to as an Image.
As a convention, \(r\) is aligned with the \(x\)-axis at \(\beta=0\). (Note this implies that \(r\) is aligned with the negative \(y\)-axis at \(\beta=90^{\circ}\), which can be counterintuitive when viewing images.)
Arrays#
Objects and images are stored using PyTorch tensors. The dimensions of objects are \([B, L_x, L_y, L_z]\) and the dimensions of images are \([B, L_{\theta}, L_{r}, L_z]\), where \(B\) specifies a batch size dimension. When reconstructing multiple objects, using batches may reduce computational time. Batches can also be used to store images taken at different energy windows in SPECT.
The index of an object tensor gives a corresponding voxel in object space; the index of an image tensor gives a corresponding pixel in image space. Indices are arranged such that smaller indices correspond to smaller coordinate values. For example, object_tensor[0,-1,0,0] gives the voxel at the largest value of \(x\) and the smallest values of \(y\) and \(z\). As another example, image_tensor[0,10,0,-1] gives the pixel at the 10th detector angle corresponding to the smallest value of \(r\) and the largest value of \(z\).
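A minimal sketch of these conventions, assuming the shapes described above (the tensors here are hypothetical stand-ins for real data):

```python
import torch

# Hypothetical object: batch size 1, 128x128x128 voxel grid [B, Lx, Ly, Lz]
object_tensor = torch.rand((1, 128, 128, 128))

# Voxel at the largest value of x and the smallest values of y and z
voxel = object_tensor[0, -1, 0, 0]

# Hypothetical image: 64 angles, each a 128x128 scan in the r-z plane
image_tensor = torch.rand((1, 64, 128, 128))  # [B, L_theta, L_r, L_z]

# Pixel at the 10th detector angle, smallest r, largest z
pixel = image_tensor[0, 10, 0, -1]
```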
When configuring data for reconstruction in this software, it is important that all objects and projections are aligned properly along the various axes. When importing your own data, this can be achieved through a combination of transposing and inverting axes. For example, a plot of a projection at \(\beta=90^{\circ}\) with \(r\) as the horizontal axis and \(z\) as the vertical axis should have the patient looking to the right (see coordinate system above).
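A sketch of such axis manipulation, assuming imported data that happens to arrive in \([L_z, L_y, L_x]\) order (the operations required depend entirely on your data source):

```python
import torch

raw = torch.rand((128, 128, 128))      # assumed [L_z, L_y, L_x] input layout

# Reorder axes to [L_x, L_y, L_z] and add a batch dimension
obj = raw.permute(2, 1, 0).unsqueeze(0)

# If an axis increases in the wrong direction, invert it (here: z)
obj = torch.flip(obj, dims=[3])
```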
Mathematics#
Throughout tutorials and documentation, mathematical notation is often used to represent different operations. In this section we define some of the notation. Unless otherwise specified, the symbols refer to the following:
\(f\) refers to an object, and \(f_i\) refers to the value of the object at voxel \(i\)
\(g\) refers to an image, and \(g_j\) refers to the value of the image at detector element \(j\)
\(H\) refers to the system matrix, and \(H_{ji}\) refers to its components: \(H_{ji}\) quantifies the contribution of voxel \(i\) in object space to detector element \(j\) in image space
Note that \(f\) is still represented by a four dimensional tensor, as specified in the previous section (as is \(g\)). The system matrix is never explicitly represented by a tensor: it is very large, and there are tricks we can use to simulate the operation of the system matrix without actually using matrix operations.
Mathematical Foundations#
This section establishes a mathematical paradigm for tomography in medical imaging, and is thus mostly intended for those who wish to use PyTomography to implement novel reconstruction algorithms. It is nonetheless useful background for all users.
Projections#
PyTomography is built around two fundamental operations used in image reconstruction: Forward Projection and Back Projection.
- Forward Projection: Takes something in object space \(\mathbb{U}\) and converts it to something in image space \(\mathbb{V}\) using the system matrix: \(b_j = \sum_{i} H_{ji} a_i\) (or \(b = Ha\)). This operation is implemented in the class ForwardProjectionNet from pytomography.projections.
- Back Projection: Takes something in image space \(\mathbb{V}\) and converts it to something in object space \(\mathbb{U}\) using the system matrix: \(a_i' = \frac{1}{\sum_j H_{ji}}\sum_{j} H_{ji} b_j\) (or \(a'=\frac{1}{H^T \vec{1}}H^T b\)). This operation is implemented in the class BackProjectionNet from pytomography.projections.
It’s worth noting that \(a_i\) and \(b_j\) don’t have to represent physical objects or images. In the case of the OSEM algorithm, it is the ratio of two images that is back projected: such a ratio does not represent a physical image.
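To make these definitions concrete, here is a toy dense-matrix sketch of both operations (illustration only: in PyTomography the system matrix is never stored explicitly, and the shapes here are arbitrary):

```python
import torch

# Toy dense system matrix mapping 4 voxels to 3 detector elements
H = torch.tensor([[1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [1., 1., 0., 0.]])
a = torch.tensor([2., 1., 3., 4.])   # "object" a

# Forward projection: b_j = sum_i H_ji a_i
b = H @ a

# Back projection: a'_i = (1 / sum_j H_ji) * sum_j H_ji b_j
a_prime = (H.T @ b) / (H.T @ torch.ones(3))
```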
Mappings#
Consider the case of a 128x128x128 object being scanned at 64 different angles, each with resolution 128x128: in this situation, the object is a vector of length 2097152 and the image is a vector of length 1048576. If each component \(H_{ji}\) were stored using an 8-byte float, the system matrix would require 17.6 TB of hard drive space. Fortunately, \(H\) is a sparse matrix containing mostly zeros, and can be stored in a memory-efficient format on a computer.
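The storage estimate can be verified with a quick back-of-the-envelope computation:

```python
n_object = 128 ** 3        # 2,097,152 object voxels
n_image = 64 * 128 * 128   # 1,048,576 detector pixels
bytes_dense = n_object * n_image * 8  # 8-byte floats
print(bytes_dense / 1e12)  # ~17.6 TB
```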
Note also that \(H:\mathbb{U} \to \mathbb{V}\) is a non-square matrix that maps from one vector space to another. In practice, it’s useful to separate \(H\) into a combination of operations consisting of square matrices \(A_i:\mathbb{U}\to\mathbb{U}\) (object to object) and \(B_i:\mathbb{V}\to \mathbb{V}\) (image to image) and a single projection operator \(P:\mathbb{U} \to \mathbb{V}\) (object to image). The projection operator \(P\) is a simple operator that exists independently of any phenomena being modeled, while the \(A_i\)’s and \(B_i\)’s are used to model phenomena such as attenuation/PSF in SPECT/PET.
A peculiar feature of PET/SPECT/CT imaging is that they are projection-based: namely, that image space \(\mathbb{V}\) consists of a sequence of 2D “projections”, where each item in the sequence corresponds to a particular projection angle. We can choose to express the image as \(g = \sum_{\theta} g_{\theta} \otimes \hat{\theta}\), where \(\theta\) corresponds to a particular projection angle, and \(\hat{\theta}\) is a unit vector that represents a specific projection angle. Note that \(g\) and \(g_{\theta}\) do not lie in the same vector space. In this paradigm, we can represent \(H\) as
\[H = \sum_{\theta} B_1(\theta) \cdots B_n(\theta) P(\theta) A_1(\theta) \cdots A_m(\theta) \otimes \hat{\theta}\]
To implement back projection, we also need \(H^T\), which acts on an image \(g = \sum_{\theta} g_{\theta} \otimes \hat{\theta}\) as
\[H^T g = \sum_{\theta} A_m^T(\theta) \cdots A_1^T(\theta) P^T(\theta) B_n^T(\theta) \cdots B_1^T(\theta) \, g_{\theta}\]
Example: Modeling of a SPECT scanner can be written as \(H_{\text{SPECT}} = \sum_{\theta} P(\theta) A_1(\theta) A_2(\theta) \otimes \hat{\theta}\). Consider a particular projection: say \(\theta = 10^{\circ}\). The operator \(A_2(10^{\circ})\) implements attenuation modeling when the object is being projected at a scanner angle of \(10^{\circ}\). It adjusts the object based on the amount of attenuating material photons have to travel through to reach the scanner at that particular angle. The matrix \(A_1\) (which is independent of scanner angle for a circular orbit) implements PSF blurring for that particular projection by blurring planes parallel to the \(10^{\circ}\) scanner based on the distance between the plane and the scanner. The matrix \(P(\theta)\) sums all the voxels together in the direction of the scanner, turning a 3D object into a 2D projection. The projection at that particular angle becomes \(g_{10^{\circ}} = P(\theta) A_1(\theta) A_2(\theta) f\), and the corresponding image (containing only 1 projection) would be \(g = g_{10^{\circ}} \otimes \hat{10^{\circ}} = P(\theta) A_1(\theta) A_2(\theta) f \otimes \hat{10^{\circ}}\). The net image (consisting of all projections) requires summing over all the different projections: \(g = \sum_{\theta} g_{\theta} \otimes \hat{\theta}\).
Example: Modeling of a PET scanner (2D mode, no scatter) can be written as \(H_{\text{PET}} = \sum_{\theta} B_1(\theta) B_2(\theta) P(\theta) \otimes \hat{\theta}\). The operator \(B_2(\theta)\) implements attenuation modeling in PET. Unlike SPECT, where attenuation modeling is done in object space, it is implemented for PET in image space because the probability of detection is adjusted by the same value for each LOR. The matrix \(B_1\) implements PSF blurring: unlike in SPECT, the blurring is assumed to be constant as a function of distance to the scanner, and thus the operation can be implemented in image space. The matrix \(P(\theta)\) sums all the voxels together in the direction of the scanner, turning a 3D object into a 2D projection.
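The sketch below shows how such a factorization might be composed in code for the SPECT example; the operators here are simplified stand-ins (mostly identity maps), not PyTomography's actual mapping classes:

```python
import torch

def A2(obj, theta):
    # Stand-in for object-space attenuation modeling at angle theta
    return obj

def A1(obj, theta):
    # Stand-in for object-space PSF blurring at angle theta
    return obj

def P(obj, theta):
    # Stand-in projection: sum voxels along the scanner direction
    # (here, arbitrarily, the y axis of a [B, Lx, Ly, Lz] object)
    return obj.sum(dim=2)

def spect_projection(obj, theta):
    """g_theta = P(theta) A_1(theta) A_2(theta) f"""
    return P(A1(A2(obj, theta), theta), theta)
```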
Operations \(A_i\) and \(B_i\) are referred to as mappings: many predefined mappings are located in the mappings folder.
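For illustration, here is a minimal sketch of what an object-to-object mapping \(A_i\) could look like; GaussianBlurMapping is a hypothetical class written for this manual, not part of the PyTomography API:

```python
import torch

class GaussianBlurMapping:
    """Hypothetical object-to-object mapping: applies a separable 1D
    Gaussian blur along x as a simple stand-in for PSF modeling."""
    def __init__(self, sigma=1.0, size=7):
        x = torch.arange(size) - size // 2
        k = torch.exp(-x**2 / (2 * sigma**2))
        self.kernel = (k / k.sum()).view(1, 1, -1)

    def __call__(self, obj):
        # obj has dimensions [B, Lx, Ly, Lz]; blur along the x axis
        B, Lx, Ly, Lz = obj.shape
        flat = obj.permute(0, 2, 3, 1).reshape(-1, 1, Lx)
        blurred = torch.nn.functional.conv1d(
            flat, self.kernel, padding=self.kernel.shape[-1] // 2)
        return blurred.reshape(B, Ly, Lz, Lx).permute(0, 3, 1, 2)
```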
Reconstruction Algorithms#
In reality, the object \(f\) (and hence the image \(g\)) are random vectors, while the system matrix \(H\) is deterministic. In addition, only the vector \(g\) is measured. For notational simplicity, we’ll let \(\tilde{f}\) represent the random vector, and \(f=E[\tilde{f}]\) represent its mean value. This notation will be the convention for the entire manual and API. As such, we can write \(g=H\tilde{f}\).
The standard reconstruction algorithm for PET and SPECT is known as the ordered-subset expectation maximization (OSEM) algorithm. It assumes that \(\tilde{f}\) (and hence \(g\)) is a Poisson random vector, which holds when \(\tilde{f}\) represents the number of emissions from a radionuclide in a spatial location and in a given time interval. Before we begin the derivation, we define a new matrix \(\tilde{F}\) such that \(\tilde{F}_{ij} = H_{ji} \tilde{f}_i\). The components \(\tilde{F}_{ij}\) represent the number of counts from voxel \(i\) in object space contributing to pixel \(j\) in image space. Since \(\tilde{F}\) counts numbers of emissions, it is also Poisson with \(\tilde{F} \sim \text{Poisson}(F)\). We now seek a maximum likelihood solution for \(f\), and write the likelihood function as
\[L(\tilde{f},f) = \prod_{ij} \frac{e^{-H_{ji} f_i} \left(H_{ji} f_i\right)^{\tilde{F}_{ij}}}{\tilde{F}_{ij}!}\]
Setting \(\nabla_{f} \ln L(\tilde{f},f) = 0\) simply yields \(f = \tilde{f}\). In reality, however, we measure \(g\), not \(\tilde{f}\), so we need to obtain \(f\) as some function of \(g\); the standard maximum likelihood technique will therefore not work directly. What we can do, however, is consider the quantity:
\[E_{\tilde{f}}[\ln L(\tilde{f},f) \,|\, g, f^{(n)}]\]
where \(E_{\tilde{f}}\) represents an operator that yields the expectation value over \(\tilde{f}\). It’s important to properly understand the interpretation of this expression: it yields the expected value of the log-likelihood, given the measured projection data \(g\) and a “guess” \(f^{(n)}\) about what the distribution looks like. There’s just one question: what does \(E_{\tilde{f}}[H_{ji}\tilde{f}_i|g, f^{(n)}]\) (i.e. the expected number of emissions from voxel \(i\) contributing to image pixel \(j\)) look like? I claim
\[E_{\tilde{f}}[H_{ji}\tilde{f}_i \,|\, g, f^{(n)}] = H_{ji} f_i^{(n)} \frac{g_j}{\sum_k H_{jk} f_k^{(n)}}\]
Why? Because we are also given \(g\): we know the total number of counts measured along each projection line, so we can scale \(H_{ji}f_i^{(n)}\) by the ratio \(\frac{g}{Hf^{(n)}}\) to ensure the counts add up along projection lines. Substituting this in yields
\[E[\ln L(\tilde{f},f) \,|\, g, f^{(n)}] = \sum_{ij} \left(-H_{ji} f_i + H_{ji} f_i^{(n)} \frac{g_j}{\sum_k H_{jk} f_k^{(n)}} \ln \left(H_{ji} f_i\right)\right) + \text{const}\]
Setting \(\nabla_{f} E[\ln L(\tilde{f},f) | g, f^{(n)}]= 0\) now yields
\[f_i = \frac{f_i^{(n)}}{\sum_j H_{ji}} \sum_j H_{ji} \frac{g_j}{\sum_k H_{jk} f_k^{(n)}}\]
We can rewrite this in vector notation as
\[f = \left[\frac{1}{H^T \vec{1}} H^T \left( \frac{g}{Hf^{(n)}}\right) \right]f^{(n)}\]
The \(f\) on the LHS becomes the “next guess” for the distribution \(f\), so it’s better to rewrite the equation as
\[f^{(n+1)} = \left[\frac{1}{H^T \vec{1}} H^T \left( \frac{g}{Hf^{(n)}}\right) \right]f^{(n)}\]
This is the basic form of the maximum likelihood expectation maximization (MLEM) algorithm. It requires an initial guess \(f^{(0)}\), which is typically set to all 1’s. The ordered-subset expectation maximization (OSEM) algorithm is a modification that uses only a subset of the total number of angles during each iteration. While it requires more iterations to converge to a solution, it often saves time due to the smaller computational cost of projecting a small subset of angles. Using the same notation as the previous section, we can express \(g = \sum_{\theta} g_{\theta} \otimes \hat{\theta}\) and \(H = \sum_{\theta} H_{\theta} \otimes \hat{\theta}\). If we separate all the angles \(\theta\) into \(M\) distinct subsets \(\Theta_0...\Theta_{M-1}\), we can write \(g_m = \sum_{\theta \in \Theta_m} g_{\theta} \otimes \hat{\theta}\) and \(H_m = \sum_{\theta \in \Theta_m} H_{\theta} \otimes \hat{\theta}\). We can then write the OSEM algorithm as
\[f^{(n,m+1)} = \left[\frac{1}{H_m^T \vec{1}} H_m^T \left( \frac{g_m}{H_mf^{(n,m)}}\right) \right]f^{(n,m)}\]
where \(f^{(n,M)} \equiv f^{(n+1,0)}\) (so we cycle through all the subsets, then move to the next iteration).
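A minimal numerical sketch of the MLEM update, using toy dense matrices for illustration only (real reconstructions use the projection classes described earlier, never an explicit \(H\)):

```python
import torch

torch.manual_seed(0)
H = torch.rand(6, 4)                  # toy system matrix: 4 voxels -> 6 detector elements
f_true = torch.tensor([1., 4., 2., 3.])
g = H @ f_true                        # noiseless measurement, for simplicity

f = torch.ones(4)                     # initial guess f^(0): all 1's
norm = H.T @ torch.ones(6)            # H^T 1
for _ in range(100):
    f = f * (H.T @ (g / (H @ f))) / norm

print(f)                              # approaches f_true
```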
Scatter#
Scatter for PET is not currently implemented in PyTomography, but it is planned for the near future. Scatter in SPECT involves modifying the denominator of the MLEM/OSEM algorithm to include scatter projections:
\[f^{(n,m+1)} = \left[\frac{1}{H_m^T \vec{1}} H_m^T \left( \frac{g_m}{H_mf^{(n,m)} + s_m}\right) \right]f^{(n,m)}\]
where \(s_m\) represents a scatter image (which is often obtained in SPECT through the triple energy window technique).
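In the toy MLEM sketch above, this amounts to a one-line change in the update (with `s` as a hypothetical scatter estimate):

```python
# Reuses H, g, f, norm from the toy MLEM sketch above
s = 0.1 * torch.ones(6)  # hypothetical scatter image, one value per detector element
f = f * (H.T @ (g / (H @ f + s))) / norm
```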
Priors#
Prior functions are used to encapsulate prior beliefs about what the reconstructed object should look like before reconstructing. For example, it may be a reasonable prior belief that adjacent voxels should have similar radiopharmaceutical concentration. Prior information can be included by modifying the likelihood function:
\[L(\tilde{f},f) \to L(\tilde{f},f) \, e^{-\beta V(f)}\]
where \(\beta\) is a factor that scales the strength of the prior \(V\) (note the similarity to the inverse temperature \(\beta\) used in statistical mechanics). Using the log likelihood method:
\[f^{(n+1)} = \left[\frac{1}{H^T \vec{1} + \beta \nabla_{f} V(f)} H^T \left( \frac{g}{Hf^{(n)}}\right) \right]f^{(n)}\]
We run into a problem: what value of \(f\) do we use when computing the gradient of \(V\)? There are a few approaches to this issue. The first is the one step late (OSL) formalism, which uses the value of \(f\) from the previous iteration:
\[f^{(n,m+1)} = \left[\frac{1}{H_m^T \vec{1} + \beta \nabla_{f} V(f)|_{f=f^{(n,m)}}} H_m^T \left( \frac{g_m}{H_mf^{(n,m)}}\right) \right]f^{(n,m)}\]
The second is the block sequential regularizer (BSR) technique, which separates each iteration into two steps:
- \[f^{(n,m+1)}_{1/2} = \left[\frac{1}{H_m^T \vec{1}} H_m^T \left( \frac{g_m}{H_mf^{(n,m)}}\right) \right]f^{(n,m)}\]
- \[f^{(n,m+1)} = f^{(n,m+1)}_{1/2}\left(1-\beta \frac{\alpha_n}{H_m^T \vec{1}} \nabla_{f} V(f)|_{f=f^{(n,m)}_{1/2}}\right)\]
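Continuing the toy MLEM sketch from the reconstruction section above, a single OSL update with a hypothetical quadratic smoothing prior might look like:

```python
import torch

def quadratic_prior_grad(f):
    """Gradient of a hypothetical quadratic smoothing prior
    V(f) = 0.5 * sum_i (f_i - f_{i+1})^2 on a 1D toy object."""
    grad = torch.zeros_like(f)
    grad[:-1] += f[:-1] - f[1:]
    grad[1:] += f[1:] - f[:-1]
    return grad

beta = 0.1
# One-step-late (OSL): evaluate grad V at the previous iterate
# (reuses H, g, f, norm from the toy MLEM sketch above)
f = f * (H.T @ (g / (H @ f))) / (norm + beta * quadratic_prior_grad(f))
```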