QCatch

This web page provides functionality for generating QC reports that summarize the output of alevin-fry (He et al., Nature Methods 19, 316–322 (2022)).

Summary

🦒 Knee Plots

The left plot shows the number of UMIs against cell rank (ordered by UMI count). The knee plot can be used to filter out low-quality cells with too few UMIs.
The right plot shows the number of detected genes against cell rank (ordered by UMI count).

Rank: cells are ranked by number of UMIs.
UMI count: the number of deduplicated reads.
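As a minimal sketch of the ranking described above, the snippet below orders cells by decreasing UMI count and applies a simple minimum-UMI filter. The function names, the example barcodes, and the threshold of 100 are all hypothetical and chosen only for illustration; this is not QCatch's own filtering procedure.

```python
def knee_curve(umi_counts):
    """Return (rank, umi_count) pairs, with cells ordered by decreasing UMI count."""
    ordered = sorted(umi_counts, reverse=True)
    return list(enumerate(ordered, start=1))


def filter_by_min_umis(umi_by_barcode, min_umis):
    """Keep barcodes with at least `min_umis` UMIs (the threshold is an assumption)."""
    return {bc: n for bc, n in umi_by_barcode.items() if n >= min_umis}


# Hypothetical per-barcode UMI counts.
umi_by_barcode = {"AAAC": 5200, "AAAG": 12, "AACT": 4100, "AAGT": 3}

curve = knee_curve(umi_by_barcode.values())
kept = filter_by_min_umis(umi_by_barcode, min_umis=100)

print(curve)         # points of the knee plot: x = rank, y = UMI count
print(sorted(kept))  # barcodes that pass the illustrative threshold
```

Plotting `curve` on log-log axes gives the familiar knee shape; in practice the cutoff is read off the knee rather than fixed in advance.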


🔢 UMI Counts and Detected Genes across Cell Barcodes

Barcode frequency: number of corrected reads per cell barcode.


🧽 Sequencing Saturation and Barcode Collapsing Plot

The left plot is the Sequencing Saturation plot, calculated by: 1 - (n_deduped_reads / n_reads). It measures the fraction of reads originating from an already-observed UMI. You may not always need high sequencing saturation, but having higher saturation ensures you can detect very lowly expressed transcripts. The slope near the endpoint indicates the added benefit (if any) of further increasing sequencing depth.
The right plot shows the read count per cell against the corrected read count per cell (cell barcode collapse).

Corrected reads: reads whose cell barcode errors have been corrected.
Cell barcode collapse: grouping reads that share the same cell barcode.
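The saturation formula above can be sketched in a few lines. The names `n_deduped_reads` and `n_reads` follow the formula in the text; the function name and the example numbers are illustrative assumptions.

```python
def sequencing_saturation(n_deduped_reads, n_reads):
    """Compute 1 - (n_deduped_reads / n_reads).

    The result is the fraction of reads that came from an already-observed UMI.
    """
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return 1.0 - n_deduped_reads / n_reads


# Hypothetical example: 10,000 reads collapse to 2,500 distinct UMIs.
saturation = sequencing_saturation(n_deduped_reads=2500, n_reads=10000)
print(saturation)
```

A value near 1 means most additional reads would hit UMIs that have already been seen, so further sequencing adds little new information.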


🧬 Distribution of Detected Gene Count and Mitochondrial Percentage Plot

The left plot depicts the distribution of detected gene counts. A knee plot can be used to filter out low-quality cells with too few UMIs.
The right plot is a violin plot of the percentage of mitochondrial gene expression across all unfiltered cells. For the 'All Cells' plot, cells with fewer than 10 detected genes, which are considered nearly empty, were excluded. In contrast, the 'Retained Cells' plot includes all retained cells without additional filtering.
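A rough sketch of the per-cell quantities behind these plots follows. It assumes mitochondrial genes can be recognized by an `MT-` name prefix, which is a common human gene-naming convention and not something stated by the report; the function names and example cells are hypothetical. The cutoff of 10 detected genes is the one described above for the 'All Cells' plot.

```python
def detected_genes(gene_counts):
    """Number of genes with a non-zero count in one cell."""
    return sum(1 for count in gene_counts.values() if count > 0)


def mito_percent(gene_counts, mito_prefix="MT-"):
    """Percentage of a cell's counts assigned to mitochondrial genes.

    Assumption: mitochondrial gene names start with `mito_prefix`.
    """
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in gene_counts.items() if gene.startswith(mito_prefix))
    return 100.0 * mito / total


# Hypothetical cells: cell_a has 19 detected genes, cell_b only 2.
cell_a = {f"GENE{i}": 1 for i in range(18)}
cell_a["MT-CO1"] = 2
cell_b = {"GENE0": 1, "MT-ND1": 1}
cells = {"cell_a": cell_a, "cell_b": cell_b}

# Exclude nearly empty cells (fewer than 10 detected genes).
all_cells_plot = {
    name: mito_percent(counts)
    for name, counts in cells.items()
    if detected_genes(counts) >= 10
}
print(all_cells_plot)
```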


🧩 Bar plot for S/U/A counts and S/(U+S) Ratio Plot

The left plot shows the number of reads in three RNA splicing status categories: Spliced (S), Unspliced (U), and Ambiguous (A).
The right plot shows a histogram of the S/(U+S) ratio. Note: cells whose total (U + S) gene count equals zero were excluded from the S/(U+S) ratio plot.
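The ratio and the zero-count exclusion can be illustrated with a short sketch. The function name and the per-cell (S, U) totals below are hypothetical.

```python
def spliced_ratio(spliced, unspliced):
    """Return S / (U + S), or None when U + S is zero (such cells are excluded)."""
    total = unspliced + spliced
    if total == 0:
        return None
    return spliced / total


# Hypothetical per-cell totals as (S, U) pairs.
cells = {"c1": (30, 10), "c2": (0, 0), "c3": (5, 15)}

ratios = {}
for barcode, (s, u) in cells.items():
    ratio = spliced_ratio(s, u)
    if ratio is not None:  # drop cells with U + S == 0
        ratios[barcode] = ratio

print(ratios)
```

Ambiguous (A) counts do not enter the ratio; they appear only in the bar plot.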

Rank: cells are ranked by number of UMIs.
UMI count: number of deduplicated reads.


🗺️ Clustering: UMAP and t-SNE

These plots are low-dimensional projections of high-dimensional gene expression data. Each point represents a single cell. Cells that appear close together in the plot are inferred to have similar transcriptomic profiles, indicating potential similarity in cell type or state.
Note: Only retained cells are included in these visualizations. All retained cells are shown without further filtering. Standard preprocessing steps were applied using `Scanpy`, including normalization, log transformation, feature selection, and dimensionality reduction.

📜 Quant log information

alt_resolved_cell_numbers: A list of global cell indices where an alternative resolution strategy was applied for large connected components. If this list is empty, no cells used the alternative resolution strategy.
cmd: The command line used for this af_quant process.
dump_eq: Indicates whether equivalence class (EQ class) information was dumped.
empty_resolved_cell_numbers: A list of global cell indices with no gene expression.
num_genes: The total number of genes. When usa_mode is enabled, this count is the sum of genes across the three categories: unspliced (U), spliced (S), and ambiguous (A).
num_quantified_cells: The number of cells that were quantified.
resolution_strategy: The resolution strategy used for quantification.
usa_mode: Indicates that data was processed in Unspliced-Spliced-Ambiguous (USA) mode to classify each transcript’s splicing state.
version_str: The tool’s version number.
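As an illustration of how the fields above might be read programmatically, the sketch below parses a JSON document that uses the same keys. All values are placeholders rather than real output, and the helper `genes_per_category` assumes that each gene appears exactly once in each of the three categories when usa_mode is enabled, so that num_genes is three times the number of distinct genes.

```python
import json

# Placeholder log content; keys match the fields listed above, values are made up.
quant_log_text = """
{
  "alt_resolved_cell_numbers": [],
  "cmd": "<command line elided>",
  "dump_eq": false,
  "empty_resolved_cell_numbers": [7, 42],
  "num_genes": 90,
  "num_quantified_cells": 1000,
  "resolution_strategy": "<strategy>",
  "usa_mode": true,
  "version_str": "0.0.0"
}
"""


def genes_per_category(log):
    """Distinct gene count, assuming U, S, and A each list every gene once."""
    if log["usa_mode"]:
        return log["num_genes"] // 3
    return log["num_genes"]


quant_log = json.loads(quant_log_text)
n_distinct_genes = genes_per_category(quant_log)
used_alt_resolution = len(quant_log["alt_resolved_cell_numbers"]) > 0
n_empty_cells = len(quant_log["empty_resolved_cell_numbers"])

print(n_distinct_genes, used_alt_resolution, n_empty_cells)
```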

📝 Permit List Log Information

cmd: The command-line input provided by users for generating the permit list.
expected_ori: The expected alignment orientation for the sequencing chemistry being processed.
gpl_options: The actual command line executed for the 'generate permit list' process, including pre-filled settings.
max-ambig-record: The maximum number of reference sequences to which a read can be mapped.
permit-list-type: The type of permit list being used.
velo_mode: A placeholder parameter reserved for future integration with alevin-fry-Forseti; currently always set to false.
version_str: The version number of the tool.
