1. Overview
The CpGtools package provides a number of Python programs to annotate, QC, visualize, and analyze DNA methylation data generated from the Illumina HumanMethylation450 BeadChip (450K) or MethylationEPIC BeadChip (850K) arrays, or from RRBS / WGBS.
These programs can be divided into three groups:
CpG position analysis modules
CpG signal analysis modules
Differential CpG analysis modules
1.1. CpG position analysis modules
These modules are primarily used to analyze the genomic locations of CpGs.
Name |
Description |
Aggregate proportion values of CpGs that are located in given genomic regions (e.g., CpG islands, promoters, exons, etc.). |
|
Add annotation information to CpGs according to their genomic coordinates. |
|
Add annotation information to 450K/850K probes. |
|
Generate the CpG density (count) profile over gene bodies and their upstream/downstream intergenic regions. |
|
Calculate the distribution of CpGs over chromosomes. |
|
Calculate the distribution of CpGs over gene-centered genomic regions. |
|
Calculate the distribution of CpGs over user-specified genomic regions. |
|
Generate a DNA motif logo and matrices for a given set of CpGs. |
|
Assign CpGs to their putative target genes. It uses an algorithm similar to GREAT. |
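As an illustration of what a position-level summary such as the per-chromosome CpG distribution involves, here is a minimal sketch. It is not the CpGtools implementation: the function name `count_cpgs_per_chromosome` is hypothetical, and the input is assumed to be BED-like text lines whose first whitespace-separated column is the chromosome name.

```python
from collections import Counter


def count_cpgs_per_chromosome(bed_lines):
    """Count CpGs per chromosome from BED-like lines (chrom, start, end, ...).

    Hypothetical helper for illustration only; assumes one CpG per line.
    """
    counts = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split()[0]
        counts[chrom] += 1
    return counts


# Made-up example coordinates, not real data.
example = [
    "# chrom\tstart\tend",
    "chr1\t10468\t10469",
    "chr1\t10470\t10471",
    "chr2\t5000\t5001",
]
counts = count_cpgs_per_chromosome(example)
for chrom, n in sorted(counts.items()):
    print(chrom, n)
```

The same counting pattern extends to region-level summaries by replacing the chromosome key with a region label, provided each CpG has already been assigned to a region.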
1.2. CpG signal analysis modules
These modules are primarily used to analyze the DNA methylation beta values of CpGs.
Name |
Description |
Perform PCA (principal component analysis) for samples. |
|
Generate a jitter plot (a.k.a. strip chart) and a bean plot for each sample. |
|
Convert beta values into M-values or vice versa. |
|
Calculate the methylation profile (i.e., average beta value) for genomic regions around genes. |
|
Calculate the methylation profile (i.e., average beta value) around user-specified genomic regions. |
|
Create a stacked bar plot for each sample, showing the proportions of CpGs whose beta values fall into [0, 0.25], [0.25, 0.5], [0.5, 0.75], and [0.75, 1]. |
|
Summarize basic information on CpGs located in each genomic region. |
|
Perform t-SNE (t-Distributed Stochastic Neighbor Embedding) analysis for samples. |
|
Select the top N most variable CpGs (according to standard deviation) from the input file. |
|
Use a Bayesian Gaussian mixture model to trichotomize beta values into three methylation statuses (‘Un-methylated’, ‘Semi-methylated’, and ‘Full-methylated’), plus an ‘unassigned’ label. |
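Two of the signal-level operations listed above are simple enough to sketch directly: the beta/M-value conversion and the four-interval binning behind the stacked bar plot. The sketch below uses the commonly used logit relationship M = log2(beta / (1 - beta)). It is not taken from the CpGtools source: the function names, the `eps` clamp for beta values of exactly 0 or 1, and the choice to place boundary values (0.25, 0.5, 0.75) in the lower bin are all assumptions.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta value to an M-value: M = log2(beta / (1 - beta))."""
    # Assumption: clamp to (eps, 1 - eps) so beta = 0 or 1 does not give +/- infinity.
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Convert an M-value back to a beta value: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


def bin_proportions(betas):
    """Proportion of beta values in [0, 0.25], (0.25, 0.5], (0.5, 0.75], (0.75, 1].

    Assumption: a value exactly on an interval boundary goes to the lower bin.
    """
    edges = [0.25, 0.5, 0.75]
    counts = [0, 0, 0, 0]
    for b in betas:
        # The bin index is the number of edges strictly below the value.
        counts[sum(b > e for e in edges)] += 1
    total = len(betas)
    return [c / total for c in counts]


# Made-up beta values for one sample.
sample = [0.1, 0.3, 0.6, 0.9]
print([round(beta_to_m(b), 3) for b in sample])
print(bin_proportions(sample))
```

M-values are unbounded and roughly symmetric around 0, which is why they are often preferred for statistical testing, while beta values are easier to interpret as a methylation fraction.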
1.3. Differential CpG analysis modules
These modules are primarily used to identify CpGs that are differentially methylated between groups.
Name |
Description |
Differential CpG analysis using a Bayesian approach. (for 450K/850K data) |
|
Differential CpG analysis using the beta-binomial model. (for RRBS/WGBS count data) |
|
Differential CpG analysis using Fisher’s Exact Test. (for RRBS/WGBS count data) |
|
Differential CpG analysis using a generalized linear model (GLM). (for 450K/850K data) |
|
Differential CpG analysis using a logistic regression model. (for RRBS/WGBS count data) |
|
Differential CpG analysis using the Mann-Whitney U test for two-group comparisons and the Kruskal-Wallis H-test for multi-group comparisons. |
|
Differential CpG analysis using the t-test. (for 450K/850K data) |
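To make the count-based tests above more concrete, the following is a minimal sketch of a two-sided Fisher's exact test applied to a single CpG. It is a standard-library illustration, not the CpGtools implementation: the function name `fisher_exact_two_sided` is hypothetical, and the 2x2 layout (methylated and unmethylated read counts in two groups) is an assumption about how RRBS/WGBS counts would be arranged.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a, b = methylated, unmethylated reads in group 1;
                    c, d = methylated, unmethylated reads in group 2.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Made-up counts: group 1 has 8 methylated / 2 unmethylated reads,
# group 2 has 1 methylated / 5 unmethylated reads.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.4f}")
```

When many CpGs are tested this way, the resulting p-values would normally be adjusted for multiple testing before differentially methylated CpGs are called.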