16. beta_trichotmize.py

16.1. Description

Rather than applying a hard threshold to call CpGs or regions “methylated” or “unmethylated”, this program uses a probabilistic approach (a Bayesian Gaussian mixture model, GMM) to trichotomize beta values into three methylation states. CpGs that cannot be assigned confidently are left unassigned:

  • Un-methylated (labeled as “0” in result file)

  • Semi-methylated (labeled as “1” in result file)

  • Full-methylated (labeled as “2” in result file)

  • Unassigned (labeled as “-1” in result file)

The GMM first calculates three probabilities, p0, p1, and p2, for each CpG based on its beta value:

p0

the probability that the CpG is un-methylated

p1

the probability that the CpG is semi-methylated

p2

the probability that the CpG is full-methylated
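As a rough illustration of how p0, p1, and p2 could be obtained from a three-component Gaussian mixture, the sketch below computes posterior probabilities for a single beta value. The component weights, means, and standard deviations are made-up placeholders, and the function names are hypothetical. The program estimates its own mixture from the input data, and its internals may differ.

```python
import math

# Illustrative mixture parameters (assumptions, NOT values fitted by the tool):
# (weight, mean, standard deviation) for the un-, semi-, and full-methylated
# components, in that order.
COMPONENTS = [
    (0.4, 0.1, 0.05),
    (0.2, 0.5, 0.10),
    (0.4, 0.9, 0.05),
]


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probs(beta, components=COMPONENTS):
    """Return [p0, p1, p2]: the posterior probability of each component."""
    weighted = [w * normal_pdf(beta, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]


p0, p1, p2 = posterior_probs(0.12)
print(p0, p1, p2)
```

The three probabilities always sum to 1, so a beta value close to one component mean yields a probability near 1 for that component and near 0 for the other two.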

Each CpG is then classified using the following rules:

if p0 == max(p0, p1, p2):
       un-methylated
elif p2 == max(p0, p1, p2):
       full-methylated
elif p1 == max(p0, p1, p2):
       if p1 >= prob_cutoff:
               semi-methylated
       else:
               unknown/unassigned
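The rules above can be written as a small Python function. This is a minimal sketch, not the program's actual source: the function name `classify` is hypothetical, and `prob_cutoff` is passed explicitly because the tool's default value is not stated here. The return values follow the labels used in the result file (0, 1, 2, -1).

```python
def classify(p0, p1, p2, prob_cutoff):
    """Map the three component probabilities to a label.

    Returns 0 (un-methylated), 1 (semi-methylated), 2 (full-methylated),
    or -1 (unassigned), mirroring the rules described in the text.
    """
    top = max(p0, p1, p2)
    if p0 == top:
        return 0
    if p2 == top:
        return 2
    # p1 is the largest probability; accept it only if it clears the cutoff.
    if p1 >= prob_cutoff:
        return 1
    return -1


# Example calls with made-up probabilities and an illustrative cutoff of 0.9.
print(classify(0.90, 0.05, 0.05, 0.9))
print(classify(0.20, 0.60, 0.20, 0.9))
```

Note that the cutoff applies only to the semi-methylated class: a CpG whose largest probability is p0 or p2 is assigned regardless of how large that probability is.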

16.2. Input files (examples)

16.3. Command

$beta_trichotmize.py -i test_05_TwoGroup.tsv -r

The histogram and pie chart below show the proportion of CpGs assigned to “Un-methylated”, “Semi-methylated” and “Full-methylated”.

../_images/trichotmize.png