6. CpG_logo.py

6.1. Description

This program generates a DNA motif logo for a given set of CpGs, answering the question “what is the genomic context of a given list of CpGs?”. It first extracts the genomic sequences around each C position and then generates motif matrices, including:

  • position frequency matrix (PFM)

  • position probability matrix (PPM)

  • position weight matrix (PWM)

  • MEME format matrix

  • Jaspar format matrix

It also generates a motif logo using WebLogo.
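The relationship between the first three matrices can be sketched as follows. This is a minimal illustration, not the code used by CpG_logo.py: the function names, the optional pseudocount, and the uniform background of 0.25 per base are assumptions made for the example.

```python
import math

BASES = "ACGT"


def position_frequency_matrix(seqs):
    """Count how often each base occurs at each position (PFM)."""
    length = len(seqs[0])
    pfm = {base: [0] * length for base in BASES}
    for seq in seqs:
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        for i, base in enumerate(seq.upper()):
            if base in pfm:  # ignore N and other ambiguous bases
                pfm[base][i] += 1
    return pfm


def position_probability_matrix(pfm, pseudocount=0.0):
    """Convert counts to per-position probabilities (PPM).

    The pseudocount is an assumption of this sketch; it is added to
    every cell so that no probability is exactly zero.
    """
    length = len(pfm["A"])
    ppm = {base: [0.0] * length for base in BASES}
    for i in range(length):
        total = sum(pfm[base][i] for base in BASES) + 4 * pseudocount
        for base in BASES:
            ppm[base][i] = (pfm[base][i] + pseudocount) / total if total else 0.0
    return ppm


def position_weight_matrix(ppm, background=0.25):
    """Convert probabilities to log2-odds against a background (PWM).

    A uniform background of 0.25 is assumed here. A probability of
    zero gives a weight of negative infinity.
    """
    return {
        base: [math.log2(p / background) if p > 0 else float("-inf") for p in ppm[base]]
        for base in BASES
    }
```

For example, passing `["ACG", "ACG", "TCG", "ACA"]` through the three functions in order gives a count of 3 for A at the first position, a probability of 0.75, and a weight of log2(0.75 / 0.25). Calling `position_probability_matrix` with a small non-zero pseudocount avoids infinite weights for bases that never occur at a position.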

Notes

  • The input BED file must have strand information.
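Strand matters because a C on the minus strand appears as a G in the reference sequence, so the extracted window has to be reverse-complemented to put the C back in the centre. The sketch below illustrates this under stated assumptions: the chromosome is already available as a Python string, the BED start is the 0-based position of the C (or of its paired G for minus-strand records), and the helper names are hypothetical rather than taken from CpG_logo.py.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def context_sequence(chrom_seq, start, strand, extend=5):
    """Return the (2 * extend + 1)-base window centred on a C position.

    chrom_seq : chromosome sequence as a string (assumption of this sketch)
    start     : 0-based position taken from the BED ChromStart column
    strand    : "+" or "-"
    Returns None when the window would run past either chromosome end.
    """
    left = start - extend
    right = start + extend + 1
    if left < 0 or right > len(chrom_seq):
        return None
    window = chrom_seq[left:right].upper()
    if strand == "-":
        # On the minus strand the reference shows G; flip the window
        # so that the methylated C sits in the centre again.
        window = reverse_complement(window)
    return window
```

With `chrom_seq = "TTACGTAGCGAT"` and `extend=2`, the plus-strand C at position 3 and the minus-strand C at position 4 both yield `TACGT`, with C as the middle base in each case.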

6.2. Options

--version

show program’s version number and exit

-h, --help

show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

BED file specifying the C positions. This BED file should have at least 6 columns (Chrom, ChromStart, ChromEnd, name, score, strand). Note: correct strand information must be provided. This file can be a regular text file or a compressed file (.gz, .bz2).

-r GENOME_FILE, --refgenome=GENOME_FILE

Reference genome sequences in FASTA format. Must be indexed using the samtools “faidx” command.

-e EXTEND_SIZE, --extend=EXTEND_SIZE

Number of bases to extend upstream and downstream of the C position. default=5 (bp)

-n MOTIF_NAME, --name=MOTIF_NAME

Motif name. default=motif

-o OUT_FILE, --output=OUT_FILE

Prefix of output file.
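As an illustration of the input requirements for -i, the following sketch reads a 6-column BED file that may be plain text, gzip-compressed, or bzip2-compressed, and rejects records without a valid strand. It is not the parser used by CpG_logo.py; the function names and the choice to raise an error on malformed lines are assumptions of the example.

```python
import bz2
import gzip


def open_text(path):
    """Open a plain, .gz, or .bz2 file in text mode based on its extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def read_bed6(path):
    """Yield (chrom, start, end, name, score, strand) tuples from a BED file."""
    with open_text(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6:
                raise ValueError(f"line {line_number}: expected at least 6 columns")
            chrom, start, end, name, score, strand = fields[:6]
            if strand not in ("+", "-"):
                raise ValueError(f"line {line_number}: strand must be '+' or '-'")
            yield chrom, int(start), int(end), name, score, strand
```

Failing early on a missing strand is a design choice of this sketch: it surfaces a malformed input file before any sequences are extracted.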

6.3. Input files (examples)

6.4. Command

$ CpG_logo.py -i 450_CH.hg19.bed.gz -r hg19.fa -o 450_CH

6.5. Output files

  • 450_CH.logo.fa

  • 450_CH.logo.jaspar

  • 450_CH.logo.meme

  • 450_CH.logo.pfm

  • 450_CH.logo.ppm

  • 450_CH.logo.pwm

  • 450_CH.logo.logo.pdf

[Image: motif logo (../_images/450_CH.logo.png)]