{% extends "base.j2.html" %} {% block title %}GEMINI query interface{% endblock %} {% block head %}{% endblock %} {% block body %}

The variants table

Core columns
column_name type notes
chrom STRING The chromosome on which the variant resides
start INTEGER The 0-based start position.
end INTEGER The 1-based end position.
variant_id INTEGER PRIMARY_KEY
anno_id INTEGER Variant transcript number for the most severely affected transcript
ref STRING Reference allele
alt STRING Alternate allele for the variant
qual INTEGER Quality score for the assertion made in ALT
filter STRING A string of filters passed/failed in variant calling
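Because start is 0-based and end is 1-based, the two columns describe a half-open interval, and end - start gives the number of reference bases covered. The sketch below illustrates this. It assumes (this is not stated in the table) that start is derived from the 1-based VCF POS as POS - 1 and that end is start plus the length of ref. The function name vcf_to_interval is hypothetical.

```python
def vcf_to_interval(pos, ref):
    """Convert a 1-based VCF POS and REF allele to (start, end).

    Assumption: start = POS - 1 (0-based) and end = start + len(ref)
    (1-based), so end - start equals the length of the reference allele.
    """
    start = pos - 1
    end = start + len(ref)
    return start, end


# A SNP at VCF position 100 covers a single base.
print(vcf_to_interval(100, "A"))    # (99, 100)
# A three-base reference allele at position 100 covers three bases.
print(vcf_to_interval(100, "ACG"))  # (99, 102)
```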
Variant and PopGen info
     
type STRING
The type of variant.
Any of: [snp, indel]
sub_type STRING
The variant sub-type.
If type is snp: [ts (transition), tv (transversion)]
If type is indel: [ins (insertion), del (deletion)]
call_rate FLOAT The fraction of samples with a valid genotype
num_hom_ref INTEGER The total number of homozygotes for the reference (ref) allele
num_het INTEGER The total number of heterozygotes observed.
num_hom_alt INTEGER The total number of homozygotes for the alternate (alt) allele
num_unknown INTEGER The total number of unknown genotypes
aaf FLOAT The observed allele frequency for the alternate allele
hwe FLOAT The Chi-square probability of deviation from HWE (assumes random mating)
inbreeding_coeff FLOAT The inbreeding coefficient that expresses the likelihood of effects due to inbreeding
pi FLOAT The computed nucleotide diversity (pi) for the site
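The genotype counts above are enough to reproduce call_rate and aaf by hand. The following is a minimal sketch, assuming diploid genotypes and that unknown genotypes are excluded from the allele frequency denominator. The exact computation used when the database is loaded is not described here, and the function name site_stats is hypothetical.

```python
def site_stats(num_hom_ref, num_het, num_hom_alt, num_unknown):
    """Return (call_rate, aaf) from per-site genotype counts.

    Assumptions: samples are diploid, and unknown genotypes do not
    contribute alleles to the frequency estimate.
    """
    called = num_hom_ref + num_het + num_hom_alt
    total = called + num_unknown
    # Fraction of samples with a valid genotype.
    call_rate = called / total if total else 0.0
    # Each heterozygote carries one alt allele, each alt homozygote two.
    aaf = (num_het + 2 * num_hom_alt) / (2 * called) if called else 0.0
    return call_rate, aaf


print(site_stats(6, 3, 1, 0))  # (1.0, 0.25)
```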
Genotype information
     
gts BLOB A compressed binary vector of sample genotypes (e.g., “A/A”, “A|G”, “G/G”). Access a specific sample's genotype with gts.sample_id
gt_types BLOB A compressed binary vector of numeric genotype “types” (e.g., 0, 1, 2). Access a specific sample's genotype type with gt_types.sample_id
gt_phases BLOB A compressed binary vector of sample genotype phases (e.g., False, True, False). Access a specific sample's genotype phasing info with gt_phases.sample_id
gt_depths BLOB A compressed binary vector of the depth of aligned sequence observed for each sample. Access a specific sample's sequence depth info with gt_depths.sample_id
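The example values for gts and gt_phases line up: in VCF notation a "|" separator marks a phased genotype and "/" an unphased one. The sketch below shows that relationship on plain strings. It does not decode the compressed BLOB columns, whose on-disk encoding is not described here, and the function name split_genotype is hypothetical.

```python
def split_genotype(gt):
    """Return (alleles, phased) for a genotype string such as 'A/A' or 'A|G'."""
    phased = "|" in gt
    separator = "|" if phased else "/"
    return gt.split(separator), phased


gts = ["A/A", "A|G", "G/G"]
# Only the middle genotype uses the phased separator.
print([split_genotype(gt)[1] for gt in gts])  # [False, True, False]
print(split_genotype("A|G")[0])               # ['A', 'G']
```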
Gene information
     
gene STRING Corresponding gene name of the most severely affected transcript
transcript STRING
The variant transcript that was most severely affected
(for two equally affected transcripts, either the first one is selected (VEP) or the protein_coding biotype is considered (snpEff)).
is_exonic BOOL Does the variant affect an exon for >= 1 transcript?
is_coding BOOL Does the variant fall in a coding region (excl. 3’ & 5’ UTRs) for >= 1 transcript?
is_lof BOOL Based on the value of the impact col, is the variant LOF for >= 1 transcript?
exon STRING Exon information for the severely affected transcript
codon_change STRING What is the codon change?
aa_change STRING What is the amino acid change (for an snp)?
aa_length STRING The length of CDS in terms of number of amino acids
biotype STRING The ‘type’ of the severely affected transcript (e.g., protein-coding, pseudogene, rRNA, etc.)
impact STRING The consequence of the most severely affected transcript
impact_severity STRING Severity of the highest order observed for the variant
polyphen_pred STRING PolyPhen predictions for SNPs (VEP only) for the most severely affected transcript
polyphen_score FLOAT PolyPhen scores for the most severely affected transcript
sift_pred STRING SIFT predictions for SNPs (VEP only) for the most severely affected transcript
sift_score FLOAT SIFT scores for the predictions
pfam_domain STRING Pfam protein domain that the variant affects
Optional VCF INFO fields
     
anc_allele STRING The reported ancestral allele if there is one.
rms_bq FLOAT The RMS base quality at this position.
cigar STRING CIGAR string describing how to align an alternate allele to the reference allele.
depth INTEGER The number of aligned sequence reads that led to this variant call
strand_bias FLOAT Strand bias at the variant position
rms_map_qual FLOAT RMS mapping quality, a measure of variance of quality scores
in_hom_run INTEGER Homopolymer runs for the variant allele
num_mapq_zero INTEGER Total counts of reads with mapping quality equal to zero
num_alleles INTEGER Total number of alleles in called genotypes
num_reads_w_dels FLOAT Fraction of reads with spanning deletions
haplotype_score FLOAT Consistency of the site with two segregating haplotypes
qual_depth FLOAT Variant confidence or quality by depth
allele_count INTEGER Allele counts in genotypes
allele_bal FLOAT Allele balance for hets
is_somatic BOOL Whether the variant is somatically acquired.
Population frequency information
     
in_dbsnp BOOL
Is this variant found in dbSNP (build 135)?
0 : Absence of the variant in dbSNP
1 : Presence of the variant in dbSNP
rs_ids STRING
A comma-separated list of rs ids for variants present in dbSNP
in_hm2 BOOL Whether the variant was part of HapMap2.
in_hm3 BOOL Whether the variant was part of HapMap3.
in_esp BOOL Presence/absence of the variant in the ESP project data
in_1kg BOOL Presence/absence of the variant in the 1000 Genomes Project data
aaf_esp_ea FLOAT Minor Allele Frequency of the variant for European Americans in the ESP project
aaf_esp_aa FLOAT Minor Allele Frequency of the variant for African Americans in the ESP project
aaf_esp_all FLOAT Minor Allele Frequency of the variant w.r.t both groups in the ESP project
aaf_1kg_amr FLOAT Allele Frequency of the variant for samples in AMR based on AC/AN (1000g project)
aaf_1kg_asn FLOAT Allele frequency of the variant for samples in ASN based on AC/AN (1000g project)
aaf_1kg_afr FLOAT Allele frequency of the variant for samples in AFR based on AC/AN (1000g project)
aaf_1kg_eur FLOAT Allele Frequency of the variant for samples in EUR based on AC/AN (1000g project)
aaf_1kg_all FLOAT Global allele frequency (based on AC/AN) (1000g project)
Disease phenotype info.
     
in_omim BOOL
0 : Absence of the variant in OMIM database
1 : Presence of the variant in OMIM database
clinvar_sig STRING
The clinical significance of the variant according to ClinVar:
unknown, untested, non-pathogenic
probable-non-pathogenic, probable-pathogenic
pathogenic, drug-response, histocompatibility
other
clinvar_disease_name STRING The name of the disease to which the variant is relevant
clinvar_dbsource STRING Variant Clinical Channel IDs
clinvar_dbsource_id STRING The record id in the above database
clinvar_origin STRING
The origin of the variant allele.
Any of:
unknown, germline, somatic,
inherited, paternal, maternal,
de-novo, biparental, uniparental,
not-tested, tested-inconclusive,
other
clinvar_dsdb STRING Variant disease database name
clinvar_dsdbid STRING Variant disease database ID
clinvar_disease_acc STRING Variant Accession and Versions
clinvar_in_locus_spec_db BOOL Submitted from a locus-specific database?
clinvar_on_diag_assay BOOL Variation is interrogated in a clinical diagnostic assay?
Genome annotations
     
exome_chip BOOL Whether an SNP is on the Illumina HumanExome Chip
cyto_band STRING Chromosomal cytobands that a variant overlaps
rmsk STRING
A comma-separated list of RepeatMasker annotations that the variant overlaps.
Each hit is of the form: name_class_family
in_cpg_island BOOL
Does the variant overlap a CpG island?
Based on UCSC: Regulation > CpG Islands > cpgIslandExt
in_segdup BOOL
Does the variant overlap a segmental duplication?
Based on UCSC: Variation&Repeats > Segmental Dups > genomicSuperDups track
is_conserved BOOL
Does the variant overlap a conserved region?
Based on the 29-way mammalian conservation study
gerp_bp_score FLOAT
GERP conservation score.
Only populated if the --load-gerp-bp option is used when loading.
Higher scores reflect greater conservation. At base-pair resolution.
gerp_element_pval FLOAT
GERP element P-value.
Lower P-values reflect greater conservation. Not at base-pair resolution.
recomb_rate FLOAT
Returns the mean recombination rate at the variant site
Based on HapMapII_GRCh37 genetic map
Mappability
     
grc STRING
Association with patch and fix regions from the Genome Reference Consortium:
Identifies potential problem regions associated with variant calls.
Built with annotation_provenance/make-ncbi-grc-patches.py
gms_illumina FLOAT
Genome Mappability Scores (GMS) for Illumina error models
Provides low GMS scores (< 25.0 in any technology) from:
#Download_GMS_by_Chromosome_and_Sequencing_Technology
Input VCF for annotations prepared with:
gms_solid FLOAT Genome Mappability Scores with SOLiD error models
gms_iontorrent FLOAT Genome Mappability Scores with IonTorrent error models
in_cse BOOL
Is a variant in an error prone genomic position,
using CSE: Context-Specific Sequencing Errors
ENCODE information
     
encode_tfbs STRING
Comma-separated list of transcription factors that were
observed by ENCODE to bind DNA in this region. Each hit in the list is constructed
as TF_CELLCOUNT, where:
TF is the transcription factor name
CELLCOUNT is the number of cells tested that had nonzero signals.
Provenance: wgEncodeRegTfbsClusteredV2 UCSC table
encode_dnaseI_cell_count INTEGER
Count of cell types that were observed to have DnaseI hypersensitivity.
encode_dnaseI_cell_list STRING
Comma-separated list of cell types that were observed to have DnaseI hypersensitivity.
Provenance: Thurman, et al, Nature, 489, pp. 75-82, 5 Sep. 2012
encode_consensus_gm12878 STRING
ENCODE consensus segmentation prediction for GM12878.

CTCF: CTCF-enriched element
E: Predicted enhancer
PF: Predicted promoter flanking region
R: Predicted repressed or low-activity region
TSS: Predicted promoter region including TSS
T: Predicted transcribed region
WE: Predicted weak enhancer or open chromatin cis-regulatory element
unknown: This region of the genome had no functional prediction.
encode_consensus_h1hesc STRING ENCODE consensus segmentation prediction for H1-hESC. See encode_consensus_gm12878 for details.
encode_consensus_helas3 STRING ENCODE consensus segmentation prediction for HeLa-S3. See encode_consensus_gm12878 for details.
encode_consensus_hepg2 STRING ENCODE consensus segmentation prediction for HepG2. See encode_consensus_gm12878 for details.
encode_consensus_huvec STRING ENCODE consensus segmentation prediction for HUVEC. See encode_consensus_gm12878 for details.
encode_consensus_k562 STRING ENCODE consensus segmentation prediction for K562. See encode_consensus_gm12878 for details.
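The encode_tfbs column packs each hit as TF_CELLCOUNT in a comma-separated list, so it usually needs splitting before use. A minimal parsing sketch follows. The function name parse_encode_tfbs and the sample value are hypothetical, and the sketch assumes every hit ends in an underscore followed by an integer count.

```python
def parse_encode_tfbs(value):
    """Split an encode_tfbs string into (tf_name, cell_count) pairs.

    Assumption: each hit has the form TF_CELLCOUNT with an integer count.
    A NULL or empty column value yields an empty list.
    """
    if not value:
        return []
    hits = []
    for hit in value.split(","):
        # Split on the last underscore in case a TF name contains one.
        tf_name, _, count = hit.strip().rpartition("_")
        hits.append((tf_name, int(count)))
    return hits


# Hypothetical column value, for illustration only.
print(parse_encode_tfbs("CTCF_12,MAX_3"))  # [('CTCF', 12), ('MAX', 3)]
```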

The variant_impacts table

column_name type notes
variant_id INTEGER PRIMARY_KEY (Foreign key to variants table)
anno_id INTEGER PRIMARY_KEY (Based on variant transcripts)
gene STRING The gene affected by the variant.
transcript STRING The transcript affected by the variant.
is_exonic BOOL Does the variant affect an exon for this transcript?
is_coding BOOL Does the variant fall in a coding region (excludes 3’ & 5’ UTRs of exons)?
is_lof BOOL Based on the value of the impact col, is the variant LOF?
exon STRING Exon information for the variants that are exonic
codon_change STRING What is the codon change?
aa_change STRING What is the amino acid change?
aa_length STRING The length of CDS in terms of number of amino acids
biotype STRING The type of transcript (e.g., protein-coding, pseudogene, rRNA, etc.)
impact STRING Impacts due to variation (ref. impact category)
impact_severity STRING Severity of the impact based on the impact column value (ref. impact category)
polyphen_pred STRING
Impact of the SNP as given by PolyPhen (VEP only)
benign, possibly_damaging, probably_damaging, unknown
polyphen_scores FLOAT PolyPhen score reflecting severity (the higher the impact, the higher the score)
sift_pred STRING
Impact of the SNP as given by SIFT (VEP only)
neutral, deleterious
sift_scores FLOAT SIFT probability score reflecting severity (the higher the impact, the lower the score)
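Since variant_id in variant_impacts is a foreign key to the variants table, per-transcript impacts can be joined back to their variant. The sketch below demonstrates the join on a toy in-memory SQLite database that mirrors only a handful of the columns listed above. The rows, gene names, and transcript names are made up for illustration, and a real database may differ in storage details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy schema with a subset of the documented columns.
# "end" is quoted because END is an SQL keyword.
conn.executescript("""
    CREATE TABLE variants (
        variant_id INTEGER PRIMARY KEY, chrom TEXT,
        start INTEGER, "end" INTEGER, gene TEXT);
    CREATE TABLE variant_impacts (
        variant_id INTEGER, anno_id INTEGER, gene TEXT,
        transcript TEXT, is_lof INTEGER,
        PRIMARY KEY (variant_id, anno_id));
    INSERT INTO variants VALUES (1, 'chr1', 99, 100, 'GENE_A');
    INSERT INTO variant_impacts VALUES (1, 1, 'GENE_A', 'TX_1', 0);
    INSERT INTO variant_impacts VALUES (1, 2, 'GENE_A', 'TX_2', 1);
""")

# List every transcript for which a variant is loss-of-function.
rows = conn.execute("""
    SELECT v.chrom, v.start, v."end", i.transcript, i.is_lof
    FROM variants AS v
    JOIN variant_impacts AS i ON i.variant_id = v.variant_id
    WHERE i.is_lof = 1
    ORDER BY i.anno_id
""").fetchall()
print(rows)  # [('chr1', 99, 100, 'TX_2', 1)]
```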

The samples table

column_name type notes
sample_id INTEGER PRIMARY_KEY
name STRING Sample names
family_id INTEGER Family ids for the samples [User defined, default: NULL]
paternal_id INTEGER Paternal id for the samples [User defined, default: NULL]
maternal_id INTEGER Maternal id for the samples [User defined, default: NULL]
sex STRING Sex of the sample [User defined, default: NULL]
phenotype STRING The associated sample phenotype [User defined, default: NULL]
ethnicity STRING The ethnic group to which the sample belongs [User defined, default: NULL]
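Because the user-defined columns default to NULL, queries on the samples table should account for missing values with IS NULL rather than an equality test. A minimal sketch on a toy in-memory SQLite table follows. The sample names and family ids are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE samples (
        sample_id INTEGER PRIMARY KEY, name TEXT, family_id INTEGER,
        paternal_id INTEGER, maternal_id INTEGER,
        sex TEXT, phenotype TEXT, ethnicity TEXT)
""")
# Columns omitted from the INSERT stay NULL, matching the documented default.
conn.executemany(
    "INSERT INTO samples (sample_id, name, family_id) VALUES (?, ?, ?)",
    [(1, "sample_a", 1), (2, "sample_b", 1), (3, "sample_c", None)],
)

# Members of family 1, in sample order.
family = [row[0] for row in conn.execute(
    "SELECT name FROM samples WHERE family_id = ? ORDER BY sample_id", (1,))]
# Samples with no family assigned: NULL must be tested with IS NULL.
unassigned = [row[0] for row in conn.execute(
    "SELECT name FROM samples WHERE family_id IS NULL")]

print(family)      # ['sample_a', 'sample_b']
print(unassigned)  # ['sample_c']
```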
{% endblock %}