{% extends "base.j2.html" %} {% block title %}GEMINI query interface{% endblock %} {% block head %}{% endblock %} {% block body %}
column_name | type | notes |
---|---|---|
chrom | STRING | The chromosome on which the variant resides |
start | INTEGER | The 0-based start position. |
end | INTEGER | The 1-based end position. |
variant_id | INTEGER | PRIMARY_KEY |
anno_id | INTEGER | Variant transcript number for the most severely affected transcript |
ref | STRING | Reference allele |
alt | STRING | Alternate alele for the variant |
qual | INTEGER | Quality score for the assertion made in ALT |
filter | STRING | A string of filters passed/failed in variant calling |
type | STRING | The type of variant.
Any of: [snp, indel]
|
sub_type | STRING | The variant sub-type.
If type is snp: [ts, (transition), tv (transversion)]
If type is indel: [ins, (insertion), del (deletion)]
|
call_rate | FLOAT | The fraction of samples with a valid genotype |
num_hom_ref | INTEGER | The total number of of homozygotes for the reference (ref) allele |
num_het | INTEGER | The total number of heterozygotes observed. |
num_hom_alt | INTEGER | The total number of homozygotes for the reference (alt) allele |
num_unknown | INTEGER | The total number of of unknown genotypes |
aaf | FLOAT | The observed allele frequency for the alternate allele |
hwe | FLOAT | The Chi-square probability of deviation from HWE (assumes random mating) |
inbreeding_coeff | FLOAT | The inbreeding co-efficient that expresses the likelihood of effects due to inbreeding |
pi | FLOAT | The computed nucleotide diversity (pi) for the site |
gts | BLOB | A compressed binary vector of sample genotypes
(e.g., “A/A”, “A|G”, “G/G”)
Access a specific sample's genotype with gts.sample_id
|
gt_types | BLOB | A compressed binary vector of numeric genotype “types”
(e.g., 0, 1, 2)
Access a specific sample's genotype type with gt_types.sample_id
|
gt_phases | BLOB | A compressed binary vector of sample genotype phases
(e.g., False, True, False)
Access a specific sample's genotype phasing info with gt_phases.sample_id
|
gt_depths | BLOB | A compressed binary vector of the depth of aligned sequence observed for each sample
Access a specific sample's sequence depth info with gt_depths.sample_id
|
gene | STRING | Corresponding gene name of the highly affected transcript |
transcript | STRING | The variant transcript that was most severely affected
(for two equally affected transcripts, either the first
| one is selected (VEP) or protein_coding biotype considered (snpEff).
|
is_exonic | BOOL | Does the variant affect an exon for >= 1transcript? |
is_coding | BOOL | Does the variant fall in a coding region (excl. 3’ & 5’ UTRs) for >= 1 transcript? |
is_lof | BOOL | Based on the value of the impact col, is the variant LOF for >= transcript? |
exon | STRING | Exon information for the severely affected transcript |
codon_change | STRING | What is the codon change? |
aa_change | STRING | What is the amino acid change (for an snp)? |
aa_length | STRING | The length of CDS in terms of number of amino acids |
biotype | STRING | The ‘type’ of the severely affected transcript (e.g.protein-coding, pseudogene, rRNA etc.) |
impact | STRING | The consequence of the most severely affected transcript |
impact_severity | STRING | Severity of the highest order observed for the variant |
polyphen_pred | STRING | Polyphen predictions for the snps (only with VEP) for the severely affected transcript |
polyphen_score | FLOAT | Polyphen scores for the severely affected transcript |
sift_pred | STRING | SIFT predictions for the snp’s (VEP only) for the most severely affected transcript |
sift_score | FLOAT | SIFT scores for the predictions |
pfam_domain | STRING | Pfam protein domain that the variant affects |
anc_allele | STRING | The reported ancestral allele if there is one. |
rms_bq | FLOAT | The RMS base quality at this position. |
cigar | STRING | CIGAR string describing how to align an alternate allele to the reference allele. |
depth | INTEGER | The number of aligned sequence reads that led to this variant call |
strand_bias | FLOAT | Strand bias at the variant position |
rms_map_qual | FLOAT | RMS mapping quality, a measure of variance of quality scores |
in_hom_run | INTEGER | Homopolymer runs for the variant allele |
num_mapq_zero | INTEGER | Total counts of reads with mapping quality equal to zero |
num_alleles | INTEGER | Total number of alleles in called genotypes |
num_reads_w_dels | FLOAT | Fraction of reads with spanning deletions |
haplotype_score | FLOAT | Consistency of the site with two segregating haplotypes |
qual_depth | FLOAT | Variant confidence or quality by depth |
allele_count | INTEGER | Allele counts in genotypes |
allele_bal | FLOAT | Allele balance for hets |
is_somatic | BOOL | Whether the variant is somatically acquired. |
in_dbsnp | BOOL | Is this variant found in dbSnp (build 135)?
0 : Absence of the variant in dbsnp
1 : Presence of the variant in dbsnp
|
rs_ids | STRING | A comma-separated list of rs ids for variants present in dbsnp
|
in_hm2 | BOOL | Whether the variant was part of HapMap2. |
in_hm3 | BOOL | Whether the variant was part of HapMap3. |
in_esp | BOOL | Presence/absence of the variant in the ESP project data |
in_1kg | BOOL | Presence/absence of the variant in the 1000 genome project data |
aaf_esp_ea | FLOAT | Minor Allele Frequency of the variant for European Americans in the ESP project |
aaf_esp_aa | FLOAT | Minor Allele Frequency of the variant for African Americans in the ESP project |
aaf_esp_all | FLOAT | Minor Allele Frequency of the variant w.r.t both groups in the ESP project |
aaf_1kg_amr | FLOAT | Allele Frequency of the variant for samples in AMR based on AC/AN (1000g project) |
aaf_1kg_asn | FLOAT | Allele frequency of the variant for samples in ASN based on AC/AN (1000g project) |
aaf_1kg_afr | FLOAT | Allele frequency of the variant for samples in AFR based on AC/AN (1000g project) |
aaf_1kg_eur | FLOAT | Allele Frequency of the variant for samples in EUR based on AC/AN (1000g project) |
aaf_1kg_all | FLOAT | Global allele frequency (based on AC/AN) (1000g project) |
in_omim | BOOL | 0 : Absence of the variant in OMIM database
1 : Presence of the variant in OMIM database
|
clinvar_sig | STRING | The clinical significance scores for each
of the variant according to ClinVar:
unknown, untested, non-pathogenic
probable-non-pathogenic, probable-pathogenic
pathogenic, drug-response, histocompatibility
other
|
clinvar_disease_name | STRING | The name of the disease to which the variant is relevant |
clinvar_dbsource | STRING | Variant Clinical Channel IDs |
clinvar_dbsource_id | STRING | The record id in the above database |
clinvar_origin | STRING | The type of variant.
Any of:
unknown, germline, somatic,
inherited, paternal, maternal,
de-novo, biparental, uniparental,
not-tested, tested-inconclusive,
other
|
clinvar_dsdb | STRING | Variant disease database name |
clinvar_dsdbid | STRING | Variant disease database ID |
clinvar_disease_acc | STRING | Variant Accession and Versions |
clinvar_in_locus_spec_db | BOOL | Submitted from a locus-specific database? |
clinvar_on_diag_assay | BOOL | Variation is interrogated in a clinical diagnostic assay? |
exome_chip | BOOL | Whether an SNP is on the Illumina HumanExome Chip |
cyto_band | STRING | Chromosomal cytobands that a variant overlaps |
rmsk | STRING | A comma-separated list of RepeatMasker annotations that the variant overlaps.
Each hit is of the form: name_class_family
|
in_cpg_island | BOOL | Does the variant overlap a CpG island?.
Based on UCSC: Regulation > CpG Islands > cpgIslandExt
|
in_segdup | BOOL | Does the variant overlap a segmental duplication?.
Based on UCSC: Variation&Repeats > Segmental Dups > genomicSuperDups track
|
is_conserved | BOOL | Does the variant overlap a conserved region?
Based on the 29-way mammalian conservation study
|
gerp_bp_score | FLOAT | GERP conservation score.
Only populated if the --load-gerp-bp option is used when loading.
Higher scores reflect greater conservation. At base-pair resolution.
|
gerp_element_pval | FLOAT | GERP elements P-val
Lower P-values scores reflect greater conservation. Not at base-pair resolution.
|
recomb_rate | FLOAT | Returns the mean recombination rate at the variant site
Based on HapMapII_GRCh37 genetic map
|
grc | STRING | Association with patch and fix regions from the Genome Reference Consortium:
Identifies potential problem regions associated with variant calls.
Built with annotation_provenance/make-ncbi-grc-patches.py
|
gms_illumina | FLOAT | Genome Mappability Scores (GMS) for Illumina error models
Provides low GMS scores (< 25.0 in any technology) from:
#Download_GMS_by_Chromosome_and_Sequencing_Technology
Input VCF for annotations prepared with:
|
gms_solid | FLOAT | Genome Mappability Scores with SOLiD error models |
gms_iontorrent | FLOAT | Genome Mappability Scores with IonTorrent error models |
in_cse | BOOL | Is a variant in an error prone genomic position,
using CSE: Context-Specific Sequencing Errors
|
encode_tfbs | STRING | Comma-separated list of transcription factors that were
observed by ENCODE to bind DNA in this region. Each hit in the list is constructed
as TF_CELLCOUNT, where:
TF is the transcription factor name
CELLCOUNT is the number of cells tested that had nonzero signals.
Provenance: wgEncodeRegTfbsClusteredV2 UCSC table
|
encode_dnaseI_cell_count | INTEGER | Count of cell types that were observed to have DnaseI hypersensitivity.
|
encode_dnaseI_cell_list | STRING | Comma separated list of cell types that were observed to have DnaseI hypersensitivity.
Provenance: Thurman, et al, Nature, 489, pp. 75-82, 5 Sep. 2012
|
encode_consensus_gm12878 | STRING | ENCODE consensus segmentation prediction for GM12878.
CTCF: CTCF-enriched element
E: Predicted enhancer
PF: Predicted promoter flanking region
R: Predicted repressed or low-activity region
TSS: Predicted promoter region including TSS
T: Predicted transcribed region
WE: Predicted weak enhancer or open chromatin cis-regulatory element
| unknown: This region of the genome had no functional prediction.
|
encode_consensus_h1hesc | STRING | ENCODE consensus segmentation prediction for h1HESC. See encode_consseg_gm12878 for details. |
encode_consensus_helas3 | STRING | ENCODE consensus segmentation prediction for Helas3. See encode_consseg_gm12878 for details. |
encode_consensus_hepg2 | STRING | ENCODE consensus segmentation prediction for HEPG2. See encode_consseg_gm12878 for details. |
encode_consensus_huvec | STRING | ENCODE consensus segmentation prediction for HuVEC. See encode_consseg_gm12878 for details. |
encode_consensus_k562 | STRING | ENCODE consensus segmentation prediction for k562. See encode_consseg_gm12878 for details. |
column_name | type | notes |
---|---|---|
variant_id | INTEGER | PRIMARY_KEY (Foreign key to variants table) |
anno_id | INTEGER | PRIMARY_KEY (Based on variant transcripts) |
gene | STRING | The gene affected by the variant. |
transcript | STRING | The transcript affected by the variant. |
is_exonic | BOOL | Does the variant affect an exon for this transcript? |
is_coding | BOOL | Does the variant fall in a coding region (excludes 3’ & 5’ UTR’s of exons)? |
is_lof | BOOL | Based on the value of the impact col, is the variant LOF? |
exon | STRING | Exon information for the variants that are exonic |
codon_change | STRING | What is the codon change? |
aa_change | STRING | What is the amino acid change? |
aa_length | STRING | The length of CDS in terms of number of amino acids |
biotype | STRING | The type of transcript (e.g.protein-coding, pseudogene, rRNA etc.) |
impact | STRING | Impacts due to variation (ref.impact category) |
impact_severity | STRING | Severity of the impact based on the impact column value (ref.impact category) |
polyphen_pred | STRING | Impact of the SNP as given by PolyPhen (VEP only)
benign, possibly_damaging, probably_damaging, unknown
|
polyphen_scores | FLOAT | Polyphen score reflecting severity (higher the impact, higher the score) |
sift_pred | STRING | Impact of the SNP as given by SIFT (VEP only)
neutral, deleterious
|
sift_scores | FLOAT | SIFT prob. scores reflecting severity (Higher the impact, lower the score) |
column name | type | notes |
---|---|---|
sample_id | INTEGER | PRIMARY_KEY |
name | STRING | Sample names |
family_id | INTEGER | Family ids for the samples [User defined, default: NULL] |
paternal_id | INTEGER | Paternal id for the samples [User defined, default: NULL] |
maternal_id | INTEGER | Maternal id for the samples [User defined, default: NULL] |
sex | STRING | Sex of the sample [User defined, default: NULL] |
phenotype | STRING | The associated sample phenotype [User defined, default: NULL] |
ethnicity | STRING | The ethnic group to which the sample belongs [User defined, default: NULL] |