VcfCodingSnps
From Genome Analysis Wiki
vcfCodingSnps is a SNP annotation tool that annotates coding variants in a VCF format input file. It takes a VCF as input and generates an annotated VCF file as output.
Basic Usage Example
Here is an example of how to run vcfCodingSnps:
vcfCodingSnps -s chrom22-CHB.vcf -g genelist.txt -o annotated-chrom22-CHB.vcf
Command Line Options
-s SNP file      Specifies the name of the input VCF-format SNP file.
-g genefile      Specifies the name of the input gene file. By default, this is expected to be a gene list file in ASCII format generated by the UCSC Genome Browser.
-o output file   Specifies the name of the output VCF-format SNP file.
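When the tool is driven from a script, the three options above can be assembled into an argument list. The following is a minimal sketch, not part of vcfCodingSnps itself: the helper name build_command is hypothetical, and it assumes the vcfCodingSnps executable is on the PATH.

```python
import shlex


def build_command(snp_file, gene_file, output_file, executable="vcfCodingSnps"):
    """Return the argument list for one run, using only the -s, -g and -o options."""
    return [executable, "-s", snp_file, "-g", gene_file, "-o", output_file]


cmd = build_command("chrom22-CHB.vcf", "genelist.txt", "annotated-chrom22-CHB.vcf")
print(shlex.join(cmd))

# To actually run the tool (assumes the executable is installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.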
Input File Information
1. Example header lines of an input VCF-format SNP file:
##format=VCFv3.2
##NA12891=../depthFilter/filtered.NA12891.chrom22.SLX.maq.SRP000032.2009_07.glf
##NA12892=../depthFilter/filtered.NA12892.chrom22.SLX.maq.SRP000032.2009_07.glf
##NA12878=../merged/NA12878.chrom22.merged.glf
##minTotalDepth=0
##maxTotalDepth=1000
##minMapQuality=30
##minPosterior=0.9990
##program=glfTrio
##versionDate=Tue Dec 1 00:42:24 2009
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA12891 NA12892 NA12878
22 14439753 . a t 100 mapQ=0 depth=68;duples=homs;mac=2 GT:GQ:DP 1|1:100:40 0|0:81:28 1|0:84:0
22 14441250 . t c 59 mapQ=0 depth=40 GT:GQ:DP 1|1:56:25 1|1:31:15 1|1:32:0
22 14443154 . t g 45 mapQ=9 depth=92;duples=homs;mac=2 GT:GQ:DP 1|1:49:21 0|0:60:20 1|0:100:51
... ...
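To make the column layout of a data line concrete, here is a small sketch that splits the first record of the example into named fields. It is only an illustration of the file layout, not the parser used by vcfCodingSnps; the function name parse_vcf_record is hypothetical, and the code assumes fields are separated by whitespace and contain no embedded spaces.

```python
def parse_vcf_record(line, sample_names):
    """Split one data line of the example VCF into named fields."""
    # Assumption: whitespace-delimited columns, as in the example shown here.
    fields = line.split()
    chrom, pos, snp_id, ref, alt, qual, filt, info, fmt = fields[:9]

    # INFO is a ';'-separated list of key=value pairs.
    info_dict = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_dict[key] = value

    # FORMAT names the ':'-separated sub-fields of every sample column.
    keys = fmt.split(":")
    genotypes = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }
    return {
        "chrom": chrom, "pos": int(pos), "id": snp_id, "ref": ref, "alt": alt,
        "qual": qual, "filter": filt, "info": info_dict, "genotypes": genotypes,
    }


samples = ["NA12891", "NA12892", "NA12878"]
line = ("22 14439753 . a t 100 mapQ=0 depth=68;duples=homs;mac=2 "
        "GT:GQ:DP 1|1:100:40 0|0:81:28 1|0:84:0")
record = parse_vcf_record(line, samples)
print(record["pos"], record["info"]["depth"], record["genotypes"]["NA12878"]["GT"])
```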
2. The input gene file should be a plain text file generated by the UCSC Genome Browser. One way to generate an input gene file is:
Go to http://genome.ucsc.edu/ ►► Click "table" ►► Specify the fields required (clade: mammal, genome: human, etc.) ►► Get the output gene file
1. Detailed instructions on using the Table Browser can be found at genome.ucsc.edu/cgi-bin/hgTables.
2. The region can be set to the whole genome or to any particular gene position (e.g. chr21:33031597-33041570).
Here is an example of the header line and first records of an input gene file:
#name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds proteinID alignID
uc001aaa.3 chr1 + 11873 14409 11873 11873 3 11873,12612,13220, 12227,12721,14409, uc001aaa.3
uc010nxq.1 chr1 + 11873 14409 12189 13639 3 11873,12594,13402, 12227,12721,14409, B7ZGX9 uc010nxq.1
uc010nxr.1 chr1 + 11873 14409 11873 11873 3 11873,12645,13220, 12227,12697,14409, uc010nxr.1
uc009vis.2 chr1 - 14362 16765 14362 14362 4 14362,14969,15795,16606, 14829,15038,15942,16765, uc009vis.2
uc009vjc.1 chr1 - 16857 17751 16857 16857 2 16857,17232, 17055,17751, uc009vjc.1
uc009vjd.2 chr1 - 15795 18061 15795 15795 5 15795,16606,16857,17232,17605, 15947,16765,17055,17368,18061, uc009vjd.2
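The gene file stores exon boundaries as two comma-terminated lists, which can be paired up into (start, end) intervals. The sketch below illustrates that layout under stated assumptions: the function name parse_gene_line is hypothetical, and the file is assumed to be tab-delimited, so an empty proteinID still occupies a column. The has_cds flag simply reports whether the cdsStart to cdsEnd interval is non-empty; how vcfCodingSnps itself treats such records is not described here.

```python
def parse_gene_line(line):
    """Parse one record of the gene file into a dictionary."""
    # Assumption: columns are tab-separated, in the order shown in the header line.
    f = line.rstrip("\n").split("\t")
    # exonStarts and exonEnds end with a trailing comma, so skip empty pieces.
    starts = [int(x) for x in f[8].split(",") if x]
    ends = [int(x) for x in f[9].split(",") if x]
    cds_start, cds_end = int(f[5]), int(f[6])
    return {
        "name": f[0],
        "chrom": f[1],
        "strand": f[2],
        "txStart": int(f[3]),
        "txEnd": int(f[4]),
        "cdsStart": cds_start,
        "cdsEnd": cds_end,
        "exons": list(zip(starts, ends)),
        "has_cds": cds_start < cds_end,  # empty interval when cdsStart == cdsEnd
    }


coding = parse_gene_line(
    "uc010nxq.1\tchr1\t+\t11873\t14409\t12189\t13639\t3\t"
    "11873,12594,13402,\t12227,12721,14409,\tB7ZGX9\tuc010nxq.1"
)
noncoding = parse_gene_line(
    "uc001aaa.3\tchr1\t+\t11873\t14409\t11873\t11873\t3\t"
    "11873,12612,13220,\t12227,12721,14409,\t\tuc001aaa.3"
)
print(coding["exons"], coding["has_cds"], noncoding["has_cds"])
```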
Output File
Some possible annotation results for a single SNP, with the meaning of each output format, are listed below:
5'UTR=A26C2[-] means the SNP is in the 5'UTR region of gene A26C2 on the minus strand.
INTRONIC=POTEG[-] means the SNP is in the intronic region of gene POTEG on the minus strand.
SYNONYMOUS_CODING=GAB4:Ala15826157Ala[-] means that the SNP is synonymous coding at position 15826157 in gene GAB4 on the minus strand and it keeps amino acid Ala unchanged.
NON_SYNONYMOUS_CODING=GAB4:Leu15830952Pro[-] means that the SNP is non-synonymous coding at position 15830952 in gene GAB4 on the minus strand and it changes amino acid Leu to Pro.
SPLICE_SITE=NCAPH2[+] means that the SNP is in a splice site (within 5 bp of an exon start or end position in the coding region) of gene NCAPH2 on the plus strand.
STOP_GAINED=MAPK12:Trp49035685stop[-] means that the SNP is at position 49035685 in gene MAPK12 on the minus strand and it changes amino acid Trp to a stop codon.
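For downstream filtering it can be convenient to split these annotation strings into their parts. The following sketch uses a regular expression inferred only from the six examples above; the function name parse_annotation is hypothetical, and other annotation forms that vcfCodingSnps may emit are not covered.

```python
import re

# Pattern inferred from the examples: CATEGORY=GENE[strand], optionally with
# ":<ref amino acid><position><new amino acid or 'stop'>" after the gene name.
ANNOTATION_RE = re.compile(
    r"^(?P<category>[^=]+)="
    r"(?P<gene>[^:\[]+)"
    r"(?::(?P<ref_aa>[A-Za-z]+)(?P<pos>\d+)(?P<alt_aa>[A-Za-z]+))?"
    r"\[(?P<strand>[+-])\]$"
)


def parse_annotation(text):
    """Return the parts of one annotation string, or None if it does not match."""
    match = ANNOTATION_RE.match(text)
    if match is None:
        return None
    parts = match.groupdict()
    if parts["pos"] is not None:
        parts["pos"] = int(parts["pos"])
    return parts


for example in (
    "5'UTR=A26C2[-]",
    "NON_SYNONYMOUS_CODING=GAB4:Leu15830952Pro[-]",
    "STOP_GAINED=MAPK12:Trp49035685stop[-]",
):
    print(parse_annotation(example))
```

Annotations without an amino-acid change, such as INTRONIC or SPLICE_SITE, leave ref_aa, pos and alt_aa set to None.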