Changes

From Genome Analysis Wiki
2,702 bytes added, 01:51, 31 October 2015
[[Category:Software|vcfCooker]]
 
Please see [[GotCloud]] to download vcfCooker. Downloads are available [https://github.com/statgen/gotcloud/releases on GitHub].
    
(Updated at 2012/01/20 10:47PM)
 
== Current Binary Location==
 
The current vcfCooker binary is available at /usr/cluster/bin/vcfCooker (in-house software).
    
== Basic Usage ==
 
  --mono-subset : Include monomorphic SNPs when subsetting
  --filt-only-subset : Use only PASS-filtered SNPs for subsetting
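The two subsetting switches above can be illustrated with a small Python sketch. Because the full description of subsetting is not shown here, it assumes that subsetting means keeping only a chosen set of samples and that a site is monomorphic when none of those samples carries an ALT allele. The functions keep_site and is_monomorphic are hypothetical and are not part of vcfCooker.

```python
# Hypothetical sketch, not vcfCooker's implementation.
# ASSUMPTIONS: "subsetting" keeps only a chosen set of samples, and genotypes
# are GT strings such as "0/1", "1|1" or "./.".

def is_monomorphic(genotypes):
    """True if no called allele among `genotypes` is an ALT allele."""
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele not in (".", "0"):
                return False
    return True


def keep_site(subset_genotypes, filter_value,
              mono_subset=False, filt_only_subset=False):
    """Decide whether a site survives subsetting under the assumed rules."""
    if filt_only_subset and filter_value != "PASS":
        return False  # --filt-only-subset: only PASS sites are used
    if not mono_subset and is_monomorphic(subset_genotypes):
        return False  # monomorphic in the subset, dropped unless --mono-subset
    return True


subset = ["0/0", "0|0", "./."]  # no ALT allele among the chosen samples
print(keep_site(subset, "PASS"))                          # False
print(keep_site(subset, "PASS", mono_subset=True))        # True
print(keep_site(["0/1"], "dp10", filt_only_subset=True))  # False
```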

== Genotype-level filtering of a VCF file ==

To filter individual genotypes based on genotype quality (GQ field) or genotype depth (GD field), use one of the following commands:

 vcfCooker --in-vcf [input-vcf] --out [output-vcf] --bgzf --minGD [GD_thres] --write-vcf
 vcfCooker --in-vcf [input-vcf] --out [output-vcf] --bgzf --minGQ [GQ_thres] --write-vcf

This generates a VCF file in which genotypes below the threshold are set to missing (./.), and the AN and AC entries in the INFO field are updated accordingly.

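The genotype-level rule above can be sketched in Python for a single tab-separated VCF data line. The function filter_genotypes is hypothetical, not vcfCooker's code; it assumes that "below the threshold" means strictly less than the threshold, that a genotype whose GQ or GD value is missing (".") is left unchanged, and that AN and AC are recomputed from all remaining called alleles.

```python
# Hypothetical sketch of genotype-level filtering on one VCF data line.
# ASSUMPTIONS: FORMAT contains GT and the filtered field (e.g. GQ or GD),
# "below the threshold" means strictly less than it, and a missing value (".")
# leaves the genotype unchanged.

def filter_genotypes(line, key, threshold):
    fields = line.rstrip("\n").split("\t")
    n_alt = len(fields[4].split(","))
    fmt = fields[8].split(":")
    gt_idx, key_idx = fmt.index("GT"), fmt.index(key)

    an, ac = 0, [0] * n_alt
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        value = values[key_idx] if key_idx < len(values) else "."
        if value != "." and float(value) < threshold:
            values[gt_idx] = "./."  # below threshold -> missing genotype
        for allele in values[gt_idx].replace("|", "/").split("/"):
            if allele != ".":
                an += 1  # AN counts every called allele
                if allele != "0":
                    ac[int(allele) - 1] += 1  # AC counts each ALT allele
        fields[col] = ":".join(values)

    # Replace any existing AN/AC entries (and a bare ".") in INFO.
    info = [kv for kv in fields[7].split(";")
            if kv.split("=")[0] not in ("AN", "AC", ".")]
    info += [f"AN={an}", "AC=" + ",".join(str(c) for c in ac)]
    fields[7] = ";".join(info)
    return "\t".join(fields)


line = "20\t100\t.\tA\tG\t50\tPASS\tDP=30;AN=4;AC=2\tGT:GQ\t0/1:35\t1/1:12"
print(filter_genotypes(line, "GQ", 20))
```

With a GQ threshold of 20, the second sample (GQ 12) becomes ./. and the INFO field is rewritten as DP=30;AN=2;AC=1.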
== Site-level filtering of a VCF file ==

The following command filters a VCF file at the site level:

 vcfCooker --in-vcf [input-vcf] --out [output-vcf] --bgzf --filter --write-vcf

After filtering, the FILTER column is updated with a filter tag formed by combining the key name (uppercase for a maximum bound, lowercase for a minimum bound) with the threshold value. For example, DP10000 means that the site was filtered by the criterion --maxDP 10000, and dp10 means that it was filtered by --minDP 10.

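The FILTER tag convention above can be sketched in Python as follows. The helpers filter_tag and failed_tags are hypothetical, and treating a value equal to the threshold as passing is an assumption, not documented vcfCooker behavior.

```python
# Hypothetical sketch of the FILTER tag convention.
# ASSUMPTION: a value equal to the threshold passes; only values strictly
# beyond the bound fail.

def filter_tag(option, threshold):
    """Tag for a bound, e.g. ('--maxDP', 10000) -> 'DP10000'."""
    name = option.lstrip("-")
    if name.startswith("max"):
        return name[3:].upper() + str(threshold)  # uppercase: maximum bound
    if name.startswith("min"):
        return name[3:].lower() + str(threshold)  # lowercase: minimum bound
    raise ValueError(f"not a --min/--max option: {option}")


def failed_tags(site_values, thresholds):
    """Tags for every bound the site violates.

    site_values maps key names to values, e.g. {"DP": 12000};
    thresholds maps options to limits, e.g. {"--maxDP": 10000}.
    """
    tags = []
    for option, limit in thresholds.items():
        name = option.lstrip("-")
        value = site_values[name[3:]]
        too_high = name.startswith("max") and value > limit
        too_low = name.startswith("min") and value < limit
        if too_high or too_low:
            tags.append(filter_tag(option, limit))
    return tags


bounds = {"--maxDP": 10000, "--minDP": 10}
print(failed_tags({"DP": 12000}, bounds))  # ['DP10000']
print(failed_tags({"DP": 5}, bounds))      # ['dp10']
print(failed_tags({"DP": 500}, bounds))    # []
```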
The following filtering criteria (which a site must meet to PASS the filters) are currently supported. Other criteria have not been rigorously tested, so use them at your own risk.
  --winIndel : Minimum distance from nearby INDELs (must be used together with --indelVCF)
  --indelVCF : VCF file containing the known INDELs
  --minQUAL : Minimum SNP quality allowed
  --minMQ : Minimum mapping quality
  --maxDP [2147483647] : Maximum read depth
  --minDP : Minimum read depth
  --maxABL [100] : Maximum % allele balance based on genotype likelihood (formula by Tom Blackwell)
  --minNS : Minimum number of samples with positive depth
  --maxSTR [100] : Maximum % strand balance correlation between REF/ALT and FWD/REV (-100 to 100)
  --minSTR [-100] : Minimum % strand balance correlation between REF/ALT and FWD/REV (-100 to 100)
  --maxSTZ [2147483647] : Maximum z-score of the strand bias between REF/ALT and FWD/REV
  --minSTZ [-2147483648] : Minimum z-score of the strand bias between REF/ALT and FWD/REV
  --maxCBR [100] : Maximum % cycle bias correlation between REF/ALT and read position
  --minCBR [-100] : Minimum % cycle bias correlation between REF/ALT and read position
  --maxLQR [100] : Maximum % (0-100) of low-quality bases among all reads
  --maxAOI [2147483647] : Maximum z-score quantifying incorrect calibration of base qualities
  --maxMQ0 [100] : Maximum % of reads with mapping quality = 0
  --maxMQ10 [100] : Maximum % of reads with mapping quality <= 10
  --maxMQ20 [100] : Maximum % of reads with mapping quality <= 20
  --maxMQ30 [100] : Maximum % of reads with mapping quality <= 30
  --minFIC [-2147483648] : Minimum % inbreeding coefficient (-100 to 100)
  --minABE [0] : Minimum % allele balance (0 to 100) based on exact base quality
  --maxABE [100] : Maximum % allele balance (0 to 100) based on exact base quality
  --minABZ [-2147483648] : Minimum allele balance z-score based on exact base quality
  --maxABZ [-2147483648] : Maximum allele balance z-score based on exact base quality
  --keepFilter : Do not reset the FILTER column; add filter tags to the existing filters (default: OFF)
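The effect of --keepFilter can be sketched in Python as below. The function update_filter is hypothetical; it assumes that tags are joined with ";" as in the VCF specification, that a site failing no criteria is marked PASS, and that the placeholders PASS and "." are discarded when existing filters are kept.

```python
# Hypothetical sketch of --keepFilter.
# ASSUMPTIONS: FILTER tags are joined with ";" (as in the VCF specification),
# a site with no failed criteria gets PASS, and the placeholders "PASS" and "."
# are dropped when existing filters are kept.

def update_filter(existing, new_tags, keep_filter=False):
    tags = []
    if keep_filter and existing not in ("PASS", "."):
        tags.extend(existing.split(";"))  # keep earlier filter tags
    for tag in new_tags:
        if tag not in tags:
            tags.append(tag)  # append new tags without duplicates
    return ";".join(tags) if tags else "PASS"


print(update_filter("qual10", ["DP10000"]))                    # DP10000 (reset)
print(update_filter("qual10", ["DP10000"], keep_filter=True))  # qual10;DP10000
print(update_filter("qual10", [], keep_filter=True))           # qual10
```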
    
== Upgrading glfMultiples outputs (v 3.3 to v 4.0)  ==
 
* AN (NumAlleles) will be added as a new INFO field
* AB (AlleleBalance) will be added as a new INFO field (suggested by Tom Blackwell at [[Genotype_Likelihood_Based_Allele_Balance]])
      
== Acknowledgements ==
 
vcfCooker is the result of a collaborative effort by Hyun Min Kang, Matthew Flickinger, Matthew Snyder, Paul Anderson, Tom Blackwell, Mary Kate Wing, and Goncalo Abecasis. Please email Hyun Min Kang at [mailto:hmkang@umich.edu hmkang@umich.edu] with any questions.