Changes

From Genome Analysis Wiki
2,447 bytes added ,  15:29, 9 September 2017
Line 5: Line 5:  
== Download verifyBamID  ==
 
== Download verifyBamID  ==
   −
To get a copy go to the [http://www.sph.umich.edu/csg/kang/verifyBamID/download VerifyBamID Download] download page.
+
To get a copy of verifyBamID, go to: https://github.com/statgen/verifyBamID/releases
 +
 
 +
Select the latest release and download in one of 3 ways:
 +
# Binary expected to run in Ubuntu x64 platform. In other platforms, please download the source distribution and build it.
 +
#* verifyBamID.#.#.#.gz
 +
#* You will need to run "gunzip" on the .gz file
 +
# Source Code including libStatGen (uses a fixed version of libStatGen)
 +
#* verifyBamIDLibStatGen.#.#.#.tgz
 +
#* Run "tar xvf" on this file.  Cd into the resulting directory & type make.
 +
# Source Code without libStatGen (allows alternative/newer versions of libStatGen)
 +
#* Source code (tar.gz) or Source code (zip)
 +
#* You will need to download libStatGen separately if you do not already have it.
 +
 
 +
 
 +
To get a copy of older releases go to the [http://csg.sph.umich.edu//kang/verifyBamID/download VerifyBamID Download] download page.
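As a minimal sketch of the binary download option above, the following shell function unpacks a downloaded .gz release and marks it executable. The function name and the default target path are hypothetical, and the release file name is left to the caller because the version number depends on the release you pick.

```shell
# Unpack a gzipped verifyBamID binary release and make it executable.
# $1: path to the downloaded .gz file (for example verifyBamID.#.#.#.gz)
# $2: where to write the executable (hypothetical default: ./verifyBamID)
unpack_verifybamid() {
    gz_file="$1"
    target="${2:-./verifyBamID}"
    # -c writes to stdout, so the downloaded archive is left untouched
    gunzip -c "$gz_file" > "$target" || return 1
    chmod +x "$target"
}
```

Call it with the path of the file you downloaded, for example: unpack_verifybamid /path/to/download.gz ./verifyBamID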
    
== Join in verifyBamID mailing list ==
 
== Join in verifyBamID mailing list ==
   −
Please join in the [http://groups.google.com/group/verifybamid | VerifyBamID Google Group] to ask / discuss / comment about verifyBamID.
+
Please join in the [http://groups.google.com/group/verifybamid VerifyBamID Google Group] to ask / discuss / comment about verifyBamID.
    
== What's new ==
 
== What's new ==
   −
(2012/05/23)  
+
(2014/02/13)
 +
* Put verifyBamID in github.
 +
* Added PhoneHome/Version Checking to VerifyBamID
 +
 
 +
(2012/06/20)
 +
* Fixed a bug of incorrect estimate of contamination when --chip-full option was used (Thanks to Richard Smith)
 +
* Fixed a bug of incorrect per-readgroup output in --chip-* parameter
 +
 
 +
(2012/05/24)  
 
* Fixed a bug of incorrect per-readgroup output (Thanks to Matthew Flickinger)
 
* Fixed a bug of incorrect per-readgroup output (Thanks to Matthew Flickinger)
* '''(IMPORTANT)''' Add an option to remove either side of overlapping fragment. This option is turned on by default, and can be turned off usig --ignoreOverlapPair. If your sequence data has very short insert size, adding this option may increase the sensitivity of estimated contamination.
+
* '''(IMPORTANT)''' Add an option to remove either side of overlapping fragment. This option is turned on by default, and can be turned off using --ignoreOverlapPair. If your sequence data has very short insert size, this update may increase the sensitivity of estimated contamination.
 +
* Changes in the directory structure and Makefile
    
(2012/05/18) The new release of verifyBamID has undergone major changes since the last version (as of April 2011). Here are the highlights
 
(2012/05/18) The new release of verifyBamID has undergone major changes since the last version (as of April 2011). Here are the highlights
Line 26: Line 49:  
== Build verifyBamID  ==
 
== Build verifyBamID  ==
   −
The binary download of verifyBamID is available. You may use the version in Ubuntu 64-bit platform. To build verifyBamID, download the statgen library and run the following series of commands
+
The binary download of verifyBamID is available. You may use that version in Ubuntu 64-bit platform.  
  tar xzvf verifyBamID.20120523.tar.gz
+
 
  cd verifyBamID
+
If you download the source that includes libStatGen:
  git clone git://github.com/statgen/libStatGen.git ../libStatGen
+
tar xvf verifyBamIDLibStatGen.#.#.#.tgz
 +
cd verifyBamID_#.#.#
 +
make
 +
Executable: verifyBamID/bin/verifyBamID
 +
 
 +
If you download the source without libStatGen:
 +
  tar xvf verifyBamID-#.#.#.tar.gz
 +
  cd verifyBamID-1.1.0
 +
  make cloneLib (if ../libStatGen does not exist)
 
  make
 
  make
  ./bin/verifyBamID
+
  Executable: ./bin/verifyBamID
 +
 
 +
Note that the '''make cloneLib''' command will create a directory ../libStatGen alongside your verifyBamID directory, and '''make''' will create the verifyBamID binary under verifyBamID/bin/
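The build steps for the source without libStatGen can be sketched as a small shell function that prints the commands to run, adding '''make cloneLib''' only when ../libStatGen is missing. The function name is hypothetical, and the sketch only prints commands so that you can review them before piping them to a shell.

```shell
# Print the build commands for a verifyBamID source tree without bundled libStatGen.
# $1: path to the unpacked source directory
build_steps() {
    src_dir="$1"
    echo "cd $src_dir"
    # make cloneLib is only needed when ../libStatGen does not exist yet
    if [ ! -d "$src_dir/../libStatGen" ]; then
        echo "make cloneLib"
    fi
    echo "make"
}
```

To run the printed steps rather than just inspect them: build_steps /path/to/source | sh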
 +
 
 +
If you have a different version of libStatGen at that path, then skip the cloneLib step.  If the libStatGen you want to use is at a different location then update verifyBamID's Makefile.inc.  Replace: LIB_PATH_VERIFY_BAM_ID ?= $(LIB_PATH_GENERAL) with
 +
LIB_PATH_VERIFY_BAM_ID = /path/to/libStatGen
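The Makefile.inc replacement described above can also be scripted. This is a sketch under the assumption that the line to replace starts with "LIB_PATH_VERIFY_BAM_ID ?=" exactly as quoted; the function name is hypothetical, and the library path must not contain the characters | or &, which are special in the sed expression.

```shell
# Point verifyBamID's Makefile.inc at a libStatGen checkout in a custom location.
# $1: path to Makefile.inc
# $2: path to the libStatGen directory to use
set_libstatgen_path() {
    makefile_inc="$1"
    lib_path="$2"
    # -i.bak edits in place and keeps a backup copy as Makefile.inc.bak
    sed -i.bak "s|^LIB_PATH_VERIFY_BAM_ID ?= .*|LIB_PATH_VERIFY_BAM_ID = ${lib_path}|" "$makefile_inc"
}
```

For example: set_libstatgen_path Makefile.inc /path/to/libStatGen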
   −
Note that '''git clone''' command will create a directory ./libStatGen under your working directory, and '''make''' will create bin/verifyBamID under your current working directory
      
verifyBamID is designed to be reasonably portable.  
 
verifyBamID is designed to be reasonably portable.  
   −
However, since development occurs only on Ubuntu 9.10 x86 and x64 platforms, and later, there are likely other portability issues.  
+
However, since development occurs only on Ubuntu (9.10-13.10 and later) x86 and x64 platforms, there are likely portability issues on other platforms.
   −
Currently we support verifyBamID only on Ubuntu 9.10 and later on 64-bit processors.
      
== Basic Usage ==
 
== Basic Usage ==
Line 99: Line 133:     
== Interpreting output files ==
 
== Interpreting output files ==
 +
 +
See also [[Understanding VerifyBamID output]].
    
=== Output files ===
 
=== Output files ===
Line 118: Line 154:  
# # READS : Total # of reads loaded from the BAM file
 
# # READS : Total # of reads loaded from the BAM file
 
# # AVG_DP : Average sequencing depth at the sites in the VCF file
 
# # AVG_DP : Average sequencing depth at the sites in the VCF file
# FREEMIX : Sequence-only estimate of contamination
+
# FREEMIX : Sequence-only estimate of contamination (0-1 scale)
 
# FREELK1 : Maximum log-likelihood of the sequence reads given estimated contamination under sequence-only method
 
# FREELK1 : Maximum log-likelihood of the sequence reads given estimated contamination under sequence-only method
# FREELK0 : Log-ikelihood of the sequence reads given no contamination under sequence-only method
+
# FREELK0 : Log-likelihood of the sequence reads given no contamination under sequence-only method
 
# FREE_RH : Estimated reference bias parameter Pr(refBase|HET) (when --free-refBias or --free-full is used)
 
# FREE_RH : Estimated reference bias parameter Pr(refBase|HET) (when --free-refBias or --free-full is used)
 
# FREE_RA : Estimated reference bias parameter Pr(refBase|HOMALT) (when --free-refBias or --free-full is used)
 
# FREE_RA : Estimated reference bias parameter Pr(refBase|HOMALT) (when --free-refBias or --free-full is used)
# CHIPMIX : Sequence+array estimate of contamination (NA if the external genotype is unavailable)
+
# CHIPMIX : Sequence+array estimate of contamination (NA if the external genotype is unavailable) (0-1 scale)
 
# CHIPLK1 : Maximum log-likelihood of the sequence reads given estimated contamination under sequence+array method (NA if the external genotypes are unavailable)
 
# CHIPLK1 : Maximum log-likelihood of the sequence reads given estimated contamination under sequence+array method (NA if the external genotypes are unavailable)
 
# CHIPLK0 : Log-likelihood of the sequence reads given no contamination under sequence+array method (NA if the external genotypes are unavailable)
 
# CHIPLK0 : Log-likelihood of the sequence reads given no contamination under sequence+array method (NA if the external genotypes are unavailable)
Line 129: Line 165:  
# CHIP_RA : Estimated reference bias parameter Pr(refBase|HOMALT) (when --chip-refBias or --chip-full is used)
 
# CHIP_RA : Estimated reference bias parameter Pr(refBase|HOMALT) (when --chip-refBias or --chip-full is used)
 
# DPREF : Depth (Coverage) of HomRef site (based on the genotypes of (SELF_SM/BEST_SM), passing mapQ, baseQual, maxDepth thresholds.
 
# DPREF : Depth (Coverage) of HomRef site (based on the genotypes of (SELF_SM/BEST_SM), passing mapQ, baseQual, maxDepth thresholds.
# RDPHET : DPHET/DPREF, Relative depth at Heterozygous site.
+
# RDPHET : DPHET/DPREF, Relative depth to HomRef site at Heterozygous site.
# RDPALT : DPHET/DPREF, Relative depth at HomAlt site.
+
# RDPALT : DPALT/DPREF, Relative depth to HomRef site at HomAlt site.
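To illustrate how the columns above might be consumed, here is a minimal sketch that lists samples whose FREEMIX value exceeds a cutoff. It assumes the output file is tab-delimited, has the column names on its first line, and has the sample identifier in the first column. The function name is hypothetical and the cutoff is supplied by the caller; no particular threshold is implied here.

```shell
# Print sample ID and FREEMIX for rows whose FREEMIX exceeds a caller-chosen cutoff.
# $1: verifyBamID output file (assumed tab-delimited with a header line)
# $2: cutoff on the 0-1 scale
flag_by_freemix() {
    result_file="$1"
    cutoff="$2"
    awk -F'\t' -v thr="$cutoff" '
        # Locate the FREEMIX column by name on the header line
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FREEMIX") col = i; next }
        # Skip NA values; compare the rest numerically
        col && $col != "NA" && $col + 0 > thr + 0 { print $1 "\t" $col }
    ' "$result_file"
}
```

The same pattern works for CHIPMIX by changing the column name that the header line is searched for.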
    
=== A guideline to interpret output files ===
 
=== A guideline to interpret output files ===
Line 155: Line 191:  
           With-chip optimization options : --chip-none, --chip-mix [ON],
 
           With-chip optimization options : --chip-none, --chip-mix [ON],
 
                                           --chip-refBias, --chip-full
 
                                           --chip-refBias, --chip-full
                     BAM analysis options : --ignoreRG, --noEOF, --precise,
+
                     BAM analysis options : --ignoreRG, --ignoreOverlapPair,
                                          --minMapQ [10], --maxDepth [20],
+
                                          --noEOF, --precise, --minMapQ [10],
                                          --minQ [13], --maxQ [40],
+
                                          --maxDepth [20], --minQ [13],
                                          --grid [0.05]
+
                                          --maxQ [40], --grid [0.05]
 
                 Modeling Reference Bias : --refRef [1.00], --refHet [0.50],
 
                 Modeling Reference Bias : --refRef [1.00], --refHet [0.50],
 
                                           --refAlt [0.00]
 
                                           --refAlt [0.00]
 
                           Output options : --out [], --verbose
 
                           Output options : --out [], --verbose
 
+
                              PhoneHome : --noPhoneHome,
 +
                                          --phoneHomeThinning [50]
    
Each option provides the following features:
 
Each option provides the following features:
Line 177: Line 214:  
* --free-none : Do not perform sequence-only method to estimate parameters
 
* --free-none : Do not perform sequence-only method to estimate parameters
 
* --free-mix : (default) Estimate contamination using sequence-only method with Brent's single dimensional optimization.
 
* --free-mix : (default) Estimate contamination using sequence-only method with Brent's single dimensional optimization.
* --free-refBias : Estimate the refernece bias parameters using sequence-only method with Simplex method
+
* --free-refBias : Estimate the reference bias parameters using sequence-only method with Simplex method
 
* --free-full : Estimate both reference bias parameters and the contamination parameters using sequence-only method
 
* --free-full : Estimate both reference bias parameters and the contamination parameters using sequence-only method
 
* --chip-none : Do not perform sequence+array method to estimate parameters
 
* --chip-none : Do not perform sequence+array method to estimate parameters
* --free-mix : (default) Estimate contamination using sequence+array method with Brent's single dimensional optimization.
+
* --chip-mix : (default) Estimate contamination using sequence+array method with Brent's single dimensional optimization.
* --free-refBias : Estimate the refernece bias parameters using sequence+array method with Simplex method
+
* --chip-refBias : Estimate the reference bias parameters using sequence+array method with Simplex method
* --free-full : Estimate both reference bias parameters and the contamination parameters using sequence+array method
+
* --chip-full : Estimate both reference bias parameters and the contamination parameters using sequence+array method
 
* --ignoreRG : ignore the read group level comparison and compare samples only (recommended for an expedited run)
 
* --ignoreRG : ignore the read group level comparison and compare samples only (recommended for an expedited run)
 +
* --ignoreOverlapPair : ignore overlapping pair end fragment covering the same base. Disabling this option may decrease the sensitivity of the method when the insert size is short (with slight gain in the computational speed)
 
* --noEOF : do not check the EOF marker of the BAM file (for earlier version of BAM)
 
* --noEOF : do not check the EOF marker of the BAM file (for earlier version of BAM)
 
* --precise : calculate the likelihood in log-scale for high-depth data (recommended when --maxDepth is greater than 20. Can be a little bit slower)
 
* --precise : calculate the likelihood in log-scale for high-depth data (recommended when --maxDepth is greater than 20. Can be a little bit slower)
Line 195: Line 233:  
* --out : output file prefix (required)
 
* --out : output file prefix (required)
 
* --verbose : print the progress of the method on the screen
 
* --verbose : print the progress of the method on the screen
 +
{{PhoneHomeParameters|hdr=====|bullet=1}}
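As a sketch of how the options above combine, the function below assembles a command line and adds --precise only when --maxDepth is greater than 20, following the recommendation in the option list. The function name is hypothetical, and the --vcf and --bam input options are assumed from the tool's basic usage, which is not part of this excerpt. The function only prints the command and does not run verifyBamID.

```shell
# Assemble a verifyBamID command line from the inputs and a depth cap.
# $1: VCF file, $2: BAM file, $3: output prefix, $4: max depth (default 20)
verifybamid_cmd() {
    vcf="$1"
    bam="$2"
    out="$3"
    max_depth="${4:-20}"
    cmd="verifyBamID --vcf $vcf --bam $bam --out $out --ignoreRG --maxDepth $max_depth"
    # --precise is recommended when --maxDepth is greater than 20
    if [ "$max_depth" -gt 20 ]; then
        cmd="$cmd --precise"
    fi
    echo "$cmd"
}
```

For example, verifybamid_cmd input.vcf input.bam output.prefix 50 prints a command that includes --precise, while omitting the fourth argument keeps the default depth and leaves --precise out.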
    
== Principle of Operation ==
 
== Principle of Operation ==
Line 205: Line 244:     
For more about the technical details, see the page [[Verifying Sample Identities - Implementation]]
 
For more about the technical details, see the page [[Verifying Sample Identities - Implementation]]
 +
 +
== Reference ==
 +
 +
Please cite the following paper:
 +
 +
G. Jun, M. Flickinger, K. N. Hetrick, J. M. Romm, K. F. Doheny, G. Abecasis, M. Boehnke, and H. M. Kang, ''Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data'', American Journal of Human Genetics doi:10.1016/j.ajhg.2012.09.004 (volume 91 issue 5 pp.839 - 848)
 +
 +
 +
== Contamination in Array Data ==
 +
 +
[[VerifyIDintensity]] or [[BAFRegress]] can estimate sample contamination from Illumina genotype array data.
    
== Acknowledgements ==
 
== Acknowledgements ==
   −
VerifyBamID is a result from collaborative effort by Hyun Min Kang, Goo Jun, Matthew Flickinger, Mary Kate Trost, and Goncalo Abecasis. Please email to Hyun Min Kang [[mailto:hmkang@umich.edu| hmkang@umich.edu ]] for any questions.
+
VerifyBamID is the result of a collaborative effort by Hyun Min Kang, Goo Jun, Matthew Flickinger, Mary Kate Wing, Goncalo Abecasis, and Michael Boehnke. Please email Hyun Min Kang [hmkang@umich.edu ] with any questions.
