Changes

From Genome Analysis Wiki
1,602 bytes added, 09:59, 15 April 2019
Line 3: Line 3:  
= Introduction =
 
= Introduction =
   −
'''VICES''' is a program that jointly estimates the proportion of contaminating DNA in samples genotyped on arrays and which other samples in the study it came from.
+
'''VICES''' is a program that jointly estimates the proportion of contaminating DNA in samples genotyped on arrays and which other samples in the study the contaminating DNA came from.
    
= Download =
 
= Download =
Line 10: Line 10:     
'''ZIP archive:''' [https://github.com/gjmzajac/vices/zipball/master/ https://github.com/gjmzajac/vices/zipball/master/]
 
'''ZIP archive:''' [https://github.com/gjmzajac/vices/zipball/master/ https://github.com/gjmzajac/vices/zipball/master/]
<!---
+
 
'''Ubuntu x64 Binary''' [[File:Example.jpg]]
+
'''Ubuntu 16.04 x64 Binary''' [[File:vices_v1.0.tar.gz|vices_v1.0.tar.gz]]
--->
+
 
 +
= Installation =
 +
 
 +
The easiest way to install VICES is to use cget as follows:
 +
cget install --prefix <install_prefix> gjmzajac/VICES
 +
 
 +
Alternatively, you can set up a dev environment and build with cmake directly.
 +
cd vices
 +
cget install -f ./requirements.txt                    # Install dependencies locally.
 +
mkdir build && cd build                                # Create out of source build directory.
 +
cmake -DCMAKE_TOOLCHAIN_FILE=./cget/cget/cget.cmake .. # Configure project with dependency paths.
 +
make                                                  # Build.
 +
 
 
= Usage =
 
= Usage =
 
To run,  
 
To run,  
 
  ./vices -r reports_list.txt -o contam_estimates.txt
 
  ./vices -r reports_list.txt -o contam_estimates.txt
    +
The reports_list.txt file must list the paths of all the Illumina report files (either plaintext or gzipped) you are testing. All files in a single run must be from the same type and version of array with exactly the same markers. Output will be written to the file contam_estimates.txt.
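One way to build such a list is with find. This is a minimal sketch, assuming the list holds one path per line and that the report files sit under a directory named reports/ with names ending in .txt or .txt.gz; the directory name, the file names, and the extensions are hypothetical stand-ins, not something VICES requires.

```shell
# Hypothetical layout: Illumina report files live under ./reports (assumed name).
# For a self-contained demo, create a scratch directory with two empty stand-in files.
workdir=$(mktemp -d)
mkdir -p "$workdir/reports"
touch "$workdir/reports/sample_b.txt.gz" "$workdir/reports/sample_a.txt"

# Write one path per line (assumed list format), sorted so the order is reproducible.
find "$workdir/reports" -type f \( -name '*.txt' -o -name '*.txt.gz' \) \
    | sort > "$workdir/reports_list.txt"

# Count how many report files would be passed to VICES with -r.
n_reports=$(wc -l < "$workdir/reports_list.txt" | tr -d ' ')
echo "$n_reports report files listed"
```

Sorting matters because the Recipient_Index column follows the order of the paths in this file.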
   −
The file must list the paths of all the Illumina report files (either plaintext or gzipped, for example see https://support.illumina.com/downloads/humanomniexpress-12-v1-1-product-support-files.html) you are testing. Output is only by stdout at this stage so you will have to redirect. All files in a single run must be from the same type and version of array with exactly the same markers.
+
The output consists of a header (with the number of samples, markers, etc.) then three tab-delimited columns:
 +
*'''Recipient_Index''' the index (starting with 0) of the samples being tested. These are in the same order as in the file with paths to report files provided to VICES. If the --sample-list option is used and points to a file containing sample IDs in the same order as the report files, then this column becomes '''Recipient_ID'''.
 +
*'''Estimated_contamination''' the estimated proportion of contaminating DNA in the sample.
 +
*'''Sources''' A breakdown of the estimated sources of contamination. Most will have only one source (AF for allele frequencies) because VICES does not perform the donor search for samples with estimated contamination proportion < 0.005 by AF.
 +
Example (simulated) report files and output are provided in the section below.
   −
The output consists of a header (with the number of samples, markers, etc.) then three tab-delimited columns:
+
We highly recommend running VICES separately on each batch of samples that were genotyped together, because donor estimation can take a long time on a list of more than 1000 samples. Batching also helps because samples genotyped in different runs are probably less likely to have exchanged DNA, and because batch effects may influence the calculation of allele frequencies, an important initial step in VICES. Your sequencing core should be able to provide some information on batches. If you have no batch information, running VICES on one 96-well plate at a time (or at most 20 plates) should also work well.
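If no batch information is available, a per-plate run could be sketched as follows. This is only an illustration under stated assumptions: the report list has one path per line and is already ordered by plate, so every consecutive block of 96 lines corresponds to one 96-well plate. The stand-in paths and the plate_ file prefix are hypothetical.

```shell
# Assumption: the list is ordered by plate, 96 samples per plate.
workdir=$(mktemp -d)
# Stand-in list of 200 hypothetical report paths.
seq 1 200 | sed 's|^|reports/sample_|; s|$|.txt|' > "$workdir/reports_list.txt"

# Split into per-plate lists named plate_aa, plate_ab, ...
split -l 96 "$workdir/reports_list.txt" "$workdir/plate_"

# Print the VICES command for each plate; remove the echo to actually run it.
for list in "$workdir"/plate_??; do
    echo ./vices -r "$list" -o "$list.contam.txt"
done

n_batches=$(ls "$workdir"/plate_?? | wc -l | tr -d ' ')
echo "$n_batches per-plate lists written"
```

The last list holds whatever is left over (8 paths here), so a partially filled plate still gets its own run.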
Recipient_Index: the index (starting with 0) of the samples being tested. These are in the same order as in the file with paths to report files provided to VICES.
  −
Estimated_contamination
  −
Sources: A breakdown of the estimated sources of contamination. Most will have only one source (AF for allele frequencies) because VICES does not perform the donor search for samples with estimated contamination proportion < 0.005 by AF.
  −
I highly recommend running VICES only within the same batches they were genotyped in as the donor estimation can take a long time if you give it a long list of over 1000 samples. Other reasons to run in batches are that samples genotyped in different runs are probably less likely to have traded DNA, and batch effects may influence the calculation of allele frequencies, an important initial step in VICES. Your sequencing core should provide some information on batches. If you don't have any batch information, then running VICES on one (or at most 20) 96-well plate at a time should also work well.  
      
If any duplicate/twin samples are contaminated, this could bias the results. You might want to consider excluding one of each duplicate/twin pair before running/rerunning.
 
If any duplicate/twin samples are contaminated, this could bias the results. You might want to consider excluding one of each duplicate/twin pair before running/rerunning.
   −
To answer some questions I have received, no external allele frequencies are required. VICES calculates these directly from the report files you provide. If there is sufficient interest, providing external frequencies may become an option in the official release.  
+
No external allele frequencies are required. VICES calculates these directly from the report files you provide.
    
= Options =
 
= Options =
Line 45: Line 58:  
  -h, --help                            This help page
 
  -h, --help                            This help page
   −
<!---  
+
= VICES Example =
= Example Data =
+
'''Download Sample Files''' [[File:vices_test.zip|vices_test.zip]]
--->
+
 
 +
'''Basic Command''' from within the directory with the example files:
 +
~/vices/build/vices -r vices.test.report.list.txt -s vices.test.report.iids.txt -o vices.test.report.contam.txt
 +
 
 +
'''Expected Output''' in vices.test.report.contam.txt:
 +
Recipient_ID    Estimated_contamination Sources
 +
iid_01  0.0107758      AF:0.00351165; Donor_iid_02:0.00726419
 +
iid_02  0.0371312      Donor_iid_01:0.0187088; Donor_iid_03:0.0184223
 +
iid_03  0.0791742      Donor_iid_01:0.0236221; Donor_iid_02:0.0165325; Donor_iid_08:0.0390197
 +
iid_04  0.0158945      AF:0.0158945
 +
iid_05  0.0535897      AF:0.025121; Donor_iid_04:0.0284687
 +
iid_06  0.0740575      AF:0.0346833; Donor_iid_04:0.0117261; Donor_iid_10:0.0276482
 +
iid_07  0.00551559      AF:0.00551559
 +
iid_08  -0.00673222    AF:-0.00673222
 +
iid_09  0.000255342    AF:0.000255342
 +
iid_10  9.90153e-05    AF:9.90153e-05
 +
iid_11  -0.00138426    AF:-0.00138426
 +
iid_12  -0.00588946    AF:-0.00588946
 +
 
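The tab-delimited output can be post-processed with standard tools. The sketch below uses awk to list samples whose estimate exceeds a cutoff, together with any named donors from the Sources column. The rows are copied from the expected output above; the file name contam.txt and the cutoff of 0.01 are arbitrary choices for illustration, not VICES recommendations, and the "; " separator between sources is inferred from that example output.

```shell
# Stand-in output file: rows copied from the expected output, tab-delimited.
workdir=$(mktemp -d)
printf 'Recipient_ID\tEstimated_contamination\tSources\n' > "$workdir/contam.txt"
printf 'iid_01\t0.0107758\tAF:0.00351165; Donor_iid_02:0.00726419\n' >> "$workdir/contam.txt"
printf 'iid_04\t0.0158945\tAF:0.0158945\n' >> "$workdir/contam.txt"
printf 'iid_08\t-0.00673222\tAF:-0.00673222\n' >> "$workdir/contam.txt"

# Arbitrary illustrative cutoff (an assumption, not a VICES default).
threshold=0.01

# Skip everything up to the column-name row, then for each sample above the
# cutoff print its ID, its estimate, and the donors named in Sources (AF excluded).
flagged=$(awk -F '\t' -v t="$threshold" '
    $1 == "Recipient_ID" || $1 == "Recipient_Index" { in_table = 1; next }
    in_table && $2 + 0 > t + 0 {
        n = split($3, parts, /; */)
        donors = ""
        for (i = 1; i <= n; i++) {
            split(parts[i], kv, ":")
            if (kv[1] != "AF") donors = donors (donors == "" ? "" : ",") kv[1]
        }
        print $1 "\t" $2 "\t" (donors == "" ? "none" : donors)
    }
' "$workdir/contam.txt")
echo "$flagged"
```

Matching on the column-name row rather than a fixed line number keeps the script independent of how many header lines precede the table.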
 
= Citation =
 
= Citation =
 
A paper is in the works and should be published soon. For now, you can cite our abstract from the 2017 Biology of Genomes meeting
 
A paper is in the works and should be published soon. For now, you can cite our abstract from the 2017 Biology of Genomes meeting