VICES
Latest revision as of 09:59, 15 April 2019
Introduction
VICES is a program that jointly estimates the proportion of contaminating DNA in samples genotyped on arrays and which other samples in the study the contaminating DNA came from.
Download
GitHub Repo: https://github.com/gjmzajac/vices
ZIP archive: https://github.com/gjmzajac/vices/zipball/master/
Ubuntu 16.04 x64 Binary File:Vices v1.0.tar.gz
Installation
The easiest way to install VICES is to use cget as follows:
cget install --prefix <install_prefix> gjmzajac/VICES
Alternatively, you can set up a development environment and build with CMake directly:
cd vices
cget install -f ./requirements.txt                      # Install dependencies locally.
mkdir build && cd build                                 # Create out of source build directory.
cmake -DCMAKE_TOOLCHAIN_FILE=./cget/cget/cget.cmake ..  # Configure project with dependency paths.
make                                                    # Build.
Usage
To run VICES:
./vices -r reports_list.txt -o contam_estimates.txt
The reports_list.txt file must list the paths of all the Illumina report files (either plaintext or gzipped) you are testing. All files in a single run must be from the same type and version of array with exactly the same markers. Output will be written to the file contam_estimates.txt.
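One way to assemble the report list is to collect the report files from a directory and write their paths to a text file. The following is a minimal sketch, not part of VICES: the helper name write_report_list, the file name patterns, and the one-path-per-line layout are assumptions made for illustration.

```python
from pathlib import Path


def write_report_list(report_dir, list_path, patterns=("*.txt", "*.txt.gz")):
    """Write the paths of all matching report files to list_path, one per line.

    Assumption: VICES reads one report path per line. The patterns are
    hypothetical; adjust them to however your report files are named.
    """
    report_dir = Path(report_dir)
    # Sort so the order is reproducible and can be matched by a sample ID file.
    paths = sorted(p for pattern in patterns for p in report_dir.glob(pattern))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

Calling write_report_list("reports", "reports_list.txt") would then produce the file passed to -r. Write the list outside the report directory so that a later run does not pick it up as a report.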
The output consists of a header (with the number of samples, markers, etc.) followed by three tab-delimited columns:
- Recipient_Index: the index (starting at 0) of each sample tested, in the same order as the report file paths given to VICES. If the --sample-list option is used and points to a file containing sample IDs in the same order as the report files, this column becomes Recipient_ID.
- Estimated_contamination: the estimated proportion of contaminating DNA in the sample.
- Sources: a breakdown of the estimated sources of contamination. Most samples will have only one source (AF, for allele frequencies) because VICES does not perform the donor search for samples whose AF-based estimated contamination proportion is below 0.005.
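The three columns can be read programmatically. The sketch below parses a single data row, assuming the Sources field uses the "name:value; name:value" layout seen in the example output on this page. The function names are hypothetical, and the header block is not handled because its exact layout is not described here.

```python
def parse_sources(field):
    """Split a Sources field such as 'AF:0.02; Donor_iid_04:0.03' into a dict."""
    sources = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so the source name is kept whole.
        name, _, value = item.rpartition(":")
        sources[name] = float(value)
    return sources


def parse_row(line):
    """Parse one tab-delimited data row into (sample, estimate, sources)."""
    sample, estimate, sources = line.rstrip("\n").split("\t")
    return sample, float(estimate), parse_sources(sources)


row = "iid_05\t0.0535897\tAF:0.025121; Donor_iid_04:0.0284687"
sample, estimate, sources = parse_row(row)
print(sample, estimate, sources)
```

In the example output, the per-source values add up to the Estimated_contamination value, which gives a quick sanity check on a parsed row.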
Example (simulated) report files and output are provided in the VICES Example section below.
We highly recommend running VICES separately on each genotyping batch, because donor estimation can take a long time when the list contains more than 1000 samples. Batching also makes sense because samples genotyped in different runs are probably less likely to have exchanged DNA, and because batch effects may influence the calculation of allele frequencies, an important initial step in VICES. Your sequencing core should be able to provide some information on batches. If you don't have any batch information, running VICES on one (or at most 20) 96-well plates at a time should also work well.
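If no batch information is available, the plate-based fallback can be approximated by cutting the report list into fixed-size chunks. This is a minimal sketch under a strong assumption: the list is already ordered by plate, so consecutive paths share a plate. The function name and the example paths are hypothetical.

```python
def split_into_batches(report_paths, plates_per_batch=1, wells_per_plate=96):
    """Yield consecutive chunks of report paths, one chunk per VICES run.

    Assumes report_paths is ordered by plate; this is not checked.
    """
    if not 1 <= plates_per_batch <= 20:
        raise ValueError("plates_per_batch should be between 1 and 20")
    size = plates_per_batch * wells_per_plate
    for start in range(0, len(report_paths), size):
        yield report_paths[start:start + size]


# Hypothetical paths for 250 samples, split one plate at a time.
paths = [f"reports/sample_{i:04d}.txt" for i in range(250)]
batches = list(split_into_batches(paths))
print([len(b) for b in batches])  # prints [96, 96, 58]
```

Each chunk would be written to its own list file and passed to a separate VICES run with -r.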
If any duplicate/twin samples are contaminated, this could bias the results. You might want to consider excluding one of each duplicate/twin pair before running/rerunning.
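When excluding samples, the report list and the optional sample ID list have to stay in the same order. The sketch below filters both together. It assumes you already know which IDs to drop; the function name and the IDs in the example are hypothetical, and deciding which member of a pair to exclude is left to you.

```python
def drop_samples(report_paths, sample_ids, exclude):
    """Remove excluded sample IDs from both lists while keeping them aligned."""
    if len(report_paths) != len(sample_ids):
        raise ValueError("report_paths and sample_ids must have the same length")
    kept = [(p, s) for p, s in zip(report_paths, sample_ids) if s not in exclude]
    return [p for p, _ in kept], [s for _, s in kept]


# Hypothetical example: iid_02 is a duplicate of iid_01 and is excluded.
paths, ids = drop_samples(
    ["r1.txt", "r2.txt", "r3.txt"],
    ["iid_01", "iid_02", "iid_03"],
    exclude={"iid_02"},
)
print(paths, ids)
```

The two returned lists can be written back out as the files passed to -r and -s.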
No external allele frequencies are required. VICES calculates these directly from the report files you provide.
Options
-r, --report-list <file>                File with paths to Illumina report files
-o, --output <file>                     Write output to a file [standard output]
-f, --maf-threshold <float>             Min minor allele frequency for markers
-c, --contam-threshold <float>          Threshold for estimating donor samples
-s, --sample-list <file>                File with sample ids for report files
-a, --af-only                           Specify analysis with AF only. No donor estimation
-n, --snp-name-col <string>             Name of report file column containing SNP names
-1, --allele1-col <string>              Name of report file column containing allele 1
-2, --allele2-col <string>              Name of report file column containing allele 2
-b, --b-allele-intensity-col <string>   Name of report file column containing B allele intensity
-m, --num-markers <int>                 Maximum number of markers for contamination estimation
-t, --threads <int>                     Number of threads for parallel computation
-h, --help                              This help page
VICES Example
Download Sample Files: vices_test.zip
Basic Command from within the directory with the example files:
~/vices/build/vices -r vices.test.report.list.txt -s vices.test.report.iids.txt -o vices.test.report.contam.txt
Expected Output in vices.test.report.contam.txt:
Recipient_ID  Estimated_contamination  Sources
iid_01        0.0107758                AF:0.00351165; Donor_iid_02:0.00726419
iid_02        0.0371312                Donor_iid_01:0.0187088; Donor_iid_03:0.0184223
iid_03        0.0791742                Donor_iid_01:0.0236221; Donor_iid_02:0.0165325; Donor_iid_08:0.0390197
iid_04        0.0158945                AF:0.0158945
iid_05        0.0535897                AF:0.025121; Donor_iid_04:0.0284687
iid_06        0.0740575                AF:0.0346833; Donor_iid_04:0.0117261; Donor_iid_10:0.0276482
iid_07        0.00551559               AF:0.00551559
iid_08        -0.00673222              AF:-0.00673222
iid_09        0.000255342              AF:0.000255342
iid_10        9.90153e-05              AF:9.90153e-05
iid_11        -0.00138426              AF:-0.00138426
iid_12        -0.00588946              AF:-0.00588946
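A table like this can be filtered to list the samples above a cutoff of your choosing. The following sketch uses an excerpt of the expected output and assumes the text starts at the column-name line with tab separators. The function name is hypothetical, and the 0.03 cutoff is only an illustrative value, not a recommendation from VICES.

```python
import csv
import io

# Excerpt of the expected output, written with explicit tabs.
EXAMPLE = (
    "Recipient_ID\tEstimated_contamination\tSources\n"
    "iid_03\t0.0791742\tDonor_iid_01:0.0236221; Donor_iid_02:0.0165325; Donor_iid_08:0.0390197\n"
    "iid_04\t0.0158945\tAF:0.0158945\n"
    "iid_06\t0.0740575\tAF:0.0346833; Donor_iid_04:0.0117261; Donor_iid_10:0.0276482\n"
    "iid_08\t-0.00673222\tAF:-0.00673222\n"
)


def flag_samples(table_text, cutoff):
    """Return (sample, estimate) pairs at or above cutoff, highest first."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    rows = [(r["Recipient_ID"], float(r["Estimated_contamination"])) for r in reader]
    flagged = [row for row in rows if row[1] >= cutoff]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


# 0.03 is an arbitrary cutoff chosen for illustration.
print(flag_samples(EXAMPLE, cutoff=0.03))
```

Small negative estimates, such as the one for iid_08, fall below any positive cutoff and are never flagged.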
Citation
A paper is in the works and should be published soon. For now, you can cite our abstract from the 2017 Biology of Genomes meeting:
- G. J. M. Zajac, L. G. Fritsche, S. L. Dagenais, R. H. Lyons, C. M. Brummett, & G. Abecasis. VICES: Verify Intensity Contamination from Estimated Sources. Poster Session presented at: The Biology of Genomes; 2017 May 9-13; Cold Spring Harbor, NY.
Contact
For questions or bug reports, email gzajac at umich.edu
Need Something Else?
More Software: ContaminationDetection