Changes

From Genome Analysis Wiki
1,610 bytes added ,  14:29, 6 September 2010
Line 98: Line 98:  
== Interpreting output files ==
 
    +
=== Output files ===
 
When verifyBamID runs successfully, it generates the following four (or two) files
 
   Line 105: Line 106:  
* [outPrefix].bestRG - Per-readgroup best-match statistics, with the best-matching sample among the genotyped samples (not produced if there is no individual to compare to)
 
    +
=== Column information in the output files ===
 
Each of these files has the following 27 columns per sample, or per readgroup (lane). When individual genotypes are unavailable, the values of columns 3-20 will be "N/A".
 
   Line 134: Line 136:  
# BESTMIXLLK : Log-likelihood of the data given the MLE %MIX
 
 
# BESTMIXLLK- : Difference of log-likelihood between BESTMIXLLK and the likelihood of reads under no contamination (%MIX=0).
 
 +
 +
=== A guideline to interpret output files ===
 +
 +
verifyBamID reports several statistics that help determine whether a sample is possibly contaminated or swapped, but no single criterion works in every circumstance. There are a few unmodeled factors in the estimation of [SELF-IBD]/[BEST-IBD] and [%MIX], so please note that the MLE may not exactly match the true amount of contamination. Here we provide a guideline to flag potentially contaminated/swapped samples:
 +
 +
* When [SELF-IBD] < 1 AND [%MIX] > 0 AND [REF-A2%] > 0.01, meaning that more than 1% of the bases observed at reference sites are non-reference, we recommend examining the data more carefully for possible contamination. Each sample or lane can be checked in this way.
 +
* When [SELF-IBD] << 1 AND [%MIX] ~ 0, the sample may have been swapped with another sample. When [BEST-IBD] ~ 1, [BEST_SM] might actually be the swapped sample. Otherwise, the swapped sample may not exist in the genotype data you compared against. We recommend checking each lane for possible sample swaps.
 +
* When genotype data is not available but the allele-frequency-based estimate of [%MIX] >= 0.03 and [BESTMIXLLK-] is large (greater than 100), the sample may be contaminated with another sample. For low-coverage data, we recommend using per-sample rather than per-lane data for this check, because the inference is more reliable when a large number of bases have depth 2 or higher.
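The three rules above can be sketched as a small Python helper. This is only an illustration, not part of verifyBamID: the function name, argument names, and returned messages are hypothetical, and the numeric cutoffs used for "<< 1", "~ 0" and "~ 1" are assumptions, since the guideline does not give exact values. The caller is assumed to have already read the relevant column values from the output files.

```python
from typing import List, Optional

# Assumed cutoffs: the guideline only says "<< 1", "~ 0" and "~ 1",
# so these numbers are illustrative and should be tuned per project.
FAR_BELOW_ONE = 0.8   # assumed reading of "[SELF-IBD] << 1"
NEAR_ZERO = 0.01      # assumed reading of "[%MIX] ~ 0"
NEAR_ONE = 0.95       # assumed reading of "[BEST-IBD] ~ 1"


def flag_sample(
    self_ibd: Optional[float] = None,        # [SELF-IBD]; None if no genotypes
    mix: float = 0.0,                        # genotype-based [%MIX]
    ref_a2: float = 0.0,                     # [REF-A2%]
    best_ibd: Optional[float] = None,        # [BEST-IBD], if available
    af_mix: float = 0.0,                     # allele-frequency-based [%MIX]
    best_mix_llk_diff: float = 0.0,          # [BESTMIXLLK-]
) -> List[str]:
    """Return a list of warnings for one sample or lane (hypothetical helper)."""
    flags: List[str] = []

    if self_ibd is not None:
        # Rule 1: signs of contamination when genotypes are available.
        if self_ibd < 1 and mix > 0 and ref_a2 > 0.01:
            flags.append("possible contamination")

        # Rule 2: low self-identity with essentially no mixture suggests a swap.
        if self_ibd < FAR_BELOW_ONE and mix <= NEAR_ZERO:
            if best_ibd is not None and best_ibd >= NEAR_ONE:
                flags.append("possible swap: BEST_SM may be the true sample")
            else:
                flags.append("possible swap: true sample not in genotype data")
    else:
        # Rule 3: no genotypes, rely on the allele-frequency-based estimate.
        if af_mix >= 0.03 and best_mix_llk_diff > 100:
            flags.append("possible contamination (allele-frequency-based)")

    return flags
```

For example, `flag_sample(self_ibd=0.3, mix=0.0, ref_a2=0.001, best_ibd=0.99)` reports a possible swap, while a sample with `self_ibd=1.0` and `mix=0.0` returns no flags. Because the estimates have unmodeled factors, a flag should prompt closer inspection of the data rather than automatic exclusion.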
    
== Command Line Options ==
 
