Changes

From Genome Analysis Wiki
1 byte added, 14:32, 6 September 2010
Line 141:
verifyBamID reports several statistics that help determine whether a sample is possibly contaminated or swapped, but no single criterion works in every circumstance. A few factors are not modeled in the estimation of [SELF-IBD]/[BEST-IBD] and [%MIX], so please note that the MLE estimate may not exactly match the true amount of contamination. Here we provide guidelines for flagging potentially contaminated or swapped samples:
   −
* When [SELF-IBD] < 1 AND [%MIX] > 0 AND [REF-A2%] > 0.01, meaning 1% or more of non-reference bases are observed in reference sites, we recommend to examine the data more carefully for the possibility of contamination. Each sample or lane can be checked in this way.
+
* Each sample or lane can be checked in this way. When [SELF-IBD] < 1 AND [%MIX] > 0 AND [REF-A2%] > 0.01, meaning that 1% or more non-reference bases are observed at reference sites, we recommend examining the data more carefully for possible contamination.
* When [SELF-IBD] << 1 AND [%MIX] ~ 0, then it is possible that the sample is swapped with another sample. When [BEST-IBD] ~ 1, [BEST_SM] might be actually the swapped sample. Otherwise, the swapped sample may not exist in the genotype data you have compared against. We recommend to check each lane for the possibility of sample swpas.  
+
* We recommend checking each lane for possible sample swaps. When [SELF-IBD] << 1 AND [%MIX] ~ 0, the sample may have been swapped with another sample. When [BEST-IBD] ~ 1, [BEST_SM] might actually be the swapped sample. Otherwise, the swapped sample may not exist in the genotype data you compared against.
 
* When genotype data is not available but the allele-frequency-based estimate of [%MIX] >= 0.03 and [BESTMIXLLK-] is large (greater than 100), the sample may be contaminated with another sample. For low-coverage data, we recommend using per-sample rather than per-lane data for this check, because the inference is more confident when there is a large number of bases with depth 2 or higher.
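
The guidelines above can be sketched as a small screening function. This is only an illustration, not part of verifyBamID: the `Stats` class, the `flag` function, and the field names are hypothetical, and the numeric cutoffs standing in for "<< 1", "~ 0" and "~ 1" are assumptions chosen for the example. It also assumes that [%MIX] and [REF-A2%] are given as fractions (0.01 = 1%) and that the values have already been parsed from the verifyBamID output.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed cutoffs: the guideline only says "<< 1", "~ 0" and "~ 1",
# so these values are illustrative, not verifyBamID defaults.
IBD_FAR_BELOW_ONE = 0.9
MIX_NEAR_ZERO = 0.01
IBD_NEAR_ONE = 0.98


@dataclass
class Stats:
    """Hypothetical container for one sample's or lane's statistics."""
    self_ibd: Optional[float] = None  # [SELF-IBD]; None if no genotype data
    best_ibd: Optional[float] = None  # [BEST-IBD]
    mix: float = 0.0                  # [%MIX], assumed to be a fraction
    ref_a2: float = 0.0               # [REF-A2%], assumed to be a fraction
    best_mix_llk: float = 0.0         # [BESTMIXLLK-]


def flag(stats: Stats) -> List[str]:
    """Return the guideline flags that apply to one sample or lane."""
    flags: List[str] = []

    if stats.self_ibd is None:
        # No genotype data: rely on the allele-frequency-based estimate.
        if stats.mix >= 0.03 and stats.best_mix_llk > 100:
            flags.append("possible contamination (allele-frequency based)")
        return flags

    # Contamination rule: [SELF-IBD] < 1 AND [%MIX] > 0 AND [REF-A2%] > 0.01.
    if stats.self_ibd < 1 and stats.mix > 0 and stats.ref_a2 > 0.01:
        flags.append("possible contamination")

    # Swap rule: [SELF-IBD] << 1 AND [%MIX] ~ 0.
    if stats.self_ibd < IBD_FAR_BELOW_ONE and stats.mix <= MIX_NEAR_ZERO:
        if stats.best_ibd is not None and stats.best_ibd >= IBD_NEAR_ONE:
            flags.append("possible swap; [BEST_SM] may be the swapped sample")
        else:
            flags.append("possible swap; match not in genotype data")

    return flags
```

A flagged sample is a candidate for closer inspection rather than a confirmed problem, since the estimates may not match the true amount of contamination.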
  
