Changes

VerifyBamID (view source)

Revision as of 14:29, 6 September 2010

1,610 bytes added , 14:29, 6 September 2010

→‎Interpreting output files

Line 98: Line 98:

== Interpreting output files ==

+

=== Output files ===

When verifyBamID runs successfully, it generates the following four (or two) files:

Line 105: Line 106:

* [outPrefix].bestRG - Per-readgroup best-match statistics, reporting the best-matching sample among the genotyped samples (not produced if there is no individual to compare against)

+

=== Column information in the output files ===

Each of these files has the following 27 columns per sample or per readgroup (lane). When individual genotypes are unavailable, the values of columns 3-20 will be "N/A".

Line 134: Line 136:

# BESTMIXLLK : Log-likelihood of the data given the MLE %MIX

# BESTMIXLLK- : Difference between BESTMIXLLK and the log-likelihood of the reads under no contamination (%MIX=0).

+

=== A guideline to interpret output files ===

+

verifyBamID reports several statistics that help determine whether a sample is possibly contaminated or swapped, but no single criterion works in every circumstance. There are a few unmodeled factors in the estimation of [SELF-IBD]/[BEST-IBD] and [%MIX], so please note that the maximum-likelihood estimates may not exactly match the true amount of contamination. Here we provide a guideline for flagging potentially contaminated or swapped samples:

+

* When [SELF-IBD] < 1 AND [%MIX] > 0 AND [REF-A2%] > 0.01, meaning that 1% or more of the bases observed at reference sites are non-reference, we recommend examining the data more carefully for possible contamination. Each sample or lane can be checked in this way.

+

* When [SELF-IBD] << 1 AND [%MIX] ~ 0, it is possible that the sample has been swapped with another sample. When [BEST-IBD] ~ 1, [BEST_SM] might actually be the swapped sample. Otherwise, the swapped sample may not exist in the genotype data you compared against. We recommend checking each lane for possible sample swaps.

+

* When genotype data is not available but the allele-frequency-based estimate of [%MIX] >= 0.03 and [BESTMIXLLK-] is large (greater than 100), it is possible that the sample is contaminated with another sample. For low-coverage data, we recommend using per-sample rather than per-lane data for this check, because the inference is more confident when a large number of bases have depth 2 or higher.
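The three rules above can be sketched as a small Python helper. This is only an illustration and not part of verifyBamID: the function name, argument names and flag strings are hypothetical, and because the guideline gives no numbers for "<< 1", "~ 0" and "~ 1", the cutoffs used for those are assumptions that you should adjust for your data. Only the 0.01, 0.03 and 100 cutoffs come from the guideline itself.

```python
from typing import List, Optional

# Cutoffs stated in the guideline text.
REF_A2_MIN = 0.01        # [REF-A2%] > 0.01
FREQ_MIX_MIN = 0.03      # [%MIX] >= 0.03 when no genotype data is available
LLK_DIFF_MIN = 100.0     # [BESTMIXLLK-] greater than 100

# ASSUMED cutoffs: the guideline only says "<< 1", "~ 0" and "~ 1".
SWAP_SELF_IBD_MAX = 0.5  # hypothetical reading of "[SELF-IBD] << 1"
NEAR_ZERO = 0.01         # hypothetical reading of "[%MIX] ~ 0"
NEAR_ONE = 0.95          # hypothetical reading of "[BEST-IBD] ~ 1"


def flag_record(self_ibd: Optional[float], best_ibd: Optional[float],
                mix: float, ref_a2: Optional[float],
                llk_diff: float) -> List[str]:
    """Return guideline flags for one sample or lane.

    Pass None for genotype-based values that are reported as "N/A".
    """
    flags: List[str] = []
    if self_ibd is None:
        # No genotype data: use the allele-frequency-based estimate only.
        if mix >= FREQ_MIX_MIN and llk_diff > LLK_DIFF_MIN:
            flags.append("possible contamination (no genotypes)")
        return flags
    # Rule 1: signs of contamination when genotypes are available.
    if self_ibd < 1 and mix > 0 and ref_a2 is not None and ref_a2 > REF_A2_MIN:
        flags.append("possible contamination")
    # Rule 2: low self-match with almost no mixture suggests a swap.
    if self_ibd < SWAP_SELF_IBD_MAX and mix < NEAR_ZERO:
        if best_ibd is not None and best_ibd >= NEAR_ONE:
            flags.append("possible swap with BEST_SM")
        else:
            flags.append("possible swap, match not in genotype data")
    return flags


# Made-up input values, for illustration only.
print(flag_record(0.98, 0.98, 0.05, 0.02, 250.0))
print(flag_record(0.02, 0.99, 0.0, 0.001, 0.0))
print(flag_record(None, None, 0.04, None, 150.0))
```

A flag from this helper means only that the sample or lane deserves a closer look; as noted above, no single criterion works in every circumstance.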

== Command Line Options ==

Hmkang
