Changes

Verifying Sample Identities - Implementation (view source)

Revision as of 16:20, 13 April 2010

574 bytes added, 16:20, 13 April 2010

Line 50: Line 50:

c) Evaluate this log-sum assuming <math>P_{ibd} = 0.5</math>. This assumes we sequenced a sample that shares half the genome with the target sample, perhaps because it is a sibling or parent of the target sample.

−

d) If desired, evaluate the same log-sum for other intermediate values of P_{ibd}. It may be interesting to set <math>P_{ibd} = 0.95</math> to allow for 5% of reads that are derived from a different sample, for example, due to contamination. It may be interesting to set <math>P_{ibd} = 0.05</math> to consider more distant relatives.

+

d) If desired, evaluate the same log-sum for other intermediate values of <math>P_{ibd}</math>. It may be interesting to set <math>P_{ibd} = 0.95</math> to allow for 5% of reads that are derived from a different sample, for example, due to contamination. It may be interesting to set <math>P_{ibd} = 0.05</math> to consider more distant relatives.

Once the results of evaluating a), b), c) and d) are available, we can decide whether the target sample has been sequenced. If the target sample was sequenced, the log-sum in a) will be the largest. Sequencing a parent or offspring of the target sample will maximize c). Sequencing a completely incorrect sample will maximize b).

If all the log-sums are very similar, then we don't have enough information to make a clear-cut decision. Typically, with thousands of genetic markers from a typical SNP chip and whole genome shotgun sequence data, most decisions should be very clear cut.
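The decision rule above can be sketched in a few lines of Python. This is only an illustration, not the implementation described on this page: the function name `classify_sample`, the dictionary keys, and the `min_margin` threshold for calling a result "inconclusive" are all assumptions made for the example.

```python
def classify_sample(log_sums, min_margin=1.0):
    """Pick the hypothesis with the largest log-sum.

    log_sums   -- dict mapping a hypothesis label (e.g. "a", "b", "c")
                  to its log-sum; labels are hypothetical.
    min_margin -- hypothetical threshold: if the best log-sum beats the
                  runner-up by less than this, report "inconclusive".
    Returns (label, margin).
    """
    if not log_sums:
        raise ValueError("at least one log-sum is required")

    # Sort hypotheses from largest to smallest log-sum.
    ranked = sorted(log_sums.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_value = ranked[0]

    # Margin between the best hypothesis and the runner-up.
    if len(ranked) > 1:
        margin = best_value - ranked[1][1]
    else:
        margin = float("inf")

    if margin < min_margin:
        return "inconclusive", margin
    return best_label, margin
```

For example, `classify_sample({"a": -100.0, "b": -250.0, "c": -180.0})` returns `("a", 80.0)`, which corresponds to concluding that the target sample was sequenced. A suitable value for `min_margin` depends on the data and is not specified here.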

+

== Implementation Details ==

+

After loading genotypes, we generate a genome mask for each position. There are three outcomes of interest:

+

; Known Genotypes

+

: These are sites where we have a previously observed genotype call and where we will be evaluating match / mismatch rates to determine sample identity.

+

; dbSNP sites

+

: These are sites that are known to vary among individuals, but for which a known genotype is not available.

+

; Background sites

+

: These are all other sites and can be used to estimate the <math>\epsilon</math> error rate parameter.
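As a rough sketch of how such a mask might be built, the Python below labels each position with one of the three outcomes. The names `SiteClass`, `build_mask` and `estimate_epsilon` are hypothetical, positions are assumed to be 0-based, and the error-rate estimate (mismatching bases divided by total bases at background sites) is one simple possibility rather than the method used by the actual implementation.

```python
from enum import IntEnum


class SiteClass(IntEnum):
    BACKGROUND = 0      # all other sites
    DBSNP = 1           # known to vary, no known genotype available
    KNOWN_GENOTYPE = 2  # previously observed genotype call


def build_mask(length, known_positions, dbsnp_positions):
    """Return a list of SiteClass values, one per position (0-based)."""
    mask = [SiteClass.BACKGROUND] * length
    for pos in dbsnp_positions:
        if 0 <= pos < length:
            mask[pos] = SiteClass.DBSNP
    # Known genotypes are applied last so they take precedence over a
    # dbSNP label at the same position (an assumption of this sketch).
    for pos in known_positions:
        if 0 <= pos < length:
            mask[pos] = SiteClass.KNOWN_GENOTYPE
    return mask


def estimate_epsilon(mask, mismatch_counts, depth_counts):
    """Assumed estimator: mismatching bases / total bases at background sites.

    mismatch_counts[i] and depth_counts[i] are the number of non-reference
    bases and the total number of bases observed at position i.
    Returns None if no bases cover any background site.
    """
    mismatches = 0
    depth = 0
    for site, mm, dp in zip(mask, mismatch_counts, depth_counts):
        if site == SiteClass.BACKGROUND:
            mismatches += mm
            depth += dp
    return mismatches / depth if depth > 0 else None
```

A list is used here for clarity; for a whole genome, a compact array with one small integer per position would be a more realistic choice.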

Pha

75

edits
