Changes

Verifying Sample Identities - Implementation (view source)

Revision as of 16:09, 13 April 2010

79 bytes added, 16:09, 13 April 2010

Line 40: Line 40:

</math>

−

For any given value of P_{ibd} we can calculate this quantity for each read that overlaps a site with a known homozygous genotype. In addition, we can take the product of this quantity across all sites examined -- because this product is likely to be very small, we actually sum the <math>log</math>s of the appropriate quantities rather than multiplying them together.

+

For any given value of <math>P_{ibd}</math> we can calculate this quantity for each read that overlaps a site with a known homozygous genotype. In addition, we can take the product of this quantity across all sites examined -- because this product is likely to be very small, we actually sum the <math>log</math>s of the appropriate quantities rather than multiplying them together.
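To illustrate why the logs are summed rather than the raw quantities multiplied, here is a minimal Python sketch. The per-read values are made up for illustration (they are not taken from any real data set); the point is only that a long product of small numbers underflows to zero in double precision, while the sum of logs stays usable.

```python
import math

# Hypothetical per-read quantities; the value 1e-3 is illustrative only.
per_read = [1e-3] * 200

# Multiplying directly: the running product drops below the smallest
# representable double and becomes exactly 0.0.
product = 1.0
for p in per_read:
    product *= p

# Summing logs instead keeps the result in a comfortable numeric range.
log_sum = sum(math.log(p) for p in per_read)

print(product)   # underflows to zero
print(log_sum)   # a large negative but finite number
```

Because the log is monotonic, comparing log-sums gives the same ranking as comparing the products would.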

To decide if we have sequenced the correct sample, we should do the following:

−

a) Evaluate this log-sum assuming P_{ibd} = 1.0. This assumes that we have sequenced the target sample.

+

a) Evaluate this log-sum assuming <math>P_{ibd} = 1.0</math>. This assumes that we have sequenced the target sample.

−

b) Evaluate this log-sum assuming P_{ibd} = 0.0. This assumes we sequenced a different sample, unrelated to the target.

+

b) Evaluate this log-sum assuming <math>P_{ibd} = 0.0</math>. This assumes we sequenced a different sample, unrelated to the target.

−

c) ~~Evalute~~ this log-sum assuming P_{ibd} = 0.5. This assumes we sequenced a sample that shares half the genome with the target sample, perhaps because it is a sibling or parent of the target sample.

+

c) Evaluate this log-sum assuming <math>P_{ibd} = 0.5</math>. This assumes we sequenced a sample that shares half the genome with the target sample, perhaps because it is a sibling or parent of the target sample.

−

d) If desired, evaluate the same log-sum for other intermediate values of P_{ibd}. It may be interesting to set P_{ibd} = 0.95 to allow for 5% of reads that are derived from a different sample, for example, due to contamination. It may be interesting to set P_{ibd} = 0.05 to consider more distant relatives.

+

d) If desired, evaluate the same log-sum for other intermediate values of <math>P_{ibd}</math>. It may be interesting to set <math>P_{ibd} = 0.95</math> to allow for 5% of reads that are derived from a different sample, for example, due to contamination. It may be interesting to set <math>P_{ibd} = 0.05</math> to consider more distant relatives.

Once the results of evaluating a), b), c) and d) are available, we can decide if the target sample has been sequenced. Sequencing the target sample will mean that the log-sum in a) is the largest. Sequencing a parent or offspring of the target sample will maximize c). Sequencing a completely incorrect sample will maximize b).

If all the log-sums are very similar, then we don't have enough information to make a clear-cut decision. Typically, with thousands of genetic markers from a typical SNP chip and whole genome shotgun sequence data, most decisions should be very clear cut.
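The steps a) to d) and the comparison that follows can be sketched in Python as shown here. This is only a sketch under stated assumptions: the per-read quantity is modelled as a simple two-allele mixture, where a read comes from the target sample with probability <math>P_{ibd}</math> and from an unrelated sample otherwise. The function names, the error rate, and the single population allele frequency are all hypothetical choices made for the example, not part of the method described on this page.

```python
import math

def read_likelihood(matches, p_ibd, error=0.01, allele_freq=0.8):
    """Assumed per-read quantity for a site with a known homozygous genotype.

    matches     -- True if the read base agrees with the known genotype
    error       -- hypothetical sequencing error rate
    allele_freq -- hypothetical population frequency of the known allele
    """
    # Read from the target sample: it matches unless there is an error.
    same = (1 - error) if matches else error
    # Read from an unrelated sample: the allele is drawn from the population.
    pop_match = allele_freq * (1 - error) + (1 - allele_freq) * error
    other = pop_match if matches else 1 - pop_match
    return p_ibd * same + (1 - p_ibd) * other

def log_sum(reads, p_ibd):
    """Sum of log per-read quantities across all reads examined."""
    return sum(math.log(read_likelihood(m, p_ibd)) for m in reads)

def best_p_ibd(reads, candidates=(1.0, 0.95, 0.5, 0.05, 0.0)):
    """Evaluate the log-sum for each candidate and return the maximizer."""
    scores = {p: log_sum(reads, p) for p in candidates}
    return max(scores, key=scores.get), scores

# Illustrative input: every read agrees with the known genotype.
reads = [True] * 1000
best, scores = best_p_ibd(reads)
print(best, scores)
```

In practice the allele frequency would differ from site to site and the error rate would come from base qualities. Both are held constant here only to keep the sketch short.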

Pha

75

edits
