Changes

From Genome Analysis Wiki
Line 50: Line 50:  
c) Evaluate this log-sum assuming <math>P_{ibd} = 0.5</math>. This assumes we sequenced a sample that shares half the genome with the target sample, perhaps because it is a sibling or parent of the target sample.
 
   −
d) If desired, evaluate the same log-sum for other intermediate values of P_{ibd}. It may be interesting to set <math>P_{ibd} = 0.95</math> to allow for 5% of reads that are derived from a different sample, for example, due to contamination. It may be interesting to set <math>P_{ibd} = 0.05</math> to consider more distant relatives.
+
d) If desired, evaluate the same log-sum for other intermediate values of <math>P_{ibd}</math>. For example, setting <math>P_{ibd} = 0.95</math> allows for 5% of reads that are derived from a different sample, for instance due to contamination, while setting <math>P_{ibd} = 0.05</math> considers more distant relatives.
    
Once the results of evaluating a), b), c) and d) are available, we can decide whether the target sample has been sequenced. If the target sample was sequenced, the log-sum in a) will be the largest. Sequencing a parent or offspring of the target sample will maximize c). Sequencing a completely incorrect sample will maximize b).
 
    
If all the log-sums are very similar, then we don't have enough information to make a clear-cut decision. Typically, with thousands of genetic markers from a typical SNP chip and whole genome shotgun sequence data, most decisions should be very clear cut.
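
The decision rule described here can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function name <code>classify_sample</code>, the hypothesis labels, and the <code>min_margin</code> threshold are all assumptions introduced for the example, and the log-sums are taken as already computed.

```python
def classify_sample(log_sums, min_margin=10.0):
    """Return the label of the hypothesis with the largest log-sum.

    log_sums   -- dict mapping a hypothesis label (e.g. "a", "b", "c")
                  to its already-computed log-sum.
    min_margin -- hypothetical threshold (an assumption, not from the
                  original text): if the best and second-best log-sums
                  differ by less than this, report "undetermined".
    """
    if not log_sums:
        raise ValueError("at least one log-sum is required")

    # Rank hypotheses from largest to smallest log-sum.
    ranked = sorted(log_sums.items(), key=lambda item: item[1], reverse=True)
    best_label, best_value = ranked[0]

    # Very similar log-sums mean there is not enough information.
    if len(ranked) > 1 and best_value - ranked[1][1] < min_margin:
        return "undetermined"
    return best_label


# Illustrative values only: a) target sample, b) unrelated sample,
# c) sample sharing half the genome with the target.
example = {"a": -1200.0, "b": -5400.0, "c": -2100.0}
decision = classify_sample(example)
```

With these made-up values, hypothesis a) wins by a wide margin, so the sample would be treated as the target sample. A suitable margin would have to be chosen empirically.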
 
 +
 +
== Implementation Details ==
 +
 +
After loading genotypes, we generate a genome mask for each position. There are three outcomes of interest:
 +
 +
; Known Genotypes
 +
: These are sites where we have a previously observed genotype call and where we will be evaluating match / mismatch rates to determine sample identity.
 +
 +
; dbSNP sites
 +
: These are sites that are known to vary among individuals, but for which a known genotype is not available.
 +
 +
; Background sites
 +
: These are all other sites and can be used to estimate the <math>\epsilon</math> error rate parameter.
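
The three-way classification above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the names <code>SiteClass</code> and <code>build_mask</code> are hypothetical, positions are assumed to be 0-based offsets within a single region, and a site that has a known genotype and also appears in dbSNP is assumed to count as a known genotype.

```python
from enum import IntEnum


class SiteClass(IntEnum):
    """Hypothetical mask codes for the three outcomes of interest."""
    BACKGROUND = 0      # all other sites; usable for estimating the error rate
    DBSNP = 1           # known to vary, but no known genotype available
    KNOWN_GENOTYPE = 2  # previously observed genotype call


def build_mask(length, known_positions, dbsnp_positions):
    """Build a per-position mask for a region of the given length.

    Positions are assumed to be 0-based; out-of-range positions are ignored.
    """
    mask = bytearray(length)  # every site starts as BACKGROUND (0)

    for pos in dbsnp_positions:
        if 0 <= pos < length:
            mask[pos] = SiteClass.DBSNP

    # Applied last so that a known genotype takes precedence over dbSNP.
    for pos in known_positions:
        if 0 <= pos < length:
            mask[pos] = SiteClass.KNOWN_GENOTYPE

    return mask


# Toy region of 6 positions: known genotypes at 1 and 3, dbSNP sites at 3 and 4.
mask = build_mask(6, known_positions={1, 3}, dbsnp_positions={3, 4})
background_sites = [i for i, code in enumerate(mask) if code == SiteClass.BACKGROUND]
```

A byte per position keeps the sketch simple. A real genome-wide mask would likely need a more compact encoding.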