
SardiNIA

Revision as of 11:55, 6 July 2016


=== Future Directions ===

* Sample Filtering

** We did not filter samples (on dupRate, genome coverage, mapping rate, properly paired reads, mean depth, or any other QPLOT statistic) before SNP and indel calling, so we now want to do this filtering. Of the 3,839 samples, 3,188 have genome chip data generated a few years ago. For these, we could compute the non-reference concordance between the chip genotypes and the sequencing genotypes and declare samples 'bad' if they fall below a threshold such as 98% non-reference concordance. The remaining 651 samples have no chip data, however, so this is not an option for them. We therefore decided on the following strategy instead:

**# Calculate non-reference concordance for the 3,188 samples that have chip data.

**# Build a model that predicts non-reference concordance from QPLOT statistics. Either fit it on all 3,188 samples and report R^2 (likely inflated by overfitting), or use cross-validation (separate training and test sets) to estimate its out-of-sample predictive power.

**# If the predictive power (R^2) is reasonable, use the model to estimate non-reference concordance for the 651 samples without chip data, and also for the 3,188 samples with chip data.

**# Set a cut-off for 'good' versus 'bad' samples based on the estimated non-reference concordance and use it to filter samples.

== Key References ==

Revision by Kleckner.
