Changes

From Genome Analysis Wiki
70 bytes added, 15:21, 4 April 2017
Line 93: Line 93:     
=== Future Directions ===
 
* NOTE: For sample filtering below, you will need to finish chromosome 1 for me. I have it currently running. Once it is done in a few days, you will need to run the command 'python calculateConcordance_onefile.py' while in the '/net/sardinia/progenia/SardiNIA/VariantCalling_20150330/filteringSamples/chr1/' directory.
+
* NOTE: For sample filtering below, you will need to finish chromosome 1 for me. I have it currently running. Once it is done in a few days, you will need to run the command 'python calculateConcordance_onefile.py' while in the '/net/sardinia/progenia/SardiNIA/VariantCalling_20150330/filteringSamples/chr1/' directory. <b>--> Complete </b>
 
* '''Sample Filtering'''
 
 
** We did not do any filtering of samples (based on dupRate, genome coverage, mapping rate, proper paired, mean depth, or any other QPLOT stats) prior to SNP and Indel calling. Because of this, we want to do this filtering now. 3,188 of 3,839 samples have genome chip data from a few years ago. For these, we could look at the non-reference concordance between the chip genotypes and the sequencing genotypes and declare 'bad' samples to be those that fall below a certain threshold, such as 98% non-ref concordance. However, since the remaining 651 samples do not have chip data, this is not an option for them. Therefore, we decided on the following strategy instead:
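The chip-based check described above can be illustrated with a minimal sketch. It is not the project's calculateConcordance_onefile.py script. It assumes genotypes are coded as alternate-allele counts (0, 1, 2, or None for missing) and that non-reference concordance is computed over sites where at least one of the two genotypes is non-reference. The function names and the input layout are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Genotype = Optional[int]  # assumed coding: 0, 1, 2 alt alleles; None = missing


def non_ref_concordance(chip: Sequence[Genotype],
                        seq: Sequence[Genotype]) -> Optional[float]:
    """Fraction of matching genotypes among sites where at least one
    call is non-reference. Returns None if no site qualifies."""
    compared = 0
    matched = 0
    for c, s in zip(chip, seq):
        if c is None or s is None:
            continue  # skip sites missing in either data set
        if c == 0 and s == 0:
            continue  # both homozygous reference: not counted
        compared += 1
        if c == s:
            matched += 1
    return matched / compared if compared else None


def flag_bad_samples(chip_by_sample: Dict[str, Sequence[Genotype]],
                     seq_by_sample: Dict[str, Sequence[Genotype]],
                     threshold: float = 0.98) -> List[str]:
    """Return samples whose non-ref concordance falls below the threshold.
    Samples without sequencing data or without comparable sites are skipped."""
    bad = []
    for sample, chip in chip_by_sample.items():
        seq = seq_by_sample.get(sample)
        if seq is None:
            continue
        conc = non_ref_concordance(chip, seq)
        if conc is not None and conc < threshold:
            bad.append(sample)
    return bad
```

The 0.98 default mirrors the example threshold mentioned in the text. A sample with no chip data never appears in chip_by_sample, which is why this check cannot cover the remaining 651 samples.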
Line 105: Line 105:  
** Investigate the associations between telomere length (an indicator of aging) and variants. This is likely to be interesting in the Sardinian population because Sardinians have longer lifespans and many centenarians.
 
* '''Phenotype Study'''
 
** Likely will not yield much because not many additional samples since Carlo's last data freeze (3,514 samples there)
+
** Likely will not yield much because not many additional samples since Carlo's last data freeze (3,514 samples there) <b>--> GWASs on 120 Visit 1 traits Complete </b>
    
== New Updates ==
 
