From Genome Analysis Wiki
Jump to navigationJump to search
602 bytes added
, 16:07, 15 January 2014
Line 3: |
Line 3: |
| The idea is to use glfSingle to generate sample-specific VCF after pileup, and then replace the glfMultiples step by a merging step. The merging generates a population VCF that looks the same as what would have been the glfMultiples output. Subsequent filtering and imputation steps can follow as usual. | | The idea is to use glfSingle to generate sample-specific VCF after pileup, and then replace the glfMultiples step by a merging step. The merging generates a population VCF that looks the same as what would have been the glfMultiples output. Subsequent filtering and imputation steps can follow as usual. |
| | | |
− | Ingredients: | + | ==Ingredients== |
| *Index file - same as original UMAKE index. An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.index | | *Index file - same as original UMAKE index. An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.index |
| *Configuration file - same as original UMAKE conf. An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf . Note that the glfSingle and merging steps are included by enabling these two steps: | | *Configuration file - same as original UMAKE conf. An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf . Note that the glfSingle and merging steps are included by enabling these two steps: |
Line 14: |
Line 14: |
| perl /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl --conf /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf | | perl /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl --conf /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf |
| | | |
− | Customization: | + | ==Customization== |
| *To change paths to glfSingle and merge_glfS_vcf.py, go to the following lines of umake-glfSingle.pl: | | *To change paths to glfSingle and merge_glfS_vcf.py, go to the following lines of umake-glfSingle.pl: |
| **line 1000 - my $cmd = "python [your-path]/merge_glfS_vcf.py --file-list $glfAlias --chr $chr --outfile $vcf > $vcf.log"; | | **line 1000 - my $cmd = "python [your-path]/merge_glfS_vcf.py --file-list $glfAlias --chr $chr --outfile $vcf > $vcf.log"; |
Line 20: |
Line 20: |
| *To apply the uniform Ts/Tv model to glfSingle, go to the following line of umake-glfSingle.pl and make the changes in bold: | | *To apply the uniform Ts/Tv model to glfSingle, go to the following line of umake-glfSingle.pl and make the changes in bold: |
| **line 1073 - $cmd .= "\n\t".&getMosixCmd("/net/wonderland/home/yancylo/bin/umake-glfSingle/'''glfSingle_ut''' -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference '''--uniformTsTv''' > $smVcf.log"); | | **line 1073 - $cmd .= "\n\t".&getMosixCmd("/net/wonderland/home/yancylo/bin/umake-glfSingle/'''glfSingle_ut''' -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference '''--uniformTsTv''' > $smVcf.log"); |
| + | |
| + | ==Remarks== |
| + | *The --reference option should be enabled for glfSingle, such that it calls the homozygous reference genotypes per sample. This is necessary to distinguish between homref and missing genotypes during merging. |
| + | *The merging program combines across individual-sample VCFs in small chunks of positions, hence it does NOT create a memory issue even when merging across large sample sizes and big regions. |
| + | *One potential concern of this pipeline is file size, since each sample now has its own VCFs (~2Gb for chr20 per sample). |
| + | |
| + | ==Contact== |
| + | Please contact Yancy if you have any questions. |