UMAKE-glfSingle
Latest revision as of 12:32, 22 January 2014
This is a modification of UMAKE to incorporate an individual-based variant caller in the pipeline.
The idea is to use glfSingle to generate a sample-specific VCF for each sample after pileup, and then to replace the glfMultiples step with a merging step. The merging generates a population VCF that looks the same as the glfMultiples output would have. Subsequent filtering and imputation steps can follow as usual.
Ingredients
- Index file
  - Same as the original UMAKE index
  - An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.index
- Configuration file
  - Same as the original UMAKE conf
  - An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf
  - Note that the glfSingle and merging steps are included by enabling these two steps:
      RUN_PILEUP = TRUE # create GLF file from BAM then individual VCF using glfSingle (sorry, you have to redo the pileups)
      RUN_GLFMULTIPLES = TRUE # create unfiltered SNP calls, population VCF by merging the glfSingle outputs
- Perl script for generating Makefile
  - Found at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl
  - It is modified from umake.pl to use glfSingle and merge_glfS_vcf.py (for merging across single-sample VCFs)
    - These two programs are available at /net/wonderland/home/yancylo/bin/umake-glfSingle
  - To generate the Makefile corresponding to this new pipeline flow, run:
      perl /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl --conf /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf
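Before generating the Makefile, it may help to confirm that both steps are actually enabled in the configuration file. The following is a minimal sketch, assuming the conf file consists of simple `KEY = VALUE # comment` lines like the two shown above; the real UMAKE conf syntax may be richer, and the helper names `parse_conf` and `disabled_steps` are hypothetical.

```python
# Hypothetical helpers: not part of UMAKE or umake-glfSingle.
REQUIRED_STEPS = ("RUN_PILEUP", "RUN_GLFMULTIPLES")


def parse_conf(text):
    """Parse 'KEY = VALUE # comment' lines into a dict.

    Assumption: '#' always starts a comment and each setting fits on one line.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comment
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def disabled_steps(settings):
    """Return the required steps that are not set to TRUE."""
    return [
        step for step in REQUIRED_STEPS
        if settings.get(step, "").upper() != "TRUE"
    ]
```

An empty list from `disabled_steps(parse_conf(open(conf_path).read()))` would suggest that both the glfSingle and the merging steps are switched on.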
Customization
- To change the paths to glfSingle and merge_glfS_vcf.py, edit the following lines of umake-glfSingle.pl:
  - line 1000 - my $cmd = "python [your-path]/merge_glfS_vcf.py --file-list $glfAlias --chr $chr --outfile $vcf > $vcf.log";
  - line 1073 - $cmd .= "\n\t".&getMosixCmd("[your-path]/glfSingle -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference > $smVcf.log");
- To apply the uniform Ts/Tv model to glfSingle, edit the following line of umake-glfSingle.pl so that it calls glfSingle_ut instead of glfSingle and passes the additional --uniformTsTv option:
  - line 1073 - $cmd .= "\n\t".&getMosixCmd("/net/wonderland/home/yancylo/bin/umake-glfSingle/glfSingle_ut -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference --uniformTsTv > $smVcf.log");
- To delete the single-sample VCFs after merging, add the --d option to the command on line 1000 (run python merge_glfS_vcf.py to view the available options)
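The line numbers above may shift if your copy of umake-glfSingle.pl has been edited. As a rough aid, the sketch below searches the script text for lines that invoke the two tools instead of relying on fixed line numbers. The function name `find_tool_lines` is hypothetical, and the pattern assumes each tool is called through a path ending in `/merge_glfS_vcf.py`, `/glfSingle` or `/glfSingle_ut` followed by whitespace, as in the lines quoted above.

```python
import re

# Assumption: tools are invoked as ".../<tool> <args>", as in the quoted lines.
TOOL_PATTERN = re.compile(r"/(merge_glfS_vcf\.py|glfSingle(?:_ut)?)(?=\s)")


def find_tool_lines(script_text):
    """Return (line_number, tool_name, stripped_line) for each tool invocation."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        match = TOOL_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1), line.strip()))
    return hits
```

Calling `find_tool_lines(open("umake-glfSingle.pl").read())` on a local copy should list the candidate lines to edit; check each one by eye before changing it.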
Remarks
- The --reference option should be enabled for glfSingle so that it calls homozygous reference genotypes for each sample. This is necessary to distinguish homozygous reference (homref) genotypes from missing genotypes during merging.
- The merging program combines the individual-sample VCFs in small chunks of positions, so it does NOT run into memory problems even when merging many samples over large regions.
- One potential concern with this pipeline is the number and size of the additional files, since each sample now has its own set of VCFs (~2Gb for chr20 per sample).
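The first two remarks can be illustrated with a small sketch. This is not the code of merge_glfS_vcf.py: it swaps the chunked approach for a streaming merge that handles one position at a time, and it uses simplified `(position, genotype)` tuples rather than real VCF records. Writing `./.` for a missing genotype follows the usual VCF convention and is an assumption about the merged output. The point is that, because every called site (including homozygous reference) has a record, a sample with no record at a position can safely be treated as missing.

```python
import heapq
from itertools import groupby

MISSING = "./."  # assumed missing-genotype code (VCF convention)


def _tag(index, calls):
    """Attach the sample index to each (position, genotype) record."""
    for position, genotype in calls:
        yield position, index, genotype


def merge_samples(sample_calls):
    """Lazily merge per-sample calls into one row per position.

    sample_calls: dict mapping sample name -> iterable of
    (position, genotype) tuples sorted by position.
    Yields (position, genotypes), with genotypes ordered by sorted sample name.
    """
    names = sorted(sample_calls)
    streams = [_tag(i, sample_calls[name]) for i, name in enumerate(names)]
    # heapq.merge reads from each stream only as needed, so just the
    # records of the current position are held in memory.
    merged = heapq.merge(*streams)
    for position, records in groupby(merged, key=lambda rec: rec[0]):
        row = [MISSING] * len(names)
        for _, index, genotype in records:
            row[index] = genotype
        yield position, row
```

For example, with sample A called at positions 100 and 105 and sample B called only at 105, the merged row for position 100 reports A's genotype and marks B as missing. Had A's homozygous reference call at 100 been omitted, there would be no way to tell it apart from a missing genotype.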
Contact
Please contact Yancy if you have any questions.