Difference between revisions of "UMAKE-glfSingle"

From Genome Analysis Wiki

Revision as of 17:51, 15 January 2014

This is a modification of UMAKE to incorporate an individual-based variant caller in the pipeline.

The idea is to use glfSingle to generate a sample-specific VCF for each sample after the pileup step, and then to replace the glfMultiples step with a merging step. The merge produces a population VCF that looks the same as the output glfMultiples would have produced, so the subsequent filtering and imputation steps can follow as usual.

Ingredients

  • Index file
    • same as original UMAKE index
    • An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.index
  • Configuration file
    • same as original UMAKE conf
    • An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf
    • The glfSingle and merging steps are included by enabling these two settings:
 RUN_PILEUP = TRUE       # create GLF file from BAM then individual VCF using glfSingle
 RUN_GLFMULTIPLES = TRUE # create unfiltered SNP calls, population VCF by merging the glfSingle outputs
  • Perl script for generating Makefile
    • Found at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl
    • It is modified from umake.pl to use glfSingle and merge_glfS_vcf.py (for merging across single-sample VCFs)
      • These two programs are available at /net/wonderland/home/yancylo/bin/umake-glfSingle
    • To generate the Makefile corresponding to this new pipeline flow, do:
 perl /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl --conf /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf
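Before generating the Makefile, it can help to confirm that both steps are enabled in the configuration file. The following is a minimal sketch, not part of UMAKE: it assumes the conf file uses the `KEY = VALUE # comment` layout shown above, and the function names `parse_conf` and `glfsingle_steps_enabled` are hypothetical.

```python
def parse_conf(text):
    """Parse 'KEY = VALUE  # comment' lines into a dict (assumed conf syntax)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def glfsingle_steps_enabled(settings):
    """True only if both steps used by the glfSingle flow are set to TRUE."""
    required = ("RUN_PILEUP", "RUN_GLFMULTIPLES")
    return all(settings.get(name, "").upper() == "TRUE" for name in required)


# The two settings quoted in the Ingredients list above.
example_conf = """
RUN_PILEUP = TRUE       # create GLF file from BAM then individual VCF using glfSingle
RUN_GLFMULTIPLES = TRUE # create unfiltered SNP calls, population VCF by merging
"""

if __name__ == "__main__":
    print(glfsingle_steps_enabled(parse_conf(example_conf)))
```

To check a real file, read it with `open(path).read()` and pass the text to `parse_conf`.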

Customization

  • To change paths to glfSingle and merge_glfS_vcf.py, go to the following lines of umake-glfSingle.pl:
    • line 1000 - my $cmd = "python [your-path]/merge_glfS_vcf.py --file-list $glfAlias --chr $chr --outfile $vcf > $vcf.log";
    • line 1073 - $cmd .= "\n\t".&getMosixCmd("[your-path]/glfSingle -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference > $smVcf.log");
  • To apply the uniform Ts/Tv model to glfSingle, go to the following line of umake-glfSingle.pl, call the glfSingle_ut binary instead of glfSingle, and add the --uniformTsTv option:
    • line 1073 - $cmd .= "\n\t".&getMosixCmd("/net/wonderland/home/yancylo/bin/umake-glfSingle/glfSingle_ut -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference --uniformTsTv > $smVcf.log");
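If editing the script by hand is inconvenient, the path change can also be done with a plain string replacement. The sketch below is only an illustration: the helper `repoint_tools` and the directory `/opt/tools` are hypothetical, and it assumes the tool paths appear in the Perl source exactly as `<directory>/<tool name>`.

```python
def repoint_tools(perl_source, old_dir, new_dir,
                  tools=("glfSingle", "merge_glfS_vcf.py")):
    """Return perl_source with '<old_dir>/<tool>' rewritten to '<new_dir>/<tool>'."""
    patched = perl_source
    for tool in tools:
        patched = patched.replace(f"{old_dir}/{tool}", f"{new_dir}/{tool}")
    return patched


# The two command lines quoted above, with the wiki's [your-path] placeholder.
example_source = r'''
my $cmd = "python [your-path]/merge_glfS_vcf.py --file-list $glfAlias --chr $chr --outfile $vcf > $vcf.log";
$cmd .= "\n\t".&getMosixCmd("[your-path]/glfSingle -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference > $smVcf.log");
'''

if __name__ == "__main__":
    # /opt/tools is a made-up location used only for this example.
    print(repoint_tools(example_source, "[your-path]", "/opt/tools"))
```

Because the match is on the `glfSingle` prefix, a path ending in `glfSingle_ut` is moved to the new directory as well. Write the result to a copy of the script rather than overwriting the original.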

Remarks

  • The --reference option should be enabled for glfSingle, so that it calls the homozygous reference genotypes for each sample. This is necessary to distinguish between homozygous-reference and missing genotypes during merging.
  • The merging program combines the individual-sample VCFs in small chunks of positions, so it does NOT create a memory issue even when merging across large sample sizes and big regions.
  • One potential concern of this pipeline is the number and size of additional files, since each sample now has its own set of VCFs (~2Gb for chr20 per sample).
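The first two remarks can be illustrated with a small sketch. This is not the code of merge_glfS_vcf.py; it is a simplified model in which each sample's calls are held in a dictionary of position to genotype (standing in for a per-sample VCF), and the function names and chunk size are assumptions. A position absent from a sample is written as missing (`./.`), which is why homozygous-reference calls must be present explicitly.

```python
def merge_chunk(per_sample_calls, sample_order, start, end):
    """Merge one window [start, end) of positions across samples.

    per_sample_calls maps sample -> {position: genotype string}.
    A position absent from a sample's calls is emitted as missing ("./.").
    """
    positions = sorted({
        pos
        for calls in per_sample_calls.values()
        for pos in calls
        if start <= pos < end
    })
    return [
        (pos, [per_sample_calls[s].get(pos, "./.") for s in sample_order])
        for pos in positions
    ]


def merge_in_chunks(per_sample_calls, sample_order, first, last, chunk_size):
    """Yield merged rows window by window over positions first..last."""
    for start in range(first, last + 1, chunk_size):
        end = min(start + chunk_size, last + 1)
        yield from merge_chunk(per_sample_calls, sample_order, start, end)


if __name__ == "__main__":
    calls = {
        "S1": {100: "0/1", 250: "0/0"},  # 0/0 is present only with --reference
        "S2": {100: "0/0"},              # no record at 250, so it is missing
    }
    for row in merge_in_chunks(calls, ["S1", "S2"], 1, 300, 100):
        print(row)
```

In this toy version the dictionaries already sit in memory; a file-based merger would read only the current window from each per-sample VCF, which is what keeps memory use bounded.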

Contact

Please contact Yancy if you have any questions.