UMAKE-glfSingle

From Genome Analysis Wiki

Revision as of 16:07, 15 January 2014

This is a modification of UMAKE to incorporate an individual-based variant caller in the pipeline.

The idea is to run glfSingle after pileup to generate a VCF for each sample, and then replace the glfMultiples step with a merging step. The merge produces a population VCF that looks the same as the glfMultiples output would have. Subsequent filtering and imputation steps can follow as usual.

Ingredients

  • Index file - same format as the original UMAKE index (example: /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.index).
  • Configuration file - same format as the original UMAKE conf (example: /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf). The glfSingle and merging steps are switched on through these two existing UMAKE step flags:
 RUN_PILEUP = TRUE       # create GLF file from BAM then individual VCF using glfSingle
 RUN_GLFMULTIPLES = TRUE # create unfiltered SNP calls, population VCF by merging the glfSingle outputs
  • Perl script for generating the Makefile: /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl, modified from umake.pl to call glfSingle and merge_glfS_vcf.py (which merges the single-sample VCFs).
    • These two programs are available at /net/wonderland/home/yancylo/bin/umake-glfSingle
    • To generate the Makefile corresponding to this new pipeline flow, do:
 perl /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl --conf /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf
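Before generating the Makefile, it can help to confirm that both step flags are actually set. The following is a minimal Python sketch, not part of UMAKE: parse_conf and glfsingle_flow_enabled are hypothetical helpers, and they assume one KEY = VALUE setting per line with an optional trailing # comment, as in the two RUN_* lines above. UMAKE's own conf parser may accept more forms.

```python
import re

# Hypothetical helper (not UMAKE code). Matches "KEY = VALUE   # optional comment",
# the form used by the RUN_PILEUP / RUN_GLFMULTIPLES lines above.
SETTING = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*([^#]*?)\s*(?:#.*)?$")

def parse_conf(text):
    """Return {KEY: VALUE} for every KEY = VALUE line; other lines are skipped."""
    settings = {}
    for line in text.splitlines():
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings

def glfsingle_flow_enabled(settings):
    """True only if both the glfSingle (pileup) and merge (glfMultiples) steps are TRUE."""
    return all(settings.get(key, "").upper() == "TRUE"
               for key in ("RUN_PILEUP", "RUN_GLFMULTIPLES"))
```

Usage: read the conf file into a string and call glfsingle_flow_enabled(parse_conf(text)). Because everything after a # is treated as a comment, a value that itself contains # would be truncated by this sketch.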

Customization

  • To change the paths to glfSingle and merge_glfS_vcf.py, edit the following lines of umake-glfSingle.pl:
    • line 1000 - my $cmd = "python [your-path]/merge_glfS_vcf.py --file-list $glfAlias --chr $chr --outfile $vcf > $vcf.log";
    • line 1073 - $cmd .= "\n\t".&getMosixCmd("[your-path]/glfSingle -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference > $smVcf.log");
  • To apply the uniform Ts/Tv model in glfSingle, edit the following line of umake-glfSingle.pl: replace glfSingle with glfSingle_ut and add --uniformTsTv after --reference:
    • line 1073 - $cmd .= "\n\t".&getMosixCmd("/net/wonderland/home/yancylo/bin/umake-glfSingle/glfSingle_ut -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference --uniformTsTv > $smVcf.log");
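For reference, here is a Python sketch that assembles the same two command lines as lines 1000 and 1073 above, with the uniform Ts/Tv switch as a parameter. It is an illustration only: glfsingle_cmd and merge_cmd are hypothetical names, the getMosixCmd wrapper is omitted, BIN_DIR is the example directory from this page (substitute your own path), and the shell quoting via shlex is an addition that the Perl code does not do.

```python
import shlex

# Example install directory from this page; replace with your own [your-path].
BIN_DIR = "/net/wonderland/home/yancylo/bin/umake-glfSingle"

def glfsingle_cmd(sm_glf, sm_vcf, sample, uniform_tstv=False, bin_dir=BIN_DIR):
    """Per-sample glfSingle call, mirroring line 1073 (without getMosixCmd)."""
    binary = "glfSingle_ut" if uniform_tstv else "glfSingle"
    args = [f"{bin_dir}/{binary}", "-g", sm_glf, "-b", sm_vcf, "-l", sample,
            "--minMapQuality", "0", "--minDepth", "1", "--maxDepth", "100000",
            "--reference"]  # keep hom-ref calls so merging can tell them from missing
    if uniform_tstv:
        args.append("--uniformTsTv")  # used together with glfSingle_ut, as described above
    return f"{shlex.join(args)} > {shlex.quote(sm_vcf + '.log')}"

def merge_cmd(glf_alias, chrom, vcf, bin_dir=BIN_DIR):
    """Per-chromosome merge of the single-sample VCFs, mirroring line 1000."""
    args = ["python", f"{bin_dir}/merge_glfS_vcf.py", "--file-list", glf_alias,
            "--chr", chrom, "--outfile", vcf]
    return f"{shlex.join(args)} > {shlex.quote(vcf + '.log')}"
```

Keeping the binary name and the --uniformTsTv flag behind one boolean avoids the easy mistake of changing one without the other.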

Remarks

  • The --reference option should be enabled for glfSingle so that it also calls homozygous-reference genotypes for each sample. Without these calls, the merging step could not distinguish a homozygous-reference genotype from a missing one.
  • The merging program processes the single-sample VCFs in small chunks of positions, so it does NOT run into memory problems even when merging many samples over large regions.
  • One potential concern with this pipeline is disk usage, since each sample now has its own VCFs (~2 GB for chr20 per sample).
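The two merging remarks above can be illustrated with a minimal Python sketch of chunked merging. This is not merge_glfS_vcf.py: it assumes each sample's calls are already parsed into (position, GT) pairs, sorted by position, for a single chromosome, and the rule for which sites to keep (at least one non-reference call) is an assumption. It shows why --reference matters: a sample with no record at a kept site becomes ./., while an explicit 0/0 stays homozygous reference.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[int, str]  # (position, GT string) parsed from one single-sample VCF

def has_alt(gt: str) -> bool:
    """True if any allele in a GT like '0/1' or '1|1' is non-reference."""
    return any(allele not in ("0", ".") for allele in re.split(r"[/|]", gt))

def merge_in_chunks(samples: List[Iterable[Record]],
                    chunk_size: int = 10_000) -> Iterator[Tuple[int, List[str]]]:
    """Merge position-sorted per-sample records for one chromosome, one window
    of chunk_size bp at a time, so at most one window of records is held in memory."""
    iters = [iter(s) for s in samples]
    heads = [next(it, None) for it in iters]  # next unread record per sample
    while any(h is not None for h in heads):
        # Each window starts at the smallest unread position across samples.
        start = min(h[0] for h in heads if h is not None)
        end = start + chunk_size
        window: Dict[int, Dict[int, str]] = {}  # pos -> {sample index: GT}
        for i, it in enumerate(iters):
            while heads[i] is not None and heads[i][0] < end:
                pos, gt = heads[i]
                window.setdefault(pos, {})[i] = gt
                heads[i] = next(it, None)
        for pos in sorted(window):
            calls = window[pos]
            # Assumption: only sites with at least one non-reference call are emitted.
            if any(has_alt(gt) for gt in calls.values()):
                # No record for a sample -> missing "./."; an explicit "0/0"
                # (written because of --reference) stays hom-ref.
                yield pos, [calls.get(i, "./.") for i in range(len(iters))]
```

For example, merging [(100, "0/0"), (200, "0/1")] with [(200, "0/0")] yields only (200, ["0/1", "0/0"]): position 100 carries no alternate allele and is dropped under the assumed site rule.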

Contact

Please contact Yancy if you have any questions.