Changes

From Genome Analysis Wiki
218 bytes added, 12:32, 22 January 2014
Line 4: Line 4:     
==Ingredients==
 
*Index file - same as original UMAKE index. An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.index
+
*Index file
*Configuration file - same as original UMAKE conf. An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf . Note that the glfSingle and merging steps are included by enabling these two steps:
+
**same as original UMAKE index
   RUN_PILEUP = TRUE      # create GLF file from BAM then individual VCF using glfSingle
+
**An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.index
 +
*Configuration file
 +
**same as original UMAKE conf
 +
**An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf
 +
**Note that the glfSingle and merging steps are included by enabling these two steps:
 +
   RUN_PILEUP = TRUE      # create GLF file from BAM then individual VCF using glfSingle (sorry, you have to redo the pileups)
 
   RUN_GLFMULTIPLES = TRUE # create unfiltered SNP calls, population VCF by merging the glfSingle outputs
 
   −
*Perl script for generating Makefile - /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl . It is modified from umake.pl to call glfSingle and merge_glfS_vcf.py (for merging across single-sample VCF).
+
*Perl script for generating Makefile
**These two programs are available at /net/wonderland/home/yancylo/bin/umake-glfSingle
+
**Found at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl
 +
**It is modified from umake.pl to use glfSingle and merge_glfS_vcf.py (for merging across single-sample VCF)
 +
***These two programs are available at /net/wonderland/home/yancylo/bin/umake-glfSingle
 
**To generate the Makefile corresponding to this new pipeline flow, do:
 
 
 
   perl /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl --conf /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf
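
Before generating the Makefile, it can help to confirm that both steps are switched on in the configuration file. The following is a minimal sketch, assuming the conf file uses the `KEY = VALUE  # comment` layout shown above; `parse_conf` and `missing_steps` are hypothetical helper names and are not part of UMAKE.

```python
def parse_conf(text):
    """Parse 'KEY = VALUE  # comment' lines into a dict (assumed conf layout)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# The two steps this page says must be enabled.
REQUIRED_STEPS = ("RUN_PILEUP", "RUN_GLFMULTIPLES")


def missing_steps(settings):
    """Return the required steps that are not set to TRUE."""
    return [k for k in REQUIRED_STEPS if settings.get(k, "").upper() != "TRUE"]


example = """
RUN_PILEUP = TRUE      # create GLF file from BAM then individual VCF using glfSingle
RUN_GLFMULTIPLES = TRUE # create unfiltered SNP calls, population VCF
"""
print(missing_steps(parse_conf(example)))
```

An empty list means both steps are enabled; any name in the list should be set to TRUE in the conf file before running umake-glfSingle.pl.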
Line 20: Line 27:  
*To apply the uniform Ts/Tv model to glfSingle, go to the following line of umake-glfSingle.pl and make the changes in bold:
 
 
**line 1073 - $cmd .= "\n\t".&getMosixCmd("/net/wonderland/home/yancylo/bin/umake-glfSingle/'''glfSingle_ut''' -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference '''--uniformTsTv''' > $smVcf.log");
 
 +
*To delete the single-sample VCFs after merging, add the --d option to line 1000 (run python merge_glfS_vcf.py to view the available options)
    
==Remarks==
 
 
*The --reference option should be enabled for glfSingle so that it calls homozygous reference genotypes for each sample. This is necessary to distinguish homozygous reference (homref) genotypes from missing genotypes during merging.
 
*The merging program combines individual-sample VCFs in small chunks of positions, so it does NOT create a memory issue even when merging many samples over large regions.
*One potential concern of this pipeline is file size, since each sample now has its own VCFs (~2Gb for chr20 per sample).
+
*One potential concern of this pipeline is the number and size of additional files, since each sample now has its own set of VCFs (~2Gb for chr20 per sample).
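
The remarks above on homref versus missing genotypes and on memory use can be illustrated with a small streaming merge. This is a sketch only and is not the code of merge_glfS_vcf.py: it assumes each sample's calls are already sorted by position, reduces each record to a (position, genotype) pair, and holds one position in memory at a time rather than a chunk of positions. The names `merge_by_position` and `_tag` are hypothetical.

```python
import heapq


def _tag(idx, calls):
    """Attach the sample index to every (position, genotype) record."""
    for pos, gt in calls:
        yield pos, idx, gt


def merge_by_position(sample_calls):
    """Merge per-sample call streams, yielding one row per position.

    sample_calls: list of iterables, one per sample, each sorted by position
    and holding at most one record per position.
    Yields (position, genotypes), where genotypes has one entry per sample.
    './.' marks a sample with no record at that position (missing), which
    stays distinct from an explicit homozygous reference call '0/0'.
    """
    n = len(sample_calls)
    streams = [_tag(i, calls) for i, calls in enumerate(sample_calls)]
    current_pos, row = None, None
    # heapq.merge reads the sorted streams lazily, so only the current
    # position's row is held in memory.
    for pos, idx, gt in heapq.merge(*streams):
        if pos != current_pos:
            if row is not None:
                yield current_pos, row
            current_pos, row = pos, ["./."] * n
        row[idx] = gt
    if row is not None:
        yield current_pos, row


sample_a = [(100, "0/0"), (200, "0/1")]
sample_b = [(200, "1/1"), (300, "0/0")]
for position, genotypes in merge_by_position([sample_a, sample_b]):
    print(position, genotypes)
```

In this toy input, sample_b has no record at position 100, so it is reported as missing there. If sample_a had been called without reference genotypes, its 0/0 record at position 100 would also be absent and could not be told apart from missing data, which is why the --reference option matters.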
    
==Contact==
 
 
Please contact Yancy if you have any questions.
 