Difference between revisions of "UMAKE-glfSingle"

Revision as of 16:07, 15 January 2014

This is a modification of UMAKE to incorporate an individual-based variant caller in the pipeline.

The idea is to use glfSingle to generate sample-specific VCF after pileup, and then replace the glfMultiples step by a merging step. The merging generates a population VCF that looks the same as what would have been the glfMultiples output. Subsequent filtering and imputation steps can follow as usual.

Ingredients

Index file - same as original UMAKE index. An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.index
Configuration file - same as original UMAKE conf. An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf . Note that the glfSingle and merging steps are included by enabling these two steps:

 RUN_PILEUP = TRUE       # create GLF file from BAM then individual VCF using glfSingle
 RUN_GLFMULTIPLES = TRUE # create unfiltered SNP calls, population VCF by merging the glfSingle outputs

Perl script for generating Makefile - /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl . It is modified from umake.pl to call glfSingle and merge_glfS_vcf.py (for merging across single-sample VCF).
- These two programs are available at /net/wonderland/home/yancylo/bin/umake-glfSingle
- To generate the Makefile corresponding to this new pipeline flow, do:

 perl /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl --conf /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf

Customization

To change paths to glfSingle and merge_glfS_vcf.py, go to the following lines of umake-glfSingle.pl:
- line 1000 - my $cmd = "python [your-path]/merge_glfS_vcf.py --file-list $glfAlias --chr $chr --outfile $vcf > $vcf.log";
- line 1073 - $cmd .= "\n\t".&getMosixCmd("[your-path]/glfSingle -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference > $smVcf.log");
To apply the uniform Ts/Tv model to glfSingle, go to the following line of umake-glfSingle.pl and make the changes in bold:
- line 1073 - $cmd .= "\n\t".&getMosixCmd("/net/wonderland/home/yancylo/bin/umake-glfSingle/glfSingle_ut -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference --uniformTsTv > $smVcf.log");

Remarks

The --reference option should be enabled for glfSingle, such that it calls the homozygous reference genotypes per sample. This is necessary to distinguish between homref and missing genotypes during merging.
The merging program combines across individual-sample VCFs in small chunks of positions, hence it does NOT create a memory issue even when merging across large sample sizes and big regions.
One potential concern of this pipeline is file size, since each sample now has its own VCFs (~2Gb for chr20 per sample).

Contact

Please contact Yancy if you have any questions.

@@ Line 3: / Line 3: @@
 The idea is to use glfSingle to generate sample-specific VCF after pileup, and then replace the glfMultiples step by a merging step. The merging generates a population VCF that looks the same as what would have been the glfMultiples output. Subsequent filtering and imputation steps can follow as usual.
-Ingredients:
+==Ingredients==
 *Index file - same as original UMAKE index. An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.index
 *Configuration file - same as original UMAKE conf. An example is at /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf . Note that the glfSingle and merging steps are included by enabling these two steps:
@@ Line 14: / Line 14: @@
    perl /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.pl --conf /net/wonderland/home/yancylo/bin/umake-glfSingle/umake-glfSingle.conf
-Customization:
+==Customization==
 *To change paths to glfSingle and merge_glfS_vcf.py, go to the following lines of umake-glfSingle.pl:
 **line 1000 -  my $cmd = "python [your-path]/merge_glfS_vcf.py --file-list $glfAlias --chr $chr --outfile $vcf > $vcf.log";
@@ Line 20: / Line 20: @@
 *To apply the uniform Ts/Tv model to glfSingle, go to the following line of umake-glfSingle.pl and make the changes in bold:
 **line 1073 - $cmd .= "\n\t".&getMosixCmd("/net/wonderland/home/yancylo/bin/umake-glfSingle/'''glfSingle_ut''' -g $smGlf -b $smVcf -l $allSMs[$i] --minMapQuality 0 --minDepth 1 --maxDepth 100000 --reference '''--uniformTsTv''' > $smVcf.log");
+==Remarks==
+*The --reference option should be enabled for glfSingle, such that it calls the homozygous reference genotypes per sample. This is necessary to distinguish between homref and missing genotypes during merging.
+*The merging program combines across individual-sample VCFs in small chunks of positions, hence it does NOT create a memory issue even when merging across large sample sizes and big regions.
+*One potential concern of this pipeline is file size, since each sample now has its own VCFs (~2Gb for chr20 per sample).
+==Contact==
+Please contact Yancy if you have any questions.

Difference between revisions of "UMAKE-glfSingle"

Revision as of 16:07, 15 January 2014

Contents

Ingredients

Customization

Remarks

Contact

Navigation menu

Page actions

Page actions

Personal tools

quick links

teaching

Navigation

Search

Tools