Changes

From Genome Analysis Wiki
Jump to navigationJump to search
Line 2: Line 2:  
Main Workshop wiki page: [[SeqShop: December 2014]]
 
Main Workshop wiki page: [[SeqShop: December 2014]]
   −
See the [[Media:Seqshop cnv partb 2014 06.pdf|introductory slides]] for an intro to this tutorial.
+
See [[Media:Seqshop cnv partb 2014 06.pdf|lecture slides]] for the lecture slides associated with this tutorial.
    
== Goals of This Session ==
 
== Goals of This Session ==
Line 24: Line 24:  
== Setup in person at the SeqShop Workshop ==
 
== Setup in person at the SeqShop Workshop ==
 
''This section is specifically for the SeqShop Workshop computers.''
 
''This section is specifically for the SeqShop Workshop computers.''
<div class="mw-collapsible" style="width:600px">
+
<div class="mw-collapsible mw-collapsed" style="width:600px">
 
''If you are not running during the SeqShop Workshop, please skip this section.''
 
''If you are not running during the SeqShop Workshop, please skip this section.''
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
Line 38: Line 38:  
* Setup an output directory
 
* Setup an output directory
 
** It will leave your output directory from the previous tutorial in tact.
 
** It will leave your output directory from the previous tutorial in tact.
  source /home/mktrost/seqshop/setup.txt
+
  source /net/seqshop-server/home/mktrost/seqshop/setup.txt
 
* You won't see any output after running <code>source</code>
 
* You won't see any output after running <code>source</code>
 
** It silently sets up your environment
 
** It silently sets up your environment
 
** If you want to view the detail of the setup, type
 
** If you want to view the detail of the setup, type
  less /home/mktrost/seqshop/setup.txt
+
  less /net/seqshop-server/home/mktrost/seqshop/setup.txt
 
and press 'q' to finish.
 
and press 'q' to finish.
   Line 53: Line 53:  
</div>
 
</div>
 
</div>
 
</div>
      
== Setup when running on your own outside of the SeqShop Workshop ==
 
== Setup when running on your own outside of the SeqShop Workshop ==
Line 229: Line 228:  
In addition, if one wants to genotype structural variants from other structural variant caller, there is a step available.
 
In addition, if one wants to genotype structural variants from other structural variant caller, there is a step available.
 
* Third-party Genotyping and Filtering step : Perform genotyping on the variant sites specified by an input VCF, and also perform variant filtering.
 
* Third-party Genotyping and Filtering step : Perform genotyping on the variant sites specified by an input VCF, and also perform variant filtering.
 +
 +
== Command Line Usage of GenomeSTRiP pipeline ==
 +
 +
To see how to use GenomeSTRiP pipeline, type
 +
perl $GC/bin/genomestrip.pl
 +
 +
<div class="mw-collapsible mw-collapsed">
 +
''View Results''
 +
<div class="mw-collapsible-content">
 +
ERROR: One of command options among --run-metadata, --run-discovery, --run-genotype, --run-thirdparty must be specified
 +
ERROR: Missing required option, outdir
 +
Usage:
 +
    /net/seqshop-server/home/mktrost/seqshop/gotcloud/bin/genomestrip.pl
 +
    [options]
 +
 +
    Help Options:
 +
      -help                Print out brief help message [OFF]
 +
      -man                Print the full documentation in man page style [OFF]
 +
 +
    Command options:
 +
      -run-metadata        Create metadata [OFF]
 +
      -run-discovery      Run variant discovery and filtering. Can run with --run-metadata together [OFF]
 +
      -run-genotype        Run genotyping - requires to finish run-metadata and run-discovery [OFF]
 +
      -run-thirdparty      Run genotyping and filtering of third-party sites [OFF]
 +
 +
    Options for input/output data:
 +
      -gotcloudroot|gcroot STRGotCloud Root Directory []
 +
      -conf STR            GotCloud configuration files []
 +
      -outdir STR          Override's conf file's OUT_DIR.  Used as the genomestrip output directory unless --out or GENOMESTRIP_OUT is set []
 +
      -list STR            BAM list file containing ID and BAM path []
 +
      -out STR            Output directory which stores subdirectories such as metadata/, discovery/, genotypes/, thirdparty/ unless overriden individually []
 +
      -metadata STR        Output directory to store --run-metadata results. Default is [OUT]/metadata/ []
 +
      -discovery STR      Output directory to store --run-discovery results. Default is [OUT]/discovery/ []
 +
      -genotype STR        Output directory to store --run-genotype results. Default is [OUT]/genotype/ []
 +
      -thirdparty STR      Output directory to store --run-thirdparty results. Default is [OUT]/thirdparty/ []
 +
 +
    Advanced Options:
 +
      -tmp-dir STR        temporary directory to store temporary files. Default is [OUT]/tmp []
 +
      -gs-dir STR          GenomeSTRiP svtoolkit directory []
 +
      -param STR          GenomeSTRIP parameter file []
 +
      -ref STR            Reference FASTA file []
 +
      -mask STR            Reference mask FASTA file []
 +
      -ploidy-map STR      Ploidy map file []
 +
      -mosix-opt STR      MOSIX options []
 +
      -region STR          Region to focus on the variants []
 +
      -unit INT            Number of variants to be genotyped per parallel run [100]
 +
 +
    Additional Inputs:
 +
      -in-vcf STR          Input site VCF files used for --run-genotype or --run-thirdparty. For --run-thirdparty, this argument is required. For --run-genotype, default is [OUT]/discovery/discovery.vcf []
 +
      -pass-only          Genotype only PASS-filtered variants, default is OFF [OFF]
 +
      -skip-rc            Skip precomputing read count [OFF]
 +
      -base-prefix STR    Prefix of all files []
 +
      -bam-prefix STR      Prefix of BAM files []
 +
      -ref-prefix STR      Prefix of Reference FASTA files []
 +
      -no-phonehome        Skip phone home functionality [OFF]
 +
      -make-base-name STR  Specifies the basename for the makefile []
 +
      -verbose            Specifies that additional details are to be printed out [OFF]
 +
      -dry-run            Perform a dry-run that only produces Makefile but not run it [OFF]
 +
      -numjobs INT        Number of jobs to concurrently run [1]
 +
      -autosomes          Perform analysis only on autosomes [OFF]
 +
</div></div>
    
== Running GotCloud/GenomeSTRiP Metadata Pipeline ==
 
== Running GotCloud/GenomeSTRiP Metadata Pipeline ==
Line 235: Line 295:     
In principle, the metadata can be created from the input BAM files by running the following command
 
In principle, the metadata can be created from the input BAM files by running the following command
  perl ${GC}/bin/genomestrip.pl -run-metadata --conf ${SS}/gotcloud.conf --numjobs 2 --base-prefix ${SS} --outdir ${OUT}
+
  perl ${GC}/bin/genomestrip.pl --run-metadata --conf ${SS}/gotcloud.conf --numjobs 12 --base-prefix ${SS} --outdir ${OUT}
   −
'''WAIT!!!!! DO NOT RUN THIS COMMAND, because it will take ~50 minutes to finish'''.
+
'''WAIT!!!!! DO NOT RUN THIS COMMAND, because it will take >1 hour to finish'''.
    
Instead, let's look what the output would have looked like.
 
Instead, let's look what the output would have looked like.
   −
  ls ${OUT}/sv/metadata
+
  ls ${SS}/metadata
   −
  cpt  depth  depth.dat  gcprofile  gcprofiles.zip  genome_sizes.txt  isd  isd.dist.bin  spans  spans.dat
+
computerc.args.list
 +
  cpt
 +
depth
 +
  depth.args.list
 +
  depth.dat
 +
  gcprofile
 +
gcprofiles.list
 +
  gcprofiles.zip
 +
  genome_sizes.txt
 +
  isd
 +
isd.dist.args.list
 +
  isd.dist.bin
 +
rccache
 +
rccache.bin 
 +
rccache.bin.idx 
 +
rccache.list 
 +
rccache.merge 
 +
  spans
 +
spans.args.list
 +
  spans.dat
    
The directory contains metadata output and other intermediate files produced by "GenomeSTRiP SVProcess" step.  
 
The directory contains metadata output and other intermediate files produced by "GenomeSTRiP SVProcess" step.  
Line 254: Line 333:     
To discover large deletions from the 62 BAMs we are using for this workshop, you can run the following command
 
To discover large deletions from the 62 BAMs we are using for this workshop, you can run the following command
  time perl ${SS}/svtoolkit/bin/genomestrip.pl -run-discovery --metadata ${SS}/svtoolkit/metadata --conf ${SS}/gotcloud.conf --numjobs 2 --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT}
+
  perl ${GC}/bin/genomestrip.pl --run-discovery --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 4 --conf ${SS}/gotcloud.conf --numjobs 2 --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT}
* <code>${SS}/svtoolkit/bin/genomestrip.pl -run-discovery</code> runs the GenomeSTRiP Discovery Pipeline
+
* <code>${GC}/bin/genomestrip.pl -run-discovery</code> runs the GenomeSTRiP Discovery Pipeline
* <code>--metadata ${SS}/svtoolkit/metadata</code> points to the pre-made metadata file as explained in the previous section, [[#Running GotCloud/GenomeSTRiP Metadata Pipeline|Running GotCloud/GenomeSTRiP Metadata Pipeline]].
+
* <code>--metadata ${SS}/metadata</code> points to the pre-made metadata file as explained in the previous section, [[#Running GotCloud/GenomeSTRiP Metadata Pipeline|Running GotCloud/GenomeSTRiP Metadata Pipeline]].
 
* <code>--conf ${SS}/gotcloud.conf</code> points to the configuration file to use.
 
* <code>--conf ${SS}/gotcloud.conf</code> points to the configuration file to use.
 
** The configuration for this test was downloaded with the seqshop input files (same as other tutorials).
 
** The configuration for this test was downloaded with the seqshop input files (same as other tutorials).
Line 294: Line 373:  
<div class="mw-collapsible-content" style="width:800px">
 
<div class="mw-collapsible-content" style="width:800px">
 
       7 COHERENCE;COVERAGE;DEPTH;DEPTHPVAL
 
       7 COHERENCE;COVERAGE;DEPTH;DEPTHPVAL
     17 COHERENCE;COVERAGE;DEPTH;DEPTHPVAL;PAIRSPERSAMPLE
+
     18 COHERENCE;COVERAGE;DEPTH;DEPTHPVAL;PAIRSPERSAMPLE
 
       3 COHERENCE;COVERAGE;DEPTH;PAIRSPERSAMPLE
 
       3 COHERENCE;COVERAGE;DEPTH;PAIRSPERSAMPLE
 
       2 COHERENCE;COVERAGE;DEPTHPVAL;PAIRSPERSAMPLE
 
       2 COHERENCE;COVERAGE;DEPTHPVAL;PAIRSPERSAMPLE
Line 304: Line 383:  
       2 COVERAGE;DEPTH;PAIRSPERSAMPLE
 
       2 COVERAGE;DEPTH;PAIRSPERSAMPLE
 
       4 COVERAGE;DEPTHPVAL
 
       4 COVERAGE;DEPTHPVAL
       5 COVERAGE;DEPTHPVAL;PAIRSPERSAMPLE
+
       4 COVERAGE;DEPTHPVAL;PAIRSPERSAMPLE
 
       5 COVERAGE;PAIRSPERSAMPLE
 
       5 COVERAGE;PAIRSPERSAMPLE
 
</div>
 
</div>
Line 323: Line 402:  
The discovery pipeline only performs discovery of variant sites with filtering. You will need to iterate BAMs again to perform genotyping.  
 
The discovery pipeline only performs discovery of variant sites with filtering. You will need to iterate BAMs again to perform genotyping.  
 
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1.
 
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1.
  time perl ${SS}/svtoolkit/bin/genomestrip.pl -run-genotype --metadata ${SS}/svtoolkit/metadata --conf ${SS}/gotcloud.conf --numjobs 4 --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} --gcroot ${GC}
+
  perl ${GC}/bin/genomestrip.pl --run-genotype --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 4 --base-prefix ${SS} --outdir ${OUT}
* The added <code>--gcroot ${GC}</code> option directs the pipeline to tabix/bgzip programs found within gotcloud.
      
This will take ~3 minutes to finish.
 
This will take ~3 minutes to finish.
Line 344: Line 422:  
You can take a 3rd-party site and genotype with GenomeSTRiP. Here we take a 1000 Genomes phase 1 sites and genotype them.
 
You can take a 3rd-party site and genotype with GenomeSTRiP. Here we take a 1000 Genomes phase 1 sites and genotype them.
 
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1.
 
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1.
  time perl ${SS}/svtoolkit/bin/genomestrip.pl -run-thirdparty --in-vcf ${SS}/ext/1kg.phase1.chr22.36Mb.sites.vcf --metadata ${SS}/svtoolkit/metadata --conf ${SS}/gotcloud.conf --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} --gcroot ${GC} --numjobs 4
+
  perl ${GC}/bin/genomestrip.pl --run-thirdparty --in-vcf ${SS}/ext/1kg.phase1.chr22.36Mb.sites.vcf --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} --numjobs 2
    
This will take ~1 minute to finish.
 
This will take ~1 minute to finish.
Line 372: Line 450:  
</div>
 
</div>
   −
== Starting SNP Call on your own Genome ==
+
 
Go to [[SeqShop: Calling Your Own Genome, December 2014]] so we can run SNP calling overnight.
+
== Return to Workshop Wiki Page ==
 +
Return to main workshop wiki page: [[SeqShop: December 2014]]

Navigation menu