Changes

From Genome Analysis Wiki
Undo revision 13683 by Pjvh (talk) revert all changes involving `--list bam.list`
Line 1: Line 1:  
==Introduction==
 
==Introduction==
Main Workshop wiki page: [[SeqShop: December 2014]]
+
Main Workshop wiki page: [[SeqShop: May 2015]]
    
See [[Media:Seqshop cnv partb 2014 06.pdf|lecture slides]] for the lecture slides associated with this tutorial.
 
See [[Media:Seqshop cnv partb 2014 06.pdf|lecture slides]] for the lecture slides associated with this tutorial.
Line 22: Line 22:  
GenomeSTRiP is currently included with the SeqShop example data under the svtoolkit directory.  We have added a bin/ sub-directory containing a high-level pipeline that runs GenomeSTRiP in the same framework as GotCloud.
 
GenomeSTRiP is currently included with the SeqShop example data under the svtoolkit directory.  We have added a bin/ sub-directory containing a high-level pipeline that runs GenomeSTRiP in the same framework as GotCloud.
   −
== Setup in person at the SeqShop Workshop ==
+
== Setup ==
 
''This section is specifically for the SeqShop Workshop computers.''
 
''This section is specifically for the SeqShop Workshop computers.''
<div class="mw-collapsible mw-collapsed" style="width:600px">
+
<div class="mw-collapsible" style="width:600px">
 
''If you are not running during the SeqShop Workshop, please skip this section.''
 
''If you are not running during the SeqShop Workshop, please skip this section.''
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
   
{{SeqShopLogin}}
 
{{SeqShopLogin}}
  −
=== Setup your run environment===
  −
This is the same setup you did for the previous tutorial, but you need to redo it each time you log in.
  −
  −
This will set up some environment variables to point you to
  −
* [[GotCloud]] program
  −
* Tutorial input files
  −
* Setup an output directory
  −
** It will leave your output directory from the previous tutorial intact.
  −
source /net/seqshop-server/home/mktrost/seqshop/setup.txt
  −
* You won't see any output after running <code>source</code>
  −
** It silently sets up your environment
  −
** If you want to view the detail of the setup, type
  −
less /net/seqshop-server/home/mktrost/seqshop/setup.txt
  −
and press 'q' to finish.
  −
  −
<div class="mw-collapsible mw-collapsed" style="width:200px">
  −
View setup.txt
  −
<div class="mw-collapsible-content">
  −
[[File:setup.png|500px]]
  −
</div>
  −
</div>
   
</div>
 
</div>
 
</div>
 
</div>
   −
== Setup when running on your own outside of the SeqShop Workshop ==
+
=== Prerequisite Tutorials ===
''This section is specifically for running on your own outside of the SeqShop Workshop.''
  −
<div class="mw-collapsible" style="width:600px">
  −
''If you are running during the SeqShop Workshop, please skip this section.''
  −
<div class="mw-collapsible-content">
  −
 
   
This tutorial builds on the alignment tutorial. If you have not already run that tutorial, please do so first: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical|Alignment Tutorial]]
 
This tutorial builds on the alignment tutorial. If you have not already run that tutorial, please do so first: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical|Alignment Tutorial]]
    
It also uses the bam.index file created in the SnpCall Tutorial.  If you have not yet run that tutorial, please follow the directions at: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#GotCloud_BAM_Index_File|GotCloud BAM Index File]]
 
It also uses the bam.index file created in the SnpCall Tutorial.  If you have not yet run that tutorial, please follow the directions at: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#GotCloud_BAM_Index_File|GotCloud BAM Index File]]
    +
{{SeqShopRemoteEnv}}
   −
{{SeqShopRemoteEnv}}
+
This is the same setup you did for the previous tutorial, but you need to redo it each time you log in.
</div>
  −
</div>
      
== Examining GotCloud/GenomeSTRiP Input files ==
 
== Examining GotCloud/GenomeSTRiP Input files ==
Line 161: Line 132:  
  mills.208620indels.22.sites.bcf.csi
 
  mills.208620indels.22.sites.bcf.csi
 
  mills_indels_hg19.22.sites.bcf
 
  mills_indels_hg19.22.sites.bcf
 +
 +
The files specific to GenomeSTRiP:
 +
human_g1k_v37.chr22.mask.100.fasta
 +
human_g1k_v37.chr22.mask.100.fasta.fai
 +
humgen_g1k_v37_ploidy.chr22.map
 
</div>
 
</div>
 
</div>
 
</div>
Line 174: Line 150:  
<div class="mw-collapsible-content" style="width:800px">
 
<div class="mw-collapsible-content" style="width:800px">
 
  genstrip_parameters.txt
 
  genstrip_parameters.txt
 +
 +
This file contains the GenomeSTRiP configuration settings.
 
</div>
 
</div>
 
</div>
 
</div>
Line 295: Line 273:     
In principle, the metadata can be created from the input BAM files by running the following command
 
In principle, the metadata can be created from the input BAM files by running the following command
perl ${GC}/bin/genomestrip.pl --run-metadata --conf ${SS}/gotcloud.conf --numjobs 12 --base-prefix ${SS} --outdir ${OUT}
      
'''WAIT!!!!! DO NOT RUN THIS COMMAND, because it will take >1 hour to finish'''.
 
'''WAIT!!!!! DO NOT RUN THIS COMMAND, because it will take >1 hour to finish'''.
 +
 +
${GC}/gotcloud genomestrip --run-metadata --conf ${SS}/gotcloud.conf --numjobs 12 --base-prefix ${SS} --outdir ${OUT}
    
Instead, let's look at what the output would have looked like.
 
Instead, let's look at what the output would have looked like.
Line 333: Line 312:     
To discover large deletions from the 62 BAMs we are using for this workshop, you can run the following command
 
To discover large deletions from the 62 BAMs we are using for this workshop, you can run the following command
  perl ${GC}/bin/genomestrip.pl --run-discovery --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 4 --conf ${SS}/gotcloud.conf --numjobs 2 --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT}
+
  ${GC}/gotcloud genomestrip --run-discovery --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 8 --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT}
* <code>${GC}/bin/genomestrip.pl -run-discovery</code> runs the GenomeSTRiP Discovery Pipeline
+
* <code>${GC}/gotcloud genomestrip --run-discovery</code> runs the GenomeSTRiP Discovery Pipeline
 
* <code>--metadata ${SS}/metadata</code> points to the pre-made metadata file as explained in the previous section, [[#Running GotCloud/GenomeSTRiP Metadata Pipeline|Running GotCloud/GenomeSTRiP Metadata Pipeline]].
 
* <code>--metadata ${SS}/metadata</code> points to the pre-made metadata file as explained in the previous section, [[#Running GotCloud/GenomeSTRiP Metadata Pipeline|Running GotCloud/GenomeSTRiP Metadata Pipeline]].
 
* <code>--conf ${SS}/gotcloud.conf</code> points to the configuration file to use.
 
* <code>--conf ${SS}/gotcloud.conf</code> points to the configuration file to use.
Line 345: Line 324:  
** The configuration file cannot read environment variables, so we need to tell it the path to the input files, ${SS}
 
** The configuration file cannot read environment variables, so we need to tell it the path to the input files, ${SS}
 
** Alternatively, gotcloud.conf could be updated to specify the full paths
 
** Alternatively, gotcloud.conf could be updated to specify the full paths
* <code>--out_dir</code> tells GotCloud where to write the output.
+
* <code>--outdir</code> tells GotCloud where to write the output.
 
** This could be specified in gotcloud.conf, but it is given on the command line so that you can change the output location through ${OUT}
 
** This could be specified in gotcloud.conf, but it is given on the command line so that you can change the output location through ${OUT}
 
** Based on <code>gotcloud.conf</code>, the GenomeSTRiP output will go in <code>$(OUT_DIR)/sv</code>
 
** Based on <code>gotcloud.conf</code>, the GenomeSTRiP output will go in <code>$(OUT_DIR)/sv</code>
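As a recap of the options listed above, the following sketch assembles the discovery command into a string and prints it instead of running it, so that each flag can be checked first. The function name <code>build_discovery_cmd</code> and the fallback values for <code>GC</code>, <code>SS</code> and <code>OUT</code> are placeholders chosen for illustration; they are not part of GotCloud.

```shell
#!/bin/sh
# Hypothetical helper (not part of GotCloud): print the discovery command
# described above without running it.
# GC, SS and OUT are normally set by the tutorial setup; the defaults below
# are placeholder paths for illustration only.
GC="${GC:-/path/to/gotcloud}"
SS="${SS:-/path/to/seqshop/example}"
OUT="${OUT:-/path/to/output}"

build_discovery_cmd() {
    # $1 = number of parallel jobs, $2 = region as chr:start-end
    echo "${GC}/gotcloud genomestrip --run-discovery" \
         "--metadata ${SS}/metadata" \
         "--conf ${SS}/gotcloud.conf" \
         "--numjobs $1 --region $2" \
         "--base-prefix ${SS} --outdir ${OUT}"
}

# Print the command used in this tutorial for inspection.
build_discovery_cmd 8 22:36000000-37000000
```

Printing the command first makes it easy to confirm that <code>--base-prefix</code> and <code>--outdir</code> expand to the paths you expect before starting the run.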
Line 401: Line 380:     
The discovery pipeline only performs discovery of variant sites with filtering. You will need to iterate over the BAMs again to perform genotyping.
 
The discovery pipeline only performs discovery of variant sites with filtering. You will need to iterate over the BAMs again to perform genotyping.
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1.
+
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 8 to 1.
  perl ${GC}/bin/genomestrip.pl --run-genotype --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 4 --base-prefix ${SS} --outdir ${OUT}
+
  ${GC}/gotcloud genomestrip --run-genotype --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 8 --base-prefix ${SS} --outdir ${OUT}
    
This will take ~3 minutes to finish.
 
This will take ~3 minutes to finish.
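The <code>--numjobs</code> note above can be automated. This is a minimal sketch, assuming you want to cap the requested job count at the number of CPU cores on the machine; the helper name <code>pick_numjobs</code> is hypothetical and the value 8 is simply the default used in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper (not part of GotCloud): cap the requested number of
# jobs at the number of available CPU cores.
pick_numjobs() {
    # $1 = requested number of jobs, $2 = number of available cores
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# getconf _NPROCESSORS_ONLN reports the online cores on Linux and macOS;
# fall back to 1 if it is unavailable.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
numjobs=$(pick_numjobs 8 "$cores")
echo "--numjobs ${numjobs}"
```

The printed value can then be substituted for <code>--numjobs 8</code> in the genotyping command.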
Line 421: Line 400:     
You can take third-party sites and genotype them with GenomeSTRiP. Here we take the 1000 Genomes phase 1 sites and genotype them.
 
You can take third-party sites and genotype them with GenomeSTRiP. Here we take the 1000 Genomes phase 1 sites and genotype them.
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1.
+
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 8 to 1.
  perl ${GC}/bin/genomestrip.pl --run-thirdparty --in-vcf ${SS}/ext/1kg.phase1.chr22.36Mb.sites.vcf --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} --numjobs 2
+
  ${GC}/gotcloud genomestrip --run-thirdparty --in-vcf ${SS}/ext/1kg.phase1.chr22.36Mb.sites.vcf --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} --numjobs 8
    
This will take ~1 minute to finish.
 
This will take ~1 minute to finish.
   −
You can also check the output by running
+
You can check the output by running
    
  zless $OUT/sv/thirdparty/genotype.vcf.gz
 
  zless $OUT/sv/thirdparty/genotype.vcf.gz
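Besides paging through the file, a quick sanity check is to count how many sites were genotyped. The sketch below counts the non-header lines of a gzip-compressed VCF using only standard tools; the function name <code>count_vcf_records</code> is hypothetical, and the path assumes the output location shown above.

```shell
#!/bin/sh
# Hypothetical helper: count variant records (non-header lines) in a
# gzip-compressed VCF. Header lines in a VCF start with '#'.
count_vcf_records() {
    # $1 = path to a .vcf.gz file
    gzip -dc "$1" | grep -vc '^#'
}

# Assumed output location from the third-party genotyping step above.
vcf="${OUT:-.}/sv/thirdparty/genotype.vcf.gz"
if [ -f "$vcf" ]; then
    count_vcf_records "$vcf"
else
    echo "No file at $vcf yet; run the third-party genotyping step first." >&2
fi
```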
Line 452: Line 431:     
== Return to Workshop Wiki Page ==
 
== Return to Workshop Wiki Page ==
Return to main workshop wiki page: [[SeqShop: December 2014]]
+
Return to main workshop wiki page: [[SeqShop: May 2015]]