Changes

From Genome Analysis Wiki
Undo revision 13683 by Pjvh (talk) revert all changes involving `--list bam.list`
Line 22: Line 22:  
GenomeStrip is currently included with the seqshop example data under the svtoolkit directory.  We have added the bin/ sub-directory to provide a high-level pipeline that runs genomestrip in the same framework as GotCloud.
   −
== Setup in person at the SeqShop Workshop ==
+
== Setup ==
 
''This section is specifically for the SeqShop Workshop computers.''
 
<div class="mw-collapsible mw-collapsed" style="width:600px">
+
<div class="mw-collapsible" style="width:600px">
 
''If you are not running during the SeqShop Workshop, please skip this section.''
 
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
   
{{SeqShopLogin}}
 
  −
=== Setup your run environment===
  −
This is the same setup you did for the previous tutorial, but you need to redo it each time you log in.
  −
  −
This will setup some environment variables to point you to
  −
* [[GotCloud]] program
  −
* Tutorial input files
  −
* Setup an output directory
  −
** It will leave your output directory from the previous tutorial in tact.
  −
source /net/seqshop-server/home/mktrost/seqshop/setup.txt
  −
* You won't see any output after running <code>source</code>
  −
** It silently sets up your environment
  −
** If you want to view the detail of the setup, type
  −
less /net/seqshop-server/home/mktrost/seqshop/setup.txt
  −
and press 'q' to finish.
  −
  −
<div class="mw-collapsible mw-collapsed" style="width:200px">
  −
View setup.txt
  −
<div class="mw-collapsible-content">
  −
[[File:setup.png|500px]]
  −
</div>
   
</div>
 
</div>
 
</div>
 
</div>
</div>
  −
  −
== Setup when running on your own outside of the SeqShop Workshop ==
  −
''This section is specifically for running on your own outside of the SeqShop Workshop.''
  −
<div class="mw-collapsible" style="width:600px">
  −
''If you are running during the SeqShop Workshop, please skip this section.''
  −
<div class="mw-collapsible-content">
      +
=== Prerequisite Tutorials ===
 
This tutorial builds on the alignment tutorial. If you have not already run it, please do so first: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical|Alignment Tutorial]]
    
It also uses the bam.index file created in the SnpCall Tutorial.  If you have not yet run that tutorial, please follow the directions at: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#GotCloud_BAM_Index_File|GotCloud BAM Index File]]
 
    +
{{SeqShopRemoteEnv}}
   −
{{SeqShopRemoteEnv}}
+
This is the same setup you did for the previous tutorial, but you need to redo it each time you log in.
</div>
  −
</div>
      
== Examining GotCloud/GenomeSTRiP Input files ==
 
Line 161: Line 132:  
  mills.208620indels.22.sites.bcf.csi
 
 
  mills_indels_hg19.22.sites.bcf
 
 +
 +
The files special for GenomeSTRiP:
 +
human_g1k_v37.chr22.mask.100.fasta
 +
human_g1k_v37.chr22.mask.100.fasta.fai
 +
humgen_g1k_v37_ploidy.chr22.map
 
</div>
 
</div>
 
</div>
 
</div>
Line 174: Line 150:  
<div class="mw-collapsible-content" style="width:800px">
 
<div class="mw-collapsible-content" style="width:800px">
 
  genstrip_parameters.txt
 
 +
 +
This file contains the GenomeSTRiP configuration settings.
 
</div>
 
</div>
 
</div>
 
</div>
Line 298: Line 276:  
'''WAIT!!!!! DO NOT RUN THIS COMMAND, because it will take >1 hour to finish'''.
 
   −
  perl ${GC}/bin/genomestrip.pl --run-metadata --conf ${SS}/gotcloud.conf --numjobs 12 --base-prefix ${SS} --outdir ${OUT}
+
  ${GC}/gotcloud genomestrip --run-metadata --conf ${SS}/gotcloud.conf --numjobs 12 --base-prefix ${SS} --outdir ${OUT}
    
Instead, let's look at what the output would have looked like.
Line 334: Line 312:     
To discover large deletions from the 62 BAMs we are using for this workshop, you can run the following command
 
  perl ${GC}/bin/genomestrip.pl --run-discovery --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 4 --conf ${SS}/gotcloud.conf --numjobs 2 --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT}
+
  ${GC}/gotcloud genomestrip --run-discovery --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 8 --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT}
* <code>${GC}/bin/genomestrip.pl -run-discovery</code> runs the GenomeSTRiP Discovery Pipeline
+
* <code>${GC}/gotcloud genomestrip --run-discovery</code> runs the GenomeSTRiP Discovery Pipeline
 
* <code>--metadata ${SS}/metadata</code> points to the pre-made metadata file as explained in the previous section, [[#Running GotCloud/GenomeSTRiP Metadata Pipeline|Running GotCloud/GenomeSTRiP Metadata Pipeline]].
 
 
* <code>--conf ${SS}/gotcloud.conf</code> points to the configuration file to use.
 
Line 346: Line 324:  
** The Configuration file cannot read environment variables, so we need to tell it the path to the input files, ${SS}
 
 
** Alternatively, gotcloud.conf could be updated to specify the full paths
 
* <code>--out_dir</code> tells GotCloud where to write the output.
+
* <code>--outdir</code> tells GotCloud where to write the output.
 
** This could be specified in gotcloud.conf, but it is specified on the command line so that you can use ${OUT} to change the output location
 
** Based on <code>gotcloud.conf</code>, the GenomeSTRiP output will go in <code>$(OUT_DIR)/sv</code>
 
Line 402: Line 380:     
The discovery pipeline only performs discovery of variant sites with filtering. You will need to iterate over the BAMs again to perform genotyping.
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1.
+
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 8 to 1.
  perl ${GC}/bin/genomestrip.pl --run-genotype --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 4 --base-prefix ${SS} --outdir ${OUT}
+
  ${GC}/gotcloud genomestrip --run-genotype --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 8 --base-prefix ${SS} --outdir ${OUT}
    
 
This will take ~3 minutes to finish.
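As a rough sketch of how one might pick a <code>--numjobs</code> value on a smaller machine, the snippet below caps the job count at the number of available CPU cores. It is an illustration only and not part of GotCloud; the variable names <code>cores</code>, <code>max_jobs</code> and <code>numjobs</code> are hypothetical.

```shell
# Illustrative only: cap the job count at the number of CPU cores.
# cores, max_jobs and numjobs are hypothetical names, not GotCloud settings.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
max_jobs=8   # the value used in the tutorial commands
if [ "$cores" -lt "$max_jobs" ]; then
  numjobs=$cores
else
  numjobs=$max_jobs
fi
echo "--numjobs ${numjobs}"
```

The printed value can then be used in place of <code>--numjobs 8</code> in the commands in this tutorial.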
Line 422: Line 400:     
You can take a set of third-party sites and genotype them with GenomeSTRiP. Here we take the 1000 Genomes phase 1 sites and genotype them.
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1.
+
* If running on a small machine, you may want to reduce <code>--numjobs</code> from 8 to 1.
  perl ${GC}/bin/genomestrip.pl --run-thirdparty --in-vcf ${SS}/ext/1kg.phase1.chr22.36Mb.sites.vcf --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} --numjobs 2
+
  ${GC}/gotcloud genomestrip --run-thirdparty --in-vcf ${SS}/ext/1kg.phase1.chr22.36Mb.sites.vcf --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} --numjobs 8
    
This will take ~1 minute to finish.
 
   −
You can also check the output by running
+
You can check the output by running
    
 
  zless $OUT/sv/thirdparty/genotype.vcf.gz
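Beyond paging through the file, a quick sanity check is to count the variant records, that is, the lines that do not start with <code>#</code>. The sketch below builds a tiny stand-in file so that it runs anywhere; the toy records are made up for illustration, and in the tutorial you would point <code>vcf</code> at <code>$OUT/sv/thirdparty/genotype.vcf.gz</code> instead.

```shell
# Toy stand-in for genotype.vcf.gz (made-up records, illustration only)
vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\n22\t36000100\tsv1\n22\t36000200\tsv2\n' | gzip -c > "$vcf"

# Count variant records: every line that is not a '#' header line
sites=$(gzip -dc < "$vcf" | grep -vc '^#')
echo "variant records: ${sites}"
rm -f "$vcf"
```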