Line 22: |
Line 22: |
| GenomeStrip is currently included with the seqshop example data under the svtoolkit directory. We have added the bin/ sub-directory to add a high-level pipeline that will run genomestrip in the same framework as GotCloud. | | GenomeStrip is currently included with the seqshop example data under the svtoolkit directory. We have added the bin/ sub-directory to add a high-level pipeline that will run genomestrip in the same framework as GotCloud. |
| | | |
− | == Setup in person at the SeqShop Workshop == | + | == Setup == |
| ''This section is specifically for the SeqShop Workshop computers.'' | | ''This section is specifically for the SeqShop Workshop computers.'' |
− | <div class="mw-collapsible mw-collapsed" style="width:600px"> | + | <div class="mw-collapsible" style="width:600px"> |
| ''If you are not running during the SeqShop Workshop, please skip this section.'' | | ''If you are not running during the SeqShop Workshop, please skip this section.'' |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− |
| |
| {{SeqShopLogin}} | | {{SeqShopLogin}} |
− |
| |
− | === Setup your run environment===
| |
− | This is the same setup you did for the previous tutorial, but you need to redo it each time you log in.
| |
− |
| |
− | This will setup some environment variables to point you to
| |
− | * [[GotCloud]] program
| |
− | * Tutorial input files
| |
− | * Setup an output directory
| |
− | ** It will leave your output directory from the previous tutorial in tact.
| |
− | source /net/seqshop-server/home/mktrost/seqshop/setup.txt
| |
− | * You won't see any output after running <code>source</code>
| |
− | ** It silently sets up your environment
| |
− | ** If you want to view the detail of the setup, type
| |
− | less /net/seqshop-server/home/mktrost/seqshop/setup.txt
| |
− | and press 'q' to finish.
| |
− |
| |
− | <div class="mw-collapsible mw-collapsed" style="width:200px">
| |
− | View setup.txt
| |
− | <div class="mw-collapsible-content">
| |
− | [[File:setup.png|500px]]
| |
− | </div>
| |
| </div> | | </div> |
| </div> | | </div> |
− | </div>
| |
− |
| |
− | == Setup when running on your own outside of the SeqShop Workshop ==
| |
− | ''This section is specifically for running on your own outside of the SeqShop Workshop.''
| |
− | <div class="mw-collapsible" style="width:600px">
| |
− | ''If you are running during the SeqShop Workshop, please skip this section.''
| |
− | <div class="mw-collapsible-content">
| |
| | | |
| + | === Prerequisite Tutorials === |
| This tutorial builds on the alignment tutorial. If you have not already done so, please run that tutorial first: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical|Alignment Tutorial]] | | This tutorial builds on the alignment tutorial. If you have not already done so, please run that tutorial first: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical|Alignment Tutorial]] |
| | | |
| It also uses the bam.index file created in the SnpCall Tutorial. If you have not yet run that tutorial, please follow the directions at: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#GotCloud_BAM_Index_File|GotCloud BAM Index File]] | | It also uses the bam.index file created in the SnpCall Tutorial. If you have not yet run that tutorial, please follow the directions at: [[SeqShop:_Variant_Calling_and_Filtering_for_SNPs_Practical#GotCloud_BAM_Index_File|GotCloud BAM Index File]] |
| | | |
| + | {{SeqShopRemoteEnv}} |
| | | |
− | {{SeqShopRemoteEnv}}
| + | This is the same setup you did for the previous tutorial, but you need to redo it each time you log in. |
− | </div>
| |
− | </div>
| |
| | | |
| == Examining GotCloud/GenomeSTRiP Input files == | | == Examining GotCloud/GenomeSTRiP Input files == |
Line 161: |
Line 132: |
| mills.208620indels.22.sites.bcf.csi | | mills.208620indels.22.sites.bcf.csi |
| mills_indels_hg19.22.sites.bcf | | mills_indels_hg19.22.sites.bcf |
| + | |
| + | The files specific to GenomeSTRiP: |
| + | human_g1k_v37.chr22.mask.100.fasta |
| + | human_g1k_v37.chr22.mask.100.fasta.fai |
| + | humgen_g1k_v37_ploidy.chr22.map |
| </div> | | </div> |
| </div> | | </div> |
Line 174: |
Line 150: |
| <div class="mw-collapsible-content" style="width:800px"> | | <div class="mw-collapsible-content" style="width:800px"> |
| genstrip_parameters.txt | | genstrip_parameters.txt |
| + | |
| + | This file contains the GenomeSTRiP configuration settings. |
| </div> | | </div> |
| </div> | | </div> |
Line 298: |
Line 276: |
| '''WAIT!!!!! DO NOT RUN THIS COMMAND, because it will take >1 hour to finish'''. | | '''WAIT!!!!! DO NOT RUN THIS COMMAND, because it will take >1 hour to finish'''. |
| | | |
− | perl ${GC}/bin/genomestrip.pl --run-metadata --conf ${SS}/gotcloud.conf --numjobs 12 --base-prefix ${SS} --outdir ${OUT} | + | ${GC}/gotcloud genomestrip --run-metadata --conf ${SS}/gotcloud.conf --numjobs 12 --base-prefix ${SS} --outdir ${OUT} |
| | | |
| Instead, let's look at what the output would have looked like. | | Instead, let's look at what the output would have looked like. |
Line 334: |
Line 312: |
| | | |
| To discover large deletions from the 62 BAMs we are using for this workshop, you can run the following command | | To discover large deletions from the 62 BAMs we are using for this workshop, you can run the following command |
− | perl ${GC}/bin/genomestrip.pl --run-discovery --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 4 --conf ${SS}/gotcloud.conf --numjobs 2 --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} | + | ${GC}/gotcloud genomestrip --run-discovery --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 8 --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} |
− | * <code>${GC}/bin/genomestrip.pl -run-discovery</code> runs the GenomeSTRiP Discovery Pipeline | + | * <code>${GC}/gotcloud genomestrip --run-discovery</code> runs the GenomeSTRiP Discovery Pipeline |
| * <code>--metadata ${SS}/metadata</code> points to the pre-made metadata file as explained in the previous section, [[#Running GotCloud/GenomeSTRiP Metadata Pipeline|Running GotCloud/GenomeSTRiP Metadata Pipeline]]. | | * <code>--metadata ${SS}/metadata</code> points to the pre-made metadata file as explained in the previous section, [[#Running GotCloud/GenomeSTRiP Metadata Pipeline|Running GotCloud/GenomeSTRiP Metadata Pipeline]]. |
| * <code>--conf ${SS}/gotcloud.conf</code> points to the configuration file to use. | | * <code>--conf ${SS}/gotcloud.conf</code> points to the configuration file to use. |
Line 346: |
Line 324: |
| ** The Configuration file cannot read environment variables, so we need to tell it the path to the input files, ${SS} | | ** The Configuration file cannot read environment variables, so we need to tell it the path to the input files, ${SS} |
| ** Alternatively, gotcloud.conf could be updated to specify the full paths | | ** Alternatively, gotcloud.conf could be updated to specify the full paths |
− | * <code>--out_dir</code> tells GotCloud where to write the output. | + | * <code>--outdir</code> tells GotCloud where to write the output. |
| ** This could be specified in gotcloud.conf, but to allow you to use ${OUT} to change the output location, it is specified on the command line | | ** This could be specified in gotcloud.conf, but to allow you to use ${OUT} to change the output location, it is specified on the command line |
| ** Based on <code>gotcloud.conf</code>, the GenomeSTRiP output will go in <code>$(OUT_DIR)/sv</code> | | ** Based on <code>gotcloud.conf</code>, the GenomeSTRiP output will go in <code>$(OUT_DIR)/sv</code> |
Line 402: |
Line 380: |
| | | |
| The discovery pipeline only performs discovery of variant sites with filtering. You will need to iterate over the BAMs again to perform genotyping. | | The discovery pipeline only performs discovery of variant sites with filtering. You will need to iterate over the BAMs again to perform genotyping. |
− | * If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1. | + | * If running on a small machine, you may want to reduce <code>--numjobs</code> from 8 to 1. |
− | perl ${GC}/bin/genomestrip.pl --run-genotype --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 4 --base-prefix ${SS} --outdir ${OUT} | + | ${GC}/gotcloud genomestrip --run-genotype --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 8 --base-prefix ${SS} --outdir ${OUT} |
| | | |
| This will take ~3 minutes to finish. | | This will take ~3 minutes to finish. |
Line 422: |
Line 400: |
| | | |
| You can take 3rd-party sites and genotype them with GenomeSTRiP. Here we take 1000 Genomes phase 1 sites and genotype them. | | You can take 3rd-party sites and genotype them with GenomeSTRiP. Here we take 1000 Genomes phase 1 sites and genotype them. |
− | * If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1. | + | * If running on a small machine, you may want to reduce <code>--numjobs</code> from 8 to 1. |
− | perl ${GC}/bin/genomestrip.pl --run-thirdparty --in-vcf ${SS}/ext/1kg.phase1.chr22.36Mb.sites.vcf --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} --numjobs 2 | + | ${GC}/gotcloud genomestrip --run-thirdparty --in-vcf ${SS}/ext/1kg.phase1.chr22.36Mb.sites.vcf --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} --numjobs 8 |
| | | |
| This will take ~1 minute to finish. | | This will take ~1 minute to finish. |
| | | |
− | You can also check the output by running | + | You can check the output by running |
| | | |
| zless $OUT/sv/thirdparty/genotype.vcf.gz | | zless $OUT/sv/thirdparty/genotype.vcf.gz |
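
Beyond paging through the file, a quick sanity check is to count how many sites were written. The following is a minimal sketch, not part of GotCloud or GenomeSTRiP: the helper name <code>count_vcf_records</code> is hypothetical, and the demo runs on a tiny made-up VCF so it works without the tutorial data. It relies only on the VCF convention that header lines start with <code>#</code>.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by GotCloud): count the non-header
# records in a gzip-compressed VCF file.
count_vcf_records() {
    # VCF header lines start with '#'; every other line is one record.
    # Reading from stdin avoids any dependence on the file's suffix.
    gzip -dc < "$1" | grep -vc '^#'
}

# Self-contained demo on a made-up two-record VCF (not real GenomeSTRiP output).
demo=$(mktemp)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n22\t36000100\tsv1\tA\t<DEL>\t.\tPASS\t.\n22\t36500200\tsv2\tC\t<DEL>\t.\tPASS\t.\n' | gzip -c > "$demo"
count_vcf_records "$demo"   # prints 2
rm -f "$demo"
```

Assuming the third-party genotyping step above has finished, the same helper could be pointed at the real output with <code>count_vcf_records "$OUT/sv/thirdparty/genotype.vcf.gz"</code>.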