Line 1: |
Line 1: |
| + | '''Note:''' the latest version of this practical is available at: [[SeqShop: Variant Calling and Filtering for SNPs Practical]] |
| + | * The ones here is the original one from the June workshop (updated to be run from elsewhere) |
| + | |
| + | |
| ==Introduction== | | ==Introduction== |
| See the [[Media:SeqShop - GotCloud snpcall.pdf|introductory slides]] for an intro to this tutorial. | | See the [[Media:SeqShop - GotCloud snpcall.pdf|introductory slides]] for an intro to this tutorial. |
Line 48: |
Line 52: |
| ''If you are running during the SeqShop Workshop, please skip this section.'' | | ''If you are running during the SeqShop Workshop, please skip this section.'' |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | === Download the example data ===
| |
− |
| |
− | === Setup your run environment ===
| |
− |
| |
− | Environment variables will be used throughout the tutorial.
| |
− |
| |
− | We recommend that you setup these variables so you won't have to modify every command in the tutorial.
| |
− |
| |
− | <div class="mw-collapsible mw-collapsed" style="width:500px">
| |
− | I'm using bash (replace the paths below with the appropriate paths):
| |
− | <div class="mw-collapsible-content">
| |
− | * Point to where you installed GotCloud
| |
− | *:<pre>export GC=/home/username/gotcloud</pre>
| |
− | * Point to where you installed the seqshop files
| |
− | *:<pre>export SS=/home/username/seqshop/</pre>
| |
− | * Point to where you want the output to go
| |
− | *:<pre>export OUT=/home/username/seqshop_output/</pre>
| |
− | </div>
| |
− | </div>
| |
− |
| |
− | <div class="mw-collapsible mw-collapsed" style="width:500px">
| |
− | I'm using tcsh (replace the paths below with the appropriate paths):
| |
− | <div class="mw-collapsible-content">
| |
− | * Point to where you installed GotCloud
| |
− | *:<pre>setenv GC /home/username/gotcloud</pre>
| |
− | * Point to where you installed the seqshop files
| |
− | *:<pre>setenv SS /home/username/seqshop/</pre>
| |
− | * Point to where you want the output to go
| |
− | *:<pre>setenv OUT /home/username/seqshop_output/</pre>
| |
− | </div>
| |
− | </div>
| |
| | | |
− | </div>
| + | This tutorial builds on the alignment tutorial, if you have not already, please first run that tutorial: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014|Alignment Tutorial]] |
− | </div>
| |
| | | |
| + | {{SeqShopRemoteEnv}} |
| | | |
| == Examining GotCloud SnpCall Input files == | | == Examining GotCloud SnpCall Input files == |
Line 88: |
Line 61: |
| Per sample BAM files contain sequence reads that are mapped to positions in the genome. | | Per sample BAM files contain sequence reads that are mapped to positions in the genome. |
| | | |
− | For a reminder on how to look at/read BAM files, see: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical#BAM_Files|SeqShop Aligment: BAM Files]] | + | For a reminder on how to look at/read BAM files, see: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014#BAM_Files|SeqShop Aligment: BAM Files]] |
| | | |
− | For this tutorial, we will use the 4 BAMs produced in the [[SeqShop: Sequence Mapping and Assembly Practical]] as well as with 58 BAMs that were pre-aligned to that 1MB region of chromosome 22. | + | For this tutorial, we will use the 4 BAMs produced in the [[SeqShop: Sequence Mapping and Assembly Practical, June 2014]] as well as with 58 BAMs that were pre-aligned to that 1MB region of chromosome 22. |
| | | |
| === Reference Files === | | === Reference Files === |
Line 117: |
Line 90: |
| <li>View Screenshot</li> | | <li>View Screenshot</li> |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | [[File:RefDir.png|500px]] | + | [[File:RefDir.png|700px]] |
| </div> | | </div> |
| </div> | | </div> |
Line 154: |
Line 127: |
| ;Do you notice a difference between this index and yours? | | ;Do you notice a difference between this index and yours? |
| <ul> | | <ul> |
− | <div class="mw-collapsible mw-collapsed" style="width:500px"> | + | <div class="mw-collapsible mw-collapsed" style="width:550px"> |
| <li>Answer:</li> | | <li>Answer:</li> |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
Line 160: |
Line 133: |
| <li>It doesn't have a full path to the BAM file, while your index has /home/...</li> | | <li>It doesn't have a full path to the BAM file, while your index has /home/...</li> |
| [[File:Bamindex1.png|300px]] | | [[File:Bamindex1.png|300px]] |
− | <li>That's ok, <code>gotcloud.conf</code> contains the path to those BAMs</li> | + | <li>That's ok, we will use the <code>--base_prefix ${SS}</code> command-line option to prefix the BAM paths</li> |
− | [[File:BamindexConf.png|300px]]
| + | <li>Alternatively, we could have set BAM_INDEX in <code>gotcloud.conf</code> to the path to the BAMs |
| + | <pre>BAM_INDEX = /home/username/seqshop/example</pre> </li> |
| + | <ul> |
| + | <li>NOTE: the conf file can't interpret ${SS} environment variables or '~', so you would have to specify the full path</li> |
| + | <li>We just used the command-line option for this tutorial since this path will vary by user.</li> |
| + | </ul> |
| </div> | | </div> |
| </div> | | </div> |
Line 170: |
Line 148: |
| We need to add these BAMs to our index | | We need to add these BAMs to our index |
| * Append the bam.index from the pre-aligned BAMs to the one you generated from the alignment pipeline | | * Append the bam.index from the pre-aligned BAMs to the one you generated from the alignment pipeline |
| + | ** '''Be sure to do this command just once''' |
| cat ${SS}/bams/bam.index >> ${OUT}/bam.index | | cat ${SS}/bams/bam.index >> ${OUT}/bam.index |
| * ">>" will append to the file that follows it | | * ">>" will append to the file that follows it |
− | * Be sure to do this command just once
| |
| ** Check that your BAM index is the correct size | | ** Check that your BAM index is the correct size |
| **:<pre>wc -l ${OUT}/bam.index</pre> | | **:<pre>wc -l ${OUT}/bam.index</pre> |
Line 200: |
Line 178: |
| We will use the same configuration file as we used yesterday in GotCloud Align. | | We will use the same configuration file as we used yesterday in GotCloud Align. |
| | | |
− | See [[SeqShop:_Sequence_Mapping_and_Assembly_Practical#GotCloud Configuration File|SeqShop: Alignment: GotCloud Configuration File]] for more details | + | See [[SeqShop:_Sequence Mapping and Assembly Practical, June 2014#GotCloud Configuration File|SeqShop: Alignment: GotCloud Configuration File]] for more details |
| * Note we want to limit snpcall to just chr22 so the configuration already has <code>CHRS = 22</code> (default was 1-22 & X). | | * Note we want to limit snpcall to just chr22 so the configuration already has <code>CHRS = 22</code> (default was 1-22 & X). |
| | | |
Line 211: |
Line 189: |
| Now that we have all of our input files, we need just a simple command to run: | | Now that we have all of our input files, we need just a simple command to run: |
| ${GC}/gotcloud snpcall --conf ${SS}/gotcloud.conf --numjobs 4 --region 22:36000000-37000000 --base_prefix ${SS} --outdir ${OUT} | | ${GC}/gotcloud snpcall --conf ${SS}/gotcloud.conf --numjobs 4 --region 22:36000000-37000000 --base_prefix ${SS} --outdir ${OUT} |
| + | * <code>${GC}/gotcloud</code> runs GotCloud |
| + | * <code>align</code> tells GotCloud you want to run the alignment pipeline. |
| + | * <code>--conf</code> tells GotCloud the name of the configuration file to use. |
| + | ** The configuration for this test was downloaded with the seqshop input files. |
| * --numjobs tells GotCloud how many jobs to run in parallel | | * --numjobs tells GotCloud how many jobs to run in parallel |
| ** Depends on your system | | ** Depends on your system |
| * --region 22:36000000-37000000 | | * --region 22:36000000-37000000 |
| ** The sample files are just a small region of chromosome 22, so to save time, we tell GotCloud to ignore the other regions | | ** The sample files are just a small region of chromosome 22, so to save time, we tell GotCloud to ignore the other regions |
| + | * <code>--base_prefix</code> tells GotCloud the prefix to append to relative paths. |
| + | ** The Configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS} |
| + | ** Alternatively, gotcloud.conf could be updated to specify the full paths |
| + | * <code>--out_dir</code> tells GotCloud where to write the output. |
| + | ** This could be specified in gotcloud.conf, but to allow you to use the ${OUT} to change the output location, it is specified on the command-line |
| | | |
| <div class="mw-collapsible mw-collapsed" style="width:500px"> | | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
Line 386: |
Line 373: |
| * Beagle will take about 2-3 minutes to complete | | * Beagle will take about 2-3 minutes to complete |
| * Thunder will automatically run and will take another 3-4 minutes | | * Thunder will automatically run and will take another 3-4 minutes |
| + | |
| + | <div class="mw-collapsible mw-collapsed" style="width:350px"> |
| + | When completed, it should look like this: |
| + | <div class="mw-collapsible-content"> |
| + | [[File:GcldrefineOut.png]] |
| + | </div> |
| + | </div> |
| | | |
| === Genotype Refinement Output === | | === Genotype Refinement Output === |
| | | |
| ; What's new in the output directory? | | ; What's new in the output directory? |
| + | |
| <ul> | | <ul> |
| <div class="mw-collapsible mw-collapsed" style="width:500px"> | | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
| <li>Answer</li> | | <li>Answer</li> |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| + | :<pre>ls ${OUT}</pre> |
| <ul> | | <ul> |
| <li><code>beagle</code> directory : Beagle output</li> | | <li><code>beagle</code> directory : Beagle output</li> |
| <li><code>thunder</code> directory : Thunder output</li> | | <li><code>thunder</code> directory : Thunder output</li> |
− | <li><code>umake.beagle.conf</code> : Configuration values used for GotCloud beagle</li> | + | <li><code>umake.beagle.*</code> : Contain the configuration & steps used in GotCloud beagle</li> |
− | <li><code>umake.beagle.Makefile</code> : GNU makefile for commands run as part of GotCloud beagle</li>
| |
− | <li><code>umake.beagle.Makefile.log</code> : Log of the GotCloud beagle run</li>
| |
| <li><code>umake.thunder.*</code> files : Contain the configuration & steps used in GotCloud thunder</li> | | <li><code>umake.thunder.*</code> files : Contain the configuration & steps used in GotCloud thunder</li> |
| </ul> | | </ul> |
Line 406: |
Line 400: |
| </ul> | | </ul> |
| | | |
− | Let's take a look at that interesting location we found in the [[SeqShop:_Sequence_Mapping_and_Assembly_Practical#Accessing_BAMs_by_Position|alignment tutorial]] : chromosome 22, positions 36907000-36907100 | + | Let's take a look at that interesting location we found in the [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014#Accessing_BAMs_by_Position|alignment tutorial]] : chromosome 22, positions 36907000-36907100 |
| | | |
| Use tabix to extract that from the VCFs: | | Use tabix to extract that from the VCFs: |
Line 490: |
Line 484: |
| Let's get some information on the BEAGLE VCF: | | Let's get some information on the BEAGLE VCF: |
| | | |
| + | perl ${SS}/ext/bed-diff.pl --vcf1 ${SS}/ref22/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/beagle/chr22/chr22.filtered.PASS.beagled.ALL.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.beagle |
| | | |
− | perl ${EXT}/bed-diff.pl --vcf1 ${REF}/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/beagle/chr22/chr22.filtered.PASS.beagled.ALL.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.beagle
| |
| | | |
| Look at the results: | | Look at the results: |
Line 514: |
Line 508: |
| | | |
| Now, let's see if it improved after running Thunder VCF: | | Now, let's see if it improved after running Thunder VCF: |
− | perl ${EXT}/bed-diff.pl --vcf1 ${REF}/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.thunder | + | perl ${SS}/ext/bed-diff.pl --vcf1 ${SS}/ref22/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.thunder |
| | | |
| Look at the results: | | Look at the results: |