Line 1: |
Line 1: |
| == Introduction == | | == Introduction == |
− | See the [[Media:SeqShop - GotCloud Align.pdf|introductory slides]] for an intro to this tutorial. | + | Main Workshop wiki page: [[SeqShop: December 2014]] |
| + | |
| + | See the [[Media:Dec2014 SeqShop - GotCloud Align.pdf|introductory slides]] for an intro to this tutorial. |
| | | |
| == Goals of This Session == | | == Goals of This Session == |
Line 11: |
Line 13: |
| == Setup in person at the SeqShop Workshop == | | == Setup in person at the SeqShop Workshop == |
| ''This section is specifically for the SeqShop Workshop computers.'' | | ''This section is specifically for the SeqShop Workshop computers.'' |
− | <div class="mw-collapsible" style="width:600px"> | + | <div class="mw-collapsible mw-collapsed" style="width:600px"> |
| ''If you are not running during the SeqShop Workshop, please skip this section.'' | | ''If you are not running during the SeqShop Workshop, please skip this section.'' |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
Line 24: |
Line 26: |
| * Tutorial input files | | * Tutorial input files |
| * Set up an output directory | | * Set up an output directory |
− | source /home/mktrost/seqshop/setup.txt | + | source /net/seqshop-server/home/mktrost/seqshop/setup.txt |
| * You won't see any output after running <code>source</code> | | * You won't see any output after running <code>source</code> |
| ** It silently sets up your environment | | ** It silently sets up your environment |
| + | |
| + | Look at setup.txt |
| + | cat /net/seqshop-server/home/mktrost/seqshop/setup.txt |
| <div class="mw-collapsible mw-collapsed" style="width:200px"> | | <div class="mw-collapsible mw-collapsed" style="width:200px"> |
− | View setup.txt
| + | * setup.txt screenshot |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| [[File:setup.png|500px]] | | [[File:setup.png|500px]] |
Line 41: |
Line 46: |
| ''If you are running during the SeqShop Workshop, please skip this section.'' | | ''If you are running during the SeqShop Workshop, please skip this section.'' |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| + | |
| + | === Download the example data === |
| + | Download and untar the file containing the example data used in the practicals: |
| + | mkdir -p ~/seqshop |
| + | cd ~/seqshop |
| + | wget http://csg.sph.umich.edu/mktrost/seqshopExampleDec2014.tar.gz |
| + | tar xvf seqshopExampleDec2014.tar.gz |
| + | |
| + | You will see the names of all the files included in the example data scrolling on the screen as they are unpacked from the tar file. |
| + | |
| === Download & Build GotCloud === | | === Download & Build GotCloud === |
| If you do not already have GotCloud: | | If you do not already have GotCloud: |
− | * cd to where you want GotCloud installed (you can change this to any directory you want)
| |
− | mkdir -p ~/seqshop
| |
− | cd ~/seqshop/
| |
| * download, decompress, and build the version of gotcloud that was tested with this tutorial: | | * download, decompress, and build the version of gotcloud that was tested with this tutorial: |
− | wget https://github.com/statgen/gotcloud/archive/gotcloud.workshop.tar.gz | + | wget https://github.com/statgen/gotcloud/archive/gotcloud.1.15.tar.gz |
− | tar xvf gotcloud.workshop.tar.gz | + | tar xvf gotcloud.1.15.tar.gz |
− | mv gotcloud-gotcloud.workshop gotcloud | + | mv gotcloud-gotcloud.1.15 gotcloud |
| cd gotcloud/src | | cd gotcloud/src |
| make | | make |
Line 55: |
Line 67: |
| | | |
| Remember the path to gotcloud/; that is what you will need to set your GC variable to. | | Remember the path to gotcloud/; that is what you will need to set your GC variable to. |
− |
| |
− | === Download the example data ===
| |
− | Download and untar the file containing the example data used in the practicals:
| |
− | wget http://www.sph.umich.edu/csg/mktrost/seqshopExample.tar.gz
| |
− | tar xvf seqshopExample.tar.gz
| |
− |
| |
− | You will see the names of all the files included in the example data scrolling on the screen as they are unpacked from the tar file.
| |
| | | |
| {{SeqShopRemoteEnv}} | | {{SeqShopRemoteEnv}} |
| + | </div> |
| + | </div> |
| | | |
| == Examining [[GotCloud]] Align Input Files == | | == Examining [[GotCloud]] Align Input Files == |
Line 182: |
Line 189: |
| ${GC}/bin/samtools faidx ${SS}/ref22/human.g1k.v37.chr22.fa 22:36000000-36000100 | | ${GC}/bin/samtools faidx ${SS}/ref22/human.g1k.v37.chr22.fa 22:36000000-36000100 |
| | | |
− | === GotCloud FASTQ Index File === | + | === GotCloud FASTQ List File === |
− | The FASTQ index file is created by you to tell GotCloud about each of your FASTQ files: | + | The [[GotCloud:_Alignment_Pipeline#FASTQ_List_File|FASTQ list file]] is created by you to tell GotCloud about each of your FASTQ files: |
| * Where to find it | | * Where to find it |
| * Sample name | | * Sample name |
| ** Each sample can have multiple FASTQs | | ** Each sample can have multiple FASTQs |
| ** Each FASTQ is for a single sample | | ** Each FASTQ is for a single sample |
− | * Run identifier | + | * Run identifier (optional) |
| ** For recalibration we need to know which reads were in the same run. | | ** For recalibration we need to know which reads were in the same run. |
| | | |
− | FASTQ Index Format: | + | FASTQ List Format: |
| * Tab delimited | | * Tab delimited |
| * Starts with a header line | | * Starts with a header line |
Line 197: |
Line 204: |
| * One line per paired-end read (only 1 line per pair). | | * One line per paired-end read (only 1 line per pair). |
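The format rules above can be sketched with a small, made-up list file. This is only an illustration: the sample names and paths are hypothetical, and the <code>FASTQ1</code> header name for the second column is an assumption (this tutorial only shows that field 1 is <code>SAMPLE</code> and field 3 is <code>FASTQ2</code>).

```shell
# Build a tiny hypothetical FASTQ list: tab delimited, header first,
# one line per pair of FASTQ files (all names and paths are made up).
list=$(mktemp)
printf 'SAMPLE\tFASTQ1\tFASTQ2\n' > "$list"
printf 'S1\tfastq/S1_run1_1.fastq.gz\tfastq/S1_run1_2.fastq.gz\n' >> "$list"
printf 'S1\tfastq/S1_run2_1.fastq.gz\tfastq/S1_run2_2.fastq.gz\n' >> "$list"
printf 'S2\tfastq/S2_run1_1.fastq.gz\tfastq/S2_run1_2.fastq.gz\n' >> "$list"

# Skip the header, keep the SAMPLE column, and report any sample name
# that appears on more than one line (i.e. has more than one FASTQ pair).
multi=$(tail -n +2 "$list" | cut -f 1 | sort | uniq -d)
echo "$multi"
rm -f "$list"
```

The same <code>tail | cut | sort | uniq -d</code> pipeline should work on any list file that follows the format described above.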
| | | |
− | Let's take a look at the index file I prepared for this tutorial: | + | Let's take a look at the FASTQ list file I prepared for this tutorial: |
− | less -S ${SS}/align.index | + | less -S ${SS}/fastq.list |
| | | |
| Remember, use <code>'q'</code> to exit out of <code>less</code> | | Remember, use <code>'q'</code> to exit out of <code>less</code> |
| q | | q |
| | | |
− | ; Which samples had multiple runs? | + | ; Which samples have multiple paired-end reads? |
| <ul> | | <ul> |
| <div class="mw-collapsible mw-collapsed" style="width:500px"> | | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
| <li>Need a reminder of the format?</li> | | <li>Need a reminder of the format?</li> |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | [[File:fqindex.png|750px]] | + | [[File:fqindexNew.png|500px]] |
| </div> | | </div> |
| </div> | | </div> |
Line 221: |
Line 228: |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| <ul> | | <ul> |
− | <li>Use cut to extract just the MERGE_NAME & RGID fields </li> | + | <li>Use cut to extract just the SAMPLE & FASTQ2 fields </li> |
− | cut -f 1,4 ${SS}/align.index | + | cut -f 1,3 ${SS}/fastq.list |
| </ul> | | </ul> |
| </div> | | </div> |
Line 231: |
Line 238: |
| <ul> | | <ul> |
| <li>HG00553 & HG00640</li> | | <li>HG00553 & HG00640</li> |
− | <li>They have multiple unique values in the RGID field</li> | + | <li>They have multiple FASTQ2 files listed</li> |
− | [[File:fqindexRG.png|800px]] | + | [[File:FqListFASTQ2.png|400px]] |
| </div> | | </div> |
| </div> | | </div> |
Line 239: |
Line 246: |
| | | |
| | | |
− | How do you point GotCloud to your index file? | + | How do you point GotCloud to your FASTQ list file? |
− | * Command-line <code>--index_file</code> option | + | * Command-line <code>--list</code> option |
| : or | | : or |
− | * Configuration file <code>INDEX_FILE</code> setting. | + | * Configuration file <code>FASTQ_LIST</code> setting. |
| | | |
| The command-line setting takes precedence over the configuration file setting. | | The command-line setting takes precedence over the configuration file setting. |
Line 270: |
Line 277: |
| <ul> | | <ul> |
| <li>You would change <code>REF_DIR</code> to the new path</li> | | <li>You would change <code>REF_DIR</code> to the new path</li> |
− | [[File:gcConf.png|800px]] | + | [[File:gcConfNew.png|600px]] |
| </div> | | </div> |
| </div> | | </div> |
Line 281: |
Line 288: |
| | | |
| Now that we have all of our input files, we need just a simple command to run them | | Now that we have all of our input files, we need just a simple command to run them |
− | ${GC}/gotcloud align --conf ${SS}/gotcloud.conf --numcs 2 --base_prefix ${SS} --outdir ${OUT} | + | * When running at home, if you don't have 4 CPUs, reduce the <code>--numjobs</code> setting (the run will take longer). |
| + | ${GC}/gotcloud align --conf ${SS}/gotcloud.conf --numjobs 4 --base_prefix ${SS} --outdir ${OUT} |
| | | |
| * <code>${GC}/gotcloud</code> runs GotCloud | | * <code>${GC}/gotcloud</code> runs GotCloud |
Line 287: |
Line 295: |
| * <code>--conf</code> tells GotCloud the name of the configuration file to use. | | * <code>--conf</code> tells GotCloud the name of the configuration file to use. |
| ** The configuration for this test was downloaded with the seqshop input files. | | ** The configuration for this test was downloaded with the seqshop input files. |
− | * <code>--numcs</code> means to run 2 samples at a time. | + | * <code>--numjobs</code> means to run 4 samples at a time. |
| ** How many you can run concurrently depends on your system. | | ** How many you can run concurrently depends on your system. |
| * <code>--base_prefix</code> tells GotCloud the prefix to prepend to relative paths. | | * <code>--base_prefix</code> tells GotCloud the prefix to prepend to relative paths. |
| ** The Configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS} | | ** The Configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS} |
| ** Alternatively, gotcloud.conf could be updated to specify the full paths | | ** Alternatively, gotcloud.conf could be updated to specify the full paths |
− | * <code>--out_dir</code> tells GotCloud where to write the output. | + | * <code>--outdir</code> tells GotCloud where to write the output. |
| ** This could be specified in gotcloud.conf, but to allow you to use ${OUT} to change the output location, it is specified on the command-line | | ** This could be specified in gotcloud.conf, but to allow you to use ${OUT} to change the output location, it is specified on the command-line |
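If you are unsure how many jobs your own machine can handle, one possible approach is to derive <code>--numjobs</code> from the CPU count, capped at the 4 used in this tutorial. This is a sketch, not part of GotCloud: the <code>cpus</code> and <code>numjobs</code> variable names are illustrative, and the command is only printed, not run.

```shell
# Ask the system how many CPUs are online; fall back to 1 if unavailable.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Use at most 4 concurrent jobs (the tutorial's setting), and at least 1.
numjobs=$(( cpus < 4 ? cpus : 4 ))
[ "$numjobs" -ge 1 ] || numjobs=1

# Print the resulting command; the single quotes keep ${GC}, ${SS} and
# ${OUT} literal so you can see what would be run in your own environment.
echo '${GC}/gotcloud align --conf ${SS}/gotcloud.conf --numjobs '"$numjobs"' --base_prefix ${SS} --outdir ${OUT}'
```

Copy the printed command into your shell once <code>GC</code>, <code>SS</code>, and <code>OUT</code> are set.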
| | | |
− | [[File:gcalignStart.png|850px]] | + | [[File:gcalignStartNew.png|650px]] |
| | | |
− | This should take 1-3 minutes to run. | + | This should take about 1 minute to run. |
| + | |
| + | It should end with a line like: <code>Processing finished in 54 secs with no errors reported</code> |
| + | * The <code>WARNING</code> messages are just to let you know that the default Read Group field settings are being used. |
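If you saved the screen output to a file (for example by piping the GotCloud command through <code>tee</code>), a quick check of the final line might look like the sketch below. The log file here is simulated, and its first two lines are invented for the example; only the final message is taken from this tutorial.

```shell
# Simulate a saved log; in practice this would be the file written by tee.
log=$(mktemp)
printf 'WARNING: using default Read Group settings\n' > "$log"
printf 'Processing started\n' >> "$log"
printf 'Processing finished in 54 secs with no errors reported\n' >> "$log"

# Look only at the last line and test for the success message.
if tail -n 1 "$log" | grep -q 'with no errors reported'; then
  status=ok
else
  status=failed
fi
echo "$status"
rm -f "$log"
```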
| | | |
− | It should end with a line like: <code>Processing finished in 133 secs with no errors reported</code>
| |
| | | |
| If you cancelled GotCloud part way through, just rerun your GotCloud command and it will pick up where it left off. | | If you cancelled GotCloud part way through, just rerun your GotCloud command and it will pick up where it left off. |
| + | |
| | | |
| GotCloud align performs not only sequence alignment but also pre-processing of the sequence data (deduplication and base quality recalibration), along with quality assessment, as illustrated below. | | GotCloud align performs not only sequence alignment but also pre-processing of the sequence data (deduplication and base quality recalibration), along with quality assessment, as illustrated below. |
Line 311: |
Line 322: |
| Let's look at the output directory: | | Let's look at the output directory: |
| ls ${OUT} | | ls ${OUT} |
− | [[File:gcalignOutM.png|600px]] | + | [[File:gcalignOutMNew.png|600px]] |
| | | |
| === Quality Control Files === | | === Quality Control Files === |
Line 338: |
Line 349: |
| <li>No, FREEMIX = 0.00000 (<0.03)</li> | | <li>No, FREEMIX = 0.00000 (<0.03)</li> |
| </ul> | | </ul> |
− | [[File:Contam1.png|700px]] | + | [[File:Contam1New.png|700px]] |
| </div> | | </div> |
| </div> | | </div> |
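With many samples, checking FREEMIX by eye becomes tedious. The sketch below flags samples whose FREEMIX is not below the 0.03 threshold used above. It assumes a tab-delimited file with a header line that contains a <code>FREEMIX</code> column; the file contents, sample names, and the reduced set of columns are hypothetical.

```shell
# Simulated contamination summary (made-up samples and values).
qc=$(mktemp)
printf '#SEQ_ID\tAVG_DP\tFREEMIX\n' > "$qc"
printf 'SAMPLE_A\t7.44\t0.00000\n' >> "$qc"
printf 'SAMPLE_B\t6.10\t0.05210\n' >> "$qc"

# Find the FREEMIX column by name in the header, then print the first
# field of every row whose FREEMIX is at or above the threshold.
flagged=$(awk -F '\t' -v max=0.03 '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FREEMIX") col = i; next }
  col && $col + 0 >= max { print $1 }
' "$qc")
echo "$flagged"
rm -f "$qc"
```

Looking the column up by header name avoids hard-coding its position, which may differ between tool versions.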
Line 356: |
Line 367: |
| <ul> | | <ul> |
| <li> 98.93% Mapped</li> | | <li> 98.93% Mapped</li> |
− | <li>7.43 MeanDepth</li> | + | <li>7.44 MeanDepth</li> |
| </ul> | | </ul> |
− | [[File:qplots.png|200px]] | + | [[File:qplotsNew.png|200px]] |
| </div> | | </div> |
| </div> | | </div> |
Line 379: |
Line 390: |
| <li> No, it is well above the line</li> | | <li> No, it is well above the line</li> |
| <li> This is due to the small region used for recalibration</li> | | <li> This is due to the small region used for recalibration</li> |
− | [[File:Qplotpdf.png|400px]] | + | [[File:QplotpdfNew.png|400px]] |
| <li> Look at the PDF I produced when I ran the whole genome:</li> | | <li> Look at the PDF I produced when I ran the whole genome:</li> |
| evince ${SS}/ext/HG00551.wg.qplot.pdf& | | evince ${SS}/ext/HG00551.wg.qplot.pdf& |
Line 399: |
Line 410: |
| [[File:GcalignOutBAMm.png|600px]] | | [[File:GcalignOutBAMm.png|600px]] |
| | | |
− | Let's examine the first 5 lines of the BAM file using [http://samtools.sourceforge.net/samtools.shtml#3 samtools view]: | + | Let's examine the first 7 lines of the BAM file using [http://samtools.sourceforge.net/samtools.shtml#3 samtools view]: |
− | ${GC}/bin/samtools view -h ${OUT}/bams/HG00551.recal.bam|head -n 5 | + | ${GC}/bin/samtools view -h ${OUT}/bams/HG00551.recal.bam|head -n 7 |
| | | |
| ; What are the chromosome and position of the first record in the BAM file? | | ; What are the chromosome and position of the first record in the BAM file? |
Line 414: |
Line 425: |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| <ul> | | <ul> |
− | <li>Chr 22, Pos: 16114122</li> | + | <li>Chr 22, Pos: 16918656</li> |
| </ul> | | </ul> |
− | [[File:BamRec.png|650px]] | + | [[File:BamRecNew.png|650px]] |
| </div> | | </div> |
| </div> | | </div> |
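The chromosome and position can also be pulled out without reading the record by eye: in SAM text, header lines start with <code>@</code>, and the reference name and position are the 3rd and 4th tab-delimited fields. The sketch below runs on simulated SAM text; the read name, flag, and mate fields are invented for the example.

```shell
# Simulated output of "samtools view -h": two header lines, one record.
sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:coordinate\n' > "$sam"
printf '@SQ\tSN:22\tLN:51304566\n' >> "$sam"
printf 'read1\t99\t22\t16918656\t60\t100M\t=\t16918900\t344\t*\t*\n' >> "$sam"

# Skip header lines, print chromosome:position of the first record, stop.
first=$(awk -F '\t' '!/^@/ { print $3 ":" $4; exit }' "$sam")
echo "$first"
rm -f "$sam"
```

On real data, the same <code>awk</code> command could read from the <code>samtools view -h</code> pipe shown above instead of a file.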
Line 450: |
Line 461: |
| <li>We will have to remember this region when we run snpcall to see what it says.</li> | | <li>We will have to remember this region when we run snpcall to see what it says.</li> |
| </ul> | | </ul> |
− | [[File:tview.png|750px]] | + | [[File:tviewNew.png|650px]] |
| </div> | | </div> |
| </div> | | </div> |
Line 479: |
Line 490: |
| ''If you are not running during the SeqShop Workshop, please skip this section.'' | | ''If you are not running during the SeqShop Workshop, please skip this section.'' |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| + | To log out of seqshop1/2/3/4, type: |
| + | exit |
| To log out of seqshop-server, type: | | To log out of seqshop-server, type: |
| exit | | exit |
Line 486: |
Line 499: |
| </div> | | </div> |
| </div> | | </div> |
| + | |
| + | == Return to Workshop Wiki Page == |
| + | Return to main workshop wiki page: [[SeqShop: December 2014]] |