Changes

Line 1: Line 1:  
== Introduction ==
 
== Introduction ==
See the [[Media:SeqShop - GotCloud Align.pdf|introductory slides]] for an intro to this tutorial.
+
Main Workshop wiki page: [[SeqShop: December 2014]]
 +
 
 +
See the [[Media:Dec2014 SeqShop - GotCloud Align.pdf|introductory slides]] for an intro to this tutorial.
    
== Goals of This Session ==
 
== Goals of This Session ==
Line 11: Line 13:  
== Setup in person at the SeqShop Workshop ==
 
== Setup in person at the SeqShop Workshop ==
 
''This section is specifically for the SeqShop Workshop computers.''
 
''This section is specifically for the SeqShop Workshop computers.''
<div class="mw-collapsible" style="width:600px">
+
<div class="mw-collapsible mw-collapsed" style="width:600px">
 
''If you are not running during the SeqShop Workshop, please skip this section.''
 
''If you are not running during the SeqShop Workshop, please skip this section.''
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
Line 24: Line 26:  
* Tutorial input files
 
* Tutorial input files
 
* Set up an output directory
 
* Set up an output directory
  source /home/mktrost/seqshop/setup.txt
+
  source /net/seqshop-server/home/mktrost/seqshop/setup.txt
 
* You won't see any output after running <code>source</code>
 
* You won't see any output after running <code>source</code>
 
** It silently sets up your environment
 
** It silently sets up your environment
 +
 +
Look at setup.txt
 +
cat /net/seqshop-server/home/mktrost/seqshop/setup.txt
 
<div class="mw-collapsible mw-collapsed" style="width:200px">
 
<div class="mw-collapsible mw-collapsed" style="width:200px">
View setup.txt
+
* setup.txt screenshot
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
 
[[File:setup.png|500px]]
 
[[File:setup.png|500px]]
Line 38: Line 43:  
== Setup when running on your own outside of the SeqShop Workshop ==
 
== Setup when running on your own outside of the SeqShop Workshop ==
 
''This section is specifically for running on your own outside of the SeqShop Workshop.''
 
''This section is specifically for running on your own outside of the SeqShop Workshop.''
<div class="mw-collapsible mw-collapsed" style="width:600px">
+
<div class="mw-collapsible" style="width:600px">
 
''If you are running during the SeqShop Workshop, please skip this section.''
 
''If you are running during the SeqShop Workshop, please skip this section.''
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
 +
 +
=== Download the example data ===
 +
Download and untar file containing the example data used in the practicals:
 +
mkdir -p ~/seqshop
 +
cd ~/seqshop
 +
wget http://csg.sph.umich.edu/mktrost/seqshopExampleDec2014.tar.gz
 +
tar xvf seqshopExampleDec2014.tar.gz
 +
 +
You will see the names of all the files included in the example data scrolling on the screen as they are unpacked from the tar file.
 +
 
=== Download & Build GotCloud ===
 
=== Download & Build GotCloud ===
 
If you do not already have GotCloud:
 
If you do not already have GotCloud:
* cd to where you want GotCloud installed (you can change this to any directory you want)
  −
mkdir -p ~/seqshop
  −
cd ~/seqshop/
   
* download, decompress, and build the version of gotcloud that was tested with this tutorial:
 
* download, decompress, and build the version of gotcloud that was tested with this tutorial:
  wget https://github.com/statgen/gotcloud/archive/gotcloud.workshop.tar.gz
+
  wget https://github.com/statgen/gotcloud/archive/gotcloud.1.15.tar.gz
  tar xvf gotcloud.workshop.tar.gz
+
  tar xvf gotcloud.1.15.tar.gz
  mv gotcloud-gotcloud.workshop gotcloud
+
  mv gotcloud-gotcloud.1.15 gotcloud
 
  cd gotcloud/src
 
  cd gotcloud/src
 
  make
 
  make
Line 55: Line 67:     
Remember the path to gotcloud/; that is what you will need to set your GC variable to.
 
Remember the path to gotcloud/; that is what you will need to set your GC variable to.
  −
=== Download the example data ===
  −
Download and untar file containing the example data used in the practicals:
  −
wget http://www.sph.umich.edu/csg/mktrost/seqshopExample.tar.gz
  −
tar xvf seqshopExample.tar.gz
  −
  −
You will see the names of all the files included in the example data scrolling on the screen as they are unpacked from the tar file.
      
{{SeqShopRemoteEnv}}
 
{{SeqShopRemoteEnv}}
 +
</div>
 +
</div>
    
== Examining [[GotCloud]] Align Input Files ==
 
== Examining [[GotCloud]] Align Input Files ==
Line 281: Line 288:     
Now that we have all of our input files, we need just a simple command to run them
 
Now that we have all of our input files, we need just a simple command to run them
  ${GC}/gotcloud align --conf ${SS}/gotcloud.conf --numjobs 2 --base_prefix ${SS} --outdir ${OUT}
+
* When running at home if you don't have 4 CPUs, reduce the <code>--numjobs</code> setting (it will take longer to run).
 +
  ${GC}/gotcloud align --conf ${SS}/gotcloud.conf --numjobs 4 --base_prefix ${SS} --outdir ${OUT}
    
* <code>${GC}/gotcloud</code> runs GotCloud
 
* <code>${GC}/gotcloud</code> runs GotCloud
Line 287: Line 295:  
* <code>--conf</code> tells GotCloud the name of the configuration file to use.
 
* <code>--conf</code> tells GotCloud the name of the configuration file to use.
 
** The configuration for this test was downloaded with the seqshop input files.
 
** The configuration for this test was downloaded with the seqshop input files.
* <code>--numjobs</code> means to run 2 samples at a time.
+
* <code>--numjobs</code> means to run 4 samples at a time.
 
** How many you can run concurrently depends on your system.
 
** How many you can run concurrently depends on your system.
 
* <code>--base_prefix</code> tells GotCloud the prefix to prepend to relative paths.
 
* <code>--base_prefix</code> tells GotCloud the prefix to prepend to relative paths.
 
** The Configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS}
 
** The Configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS}
 
** Alternatively, gotcloud.conf could be updated to specify the full paths
 
** Alternatively, gotcloud.conf could be updated to specify the full paths
* <code>--out_dir</code> tells GotCloud where to write the output.
+
* <code>--outdir</code> tells GotCloud where to write the output.
 
** This could be specified in gotcloud.conf, but to allow you to use the ${OUT} to change the output location, it is specified on the command-line
 
** This could be specified in gotcloud.conf, but to allow you to use the ${OUT} to change the output location, it is specified on the command-line
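The <code>--numjobs</code> advice above can be sketched in shell. This is a minimal sketch, not part of GotCloud: the helper name <code>pick_numjobs</code> is hypothetical, and it assumes <code>nproc</code> or <code>getconf</code> is available to report the CPU count.

```shell
# Hypothetical helper (not part of GotCloud): print the smaller of the
# requested job count ($1) and the available CPU count ($2).
pick_numjobs() {
  if [ "$2" -lt "$1" ]; then
    echo "$2"
  else
    echo "$1"
  fi
}

# Assumption: nproc (GNU coreutils) or getconf reports the CPU count; fall back to 1.
CPUS=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
NUMJOBS=$(pick_numjobs 4 "$CPUS")
echo "Suggested setting: --numjobs ${NUMJOBS}"
```

The printed value can then replace the <code>4</code> in the <code>gotcloud align</code> command above.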
   −
[[File:gcalignStart.png|850px]]
+
[[File:gcalignStartNew.png|650px]]
 +
 
 +
This should take about 1 minute to run.
   −
This should take 1-3 minutes to run.
+
It should end with a line like: <code>Processing finished in 54 secs with no errors reported</code>
 +
* The <code>WARNING</code> messages are just to let you know that the default Read Group field settings are being used.
   −
It should end with a line like: <code>Processing finished in 133 secs with no errors reported</code>
      
If you cancelled GotCloud part way through, just rerun your GotCloud command and it will pick up where it left off.
 
If you cancelled GotCloud part way through, just rerun your GotCloud command and it will pick up where it left off.
 +
    
GotCloud align performs not only sequence alignment but also pre-processing of the sequence data (deduplication and base quality recalibration) and quality assessment, as illustrated below.
 
GotCloud align performs not only sequence alignment but also pre-processing of the sequence data (deduplication and base quality recalibration) and quality assessment, as illustrated below.
Line 311: Line 322:  
Let's look at the output directory:
 
Let's look at the output directory:
 
  ls ${OUT}
 
  ls ${OUT}
[[File:gcalignOutM.png|600px]]
+
[[File:gcalignOutMNew.png|600px]]
    
=== Quality Control Files ===
 
=== Quality Control Files ===
Line 338: Line 349:  
<li>No, FREEMIX = 0.00000 (<0.03)</li>
 
<li>No, FREEMIX = 0.00000 (<0.03)</li>
 
</ul>
 
</ul>
[[File:Contam1.png|700px]]
+
[[File:Contam1New.png|700px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 356: Line 367:  
<ul>
 
<ul>
 
<li> 98.93% Mapped</li>
 
<li> 98.93% Mapped</li>
<li>7.43 MeanDepth</li>
+
<li>7.44 MeanDepth</li>
 
</ul>
 
</ul>
[[File:qplots.png|200px]]
+
[[File:qplotsNew.png|200px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 379: Line 390:  
<li> No, it is well above the line</li>
 
<li> No, it is well above the line</li>
 
<li> This is due to the small region used for recalibration</li>
 
<li> This is due to the small region used for recalibration</li>
[[File:Qplotpdf.png|400px]]
+
[[File:QplotpdfNew.png|400px]]
 
<li> Look at the PDF I produced when I ran the whole genome:</li>  
 
<li> Look at the PDF I produced when I ran the whole genome:</li>  
 
  evince ${SS}/ext/HG00551.wg.qplot.pdf&
 
  evince ${SS}/ext/HG00551.wg.qplot.pdf&
Line 399: Line 410:  
[[File:GcalignOutBAMm.png|600px]]
 
[[File:GcalignOutBAMm.png|600px]]
   −
Let's examine the first 5 lines of the BAM file using [http://samtools.sourceforge.net/samtools.shtml#3 samtools view]:
+
Let's examine the first 7 lines of the BAM file using [http://samtools.sourceforge.net/samtools.shtml#3 samtools view]:
  ${GC}/bin/samtools view -h ${OUT}/bams/HG00551.recal.bam|head -n 5
+
  ${GC}/bin/samtools view -h ${OUT}/bams/HG00551.recal.bam|head -n 7
    
; What are the chromosome and position of the first record in the BAM file?
 
; What are the chromosome and position of the first record in the BAM file?
Line 414: Line 425:  
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
 
<ul>
 
<ul>
<li>Chr 22, Pos: 16114122</li>
+
<li>Chr 22, Pos: 16918656</li>
 
</ul>
 
</ul>
[[File:BamRec.png|650px]]
+
[[File:BamRecNew.png|650px]]
 
</div>
 
</div>
 
</div>
 
</div>
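The chromosome and position asked about above are columns 3 (RNAME) and 4 (POS) of a SAM record. The following is a minimal sketch for pulling them out of <code>samtools view -h</code> output. The function name <code>first_record_pos</code> is hypothetical, and the sample record is made up for illustration, apart from the chromosome and position taken from the answer above.

```shell
# Hypothetical helper: read SAM text on stdin, skip '@' header lines,
# and print the chromosome (column 3) and position (column 4) of the first record.
first_record_pos() {
  awk -F '\t' '!/^@/ { print $3, $4; exit }'
}

# Illustrative input only: one header line and one made-up record.
printf '@HD\tVN:1.0\nread1\t0\t22\t16918656\t60\t4M\t*\t0\t0\tACGT\tIIII\n' | first_record_pos
```

With the tutorial data, the same function could be fed from <code>${GC}/bin/samtools view -h ${OUT}/bams/HG00551.recal.bam</code> instead of the <code>printf</code> line.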
Line 450: Line 461:  
<li>We will have to remember this region when we run snpcall to see what it says.</li>
 
<li>We will have to remember this region when we run snpcall to see what it says.</li>
 
</ul>
 
</ul>
[[File:tview.png|750px]]
+
[[File:tviewNew.png|650px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 479: Line 490:  
''If you are not running during the SeqShop Workshop, please skip this section.''
 
''If you are not running during the SeqShop Workshop, please skip this section.''
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
 +
To log out of seqshop1/2/3/4, type:
 +
exit
 
To log out of seqshop-server, type:
 
To log out of seqshop-server, type:
 
  exit
 
  exit
Line 486: Line 499:  
</div>
 
</div>
 
</div>
 
</div>
 +
 +
== Return to Workshop Wiki Page ==
 +
Return to main workshop wiki page: [[SeqShop: December 2014]]
