Changes

From Genome Analysis Wiki
Jump to navigationJump to search
Line 1: Line 1:  +
'''Note:''' the latest version of this practical is available at: [[SeqShop: Sequence Mapping and Assembly Practical]]
 +
* The ones here is the original one from the June workshop (updated to be run from elsewhere)
 +
 
== Introduction ==
 
== Introduction ==
 
See the [[Media:SeqShop - GotCloud Align.pdf|introductory slides]] for an intro to this tutorial.
 
See the [[Media:SeqShop - GotCloud Align.pdf|introductory slides]] for an intro to this tutorial.
Line 8: Line 11:  
** How to evaluate the quality of sequence data
 
** How to evaluate the quality of sequence data
 
** How to visualize sequence data to examine the reads aligned to particular genomic positions
 
** How to visualize sequence data to examine the reads aligned to particular genomic positions
 +
 +
== Setup in person at the SeqShop Workshop ==
 +
''This section is specifically for the SeqShop Workshop computers.''
 +
<div class="mw-collapsible mw-collapsed" style="width:600px">
 +
''If you are not running during the SeqShop Workshop, please skip this section.''
 +
<div class="mw-collapsible-content">
       
{{SeqShopLogin}}
 
{{SeqShopLogin}}
   −
== Setup your run environment==
+
=== Setup your run environment===
    
This will setup some environment variables to point you to
 
This will setup some environment variables to point you to
Line 25: Line 34:  
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
 
[[File:setup.png|500px]]
 
[[File:setup.png|500px]]
 +
</div>
 +
</div>
 
</div>
 
</div>
 
</div>
 
</div>
 +
 +
== Setup when running on your own outside of the SeqShop Workshop ==
 +
''This section is specifically for running on your own outside of the SeqShop Workshop.''
 +
<div class="mw-collapsible" style="width:600px">
 +
''If you are running during the SeqShop Workshop, please skip this section.''
 +
<div class="mw-collapsible-content">
 +
=== Download & Build GotCloud ===
 +
If you do not already have GotCloud:
 +
* cd to where you want GotCloud installed (you can change this to any directory you want)
 +
mkdir -p ~/seqshop
 +
cd ~/seqshop/
 +
* download, decompress, and build the version of gotcloud that was tested with this tutorial:
 +
wget https://github.com/statgen/gotcloud/archive/gotcloud.workshop.tar.gz
 +
tar xvf gotcloud.workshop.tar.gz
 +
mv gotcloud-gotcloud.workshop gotcloud
 +
cd gotcloud/src
 +
make
 +
cd ../..
 +
 +
Remember the path to gotcloud/ that is what you will need to set your GC variable to.
 +
 +
=== Download the example data ===
 +
Download and untar file containing the example data used in the practicals:
 +
wget http://csg.sph.umich.edu//mktrost/seqshopExample.tar.gz
 +
tar xvf seqshopExample.tar.gz
 +
 +
You will see the names of all the files included in the example data scrolling on the screen as they are unpacked from the tar file.
 +
 +
{{SeqShopRemoteEnv}}
    
== Examining [[GotCloud]] Align Input Files ==
 
== Examining [[GotCloud]] Align Input Files ==
Line 36: Line 76:  
* Subset of FASTQs - should map to chromosome 22 36000000-37000000
 
* Subset of FASTQs - should map to chromosome 22 36000000-37000000
   −
  ls ${IN}/fastq/
+
  ls ${SS}/fastq/
 
There are 24 fastq files: combination of single-end & paired-end.  
 
There are 24 fastq files: combination of single-end & paired-end.  
   Line 55: Line 95:     
Look at a couple of FASTQs:
 
Look at a couple of FASTQs:
  less -S ${IN}/fastq/HG00551.SRR190851_1.fastq
+
  less -S ${SS}/fastq/HG00551.SRR190851_1.fastq
 
<code>less</code> is a Linux command that allows you to look at a file.
 
<code>less</code> is a Linux command that allows you to look at a file.
 
*<code>-S</code> option prevents line wrap
 
*<code>-S</code> option prevents line wrap
Line 75: Line 115:     
Look at the paired read:
 
Look at the paired read:
  less -S ${IN}/fastq/HG00551.SRR190851_2.fastq  
+
  less -S ${SS}/fastq/HG00551.SRR190851_2.fastq  
    
Remember, use <code>'q'</code> to exit out of <code>less</code>
 
Remember, use <code>'q'</code> to exit out of <code>less</code>
Line 108: Line 148:     
Take a look at the chromosome 22 reference files included for this tutorial:
 
Take a look at the chromosome 22 reference files included for this tutorial:
  ls ${REF}
+
  ls ${SS}/ref22
    
<ul>
 
<ul>
Line 114: Line 154:  
<li>View Screenshot</li>
 
<li>View Screenshot</li>
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
[[File:RefDir.png|500px]]
+
[[File:RefDir.png|700px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 120: Line 160:     
Let's read the reference FASTA file (all reference bases for the chromosome):
 
Let's read the reference FASTA file (all reference bases for the chromosome):
  less ${REF}/human.g1k.v37.chr22.fa
+
  less ${SS}/ref22/human.g1k.v37.chr22.fa
    
Remember, use <code>'q'</code> to exit out of <code>less</code>
 
Remember, use <code>'q'</code> to exit out of <code>less</code>
 
  q
 
  q
  −
If you want to access the FASTA file by position, you can use <code>samtools faidx</code> command
  −
$GC/bin/samtools faidx $REF/human.g1k.v37.chr22.fa 22:36000000-36000100
      
; Where is the reference sequence?
 
; Where is the reference sequence?
Line 136: Line 173:  
<li>The ends of a chromosome are 'N' - unknown bases</li>
 
<li>The ends of a chromosome are 'N' - unknown bases</li>
 
<li>Let's look at 5 lines of the file starting at line 300,000</li>
 
<li>Let's look at 5 lines of the file starting at line 300,000</li>
   tail -n+300000 ${REF}/human.g1k.v37.chr22.fa |head -n 5
+
   tail -n+300000 ${SS}/ref22/human.g1k.v37.chr22.fa |head -n 5
 
[[File:Fasta.png|500px]]
 
[[File:Fasta.png|500px]]
 
</div>
 
</div>
Line 142: Line 179:  
</ul>
 
</ul>
 
</ul>
 
</ul>
 +
 +
If you want to access the FASTA file by position, you can use <code>samtools faidx</code> command
 +
${GC}/bin/samtools faidx ${SS}/ref22/human.g1k.v37.chr22.fa 22:36000000 | less
 +
or
 +
${GC}/bin/samtools faidx ${SS}/ref22/human.g1k.v37.chr22.fa 22:36000000-36000100
    
=== GotCloud FASTQ Index File ===
 
=== GotCloud FASTQ Index File ===
Line 159: Line 201:     
Let's look a look at the index file I prepared for this tutorial:
 
Let's look a look at the index file I prepared for this tutorial:
  less -S ${IN}/align.index  
+
  less -S ${SS}/align.index  
    
Remember, use <code>'q'</code> to exit out of <code>less</code>
 
Remember, use <code>'q'</code> to exit out of <code>less</code>
Line 169: Line 211:  
<li>Need a reminder of the format?</li>
 
<li>Need a reminder of the format?</li>
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
[[File:fqindex.png|650px]]
+
[[File:fqindex.png|750px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 183: Line 225:  
<ul>
 
<ul>
 
<li>Use cut to extract just the MERGE_NAME & RGID fields </li>
 
<li>Use cut to extract just the MERGE_NAME & RGID fields </li>
  cut -f 1,4 ${IN}/align.index
+
  cut -f 1,4 ${SS}/align.index
 
</ul>
 
</ul>
 
</div>
 
</div>
Line 193: Line 235:  
<li>HG00553 & HG00640</li>
 
<li>HG00553 & HG00640</li>
 
<li>They have multiple unique values in the RGID field</li>
 
<li>They have multiple unique values in the RGID field</li>
[[File:fqindexRG.png|650px]]
+
[[File:fqindexRG.png|800px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 220: Line 262:     
Let's look at the configuration file I created for this test:
 
Let's look at the configuration file I created for this test:
  more ${IN}/gotcloud.conf
+
  more ${SS}/gotcloud.conf
    
Use the <code>space bar</code> to advance if the whole file isn't displayed.
 
Use the <code>space bar</code> to advance if the whole file isn't displayed.
   −
; If your input and references are at different paths than specified, what would you change?
+
; If your references are in a different path than what is specified, what would you change?
 
<ul>
 
<ul>
<div class="mw-collapsible mw-collapsed" style="width:200px">
+
<div class="mw-collapsible mw-collapsed" style="width:300px">
 
<li>Answer:</li>
 
<li>Answer:</li>
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
 
<ul>
 
<ul>
<li>You would change <code>IN_DIR</code> & <code>REF_DIR</code> to the new paths</li>
+
<li>You would change <code>REF_DIR</code> to the new path</li>
[[File:gcConf.png|500px]]
+
[[File:gcConf.png|800px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 242: Line 284:     
Now that we have all of our input files, we need just a simple command to run them
 
Now that we have all of our input files, we need just a simple command to run them
  ${GC}/gotcloud align --conf ${IN}/gotcloud.conf --numcs 2
+
  ${GC}/gotcloud align --conf ${SS}/gotcloud.conf --numcs 2 --base_prefix ${SS} --outdir ${OUT}
 +
 
 +
* <code>${GC}/gotcloud</code> runs GotCloud
 +
* <code>align</code> tells GotCloud you want to run the alignment pipeline.
 +
* <code>--conf</code> tells GotCloud the name of the configuration file to use.
 +
** The configuration for this test was downloaded with the seqshop input files.
 
* <code>--numcs</code> means to run 2 samples at a time.
 
* <code>--numcs</code> means to run 2 samples at a time.
** How many you can run concurrently depends on your system
+
** How many you can run concurrently depends on your system.
 +
* <code>--base_prefix</code> tells GotCloud the prefix to append to relative paths.
 +
** The Configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS}
 +
** Alternatively, gotcloud.conf could be updated to specify the full paths
 +
* <code>--out_dir</code> tells GotCloud where to write the output.
 +
** This could be specified in gotcloud.conf, but to allow you to use the ${OUT} to change the output location, it is specified on the command-line
 +
 
 
[[File:gcalignStart.png|850px]]
 
[[File:gcalignStart.png|850px]]
   Line 331: Line 384:  
[[File:Qplotpdf.png|400px]]
 
[[File:Qplotpdf.png|400px]]
 
<li> Look at the PDF I produced when I ran the whole genome:</li>  
 
<li> Look at the PDF I produced when I ran the whole genome:</li>  
  evince ${IN}/example/HG00551.wg.qplot.pdf&
+
  evince ${SS}/ext/HG00551.wg.qplot.pdf&
 
</ul>
 
</ul>
 
[[File:Qplotpdfwg.png|400px]]
 
[[File:Qplotpdfwg.png|400px]]
Line 380: Line 433:     
Let's visualize what reads in that area look like using samtools tview:
 
Let's visualize what reads in that area look like using samtools tview:
  ${GC}/bin/samtools tview ${OUT}/bams/HG00551.recal.bam ${REF}/human.g1k.v37.chr22.fa
+
  ${GC}/bin/samtools tview ${OUT}/bams/HG00551.recal.bam ${SS}/ref22/human.g1k.v37.chr22.fa
 
* Type ‘g’  
 
* Type ‘g’  
 
** Type 22:36907000  
 
** Type 22:36907000  
Line 424: Line 477:     
== Logging Off ==
 
== Logging Off ==
 +
 +
''This section is specifically for the SeqShop Workshop computers.''
 +
<div class="mw-collapsible mw-collapsed" style="width:600px">
 +
''If you are not running during the SeqShop Workshop, please skip this section.''
 +
<div class="mw-collapsible-content">
 
To logout of seqshop-server, type:
 
To logout of seqshop-server, type:
 
  exit
 
  exit
Line 429: Line 487:     
When done, log out of the Windows machine.
 
When done, log out of the Windows machine.
 
+
</div>
 
+
</div>
== Please provide feedback from Day 1 ==
  −
https://docs.google.com/a/umich.edu/forms/d/1afGq98QToC17CO8k6tXXyybIZFFfNsy8mOX5Tzc2QyM/viewform
 
96

edits

Navigation menu