Changes

From Genome Analysis Wiki
no edit summary
Line 9: Line 9:  
[[File:LASER-workflow.png|thumb|center|alt=LASER workflow|400px|LASER workflow]]
 
[[File:LASER-workflow.png|thumb|center|alt=LASER workflow|400px|LASER workflow]]
   −
{{SeqShopLogin}}
     −
== Getting started ==
+
== Setup in person at the SeqShop Workshop ==
Create a working directory:
+
''This section is specifically for the SeqShop Workshop computers.''
 +
<div class="mw-collapsible mw-collapsed" style="width:600px">
 +
''If you are not running during the SeqShop Workshop, please skip this section.''
 +
<div class="mw-collapsible-content">
   −
mkdir ancestry
  −
cd ancestry
     −
Download and decompress software package:
+
{{SeqShopLogin}}
   −
wget http://www.sph.umich.edu/csg/chaolong/LASER/LASER-2.01.tar.gz
+
=== Setup your run environment ===
tar xzvf LASER-2.01.tar.gz
+
This is the same setup you did for the previous tutorial, but you need to redo it each time you log in.
   −
Set up to access data:
+
This sets up some environment variables that point to:
 +
* Tutorial input files
 
  source /home/chaolong/LASER-Tutorial/setup.txt
 
  source /home/chaolong/LASER-Tutorial/setup.txt
 +
* You won't see any output after running <code>source</code>
 +
** It silently sets up your environment
    +
<div class="mw-collapsible mw-collapsed" style="width:400px">
 
What is in the setup.txt file:
 
What is in the setup.txt file:
 +
<div class="mw-collapsible-content">
 
  export GC=/home/mktrost/seqshop/gotcloud
 
  export GC=/home/mktrost/seqshop/gotcloud
 
  export REF=/home/mktrost/seqshop/reference/all
 
  export REF=/home/mktrost/seqshop/reference/all
 
  export HGDP=/home/chaolong/LASER-Tutorial/HGDP
 
  export HGDP=/home/chaolong/LASER-Tutorial/HGDP
 
  export BAM=/home/chaolong/LASER-Tutorial/BAM
 
  export BAM=/home/chaolong/LASER-Tutorial/BAM
 +
</div>
 +
</div>
 +
 +
Point to where you want the output to go, replacing the path below with your preferred output directory:
 +
<pre>export OUT=~/seqshop_output/</pre>
 +
</div>
 +
</div>
 +
    
== Setup when running on your own outside of the SeqShop Workshop ==
 
== Setup when running on your own outside of the SeqShop Workshop ==
Line 37: Line 50:  
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
 
=== Download the example data ===
 
=== Download the example data ===
 +
    
=== Setup your run environment ===
 
=== Setup your run environment ===
Line 53: Line 67:  
* Point to where you want the output to go
 
* Point to where you want the output to go
 
*:<pre>export OUT=/home/username/seqshop_output/</pre>
 
*:<pre>export OUT=/home/username/seqshop_output/</pre>
 +
* Additional variables for Ancestry:
 +
*:<pre>export REF=$SS/ancestry/ref&#10;export HGDP=$SS/ancestry/HGDP&#10;export BAM=$SS/ancestry/bams</pre>
 
</div>
 
</div>
 
</div>
 
</div>
Line 65: Line 81:  
* Point to where you want the output to go
 
* Point to where you want the output to go
 
*:<pre>setenv OUT /home/username/seqshop_output/</pre>
 
*:<pre>setenv OUT /home/username/seqshop_output/</pre>
 +
* Additional variables for Ancestry:
 +
*:<pre>setenv REF $SS/ancestry/ref&#10;setenv HGDP $SS/ancestry/HGDP&#10;setenv BAM $SS/ancestry/bams</pre>
 
</div>
 
</div>
 
</div>
 
</div>
Line 70: Line 88:  
</div>
 
</div>
 
</div>
 
</div>
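
Before moving on, it can help to confirm that the variables used by the commands in this tutorial are actually set. The following is a minimal sketch, not part of the LASER package; the function name <code>check_vars</code> is hypothetical, and the variable names you pass to it are whichever ones your setup defines.

```shell
# Hypothetical helper (not part of LASER or the tutorial setup):
# print the names of variables that are unset or empty.
check_vars() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  # Return 0 only if every listed variable has a value.
  return $missing
}
```

For example, <code>check_vars REF HGDP BAM OUT</code> prints one line per missing variable and returns a non-zero status if any are missing.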
 +
 +
 +
== Getting started ==
 +
Create a working directory:
 +
 +
mkdir $OUT/ancestry
 +
cd $OUT/ancestry
 +
 +
Download and decompress software package:
 +
 +
wget http://www.sph.umich.edu/csg/chaolong/LASER/LASER-2.01.tar.gz
 +
tar xzvf LASER-2.01.tar.gz
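
Note that <code>mkdir $OUT/ancestry</code> fails if the <code>$OUT</code> directory does not exist yet. A slightly more defensive variant of the directory step, as a sketch (the function name <code>make_workdir</code> is hypothetical), uses <code>mkdir -p</code> to create any missing parent directories:

```shell
# Hypothetical helper: create OUTPUT_DIR/ancestry (and any missing
# parent directories), then change into it.
make_workdir() {
  out_dir="${1:?usage: make_workdir OUTPUT_DIR}"
  mkdir -p "$out_dir/ancestry" && cd "$out_dir/ancestry"
}
```

Call it as <code>make_workdir "$OUT"</code>, then run the <code>wget</code> and <code>tar</code> commands as shown.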
 +
    
== Preparing input files for LASER ==
 
== Preparing input files for LASER ==
Line 82: Line 113:     
This step uses samtools to generate pileup files from bam files.  
 
This step uses samtools to generate pileup files from bam files.  
Please only try one sample so that we won't overload the server with everyone running 6 jobs at the same time. Pileup files for these 6 samples have been prepared for later steps.
  −
It takes about 2 mins for each pileup job.
     −
   $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > $OUT/121101035.recal.pileup &
+
<div class="mw-collapsible mw-collapsed" style="width:500px">
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > $OUT/121101043.recal.pileup &  
+
In person at workshop notes:
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > $OUT/121101050.recal.pileup &
+
<div class="mw-collapsible-content">
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > $OUT/121101052.recal.pileup &
+
*Please only try one sample so that we won't overload the server with everyone running 6 jobs at the same time. Pileup files for these 6 samples have been prepared for later steps.
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > $OUT/121101415.recal.pileup &
+
*It takes about 2 mins for each pileup job.
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > $OUT/121101861.recal.pileup &
+
</div>
 +
</div>
 +
 
 +
<div class="mw-collapsible mw-collapsed" style="width:500px">
 +
Outside of the workshop notes:
 +
<div class="mw-collapsible-content">
 +
*The BAMs provided as part of the download are chr22-only BAMs. They are used to demonstrate how to run this step.
 +
*Pileup files for the whole genome BAMs are provided with the download and will be used in the next step.
 +
</div>
 +
</div>
 +
 
 +
   $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > 121101035.recal.pileup &
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > 121101043.recal.pileup &  
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > 121101050.recal.pileup &
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > 121101052.recal.pileup &
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > 121101415.recal.pileup &
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup &
    
We use -q 30 to exclude reads with mapping quality score lower than 30, and -Q 20 to exclude bases with base quality score lower than 20.
 
We use -q 30 to exclude reads with mapping quality score lower than 30, and -Q 20 to exclude bases with base quality score lower than 20.
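
The six commands above differ only in the sample ID, so they can be generated from a list. The following is a sketch of a dry run: it only prints the commands and does not run samtools. The names <code>samples</code> and <code>pileup_cmd</code> are hypothetical; the sample IDs, paths and options are copied from the commands above. At the workshop, please still run only one sample.

```shell
# Sample IDs copied from the mpileup commands above.
samples="121101035 121101043 121101050 121101052 121101415 121101861"

# Print (do not run) the mpileup command for one sample ID.
# The single quotes keep $GC, $REF, $HGDP and $BAM unexpanded, so the
# printed line can be pasted into a shell where they are set.
pileup_cmd() {
  printf '$GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/%s.recal.bam > %s.recal.pileup\n' "$1" "$1"
}

# Dry run: list the command for every sample.
for s in $samples; do
  pileup_cmd "$s"
done
```

Printing first makes it easy to check the file names before starting any job.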
Line 98: Line 143:  
In this step, we will generate a file called "hapmap_trios.seq", containing the information of 6 samples. It takes about 30 seconds to run.
 
In this step, we will generate a file called "hapmap_trios.seq", containing the information of 6 samples. It takes about 30 seconds to run.
 
We will use the pre-generated pileup files in the $BAM folder.
 
We will use the pre-generated pileup files in the $BAM folder.
 +
* These pre-generated pileup files are for the whole genome of all 6 samples
    
  python ./LASER-2.01/pileup2seq/pileup2seq.py \
 
  python ./LASER-2.01/pileup2seq/pileup2seq.py \
