Changes

From Genome Analysis Wiki
Line 1: Line 1:  +
'''Note:''' the latest version of this practical is available at: [[SeqShop: Estimates of Genetic Ancestry Practical]]
 +
* The version here is the original one from the June workshop (updated so that it can be run from elsewhere)
 +
 +
 
== Introduction ==
 
== Introduction ==
 
See the [[Media:LASER-tutorial.pdf|tutorial slides]] for an introduction to the LASER analysis workflow, input/output file formats, and usage of the LASER software.
 
See the [[Media:LASER-tutorial.pdf|tutorial slides]] for an introduction to the LASER analysis workflow, input/output file formats, and usage of the LASER software.
Line 4: Line 8:  
The main purpose of this page is to provide step-by-step command lines for using LASER to estimate ancestry of 6 targeted sequenced samples (2 HapMap trios) in a principal component space generated using genome-wide SNP data from the Human Genome Diversity Project (HGDP). The HGDP reference panel contains genotype data across 632,958 autosomal loci for 938 individuals from 53 populations worldwide.
 
The main purpose of this page is to provide step-by-step command lines for using LASER to estimate ancestry of 6 targeted sequenced samples (2 HapMap trios) in a principal component space generated using genome-wide SNP data from the Human Genome Diversity Project (HGDP). The HGDP reference panel contains genotype data across 632,958 autosomal loci for 938 individuals from 53 populations worldwide.
   −
For more details about the options and usage of LASER, please read the [http://www.sph.umich.edu/csg/chaolong/LASER/LASER_Manual.pdf manual].
+
For more details about the options and usage of LASER, please read the [http://csg.sph.umich.edu//chaolong/LASER/LASER_Manual.pdf manual].
    
== LASER workflow ==
 
== LASER workflow ==
 
[[File:LASER-workflow.png|thumb|center|alt=LASER workflow|400px|LASER workflow]]
 
[[File:LASER-workflow.png|thumb|center|alt=LASER workflow|400px|LASER workflow]]
   −
{{SeqShopLogin}}
     −
== Getting started ==
+
== Setup in person at the SeqShop Workshop ==
Create a working directory:
+
''This section is specifically for the SeqShop Workshop computers.''
 +
<div class="mw-collapsible mw-collapsed" style="width:600px">
 +
''If you are not running during the SeqShop Workshop, please skip this section.''
 +
<div class="mw-collapsible-content">
   −
mkdir ancestry
  −
cd ancestry
     −
Download and decompress software package:
+
{{SeqShopLogin}}
   −
wget http://www.sph.umich.edu/csg/chaolong/LASER/LASER-2.01.tar.gz
+
=== Setup your run environment ===
tar xzvf LASER-2.01.tar.gz
+
This is the same setup you did for the previous tutorial, but you need to redo it each time you log in.
   −
Set up to access data:
+
This will set up some environment variables to point you to
 +
* Tutorial input files
 
  source /home/chaolong/LASER-Tutorial/setup.txt
 
  source /home/chaolong/LASER-Tutorial/setup.txt
 +
* You won't see any output after running <code>source</code>
 +
** It silently sets up your environment
    +
<div class="mw-collapsible mw-collapsed" style="width:400px">
 
What is in the setup.txt file:
 
What is in the setup.txt file:
 +
<div class="mw-collapsible-content">
 
  export GC=/home/mktrost/seqshop/gotcloud
 
  export GC=/home/mktrost/seqshop/gotcloud
 
  export REF=/home/mktrost/seqshop/reference/all
 
  export REF=/home/mktrost/seqshop/reference/all
 
  export HGDP=/home/chaolong/LASER-Tutorial/HGDP
 
  export HGDP=/home/chaolong/LASER-Tutorial/HGDP
 
  export BAM=/home/chaolong/LASER-Tutorial/BAM
 
  export BAM=/home/chaolong/LASER-Tutorial/BAM
 +
</div>
 +
</div>
 +
 +
Point to where you want the output to go, replacing the path below with your preferred output directory:
 +
<pre>export OUT=~/seqshop_output/</pre>
 +
</div>
 +
</div>
 +
    
== Setup when running on your own outside of the SeqShop Workshop ==
 
== Setup when running on your own outside of the SeqShop Workshop ==
Line 36: Line 53:  
''If you are running during the SeqShop Workshop, please skip this section.''
 
''If you are running during the SeqShop Workshop, please skip this section.''
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
=== Download the example data ===
     −
=== Setup your run environment ===
+
This tutorial uses samtools from GotCloud, as well as example data downloaded in the Sequence Mapping & Assembly tutorial, so if you have not already installed GotCloud and the tutorial data in a previous tutorial, please do so now: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014#Setup_when_running_on_your_own_outside_of_the_SeqShop_Workshop|Tutorial Setup]]
   −
Environment variables will be used throughout the tutorial.
     −
We recommend that you setup these variables so you won't have to modify every command in the tutorial.
+
{{SeqShopRemoteEnv}}
    +
<ul>
 +
<li> Additional variables for Ancestry:</li>
 +
<ul>
 
<div class="mw-collapsible" style="width:500px">
 
<div class="mw-collapsible" style="width:500px">
I'm using bash (replace the paths below with the appropriate paths):
+
<li>Using bash (replace the paths below with the appropriate paths):</li>
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
* Point to where you installed GotCloud
+
:<pre>export REF=$SS/ancestry/ref&#10;export HGDP=$SS/ancestry/HGDP&#10;export BAM=$SS/ancestry/bams</pre>
*:<pre>export GC=/home/username/gotcloud</pre>
  −
* Point to where you installed the seqshop files
  −
*:<pre>export SS=/home/username/seqshop/</pre>
  −
* Point to where you want the output to go
  −
*:<pre>export OUT=/home/username/seqshop_output/</pre>
   
</div>
 
</div>
 
</div>
 
</div>
   
<div class="mw-collapsible mw-collapsed" style="width:500px">
 
<div class="mw-collapsible mw-collapsed" style="width:500px">
I'm using tcsh (replace the paths below with the appropriate paths):
+
<li>Using tcsh (replace the paths below with the appropriate paths):</li>
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
* Point to where you installed GotCloud
+
:<pre>setenv REF $SS/ancestry/ref&#10;setenv HGDP $SS/ancestry/HGDP&#10;setenv BAM $SS/ancestry/bams</pre>
*:<pre>setenv GC /home/username/gotcloud</pre>
  −
* Point to where you installed the seqshop files
  −
*:<pre>setenv SS /home/username/seqshop/</pre>
  −
* Point to where you want the output to go
  −
*:<pre>setenv OUT /home/username/seqshop_output/</pre>
   
</div>
 
</div>
 
</div>
 
</div>
 
+
</ul>
 +
</ul>
 
</div>
 
</div>
 
</div>
 
</div>
 +
 +
== Getting started ==
 +
Create a working directory:
 +
 +
mkdir $OUT/ancestry
 +
cd $OUT/ancestry
 +
 +
Download and decompress software package:
 +
 +
wget http://csg.sph.umich.edu//chaolong/LASER/LASER-2.01.tar.gz
 +
tar xzvf LASER-2.01.tar.gz
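
Every later command depends on the environment variables set during setup, so it can help to confirm they are defined before continuing. The following is a minimal sketch, not part of LASER or GotCloud: the function name <code>check_tutorial_env</code> is hypothetical, and the variable list (GC, REF, HGDP, BAM, OUT) is taken from the setup sections above.

```shell
# Hypothetical helper (not part of LASER or GotCloud).
# Prints the name of each tutorial variable that is unset or empty,
# one per line; prints nothing when all of them are set.
check_tutorial_env() {
    for name in GC REF HGDP BAM OUT; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Report the result for the current shell session
missing=$(check_tutorial_env)
if [ -n "$missing" ]; then
    echo "Unset variables:" $missing
else
    echo "All tutorial variables are set."
fi
```

If any names are reported, repeat the setup step for your environment before running the commands in the following sections.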
    
== Preparing input files for LASER ==
 
== Preparing input files for LASER ==
Line 82: Line 101:     
This step uses samtools to generate pileup files from bam files.  
 
This step uses samtools to generate pileup files from bam files.  
Please only try one sample so that we won't overload the server with everyone running 6 jobs at the same time. Pileup files for these 6 samples have been prepared for later steps.
  −
It takes about 2 mins for each pileup job.
     −
   $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > 121101035.recal.pileup &
+
<div class="mw-collapsible mw-collapsed" style="width:500px">
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > 121101043.recal.pileup &
+
In person at workshop notes:
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > 121101050.recal.pileup &
+
<div class="mw-collapsible-content">
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > 121101052.recal.pileup &
+
*Please only try one sample so that we won't overload the server with everyone running 6 jobs at the same time. Pileup files for these 6 samples have been prepared for later steps.
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > 121101415.recal.pileup &
+
*It takes about 2 mins for each pileup job.
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup &
+
</div>
 +
</div>
 +
 
 +
<div class="mw-collapsible" style="width:500px">
 +
Outside of the workshop notes:
 +
<div class="mw-collapsible-content">
 +
*The BAMs provided as part of the download are chr22-only BAMs. They are used to demonstrate how to run this step.
 +
*Pileup files for the whole genome BAMs are provided with the download and will be used in the next step.
 +
* You only need to try one of these.
 +
</div>
 +
</div>
 +
 
 +
   $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > 121101035.recal.pileup
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > 121101043.recal.pileup  
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > 121101050.recal.pileup
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > 121101052.recal.pileup
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > 121101415.recal.pileup
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup
    
We use -q 30 to exclude reads with a mapping quality score lower than 30, and -Q 20 to exclude bases with a base quality score lower than 20.
 
We use -q 30 to exclude reads with a mapping quality score lower than 30, and -Q 20 to exclude bases with a base quality score lower than 20.
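
The six commands above differ only in the sample ID, so they can also be generated from a list of IDs. The sketch below is a dry run under that assumption: it only prints the commands, with the same flags and paths as above, and runs nothing. The function name <code>print_mpileup_cmds</code> is hypothetical.

```shell
# Hypothetical helper: print (do not run) one samtools mpileup command
# per sample ID passed as an argument. $GC, $REF, $HGDP and $BAM are
# printed literally so that they are expanded only when a command is run.
print_mpileup_cmds() {
    for id in "$@"; do
        printf '%s\n' "\$GC/bin/samtools mpileup -q 30 -Q 20 -f \$REF/human.g1k.v37.fa -l \$HGDP/HGDP_938.bed \$BAM/$id.recal.bam > $id.recal.pileup"
    done
}

# Sample IDs of the six tutorial BAM files
print_mpileup_cmds 121101035 121101043 121101050 121101052 121101415 121101861
```

Printing the commands first lets you review them before running anything. Pass a single ID if you only want to try one sample, as recommended above.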
Line 98: Line 132:  
In this step, we will generate a file called "hapmap_trios.seq", containing the information of 6 samples. It takes about 30 seconds to run.
 
In this step, we will generate a file called "hapmap_trios.seq", containing the information of 6 samples. It takes about 30 seconds to run.
 
We will use the pre-generated pileup files in the $BAM folder.
 
We will use the pre-generated pileup files in the $BAM folder.
 +
* These pre-generated pileup files are for the whole genome of all 6 samples
    
  python ./LASER-2.01/pileup2seq/pileup2seq.py \
 
  python ./LASER-2.01/pileup2seq/pileup2seq.py \
Line 109: Line 144:  
  $BAM/121101052.recal.pileup \
 
  $BAM/121101052.recal.pileup \
 
  $BAM/121101415.recal.pileup \
 
  $BAM/121101415.recal.pileup \
  $BAM/121101861.recal.pileup &
+
  $BAM/121101861.recal.pileup
    
In the above command, -b provides the targeted regions to exclude and -i specifies alternative IDs for the BAM files to be used in the .seq file (including popID and indivID).  
 
In the above command, -b provides the targeted regions to exclude and -i specifies alternative IDs for the BAM files to be used in the .seq file (including popID and indivID).  
Line 150: Line 185:  
  less -S hapmap_trios.SeqPC.coord
 
  less -S hapmap_trios.SeqPC.coord
   −
The results should look like below:
+
The results should look similar to the following (exact values will vary slightly):
    
  popID  indivID  L1      Ci        K    t          PC1        PC2        PC3        PC4
 
  popID  indivID  L1      Ci        K    t          PC1        PC2        PC3        PC4
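
To look at only the first two principal components, the columns can be selected by header name. This is a minimal sketch that assumes the coordinate file is whitespace-delimited and starts with the header line shown above. The function name <code>print_pc1_pc2</code> is hypothetical.

```shell
# Hypothetical helper: print the indivID, PC1 and PC2 columns of a
# whitespace-delimited coordinate file, locating each column by its
# header name rather than by position.
print_pc1_pc2() {
    awk '
        NR == 1 {
            # Map each header name to its column index
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        { print $col["indivID"], $col["PC1"], $col["PC2"] }
    ' "$1"
}

# Usage (after the .coord file has been generated):
#   print_pc1_pc2 hapmap_trios.SeqPC.coord
```

Looking columns up by name keeps the command working if the column order differs from the one shown here.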