Changes

From Genome Analysis Wiki
Jump to navigationJump to search
Line 1: Line 1:  +
'''Note:''' the latest version of this practical is available at: [[SeqShop: Estimates of Genetic Ancestry Practical]]
 +
* The ones here is the original one from the June workshop (updated to be run from elsewhere)
 +
 +
 
== Introduction ==
 
== Introduction ==
 
See the [[Media:LASER-tutorial.pdf|tutorial slides]] for an introduction of the LASER analysis workflow, input/output file formats, and usage of the LASER software.
 
See the [[Media:LASER-tutorial.pdf|tutorial slides]] for an introduction of the LASER analysis workflow, input/output file formats, and usage of the LASER software.
Line 4: Line 8:  
The main purpose of this page is to provide step-by-step command lines for using LASER to estimate ancestry of 6 targeted sequenced samples (2 HapMap trios) in a principal component space generated using genome-wide SNP data from the Human Genome Diversity Project (HGDP). The HGDP reference panel contains genotype data across 632,958 autosomal loci for 938 individuals from 53 populations worldwide.
 
The main purpose of this page is to provide step-by-step command lines for using LASER to estimate ancestry of 6 targeted sequenced samples (2 HapMap trios) in a principal component space generated using genome-wide SNP data from the Human Genome Diversity Project (HGDP). The HGDP reference panel contains genotype data across 632,958 autosomal loci for 938 individuals from 53 populations worldwide.
   −
For more details about the options and usage of LASER, please read the [http://www.sph.umich.edu/csg/chaolong/LASER/LASER_Manual.pdf manual].
+
For more details about the options and usage of LASER, please read the [http://csg.sph.umich.edu//chaolong/LASER/LASER_Manual.pdf manual].
    
== LASER workflow ==
 
== LASER workflow ==
 
[[File:LASER-workflow.png|thumb|center|alt=LASER workflow|400px|LASER workflow]]
 
[[File:LASER-workflow.png|thumb|center|alt=LASER workflow|400px|LASER workflow]]
   −
== Getting started ==
  −
Create a working directory:
     −
mkdir ancestry
+
== Setup in person at the SeqShop Workshop ==
cd ancestry
+
''This section is specifically for the SeqShop Workshop computers.''
 +
<div class="mw-collapsible mw-collapsed" style="width:600px">
 +
''If you are not running during the SeqShop Workshop, please skip this section.''
 +
<div class="mw-collapsible-content">
 +
 
   −
Download and decompress software package:
+
{{SeqShopLogin}}
   −
wget http://www.sph.umich.edu/csg/chaolong/LASER/LASER-2.01.tar.gz
+
=== Setup your run environment===
tar xzvf LASER-2.01.tar.gz
+
This is the same setup you did for the previous tutorial, but you need to redo it each time you log in.
   −
Set up to access data:
+
This will setup some environment variables to point you to
 +
* Tutorial input files
 
  source /home/chaolong/LASER-Tutorial/setup.txt
 
  source /home/chaolong/LASER-Tutorial/setup.txt
 +
* You won't see any output after running <code>source</code>
 +
** It silently sets up your environment
    +
<div class="mw-collapsible mw-collapsed" style="width:400px">
 
What is in the setup.txt file:
 
What is in the setup.txt file:
 +
<div class="mw-collapsible-content">
 
  export GC=/home/mktrost/seqshop/gotcloud
 
  export GC=/home/mktrost/seqshop/gotcloud
 
  export REF=/home/mktrost/seqshop/reference/all
 
  export REF=/home/mktrost/seqshop/reference/all
 
  export HGDP=/home/chaolong/LASER-Tutorial/HGDP
 
  export HGDP=/home/chaolong/LASER-Tutorial/HGDP
 
  export BAM=/home/chaolong/LASER-Tutorial/BAM
 
  export BAM=/home/chaolong/LASER-Tutorial/BAM
 +
</div>
 +
</div>
 +
 +
Point to where you want the output to go replacing the path with where you would like your output to go
 +
<pre>export OUT=~/seqshop_output/</pre>
 +
</div>
 +
</div>
 +
 +
 +
== Setup when running on your own outside of the SeqShop Workshop ==
 +
''This section is specifically for running on your own outside of the SeqShop Workshop.''
 +
<div class="mw-collapsible" style="width:600px">
 +
''If you are running during the SeqShop Workshop, please skip this section.''
 +
<div class="mw-collapsible-content">
 +
 +
This tutorial uses samtools from GotCloud, as well as example data downloaded in the Sequence Mapping & Assembly tutorial, so if you have not already installed GotCloud and the tutorial data in a previous tutorial, please do so now: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014#Setup_when_running_on_your_own_outside_of_the_SeqShop_Workshop|Tutorial Setup]]
 +
 +
 +
{{SeqShopRemoteEnv}}
 +
 +
<ul>
 +
<li> Additional variables for Ancestry:</li>
 +
<ul>
 +
<div class="mw-collapsible" style="width:500px">
 +
<li>Using bash (replace the paths below with the appropriate paths):</li>
 +
<div class="mw-collapsible-content">
 +
:<pre>export REF=$SS/ancestry/ref&#10;export HGDP=$SS/ancestry/HGDP&#10;export BAM=$SS/ancestry/bams</pre>
 +
</div>
 +
</div>
 +
<div class="mw-collapsible mw-collapsed" style="width:500px">
 +
<li>Using tcsh (replace the paths below with the appropriate paths):</li>
 +
<div class="mw-collapsible-content">
 +
:<pre>setenv REF $SS/ancestry/ref&#10;setenv HGDP $SS/ancestry/HGDP&#10;setenv BAM $SS/ancestry/bams</pre>
 +
</div>
 +
</div>
 +
</ul>
 +
</ul>
 +
</div>
 +
</div>
 +
 +
== Getting started ==
 +
Create a working directory:
 +
 +
mkdir $OUT/ancestry
 +
cd $OUT/ancestry
 +
 +
Download and decompress software package:
 +
 +
wget http://csg.sph.umich.edu//chaolong/LASER/LASER-2.01.tar.gz
 +
tar xzvf LASER-2.01.tar.gz
    
== Preparing input files for LASER ==
 
== Preparing input files for LASER ==
Line 40: Line 101:     
This step uses samtools to generate pileup files from bam files.  
 
This step uses samtools to generate pileup files from bam files.  
Please only try one sample so that we won't overload the sever with everyone running 6 jobs at the same time. Pileup files for these 6 samples have been prepared for later steps.
  −
It takes about 2 mins for each pileup job.
     −
   $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > 121101035.recal.pileup &
+
<div class="mw-collapsible mw-collapsed" style="width:500px">
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > 121101043.recal.pileup &
+
In person at workshop notes:
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > 121101050.recal.pileup &
+
<div class="mw-collapsible-content">
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > 121101052.recal.pileup &
+
*Please only try one sample so that we won't overload the sever with everyone running 6 jobs at the same time. Pileup files for these 6 samples have been prepared for later steps.
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > 121101415.recal.pileup &
+
*It takes about 2 mins for each pileup job.
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup &
+
</div>
 +
</div>
 +
 
 +
<div class="mw-collapsible" style="width:500px">
 +
Outside of the workshop notes:
 +
<div class="mw-collapsible-content">
 +
*The BAMs provided as part of the download are chr22 only BAMs.  They are used to demonstrate how to run this step.
 +
*Pileup files for the whole genome BAMs are provided with the download and will be used in the next step.
 +
* You only need to try one of these.
 +
</div>
 +
</div>
 +
 
 +
   $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > 121101035.recal.pileup
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > 121101043.recal.pileup  
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > 121101050.recal.pileup
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > 121101052.recal.pileup
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > 121101415.recal.pileup
 +
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup
 +
 
 +
We use -q 30 and -Q 20 to exclude reads that have mapping quality score lower than 30 or base quality score lower than 20.
    
=== Step 2: pileup --> seq ===
 
=== Step 2: pileup --> seq ===
Line 54: Line 132:  
In this step, we will generate a file called "hapmap_trios.seq", containing the information of 6 samples. It takes about 30 seconds to run.
 
In this step, we will generate a file called "hapmap_trios.seq", containing the information of 6 samples. It takes about 30 seconds to run.
 
We will use the pre-generated pileup files in the $BAM folder.
 
We will use the pre-generated pileup files in the $BAM folder.
 +
* These pre-generated pileup files are for the whole genome of all 6 samples
    
  python ./LASER-2.01/pileup2seq/pileup2seq.py \
 
  python ./LASER-2.01/pileup2seq/pileup2seq.py \
Line 65: Line 144:  
  $BAM/121101052.recal.pileup \
 
  $BAM/121101052.recal.pileup \
 
  $BAM/121101415.recal.pileup \
 
  $BAM/121101415.recal.pileup \
  $BAM/121101861.recal.pileup &
+
  $BAM/121101861.recal.pileup
    
In the above command, -b provides the targeted regions to exclude and -i specifies alternative IDs for the BAM files to be used in the .seq file (including popID and indivID).  
 
In the above command, -b provides the targeted regions to exclude and -i specifies alternative IDs for the BAM files to be used in the .seq file (including popID and indivID).  
Line 106: Line 185:  
  less -S hapmap_trios.SeqPC.coord
 
  less -S hapmap_trios.SeqPC.coord
   −
The results should look like below:
+
The results should look like below (results will vary slightly):
    
  popID  indivID  L1      Ci        K    t          PC1        PC2        PC3        PC4
 
  popID  indivID  L1      Ci        K    t          PC1        PC2        PC3        PC4
Line 131: Line 210:  
  evince Results_on_HGDP.pdf &
 
  evince Results_on_HGDP.pdf &
   −
We expect to see the following figure, in which 3 HapMap CEU samples cluster with HGDP Europeans and 3 HapMap YRI samples cluster with HGDP Africans:  
+
We expect to see the following figure, in which 3 CEU samples cluster with HGDP Europeans and 3 YRI samples cluster with HGDP Africans:  
 
[[File:Results_on_HGDP.png|thumb|center|alt=LASER results|400px|LASER results]]
 
[[File:Results_on_HGDP.png|thumb|center|alt=LASER results|400px|LASER results]]
96

edits

Navigation menu