Changes

From Genome Analysis Wiki
Line 1: Line 1:  +
'''Note:''' the latest version of this practical is available at: [[SeqShop: Estimates of Genetic Ancestry Practical]]
 +
* The version here is the original one from the June workshop (updated so that it can be run from elsewhere)
 +
 +
 
== Introduction ==
 
== Introduction ==
 
See the [[Media:LASER-tutorial.pdf|tutorial slides]] for an introduction to the LASER analysis workflow, input/output file formats, and usage of the LASER software.
 
See the [[Media:LASER-tutorial.pdf|tutorial slides]] for an introduction to the LASER analysis workflow, input/output file formats, and usage of the LASER software.
Line 4: Line 8:  
The main purpose of this page is to provide step-by-step command lines for using LASER to estimate ancestry of 6 targeted sequenced samples (2 HapMap trios) in a principal component space generated using genome-wide SNP data from the Human Genome Diversity Project (HGDP). The HGDP reference panel contains genotype data across 632,958 autosomal loci for 938 individuals from 53 populations worldwide.
 
The main purpose of this page is to provide step-by-step command lines for using LASER to estimate ancestry of 6 targeted sequenced samples (2 HapMap trios) in a principal component space generated using genome-wide SNP data from the Human Genome Diversity Project (HGDP). The HGDP reference panel contains genotype data across 632,958 autosomal loci for 938 individuals from 53 populations worldwide.
   −
For more details about the options and usage of LASER, please read the [http://www.sph.umich.edu/csg/chaolong/LASER/LASER_Manual.pdf manual].
+
For more details about the options and usage of LASER, please read the [http://csg.sph.umich.edu//chaolong/LASER/LASER_Manual.pdf manual].
    
== LASER workflow ==
 
== LASER workflow ==
Line 50: Line 54:  
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
   −
This tutorial uses samtools from GotCloud, as well as example data downloaded in the Sequence Mapping & Assembly tutorial, so if you have not already installed GotCloud and the tutorial data in a previous tutorial, please do so now: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical#Setup_when_running_on_your_own_outside_of_the_SeqShop_Workshop|Tutorial Setup]]
+
This tutorial uses samtools from GotCloud, as well as example data downloaded in the Sequence Mapping & Assembly tutorial, so if you have not already installed GotCloud and the tutorial data in a previous tutorial, please do so now: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014#Setup_when_running_on_your_own_outside_of_the_SeqShop_Workshop|Tutorial Setup]]
       
{{SeqShopRemoteEnv}}
 
{{SeqShopRemoteEnv}}
</div>
  −
</div>
  −
  −
=== Setup your run environment ===
  −
  −
Environment variables will be used throughout the tutorial.
  −
  −
We recommend that you set up these variables so you won't have to modify every command in the tutorial.
      +
<ul>
 +
<li> Additional variables for Ancestry:</li>
 +
<ul>
 
<div class="mw-collapsible" style="width:500px">
 
<div class="mw-collapsible" style="width:500px">
I'm using bash (replace the paths below with the appropriate paths):
+
<li>Using bash (replace the paths below with the appropriate paths):</li>
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
* Point to where you installed GotCloud
+
:<pre>export REF=$SS/ancestry/ref&#10;export HGDP=$SS/ancestry/HGDP&#10;export BAM=$SS/ancestry/bams</pre>
*:<pre>export GC=/home/username/gotcloud</pre>
  −
* Point to where you installed the seqshop files
  −
*:<pre>export SS=/home/username/seqshop/</pre>
  −
* Point to where you want the output to go
  −
*:<pre>export OUT=/home/username/seqshop_output/</pre>
  −
* Additional variables for Ancestry:
  −
*:<pre>export REF=$SS/ancestry/ref&#10;export HGDP=$SS/ancestry/HGDP&#10;export BAM=$SS/ancestry/bams</pre>
   
</div>
 
</div>
 
</div>
 
</div>
   
<div class="mw-collapsible mw-collapsed" style="width:500px">
 
<div class="mw-collapsible mw-collapsed" style="width:500px">
I'm using tcsh (replace the paths below with the appropriate paths):
+
<li>Using tcsh (replace the paths below with the appropriate paths):</li>
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
* Point to where you installed GotCloud
+
:<pre>setenv REF $SS/ancestry/ref&#10;setenv HGDP $SS/ancestry/HGDP&#10;setenv BAM $SS/ancestry/bams</pre>
*:<pre>setenv GC /home/username/gotcloud</pre>
  −
* Point to where you installed the seqshop files
  −
*:<pre>setenv SS /home/username/seqshop/</pre>
  −
* Point to where you want the output to go
  −
*:<pre>setenv OUT /home/username/seqshop_output/</pre>
  −
* Additional variables for Ancestry:
  −
*:<pre>setenv REF $SS/ancestry/ref&#10;setenv HGDP $SS/ancestry/HGDP&#10;setenv BAM $SS/ancestry/bams</pre>
   
</div>
 
</div>
 
</div>
 
</div>
 
+
</ul>
 +
</ul>
 
</div>
 
</div>
 
</div>
 
</div>
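A quick sanity check before continuing can catch a mistyped path early. The sketch below is not part of the original tutorial: the function name <code>check_env</code> is hypothetical, it is written for bash/sh (not tcsh), and it assumes that all six variables used on this page (GC, SS, OUT, REF, HGDP, BAM) should point to directories that already exist.

```shell
# Hypothetical helper (not part of GotCloud or LASER): report tutorial
# variables that are unset or do not point to an existing directory.
check_env() {
  bad=0
  for name in GC SS OUT REF HGDP BAM; do
    eval "value=\${$name:-}"   # indirect lookup of the variable named in $name
    if [ -z "$value" ]; then
      echo "unset: $name"
      bad=1
    elif [ ! -d "$value" ]; then
      echo "not a directory: $name=$value"
      bad=1
    fi
  done
  return $bad
}

check_env || echo "Fix the variables listed above before continuing."
```

If your output directory has not been created yet, OUT will be reported; create it with <code>mkdir -p $OUT</code> and run the check again.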
      
== Getting started ==
 
== Getting started ==
Line 103: Line 87:  
Download and decompress software package:
 
Download and decompress software package:
   −
  wget http://www.sph.umich.edu/csg/chaolong/LASER/LASER-2.01.tar.gz
+
  wget http://csg.sph.umich.edu//chaolong/LASER/LASER-2.01.tar.gz
 
  tar xzvf LASER-2.01.tar.gz
 
  tar xzvf LASER-2.01.tar.gz
      
== Preparing input files for LASER ==
 
== Preparing input files for LASER ==
Line 127: Line 110:  
</div>
 
</div>
   −
<div class="mw-collapsible mw-collapsed" style="width:500px">
+
<div class="mw-collapsible" style="width:500px">
 
Outside of the workshop notes:
 
Outside of the workshop notes:
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
 
*The BAMs provided as part of the download are chr22 only BAMs.  They are used to demonstrate how to run this step.
 
*The BAMs provided as part of the download are chr22 only BAMs.  They are used to demonstrate how to run this step.
 
*Pileup files for the whole genome BAMs are provided with the download and will be used in the next step.
 
*Pileup files for the whole genome BAMs are provided with the download and will be used in the next step.
 +
* You only need to try one of these.
 
</div>
 
</div>
 
</div>
 
</div>
   −
   $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > 121101035.recal.pileup &
+
   $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > 121101035.recal.pileup
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > 121101043.recal.pileup &
+
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > 121101043.recal.pileup  
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > 121101050.recal.pileup &
+
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > 121101050.recal.pileup
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > 121101052.recal.pileup &
+
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > 121101052.recal.pileup
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > 121101415.recal.pileup &
+
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > 121101415.recal.pileup
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup &
+
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup
    
We use -q 30 to skip reads with mapping quality below 30 and -Q 20 to skip bases with base quality below 20.
 
We use -q 30 to skip reads with mapping quality below 30 and -Q 20 to skip bases with base quality below 20.
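If you do want pileup files for all six samples, the per-sample commands above can be generated rather than typed out. The following is a minimal sketch for bash/sh, not part of the original tutorial; the helper name <code>gen_pileup_cmds</code> is hypothetical, and the sample IDs and options are copied from the commands above. It only prints the commands, so nothing runs until you pipe the output to <code>sh</code>.

```shell
# Sample IDs copied from the mpileup commands above.
SAMPLES="121101035 121101043 121101050 121101052 121101415 121101861"

# Hypothetical helper: print one mpileup command per sample.
# \$GC, \$REF, \$HGDP and \$BAM are deliberately left unexpanded so that
# they are resolved from the environment when the printed commands are run.
gen_pileup_cmds() {
  for id in $SAMPLES; do
    printf '%s\n' "\$GC/bin/samtools mpileup -q 30 -Q 20 -f \$REF/human.g1k.v37.fa -l \$HGDP/HGDP_938.bed \$BAM/$id.recal.bam > $id.recal.pileup"
  done
}

gen_pileup_cmds          # review the commands first
# gen_pileup_cmds | sh   # then uncomment to run them one after another
```

Running the commands sequentially avoids starting six samtools processes at once; this requires that GC, REF, HGDP and BAM are exported as described in the setup section.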
Line 160: Line 144:  
  $BAM/121101052.recal.pileup \
 
  $BAM/121101052.recal.pileup \
 
  $BAM/121101415.recal.pileup \
 
  $BAM/121101415.recal.pileup \
  $BAM/121101861.recal.pileup &
+
  $BAM/121101861.recal.pileup
    
In the above command, -b provides the targeted regions to exclude and -i specifies alternative IDs for the BAM files to be used in the .seq file (including popID and indivID).  
 
In the above command, -b provides the targeted regions to exclude and -i specifies alternative IDs for the BAM files to be used in the .seq file (including popID and indivID).  
Line 201: Line 185:  
  less -S hapmap_trios.SeqPC.coord
 
  less -S hapmap_trios.SeqPC.coord
   −
The results should look like below:
+
The results should look similar to the following (exact values will vary slightly):
    
  popID  indivID  L1      Ci        K    t          PC1        PC2        PC3        PC4
 
  popID  indivID  L1      Ci        K    t          PC1        PC2        PC3        PC4
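To look at only the sample IDs and the first two principal components, the coordinate file can be filtered with awk. This is a sketch under stated assumptions, not part of the original tutorial: the helper name <code>show_pc12</code> is hypothetical, and it assumes the file is whitespace-delimited with one header row and columns in the order shown above (popID, indivID, L1, Ci, K, t, PC1, PC2, ...).

```shell
# Hypothetical helper: print indivID, PC1 and PC2 from a .coord file,
# assuming whitespace-separated columns in the order given by the header
# (popID indivID L1 Ci K t PC1 PC2 ...). The header row itself is skipped.
show_pc12() {
  awk 'NR > 1 { print $2, $7, $8 }' "$1"
}

# Example call (the file is produced by the LASER run above):
# show_pc12 hapmap_trios.SeqPC.coord
```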