Line 1: |
Line 1: |
| + | '''Note:''' the latest version of this practical is available at: [[SeqShop: Estimates of Genetic Ancestry Practical]] |
| + | * The version here is the original from the June workshop (updated to be run from elsewhere) |
| + | |
| + | |
| == Introduction == | | == Introduction == |
| See the [[Media:LASER-tutorial.pdf|tutorial slides]] for an introduction to the LASER analysis workflow, input/output file formats, and usage of the LASER software. | | See the [[Media:LASER-tutorial.pdf|tutorial slides]] for an introduction to the LASER analysis workflow, input/output file formats, and usage of the LASER software. |
Line 4: |
Line 8: |
| The main purpose of this page is to provide step-by-step command lines for using LASER to estimate ancestry of 6 targeted sequenced samples (2 HapMap trios) in a principal component space generated using genome-wide SNP data from the Human Genome Diversity Project (HGDP). The HGDP reference panel contains genotype data across 632,958 autosomal loci for 938 individuals from 53 populations worldwide. | | The main purpose of this page is to provide step-by-step command lines for using LASER to estimate ancestry of 6 targeted sequenced samples (2 HapMap trios) in a principal component space generated using genome-wide SNP data from the Human Genome Diversity Project (HGDP). The HGDP reference panel contains genotype data across 632,958 autosomal loci for 938 individuals from 53 populations worldwide. |
| | | |
− | For more details about the options and usage of LASER, please read the [http://www.sph.umich.edu/csg/chaolong/LASER/LASER_Manual.pdf manual]. | + | For more details about the options and usage of LASER, please read the [http://csg.sph.umich.edu//chaolong/LASER/LASER_Manual.pdf manual]. |
| | | |
| == LASER workflow == | | == LASER workflow == |
Line 50: |
Line 54: |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| | | |
− | This tutorial uses samtools from GotCloud, as well as example data downloaded in the Sequence Mapping & Assembly tutorial, so if you have not already installed GotCloud and the tutorial data in a previous tutorial, please do so now: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical#Setup_when_running_on_your_own_outside_of_the_SeqShop_Workshop|Tutorial Setup]] | + | This tutorial uses samtools from GotCloud, as well as example data downloaded in the Sequence Mapping & Assembly tutorial, so if you have not already installed GotCloud and the tutorial data in a previous tutorial, please do so now: [[SeqShop:_Sequence_Mapping_and_Assembly_Practical, June 2014#Setup_when_running_on_your_own_outside_of_the_SeqShop_Workshop|Tutorial Setup]] |
| | | |
| | | |
| {{SeqShopRemoteEnv}} | | {{SeqShopRemoteEnv}} |
− | </div>
| |
− | </div>
| |
− |
| |
− | === Setup your run environment ===
| |
− |
| |
− | Environment variables will be used throughout the tutorial.
| |
− |
| |
− | We recommend that you setup these variables so you won't have to modify every command in the tutorial.
| |
| | | |
| + | <ul> |
| + | <li> Additional variables for Ancestry:</li> |
| + | <ul> |
| <div class="mw-collapsible" style="width:500px"> | | <div class="mw-collapsible" style="width:500px"> |
− | I'm using bash (replace the paths below with the appropriate paths):
| + | <li>Using bash (replace the paths below with the appropriate paths):</li> |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | * Point to where you installed GotCloud
| + | :<pre>export REF=$SS/ancestry/ref; export HGDP=$SS/ancestry/HGDP; export BAM=$SS/ancestry/bams</pre> |
− | *:<pre>export GC=/home/username/gotcloud</pre>
| |
− | * Point to where you installed the seqshop files
| |
− | *:<pre>export SS=/home/username/seqshop/</pre>
| |
− | * Point to where you want the output to go
| |
− | *:<pre>export OUT=/home/username/seqshop_output/</pre>
| |
− | * Additional variables for Ancestry:
| |
− | *:<pre>export REF=$SS/ancestry/ref; export HGDP=$SS/ancestry/HGDP; export BAM=$SS/ancestry/bams</pre>
| |
| </div> | | </div> |
| </div> | | </div> |
− |
| |
| <div class="mw-collapsible mw-collapsed" style="width:500px"> | | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
− | I'm using tcsh (replace the paths below with the appropriate paths):
| + | <li>Using tcsh (replace the paths below with the appropriate paths):</li> |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | * Point to where you installed GotCloud
| + | :<pre>setenv REF $SS/ancestry/ref; setenv HGDP $SS/ancestry/HGDP; setenv BAM $SS/ancestry/bams</pre> |
− | *:<pre>setenv GC /home/username/gotcloud</pre>
| |
− | * Point to where you installed the seqshop files
| |
− | *:<pre>setenv SS /home/username/seqshop/</pre>
| |
− | * Point to where you want the output to go
| |
− | *:<pre>setenv OUT /home/username/seqshop_output/</pre>
| |
− | * Additional variables for Ancestry:
| |
− | *:<pre>setenv REF $SS/ancestry/ref; setenv HGDP $SS/ancestry/HGDP; setenv BAM $SS/ancestry/bams</pre>
| |
| </div> | | </div> |
| </div> | | </div> |
− | | + | </ul> |
| + | </ul> |
| </div> | | </div> |
| </div> | | </div> |
− |
| |
| | | |
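Before moving on, it can help to confirm that the variables used in this tutorial (GC, SS, OUT, REF, HGDP, BAM) are set and point to existing directories. The following is a minimal sketch for bash/sh users only; <code>check_env</code> is a hypothetical helper name introduced here for illustration and is not part of GotCloud or LASER.

```shell
# Sketch: report which tutorial variables are unset or do not point to a directory.
# check_env is a hypothetical helper, not part of GotCloud or LASER.
check_env() {
    rc=0
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "MISSING $name"
            rc=1
        elif [ ! -d "$value" ]; then
            echo "NOT_A_DIR $name=$value"
            rc=1
        else
            echo "OK $name=$value"
        fi
    done
    return "$rc"
}

# Usage: check_env GC SS OUT REF HGDP BAM
```

The function returns a non-zero status if any variable is missing or does not name a directory, so it can also be used as a guard at the top of a script.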
| == Getting started == | | == Getting started == |
Line 103: |
Line 87: |
| Download and decompress the software package: | | Download and decompress the software package: |
| | | |
− | wget http://www.sph.umich.edu/csg/chaolong/LASER/LASER-2.01.tar.gz | + | wget http://csg.sph.umich.edu//chaolong/LASER/LASER-2.01.tar.gz |
| tar xzvf LASER-2.01.tar.gz | | tar xzvf LASER-2.01.tar.gz |
− |
| |
| | | |
| == Preparing input files for LASER == | | == Preparing input files for LASER == |
Line 127: |
Line 110: |
| </div> | | </div> |
| | | |
− | <div class="mw-collapsible mw-collapsed" style="width:500px"> | + | <div class="mw-collapsible" style="width:500px"> |
| Outside of the workshop notes: | | Outside of the workshop notes: |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| *The BAMs provided as part of the download are chr22-only BAMs. They are used to demonstrate how to run this step. | | *The BAMs provided as part of the download are chr22-only BAMs. They are used to demonstrate how to run this step. |
| *Pileup files for the whole genome BAMs are provided with the download and will be used in the next step. | | *Pileup files for the whole genome BAMs are provided with the download and will be used in the next step. |
| + | * You only need to try one of these. |
| </div> | | </div> |
| </div> | | </div> |
| | | |
− | $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > 121101035.recal.pileup & | + | $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > 121101035.recal.pileup |
− | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > 121101043.recal.pileup & | + | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > 121101043.recal.pileup |
− | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > 121101050.recal.pileup & | + | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > 121101050.recal.pileup |
− | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > 121101052.recal.pileup & | + | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > 121101052.recal.pileup |
− | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > 121101415.recal.pileup & | + | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > 121101415.recal.pileup |
− | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup & | + | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup |
| | | |
| We use -q 30 and -Q 20 to exclude reads with a mapping quality score lower than 30 and bases with a base quality score lower than 20. | | We use -q 30 and -Q 20 to exclude reads with a mapping quality score lower than 30 and bases with a base quality score lower than 20. |
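The six mpileup commands differ only in the sample ID, so they can be generated from a list. The sketch below assumes bash/sh and that GC, REF, HGDP and BAM are set as described in the setup; the sample IDs and options are copied from the commands above, while <code>mpileup_cmd</code> and <code>print_mpileup_cmds</code> are hypothetical helper names. It only prints the command lines rather than running them.

```shell
# Sketch: build the samtools mpileup command line for each of the six samples.
# Sample IDs and options are taken from the commands in this section;
# the function names are hypothetical helpers.
SAMPLES="121101035 121101043 121101050 121101052 121101415 121101861"

mpileup_cmd() {
    # $1 = sample ID; prints the command without running it
    echo "$GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/$1.recal.bam"
}

print_mpileup_cmds() {
    # One line per sample, with output redirected to <ID>.recal.pileup
    for id in $SAMPLES; do
        echo "$(mpileup_cmd "$id") > $id.recal.pileup"
    done
}

# Usage: print_mpileup_cmds          (inspect the commands)
#        print_mpileup_cmds | sh     (run them, assuming paths contain no spaces)
```

Printing first makes it easy to check the paths before anything is executed. Outside of the workshop only one sample needs to be run, since pileup files for the whole-genome BAMs are provided with the download.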
Line 160: |
Line 144: |
| $BAM/121101052.recal.pileup \ | | $BAM/121101052.recal.pileup \ |
| $BAM/121101415.recal.pileup \ | | $BAM/121101415.recal.pileup \ |
− | $BAM/121101861.recal.pileup & | + | $BAM/121101861.recal.pileup |
| | | |
| In the above command, -b provides the targeted regions to exclude and -i specifies alternative IDs for the BAM files to be used in the .seq file (including popID and indivID). | | In the above command, -b provides the targeted regions to exclude and -i specifies alternative IDs for the BAM files to be used in the .seq file (including popID and indivID). |
Line 201: |
Line 185: |
| less -S hapmap_trios.SeqPC.coord | | less -S hapmap_trios.SeqPC.coord |
| | | |
− | The results should look like the following: | + | The results should look like the following (results will vary slightly): |
| | | |
| popID indivID L1 Ci K t PC1 PC2 PC3 PC4 | | popID indivID L1 Ci K t PC1 PC2 PC3 PC4 |