Line 9: |
Line 9: |
| [[File:LASER-workflow.png|thumb|center|alt=LASER workflow|400px|LASER workflow]] | | [[File:LASER-workflow.png|thumb|center|alt=LASER workflow|400px|LASER workflow]] |
| | | |
− | {{SeqShopLogin}}
| |
| | | |
− | == Getting started == | + | == Setup in person at the SeqShop Workshop == |
− | Create a working directory:
| + | ''This section is specifically for the SeqShop Workshop computers.'' |
| + | <div class="mw-collapsible mw-collapsed" style="width:600px"> |
| + | ''If you are not running during the SeqShop Workshop, please skip this section.'' |
| + | <div class="mw-collapsible-content"> |
| | | |
− | mkdir ancestry
| |
− | cd ancestry
| |
| | | |
− | Download and decompress software package:
| + | {{SeqShopLogin}} |
| | | |
− | wget http://www.sph.umich.edu/csg/chaolong/LASER/LASER-2.01.tar.gz
| + | === Setup your run environment === |
− | tar xzvf LASER-2.01.tar.gz
| + | This is the same setup you did for the previous tutorial, but you need to redo it each time you log in. |
| | | |
− | Set up to access data:
| + | This sets up environment variables that point to: |
| + | * Tutorial input files |
| source /home/chaolong/LASER-Tutorial/setup.txt | | source /home/chaolong/LASER-Tutorial/setup.txt |
| + | * You won't see any output after running <code>source</code> |
| + | ** It silently sets up your environment |
| | | |
| + | <div class="mw-collapsible mw-collapsed" style="width:400px"> |
| What is in the setup.txt file: | | What is in the setup.txt file: |
| + | <div class="mw-collapsible-content"> |
| export GC=/home/mktrost/seqshop/gotcloud | | export GC=/home/mktrost/seqshop/gotcloud |
| export REF=/home/mktrost/seqshop/reference/all | | export REF=/home/mktrost/seqshop/reference/all |
| export HGDP=/home/chaolong/LASER-Tutorial/HGDP | | export HGDP=/home/chaolong/LASER-Tutorial/HGDP |
| export BAM=/home/chaolong/LASER-Tutorial/BAM | | export BAM=/home/chaolong/LASER-Tutorial/BAM |
| + | </div> |
| + | </div> |
| + | |
| + | Set where you want the output to go, replacing the path below with your preferred output directory: |
| + | <pre>export OUT=~/seqshop_output/</pre> |
| + | </div> |
| + | </div> |
| + | |
| | | |
| == Setup when running on your own outside of the SeqShop Workshop == | | == Setup when running on your own outside of the SeqShop Workshop == |
Line 37: |
Line 50: |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| === Download the example data === | | === Download the example data === |
| + | |
| | | |
| === Setup your run environment === | | === Setup your run environment === |
Line 53: |
Line 67: |
| * Point to where you want the output to go | | * Point to where you want the output to go |
| *:<pre>export OUT=/home/username/seqshop_output/</pre> | | *:<pre>export OUT=/home/username/seqshop_output/</pre> |
| + | * Additional variables for Ancestry: |
| + | *:<pre>export REF=$SS/ancestry/ref |
| + | export HGDP=$SS/ancestry/HGDP |
| + | export BAM=$SS/ancestry/bams</pre> |
| </div> | | </div> |
| </div> | | </div> |
Line 65: |
Line 81: |
| * Point to where you want the output to go | | * Point to where you want the output to go |
| *:<pre>setenv OUT /home/username/seqshop_output/</pre> | | *:<pre>setenv OUT /home/username/seqshop_output/</pre> |
| + | * Additional variables for Ancestry: |
| + | *:<pre>setenv REF $SS/ancestry/ref |
| + | setenv HGDP $SS/ancestry/HGDP |
| + | setenv BAM $SS/ancestry/bams</pre> |
| </div> | | </div> |
| </div> | | </div> |
Line 70: |
Line 88: |
| </div> | | </div> |
| </div> | | </div> |
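Because <code>source</code> and <code>export</code> print nothing, it can help to confirm that the variables used in this tutorial are set before moving on. A minimal sketch, assuming a POSIX-compatible shell such as bash; the helper name <code>check_env</code> is hypothetical and is not part of the tutorial files:

```shell
# Hypothetical helper: report any tutorial variable that is unset or empty.
# Returns 0 when all of GC, REF, HGDP, BAM and OUT are set, 1 otherwise.
check_env() {
  missing=0
  for name in GC REF HGDP BAM OUT; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set"
      missing=1
    fi
  done
  return $missing
}
```

Running <code>check_env</code> after the setup steps should print nothing when everything is in place.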
| + | |
| + | |
| + | == Getting started == |
| + | Create a working directory: |
| + | |
| + | mkdir $OUT/ancestry |
| + | cd $OUT/ancestry |
| + | |
| + | Download and decompress software package: |
| + | |
| + | wget http://www.sph.umich.edu/csg/chaolong/LASER/LASER-2.01.tar.gz |
| + | tar xzvf LASER-2.01.tar.gz |
| + | |
| | | |
| == Preparing input files for LASER == | | == Preparing input files for LASER == |
Line 82: |
Line 113: |
| | | |
| This step uses samtools to generate pileup files from bam files. | | This step uses samtools to generate pileup files from bam files. |
− | Please only try one sample so that we won't overload the server with everyone running 6 jobs at the same time. Pileup files for these 6 samples have been prepared for later steps.
| |
− | It takes about 2 mins for each pileup job.
| |
| | | |
− | $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > $OUT/121101035.recal.pileup & | + | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
− | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > $OUT/121101043.recal.pileup & | + | In person at workshop notes: |
− | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > $OUT/121101050.recal.pileup & | + | <div class="mw-collapsible-content"> |
− | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > $OUT/121101052.recal.pileup & | + | *Please only try one sample so that we won't overload the server with everyone running 6 jobs at the same time. Pileup files for these 6 samples have been prepared for later steps. |
− | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > $OUT/121101415.recal.pileup & | + | *It takes about 2 mins for each pileup job. |
− | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > $OUT/121101861.recal.pileup & | + | </div> |
| + | </div> |
| + | |
| + | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
| + | Outside of the workshop notes: |
| + | <div class="mw-collapsible-content"> |
| + | *The BAMs provided as part of the download are chr22-only BAMs. They are used to demonstrate how to run this step. |
| + | *Pileup files for the whole genome BAMs are provided with the download and will be used in the next step. |
| + | </div> |
| + | </div> |
| + | |
| + | $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101035.recal.bam > 121101035.recal.pileup & |
| + | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101043.recal.bam > 121101043.recal.pileup & |
| + | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101050.recal.bam > 121101050.recal.pileup & |
| + | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101052.recal.bam > 121101052.recal.pileup & |
| + | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101415.recal.bam > 121101415.recal.pileup & |
| + | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup & |
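The six commands above differ only in the sample ID, so outside the workshop they could be generated in a loop. A sketch, assuming bash or another POSIX shell and the environment variables from the setup section; the function name <code>print_pileup_cmds</code> is hypothetical. It prints the commands instead of running them, so they can be reviewed first:

```shell
# Sample IDs copied from the samtools commands above.
SAMPLES="121101035 121101043 121101050 121101052 121101415 121101861"

# Hypothetical helper: print one mpileup command per sample (dry run).
# Output files are written relative to the current working directory.
print_pileup_cmds() {
  for id in $SAMPLES; do
    printf '%s\n' "$GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/human.g1k.v37.fa -l $HGDP/HGDP_938.bed $BAM/$id.recal.bam > $id.recal.pileup"
  done
}
```

Piping the output to a shell (<code>print_pileup_cmds | sh</code>) would run the jobs one after another rather than all at once, which is gentler on a shared machine.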
| | | |
| We use -q 30 and -Q 20 to exclude reads with a mapping quality score lower than 30 and bases with a base quality score lower than 20. | | We use -q 30 and -Q 20 to exclude reads with a mapping quality score lower than 30 and bases with a base quality score lower than 20. |
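Both thresholds are on the Phred scale, where a score Q corresponds to an error probability of 10^(-Q/10). A small sketch of that conversion using awk; the function name <code>phred_to_error</code> is hypothetical:

```shell
# Hypothetical helper: convert a Phred-scaled quality score to an
# error probability, printed with four decimal places.
phred_to_error() {
  awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}
```

For example, <code>phred_to_error 20</code> gives the error probability allowed per base by -Q 20 (1 in 100), and <code>phred_to_error 30</code> gives the probability of a wrong alignment allowed by -q 30 (1 in 1000).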
Line 98: |
Line 143: |
| In this step, we will generate a file called "hapmap_trios.seq", containing the information of 6 samples. It takes about 30 seconds to run. | | In this step, we will generate a file called "hapmap_trios.seq", containing the information of 6 samples. It takes about 30 seconds to run. |
| We will use the pre-generated pileup files in the $BAM folder. | | We will use the pre-generated pileup files in the $BAM folder. |
| + | * These pre-generated pileup files are for the whole genome of all 6 samples |
| | | |
| python ./LASER-2.01/pileup2seq/pileup2seq.py \ | | python ./LASER-2.01/pileup2seq/pileup2seq.py \ |