Changes

From Genome Analysis Wiki
Jump to: navigation, search

SeqShop: Ancestry On Your Own Genome, December 2014

2,920 bytes added, 12:43, 12 December 2014
Visualizing Ancestry
__TOC__
 
== Setup ==
<div class="mw-collapsible" style="width:600px">
''If you are already logged in, you can skip this section.''
<div class="mw-collapsible-content">
{{SeqShopLogin}}
</div>
</div>
 
== Setup your terminal ==
 
Resume screen:
screen -r
 
<div class="mw-collapsible mw-collapsed" style= Setup =="width:700px">''If you were sequenced'Only if your snpcalling is still running, set let Mary Kate know, expand, and follow these values:instructions'''<div class="mw-collapsible-content">* Detach from screen Ctrl-a d* Start new screen screen* Reset environment variables (replace SampleXX with your sample name) export SAMPLE=Sample*SampleXX
source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
* When you detach from screen and reattach, you will have multiple screen sessions
** The ancestry one will have today's date, the snpcall one will have an earlier date.
*** Mary Kate can explain how to reattach to a specific session when we get there.
</div>
</div>
 
 
'''Instructions for everyone: Setup LASER'''
''If you were sequenced, set these valuesSource the LASER setup:'' export SAMPLE=NA12878 source /net/seqshop-server/home/mktrostchaolong/seqshopLASER-Tutorial/setupSSsetup.txt(SAMPLE & OUT were previously set in your screen session.)
After setting this, also do
=== Step 1: bam --> pileup ===
$GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/hs37d5human.g1k.v37.fa -l $HGDP/HGDP_938.bed $OUT/bams/${SAMPLE}.recal.bam > $OUT/ancestry/${SAMPLE}.recalHGDP.pileup This step takes ~1 hour for a genome sequenced at 17X. While this is running, we will go look at other results, so exit screen: Ctrl-a d While that's running, let's go run the Association Analysis Practical: [[SeqShop: Genetic Association Analysis Practical, December 2014]] === Checking if Pileup finished === screen -r '''Did everyone's finish?''' 
This step takes 5-6 minutes.If not, let's go to Indel: [[SeqShop:_Calling_Your_Own_Genome,_December_2014#Reviewing_Indel_Results]]
 
If yes, let's continue below:
=== Step 2: pileup --> seq ===
python $LASER/pileup2seq/pileup2seq.py \
-m $HGDP/HGDP_938.site \
-o $OUT/ancestry/$SAMPLE.laser HGDP \ $OUT/ancestry/$SAMPLE.recalHGDP.pileup
This step takes just a few seconds.
This step will take about 5-6 minutes.
$LASER/laser -g $HGDP/HGDP_938.geno -c $HGDP/HGDP_938.RefPC.coord -s $OUT/ancestry/$SAMPLE.laserHGDP.seq -K 20 -k 4 -M 0.8 -o $OUT/ancestry/$SAMPLE.laser.2 HGDP &
View the results:
less -S $OUT/ancestry/${SAMPLE}.laser.2HGDP.SeqPC.coord
== Visualizing Ancestry ==
Generate the plot:
Rscript plotHGDP.r $HGDP/HGDP_938.RefPC.coord $OUT/ancestry/${SAMPLE}.laser.2HGDP.SeqPC.coord
Take a look:
evince Results_on_HGDP.pdf &
==Interested in looking just at European or East Asian or Central South Asian populations?===== Population information ===8 European populations in the HGDP dataset, including 156 individuals in total. 18 East Asian populations in the HGDP dataset, including 224 individuals after excluding 5 outliers (defined as more than 5 standard deviations from mean in any of top 10 PCs, 1 removal iteration). 8 Central/South Asian populations in the HGDP dataset, including 195 individuals after excluding 5 outliers (defined as more than 5 standard deviations from mean in any of top 10 PCs, 1 removal iteration). Note that 7 out of these 8 population were collected in Pakistan and 1 was collected in China (Uygur). [[File:HGDP Popualtions.png|thumb|center|alt=HGDP populations |300px|HGDP populations]] 
=== Step 1: bam --> pileup ===
You can skip this, you already did it.
=== Step 2: pileup --> seq ===
python $LASER/pileup2seq/pileup2seq.py \ -m $HGDP/HGDP.633K.euro.site \ -o $OUT/ancestry/$SAMPLE.Euro.laser \ $OUT/ancestry/${SAMPLE}.recal.pileup This You can skip this step takes just a few secondstoo, because you already did it.
=== Estimate ancestry ===
This step will take a few seconds.
For European ancestry: $LASER/laser -g $HGDP/HGDP.633K.euro.geno -c $HGDP/HGDP.633K.euro.RefPC.coord -s $OUT/ancestry/$SAMPLE.Euro.laserHGDP.seq -K 20 -k 4 -M 0.8 -o $OUT/ancestry/$SAMPLE.HGDP-Euro.laser.1 &
For East Asian ancestry: $LASER/laser -g $HGDP/HGDP.633K.easia.2.geno -c $HGDP/HGDP.633K.easia.2.RefPC.coord -s $OUT/ancestry/$SAMPLE.HGDP.seq -K 20 -k 4 -M 0.8 -o $OUT/ancestry/$SAMPLE.HGDP-EAsia & For Central/South Asian ancestry: $LASER/laser -g $HGDP/HGDP.633K.csasia.2.geno -c $HGDP/HGDP.633K.csasia.2.RefPC.coord -s $OUT/ancestry/$SAMPLE.HGDP.seq -K 20 -k 4 -M 0.8 -o $OUT/ancestry/$SAMPLE.HGDP-CSAsia &  View the resultsfor European ancestry: less -S $OUT/ancestry/${SAMPLE}.HGDP-Euro.laserSeqPC.coordor for East Asian ancestry less -S $OUT/ancestry/${SAMPLE}.HGDP-EAsia.SeqPC.coordor for Central/South Asian ancestry less -S $OUT/ancestry/${SAMPLE}.1HGDP-CSAsia.SeqPC.coord
== Visualizing Ancestry ==
Copy the R code to plot your ancestry
cp -r $LASER/plot/ $OUT/ancestry/.
Change to that new directory:
Generate the plotfor European ancestry: Rscript plotHGDP.r $HGDP/HGDP.633K.euro.RefPC.coord $OUT/ancestry/${SAMPLE}.HGDP-Euro.laserSeqPC.coord or for East Asian ancestry Rscript plotHGDP.r $HGDP/HGDP.633K.1easia.2.RefPC.coord $OUT/ancestry/${SAMPLE}.HGDP-EAsia.SeqPC.coord or for Central/South Asian ancestry Rscript plotHGDP.r $HGDP/HGDP.633K.csasia.2.RefPC.coord $OUT/ancestry/${SAMPLE}.HGDP-CSAsia.SeqPC.coord
Take a look:
evince Results_on_HGDP.pdf &
 
== Return to Calling your own Genome ==
Go to [[SeqShop:_Calling_Your_Own_Genome,_December_2014#Reviewing_Indel_Results]] and pick up where we left off.

Navigation menu