Open main menu

Genome Analysis Wiki β

SeqShop: Ancestry On Your Own Genome, May 2015

Revision as of 14:24, 21 May 2015 by Mktrost (talk | contribs) (Setup your terminal)

Contents

Setup

If you are already logged in, you can skip this section.

Login to the seqshop-server Linux Machine

This section will appear redundantly in each session. If you are already logged in or know how to log in to the server, please skip this section

  1. Login to the windows machine
  • The username/password for the Windows machine should be written on the right-hand monitor
  • Start xming so you can open external windows on our Linux machine
    • Start->Enter "Xming" in the search and select "Xming" from the program list
    • Nothing will happen, but Xming was started.
    • View Screenshot
    • Xming.png

  • Open putty
    • Start->Enter "putty" in the search and select "PuTTY" from the program list
    • View Screenshot
    • PuttyS.png

  • Configure PuTTY in the PuTTY Configuration window
    • Host Name: seqshop-server.sph.umich.edu
    • View Screenshot
    • Seqshop.png

    • Setup to allow you to open external windows:
      • In the left pannel: Connection->SSH->X11
        • Add a check mark in the box next to Enable X11 forwarding
        • View Screenshot
        • SeqshopX11.png

    • Click Open
    • If it prompts about a key, click OK
  • Enter your provided username & password as provided

  • You should now be logged into a terminal on the seqshop-server and be able to access the test files.

    • If you need another terminal, repeat from step 3.

    Login to the seqshop Machine

    So you can each run multiple jobs at once, we will have you run on 4 different machines within our seqshop setup.

    • You can only access these machines after logging onto seqshop-server

    3 users logon to:

    ssh -X seqshop1
    

    3 users logon to:

    ssh -X seqshop2
    

    2 users logon to:

    ssh -X seqshop3
    

    2 users logon to:

    ssh -X seqshop4
    

    Setup your terminal

    Resume screen:

    screen -r
    


    Setup LASER

    export SSREF=/net/seqshop-server/home/mktrost/seqshop/singleSample/ref
    export LASER=~/seqshop/output/ancestry/LASER-2.01
    

    (SAMPLE were previously set in your screen session.)

    After setting this, also do

    mkdir -p $OUT/ancestry
    

    Verify that this does not give an error:

    ls $OUT/bams/${SAMPLE}.recal.bam
    

    Run

    Step 1: bam --> pileup

    $GC/bin/samtools mpileup -q 30 -Q 20 -f $SSREF/gotcloud.ref/hs37d5.fa -l $SSREF/HGDP/HGDP_938.bed $OUT/bams/${SAMPLE}.recal.bam > $OUT/ancestry/${SAMPLE}.HGDP.pileup
    

    This step takes ~1 hour for a genome sequenced at 17X.

    While this is running, we will go look at other results, so exit screen:

    Ctrl-a d
    

    While that's running, let's go look at Calling results: SeqShop: Calling Your Own Genome, May 2015

    Checking if Pileup finished

    screen -r
    

    Did everyone's finish?


    If not, let's go to Indel: SeqShop:_Calling_Your_Own_Genome,_May_2015#Reviewing_Indel_Results


    If yes, let's continue below:

    Step 2: pileup --> seq

    python $LASER/pileup2seq/pileup2seq.py \
    -m $SSREF/HGDP_938.site \
    -o $OUT/ancestry/$SAMPLE.HGDP \
    $OUT/ancestry/$SAMPLE.HGDP.pileup
    

    This step takes just a few seconds.

    Estimate ancestry

    This step will take about 5-6 minutes.

    $LASER/laser -g $SSREF/HGDP_938.geno -c $SSREF/HGDP_938.RefPC.coord -s $OUT/ancestry/$SAMPLE.HGDP.seq -K 20 -k 4 -M 0.8 -o $OUT/ancestry/$SAMPLE.HGDP &
    

    View the results:

    less -S $OUT/ancestry/${SAMPLE}.HGDP.SeqPC.coord
    

    Visualizing Ancestry

    Copy the R code to plot your ancestry

    cp -r $LASER/plot/ $OUT/ancestry/.
    

    Change to that new directory:

    cd $OUT/ancestry/plot
    

    Generate the plot:

    Rscript plotHGDP.r $HGDP/HGDP_938.RefPC.coord $OUT/ancestry/${SAMPLE}.HGDP.SeqPC.coord 
    

    Take a look:

    evince Results_on_HGDP.pdf &
    

    Interested in looking just at European or East Asian or Central South Asian populations?

    Population information

    8 European populations in the HGDP dataset, including 156 individuals in total.

    18 East Asian populations in the HGDP dataset, including 224 individuals after excluding 5 outliers (defined as more than 5 standard deviations from mean in any of top 10 PCs, 1 removal iteration).

    8 Central/South Asian populations in the HGDP dataset, including 195 individuals after excluding 5 outliers (defined as more than 5 standard deviations from mean in any of top 10 PCs, 1 removal iteration). Note that 7 out of these 8 population were collected in Pakistan and 1 was collected in China (Uygur).

    HGDP populations
    HGDP populations

    Step 1: bam --> pileup

    You can skip this, you already did it.

    Step 2: pileup --> seq

    You can skip this step too, because you already did it.

    Estimate ancestry

    This step will take a few seconds.

    For European ancestry:

    $LASER/laser -g $HGDP/HGDP.633K.euro.geno -c $HGDP/HGDP.633K.euro.RefPC.coord -s $OUT/ancestry/$SAMPLE.HGDP.seq -K 20 -k 4 -M 0.8 -o $OUT/ancestry/$SAMPLE.HGDP-Euro &
    

    For East Asian ancestry:

    $LASER/laser -g $HGDP/HGDP.633K.easia.2.geno -c $HGDP/HGDP.633K.easia.2.RefPC.coord -s $OUT/ancestry/$SAMPLE.HGDP.seq -K 20 -k 4 -M 0.8 -o $OUT/ancestry/$SAMPLE.HGDP-EAsia &
    

    For Central/South Asian ancestry:

    $LASER/laser -g $HGDP/HGDP.633K.csasia.2.geno -c $HGDP/HGDP.633K.csasia.2.RefPC.coord -s $OUT/ancestry/$SAMPLE.HGDP.seq -K 20 -k 4 -M 0.8 -o $OUT/ancestry/$SAMPLE.HGDP-CSAsia &
    


    View the results for European ancestry:

    less -S $OUT/ancestry/${SAMPLE}.HGDP-Euro.SeqPC.coord
    

    or for East Asian ancestry

    less -S $OUT/ancestry/${SAMPLE}.HGDP-EAsia.SeqPC.coord
    

    or for Central/South Asian ancestry

    less -S $OUT/ancestry/${SAMPLE}.HGDP-CSAsia.SeqPC.coord
    

    Visualizing Ancestry

    Change to that new directory:

    cd $OUT/ancestry/plot
    

    Move your other plot so you don't over-write it

    mv Results_on_HGDP.pdf Results_on_HGDP_All.pdf
    


    Generate the plot for European ancestry:

    Rscript plotHGDP.r $HGDP/HGDP.633K.euro.RefPC.coord $OUT/ancestry/${SAMPLE}.HGDP-Euro.SeqPC.coord 
    

    or for East Asian ancestry

    Rscript plotHGDP.r $HGDP/HGDP.633K.easia.2.RefPC.coord $OUT/ancestry/${SAMPLE}.HGDP-EAsia.SeqPC.coord 
    

    or for Central/South Asian ancestry

    Rscript plotHGDP.r $HGDP/HGDP.633K.csasia.2.RefPC.coord $OUT/ancestry/${SAMPLE}.HGDP-CSAsia.SeqPC.coord 
    

    Take a look:

    evince Results_on_HGDP.pdf &
    

    Return to Calling your own Genome