SeqShop: Calling Your Own Genome, June 2014

From Genome Analysis Wiki

Note: the latest version of this practical is available at: SeqShop: Calling Your Own Genome

  • The version here is the original one from the June workshop (updated so it can be run from elsewhere)


Login to the seqshop-server Linux Machine

This section is repeated in each session. If you are already logged in, or know how to log in to the server, please skip this section.

  1. Log in to the Windows machine
    • The username/password for the Windows machine should be written on the right-hand monitor
  2. Start Xming so you can open external windows on our Linux machine
    • Start->Enter "Xming" in the search and select "Xming" from the program list
    • Nothing visible will happen, but Xming is now running.

  3. Open PuTTY
    • Start->Enter "putty" in the search and select "PuTTY" from the program list

  4. Configure PuTTY in the PuTTY Configuration window
    • Host Name: seqshop-server.sph.umich.edu

    • Set up X11 forwarding so you can open external windows:
      • In the left panel: Connection->SSH->X11
        • Add a check mark in the box next to Enable X11 forwarding

    • Click Open
    • If it prompts about a key, click OK
  5. Enter the username & password you were given


You should now be logged into a terminal on the seqshop-server and be able to access the test files.

  • If you need another terminal, repeat from step 3.

Login to the seqshop Machine

So that each of you can run multiple jobs at once, you will be spread across 4 different machines within our seqshop setup.

  • You can only access these machines after logging onto seqshop-server

3 users log on to:

ssh -X seqshop1

3 users log on to:

ssh -X seqshop2

2 users log on to:

ssh -X seqshop3

2 users log on to:

ssh -X seqshop4

Setup

Set the following values. If you used a different path for any of them, update it here. Also, be sure to specify your own sample name in place of Sample_XXXXX.

source /home/mktrost/seqshop/setup.2x.txt
export SAMPLE=Sample_XXXXX
export ALIGN_OUT=~/personal/output
export CHR20_OUT=~/personal/output.20
mkdir -p $CHR20_OUT
export EXOME_OUT=~/personal/output.exome
mkdir -p $EXOME_OUT

ALIGN_OUT needs to point to where your alignment output went, so if your output is not in ~/personal/output, please set ALIGN_OUT appropriately.

Verify that this does not give an error:

ls $ALIGN_OUT/bams/${SAMPLE}.recal.bam
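
The check above can also be scripted so that it fails loudly if the placeholder sample name was left in place. The following is only a minimal sketch and not part of the original practical: the function name check_setup and the sample name Sample_DEMO are hypothetical, and the demonstration uses a temporary directory as a stand-in for your real alignment output.

```shell
# Hypothetical helper (not part of GotCloud): confirm that the sample name
# was changed from the placeholder and that the recalibrated BAM exists.
check_setup() {
    local sample="$1" align_out="$2"
    if [ -z "$sample" ] || [ "$sample" = "Sample_XXXXX" ]; then
        echo "SAMPLE is unset or still the placeholder"
        return 1
    fi
    local bam="$align_out/bams/$sample.recal.bam"
    if [ ! -e "$bam" ]; then
        echo "Missing BAM: $bam"
        return 1
    fi
    echo "OK: $bam"
}

# Demonstration with a stand-in directory; Sample_DEMO is a made-up name.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/bams"
touch "$demo/bams/Sample_DEMO.recal.bam"

check_setup "Sample_DEMO" "$demo"              # succeeds
check_setup "Sample_XXXXX" "$demo" || true     # reports the placeholder
```

In your own session you would call it as check_setup "$SAMPLE" "$ALIGN_OUT".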

Chromosome 20

We want to add the 100 1000G chr20 BAMs to your bam list. Copy the original list to a new file first, so the original stays available for other tests later.

cp $ALIGN_OUT/bam.index $CHR20_OUT/bam.20.index

Now add the chr20 BAMs to your new bam list:

cat $IN/chr20/bam.20.index >> $CHR20_OUT/bam.20.index

We are going to run on the cluster, so edit the first line of $CHR20_OUT/bam.20.index to give the cluster path to your info file.

nedit $CHR20_OUT/bam.20.index

Replace the /home on the first line with /net/seqshop-server
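
If you prefer not to edit the file by hand, the same first-line replacement can be done with sed. This is a sketch run on a mock index in a temporary directory. The tab-separated layout, the user name user1, and the sample names in the mock lines are assumptions for illustration, not taken from your real file.

```shell
# Mock bam index: the first line stands in for your own sample under /home,
# the second for one of the 1000G entries. Layout and names are illustrative.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
index="$work/bam.20.index"
printf 'Sample_DEMO\tALL\t/home/user1/personal/output/bams/Sample_DEMO.recal.bam\n' > "$index"
printf 'Mock_1\tALL\tchr20/Mock_1.chr20.bam\n' >> "$index"

# "1s" restricts the substitution to line 1; "|" is used as the delimiter
# so the slashes in the paths need no escaping.
sed '1s|/home|/net/seqshop-server|' "$index" > "$index.tmp" && mv "$index.tmp" "$index"

# Show the edited first line.
head -n 1 "$index"
```

On the workshop machine you would set index to $CHR20_OUT/bam.20.index instead of creating the mock file.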

Verify you have 101 lines in your list:

wc -l $CHR20_OUT/bam.20.index
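
The count can be turned into a pass/fail check. The following is a minimal sketch that uses a generated 101-line stand-in file; the function name expect_lines is hypothetical.

```shell
# Hypothetical helper: compare a file's line count with an expected value.
expect_lines() {
    local file="$1" expected="$2" actual
    # Reading from stdin makes wc print only the number, without the file name.
    actual=$(wc -l < "$file")
    actual=$((actual))   # strip any padding some wc implementations add
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual lines"
    else
        echo "Expected $expected lines but found $actual"
        return 1
    fi
}

# Stand-in for bam.20.index: 1 line for your sample + 100 for the 1000G samples.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
seq 1 101 > "$work/bam.20.index"

expect_lines "$work/bam.20.index" 101
```

A different count points to a problem with the earlier steps. For example, 201 lines would suggest the append step was run twice.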

Update your gotcloud configuration file to indicate only chromosome 20 and point to the new list:

nedit ~/personal/gotcloud.2x.conf

Replace all occurrences of /home with /net/seqshop-server. This allows jobs running on the mini-cluster to access your home directory.

Update OUT_DIR & BAM_INDEX to:

OUT_DIR = $(IN_DIR)/output.20
BAM_INDEX = $(OUT_DIR)/bam.20.index

Tell it you only want to process chromosome 20, by adding the following anywhere in the file:

CHRS = 20
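
After saving the file, it is worth confirming that the three settings are in place and that no /home paths were missed. The sketch below does this with grep on a mock configuration fragment. The IN_DIR value and the user name user1 are placeholders, and your real file will contain many other settings.

```shell
# Mock configuration fragment with the three settings from this section.
# The IN_DIR value is a placeholder.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
conf="$work/gotcloud.2x.conf"
cat > "$conf" <<'EOF'
IN_DIR = /net/seqshop-server/user1/personal
OUT_DIR = $(IN_DIR)/output.20
BAM_INDEX = $(OUT_DIR)/bam.20.index
CHRS = 20
EOF

# Print only the lines that set OUT_DIR, BAM_INDEX or CHRS.
grep -E '^[[:space:]]*(OUT_DIR|BAM_INDEX|CHRS)[[:space:]]*=' "$conf"

# Report any line that still contains /home.
if grep -n '/home' "$conf"; then
    echo "These lines still point at /home"
else
    echo "No /home paths remain"
fi
```

To check your own file, set conf to ~/personal/gotcloud.2x.conf and run only the two grep steps.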

Since it would take a while to run chrom 20 for 101 samples, I already ran the first step for the 100 1000G samples.

We will "trick" GotCloud into thinking you already ran them by copying them into your output directory.

cp -r $IN/chr20/glfs $CHR20_OUT/.

Now you are ready to run. Because you updated BAM_INDEX in your conf file, you do not need to specify your chr20 bam list on the command line.

Run 4 jobs on our mini-cluster

$GC/gotcloud snpcall --conf ~/personal/gotcloud.2x.conf --numjobs 4 --batchtype mosix --batchopts "-j10,11,12,13"
  • --batchtype says to use mosix (our cluster system)
  • --batchopts tells mosix the options to run with
    • for mosix, -j10,11,12,13 says to run on nodes 10, 11, 12, & 13 - the names of the 4 nodes on our mini-cluster

Exome

To speed things up, I extracted only the exome regions from 100 1000G low-coverage BAMs.

Let's create a new bam info file with your BAM combined with those BAMs.

cp $ALIGN_OUT/bam.index $EXOME_OUT/bam.exome.index

Now add the exome BAMs to your new bam list:

cat $IN/exome/bam.exome.index >> $EXOME_OUT/bam.exome.index

Verify you have 101 lines in your list:

wc -l $EXOME_OUT/bam.exome.index
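
A line count alone will not reveal a sample that was added twice. The sketch below looks for repeated sample IDs, assuming the index is tab-separated with the sample ID in the first column. That layout, and every name in the mock file, is an assumption for illustration; check it against your own index before relying on it.

```shell
# Mock index with one deliberately repeated sample ID (Mock_2).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
index="$work/bam.exome.index"
# printf reuses the format for each pair of arguments, giving four lines.
printf '%s\tALL\t%s\n' \
    Sample_DEMO /net/seqshop-server/user1/personal/output/bams/Sample_DEMO.recal.bam \
    Mock_1 exome/Mock_1.bam \
    Mock_2 exome/Mock_2.bam \
    Mock_2 exome/Mock_2.bam > "$index"

# cut takes the first tab-separated column; uniq -d prints each value that
# appears more than once (its input must be sorted first).
dups=$(cut -f1 "$index" | sort | uniq -d)

if [ -n "$dups" ]; then
    echo "Duplicate sample IDs: $dups"
else
    echo "No duplicate sample IDs"
fi
```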

We are going to run on the cluster, so edit the first line of $EXOME_OUT/bam.exome.index to give the cluster path to your info file.

nedit $EXOME_OUT/bam.exome.index

Replace the /home on the first line with /net/seqshop-server

Locate your gotcloud.2x.conf (probably at: ~/personal/gotcloud.2x.conf) and open it in your favorite editor:

nedit ~/personal/gotcloud.2x.conf

Replace all occurrences of /home with /net/seqshop-server

This lets your jobs run on our mini-cluster, so you can run more jobs at once.
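
The replacement can also be made non-interactively. This is a sketch on a mock configuration file, keeping a backup copy first. The paths and the user name user1 in the mock file are placeholders, not values from the workshop setup.

```shell
# Mock configuration file; the paths are placeholders.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
conf="$work/gotcloud.2x.conf"
cat > "$conf" <<'EOF'
IN_DIR = /home/user1/personal
REF_DIR = /home/shared/reference
EOF

# Keep a copy of the original, then replace every "/home/" with
# "/net/seqshop-server/". The trailing "g" applies the substitution to all
# matches on a line; including the trailing slash in the pattern avoids
# touching directory names that merely begin with "home".
cp "$conf" "$conf.bak"
sed 's|/home/|/net/seqshop-server/|g' "$conf.bak" > "$conf"

cat "$conf"
```

If the result looks wrong, the untouched original is still available as the .bak copy.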

Update OUT_DIR & BAM_INDEX to:

OUT_DIR = $(IN_DIR)/output.exome
BAM_INDEX = $(OUT_DIR)/bam.exome.index

Update your gotcloud configuration file to indicate exomes:

# Specify the path to the regions we want to call
UNIFORM_TARGET_BED = $(REF_DIR)/20130108.exome.targets.nochr.bed

# We do not want any off target bases
OFFSET_OFF_TARGET = 0

WRITE_TARGET_LOCI = TRUE
TARGET_DIR = target
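
Before calling, it can be useful to see how much sequence a target file covers. The sketch below sums region lengths in a BED file with awk, using the BED convention that the start is 0-based and the end is exclusive, so each region's length is end minus start. It runs on a three-line mock file; the coordinates are invented and the real target file will give different numbers.

```shell
# Mock BED file with three regions; the coordinates are made up.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
bed="$work/mock.targets.bed"
printf '1\t100\t200\n1\t500\t650\n20\t1000\t1010\n' > "$bed"

# Column 1 = chromosome, 2 = start (0-based), 3 = end (exclusive).
summary=$(awk -F'\t' '{ n++; bases += $3 - $2 } END { print n " regions, " bases " bases" }' "$bed")
echo "$summary"
```

For the mock file this reports 3 regions and 260 bases (100 + 150 + 10). To summarize the real targets, set bed to the file named by UNIFORM_TARGET_BED.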

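Remove the CHRS = 20 line if you added it in the Chromosome 20 section, so the run is no longer restricted to chromosome 20.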

Since it would take a while to run all 101 samples, I already ran the first step for the 100 1000G samples. We will "trick" GotCloud into thinking you already ran them by copying them into your output directory.

cp -r $IN/exome/glfs $EXOME_OUT/.


Run 4 jobs on our mini-cluster

$GC/gotcloud snpcall --conf ~/personal/gotcloud.2x.conf --numjobs 4 --batchtype mosix --batchopts "-j10,11,12,13"
  • --batchtype says to use mosix (our cluster system)
  • --batchopts tells mosix the options to run with
    • for mosix, -j10,11,12,13 says to run on nodes 10, 11, 12, & 13 - the names of the 4 nodes on our mini-cluster