SeqShop: Calling Your Own Genome, December 2014

From Genome Analysis Wiki
Revision as of 23:08, 8 December 2014

Login instructions for seqshop-server

Login to the seqshop-server Linux Machine

This section is repeated in each session. If you are already logged in or know how to log in to the server, please skip this section.

  1. Login to the windows machine
    • The username/password for the Windows machine should be written on the right-hand monitor
  2. Start xming so you can open external windows on our Linux machine
    • Start->Enter "Xming" in the search and select "Xming" from the program list
    • Nothing visible will happen, but Xming has started.

  3. Open putty
    • Start->Enter "putty" in the search and select "PuTTY" from the program list

  4. Configure PuTTY in the PuTTY Configuration window
    • Host Name: seqshop-server.sph.umich.edu

    • Setup to allow you to open external windows:
      • In the left panel: Connection->SSH->X11
        • Add a check mark in the box next to Enable X11 forwarding

    • Click Open
    • If it prompts about a key, click OK
  5. Enter the username & password you were given


You should now be logged into a terminal on the seqshop-server and be able to access the test files.

  • If you need another terminal, repeat from step 3.

Login to the seqshop Machine

So that each of you can run multiple jobs at once, we will have you run on 4 different machines within our seqshop setup.

  • You can only access these machines after logging onto seqshop-server

3 users log on to:

ssh -X seqshop1

3 users log on to:

ssh -X seqshop2

2 users log on to:

ssh -X seqshop3

2 users log on to:

ssh -X seqshop4

Setup

The snpcall pipeline will run overnight, but you'll want to log out.

How do I leave something running on the server even if I log out?
One solution is screen!
How do I use screen?
Before running your command, you need to start screen:
screen


As it says, press Space or Return.

  • It should now look basically the same as your normal command line.
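The overnight run only survives logging out if it was started inside screen, so a quick check before launching can save a lost night. A minimal sketch, relying on the fact that screen sets the STY environment variable in the shells it starts; the function names `in_screen` and `require_screen` are hypothetical helpers, not part of the workshop setup:

```shell
#!/usr/bin/env bash
# screen exports STY (the session name) into every shell it starts,
# so a non-empty STY means we are running inside a screen session.
in_screen() {
  [ -n "${STY:-}" ]
}

# Warn and fail if not inside screen; otherwise report the session name.
require_screen() {
  if ! in_screen; then
    echo "Not inside screen: run 'screen' before starting the overnight job" >&2
    return 1
  fi
  echo "Inside screen session $STY"
}
```

For example, `require_screen && ${GC}/gotcloud snpcall ...` would refuse to start the pipeline outside screen.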
Scrolling problems when using screen?
If screen does not scroll the way your terminal normally does:
  • Type Ctrl-a Esc and you should be able to scroll up with your mouse wheel
    • At least, that is what works from my Linux machine; I wrote and tested these commands on Linux rather than Windows, so I could not test this from Windows.


Set these values, making sure to specify your own sample name instead of SampleXX:

export SAMPLE=SampleXX
source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt

See the settings you just used:

cat /net/seqshop-server/home/mktrost/seqshop/setupSS.txt

Shows you:

export GC=/net/seqshop-server/home/mktrost/seqshop/gotcloud
export OUT=~/$SAMPLE/output
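Every later command depends on `$SAMPLE`, `$GC` and `$OUT`, so it can help to check them before starting a long run. A minimal sketch, assuming bash; `check_seqshop_env` is a hypothetical helper, not part of setupSS.txt:

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the variables that your own
# "export SAMPLE=..." and setupSS.txt are expected to set.
check_seqshop_env() {
  # SAMPLE must be set and must not still be the placeholder SampleXX
  if [ -z "${SAMPLE:-}" ] || [ "$SAMPLE" = "SampleXX" ]; then
    echo "Set SAMPLE to your own sample name first" >&2
    return 1
  fi
  # GC and OUT are set by setupSS.txt (indirect expansion via ${!var})
  local var
  for var in GC OUT; do
    if [ -z "${!var:-}" ]; then
      echo "$var is not set; did you source setupSS.txt?" >&2
      return 1
    fi
  done
  echo "SAMPLE=$SAMPLE GC=$GC OUT=$OUT"
}
```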

Step 1 (Day 1): Start SnpCall

List of BAMs

The list of BAMs has already been created (just 1 BAM, your sample).

  • It is simply SAMPLE\tBAM_name (the sample ID, a tab, then the BAM path), so it is easy to figure out
cat ~/$SAMPLE/output/bam.list
SampleXX SampleXX/output/bams/SampleXX.recal.bam
  • The BAM path is relative, so this assumes you run from your home directory (I prefer absolute paths, but for simplicity in the workshop, we use a relative path).
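Because the list is just SAMPLE\tBAM_name with a relative path, a typo or a missing tab is easy to miss until the pipeline fails. A minimal sketch of a checker, assuming exactly two tab-separated fields per line; `check_bam_list` is a hypothetical helper, not a GotCloud command, and its optional second argument (the base directory for relative paths) defaults to `$HOME` to match the workshop setup:

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a BAM list where every line is
# SAMPLE<TAB>BAM_path, and check that each BAM file exists.
check_bam_list() {
  local list="$1" base="${2:-$HOME}"
  local n=0 sample bam extra path
  # "|| [ -n "$sample" ]" also handles a final line with no newline
  while IFS=$'\t' read -r sample bam extra || [ -n "$sample" ]; do
    n=$((n + 1))
    # Exactly two tab-separated fields are expected
    if [ -z "$sample" ] || [ -z "$bam" ] || [ -n "$extra" ]; then
      echo "line $n: expected SAMPLE<TAB>BAM_path" >&2
      return 1
    fi
    case "$bam" in
      /*) path="$bam" ;;          # absolute path: use as-is
      *)  path="$base/$bam" ;;    # relative path: resolve from base
    esac
    if [ ! -f "$path" ]; then
      echo "line $n: BAM not found: $path" >&2
      return 1
    fi
  done < "$list"
  echo "$n BAM(s) OK"
}
```

Usage from your home directory: `check_bam_list ~/$SAMPLE/output/bam.list`.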

Configuring SnpCall

cat ~/$SAMPLE/gotcloud.conf

You will see something like this:

# Cluster Settings
BATCH_TYPE = 
BATCH_OPTS = 

OUT_DIR = Sample13/output

# Align Settings
MAP_TYPE = BWA_MEM
BWA_THREADS = -t 24
FASTQ_LIST = fastq.list

# SNP Call Settings
UNIT_CHUNK = 20000000      # Chunk size of SNP calling : 20Mb
VCF_EXTRACT = /net/seqshop-server/home/mktrost/seqshop/singleSample/snpOnly.vcf.gz
MODEL_GLFSINGLE = TRUE
MODEL_SKIP_DISCOVER = FALSE
MODEL_AF_PRIOR = TRUE

EXT_DIR = /net/seqshop-server/home/mktrost/seqshop/singleSample/ext
EXT = $(EXT_DIR)/ALL.chrCHR.phase3.combined.sites.unfiltered.vcf.gz $(EXT_DIR)/chrCHR.filtered.sites.vcf.gz
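UNIT_CHUNK sets the chunk size for SNP calling (20 Mb above). As a rough illustration only, assuming each chromosome is split into consecutive UNIT_CHUNK-sized windows (GotCloud's exact chunking may differ), the number of chunks per chromosome is its length divided by UNIT_CHUNK, rounded up. The chromosome 20 length used below is the GRCh37/hg19 value, chosen just as an example:

```shell
#!/usr/bin/env bash
# Illustrative only: windows of size CHUNK needed to cover LENGTH bp,
# i.e. ceil(length / chunk) computed with integer arithmetic.
num_chunks() {
  local length="$1" chunk="$2"
  echo $(( (length + chunk - 1) / chunk ))
}

# GRCh37 chromosome 20 is 63,025,520 bp; with UNIT_CHUNK = 20000000:
num_chunks 63025520 20000000   # prints 4
```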

Configuration Updates

In order to complete SnpCall overnight, we are going to tell GotCloud to only call SNPs for the EXOME regions.
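Restricting calling to the exome helps because the target regions cover only a small fraction of the genome. A minimal sketch for measuring how many bases a BED file covers, assuming standard BED coordinates (0-based, half-open start/end in columns 2 and 3) and non-overlapping intervals, since overlaps would be double-counted; `bed_total_bases` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sum (end - start) over all intervals in a tab-separated BED file.
# BED is 0-based, half-open, so end - start is the interval length.
bed_total_bases() {
  awk -F'\t' '
    /^(#|track|browser)/ { next }   # skip comment/header lines
    NF >= 3 { total += $3 - $2 }    # add this interval length
    END { printf "%d\n", total }
  ' "$1"
}
```

For example, `bed_total_bases /net/seqshop-server/home/mktrost/seqshop/singleSample/20130108.exome.targets.bed` prints the number of target bases, which you can compare with the roughly 3 billion bases of the whole genome.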

Edit your configuration file:

nedit $SAMPLE/gotcloud.conf&
  • Or you can use vi or emacs or your favorite editor

Specify the target region:

UNIFORM_TARGET_BED = /net/seqshop-server/home/mktrost/seqshop/singleSample/20130108.exome.targets.bed
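  • See Targeted/Exome Sequencing Settings (http://genome.sph.umich.edu/wiki/GotCloud:_Variant_Calling_Pipeline#Targeted.2FExome_Sequencing_Settings) for more information on the GotCloud configuration settings for Targeted/Exome runs.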
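If you prefer not to open an editor, the setting can also be appended from the command line. A minimal sketch, assuming the config uses the `KEY = VALUE` lines shown above; `add_conf_setting` is a hypothetical helper, and it refuses to touch a key that is already set so you do not end up with two conflicting values:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "KEY = VALUE" to a GotCloud config file
# unless KEY is already set there.
add_conf_setting() {
  local conf="$1" key="$2" value="$3"
  local pattern="^[[:space:]]*${key}[[:space:]]*="
  if grep -Eq "$pattern" "$conf"; then
    # Leave existing settings alone; show them instead
    echo "$key is already set in $conf; edit it by hand:" >&2
    grep -E "$pattern" "$conf" >&2
    return 1
  fi
  # Ensure the file ends with a newline before appending
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    printf '\n' >> "$conf"
  fi
  printf '%s = %s\n' "$key" "$value" >> "$conf"
}
```

Usage, from your home directory: `add_conf_setting $SAMPLE/gotcloud.conf UNIFORM_TARGET_BED /net/seqshop-server/home/mktrost/seqshop/singleSample/20130108.exome.targets.bed`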

Running SnpCall

Run GotCloud snpcall with 6 jobs running in parallel:

  • Why 6?
    • You want to run as many jobs as you can.
    • With 5 of you on the machine, 5*6 = 30 jobs will be running in parallel on that machine.
${GC}/gotcloud snpcall --conf $SAMPLE/gotcloud.conf --numjobs 6
  • You only need to give the configuration file & the number of jobs; everything else is specified within the configuration.
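The choice of 6 comes from sharing one machine: the total number of parallel jobs is (number of users) × (--numjobs). A minimal sketch for picking --numjobs, assuming you simply split the machine's cores evenly among its users; `jobs_per_user` is a hypothetical helper, and `nproc` (GNU coreutils) reports the number of available processors:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split CORES evenly among USERS, at least 1 job.
jobs_per_user() {
  local cores="$1" users="$2"
  if [ "$users" -lt 1 ]; then
    echo "users must be >= 1" >&2
    return 1
  fi
  local n=$(( cores / users ))
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}
```

For example, a 30-core machine shared by 5 users gives `jobs_per_user 30 5`, i.e. 6 jobs each; on the server you could pass `--numjobs "$(jobs_per_user "$(nproc)" 5)"`.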

This will run overnight. We will check if it completed at the practical in the morning.

Log Out

Want to log out and leave your job running?

In the screen window, type:

Ctrl-a d

(Hold down Ctrl and type 'a', let go of both and type 'd')

  • This will "detach" from your screen session while your SnpCall job continues to run.

If you have not detached from screen:

Ctrl-a d

exit PuTTY

Tuesday FEEDBACK!

Please provide feedback on the lectures/tutorials from today:

https://docs.google.com/forms/d/1n8xYxvsOq-HsabpDfGcHvwD84BYIRDx8_b-H5N3d-D8/viewform

Step 2 (Day 2): Checking SnpCall

Logging Back in to Check Jobs

How do you log back into screen tomorrow?
screen -r

This will resume an already running screen.