Changes

From Genome Analysis Wiki
Jump to navigationJump to search
Line 8: Line 8:     
<div class="mw-collapsible mw-collapsed" style="width:500px">
 
<div class="mw-collapsible mw-collapsed" style="width:500px">
== Tuesday - Start INDEL Calling ==
+
== Tuesday - Start SNP Calling ==
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
=== Setup ===
+
=== Setup Screen ===
The indel & snpcall pipelines will run overnight, but you'll want to log out.
+
The snpcall pipeline will run overnight, but you'll want to log out.
 
; How do I leave something running on the server even if I log out?
 
; How do I leave something running on the server even if I log out?
 
: One solution is screen!
 
: One solution is screen!
Line 24: Line 24:  
* It should now look basically the same as your normal command line.
 
* It should now look basically the same as your normal command line.
    +
=== Setup Variables ===
 
Set these values.  Also, be sure to specify your sample name instead of SampleXX
 
Set these values.  Also, be sure to specify your sample name instead of SampleXX
 
  export SAMPLE=SampleXX
 
  export SAMPLE=SampleXX
  source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
+
or
 +
  export SAMPLE=NA12878
   −
See the settings you just used:
+
Point to your GotCloud & your output directory:
  cat /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
+
  export GC=~/seqshop/gotcloud
Shows you:
+
export OUT=~/$SAMPLE/output
:<code>export GC=/net/seqshop-server/home/mktrost/seqshop/gotcloud</code>
  −
:<code>export SS=/net/seqshop-server/home/mktrost/seqshop/singleSample</code>
  −
:<code>export OUT=~/$SAMPLE/output</code>
      
=== List of BAMs ===
 
=== List of BAMs ===
Line 44: Line 43:  
* Relative path, so assumes running from your home directory (I prefer absolute paths, but for simplicity of the workshop, we just use relative path).
 
* Relative path, so assumes running from your home directory (I prefer absolute paths, but for simplicity of the workshop, we just use relative path).
   −
=== Configuring Indel ===
+
=== Configuring SNPCALL ===
No special Configuration settings for Indel calling
+
 
 
  cat ~/$SAMPLE/gotcloud.conf
 
  cat ~/$SAMPLE/gotcloud.conf
   −
You will see something like this:
+
You will see this:
 
<pre>
 
<pre>
# Cluster Settings
+
# References
BATCH_TYPE =  
+
SS_DIR = /net/seqshop-server/home/mktrost/seqshop/singleSample
BATCH_OPTS =  
+
REF_DIR = $(SS_DIR)/ref/gotcloud.ref/
 
  −
OUT_DIR = Sample13/output
     −
# Align Settings
+
######### ALIGNMENT ########
 
MAP_TYPE = BWA_MEM
 
MAP_TYPE = BWA_MEM
BWA_THREADS = -t 24
   
FASTQ_LIST = fastq.list
 
FASTQ_LIST = fastq.list
 +
BATCH_TYPE =
 +
BATCH_OPTS =
 +
BWA_THREADS = -t 6
    
# SNP Call Settings
 
# SNP Call Settings
 
UNIT_CHUNK = 20000000      # Chunk size of SNP calling : 20Mb
 
UNIT_CHUNK = 20000000      # Chunk size of SNP calling : 20Mb
VCF_EXTRACT = /net/seqshop-server/home/mktrost/seqshop/singleSample/snpOnly.vcf.gz
+
VCF_EXTRACT = $(SS_DIR)/snpOnly.vcf.gz
 
MODEL_GLFSINGLE = TRUE
 
MODEL_GLFSINGLE = TRUE
 
MODEL_SKIP_DISCOVER = FALSE
 
MODEL_SKIP_DISCOVER = FALSE
 
MODEL_AF_PRIOR = TRUE
 
MODEL_AF_PRIOR = TRUE
   −
EXT_DIR = /net/seqshop-server/home/mktrost/seqshop/singleSample/ext
+
EXT_DIR = $(SS_DIR)/ext
 
EXT = $(EXT_DIR)/ALL.chrCHR.phase3.combined.sites.unfiltered.vcf.gz $(EXT_DIR)/chrCHR.filtered.sites.vcf.gz
 
EXT = $(EXT_DIR)/ALL.chrCHR.phase3.combined.sites.unfiltered.vcf.gz $(EXT_DIR)/chrCHR.filtered.sites.vcf.gz
 
</pre>
 
</pre>
   −
=== Running Indel Calling ===
+
=== Running SNP Calling ===
Run GotCloud indel with 6 jobs running in parallel
+
Run GotCloud snpcall with 8 jobs running in parallel
* Why 6?   
+
* Why 8?   
 
** You want to run as many as you can.
 
** You want to run as many as you can.
** 5 of you on the machine - 5*6 = 30 jobs will be running in parallel on that machine
+
** 2-3 of you on the machine - 3*8 = 24 jobs will be running in parallel on that machine
  ${GC}/gotcloud indel --conf $SAMPLE/gotcloud.conf --numjobs 6
+
  ${GC}/gotcloud snpcall --conf $SAMPLE/gotcloud.conf --numjobs 8 --outdir $OUT
* Only need the configuration & number of threads, rest is specified within the configuration.
+
* Only need the configuration, number of threads, and the output directory, rest is specified within the configuration.
    
This will run overnight.  We will check if it completed at the practical in the morning.
 
This will run overnight.  We will check if it completed at the practical in the morning.
Line 93: Line 92:     
exit PuTTY
 
exit PuTTY
 +
 +
 +
=== FEEDBACK! ===
 +
Please provide feedback on today:
 +
 +
https://docs.google.com/forms/d/1ADTkBjzT-QNj2lrejyqGqDaahTponrw20kSgDNwqwH4/viewform
    
</div>
 
</div>
Line 99: Line 104:  
<div class="mw-collapsible mw-collapsed" style="width:500px">
 
<div class="mw-collapsible mw-collapsed" style="width:500px">
   −
== Wednesday ==
+
== Thursday ==
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
   −
<div class="mw-collapsible" style="width:500px">
+
=== Checking if snpcall Completed ===
=== Checking if INDEL Completed ===
+
==== Resume screen to Check Jobs ====
<div class="mw-collapsible-content">
  −
==== Logging Back in to Check Jobs ====
      
;How do you log back into screen?
 
;How do you log back into screen?
Line 111: Line 114:  
This will resume an already running screen.
 
This will resume an already running screen.
   −
Verify you got a "completed successfully" message.
+
Your screen session still has your environment variables set, so you do not need to reset them.
   −
How long did INDEL calling take?  Look at the log message - time in seconds.
     −
===  Need a BAM Index File? ===
+
Verify you got a "completed successfully" message.
Check your BAM files
  −
ls $SAMPLE/output/bams/*bam
  −
 
  −
Check your BAI files
  −
ls $SAMPLE/output/bams/*bai
  −
 
  −
Do you have the same number of BAM & BAI files?
  −
* If so, then you are good to go.
  −
* If you have more BAI's than BAM's - oops!!! (LET ME KNOW)
  −
* If you have more BAM's than BAI's, we need to generate the index:
  −
*: <pre>$GC/bin/samtools index $SAMPLE/output/bams/YOUR_BAM_WITH_MISSING_BAI.bam</pre>
  −
 
  −
 
  −
=== Detach From screen===
  −
Detach from screen.  We will resume it again later when we start SNPCall.
  −
Ctrl-a d
  −
 
  −
</div>
  −
</div>
  −
 
  −
=== Structural Variation Tutorial ===
  −
Now we are going to run the Structural Variation Practical
  −
 
  −
Please go to: [[SeqShop: Analysis of Structural Variation Practical, May 2015]]
  −
 
  −
We will Start SNP Calling after the practical.
  −
 
  −
 
  −
<div class="mw-collapsible" style="width:500px">
  −
 
  −
=== Start SNP Calling ===
  −
<div class="mw-collapsible-content">
  −
==== Resume screen ====
     −
;How do you log back into screen?
+
How long did snpcall calling takeLook at the log message - time in seconds.
  screen -r
  −
This will resume an already running screen.
     −
Your screen session still has your environment variables set, so you do not need to reset them.
      
==== List of BAMs ====
 
==== List of BAMs ====
Line 165: Line 131:  
* Relative path, so assumes running from your home directory (I prefer absolute paths, but for simplicity of the workshop, we just use relative path).
 
* Relative path, so assumes running from your home directory (I prefer absolute paths, but for simplicity of the workshop, we just use relative path).
   −
==== GotCloud SnpCall Configuration ====
+
 
 +
==== GotCloud INDEL Configuration ====
    
  cat ~/$SAMPLE/gotcloud.conf
 
  cat ~/$SAMPLE/gotcloud.conf
   −
You will see something like this:
+
Same as it looked the other day with no special Configuration settings for INDEL calling.
<pre>
  −
# Cluster Settings
  −
BATCH_TYPE =
  −
BATCH_OPTS =
     −
OUT_DIR = Sample13/output
+
==== Running INDEL ====
 
+
Run GotCloud indel with 6 jobs running in parallel
# Align Settings
+
  ${GC}/gotcloud indel --conf $SAMPLE/gotcloud.conf --numjobs 6 --outdir $OUT
MAP_TYPE = BWA_MEM
+
* Only need the configuration, number of threads, and the output directory, rest is specified within the configuration.
BWA_THREADS = -t 24
  −
FASTQ_LIST = fastq.list
  −
 
  −
# SNP Call Settings
  −
UNIT_CHUNK = 20000000      # Chunk size of SNP calling : 20Mb
  −
VCF_EXTRACT = /net/seqshop-server/home/mktrost/seqshop/singleSample/snpOnly.vcf.gz
  −
MODEL_GLFSINGLE = TRUE
  −
MODEL_SKIP_DISCOVER = FALSE
  −
MODEL_AF_PRIOR = TRUE
  −
 
  −
EXT_DIR = /net/seqshop-server/home/mktrost/seqshop/singleSample/ext
  −
EXT = $(EXT_DIR)/ALL.chrCHR.phase3.combined.sites.unfiltered.vcf.gz $(EXT_DIR)/chrCHR.filtered.sites.vcf.gz
  −
</pre>
  −
 
  −
===== Configuration Updates =====
  −
'''No configuration updates from the original settings are necessary.'''
  −
* Originally you were going to update the configuration to do exome calling only, but we have decided to do whole genome
  −
** If it doesn't finish tonight, we will kill it at the tutorial in the morning, and take advantage of the GotCloud restart capability after the tutorial tomorrow.
  −
 
  −
If you started making updates to your configuration yesterday, you can try to make it match above (mktrost is actually the path you want in the paths above).
  −
* The only difference from above should be your OUT_DIR.
  −
Check it with (leave as ~mktrost - you are comparing to my NA12878 conf file):
  −
diff $SAMPLE/gotcloud.conf ~mktrost/NA12878/gotcloud.conf
  −
 
  −
'''If you see differences other than OUT_DIR, you can modify your file using nedit (or your favorite editor as you used yesterday).'''
  −
* '''Or you can copy my file and just change OUT_DIR:'''
  −
cp ~mktrost/NA12878/gotcloud.conf $SAMPLE/gotcloud.conf
  −
nedit $SAMPLE/gotcloud.conf&
  −
* Edit OUT_DIR
  −
OUT_DIR = Sample##/output
  −
* Save, and close
  −
 
  −
==== Running SnpCall ====
  −
Run GotCloud snpcall with 6 jobs running in parallel
  −
* Why 6? 
  −
** You want to run as many as you can.
  −
** 5 of you on the machine - 5*6 = 30 jobs will be running in parallel on that machine
  −
  ${GC}/gotcloud snpcall --conf $SAMPLE/gotcloud.conf --numjobs 6
  −
* Only need the configuration & number of threads, rest is specified within the configuration.
      
This will run overnight.  We will check if it completed at the practical in the morning.
 
This will run overnight.  We will check if it completed at the practical in the morning.
Line 230: Line 154:  
exit PuTTY
 
exit PuTTY
    +
=== FEEDBACK!===
 +
Please provide feedback for today.
 +
https://docs.google.com/a/umich.edu/forms/d/1iES6usHxLB7Ec9hRxtqYgH7v05lU3Ume4VJcksx8Ogg/viewform
    
</div>
 
</div>
 
</div>
 
</div>
</div>
  −
</div>
  −
  −
<div class="mw-collapsible mw-collapsed" style="width:500px">
  −
  −
== Thursday ==
  −
<div class="mw-collapsible-content">
  −
=== Checking if SnpCall Completed ===
  −
==== Logging Back in to Check Jobs ====
  −
  −
;How do you log back into screen?
  −
screen -r
  −
This will resume an already running screen.
  −
  −
==== Checking Completion ====
  −
  −
Did you get a "completed successfully" message?
  −
  −
If yes, how long did SNP calling take?  Look at the log message - time in seconds.
  −
  −
If no, are you running on seqshop-server?  If so, KILL it.  ssh -X to one of the seqshop machines and run on there.
  −
Ctrl-c
  −
  −
Detach from screen.  We will resume it again later when we restart SNPCall (if necessary).
  −
Ctrl-a d
  −
  −
=== Ancestry Tutorial ===
  −
Now we are going to run the Structural Variation Practical
  −
  −
Please go to: [[SeqShop: Estimates of Genetic Ancestry Practical, May 2015]]
  −
  −
We will Resume SNP Calling (if necessary) after the practical.
  −
  −
=== Restart SnpCall ===
  −
==== Resume screen ====
  −
  −
;How do you log back into screen?
  −
screen -r
  −
This will resume an already running screen.
  −
  −
Your screen session still has your environment variables set, so you do not need to reset them.
  −
  −
==== Running SnpCall ====
  −
Run GotCloud snpcall with 6 jobs running in parallel
  −
${GC}/gotcloud snpcall --conf $SAMPLE/gotcloud.conf --numjobs 6
  −
* GotCloud will pick up after the last completed step from before.
  −
  −
This will run overnight.  We will check if it completed at the practical in the morning.
  −
  −
==== Log Out ====
  −
;Want to log out and leave your job running?
  −
In the screen window, type:
  −
Ctrl-a d
  −
(Hold down Ctrl and type 'a', let go of both and type 'd')
  −
* This will "detach" from your screen session while your alignment continues to run.
  −
  −
exit PuTTY
  −
  −
   
</div>
 
</div>
 
</div>
 
</div>
Line 304: Line 172:  
[[SeqShop: Ancestry On Your Own Genome, May 2015]]
 
[[SeqShop: Ancestry On Your Own Genome, May 2015]]
   −
=== Association Analysis Tutorial ===
+
=== Setup Variables ===
Now we are going to run the Association Analysis Practical
+
Set these values.  Also, be sure to specify your sample name instead of SampleXX
 
+
export SAMPLE=SampleXX
Please go to: [[SeqShop: Association Analysis, May 2015]]
+
or
 
+
export SAMPLE=NA12878
We will look at our own genomes again after the practical.
     −
=== Return to SeqShop: Ancestry On Your Own Genome, May 2015 ===
+
Point to your GotCloud & your output directory:
Return to [[SeqShop:_Ancestry_On_Your_Own_Genome,_May_2015#Checking_if_Pileup_finished]]
+
export GC=~/seqshop/gotcloud
 +
export OUT=~/$SAMPLE/output
    
=== Reviewing Indel Results ===
 
=== Reviewing Indel Results ===
Set these values.  Also, be sure to specify your sample name (or NA12878) instead of SampleXX
  −
export SAMPLE=SampleXX
  −
source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
  −
   
Look in the output directory
 
Look in the output directory
 
  ls ~/$SAMPLE/output
 
  ls ~/$SAMPLE/output
Line 383: Line 247:  
The insertion deletion ratio increases from 0.91 to 0.92.   
 
The insertion deletion ratio increases from 0.91 to 0.92.   
    +
=== Return to SeqShop: Ancestry On Your Own Genome, May 2015 ===
 +
Return to [[SeqShop:_Ancestry_On_Your_Own_Genome,_May_2015#Checking_if_Pileup_finished]]
    
=== Friday: Reviewing SNPCALL Results ===
 
=== Friday: Reviewing SNPCALL Results ===
Line 392: Line 258:     
=== Friday : More SNP Analysis ===
 
=== Friday : More SNP Analysis ===
 +
In addition, set another environmental variable for locating the binaries for custom analysis
   −
==== Environmental Variables ====
+
export HK=/net/seqshop-server/home/hmkang/apigenome/bin
 
+
  export EPACTS=/net/seqshop-server/home/mktrost/seqshop/epacts/
If you didn't set the environmental variable, you can set it again
+
  export REF=/net/seqshop-server/home/mktrost/seqshop/singleSample/ref/gotcloud.ref
 
+
export GC=~/seqshop/gotcloud
  source /net/seqshop-server/home/mktrost/seqshop/setup.txt
  −
  export SAMPLE=SampleXX (MAKE SURE TO CHANGE XX to your number or use NA12878 instead)
  −
source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
     −
In addition, set another environmental variable for locating the binaries for custom analysis
     −
  export HK=/net/seqshop-server/home/hmkang/seqshop/bin
+
  export SAMPLE=SampleXX
 +
export OUT=~/$SAMPLE/output
    
==== Annotation / Lookup against dbSNP ====
 
==== Annotation / Lookup against dbSNP ====
Line 409: Line 273:  
If you want to add rsIDs to your variant files, you can do this by running the following command
 
If you want to add rsIDs to your variant files, you can do this by running the following command
   −
  $HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbSNP.b138/dbsnp_138.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz
+
  $HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbsnp_142.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz
 
   
 
   
 
If you want to run this command across all chromosomes in parallel, you can use the special script run-command-wgs
 
If you want to run this command across all chromosomes in parallel, you can use the special script run-command-wgs
   −
  $HK/run-command-wgs --cmd "$HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbSNP.b138/dbsnp_138.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz" --numjobs 6
+
  $HK/run-make --repeat-chr --cmd "$HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbsnp_142.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz" --numjobs 6 --out runmake.rsid
   −
Looking up SNPs by rsID is possible by (for example)
+
Looking up SNPs by rsID is possible by (for example, rs17766217) -- How can we find its position?
  $HK/vcf-lookup-rsid --vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --sepchr --rs rs17766217
+
  $HK/tabix $OUT/vcfs/chr8/chr8.filtered.rsid.vcf.gz 8:128504497 | less
 
* Be sure to look at the QUAL & your sample's PL, and not just the GL field.  Check if QUAL is 0 or PL is 0,0,0 - NS is also probably 0; DP is probably 0.  That means you probably didn't have any copies, so your GT may not be correct/is unknown.
 
* Be sure to look at the QUAL & your sample's PL, and not just the GL field.  Check if QUAL is 0 or PL is 0,0,0 - NS is also probably 0; DP is probably 0.  That means you probably didn't have any copies, so your GT may not be correct/is unknown.
    
If you want to browse the rsIDs of known GWAS SNPs, you can do this by
 
If you want to browse the rsIDs of known GWAS SNPs, you can do this by
  cut -f 1,8,22 $HK/../data/gwascatalog/gwascatalog.txt | less
+
  cut -f 1,8,12,13,22 $HK/../data/gwascatalog/gwascatalog.txt | grep -w rs17766217
    
==== Annotating your genome ====
 
==== Annotating your genome ====
Line 428: Line 292:     
Or you can run multiple chromosomes in parallel in one command
 
Or you can run multiple chromosomes in parallel in one command
  $HK/run-command-wgs --cmd "$EPACTS/bin/epacts anno --in $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz" --numjobs 6  
+
  $HK/run-make --repeat-chr --cmd "$EPACTS/bin/epacts anno --in $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz" --numjobs 6 --out runmake.anno
+
 
 
==== Extracting only exonic SNPs ====
 
==== Extracting only exonic SNPs ====
    
If you want to look at the exonic SNPs, you can extract using the following command
 
If you want to look at the exonic SNPs, you can extract using the following command
  $HK/run-command-wgs --cmd "($HK/tabix -H $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz; zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz | grep Exon;)| $HK/bgzip -c > $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz" --numjobs 6
+
  $HK/run-make --repeat-chr --cmd "($HK/tabix -H $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz; zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz | grep Exon;)| $HK/bgzip -c > $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz" --numjobs 6 --out runmake.exome
    
And they can be combined as follows
 
And they can be combined as follows
 
  (zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz; zcat $OUT/vcfs/chr[2-9]/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chr??/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chrX/chrX.filtered.rsid.anno.exon.vcf.gz | grep -v ^#) | $HK/bgzip -c > $OUT/wgs.filtered.rsid.anno.exon.vcf.gz
 
  (zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz; zcat $OUT/vcfs/chr[2-9]/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chr??/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chrX/chrX.filtered.rsid.anno.exon.vcf.gz | grep -v ^#) | $HK/bgzip -c > $OUT/wgs.filtered.rsid.anno.exon.vcf.gz
 +
$HK/tabix -pvcf $OUT/wgs.filtered.rsid.anno.exon.vcf.gz
    
==== Exonic Variants NOT found by 1000G ====
 
==== Exonic Variants NOT found by 1000G ====
Line 460: Line 325:     
Want to see this from the BAM file?  Use samtools tview:
 
Want to see this from the BAM file?  Use samtools tview:
  $GC/bin/samtools tview $SAMPLE/output/bams/$SAMPLE.recal.bam $GC/gotcloud.ref/human.g1k.v37.fa
+
  $GC/bin/samtools tview $SAMPLE/output/bams/$SAMPLE.recal.bam $REF/hs37d5.fa
 
Use 'g' & enter the Chr:Pos
 
Use 'g' & enter the Chr:Pos
 
* Some patterns may indicate not real variants.
 
* Some patterns may indicate not real variants.
Line 469: Line 334:  
   
 
   
 
The phred score at the last column quantifies the degree of functional significance
 
The phred score at the last column quantifies the degree of functional significance
         
</div>
 
</div>
 
</div>
 
</div>
 +
 +
== OVERALL COURSE FEEDBACK! ==
 +
Please provide feedback:
 +
https://docs.google.com/forms/d/1pxfPXKwWfA71ZJM99Sevs3MwAUz2UbHAR8dnRI-kRNM/viewform

Navigation menu