Changes

From Genome Analysis Wiki
Jump to navigationJump to search
Line 1: Line 1:  
__TOC__
 
__TOC__
<div class="mw-collapsible mw-collapsed" style="width:500px">
+
<div class="mw-collapsible" style="width:500px">
 
''Login instructions for seqshop-server''
 
''Login instructions for seqshop-server''
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
Line 94: Line 94:  
exit PuTTY
 
exit PuTTY
   −
=== Tuesday FEEDBACK! ===
  −
Please provide feedback on the lectures/tutorials from today:
  −
  −
https://docs.google.com/forms/d/1n8xYxvsOq-HsabpDfGcHvwD84BYIRDx8_b-H5N3d-D8/viewform
   
</div>
 
</div>
 
</div>
 
</div>
 +
 +
<div class="mw-collapsible mw-collapsed" style="width:500px">
 +
 +
== Wednesday ==
 +
<div class="mw-collapsible-content">
    
<div class="mw-collapsible" style="width:500px">
 
<div class="mw-collapsible" style="width:500px">
== Wednesday Checking if INDEL Completed ==
+
=== Checking if INDEL Completed ===
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
=== Logging Back in to Check Jobs ===
+
==== Logging Back in to Check Jobs ====
    
;How do you log back into screen?
 
;How do you log back into screen?
Line 114: Line 115:  
How long did INDEL calling take?  Look at the log message - time in seconds.
 
How long did INDEL calling take?  Look at the log message - time in seconds.
    +
===  Need a BAM Index File? ===
 +
Check your BAM files
 +
ls $SAMPLE/output/bams/*bam
 +
 +
Check your BAI files
 +
ls $SAMPLE/output/bams/*bai
 +
 +
Do you have the same number of BAM & BAI files?
 +
* If so, then you are good to go.
 +
* If you have more BAI's than BAM's - oops!!! (LET ME KNOW)
 +
* If you have more BAM's than BAI's, we need to generate the index:
 +
*: <pre>$GC/bin/samtools index $SAMPLE/output/bams/YOUR_BAM_WITH_MISSING_BAI.bam</pre>
 +
 +
 +
=== Detach From screen===
 
Detach from screen.  We will resume it again later when we start SNPCall.
 
Detach from screen.  We will resume it again later when we start SNPCall.
 
  Ctrl-a d
 
  Ctrl-a d
Line 120: Line 136:  
</div>
 
</div>
   −
== Wednesday - Structural Variation Tutorial ==
+
=== Structural Variation Tutorial ===
 
Now we are going to run the Structural Variation Practical
 
Now we are going to run the Structural Variation Practical
   Line 126: Line 142:     
We will Start SNP Calling after the practical.
 
We will Start SNP Calling after the practical.
 +
    
<div class="mw-collapsible" style="width:500px">
 
<div class="mw-collapsible" style="width:500px">
== Wednesday - Start SNP Calling ==
+
 
 +
=== Start SNP Calling ===
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
=== Setup ===
+
==== Resume screen ====
The indel & snpcall pipelines will run overnight, but you'll want to log out.
  −
; How do I leave something running on the server even if I log out?
  −
: One solution is screen!
     −
; How do I use screen?
+
;How do you log back into screen?
: Before running your command, you need to start screen:
+
screen -r
: <pre>screen</pre>
+
This will resume an already running screen.
 
  −
[[File:Screen.png]]
  −
 
  −
As it says, press <code>Space</code> or <code>Return</code>.
  −
* It should now look basically the same as your normal command line.
     −
Set these values.  Also, be sure to specify your sample name instead of SampleXX
+
Your screen session still has your environment variables set, so you do not need to reset them.
export SAMPLE=SampleXX
  −
source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
     −
See the settings you just used:
+
==== List of BAMs ====
cat /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
  −
Shows you:
  −
:<code>export GC=/net/seqshop-server/home/mktrost/seqshop/gotcloud</code>
  −
:<code>export SS=/net/seqshop-server/home/mktrost/seqshop/singleSample</code>
  −
:<code>export OUT=~/$SAMPLE/output</code>
  −
 
  −
=== List of BAMs ===
   
The list of BAMs has already been created (just 1 BAM, your sample).
 
The list of BAMs has already been created (just 1 BAM, your sample).
 
* But it is simply SAMPLE\tBAM_name, so easy to figure out
 
* But it is simply SAMPLE\tBAM_name, so easy to figure out
Line 164: Line 165:  
* Relative path, so assumes running from your home directory (I prefer absolute paths, but for simplicity of the workshop, we just use relative path).
 
* Relative path, so assumes running from your home directory (I prefer absolute paths, but for simplicity of the workshop, we just use relative path).
   −
=== Configuring SnpCall ===
+
==== GotCloud SnpCall Configuration ====
    
  cat ~/$SAMPLE/gotcloud.conf
 
  cat ~/$SAMPLE/gotcloud.conf
Line 192: Line 193:  
</pre>
 
</pre>
   −
==== Configuration Updates ====
+
===== Configuration Updates =====
In order to complete SnpCall overnight, we are going to tell GotCloud to only call SNPs for the EXOME regions.
+
'''No configuration updates from the original settings are necessary.'''
 +
* Originally you were going to update the configuration to do exome calling only, but we have decided to do whole genome
 +
** If it doesn't finish tonight, we will kill it at the tutorial in the morning, and take advantage of the GotCloud restart capability after the tutorial tomorrow.
 +
 
 +
If you started making updates to your configuration yesterday, you can try to make it match above (mktrost is actually the path you want in the paths above).
 +
* The only difference from above should be your OUT_DIR.
 +
Check it with (leave as ~mktrost - you are comparing to my NA12878 conf file):
 +
diff $SAMPLE/gotcloud.conf ~mktrost/NA12878/gotcloud.conf
   −
Edit  
+
'''If you see differences other than OUT_DIR, you can modify your file using nedit (or your favorite editor as you used yesterday).'''
 +
* '''Or you can copy my file and just change OUT_DIR:'''
 +
  cp ~mktrost/NA12878/gotcloud.conf $SAMPLE/gotcloud.conf
 
  nedit $SAMPLE/gotcloud.conf&
 
  nedit $SAMPLE/gotcloud.conf&
* Or you can use <code>vi</code> or <code>emacs</code> or your favorite editor
+
* Edit OUT_DIR
 
+
  OUT_DIR = Sample##/output
Specify the target region:
+
* Save, and close
  UNIFORM_TARGET_BED = /net/seqshop-server/home/mktrost/seqshop/singleSample/20130108.exome.targets.bed
  −
* See [http://genome.sph.umich.edu/wiki/GotCloud:_Variant_Calling_Pipeline#Targeted.2FExome_Sequencing_Settings|Targeted/Exome Sequencing Settings] for more information on the GotCloud configuration settings for running Targeted/Exome runs.
     −
=== Running SnpCall ===
+
==== Running SnpCall ====
 
Run GotCloud snpcall with 6 jobs running in parallel
 
Run GotCloud snpcall with 6 jobs running in parallel
 
* Why 6?   
 
* Why 6?   
Line 213: Line 221:  
This will run overnight.  We will check if it completed at the practical in the morning.
 
This will run overnight.  We will check if it completed at the practical in the morning.
   −
=== Log Out ===
+
==== Log Out ====
 
;Want to log out and leave your job running?
 
;Want to log out and leave your job running?
 
In the screen window, type:
 
In the screen window, type:
Line 219: Line 227:  
(Hold down Ctrl and type 'a', let go of both and type 'd')
 
(Hold down Ctrl and type 'a', let go of both and type 'd')
 
* This will "detach" from your screen session while your alignment continues to run.
 
* This will "detach" from your screen session while your alignment continues to run.
  −
If you have not detached from screen:
  −
Ctrl-a d
      
exit PuTTY
 
exit PuTTY
   −
=== Wednesday FEEDBACK! ===
  −
Please provide feedback on the lectures/tutorials from today:
  −
  −
https://docs.google.com/a/umich.edu/forms/d/1CCHL9ODPsw4jX4hj0kGo6AMHwT4Gam0IpKNRnIR9yMk/viewform
      +
</div>
 +
</div>
 
</div>
 
</div>
 
</div>
 
</div>
    
<div class="mw-collapsible mw-collapsed" style="width:500px">
 
<div class="mw-collapsible mw-collapsed" style="width:500px">
== Thursday Checking if SnpCall Completed ==
+
 
 +
== Thursday ==
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
=== Logging Back in to Check Jobs ===
+
=== Checking if SnpCall Completed ===
 +
==== Logging Back in to Check Jobs ====
    
;How do you log back into screen?
 
;How do you log back into screen?
Line 242: Line 247:  
This will resume an already running screen.
 
This will resume an already running screen.
   −
=== Checking Completion ===
+
==== Checking Completion ====
    
Did you get a "completed successfully" message?
 
Did you get a "completed successfully" message?
Line 248: Line 253:  
If yes, how long did SNP calling take?  Look at the log message - time in seconds.
 
If yes, how long did SNP calling take?  Look at the log message - time in seconds.
   −
If no, KILL it.  We will start it back running again after the tutorial today.
+
If no, are you running on seqshop-server?  If so, KILL it.  ssh -X to one of the seqshop machines and run on there.
 +
Ctrl-c
    
Detach from screen.  We will resume it again later when we restart SNPCall (if necessary).
 
Detach from screen.  We will resume it again later when we restart SNPCall (if necessary).
 
  Ctrl-a d
 
  Ctrl-a d
   −
</div>
+
=== Ancestry Tutorial ===
</div>
  −
 
  −
 
  −
== Thursday - Ancestry Tutorial ==
   
Now we are going to run the Structural Variation Practical
 
Now we are going to run the Structural Variation Practical
   Line 265: Line 267:       −
<div class="mw-collapsible mw-collapsed" style="width:500px">
+
=== Restart SnpCall ===
== Friday: Reviewing Indel Results ==
+
==== Resume screen ====
<div class="mw-collapsible-content">
  −
=== Logging Back in to Check Jobs ===
     −
;How do you log back into screen tomorrow?
+
;How do you log back into screen?
 
  screen -r
 
  screen -r
 
This will resume an already running screen.
 
This will resume an already running screen.
   −
=== Looking at INDEL results ===
+
Your screen session still has your environment variables set, so you do not need to reset them.
 +
 
 +
==== Running SnpCall ====
 +
Run GotCloud snpcall with 6 jobs running in parallel
 +
${GC}/gotcloud snpcall --conf $SAMPLE/gotcloud.conf --numjobs 6
 +
* GotCloud will pick up after the last completed step from before.
 +
 
 +
This will run overnight.  We will check if it completed at the practical in the morning.
 +
 
 +
==== Log Out ====
 +
;Want to log out and leave your job running?
 +
In the screen window, type:
 +
Ctrl-a d
 +
(Hold down Ctrl and type 'a', let go of both and type 'd')
 +
* This will "detach" from your screen session while your alignment continues to run.
 +
 
 +
exit PuTTY
 +
 
 +
 
 +
</div>
 +
</div>
 +
 
 +
<div class="mw-collapsible" style="width:1000px">
 +
 
 +
== Friday ==
 +
<div class="mw-collapsible-content">
 +
 
 +
 
 +
=== SeqShop: Ancestry On Your Own Genome, December 2014 ===
 +
[[SeqShop: Ancestry On Your Own Genome, December 2014]]
 +
 
 +
 
 +
=== Association Analysis Tutorial ===
 +
Now we are going to run the Association Analysis Practical
 +
 
 +
Please go to: [[SeqShop: Genetic Association Analysis Practical, December 2014]]
 +
 
 +
We will look at our own genomes again after the practical.
 +
 
 +
 
 +
=== Return to SeqShop: Ancestry On Your Own Genome, December 2014 ===
 +
Return to [[SeqShop:_Ancestry_On_Your_Own_Genome,_December_2014#Checking_if_Pileup_finished]]
 +
 
 +
=== Reviewing Indel Results ===
 +
Set these values.  Also, be sure to specify your sample name (or NA12878) instead of SampleXX
 +
export SAMPLE=SampleXX
 +
source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
 +
 
 
Look in the output directory
 
Look in the output directory
 
  ls ~/$SAMPLE/output
 
  ls ~/$SAMPLE/output
Line 311: Line 358:     
  $GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz  
 
  $GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz  
 +
 
       no. Indels                        :    588566
 
       no. Indels                        :    588566
 
           2 alleles (ins/del)            :          588566 (0.87) [273261/315305]
 
           2 alleles (ins/del)            :          588566 (0.87) [273261/315305]
Line 331: Line 379:  
Note that AB>0.5 denotes reference bias and AB<0.5 denotes alternative allele bias.
 
Note that AB>0.5 denotes reference bias and AB<0.5 denotes alternative allele bias.
   −
$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AC>0&&INFO.AB<0.7&&INFO.AB>0.3"
+
$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AC>0&&INFO.AB<0.7&&INFO.AB>0.3"
 +
 
 
       no. Indels                        :    490965
 
       no. Indels                        :    490965
 
           2 alleles (ins/del)            :          490965 (0.92) [235254/255711]
 
           2 alleles (ins/del)            :          490965 (0.92) [235254/255711]
Line 337: Line 386:  
The insertion deletion ratio increases from 0.91 to 0.92.   
 
The insertion deletion ratio increases from 0.91 to 0.92.   
   −
</div>
+
 
</div>
+
=== Friday: Reviewing SNPCALL Results ===
 +
Look in the output directory
 +
ls ~/$SAMPLE/output
 +
 
 +
Look at the vcfs:
 +
ls ~/$SAMPLE/output/vcfs
 +
 
 +
=== Friday : More SNP Analysis ===
 +
 
 +
==== Environmental Variables ====
 +
 
 +
If you didn't set the environmental variable, you can set it again
 +
 
 +
source /net/seqshop-server/home/mktrost/seqshop/setup.txt
 +
export SAMPLE=SampleXX (MAKE SURE TO CHANGE XX to your number or use NA12878 instead)
 +
source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
 +
 
 +
In addition, set another environmental variable for locating the binaries for custom analysis
 +
 
 +
export HK=/net/seqshop-server/home/hmkang/seqshop/bin
 +
 
 +
==== Annotation / Lookup against dbSNP ====
 +
 
 +
If you want to add rsIDs to your variant files, you can do this by running the following command
 +
 
 +
$HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbSNP.b138/dbsnp_138.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz
 +
 +
If you want to run this command across all chromosomes in parallel, you can use the special script run-command-wgs
 +
 
 +
$HK/run-command-wgs --cmd "$HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbSNP.b138/dbsnp_138.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz" --numjobs 6
 +
 
 +
Looking up SNPs by rsID is possible by (for example)
 +
$HK/vcf-lookup-rsid --vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --sepchr --rs rs17766217
 +
* Be sure to look at the QUAL & your sample's PL, and not just the GL field.  Check if QUAL is 0 or PL is 0,0,0 - NS is also probably 0; DP is probably 0.  That means you probably didn't have any copies, so your GT may not be correct/is unknown.
 +
 
 +
If you want to browse the rsIDs of known GWAS SNPs, you can do this by
 +
cut -f 1,8,22 $HK/../data/gwascatalog/gwascatalog.txt | less
 +
 
 +
==== Annotating your genome ====
 +
 
 +
You can annotate your genome using EPACTS software packages. Individual chromosome can be annotated by running.
 +
$EPACTS/bin/epacts anno --in $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz
 +
 
 +
Or you can run multiple chromosomes in parallel in one command
 +
$HK/run-command-wgs --cmd "$EPACTS/bin/epacts anno --in $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz" --numjobs 6
 +
 +
==== Extracting only exonic SNPs ====
 +
 
 +
If you want to look at the exonic SNPs, you can extract using the following command
 +
$HK/run-command-wgs --cmd "($HK/tabix -H $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz; zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz | grep Exon;)| $HK/bgzip -c > $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz" --numjobs 6
 +
 
 +
And they can be combined as follows
 +
(zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz; zcat $OUT/vcfs/chr[2-9]/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chr??/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chrX/chrX.filtered.rsid.anno.exon.vcf.gz | grep -v ^#) | $HK/bgzip -c > $OUT/wgs.filtered.rsid.anno.exon.vcf.gz
 +
 
 +
==== Exonic Variants NOT found by 1000G ====
 +
 
 +
If you are interested in rare variants that are not identified by 1000G, you can extract them by running
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | less
 +
 +
For example,
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | perl -lane 'print "$1\t$F[6]" if ( /ANNO=([^;:]+)/)' | sort | uniq -c
 +
 +
will give you the counts of variants, separate by the filtering results
 +
 
 +
* Q1. How manny novel silent, missense, and nonsense SNPs are found? Is that too few, too small, or just about right?
 +
* Q2. Looking at each functional category, which functional categories has largest fraction of SNPs failed filter? Why do you think it is?
 +
* Q3. Can you exclude the sites that are also in dbSNP, and count how many nonsense variants are left?
 +
 
 +
 
 +
To also exclude those in dbsnp:
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | grep -v rs| perl -lane 'print "$1\t$F[6]" if ( /ANNO=([^;:]+)/)' | sort | uniq -c
 +
 
 +
Exclude dbsnp and look at Stop_Gain variants
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" |grep -v rs | perl -lane 'print "$_" if ( /ANNO=Stop_Gain/)' |grep -w PASS
 +
 
 +
Want to see this from the BAM file?  Use samtools tview:
 +
$GC/bin/samtools tview $SAMPLE/output/bams/$SAMPLE.recal.bam $GC/gotcloud.ref/human.g1k.v37.fa
 +
Use 'g' & enter the Chr:Pos
 +
* Some patterns may indicate not real variants.
 +
 
 +
If you want to know predicted functional significance of a particular variant, you can search by
 +
 
 +
$HK/tabix $HK/../data/CADD/whole_genome_SNVs.tsv.gz [chr]:[pos] | head -3
 +
 +
The phred score at the last column quantifies the degree of functional significance
      −
<div class="mw-collapsible mw-collapsed" style="width:500px">
  −
== Friday: Reviewing SNPCALL Results ==
  −
<div class="mw-collapsible-content">
      
</div>
 
</div>
 
</div>
 
</div>

Navigation menu