Changes

From Genome Analysis Wiki
Jump to navigationJump to search
Line 94: Line 94:  
exit PuTTY
 
exit PuTTY
   −
=== Tuesday FEEDBACK! ===
  −
Please provide feedback on the lectures/tutorials from today:
  −
  −
https://docs.google.com/forms/d/1n8xYxvsOq-HsabpDfGcHvwD84BYIRDx8_b-H5N3d-D8/viewform
   
</div>
 
</div>
 
</div>
 
</div>
    
<div class="mw-collapsible mw-collapsed" style="width:500px">
 
<div class="mw-collapsible mw-collapsed" style="width:500px">
 +
 
== Wednesday ==
 
== Wednesday ==
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
Line 233: Line 230:  
exit PuTTY
 
exit PuTTY
   −
==== Wednesday FEEDBACK! ====
  −
Please provide feedback on the lectures/tutorials from today:
  −
  −
https://docs.google.com/a/umich.edu/forms/d/1CCHL9ODPsw4jX4hj0kGo6AMHwT4Gam0IpKNRnIR9yMk/viewform
      
</div>
 
</div>
Line 243: Line 236:  
</div>
 
</div>
   −
<div class="mw-collapsible" style="width:500px">
+
<div class="mw-collapsible mw-collapsed" style="width:500px">
 +
 
 
== Thursday ==
 
== Thursday ==
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
Line 259: Line 253:  
If yes, how long did SNP calling take?  Look at the log message - time in seconds.
 
If yes, how long did SNP calling take?  Look at the log message - time in seconds.
   −
If no, KILL it.  We will start it back running again after the tutorial today.
+
If no, are you running on seqshop-server?  If so, KILL it.  ssh -X to one of the seqshop machines and run on there.
 
  Ctrl-c
 
  Ctrl-c
    
Detach from screen.  We will resume it again later when we restart SNPCall (if necessary).
 
Detach from screen.  We will resume it again later when we restart SNPCall (if necessary).
 
  Ctrl-a d
 
  Ctrl-a d
      
=== Ancestry Tutorial ===
 
=== Ancestry Tutorial ===
Line 303: Line 296:  
</div>
 
</div>
   −
<div class="mw-collapsible mw-collapsed" style="width:500px">
+
<div class="mw-collapsible" style="width:1000px">
    
== Friday ==
 
== Friday ==
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
   −
<div class="mw-collapsible mw-collapsed" style="width:500px">
  −
=== Checking if SnpCall Completed ===
  −
==== Logging Back in to Check Jobs ====
     −
;How do you log back into screen?
+
=== SeqShop: Ancestry On Your Own Genome, December 2014 ===
screen -r
+
[[SeqShop: Ancestry On Your Own Genome, December 2014]]
This will resume an already running screen.
  −
 
  −
==== Checking Completion ====
  −
 
  −
Did you get a "completed successfully" message?
  −
 
  −
If yes, how long did SNP calling take?  Look at the log message - time in seconds.
  −
 
  −
Detach from screen.  We will resume it again later to review results.
  −
Ctrl-a d
        Line 334: Line 314:       −
=== Friday: Reviewing Indel Results ===
+
=== Return to SeqShop: Ancestry On Your Own Genome, December 2014 ===
<div class="mw-collapsible-content">
+
Return to [[SeqShop:_Ancestry_On_Your_Own_Genome,_December_2014#Checking_if_Pileup_finished]]
=== Logging Back in to Check Jobs ===
     −
;How do you log back into screen tomorrow?
+
=== Reviewing Indel Results ===
  screen -r
+
Set these values.  Also, be sure to specify your sample name (or NA12878) instead of SampleXX
This will resume an already running screen.
+
export SAMPLE=SampleXX
 +
  source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
   −
=== Looking at INDEL results ===
   
Look in the output directory
 
Look in the output directory
 
  ls ~/$SAMPLE/output
 
  ls ~/$SAMPLE/output
Line 407: Line 386:  
The insertion deletion ratio increases from 0.91 to 0.92.   
 
The insertion deletion ratio increases from 0.91 to 0.92.   
   −
</div>
  −
</div>
      +
=== Friday: Reviewing SNPCALL Results ===
 +
Look in the output directory
 +
ls ~/$SAMPLE/output
 +
 +
Look at the vcfs:
 +
ls ~/$SAMPLE/output/vcfs
 +
 +
=== Friday : More SNP Analysis ===
 +
 +
==== Environmental Variables ====
 +
 +
If you didn't set the environmental variable, you can set it again
 +
 +
source /net/seqshop-server/home/mktrost/seqshop/setup.txt
 +
export SAMPLE=SampleXX (MAKE SURE TO CHANGE XX to your number or use NA12878 instead)
 +
source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
 +
 +
In addition, set another environmental variable for locating the binaries for custom analysis
 +
 +
export HK=/net/seqshop-server/home/hmkang/seqshop/bin
 +
 +
==== Annotation / Lookup against dbSNP ====
 +
 +
If you want to add rsIDs to your variant files, you can do this by running the following command
 +
 +
$HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbSNP.b138/dbsnp_138.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz
 +
 +
If you want to run this command across all chromosomes in parallel, you can use the special script run-command-wgs
 +
 +
$HK/run-command-wgs --cmd "$HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbSNP.b138/dbsnp_138.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz" --numjobs 6
 +
 +
Looking up SNPs by rsID is possible by (for example)
 +
$HK/vcf-lookup-rsid --vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --sepchr --rs rs17766217
 +
* Be sure to look at the QUAL & your sample's PL, and not just the GL field.  Check if QUAL is 0 or PL is 0,0,0 - NS is also probably 0; DP is probably 0.  That means you probably didn't have any copies, so your GT may not be correct/is unknown.
 +
 +
If you want to browse the rsIDs of known GWAS SNPs, you can do this by
 +
cut -f 1,8,22 $HK/../data/gwascatalog/gwascatalog.txt | less
 +
 +
==== Annotating your genome ====
 +
 +
You can annotate your genome using EPACTS software packages. Individual chromosome can be annotated by running.
 +
$EPACTS/bin/epacts anno --in $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz
 +
 +
Or you can run multiple chromosomes in parallel in one command
 +
$HK/run-command-wgs --cmd "$EPACTS/bin/epacts anno --in $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz" --numjobs 6
 +
 +
==== Extracting only exonic SNPs ====
 +
 +
If you want to look at the exonic SNPs, you can extract using the following command
 +
$HK/run-command-wgs --cmd "($HK/tabix -H $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz; zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz | grep Exon;)| $HK/bgzip -c > $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz" --numjobs 6
 +
 +
And they can be combined as follows
 +
(zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz; zcat $OUT/vcfs/chr[2-9]/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chr??/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chrX/chrX.filtered.rsid.anno.exon.vcf.gz | grep -v ^#) | $HK/bgzip -c > $OUT/wgs.filtered.rsid.anno.exon.vcf.gz
 +
 +
==== Exonic Variants NOT found by 1000G ====
 +
 +
If you are interested in rare variants that are not identified by 1000G, you can extract them by running
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | less
 +
 +
For example,
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | perl -lane 'print "$1\t$F[6]" if ( /ANNO=([^;:]+)/)' | sort | uniq -c
 +
 +
will give you the counts of variants, separate by the filtering results
 +
 +
* Q1. How manny novel silent, missense, and nonsense SNPs are found? Is that too few, too small, or just about right?
 +
* Q2. Looking at each functional category, which functional categories has largest fraction of SNPs failed filter? Why do you think it is?
 +
* Q3. Can you exclude the sites that are also in dbSNP, and count how many nonsense variants are left?
 +
 +
 +
To also exclude those in dbsnp:
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | grep -v rs| perl -lane 'print "$1\t$F[6]" if ( /ANNO=([^;:]+)/)' | sort | uniq -c
 +
 +
Exclude dbsnp and look at Stop_Gain variants
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" |grep -v rs | perl -lane 'print "$_" if ( /ANNO=Stop_Gain/)' |grep -w PASS
 +
 +
Want to see this from the BAM file?  Use samtools tview:
 +
$GC/bin/samtools tview $SAMPLE/output/bams/$SAMPLE.recal.bam $GC/gotcloud.ref/human.g1k.v37.fa
 +
Use 'g' & enter the Chr:Pos
 +
* Some patterns may indicate not real variants.
 +
 +
If you want to know predicted functional significance of a particular variant, you can search by
   −
<div class="mw-collapsible mw-collapsed" style="width:500px">
+
$HK/tabix $HK/../data/CADD/whole_genome_SNVs.tsv.gz [chr]:[pos] | head -3
 +
 +
The phred score at the last column quantifies the degree of functional significance
   −
=== Friday: Reviewing SNPCALL Results ===
  −
<div class="mw-collapsible-content">
     −
</div>
  −
</div>
      
</div>
 
</div>
 
</div>
 
</div>

Navigation menu