Changes

From Genome Analysis Wiki
Jump to navigationJump to search
Line 94: Line 94:  
exit PuTTY
 
exit PuTTY
   −
=== Tuesday FEEDBACK! ===
  −
Please provide feedback on the lectures/tutorials from today:
  −
  −
https://docs.google.com/forms/d/1n8xYxvsOq-HsabpDfGcHvwD84BYIRDx8_b-H5N3d-D8/viewform
   
</div>
 
</div>
 
</div>
 
</div>
    
<div class="mw-collapsible mw-collapsed" style="width:500px">
 
<div class="mw-collapsible mw-collapsed" style="width:500px">
 +
 
== Wednesday ==
 
== Wednesday ==
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
Line 233: Line 230:  
exit PuTTY
 
exit PuTTY
   −
==== Wednesday FEEDBACK! ====
  −
Please provide feedback on the lectures/tutorials from today:
  −
  −
https://docs.google.com/a/umich.edu/forms/d/1CCHL9ODPsw4jX4hj0kGo6AMHwT4Gam0IpKNRnIR9yMk/viewform
      
</div>
 
</div>
Line 243: Line 236:  
</div>
 
</div>
   −
<div class="mw-collapsible" style="width:500px">
+
<div class="mw-collapsible mw-collapsed" style="width:500px">
 +
 
 
== Thursday ==
 
== Thursday ==
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
Line 259: Line 253:  
If yes, how long did SNP calling take?  Look at the log message - time in seconds.
 
If yes, how long did SNP calling take?  Look at the log message - time in seconds.
   −
If no, KILL it.  We will start it back running again after the tutorial today.
+
If no, are you running on seqshop-server?  If so, KILL it.  ssh -X to one of the seqshop machines and run on there.
 
  Ctrl-c
 
  Ctrl-c
   Line 302: Line 296:  
</div>
 
</div>
   −
<div class="mw-collapsible mw-collapsed" style="width:500px">
+
<div class="mw-collapsible" style="width:1000px">
    
== Friday ==
 
== Friday ==
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
   −
<div class="mw-collapsible mw-collapsed" style="width:500px">
  −
== Friday: Reviewing Indel Results ==
  −
<div class="mw-collapsible-content">
  −
=== Logging Back in to Check Jobs ===
     −
;How do you log back into screen tomorrow?
+
=== SeqShop: Ancestry On Your Own Genome, December 2014 ===
  screen -r
+
[[SeqShop: Ancestry On Your Own Genome, December 2014]]
This will resume an already running screen.
+
 
 +
 
 +
=== Association Analysis Tutorial ===
 +
Now we are going to run the Association Analysis Practical
 +
 
 +
Please go to: [[SeqShop: Genetic Association Analysis Practical, December 2014]]
 +
 
 +
We will look at our own genomes again after the practical.
 +
 
 +
 
 +
=== Return to SeqShop: Ancestry On Your Own Genome, December 2014 ===
 +
Return to [[SeqShop:_Ancestry_On_Your_Own_Genome,_December_2014#Checking_if_Pileup_finished]]
 +
 
 +
=== Reviewing Indel Results ===
 +
Set these values.  Also, be sure to specify your sample name (or NA12878) instead of SampleXX
 +
export SAMPLE=SampleXX
 +
  source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
   −
=== Looking at INDEL results ===
   
Look in the output directory
 
Look in the output directory
 
  ls ~/$SAMPLE/output
 
  ls ~/$SAMPLE/output
Line 381: Line 386:  
The insertion deletion ratio increases from 0.91 to 0.92.   
 
The insertion deletion ratio increases from 0.91 to 0.92.   
   −
</div>
  −
</div>
      +
=== Friday: Reviewing SNPCALL Results ===
 +
Look in the output directory
 +
ls ~/$SAMPLE/output
 +
 +
Look at the vcfs:
 +
ls ~/$SAMPLE/output/vcfs
 +
 +
=== Friday : More SNP Analysis ===
 +
 +
==== Environmental Variables ====
 +
 +
If you didn't set the environmental variable, you can set it again
 +
 +
source /net/seqshop-server/home/mktrost/seqshop/setup.txt
 +
export SAMPLE=SampleXX (MAKE SURE TO CHANGE XX to your number or use NA12878 instead)
 +
source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt
 +
 +
In addition, set another environmental variable for locating the binaries for custom analysis
 +
 +
export HK=/net/seqshop-server/home/hmkang/seqshop/bin
 +
 +
==== Annotation / Lookup against dbSNP ====
 +
 +
If you want to add rsIDs to your variant files, you can do this by running the following command
 +
 +
$HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbSNP.b138/dbsnp_138.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz
 +
 +
If you want to run this command across all chromosomes in parallel, you can use the special script run-command-wgs
 +
 +
$HK/run-command-wgs --cmd "$HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbSNP.b138/dbsnp_138.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz" --numjobs 6
 +
 +
Looking up SNPs by rsID is possible by (for example)
 +
$HK/vcf-lookup-rsid --vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --sepchr --rs rs17766217
 +
* Be sure to look at the QUAL & your sample's PL, and not just the GL field.  Check if QUAL is 0 or PL is 0,0,0 - NS is also probably 0; DP is probably 0.  That means you probably didn't have any copies, so your GT may not be correct/is unknown.
 +
 +
If you want to browse the rsIDs of known GWAS SNPs, you can do this by
 +
cut -f 1,8,22 $HK/../data/gwascatalog/gwascatalog.txt | less
 +
 +
==== Annotating your genome ====
 +
 +
You can annotate your genome using EPACTS software packages. Individual chromosome can be annotated by running.
 +
$EPACTS/bin/epacts anno --in $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz
 +
 +
Or you can run multiple chromosomes in parallel in one command
 +
$HK/run-command-wgs --cmd "$EPACTS/bin/epacts anno --in $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz" --numjobs 6
 +
 +
==== Extracting only exonic SNPs ====
 +
 +
If you want to look at the exonic SNPs, you can extract using the following command
 +
$HK/run-command-wgs --cmd "($HK/tabix -H $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz; zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz | grep Exon;)| $HK/bgzip -c > $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz" --numjobs 6
 +
 +
And they can be combined as follows
 +
(zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz; zcat $OUT/vcfs/chr[2-9]/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chr??/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chrX/chrX.filtered.rsid.anno.exon.vcf.gz | grep -v ^#) | $HK/bgzip -c > $OUT/wgs.filtered.rsid.anno.exon.vcf.gz
 +
 +
==== Exonic Variants NOT found by 1000G ====
 +
 +
If you are interested in rare variants that are not identified by 1000G, you can extract them by running
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | less
 +
 +
For example,
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | perl -lane 'print "$1\t$F[6]" if ( /ANNO=([^;:]+)/)' | sort | uniq -c
 +
 +
will give you the counts of variants, separate by the filtering results
 +
 +
* Q1. How manny novel silent, missense, and nonsense SNPs are found? Is that too few, too small, or just about right?
 +
* Q2. Looking at each functional category, which functional categories has largest fraction of SNPs failed filter? Why do you think it is?
 +
* Q3. Can you exclude the sites that are also in dbSNP, and count how many nonsense variants are left?
 +
 +
 +
To also exclude those in dbsnp:
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | grep -v rs| perl -lane 'print "$1\t$F[6]" if ( /ANNO=([^;:]+)/)' | sort | uniq -c
 +
 +
Exclude dbsnp and look at Stop_Gain variants
 +
zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" |grep -v rs | perl -lane 'print "$_" if ( /ANNO=Stop_Gain/)' |grep -w PASS
 +
 +
Want to see this from the BAM file?  Use samtools tview:
 +
$GC/bin/samtools tview $SAMPLE/output/bams/$SAMPLE.recal.bam $GC/gotcloud.ref/human.g1k.v37.fa
 +
Use 'g' & enter the Chr:Pos
 +
* Some patterns may indicate not real variants.
 +
 +
If you want to know predicted functional significance of a particular variant, you can search by
   −
<div class="mw-collapsible mw-collapsed" style="width:500px">
+
$HK/tabix $HK/../data/CADD/whole_genome_SNVs.tsv.gz [chr]:[pos] | head -3
 +
 +
The phred score at the last column quantifies the degree of functional significance
   −
== Friday: Reviewing SNPCALL Results ==
  −
<div class="mw-collapsible-content">
     −
</div>
  −
</div>
      
</div>
 
</div>
 
</div>
 
</div>

Navigation menu