Line 1: |
Line 1: |
| __TOC__ | | __TOC__ |
− | <div class="mw-collapsible mw-collapsed" style="width:500px"> | + | <div class="mw-collapsible" style="width:500px"> |
| ''Login instructions for seqshop-server'' | | ''Login instructions for seqshop-server'' |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
Line 7: |
Line 7: |
| </div> | | </div> |
| | | |
− | == Setup == | + | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
− | The snpcall pipeline will run overnight, but you'll want to log out. | + | == Tuesday - Start INDEL Calling == |
| + | <div class="mw-collapsible-content"> |
| + | === Setup === |
| + | The indel & snpcall pipelines will run overnight, but you'll want to log out. |
| ; How do I leave something running on the server even if I log out? | | ; How do I leave something running on the server even if I log out? |
| : One solution is screen! | | : One solution is screen! |
Line 20: |
Line 23: |
| As it says, press <code>Space</code> or <code>Return</code>. | | As it says, press <code>Space</code> or <code>Return</code>. |
| * It should now look basically the same as your normal command line. | | * It should now look basically the same as your normal command line. |
− |
| |
− | ; Scrolling problems when using screen?
| |
− | : If you want to scroll and screen doesn't scroll like you normally would?
| |
− | :* Type Ctrl-a Esc and you should be able to scroll up with your mouse wheel
| |
− | :** Or at least that is what I do from my Linux machine - (sorry I'm typing this up/testing these commands from Linux and not windows, so can't test it out)
| |
− |
| |
| | | |
| Set these values. Also, be sure to specify your sample name instead of SampleXX | | Set these values. Also, be sure to specify your sample name instead of SampleXX |
Line 35: |
Line 32: |
| Shows you: | | Shows you: |
| :<code>export GC=/net/seqshop-server/home/mktrost/seqshop/gotcloud</code> | | :<code>export GC=/net/seqshop-server/home/mktrost/seqshop/gotcloud</code> |
| + | :<code>export SS=/net/seqshop-server/home/mktrost/seqshop/singleSample</code> |
| :<code>export OUT=~/$SAMPLE/output</code> | | :<code>export OUT=~/$SAMPLE/output</code> |
| | | |
− | <div class="mw-collapsible" style="width:500px">
| |
− | == SnpCall Step 1 (Day 1): Start SnpCall ==
| |
− | <div class="mw-collapsible-content">
| |
| === List of BAMs === | | === List of BAMs === |
| The list of BAMs has already been created (just 1 BAM, your sample). | | The list of BAMs has already been created (just 1 BAM, your sample). |
Line 49: |
Line 44: |
| * Relative path, so assumes running from your home directory (I prefer absolute paths, but for simplicity of the workshop, we just use relative path). | | * Relative path, so assumes running from your home directory (I prefer absolute paths, but for simplicity of the workshop, we just use relative path). |
| | | |
− | === Configuring SnpCall === | + | === Configuring Indel === |
− | | + | No special Configuration settings for Indel calling |
| cat ~/$SAMPLE/gotcloud.conf | | cat ~/$SAMPLE/gotcloud.conf |
| | | |
Line 77: |
Line 72: |
| </pre> | | </pre> |
| | | |
− | ==== Configuration Updates ====
| + | === Running Indel Calling === |
− | In order to complete SnpCall overnight, we are going to tell GotCloud to only call SNPs for the EXOME regions.
| + | Run GotCloud indel with 6 jobs running in parallel |
− | | |
− | Edit
| |
− | nedit $SAMPLE/gotcloud.conf&
| |
− | * Or you can use <code>vi</code> or <code>emacs</code> or your favorite editor
| |
− | | |
− | Specify the target region:
| |
− | UNIFORM_TARGET_BED = /net/seqshop-server/home/mktrost/seqshop/singleSample/20130108.exome.targets.bed
| |
− | * See [http://genome.sph.umich.edu/wiki/GotCloud:_Variant_Calling_Pipeline#Targeted.2FExome_Sequencing_Settings|Targeted/Exome Sequencing Settings] for more information on the GotCloud configuration settings for running Targeted/Exome runs.
| |
− | | |
− | === Running SnpCall === | |
− | Run GotCloud snpcall with 6 jobs running in parallel | |
| * Why 6? | | * Why 6? |
| ** You want to run as many as you can. | | ** You want to run as many as you can. |
| ** 5 of you on the machine - 5*6 = 30 jobs will be running in parallel on that machine | | ** 5 of you on the machine - 5*6 = 30 jobs will be running in parallel on that machine |
− | ${GC}/gotcloud snpcall --conf $SAMPLE/gotcloud.conf --numjobs 6 | + | ${GC}/gotcloud indel --conf $SAMPLE/gotcloud.conf --numjobs 6 |
| * Only need the configuration & number of threads, rest is specified within the configuration. | | * Only need the configuration & number of threads, rest is specified within the configuration. |
| | | |
Line 110: |
Line 94: |
| exit PuTTY | | exit PuTTY |
| | | |
− | === Tuesday FEEDBACK! ===
| + | </div> |
− | Please provide feedback on the lectures/tutorials from today:
| + | </div> |
| | | |
− | https://docs.google.com/forms/d/1n8xYxvsOq-HsabpDfGcHvwD84BYIRDx8_b-H5N3d-D8/viewform
| + | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
| | | |
− | </div>
| + | == Wednesday == |
− | </div> | + | <div class="mw-collapsible-content"> |
| | | |
| <div class="mw-collapsible" style="width:500px"> | | <div class="mw-collapsible" style="width:500px"> |
− | == SnpCall Step 2 (Day 2): Checking SnpCall == | + | === Checking if INDEL Completed === |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | === Logging Back in to Check Jobs === | + | ==== Logging Back in to Check Jobs ==== |
| | | |
− | ;How do you log back into screen tomorrow? | + | ;How do you log back into screen? |
| screen -r | | screen -r |
| This will resume an already running screen. | | This will resume an already running screen. |
| + | |
| + | Verify you got a "completed successfully" message. |
| + | |
| + | How long did INDEL calling take? Look at the log message - time in seconds. |
| + | |
| + | === Need a BAM Index File? === |
| + | Check your BAM files |
| + | ls $SAMPLE/output/bams/*bam |
| + | |
| + | Check your BAI files |
| + | ls $SAMPLE/output/bams/*bai |
| + | |
| + | Do you have the same number of BAM & BAI files? |
| + | * If so, then you are good to go. |
| + | * If you have more BAI's than BAM's - oops!!! (LET ME KNOW) |
| + | * If you have more BAM's than BAI's, we need to generate the index: |
| + | *: <pre>$GC/bin/samtools index $SAMPLE/output/bams/YOUR_BAM_WITH_MISSING_BAI.bam</pre> |
| + | |
| + | |
| + | === Detach From screen=== |
| + | Detach from screen. We will resume it again later when we start SNPCall. |
| + | Ctrl-a d |
| | | |
| </div> | | </div> |
| </div> | | </div> |
| + | |
| + | === Structural Variation Tutorial === |
| + | Now we are going to run the Structural Variation Practical |
| + | |
| + | Please go to: [[SeqShop: Analysis of Structural Variation Practical, December 2014]] |
| + | |
| + | We will Start SNP Calling after the practical. |
| + | |
| | | |
| <div class="mw-collapsible" style="width:500px"> | | <div class="mw-collapsible" style="width:500px"> |
− | == Indel Step 1 (Day 1): Start Indel Calling == | + | |
| + | === Start SNP Calling === |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | === List of BAMs === | + | ==== Resume screen ==== |
| + | |
| + | ;How do you log back into screen? |
| + | screen -r |
| + | This will resume an already running screen. |
| + | |
| + | Your screen session still has your environment variables set, so you do not need to reset them. |
| + | |
| + | ==== List of BAMs ==== |
| The list of BAMs has already been created (just 1 BAM, your sample). | | The list of BAMs has already been created (just 1 BAM, your sample). |
| * But it is simply SAMPLE\tBAM_name, so easy to figure out | | * But it is simply SAMPLE\tBAM_name, so easy to figure out |
Line 142: |
Line 165: |
| * Relative path, so assumes running from your home directory (I prefer absolute paths, but for simplicity of the workshop, we just use relative path). | | * Relative path, so assumes running from your home directory (I prefer absolute paths, but for simplicity of the workshop, we just use relative path). |
| | | |
− | === Configuring Indel === | + | ==== GotCloud SnpCall Configuration ==== |
− | No special Configuration settings for Indel calling
| + | |
| cat ~/$SAMPLE/gotcloud.conf | | cat ~/$SAMPLE/gotcloud.conf |
| | | |
Line 170: |
Line 193: |
| </pre> | | </pre> |
| | | |
− | === Running Indel Calling === | + | ===== Configuration Updates ===== |
− | Run GotCloud indel with 6 jobs running in parallel | + | '''No configuration updates from the original settings are necessary.''' |
| + | * Originally you were going to update the configuration to do exome calling only, but we have decided to do whole genome |
| + | ** If it doesn't finish tonight, we will kill it at the tutorial in the morning, and take advantage of the GotCloud restart capability after the tutorial tomorrow. |
| + | |
| + | If you started making updates to your configuration yesterday, you can try to make it match above (mktrost is actually the path you want in the paths above). |
| + | * The only difference from above should be your OUT_DIR. |
| + | Check it with (leave as ~mktrost - you are comparing to my NA12878 conf file): |
| + | diff $SAMPLE/gotcloud.conf ~mktrost/NA12878/gotcloud.conf |
| + | |
| + | '''If you see differences other than OUT_DIR, you can modify your file using nedit (or your favorite editor as you used yesterday).''' |
| + | * '''Or you can copy my file and just change OUT_DIR:''' |
| + | cp ~mktrost/NA12878/gotcloud.conf $SAMPLE/gotcloud.conf |
| + | nedit $SAMPLE/gotcloud.conf& |
| + | * Edit OUT_DIR |
| + | OUT_DIR = Sample##/output |
| + | * Save, and close |
| + | |
| + | ==== Running SnpCall ==== |
| + | Run GotCloud snpcall with 6 jobs running in parallel |
| * Why 6? | | * Why 6? |
| ** You want to run as many as you can. | | ** You want to run as many as you can. |
| ** 5 of you on the machine - 5*6 = 30 jobs will be running in parallel on that machine | | ** 5 of you on the machine - 5*6 = 30 jobs will be running in parallel on that machine |
− | ${GC}/gotcloud indel --conf $SAMPLE/gotcloud.conf --numjobs 6 | + | ${GC}/gotcloud snpcall --conf $SAMPLE/gotcloud.conf --numjobs 6 |
| * Only need the configuration & number of threads, rest is specified within the configuration. | | * Only need the configuration & number of threads, rest is specified within the configuration. |
| | | |
| This will run overnight. We will check if it completed at the practical in the morning. | | This will run overnight. We will check if it completed at the practical in the morning. |
| | | |
− | === Log Out === | + | ==== Log Out ==== |
| ;Want to log out and leave your job running? | | ;Want to log out and leave your job running? |
| In the screen window, type: | | In the screen window, type: |
Line 187: |
Line 228: |
| * This will "detach" from your screen session while your alignment continues to run. | | * This will "detach" from your screen session while your alignment continues to run. |
| | | |
− | If you have not detached from screen: | + | exit PuTTY |
| + | |
| + | |
| + | </div> |
| + | </div> |
| + | </div> |
| + | </div> |
| + | |
| + | <div class="mw-collapsible mw-collapsed" style="width:500px"> |
| + | |
| + | == Thursday == |
| + | <div class="mw-collapsible-content"> |
| + | === Checking if SnpCall Completed === |
| + | ==== Logging Back in to Check Jobs ==== |
| + | |
| + | ;How do you log back into screen? |
| + | screen -r |
| + | This will resume an already running screen. |
| + | |
| + | ==== Checking Completion ==== |
| + | |
| + | Did you get a "completed successfully" message? |
| + | |
| + | If yes, how long did SNP calling take? Look at the log message - time in seconds. |
| + | |
| + | If no, are you running on seqshop-server? If so, KILL it. ssh -X to one of the seqshop machines and run on there. |
| + | Ctrl-c |
| + | |
| + | Detach from screen. We will resume it again later when we restart SNPCall (if necessary). |
| + | Ctrl-a d |
| + | |
| + | === Ancestry Tutorial === |
| + | Now we are going to run the Structural Variation Practical |
| + | |
| + | Please go to: [[SeqShop: Estimates of Genetic Ancestry Practical, December 2014]] |
| + | |
| + | We will Resume SNP Calling (if necessary) after the practical. |
| + | |
| + | |
| + | === Restart SnpCall === |
| + | ==== Resume screen ==== |
| + | |
| + | ;How do you log back into screen? |
| + | screen -r |
| + | This will resume an already running screen. |
| + | |
| + | Your screen session still has your environment variables set, so you do not need to reset them. |
| + | |
| + | ==== Running SnpCall ==== |
| + | Run GotCloud snpcall with 6 jobs running in parallel |
| + | ${GC}/gotcloud snpcall --conf $SAMPLE/gotcloud.conf --numjobs 6 |
| + | * GotCloud will pick up after the last completed step from before. |
| + | |
| + | This will run overnight. We will check if it completed at the practical in the morning. |
| + | |
| + | ==== Log Out ==== |
| + | ;Want to log out and leave your job running? |
| + | In the screen window, type: |
| Ctrl-a d | | Ctrl-a d |
| + | (Hold down Ctrl and type 'a', let go of both and type 'd') |
| + | * This will "detach" from your screen session while your alignment continues to run. |
| | | |
| exit PuTTY | | exit PuTTY |
| | | |
− | === Tuesday FEEDBACK! ===
| |
− | Please provide feedback on the lectures/tutorials from today:
| |
| | | |
− | https://docs.google.com/forms/d/1n8xYxvsOq-HsabpDfGcHvwD84BYIRDx8_b-H5N3d-D8/viewform
| |
| </div> | | </div> |
| </div> | | </div> |
| | | |
− | <div class="mw-collapsible" style="width:500px"> | + | <div class="mw-collapsible" style="width:1000px"> |
− | == Indel Step 2 (Day 2): Checking Indel == | + | |
| + | == Friday == |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | === Logging Back in to Check Jobs ===
| |
| | | |
− | ;How do you log back into screen tomorrow?
| |
− | screen -r
| |
− | This will resume an already running screen.
| |
| | | |
− | === Looking at INDEL results === | + | === SeqShop: Ancestry On Your Own Genome, December 2014 === |
| + | [[SeqShop: Ancestry On Your Own Genome, December 2014]] |
| + | |
| + | |
| + | === Association Analysis Tutorial === |
| + | Now we are going to run the Association Analysis Practical |
| + | |
| + | Please go to: [[SeqShop: Genetic Association Analysis Practical, December 2014]] |
| + | |
| + | We will look at our own genomes again after the practical. |
| + | |
| + | |
| + | === Return to SeqShop: Ancestry On Your Own Genome, December 2014 === |
| + | Return to [[SeqShop:_Ancestry_On_Your_Own_Genome,_December_2014#Checking_if_Pileup_finished]] |
| + | |
| + | === Reviewing Indel Results === |
| + | Set these values. Also, be sure to specify your sample name (or NA12878) instead of SampleXX |
| + | export SAMPLE=SampleXX |
| + | source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt |
| + | |
| Look in the output directory | | Look in the output directory |
| ls ~/$SAMPLE/output | | ls ~/$SAMPLE/output |
Line 245: |
Line 358: |
| | | |
| $GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz | | $GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz |
| + | |
| no. Indels : 588566 | | no. Indels : 588566 |
| 2 alleles (ins/del) : 588566 (0.87) [273261/315305] | | 2 alleles (ins/del) : 588566 (0.87) [273261/315305] |
Line 265: |
Line 379: |
| Note that AB>0.5 denotes reference bias and AB<0.5 denotes alternative allele bias. | | Note that AB>0.5 denotes reference bias and AB<0.5 denotes alternative allele bias. |
| | | |
− | $GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AB<0.7&&INFO.AB>0.3" | + | $GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AC>0&&INFO.AB<0.7&&INFO.AB>0.3" |
| + | |
| no. Indels : 490965 | | no. Indels : 490965 |
| 2 alleles (ins/del) : 490965 (0.92) [235254/255711] | | 2 alleles (ins/del) : 490965 (0.92) [235254/255711] |
| | | |
| The insertion deletion ratio increases from 0.91 to 0.92. | | The insertion deletion ratio increases from 0.91 to 0.92. |
| + | |
| + | |
| + | === Friday: Reviewing SNPCALL Results === |
| + | Look in the output directory |
| + | ls ~/$SAMPLE/output |
| + | |
| + | Look at the vcfs: |
| + | ls ~/$SAMPLE/output/vcfs |
| + | |
| + | === Friday : More SNP Analysis === |
| + | |
| + | ==== Environmental Variables ==== |
| + | |
| + | If you didn't set the environmental variable, you can set it again |
| + | |
| + | source /net/seqshop-server/home/mktrost/seqshop/setup.txt |
| + | export SAMPLE=SampleXX (MAKE SURE TO CHANGE XX to your number or use NA12878 instead) |
| + | source /net/seqshop-server/home/mktrost/seqshop/setupSS.txt |
| + | |
| + | In addition, set another environmental variable for locating the binaries for custom analysis |
| + | |
| + | export HK=/net/seqshop-server/home/hmkang/seqshop/bin |
| + | |
| + | ==== Annotation / Lookup against dbSNP ==== |
| + | |
| + | If you want to add rsIDs to your variant files, you can do this by running the following command |
| + | |
| + | $HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbSNP.b138/dbsnp_138.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz |
| + | |
| + | If you want to run this command across all chromosomes in parallel, you can use the special script run-command-wgs |
| + | |
| + | $HK/run-command-wgs --cmd "$HK/vcf-add-rsid -vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --db $HK/../data/dbSNP.b138/dbsnp_138.b37.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz" --numjobs 6 |
| + | |
| + | Looking up SNPs by rsID is possible by (for example) |
| + | $HK/vcf-lookup-rsid --vcf $OUT/vcfs/chr1/chr1.filtered.vcf.gz --sepchr --rs rs17766217 |
| + | * Be sure to look at the QUAL & your sample's PL, and not just the GL field. Check if QUAL is 0 or PL is 0,0,0 - NS is also probably 0; DP is probably 0. That means you probably didn't have any copies, so your GT may not be correct/is unknown. |
| + | |
| + | If you want to browse the rsIDs of known GWAS SNPs, you can do this by |
| + | cut -f 1,8,22 $HK/../data/gwascatalog/gwascatalog.txt | less |
| + | |
| + | ==== Annotating your genome ==== |
| + | |
| + | You can annotate your genome using EPACTS software packages. Individual chromosome can be annotated by running. |
| + | $EPACTS/bin/epacts anno --in $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz |
| + | |
| + | Or you can run multiple chromosomes in parallel in one command |
| + | $HK/run-command-wgs --cmd "$EPACTS/bin/epacts anno --in $OUT/vcfs/chr1/chr1.filtered.rsid.vcf.gz --out $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz" --numjobs 6 |
| + | |
| + | ==== Extracting only exonic SNPs ==== |
| + | |
| + | If you want to look at the exonic SNPs, you can extract using the following command |
| + | $HK/run-command-wgs --cmd "($HK/tabix -H $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz; zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.vcf.gz | grep Exon;)| $HK/bgzip -c > $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz" --numjobs 6 |
| + | |
| + | And they can be combined as follows |
| + | (zcat $OUT/vcfs/chr1/chr1.filtered.rsid.anno.exon.vcf.gz; zcat $OUT/vcfs/chr[2-9]/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chr??/chr*.filtered.rsid.anno.exon.vcf.gz $OUT/vcfs/chrX/chrX.filtered.rsid.anno.exon.vcf.gz | grep -v ^#) | $HK/bgzip -c > $OUT/wgs.filtered.rsid.anno.exon.vcf.gz |
| + | |
| + | ==== Exonic Variants NOT found by 1000G ==== |
| + | |
| + | If you are interested in rare variants that are not identified by 1000G, you can extract them by running |
| + | zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | less |
| + | |
| + | For example, |
| + | zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | perl -lane 'print "$1\t$F[6]" if ( /ANNO=([^;:]+)/)' | sort | uniq -c |
| + | |
| + | will give you the counts of variants, separate by the filtering results |
| + | |
| + | * Q1. How manny novel silent, missense, and nonsense SNPs are found? Is that too few, too small, or just about right? |
| + | * Q2. Looking at each functional category, which functional categories has largest fraction of SNPs failed filter? Why do you think it is? |
| + | * Q3. Can you exclude the sites that are also in dbSNP, and count how many nonsense variants are left? |
| + | |
| + | |
| + | To also exclude those in dbsnp: |
| + | zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" | grep -v rs| perl -lane 'print "$1\t$F[6]" if ( /ANNO=([^;:]+)/)' | sort | uniq -c |
| + | |
| + | Exclude dbsnp and look at Stop_Gain variants |
| + | zcat $OUT/wgs.filtered.rsid.anno.exon.vcf.gz | grep "EXTFILTER=NA,NA" | grep -v -w "0/0" |grep -v rs | perl -lane 'print "$_" if ( /ANNO=Stop_Gain/)' |grep -w PASS |
| + | |
| + | Want to see this from the BAM file? Use samtools tview: |
| + | $GC/bin/samtools tview $SAMPLE/output/bams/$SAMPLE.recal.bam $GC/gotcloud.ref/human.g1k.v37.fa |
| + | Use 'g' & enter the Chr:Pos |
| + | * Some patterns may indicate not real variants. |
| + | |
| + | If you want to know predicted functional significance of a particular variant, you can search by |
| + | |
| + | $HK/tabix $HK/../data/CADD/whole_genome_SNVs.tsv.gz [chr]:[pos] | head -3 |
| + | |
| + | The phred score at the last column quantifies the degree of functional significance |
| + | |
| + | |
| | | |
| </div> | | </div> |
| </div> | | </div> |