Changes

From Genome Analysis Wiki
Line 1: Line 1:  
==Introduction==
 
==Introduction==
See the [[Media:SeqShop - GotCloud snpcall.pdf|introductory slides]] for an intro to this tutorial.
+
Main Workshop wiki page: [[SeqShop: December 2014]]
    +
See the [[Media:Dec2014 SeqShop - GotCloud snpcall.pdf|introductory slides]] for an intro to this tutorial.
    
== Goals of This Session ==
 
== Goals of This Session ==
Line 12: Line 13:  
== Setup in person at the SeqShop Workshop ==
 
== Setup in person at the SeqShop Workshop ==
 
''This section is specifically for the SeqShop Workshop computers.''
 
''This section is specifically for the SeqShop Workshop computers.''
<div class="mw-collapsible" style="width:600px">
+
<div class="mw-collapsible mw-collapsed" style="width:600px">
 
''If you are not running during the SeqShop Workshop, please skip this section.''
 
''If you are not running during the SeqShop Workshop, please skip this section.''
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
Line 27: Line 28:  
* Set up an output directory
 
* Set up an output directory
 
** It will leave your output directory from the previous tutorial intact.
 
** It will leave your output directory from the previous tutorial intact.
  source /home/mktrost/seqshop/setup.txt
+
  source /net/seqshop-server/home/mktrost/seqshop/setup.txt
 
* You won't see any output after running <code>source</code>
 
* You won't see any output after running <code>source</code>
 
** It silently sets up your environment
 
** It silently sets up your environment
 
** If you want to view the detail of the setup, type
 
** If you want to view the detail of the setup, type
  less /home/mktrost/seqshop/setup.txt
+
  less /net/seqshop-server/home/mktrost/seqshop/setup.txt
 
and press 'q' to finish.
 
and press 'q' to finish.
   Line 45: Line 46:  
== Setup when running on your own outside of the SeqShop Workshop ==
 
== Setup when running on your own outside of the SeqShop Workshop ==
 
''This section is specifically for running on your own outside of the SeqShop Workshop.''
 
''This section is specifically for running on your own outside of the SeqShop Workshop.''
<div class="mw-collapsible mw-collapsed" style="width:600px">
+
<div class="mw-collapsible" style="width:600px">
 
''If you are running during the SeqShop Workshop, please skip this section.''
 
''If you are running during the SeqShop Workshop, please skip this section.''
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
Line 52: Line 53:     
{{SeqShopRemoteEnv}}
 
{{SeqShopRemoteEnv}}
 +
</div>
 +
</div>
 +
    
== Examining GotCloud SnpCall Input files ==
 
== Examining GotCloud SnpCall Input files ==
=== Sequnce Alignment Files: BAM Files ===
+
=== Sequence Alignment Files: BAM Files ===
 
Per sample BAM files contain sequence reads that are mapped to positions in the genome.
 
Per sample BAM files contain sequence reads that are mapped to positions in the genome.
   Line 104: Line 108:  
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
 
<ul>
 
<ul>
<li>/home/YourUserName/out/bams/HG00640.recal.bam</li>
+
<li>/net/seqshop-server/home/YourUserName/out/bams/HG00640.recal.bam</li>
 
[[File:BamindexNew.png|500px]]
 
[[File:BamindexNew.png|500px]]
 
</div>
 
</div>
Line 131: Line 135:  
<li>That's ok, we will use the <code>--base_prefix ${SS}</code> command-line option to prefix the BAM paths</li>
 
<li>That's ok, we will use the <code>--base_prefix ${SS}</code> command-line option to prefix the BAM paths</li>
 
<li>Alternatively, we could have set BAM_PREFIX in <code>gotcloud.conf</code> to the path to the BAMs
 
<li>Alternatively, we could have set BAM_PREFIX in <code>gotcloud.conf</code> to the path to the BAMs
<pre>BAM_PREFIX = /home/username/seqshop/example</pre> </li>
+
<pre>BAM_PREFIX = /net/seqshop-server/home/mktrost/seqshop/example</pre> </li>
 
<ul>
 
<ul>
 
<li>NOTE: the conf file can't interpret ${SS} environment variables or '~', so you would have to specify the full path</li>
 
<li>NOTE: the conf file can't interpret ${SS} environment variables or '~', so you would have to specify the full path</li>
<li>We just used the command-line option for this tutorial since this path will vary by user.</li>
+
<li>We just used the command-line option for this tutorial since this path will vary by user when running outside the workshop.</li>
 
</ul>
 
</ul>
 
</div>
 
</div>
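Because the conf file cannot expand <code>${SS}</code> or <code>~</code>, one way to get the full path for a <code>BAM_PREFIX</code> line is to let the shell expand it first. This is only a sketch: the variable names <code>ss_dir</code> and <code>abs_path</code> are made up for illustration, and it falls back to the current directory if <code>SS</code> is not set.

```shell
# Resolve ${SS} to an absolute path that can be pasted into gotcloud.conf.
# ss_dir and abs_path are illustrative names, not part of GotCloud.
ss_dir="${SS:-.}"                  # assumption: fall back to the current directory if SS is unset
abs_path=$(cd "$ss_dir" && pwd)    # pwd prints the absolute form of the directory
echo "BAM_PREFIX = ${abs_path}"
```

The printed line would still need to be copied into <code>gotcloud.conf</code> by hand; the tutorial itself uses <code>--base_prefix ${SS}</code> instead.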
Line 181: Line 185:     
== Run GotCloud SnpCall ==
 
== Run GotCloud SnpCall ==
[[File:SnpcallDiagram.png|500px]]
+
[[File:SnpcallDiagramNew.png|500px]]
    
Now that we have all of our input files, we need just a simple command to run:
 
Now that we have all of our input files, we need just a simple command to run:
  ${GC}/gotcloud snpcall --conf ${SS}/gotcloud.conf --numjobs 4 --region 22:36000000-37000000 --base_prefix ${SS} --outdir ${OUT}
+
* When running at home, if you have fewer than 6 CPUs, reduce the <code>--numjobs</code> setting (the run will take longer).
 +
  ${GC}/gotcloud snpcall --conf ${SS}/gotcloud.conf --numjobs 6 --region 22:36000000-37000000 --base_prefix ${SS} --outdir ${OUT}
 
* <code>${GC}/gotcloud</code> runs GotCloud
 
* <code>${GC}/gotcloud</code> runs GotCloud
* <code>align</code> tells GotCloud you want to run the alignment pipeline.
+
* <code>snpcall</code> tells GotCloud you want to run the snpcall pipeline.
 
* <code>--conf</code> tells GotCloud the name of the configuration file to use.
 
* <code>--conf</code> tells GotCloud the name of the configuration file to use.
 
** The configuration for this test was downloaded with the seqshop input files.
 
** The configuration for this test was downloaded with the seqshop input files.
Line 196: Line 201:  
** The Configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS}
 
** The Configuration file cannot read environment variables, so we need to tell GotCloud the path to the input files, ${SS}
 
** Alternatively, gotcloud.conf could be updated to specify the full paths
 
** Alternatively, gotcloud.conf could be updated to specify the full paths
* <code>--out_dir</code> tells GotCloud where to write the output.
+
* <code>--outdir</code> tells GotCloud where to write the output.
 
** This could be specified in gotcloud.conf, but to allow you to use the ${OUT} to change the output location, it is specified on the command-line
 
** This could be specified in gotcloud.conf, but to allow you to use the ${OUT} to change the output location, it is specified on the command-line
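If you are not sure how many CPUs your machine has, the <code>--numjobs</code> value can be capped at the CPU count before running. The following is a sketch, assuming <code>nproc</code> is available (it falls back to 1 otherwise); <code>cpus</code> and <code>numjobs</code> are illustrative variable names. It only prints the command so it can be reviewed first.

```shell
# Use the smaller of 6 (the value used in this tutorial) and the number of CPUs.
cpus=$(nproc 2>/dev/null || echo 1)   # assumption: nproc exists; otherwise use 1
if [ "$cpus" -lt 6 ]; then
  numjobs=$cpus
else
  numjobs=6
fi
# Print the snpcall command with the chosen value instead of running it.
echo "\${GC}/gotcloud snpcall --conf \${SS}/gotcloud.conf --numjobs ${numjobs} --region 22:36000000-37000000 --base_prefix \${SS} --outdir \${OUT}"
```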
   Line 202: Line 207:  
Curious if it started running properly?  Check out this screenshot:
 
Curious if it started running properly?  Check out this screenshot:
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
[[File:SnpcallStart.png|750px]]
+
[[File:SnpcallStartNew.png|550px]]
 
</div>
 
</div>
 
</div>
 
</div>
This should take about 5 minutes to run.
+
This should take about 5-8 minutes to run.
* After about 4 minutes of running, GotCloud snpcall will output some text to the screen.  Don't worry, that is expected and is just output from some of the intermediate tools.
+
* It should end with a line like: <code>Commands finished in 402 secs with no errors reported</code>
* It should end with a line like: <code>Commands finished in 289 secs with no errors reported</code>
      
If you cancelled GotCloud part way through, just rerun your GotCloud command and it will pick up where it left off.
 
If you cancelled GotCloud part way through, just rerun your GotCloud command and it will pick up where it left off.
Line 223: Line 227:  
* View Annotated Screenshot:
 
* View Annotated Screenshot:
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
[[File:gcsnpcallOut.png|500px]]
+
[[File:gcsnpcallOutNew.png|500px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 263: Line 267:  
View Screenshot
 
View Screenshot
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
[[File:filterSum.png]]
+
[[File:filterSumNew.png|700px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 286: Line 290:  
<li>It failed SVM filter</li>
 
<li>It failed SVM filter</li>
 
</ul>
 
</ul>
[[File:SvmFilt.png|550px]]
+
[[File:SvmFiltNew.png|550px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 308: Line 312:  
* View annotated screenshot:
 
* View annotated screenshot:
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
[[File:SvmFiltGL.png|550px]]
+
[[File:SvmFiltGLNew.png|550px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 358: Line 362:  
The GotCloud genotype refinement pipeline takes as input ${OUT}/split/chr22/chr22.filtered.PASS.vcf.gz (the VCF file of PASS'ing SNPs from snpcall).
 
The GotCloud genotype refinement pipeline takes as input ${OUT}/split/chr22/chr22.filtered.PASS.vcf.gz (the VCF file of PASS'ing SNPs from snpcall).
   −
The bam index and the configuration file we used for GotCloud snpcall will tell GotCloud genotype refinement everything it needs to know, so no new input files need to be prepared.
+
The bam list and the configuration file we used for GotCloud snpcall will tell GotCloud genotype refinement everything it needs to know, so no new input files need to be prepared.
    
Note: the configuration file overrides the THUNDER command to make it go faster than the default settings so the tutorial will run faster:
 
Note: the configuration file overrides the THUNDER command to make it go faster than the default settings so the tutorial will run faster:
Line 365: Line 369:  
=== Running GotCloud Genotype Refinement ===
 
=== Running GotCloud Genotype Refinement ===
 
Since everything is set up, just run the following command (very similar to snpcall).
 
Since everything is set up, just run the following command (very similar to snpcall).
  ${GC}/gotcloud ldrefine --conf ${SS}/gotcloud.conf --numjobs 2 --region 22:36000000-37000000 --base_prefix ${SS} --outdir ${OUT}
+
  ${GC}/gotcloud ldrefine --conf ${SS}/gotcloud.conf --numjobs 6 --region 22:36000000-37000000 --base_prefix ${SS} --outdir ${OUT}
   −
* Beagle will take about 2-3 minutes to complete
+
* Beagle will take about 1-3 minutes to complete
* Thunder will automatically run and will take another 3-4 minutes
+
* Thunder will automatically run and will take another 2-4 minutes
    
<div class="mw-collapsible mw-collapsed" style="width:350px">
 
<div class="mw-collapsible mw-collapsed" style="width:350px">
 
When completed, it should look like this:
 
When completed, it should look like this:
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
[[File:GcldrefineOut.png]]
+
[[File:GcldrefineOutNew.png]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 390: Line 394:  
<li><code>thunder</code> directory : Thunder output</li>
 
<li><code>thunder</code> directory : Thunder output</li>
 
<li><code>umake.beagle.*</code> : Contain the configuration & steps used in GotCloud beagle</li>
 
<li><code>umake.beagle.*</code> : Contain the configuration & steps used in GotCloud beagle</li>
<li><code>umake.thunder.*</code> files : Contain the configuration & steps used in GotCloud thunder</li>
+
<li><code>umake.thunder.*</code> : Contain the configuration & steps used in GotCloud thunder</li>
 
</ul>
 
</ul>
 
</div>
 
</div>
Line 434: Line 438:  
<ul>
 
<ul>
 
<li>It is the first sample</li>
 
<li>It is the first sample</li>
<li><code>0|1</code>: Heterozygous</li>
+
<li><code>0|1</code>: Heterozygous (although low GQ - quality)</li>
 
<li><code>1|1</code>: Homozygous Alt (C)</li>
 
<li><code>1|1</code>: Homozygous Alt (C)</li>
 
</ul>
 
</ul>
Line 452: Line 456:  
Did you see a variant at the position?
 
Did you see a variant at the position?
   −
${GC}/bin/tabix ${OUT}/vcfs/chr22/chr22.filtered.vcf.gz 22:36661906 | head -1
+
  22 36661906 . A G 23 PASS DP=409;MQ=59;NS=62;AN=124;AC=2;AF=0.013847;
  22 36661906 . A G 18 PASS DP=409;MQ=59;NS=62;AN=124;AC=2;AF=0.013827;AB=0.4065;AZ=-0.5287;
+
AB=0.4169;AZ=-0.3525;FIC=-0.0089;SLRT=-0.0071;LBS=36,36,0,0,1,1,0,0;OBS=145,191,0,0,3,2,0,0;STR=-0.040;
                    FIC=-0.0092;SLRT=-0.0075;HWEAF=0.0138;HWDAF=0.0276,0.0000;LBS=36,36,0,0,1,1,0,0;
+
STZ=-0.740;CBR=0.008;CBZ=0.144;IOR=0.000;IOZ=-1.370;AOI=-5.614;AOZ=-4.243;LQR=0.178;MQ0=0.000;
                    OBS=145,191,0,0,3,2,0,0;STR=-0.040;STZ=-0.740;CBR=0.008;CBZ=0.144;IOR=0.000;IOZ=-1.370;
+
MQ10=0.000;MQ20=0.000;MQ30=0.000;SVM=1.45191 GT:DP:GQ:PL 0/0:4:28:0,12,65
                    AOI=-5.614;AOZ=-4.243;LQR=0.178;MQ0=0.000;MQ10=0.000;MQ20=0.000;MQ30=0.000;SVM=1.51214
+
 
              GT:DP:GQ:PL 0/0:4:28:0,12,65
      
Let's check the sequence data to confirm that the variant really exists
 
Let's check the sequence data to confirm that the variant really exists
Line 472: Line 475:  
View Screenshot
 
View Screenshot
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
[[File:Samtoolstviewsnp.png|600px]]
+
[[File:SamtoolstviewsnpNew.png|600px]]
 
</div>
 
</div>
 
</div>
 
</div>
Line 480: Line 483:  
Let's get some information on the BEAGLE VCF:
 
Let's get some information on the BEAGLE VCF:
   −
  perl ${SS}/ext/bed-diff.pl --vcf1 ${SS}/ref22/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/beagle/chr22/chr22.filtered.PASS.beagled.ALL.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.beagle
+
  perl ${GC}/scripts/bed-diff.pl --vcf1 ${SS}/ref22/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/beagle/chr22/chr22.filtered.PASS.beagled.ALL.vcf.gz --out ${OUT}/diffs/bedDiff.beagle
       
Look at the results:
 
Look at the results:
  more ${OUT}/bedDiff.beagle.summary
+
  more ${OUT}/diffs/bedDiff.beagle.summary
    
<div class="mw-collapsible mw-collapsed" style="width:400px">
 
<div class="mw-collapsible mw-collapsed" style="width:400px">
 
*Results
 
*Results
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
  OVERALL: 43601 44293 0.9844
+
  OVERALL: 43588 44293 0.9841
  NREF-EITHER: 19667 20359 0.9660
+
  NREF-EITHER: 19644 20349 0.9654
  NMAJ-EITHER: 14585 15277 0.9547
+
  NMAJ-EITHER: 14560 15265 0.9538
   −
  HOMREF: 23934 100 1 0.9958
+
  HOMREF: 23944 91 0 0.9962
  HET: 329 11959 175 0.9596
+
  HET: 355 11936 172 0.9577
  HOMALT: 4 83 7708 0.9888
+
  HOMALT: 6 81 7708 0.9888
   −
  HOMMAJ: 29016 126 2 0.9956
+
  HOMMAJ: 29028 112 4 0.9960
  HET: 364 11959 140 0.9596
+
  HET: 380 11936 147 0.9577
  HOMMIN: 3 57 2626 0.9777
+
  HOMMIN: 2 60 2624 0.9769
 
</div>
 
</div>
 
</div>
 
</div>
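The last number on each summary line appears to be a concordance rate. As a quick check, and assuming that reading of the columns (it is inferred from the numbers, not taken from the <code>bed-diff.pl</code> documentation), the updated Beagle values can be reproduced with <code>awk</code>: the OVERALL rate is the first count divided by the second, and the HET rate is the matching count divided by the row total.

```shell
# Recompute two rates from the updated Beagle summary shown above.
# OVERALL: 43588 44293 0.9841  -> first count / second count
overall=$(awk 'BEGIN { printf "%.4f", 43588 / 44293 }')
# HET: 355 11936 172 0.9577    -> middle (matching) count / sum of the row
het=$(awk 'BEGIN { printf "%.4f", 11936 / (355 + 11936 + 172) }')
echo "OVERALL ${overall}"
echo "HET ${het}"
```

Both values match the last column of the summary (0.9841 and 0.9577), which supports this reading of the table.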
    
Now, let's see if it improved after running Thunder VCF:
 
Now, let's see if it improved after running Thunder VCF:
  perl ${SS}/ext/bed-diff.pl --vcf1 ${SS}/ref22/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz --gcRoot ${GC} --out ${OUT}/bedDiff.thunder
+
  perl ${GC}/scripts/bed-diff.pl --vcf1 ${SS}/ref22/1kg.omni.chr22.36Mb.vcf.gz --vcf2 ${OUT}/thunder/chr22/ALL/thunder/chr22.filtered.PASS.beagled.ALL.thunder.vcf.gz --out ${OUT}/diffs/bedDiff.thunder
    
Look at the results:
 
Look at the results:
  more ${OUT}/bedDiff.thunder.summary
+
  more ${OUT}/diffs/bedDiff.thunder.summary
    
<div class="mw-collapsible mw-collapsed" style="width:400px">
 
<div class="mw-collapsible mw-collapsed" style="width:400px">
 
*Results
 
*Results
 
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
  OVERALL: 43685 44293 0.9863
+
  OVERALL: 43711 44293 0.9869
  NREF-EITHER: 19758 20366 0.9701
+
  NREF-EITHER: 19777 20359 0.9714
  NMAJ-EITHER: 14688 15296 0.9603
+
  NMAJ-EITHER: 14715 15297 0.9620
+
 
  HOMREF: 23927 106 2 0.9955
+
  HOMREF: 23934 101 0 0.9958
  HET: 286 12057 120 0.9674
+
  HET: 272 12084 107 0.9696
  HOMALT: 6 88 7701 0.9879
+
  HOMALT: 5 97 7693 0.9869
+
 
  HOMMAJ: 28997 144 3 0.9950
+
  HOMMAJ: 28996 145 3 0.9949
  HET: 286 12057 120 0.9674
+
  HET: 272 12084 107 0.9696
  HOMMIN: 5 50 2631 0.9795
+
  HOMMIN: 2 53 2631 0.9795
 
</div>
 
</div>
 
</div>
 
</div>
Line 536: Line 539:  
   
 
   
 
Aren't you glad you didn't have to configure & run each one yourself?
 
Aren't you glad you didn't have to configure & run each one yourself?
 +
 +
 +
== Return to Workshop Wiki Page ==
 +
Return to main workshop wiki page: [[SeqShop: December 2014]]