== Goals of This Session ==
* What we want to learn
** Basic Variant Call Format (VCF)
** How to generate filtered variant calls for SNPs
** How to evaluate the quality of variant calls
** How to visualize variant calls to examine the variants at particular genomic positions

== GotCloud SnpCall Pipeline ==

[[File:SnpcallDiagram.png|500px]]

=== Why GotCloud? ===
Many of the same reasons as for using GotCloud Align:
* Easy to learn & run
** All-in-one package for the '''SNP calling pipeline'''
** You don't have to know the details of each individual component
* Robust parallelization
** Automatically partitions the work '''by region'''
** Reliable and fault-tolerant parallelization via GNU make
*** Restarts from where it stopped after an unexpected crash
* Cloud- & cluster-friendly
** Supports multiple cluster systems such as MOSIX, Slurm, & SGE
** Amazon instances let you run large-scale jobs without having your own cluster
* '''Easy to add new samples to your study'''
** Just add them to your BAM index file
** GotCloud reuses the genotype likelihoods of samples that have already been processed


{{SeqShopLogin}}

== Set up your run environment ==

This sets up environment variables that point to the tutorial files and to an output directory:
 source /home/mktrost/seqshop/setup.txt

Alternatively, if you would like to change the output directory, copy the file, edit it, and source your copy:
 cp /home/mktrost/seqshop/setup.txt ~/setup.txt
 nedit ~/setup.txt
 source ~/setup.txt
(You can use your favorite editor instead of nedit. I typically use emacs, but nedit is more like Windows.)
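Before running anything, it can help to confirm that the variables used in the commands on this page (<code>GC</code> and <code>OUTPUT</code>) are actually set in your current shell. The following is a minimal sketch, assuming those are the two variables your setup file defines; <code>check_seqshop_env</code> is a hypothetical helper, not part of GotCloud.

```shell
# Hypothetical helper (not part of GotCloud): verify that the variables this
# tutorial relies on are set before running any GotCloud commands.
# Assumes setup.txt defines GC (tutorial/GotCloud directory) and OUTPUT.
check_seqshop_env() {
  if [ -z "${GC:-}" ] || [ -z "${OUTPUT:-}" ]; then
    echo "GC or OUTPUT is not set; source your setup.txt first" >&2
    return 1
  fi
  # Print the values so you can confirm they point where you expect.
  echo "GC=${GC}"
  echo "OUTPUT=${OUTPUT}"
}

# Usage, after sourcing your setup file:
# check_seqshop_env
```

If it reports that a variable is missing, re-run the <code>source</code> command above in the same terminal; environment variables do not carry over to new terminal windows.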


== Examining GotCloud SnpCall Input files ==
=== Sequence Alignment Files: BAM Files ===

=== Reference Files ===

=== GotCloud BAM Index File ===

=== GotCloud Configuration File ===
We will use the same configuration file that we used yesterday with GotCloud Align.


== Run GotCloud SnpCall ==
Now that we have all of our input files, running the pipeline takes just one command:
 ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/inputs/gotcloud.conf --numjobs 4 --region 22:36000000-37000000
* --numjobs tells GotCloud how many jobs to run in parallel
** The right value depends on your system, e.g. the number of CPU cores available to you
* --region 22:36000000-37000000
** The sample files cover only a small region of chromosome 22, so to save time, we tell GotCloud to ignore all other regions
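Since a good <code>--numjobs</code> value depends on your machine, one option is to derive it from the number of available cores while capping it so you don't saturate a shared machine. This is a minimal sketch; <code>pick_numjobs</code> is a hypothetical helper, and it assumes <code>nproc</code> (GNU coreutils) or <code>getconf _NPROCESSORS_ONLN</code> is available, falling back to 1 job otherwise.

```shell
# Hypothetical helper (not part of GotCloud): choose a --numjobs value from
# the number of available cores, capped at a maximum (default 4).
pick_numjobs() {
  max_jobs=${1:-4}
  # nproc is from GNU coreutils; getconf is a fallback; default to 1 core.
  cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  if [ "$cores" -lt "$max_jobs" ]; then
    echo "$cores"
  else
    echo "$max_jobs"
  fi
}

# Usage (assumes GC is set by your setup file):
# ${GC}/gotcloud/gotcloud snpcall --conf ${GC}/inputs/gotcloud.conf --numjobs "$(pick_numjobs 4)" --region 22:36000000-37000000
```

Capping the value matters on shared login nodes, where using every core would slow down other users.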

This should take about 5 minutes to run.

It should end with a line like: <code>TBD</code>

If you cancel GotCloud partway through, just rerun the same GotCloud command and it will pick up where it left off.

== Examining GotCloud SnpCall Output ==

=== Filtering Summary Statistics ===

 cat ${OUTPUT}/vcfs/chr22/chr22.filtered.sites.vcf.summary

<div class="mw-collapsible mw-collapsed" style="width:250px">
View Annotated Screenshot
<div class="mw-collapsible-content">
[[File:filterSum.png]]
</div>
</div>
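To cross-check the summary, you can tally the FILTER column (column 7 of any VCF) yourself, which shows how many sites passed and how many were caught by each filter. This is a minimal sketch: <code>count_vcf_filters</code> is a hypothetical helper, and the usage line assumes an uncompressed sites VCF named <code>chr22.filtered.sites.vcf</code> sits next to the <code>.summary</code> file.

```shell
# Hypothetical helper (not part of GotCloud): count sites per FILTER value
# (column 7 of a VCF), most frequent first.
# Assumes an uncompressed, tab-delimited VCF whose header lines start with '#'.
count_vcf_filters() {
  grep -v '^#' "$1" | cut -f7 | sort | uniq -c | sort -rn
}

# Usage (file name assumed from the .summary path above):
# count_vcf_filters ${OUTPUT}/vcfs/chr22/chr22.filtered.sites.vcf
```

Sites that fail several filters typically list them together separated by semicolons, so those combinations show up as their own rows in the tally.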


== GotCloud Genotype Refinement ==

 seqshop/gotcloud/gotcloud beagle --conf seqshop/inputs/gotcloud.conf --numjobs 2 --region 22:36000000-37000000