Changes

Tutorial: GotCloud (view source)

Revision as of 14:39, 18 March 2013

1,674 bytes removed , 14:39, 18 March 2013

Line 2: Line 2:

In this tutorial, we illustrate some of the essential steps in the analysis of next generation sequence data.

−

Analysis ~~starts with [http://en.wikipedia.org/wiki/FASTQ_format FASTQ files], the typical format provided from your sequencing center containing the sequence & base quality information for your data.~~

+

For background on GotCloud and sequence analysis pipelines, see [[GotCloud]].

−

The fastq files are processed using the alignment pipeline which finds the most likely genomic location for each read and stores that information in a [[BAM|BAM (Binary Sequence Alignment/Map format) file]]. In addition to the sequence and base quality information contained in FASTQ files, a BAM file also contains the genomic location and some additional information about the mapping. As part of the alignment pipeline, the base qualities are adjusted to more accurately reflect the likelihood that the base is correct.

+

While GotCloud can run on a cluster of machines or instances, this tutorial is a small test that runs only on the machine where the commands are entered.

−

The variant calling pipeline processes the BAM files produced by the alignment pipeline, generating an initial list of polymorphic sites and genotypes stored in a [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 VCF (Variant Call Format) file] and then uses haplotype information to refine these genotypes in an updated VCF file.

−

After variant calling, there is an optional step to further filter the variants using a [[SVM Filtering|Support Vector Machine (SVM)]]. This feature is in development and will soon be added to gotcloud and this tutorial.

−

This tutorial then demonstrates how [[EPACTS|EPACTS (Efficient and Parallelizable Association Container Toolbox)]] can be used to perform statistical tests to identify genome-wide association from sequence data.

−

~~[[File:GotCloudDiagram.png]]~~

−

~~[[GotCloud]] incorporates the alignment and variant calling pipelines into one easy to use tool.~~ GotCloud can run on ~~a user's computer, on an instance in a compute cloud, or can split the work up onto~~ a cluster of machines or instances~~. This~~ tutorial is just a small test that just runs on the machine the commands are run on.

GotCloud and this basic tutorial were presented at the [http://ibg.colorado.edu/dokuwiki/doku.php?id=workshop:2013:announcement 2013 IBG Workshop]. It was presented in two sessions. On Wednesday an overview was presented with steps for running the tutorial data: [[Media:IBG2013GotCloud.pdf|IBG2013GotCloud.pdf]]. On Friday more detail on the input files and what goes into generating the input files was presented: [[Media:GotCloudIBGWorkshop2013Friday.pdf|GotCloudIBGWorkshop2013Friday.pdf]].

Mktrost


