Changes

From Genome Analysis Wiki
1,674 bytes removed ,  14:39, 18 March 2013
Line 2: Line 2:  
In this tutorial, we illustrate some of the essential steps in the analysis of next generation sequence data.  
 
   −
Analysis starts with [http://en.wikipedia.org/wiki/FASTQ_format FASTQ files], the typical format provided by your sequencing center, which contain the sequence and base quality information for your data.
+
For background on GotCloud and sequence analysis pipelines, see [[GotCloud]].
   −
The FASTQ files are processed by the alignment pipeline, which finds the most likely genomic location for each read and stores that information in a [[BAM|BAM (Binary Sequence Alignment/Map format) file]].  In addition to the sequence and base quality information contained in FASTQ files, a BAM file also contains the genomic location and some additional information about the mapping.  As part of the alignment pipeline, the base qualities are adjusted to more accurately reflect the likelihood that each base is correct.
+
While GotCloud can run on a cluster of machines or instances, this tutorial is a small test that runs only on the machine where the commands are executed.
 
  −
The variant calling pipeline processes the BAM files produced by the alignment pipeline, generating an initial list of polymorphic sites and genotypes stored in a [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 VCF (Variant Call Format) file], and then uses haplotype information to refine these genotypes in an updated VCF file.
  −
 
  −
After variant calling, there is an optional step to further filter the variants using a [[SVM Filtering|Support Vector Machine (SVM)]].  This feature is in development and will soon be added to GotCloud and this tutorial.
  −
 
  −
This tutorial then demonstrates how [[EPACTS|EPACTS (Efficient and Parallelizable Association Container Toolbox)]] can be used to perform statistical tests that identify genome-wide associations from sequence data.
  −
 
  −
[[File:GotCloudDiagram.png]]
  −
 
  −
[[GotCloud]] incorporates the alignment and variant calling pipelines into one easy-to-use tool.  GotCloud can run on a user's computer, on an instance in a compute cloud, or can split the work across a cluster of machines or instances.  This tutorial is a small test that runs only on the machine where the commands are executed.
      
GotCloud and this basic tutorial were presented at the [http://ibg.colorado.edu/dokuwiki/doku.php?id=workshop:2013:announcement 2013 IBG Workshop] in two sessions.  On Wednesday, an overview was presented with steps for running the tutorial data: [[Media:IBG2013GotCloud.pdf|IBG2013GotCloud.pdf]].  On Friday, more detail was presented on the input files and how they are generated: [[Media:GotCloudIBGWorkshop2013Friday.pdf|GotCloudIBGWorkshop2013Friday.pdf]].
 
