Changes

From Genome Analysis Wiki
952 bytes added, 11:47, 27 February 2013
no edit summary
Line 2: Line 2:  
In this tutorial, we illustrate some of the essential steps in the analysis of next generation sequence data.  
 
   −
We will start with a set of sequence reads and associated base quality scores stored in a FASTQ file.
+
Analysis starts with [http://en.wikipedia.org/wiki/FASTQ_format FASTQ files], the format typically provided by your sequencing center, which contain the sequence and base quality information for your data.
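As a rough illustration of what a FASTQ file holds, the sketch below reads four-line records and decodes the quality string into numeric scores. It assumes the common Phred+33 encoding and unwrapped (single-line) sequences; the names `read_fastq` and `error_probability` and the example read are hypothetical and are not part of GotCloud.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, scores) for each four-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Assumption: Phred+33 encoding (quality = ASCII code - 33).
        scores = [ord(c) - 33 for c in qual]
        yield header[1:], seq, scores


def error_probability(q):
    """Convert a Phred quality score to the probability the base is wrong."""
    return 10 ** (-q / 10)


# Made-up single-read example for illustration only.
example = "@read1\nACGT\n+\nII5!\n"
records = list(read_fastq(io.StringIO(example)))
```

Under the Phred convention, a score of 20 corresponds to a 1-in-100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.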
   −
The alignment pipeline will find the most likely genomic location for each read producing a BAM file.
+
The FASTQ files are processed by the alignment pipeline, which finds the most likely genomic location for each read and stores that information in a [[BAM|BAM (Binary Sequence Alignment/Map format) file]]. In addition to the sequence and base quality information contained in FASTQ files, a BAM file also contains the genomic location of each read and some additional information about the mapping. As part of the alignment pipeline, the base qualities are adjusted to more accurately reflect the likelihood that each base is correct.
   −
The variant calling pipeline generates an initial list of polymorphic sites and genotypes stored in a VCF file and then uses haplotype information to refine these genotypes in an updated VCF file.
+
The variant calling pipeline processes the BAM files produced by the alignment pipeline, generating an initial list of polymorphic sites and genotypes stored in a [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 VCF (Variant Call Format) file], and then uses haplotype information to refine these genotypes in an updated VCF file.
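To make the "sites and genotypes" stored in a VCF file more concrete, here is a minimal sketch that splits one tab-separated VCF data line into its fixed columns and per-sample genotype fields. The function name `parse_vcf_line` and the example line are illustrative assumptions; header lines (starting with `#`) and the INFO column are not interpreted, and real pipelines would normally use a dedicated VCF library.

```python
def parse_vcf_line(line):
    """Parse one VCF data line into a dict of site and genotype fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, site_id, ref, alt = fields[:5]
    # Column 9 (FORMAT) names the colon-separated keys used in each sample column.
    format_keys = fields[8].split(":") if len(fields) > 8 else []
    genotypes = [dict(zip(format_keys, sample.split(":"))) for sample in fields[9:]]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": site_id,
        "ref": ref,
        "alt": alt.split(","),  # multiple alternate alleles are comma-separated
        "genotypes": genotypes,
    }


# Illustrative data line with two samples (not taken from the tutorial data).
line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3\tGT:GQ\t0|0:48\t1|0:48\n"
site = parse_vcf_line(line)
```

In this example the second sample's `GT` value `1|0` is a phased genotype carrying one copy of the alternate allele `A`.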
 +
 
 +
This tutorial then demonstrates how [[EPACTS|EPACTS (Efficient and Parallelizable Association Container Toolbox)]] can be used to perform statistical tests that identify genome-wide associations in sequence data.
 +
 
 +
[[File:GotCloudDiagram.png]]
    
== STEP 1 : Setup GotCloud ==
 
