Changes

From Genome Analysis Wiki
Jump to navigationJump to search
105 bytes added ,  17:37, 15 May 2014
Line 39: Line 39:  
Next generation sequence data analysis starts with [http://en.wikipedia.org/wiki/FASTQ_format FASTQ files], the typical format provided from your sequencing center containing the sequence & base quality information for your data.
 
Next generation sequence data analysis starts with [http://en.wikipedia.org/wiki/FASTQ_format FASTQ files], the typical format provided from your sequencing center containing the sequence & base quality information for your data.
   −
The fastq files are processed using the alignment pipeline which finds the most likely genomic location for each read and stores that information in a [[BAM|BAM (Binary Sequence Alignment/Map format) file]].  In addition to the sequence and base quality information contained in FASTQ files, a BAM file also contains the genomic location and some additional information about the mapping.  As part of the alignment pipeline, the base qualities are adjusted to more accurately reflect the likelihood that the base is correct.  
+
The fastq files are processed using the [[GotCloud: Alignment Pipeline|alignment pipeline]] which finds the most likely genomic location for each read and stores that information in a [[BAM|BAM (Binary Sequence Alignment/Map format) file]].  In addition to the sequence and base quality information contained in FASTQ files, a BAM file also contains the genomic location and some additional information about the mapping.  As part of the [[GotCloud: Alignment Pipeline|alignment pipeline]], the base qualities are adjusted to more accurately reflect the likelihood that the base is correct.  
    
The [[GotCloud: Alignment Pipeline|alignment pipeline]] can be skipped if you already have Deduped and Recalibrated BAM files.
 
The [[GotCloud: Alignment Pipeline|alignment pipeline]] can be skipped if you already have Deduped and Recalibrated BAM files.
   −
The [[GotCloud: Variant Calling Pipeline|variant calling pipeline]] processes the deduped and recalibrated BAMs file produced by the alignment pipeline or that you provide it, generating an initial list of polymorphic sites and genotypes stored in a [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 VCF (Variant Call Format) file].  The variant calling pipeline then filters the  variants using both hard filters and a [[SVM Filtering|Support Vector Machine (SVM)]].  It then uses haplotype information to refine these genotypes in an updated VCF file.
+
The [[GotCloud: Variant Calling Pipeline|variant calling pipeline]] processes the deduped and recalibrated BAM files produced by the alignment pipeline or that you provide it, generating an initial list of polymorphic sites and genotypes stored in a [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 VCF (Variant Call Format) file].  The [[GotCloud: Variant Calling Pipeline|variant calling pipeline]] then filters the  variants using both hard filters and a [[SVM Filtering|Support Vector Machine (SVM)]].  It then uses haplotype information to refine these genotypes in an updated VCF file.
    
After completing the GotCloud Variant Calling Pipeline, [[EPACTS|EPACTS (Efficient and Parallelizable Association Container Toolbox)]] can be used to perform statistical tests to identify genome-wide association from sequence data.
 
After completing the GotCloud Variant Calling Pipeline, [[EPACTS|EPACTS (Efficient and Parallelizable Association Container Toolbox)]] can be used to perform statistical tests to identify genome-wide association from sequence data.

Navigation menu