
From Genome Analysis Wiki
1,686 bytes added, 13:02, 18 March 2013
= Genomes on the Cloud (GotCloud) Introduction=
    
To handle the increasing volume of next generation sequencing and genotyping data, we developed a set of software pipelines called '''Genomes on the Cloud (GotCloud)'''.
** Simplifies running on clusters
* Scalable to tens of thousands of samples
* Easy to use: automates a series of configurable steps
** Users don't have to understand, configure, or even know about the many tools required to create high quality results
* Available on Amazon Web Services (AWS) Elastic Compute Cloud (EC2)
* Runs on local machines and clusters
== Sequence Analysis Background Information ==
There are many essential steps in the analysis of next generation sequence data.
Next generation sequence data analysis starts with [http://en.wikipedia.org/wiki/FASTQ_format FASTQ files], the format your sequencing center typically provides, which contains the sequence and base quality information for your data.
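To make the FASTQ layout concrete, here is a minimal Python sketch that reads FASTQ records and converts the base quality characters into Phred scores. It assumes unwrapped four-line records and Phred+33 quality encoding (the Sanger/Illumina 1.8+ convention); the names `parse_fastq` and `error_probability` are illustrative and are not part of GotCloud.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream.

    Assumes Phred+33 quality encoding by default (offset=33).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality strings differ in length")
        yield FastqRecord(header[1:].strip(), sequence, [ord(c) - offset for c in quality])


def error_probability(q: int) -> float:
    """Invert the Phred relation Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nII#5\n")
    for record in parse_fastq(demo):
        print(record.name, record.sequence, record.qualities)  # read1 ACGT [40, 40, 2, 20]
```

A quality of 20, for example, corresponds to an estimated 1% chance that the base call is wrong.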
 
The FASTQ files are processed by the alignment pipeline, which finds the most likely genomic location for each read and stores that information in a [[BAM|BAM (Binary Sequence Alignment/Map format) file]]. In addition to the sequence and base quality information contained in the FASTQ files, a BAM file records each read's genomic location and some additional information about the mapping. As part of the alignment pipeline, the base qualities are also adjusted (recalibrated) to more accurately reflect the likelihood that each base is correct.
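The base quality adjustment can be illustrated with a simplified empirical recalibration: group bases by their reported quality, count how often they mismatch the reference at sites assumed not to be variants, and convert the observed error rate back into a Phred score. This is a hypothetical sketch, not GotCloud's recalibration code; the input format and function names are assumptions, and production recalibrators typically also model covariates such as the position within the read (cycle) and the preceding base.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def empirical_quality_table(observations: Iterable[Tuple[int, bool]]) -> Dict[int, int]:
    """Map each reported quality to an empirical Phred quality.

    `observations` holds (reported_quality, mismatches_reference) pairs collected at
    positions assumed to be non-variant (a hypothetical input format).
    """
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # q -> [mismatches, total]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += int(is_mismatch)
        counts[reported_q][1] += 1
    table = {}
    for reported_q, (mismatches, total) in counts.items():
        # Add-one smoothing keeps bins with zero mismatches at a finite quality.
        p_error = (mismatches + 1) / (total + 2)
        table[reported_q] = round(-10 * math.log10(p_error))
    return table


def recalibrate(qualities: List[int], table: Dict[int, int]) -> List[int]:
    """Replace each reported quality with its empirical value; unseen bins are kept."""
    return [table.get(q, q) for q in qualities]


if __name__ == "__main__":
    # Bases reported as Q30 that actually mismatch about 1% of the time.
    obs = [(30, False)] * 197 + [(30, True)]
    table = empirical_quality_table(obs)
    print(table)                         # {30: 20}
    print(recalibrate([30, 10], table))  # [20, 10]
```

In the example, bases labelled Q30 (a claimed 0.1% error rate) are observed to be wrong about 1% of the time, so they are adjusted down to Q20.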
The alignment pipeline can be skipped if you already have deduplicated and recalibrated BAM files.
The variant calling pipeline processes the deduplicated and recalibrated BAM files, either produced by the alignment pipeline or provided by you, and generates an initial list of polymorphic sites and genotypes stored in a [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 VCF (Variant Call Format) file]. It then filters the variants using both hard filters and a [[SVM Filtering|Support Vector Machine (SVM)]], and finally uses haplotype information to refine the genotypes, writing them to an updated VCF file.
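A VCF data line is tab-separated, with fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) followed by a FORMAT column and one column per sample. The sketch below parses such a line, applies a simple hard filter, and extracts per-sample genotypes. The QUAL and depth (DP) thresholds are hypothetical placeholders rather than GotCloud's filter settings, and the SVM filtering and haplotype refinement steps are not shown.

```python
from typing import Dict, List, Optional


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Split an INFO column such as 'DP=14;AF=0.5;DB' into a dict; flags map to None."""
    fields: Dict[str, Optional[str]] = {}
    if info == ".":
        return fields  # '.' means no INFO entries
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else None
    return fields


def passes_hard_filters(line: str, min_qual: float = 30.0, min_depth: int = 10) -> bool:
    """Keep a site only if QUAL >= min_qual and INFO DP >= min_depth (hypothetical thresholds)."""
    cols = line.rstrip("\n").split("\t")
    qual, info = cols[5], parse_info(cols[7])
    if qual == "." or float(qual) < min_qual:
        return False
    depth = info.get("DP")
    return depth is not None and int(depth) >= min_depth


def genotypes(line: str) -> List[str]:
    """Return the GT value for every sample column, using FORMAT to locate GT."""
    cols = line.rstrip("\n").split("\t")
    gt_index = cols[8].split(":").index("GT")
    return [sample.split(":")[gt_index] for sample in cols[9:]]


if __name__ == "__main__":
    # Data line adapted from the example in the VCF 4.1 specification.
    site = "\t".join([
        "20", "14370", "rs6054257", "G", "A", "29", "PASS",
        "NS=3;DP=14;AF=0.5;DB;H2", "GT:GQ:DP:HQ",
        "0|0:48:1:51,51", "1|0:48:8:51,51", "1/1:43:5:.,.",
    ])
    print(passes_hard_filters(site))               # False (QUAL 29 < 30)
    print(passes_hard_filters(site, min_qual=20))  # True
    print(genotypes(site))                         # ['0|0', '1|0', '1/1']
```

Hard filters like these are simple cutoffs applied one site at a time; the SVM step instead learns a decision boundary from many site-level features at once.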
After completing the GotCloud variant calling pipeline, you can use [[EPACTS|EPACTS (Efficient and Parallelizable Association Container Toolbox)]] to perform statistical tests that identify genome-wide associations from sequence data.
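As a rough illustration of a single-variant association test (not EPACTS's implementation, which provides its own set of tests), the sketch below regresses a quantitative phenotype on genotype dosage (0, 1, or 2 copies of the alternate allele) by ordinary least squares and returns the effect estimate and its t statistic. The function name `single_variant_test` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def single_variant_test(dosages: Sequence[float], phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a + b*g by ordinary least squares; return (b, t) with t = b / SE(b)."""
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("monomorphic variant: genotype dosage does not vary")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_g
    residual_ss = sum((y - alpha - beta * g) ** 2 for g, y in zip(dosages, phenotypes))
    se = math.sqrt(residual_ss / (n - 2) / sxx)  # residual variance uses n - 2 degrees of freedom
    if se == 0:
        # Perfect fit: the statistic is unbounded unless there is no effect at all.
        t = math.copysign(math.inf, beta) if beta != 0 else 0.0
    else:
        t = beta / se
    return beta, t


if __name__ == "__main__":
    g = [0, 1, 2, 0, 1, 2]
    y = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
    beta, t = single_variant_test(g, y)
    print(f"beta={beta:.2f} t={t:.1f}")
```

The t statistic would then be compared against a t distribution with n - 2 degrees of freedom to obtain a p-value, repeated for every variant in the VCF.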
[[File:GotCloudDiagram.png]]
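The overall flow in the diagram above can be modelled as a chain of stages, each consuming the output format of the previous one. This is why you can start at the variant calling pipeline if you already have deduplicated and recalibrated BAM files. The sketch is illustrative only: the stage labels and the `stages_from` helper are hypothetical and do not correspond to GotCloud command names.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    name: str      # illustrative label, not a GotCloud command name
    consumes: str  # input format
    produces: str  # output format


# The flow described above: FASTQ -> BAM -> VCF -> association results.
PIPELINE: List[Stage] = [
    Stage("alignment", "FASTQ", "BAM"),                  # map reads, dedup, recalibrate qualities
    Stage("variant calling", "BAM", "VCF"),              # call, hard + SVM filtering, haplotype refinement
    Stage("association", "VCF", "association results"),  # e.g. with EPACTS
]


def stages_from(start_format: str) -> List[Stage]:
    """Return the stages still to run, given the format of the files you already have.

    Starting from "BAM" assumes the BAMs are already deduplicated and recalibrated.
    """
    for i, stage in enumerate(PIPELINE):
        if stage.consumes == start_format:
            return PIPELINE[i:]
    raise ValueError(f"no stage consumes {start_format!r}")


if __name__ == "__main__":
    print([s.name for s in stages_from("BAM")])  # ['variant calling', 'association']
```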
= GotCloud Setup =
 
You may run the GotCloud software in several modes:
Details for the Choices of Your Install
== AWS==
The following describes how to use this software with Amazon Web Services (https://aws.amazon.com/), but you can just as easily run the pipelines on your own machine(s) by installing them there.
The latest documentation is available at [[Tutorial: GotCloud]].
=== Install GotCloud Software ===
