Changes

From Genome Analysis Wiki
777 bytes added, 18:09, 15 December 2015
Line 25: Line 25:     
[[File:Gotcloud.puzzles.v2.png|500px]]
 +
    
=== Getting Help with GotCloud ===
Line 43: Line 44:  
The FASTQ files are processed using the [[GotCloud: Alignment Pipeline|alignment pipeline]], which finds the most likely genomic location for each read and stores that information in a [[BAM|BAM (Binary Sequence Alignment/Map format) file]].  In addition to the sequence and base quality information contained in FASTQ files, a BAM file also contains the genomic location and additional information about the mapping, such as the mapping quality.  As part of the [[GotCloud: Alignment Pipeline|alignment pipeline]], the base qualities are adjusted to more accurately reflect the likelihood that each base is correct.
   −
The [[GotCloud: Alignment Pipeline|alignment pipeline]] can be skipped if you already have Deduped and Recalibrated BAM files.
+
The [[GotCloud: Alignment Pipeline|alignment pipeline]] can be skipped if you already have Deduped and Recalibrated BAM files.  If you have BAMs that still need to be deduped and recalibrated, you can use our [[GotCloud:_Alignment_Sub-Pipelines#recabQC_2|recabQC pipeline]].
    
The [[GotCloud: Variant Calling Pipeline|variant calling pipeline]] processes deduped and recalibrated BAM files, either those produced by the alignment pipeline or ones you provide, and generates an initial list of polymorphic sites and genotypes stored in a [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 VCF (Variant Call Format) file].  The [[GotCloud: Variant Calling Pipeline|variant calling pipeline]] then filters the variants using both hard filters and a [[SVM Filtering|Support Vector Machine (SVM)]].  It then uses haplotype information to refine these genotypes in an updated VCF file.
Line 50: Line 51:     
[[File:GotCloudDiagram.jpg|500px]]
 +
 +
 +
== Publication ==
 +
If you use GotCloud, please cite our publication:
 +
[http://genome.cshlp.org/content/early/2015/04/14/gr.176552.114.abstract Jun, Goo, et al. "An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data." Genome Research (2015): gr-176552.]
    
== GotCloud Setup ==
Line 59: Line 65:  
** You can run on an EC2 cluster instance created by StarCluster.  
   −
GotCloud has been developed and tested on Linux Ubuntu 12.10 and 12.04.2 LTS.  While it should work on other Linux systems, they have not yet been tested.  
+
GotCloud has been developed and tested on Ubuntu Linux 12.10 and 12.04.2 LTS, and on Red Hat 6.6.  While it should work on other Linux systems, they have not yet been tested.
    
=== GotCloud on Amazon ===
Line 66: Line 72:  
See [[GotCloud: Amazon]] for instructions on using GotCloud on Amazon.
   −
=== GotCloud Dependencies ===
+
=== GotCloud Setup on Any Linux Machine ===
 +
 
 +
==== GotCloud Dependencies ====
    
GotCloud requires the following to be installed in order to run:
Line 81: Line 89:  
  [gotcloud_path]/scripts/check_requirements.sh
   −
=== Install GotCloud Software ===
+
==== Install GotCloud Software ====
    
You can install GotCloud on your system in one of the following ways (follow the links for the appropriate instructions):
Line 97: Line 105:  
For more information on Amazon Web Services, see: https://aws.amazon.com/
   −
=== GotCloud Reference/Resource Files ===
+
==== GotCloud Reference/Resource Files ====
 
To run GotCloud, you need to provide genetic reference and resource files.
   Line 105: Line 113:  
* When running on Amazon, a default set of reference files is included in the GotCloud AMI.
   −
=== Configure GotCloud ===
+
==== Configure GotCloud ====
 
* [[Configure GotCloud|Configure GotCloud]] for your installation
   Line 111: Line 119:     
* [[GotCloud: Alignment Pipeline|Alignment Pipeline]]  
 +
** [[GotCloud: Alignment Sub-Pipelines|Alignment Sub-Pipelines]] - if you do not want to run the entire Alignment Pipeline
 
* [[GotCloud: Variant Calling Pipeline|Variant Calling Pipeline]]
* [[GotCloud: Indel Calling Pipeline|Indel Calling Pipeline]]
+
* Indel Calling Pipeline
* [[GotCloud: GenomeSTRiP Pipeline|GenomeSTRiP Pipeline]] (Structural Variation) - ''Coming Soon''
+
* [[GotCloud: GenomeSTRiP Pipeline|GenomeSTRiP Pipeline]] (Structural Variation)
* [[GotCloud: MEI Calling Pipeline|MEI Calling Pipeline]] - ''Coming Soon'''
+
* MEI Calling Pipeline - ''Ask if you're interested''
 +
 
 +
You can also create your own pipelines.  Instructions are here:
 +
* [[GotCloud: Creating a New Pipeline]]  
    
=== GotCloud Demos ===
SeqShop GotCloud Demos:
+
GotCloud Demos (originally from our sequencing workshop):
 
* [[SeqShop: Sequence Mapping and Assembly Practical]]
 
* [[SeqShop: Variant Calling and Filtering for SNPs Practical]]
Line 129: Line 141:     
== UMich Development/Release How-To Notes ==
* [[Creating Packages]]
+
* [[Releasing GotCloud]]
 
* Amazon EC2
 
** [[Creating an AMI on EC2]]
Line 135: Line 147:  
** [[Mount S3 Volume]]
 
** Notes on sequence data preparation in [[Amazon Storage|Amazon Storage]].
 +
 +
* [[Git_FAQs#Subtrees|Upgrade Git Subtree]]