Changes

From Genome Analysis Wiki
Line 89: Line 89:     
=== Reference Files ===  
 
=== Reference Files ===  
 +
See [[GotCloud: Genetic Reference and Resource Files]] for detailed information about the multiple required reference files for the alignment pipeline, including:
 +
* How to obtain default references
 +
* Configuration keys & default values
 +
* How to generate your own references
 +
* How to point GotCloud to your reference files
   −
The following Reference Files are required:
+
Required Reference File Types:
* Reference File fasta files
+
* [[GotCloud: Genetic Reference and Resource Files#Reference fasta Files|Reference fasta Files]]
** Files required: .fa, -bs.umfa, .GCContent, .amb, .ann, .bwt, .pac, .sa
+
* [[GotCloud: Genetic Reference and Resource Files#DBSNP VCF Files|DBSNP VCF Files]]
*** If you don't have the -bs.umfa file, the software will try to create it in the same directory as the reference fasta.
+
* [[GotCloud: Genetic Reference and Resource Files#HapMap3 VCF Files|HapMap3 VCF Files]]
*** .GCContent can be generated using qplot, see: [[QPLOT#Input_files| QPLOT: Input Files: --gccontent]] and name the resulting file as <code>.fa.GCcontent</code>
  −
*** Use <code>bin/bwa index ref.fa</code> if you need to generate the bwa reference files (.amb, .ann, .bwt, .pac, .sa)
  −
** Configuration Name: REF - specify the ref.fa/ref.fa.gz name
  −
* DBSNP File
  −
** VCF file containing dbsnp variants
  −
** Configuration Name: DBSNP_VCF
  −
* HapMap3 VCF
  −
** VCF file containing HM3 variants
  −
** Configuration Name: HM3_VCF
  −
 
  −
For more information on obtaining/setting/generating the GotCloud reference files, see: [[GotCloud: Genetic Reference and Resource Files]]
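
Before starting a run, it can help to confirm that the bwa index files listed above sit next to the reference fasta. The following is a minimal sketch, not part of GotCloud: the function name <code>missing_bwa_index_files</code> is hypothetical, and it assumes the index files are named by appending each suffix to the fasta path, which is what <code>bwa index ref.fa</code> does when no other prefix is given. It does not check the -bs.umfa or GC content files.

```python
from pathlib import Path

# Suffixes that `bwa index ref.fa` writes next to the reference fasta
# (ref.fa.amb, ref.fa.ann, ref.fa.bwt, ref.fa.pac, ref.fa.sa).
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(ref_fasta):
    """Return the expected bwa index files that are not present on disk.

    Hypothetical helper: assumes the index prefix equals the fasta path.
    """
    ref = Path(ref_fasta)
    expected = [Path(str(ref) + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in expected if not path.exists()]
```

If the returned list is not empty, running <code>bin/bwa index ref.fa</code> as described above should create the missing files.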
      
=== Configuration File ===  
 
=== Configuration File ===  
Configuration file contains the run-time options including the software binaries and command line arguments.  A default configuration file is automatically loaded.  Users may specify their own configuration file specifying just the values different than the defaults.  The configuration file is not required if there are no values to override.
+
{{:GotCloud: Configuration}}
 
  −
Comments begin with a <code>#</code>
  −
 
  −
Format: KEY = value
     −
Where KEY is the item being set and value is its new value
+
==== Recommended Settings ====
   −
See [[#Command-Line Options|Command-Line Options]] for values that can be set either via command line or via configuration.  
+
As of GotCloud version 1.16, the alignment pipeline uses <code>bwa mem</code> by default.  Prior to version 1.16, the default aligner was <code>bwa aln</code>. 
   −
Note: Command-line options take priority over configuration file settings
+
You can override the default aligner by setting MAP_TYPE in your configuration file:
 +
* to use <code>bwa mem</code> (you do not need to set this in version 1.16 and later since it is the default)
 +
MAP_TYPE = BWA_MEM
 +
* to use <code>bwa aln</code> (you do not need to set this prior to version 1.16 since it is the default)
 +
MAP_TYPE = BWA
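
To illustrate how such overrides combine with the defaults, here is a minimal sketch of reading <code>KEY = value</code> lines and layering them. It is not GotCloud's own parser: the function names are hypothetical, only whole-line <code>#</code> comments are handled, and any further processing GotCloud may apply to values is ignored. The layering order (defaults, then the user configuration, then command-line options) follows the rule that command-line options take priority over configuration file settings.

```python
def parse_config(text):
    """Parse 'KEY = value' lines, ignoring blank lines and '#' comment lines."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY = value line; skipped in this sketch
        settings[key.strip()] = value.strip()
    return settings


def effective_settings(defaults, user_config, command_line):
    """Layer settings so that later sources win:
    defaults < user configuration < command line."""
    merged = dict(defaults)
    merged.update(user_config)
    merged.update(command_line)
    return merged


example = """
# use bwa aln instead of the version 1.16+ default
MAP_TYPE = BWA
BWA_THREADS = -t 4
"""
user = parse_config(example)
defaults = {"MAP_TYPE": "BWA_MEM", "BWA_THREADS": "-t 1"}
print(effective_settings(defaults, user, {}))
```

Only the keys that differ from the defaults need to appear in the user configuration; everything else falls through from the defaults.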
   −
==== Required Settings ====  
+
==== Additional Required Settings ====  
See [[#Reference Files|Reference Files]] for the required reference file settings.
      
See [[#FASTQ List File|FASTQ List File]] for how to set the index file either via command line options or via configuration.
 
See [[#FASTQ List File|FASTQ List File]] for how to set the index file either via command line options or via configuration.
Line 140: Line 133:     
* BWA_THREADS = -t N  
 
* BWA_THREADS = -t N  
** Fill in the N with the number of threads you want BWA to run with, default is 1  
+
** Fill in the N with the number of threads you want BWA to run with, default is 1
 +
* BWA_QUAL = -q N
 +
** Replace N with the trim quality you want bwa aln to run with; the default is 15.  This parameter applies only to bwa aln and is not used when MAP_TYPE is BWA_MEM.
 +
* BWA_MEM_OPTS =
 +
** Specify any additional bwa mem options using this parameter.
 
* SORT_MAX_MEM = 2000000000  
 
* SORT_MAX_MEM = 2000000000  
** Maximum amount of memory used by samtools sort after running bwa  
+
** Maximum amount of memory used by samtools sort after running bwa
*BATCH_TYPE = type
  −
** Tells GotCloud to use "type" for sending jobs to the cluster
  −
** "type" can be mosix, slurm, slurmi, pbs, sge, sgei
  −
*BATCH_OPTS = options
  −
** replace "options" with the options you want to pass to your cluster type
  −
** For example: BATCH_OPTS = -j36,37,38,39,40,41,45,46,47,48,49
  −
*** Specifies which client nodes mosix should send jobs to.
      
== Running the Alignment Pipeline ==  
 
== Running the Alignment Pipeline ==  
Line 225: Line 215:  
**** Check the 'FREEMIX' column for genotype-free estimate of contamination 0-1 scale, the lower, the better
 
**** Check the 'FREEMIX' column for genotype-free estimate of contamination 0-1 scale, the lower, the better
 
**** If [FREEMIX] >= 0.03 and [FREELK1]-[FREELK0] is large, possible contamination
 
**** If [FREEMIX] >= 0.03 and [FREELK1]-[FREELK0] is large, possible contamination
** Qplot Output -see: [[QPLOT#Diagnose_sequencing_quality|QPLOT: Diagnose sequencing quality]] for more info on how to use QPLOT results
+
** Qplot Output - see: [[QPLOT#Diagnose_sequencing_quality|QPLOT: Diagnose sequencing quality]] for more info on how to use QPLOT results
 
*** ''*.qplot.done'' - temp file indicating this step completed successfully
 
*** ''*.qplot.done'' - temp file indicating this step completed successfully
 
*** '''*.qplot.R''' - Rscript that can be used to generate the pdf graphs
 
*** '''*.qplot.R''' - Rscript that can be used to generate the pdf graphs
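
The FREEMIX rule of thumb above can be written as a small check. This is a sketch only: the function name is hypothetical, the 0.03 cutoff comes from the text above, and <code>min_lk_gap</code> is a caller-supplied stand-in for "large" because no numeric threshold is given for [FREELK1]-[FREELK0].

```python
def possible_contamination(freemix, freelk1, freelk0, *, min_lk_gap,
                           freemix_cutoff=0.03):
    """Flag a sample when FREEMIX >= cutoff and FREELK1 - FREELK0 is large.

    min_lk_gap is a hypothetical threshold the caller must choose;
    the source text only says the difference should be "large".
    """
    return freemix >= freemix_cutoff and (freelk1 - freelk0) >= min_lk_gap
```

A flagged sample is only a candidate for closer inspection; the values passed in would come from the FREEMIX, FREELK1 and FREELK0 columns of the contamination output.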
