Changes

From Genome Analysis Wiki
Line 89: Line 89:     
=== Reference Files ===  
 
=== Reference Files ===  
 +
See [[GotCloud: Genetic Reference and Resource Files]] for detailed information about the multiple required reference files for the alignment pipeline, including:
 +
* How to obtain default references
 +
* Configuration keys & default values
 +
* How to generate your own references
 +
* How to point GotCloud to your reference files
   −
The following Reference Files are required:  
+
Required Reference File Types:
* Reference File fasta files
+
* [[GotCloud: Genetic Reference and Resource Files#Reference fasta Files|Reference fasta Files]]
** Files required: .fa, -bs.umfa, .GCContent, .amb, .ann, .bwt, .pac, .sa
+
* [[GotCloud: Genetic Reference and Resource Files#DBSNP VCF Files|DBSNP VCF Files]]
*** If you don't have the -bs.umfa file, the software will try to create it in the same directory as the reference fasta.
+
* [[GotCloud: Genetic Reference and Resource Files#HapMap3 VCF Files|HapMap3 VCF Files]]
*** .GCContent can be generated using qplot, see: [[QPLOT#Input_files| QPLOT: Input Files: --gccontent]] and name the resulting file as <code>.fa.GCcontent</code>
  −
*** Use <code>bin/bwa index ref.fa</code> if you need to generate the bwa reference files (.amb, .ann, .bwt, .pac, .sa)
  −
** Configuration Name: REF - specify the ref.fa/ref.fa.gz name
  −
* DBSNP File
  −
** VCF file containing dbsnp variants
  −
** Configuration Name: DBSNP_VCF
  −
* HapMap3 VCF
  −
** VCF file containing HM3 variants
  −
** Configuration Name: HM3_VCF
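
The reference fasta companion files listed in this section can be checked before starting a run. The following is a minimal sketch, not part of GotCloud: the function names are hypothetical, and it assumes that the bwa index files and the GC content file are named by appending a suffix to the fasta name (<code>ref.fa.amb</code>, <code>ref.fa.GCcontent</code>) while the umfa file replaces the <code>.fa</code> extension (<code>ref-bs.umfa</code>). It does not handle a gzipped <code>ref.fa.gz</code>.

```python
import os

# Suffixes written by `bwa index ref.fa`, appended to the fasta name.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_reference_files(ref_fasta):
    """Return the fasta path plus the companion paths expected next to it."""
    base, _ext = os.path.splitext(ref_fasta)  # "ref.fa" -> "ref"
    paths = [ref_fasta]
    # Assumption: the umfa file is named <base>-bs.umfa.
    paths.append(base + "-bs.umfa")
    # Assumption: the GC content file is named <fasta>.GCcontent.
    paths.append(ref_fasta + ".GCcontent")
    paths.extend(ref_fasta + suffix for suffix in BWA_INDEX_SUFFIXES)
    return paths


def missing_reference_files(ref_fasta, exists=os.path.exists):
    """Return the expected paths that `exists` reports as absent."""
    return [p for p in expected_reference_files(ref_fasta) if not exists(p)]
```

Calling <code>missing_reference_files("/path/to/ref.fa")</code> returns an empty list when every expected file is present. A missing <code>-bs.umfa</code> entry is not necessarily fatal, since the software tries to create that file itself.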
      
=== Configuration File ===  
 
=== Configuration File ===  
The configuration file contains the run-time options, including the software binaries and command-line arguments.  A default configuration file is loaded automatically.  Users may supply their own configuration file that specifies only the values that differ from the defaults.  The configuration file is not required if there are no values to override.
+
{{:GotCloud: Configuration}}
   −
Comments begin with a <code>#</code>
+
==== Recommended Settings ====
   −
Format: KEY = value
+
As of GotCloud version 1.16, the alignment pipeline uses <code>bwa mem</code> by default.  Prior to version 1.16, the default aligner was <code>bwa aln</code>. 
   −
Where KEY is the item being set and value is its new value
+
You can override the defaults by setting in your configuration file:
 +
* to use <code>bwa mem</code> (you do not need to set this in version 1.16 and later since it is the default)
 +
MAP_TYPE = BWA_MEM
 +
* to use <code>bwa aln</code> (you do not need to set this prior to version 1.16 since it is the default)
 +
MAP_TYPE = BWA
   −
See [[#Command-Line Options|Command-Line Options]] for values that can be set either via command line or via configuration.
+
==== Additional Required Settings ====
   −
Note: Command-line options take priority over configuration file settings
+
See [[#FASTQ List File|FASTQ List File]] for how to set the index file either via command line options or via configuration.
 
  −
==== Required Settings ====
  −
See [[#Reference Files|Reference Files]] for the required reference file settings.
  −
 
  −
See [[#Sequence Index File|Sequence Index File]] for how to set the index file either via command line options or via configuration.  
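
The <code>KEY = value</code> format and the precedence of command-line options over configuration settings can be illustrated as follows. This is a sketch only, not GotCloud's own parser: the function names are hypothetical, and it assumes that a comment occupies a whole line whose first non-blank character is <code>#</code>.

```python
def parse_config(text):
    """Parse 'KEY = value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: only whole-line comments starting with '#' are skipped.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        settings[key.strip()] = value.strip()
    return settings


def effective_settings(defaults, user_config, command_line):
    """Merge settings so that later sources override earlier ones."""
    merged = dict(defaults)
    merged.update(user_config)   # user file overrides the defaults
    merged.update(command_line)  # command-line options take priority
    return merged
```

For example, parsing a user file that contains only <code>MAP_TYPE = BWA</code> and merging it over the defaults changes that one key and leaves every other default in place.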
      
==== Turning Off Optional Steps====  
 
==== Turning Off Optional Steps====  
 
Quality Control steps can be disabled.  
 
Quality Control steps can be disabled.  
   −
To Disable QPLOT, set:  
+
To Disable QPLOT, remove qplot from the PER_MERGE_STEPS configuration by setting:  
  RUN_QPLOT = 0
+
  PER_MERGE_STEPS = verifyBamID index recab
 +
 
   −
To Disable VerifyBamID, set:  
+
To Disable VerifyBamID, remove verifyBamID from the PER_MERGE_STEPS configuration by setting:  
  RUN_VERIFY_BAM_ID = 0
+
  PER_MERGE_STEPS = qplot index recab
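
The two settings above follow the same pattern: list every per-merge step except the one being turned off. A small sketch of that pattern is shown here; the helper name is hypothetical, and the full default step list and its order are an assumption inferred from the two example lines, not taken from the GotCloud defaults file.

```python
# Assumption: the default steps, inferred from the two example settings.
DEFAULT_PER_MERGE_STEPS = ("verifyBamID", "qplot", "index", "recab")


def per_merge_steps_without(*disabled, steps=DEFAULT_PER_MERGE_STEPS):
    """Build a PER_MERGE_STEPS line that omits the disabled steps."""
    kept = [step for step in steps if step not in disabled]
    return "PER_MERGE_STEPS = " + " ".join(kept)


if __name__ == "__main__":
    print(per_merge_steps_without("qplot"))
    print(per_merge_steps_without("verifyBamID"))
```

Passing both step names, as in <code>per_merge_steps_without("qplot", "verifyBamID")</code>, yields a line that keeps only the remaining steps.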
    
==== Optional Configurable Settings ====  
 
==== Optional Configurable Settings ====  
Line 137: Line 133:     
* BWA_THREADS = -t N  
 
* BWA_THREADS = -t N  
** Fill in the N with the number of threads you want BWA to run with, default is 1  
+
** Fill in the N with the number of threads you want BWA to run with, default is 1
 +
* BWA_QUAL = -q N
 +
** Fill in the N with the trim quality you want BWA aln to run with, default is 15.  This parameter is only applied to bwa aln.  It is not used for BWA_MEM.
 +
* BWA_MEM_OPTS =
 +
** Specify any additional bwa mem options using this parameter.
 
* SORT_MAX_MEM = 2000000000  
 
* SORT_MAX_MEM = 2000000000  
** Maximum amount of memory used by samtools sort after running bwa  
+
** Maximum amount of memory used by samtools sort after running bwa
*BATCH_TYPE = type
  −
** Tells GotCloud to use "type" for sending jobs to the cluster
  −
** "type" can be mosix, slurm, slurmi, pbs, sge, sgei
  −
*BATCH_OPTS = options
  −
** replace "options" with the options you want to pass to your cluster type
  −
** For example: BATCH_OPTS = -j36,37,38,39,40,41,45,46,47,48,49
  −
*** Specifies which client nodes mosix should send jobs to.
      
== Running the Alignment Pipeline ==  
 
== Running the Alignment Pipeline ==  
Line 198: Line 191:     
On success, you will see:
 
On success, you will see:
  Processing finished in nn secs with no errors reported  
+
  Processing finished in nn secs with no errors reported
and should see the following subdirectories under the user specified output directory:  
+
 
* bams/  
+
If processing fails part way through, you can pick up where you left off by rerunning gotcloud or the make command.
* Makefiles/  
+
 
* QCFiles/ (if all quality control is not disabled)
+
=== Alignment Pipeline Output ===
* tmp/  
+
Upon successful completion of the alignment pipeline, you should see the following files and subdirectories under the user-specified output directory:  
 +
* '''bam.list''' - file containing sample->BAM mapping that can be used in other GotCloud pipelines
 +
* '''bams/''' - contains the final BAM and bai (BAM index) files
 +
** '''*.recal.bam'''
 +
** '''*.recal.bam.bai'''
 +
** ''*.recal.bam.bai.done'' - temp file indicating this step completed successfully
 +
** ''*.recal.bam.done'' - temp file indicating this step completed successfully
 +
** *.recal.bam.metrics - dedup & recalibration log
 +
** *.recal.bam.qemp - recalibration tables
 +
* Makefiles/ - contains the Makefiles and logs used by GotCloud to run the alignment pipeline
 +
* '''QCFiles/''' - contains quality control results if quality control is not disabled
 +
** VerifyBamID Output - see [[VerifyBamID#A_guideline_to_interpret_output_files|VerifyBamID: A guideline to interpret output files]] for more information
 +
*** *.genoCheck.depthRG - depth distribution of the sequence reads per read group
 +
*** *.genoCheck.depthSM - depth distribution of the sequence reads per sample
 +
*** ''*.genoCheck.done'' - temp file indicating this step completed successfully
 +
*** *.genoCheck.selfRG - per-readGroup statistics describing how well each lane matches to the annotated sample
 +
*** '''*.genoCheck.selfSM''' - main output file containing the contamination estimate; per-sample statistics describing how well the sample matches to the annotated sample
 +
**** Check the 'FREEMIX' column for the genotype-free estimate of contamination (0-1 scale; lower is better)
 +
**** If [FREEMIX] >= 0.03 and [FREELK1]-[FREELK0] is large, possible contamination
 +
** Qplot Output - see: [[QPLOT#Diagnose_sequencing_quality|QPLOT: Diagnose sequencing quality]] for more info on how to use QPLOT results
 +
*** ''*.qplot.done'' - temp file indicating this step completed successfully
 +
*** '''*.qplot.R''' - Rscript that can be used to generate the pdf graphs
 +
*** '''*.qplot.stats''' - sample statistics
 +
* tmp/ - contains intermediate files (most are deleted unless --keepTmp is specified)
 +
* *.OK - one OK file per sample; indicates the Sample successfully completed alignment
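
The FREEMIX guidance in the list above can be applied to a <code>*.genoCheck.selfSM</code> file with a short script. This is a sketch under stated assumptions: the function name is hypothetical, the file is assumed to be tab-delimited with a header row containing the <code>FREEMIX</code>, <code>FREELK1</code> and <code>FREELK0</code> columns, and the first column is assumed to identify the sample. Because no numeric cutoff is given for a "large" likelihood difference, the sketch only reports the difference for manual review.

```python
import csv
import io


def contamination_candidates(selfsm_text, freemix_cutoff=0.03):
    """Return (sample, FREEMIX, FREELK1 - FREELK0) for rows at or above the cutoff."""
    reader = csv.DictReader(io.StringIO(selfsm_text), delimiter="\t")
    id_column = reader.fieldnames[0]  # assumption: first column names the sample
    candidates = []
    for row in reader:
        freemix = float(row["FREEMIX"])
        if freemix >= freemix_cutoff:
            lk_diff = float(row["FREELK1"]) - float(row["FREELK0"])
            candidates.append((row[id_column], freemix, lk_diff))
    return candidates
```

To use it, read the file contents and pass them in, for example <code>contamination_candidates(open(path).read())</code>, where <code>path</code> points at one selfSM file under QCFiles/.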
   −
You should see a <code>.OK</code> for each Sample in the index file.  
+
You should also see a <code>.OK</code> for each Sample in the index file.  
    
If you do not see these <code>.OK</code> files, then your Alignment Pipeline failed.  
 
If you do not see these <code>.OK</code> files, then your Alignment Pipeline failed.  
   −
On success, the bams/ directory contains the final BAMs and bais.
+
'''On success, the bams/ directory contains the final BAMs and bais.'''
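
The <code>.OK</code> check described above can be automated for a list of sample names. The sketch below is illustrative only: the function name is hypothetical, and it assumes each marker is named <code>&lt;sample&gt;.OK</code> and sits directly under the output directory.

```python
import os


def samples_missing_ok(samples, out_dir, exists=os.path.exists):
    """Return the samples that have no <sample>.OK marker under out_dir."""
    # Assumption: markers are named <sample>.OK directly under out_dir.
    return [
        sample
        for sample in samples
        if not exists(os.path.join(out_dir, sample + ".OK"))
    ]
```

An empty result means every listed sample has its marker; any sample returned did not complete alignment, and rerunning gotcloud or the make command picks up from where processing stopped.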
 
  −
If processing fails part way through, you can pick up where you left off by rerunning gotcloud or the make command.
 
