Changes

From Genome Analysis Wiki
→‎Command Line Parameters: fix formatting about BAM_LIST
Line 47: Line 47:  
# [[#Overall Pipeline Definition|Overall Pipeline Definition]]
 
# [[#Overall Pipeline Definition|Overall Pipeline Definition]]
 
#* Basics for the overall pipeline
 
#* Basics for the overall pipeline
#* '''NOTE: Curretnly, configurations set in the overall pipeline's section do not by default pass onto the step's configurations'''
+
#* '''NOTE: Currently, configurations set in the overall pipeline's section do not by default pass onto the step's configurations'''
 
# [[#Configure Each Step|Configure Each Step]]
 
# [[#Configure Each Step|Configure Each Step]]
   Line 68: Line 68:  
* BATCH_TYPE
 
* BATCH_TYPE
 
* IGNORE_SM_CHECK - turn off the default validation that the @RG SM tag matches the bam list sample name.
 
* IGNORE_SM_CHECK - turn off the default validation that the @RG SM tag matches the bam list sample name.
* IGNORE_REF_CHR_CHECK
+
* IGNORE_REF_CHR_CHECK - turn off the default validation that checks that all of the BAM's chromosomes are in the reference file - eventually we may update to just validate those in CHRS.
 
* OUT_DIR
 
* OUT_DIR
 
* BAM_LIST
 
* BAM_LIST
Line 76: Line 76:  
* UNIFORM_TARGET_BED
 
* UNIFORM_TARGET_BED
 
* OFFSET_OFF_TARGET
 
* OFFSET_OFF_TARGET
* CHRS
+
* CHRS - defines which chromosomes to run.
 
* UNIT_CHUNK
 
* UNIT_CHUNK
 
* NO_CRAM - do not allow CRAM files as input
 
* NO_CRAM - do not allow CRAM files as input
 
* MAKE_BASE_NAME_PIPE - base makefile name
 
* MAKE_BASE_NAME_PIPE - base makefile name
* MAKE_OPTS - otpions to pass to the make command that runs the jobs.
+
* MAKE_OPTS - options to pass to the make command that runs the jobs.
 
* BAM_DEPEND - set to TRUE if you want the BAM file to be included as a make dependency
 
* BAM_DEPEND - set to TRUE if you want the BAM file to be included as a make dependency
   Line 89: Line 89:  
* By default if a value is not defined in the section, it will check global.
 
* By default if a value is not defined in the section, it will check global.
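
The fallback described above can be pictured with a small Python sketch. It is not GotCloud's code: the dictionary layout, the <code>lookup</code> helper, and the example values are assumptions made for illustration only.

```python
# Hypothetical in-memory view of a configuration: section name -> {key: value}.
# "global" stands in for settings defined outside any named section.
conf = {
    "global": {"OUT_DIR": "out", "BATCH_TYPE": "local"},
    "stepName1": {"OUT_DIR": "out/step1"},
}


def lookup(conf, section, key, default=None):
    """Return the value from `section`, falling back to the global settings."""
    if key in conf.get(section, {}):
        return conf[section][key]
    return conf.get("global", {}).get(key, default)


print(lookup(conf, "stepName1", "OUT_DIR"))     # defined in the section
print(lookup(conf, "stepName1", "BATCH_TYPE"))  # falls back to global
```

Here the first call uses the section's own value and the second falls back to the global one.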
    +
==== Configure Each Step ====
 +
'''Create a section for each step'''
 +
* Example: <code>[stepName1]</code>
 +
 +
 +
====Required keys for each step:====
 +
 +
# <code>DEPEND</code> - dependencies for this step
 +
#: Valid Values (separate multiple dependencies with a space):
 +
#:*<code>BAM</code>
 +
#:*Name of step that must complete prior to this step
 +
#:*<code>PER_SAMPLE_BAM</code> (unconfirmed) - a BAM-type dependency can only be <code>BAM</code> or <code>PER_SAMPLE_BAM</code>
 +
#<code>OUTPUT</code> - name of output file
 +
#* See below for temporary keys for step iteration
 +
#<code>CMD</code> - command for running the step
 +
#* See below for temporary keys for step iteration
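
As a rough illustration of the three required keys, the Python sketch below checks a step section for <code>DEPEND</code>, <code>OUTPUT</code>, and <code>CMD</code>. It assumes the configuration can be read as INI-style text by Python's <code>configparser</code>, which may not match the real GotCloud parser; the step contents and the helper name <code>missing_required_keys</code> are made up for the example.

```python
import configparser

REQUIRED_KEYS = ("DEPEND", "OUTPUT", "CMD")

# Hypothetical step definition; the command and file names are illustrative only.
EXAMPLE_CONF = """
[stepName1]
DEPEND = BAM
OUTPUT = out/?(SAMPLE).txt
CMD = wc -c ?(INPUT) > out/?(SAMPLE).txt
"""


def missing_required_keys(conf_text):
    """Map each section name to the list of required keys it lacks."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive (upper-case)
    parser.read_string(conf_text)
    return {
        section: [key for key in REQUIRED_KEYS if key not in parser[section]]
        for section in parser.sections()
    }


print(missing_required_keys(EXAMPLE_CONF))
```

A section that defines all three keys maps to an empty list; any key it omits is listed by name.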
 +
 +
 +
====Optional Step Settings:====
 +
General Settings:
 +
* <code>LOCAL</code> - run the step locally rather than on the cluster
 +
* <code>NEED_BAI</code> - Set if a step requires a BAI file
 +
** Per chromosome steps always require a BAI file
 +
** Tells GotCloud to fail if a BAI can't be found
 +
* <code>BAM_DEPEND</code> - Add the BAM file as a Makefile dependency for this step
 +
 +
Settings to limit which samples this step runs on:
 +
* <code>SAMPLES</code> - use this to define a step to run only for samples with a single BAM or multiple BAMs (merging)
 +
*: Possible values:
 +
*:* <code>MULTI_BAM</code> - run the step only for samples that have multiple BAMs
 +
*:* <code>SINGLE_BAM</code> - run the step only for samples that have one BAM
 +
*Deprecated settings - still in pipeline.pl and may or may not work:
 +
** <code>MULTI_ONLY</code> - set to a non-blank value if the step should run only when there is more than 1 input per output.
 +
** <code>SINGLE_ONLY</code> - set to a non-blank value if the step should run only when there is exactly 1 input per output.
    +
Joining multiple inputs for a single output:
 +
* Can occur if there are multiple dependencies
 +
* Can occur if a step runs at a more generic iteration level than a dependency
 +
* <code>INPUT_JOIN</code> - value to pass to perl "join" command for joining multiple inputs for each output.
 +
** Looks across all dependencies
 +
* <code>dependStepName_JOIN</code> - how to join the "dependStepName"'s output into the command line for a step that depends on it if there are multiple outputs per input of this step
 +
** Substitutes <code>?(${depend}/OUTPUT)</code> with perl "join" using the specified value to join multiple outputs for that dependency
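
The joining behaviour can be approximated in Python, where <code>sep.join(values)</code> plays the role of perl's "join". This is only a sketch: the helper <code>substitute_joined</code>, the command, and the file names are hypothetical, and the real substitution in pipeline.pl may differ in detail.

```python
def substitute_joined(cmd, key, values, sep):
    """Replace every ?(key) in cmd with the values joined by sep."""
    return cmd.replace("?(" + key + ")", sep.join(values))


# Hypothetical command with three inputs for one output.
cmd = "merge-tool --out merged.bam --in ?(INPUT)"
inputs = ["a.bam", "b.bam", "c.bam"]

# A join value of " --in " repeats the flag before each additional input.
print(substitute_joined(cmd, "INPUT", inputs, " --in "))
```

The same helper could be used for a dependency-specific key, for example with <code>key="step1/OUTPUT"</code> for a dependency assumed to be named <code>step1</code>.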
−
==== Configure Each Step ====
<ol>
<li>Create a section for each step
<ul><li> Example:
<dd> <pre>[stepName1]</pre>
</li>
</ul>
<ol>
<li>Set required keys for each step:
<ol>
<li><code>DEPEND</code> - dependencies for this step
<dd> Valid Values (separate multiple dependencies with a space):
<ul>
<li><code>BAM</code></li>
<li>Name of step that must complete prior to this step</li>
<li></li>
</ul>
</li>
<li><code>OUTPUT</code> - name of output file
</li>
<li><code>CMD</code> - command for running the step
</li>
</ol>
</li>
</ol>
</li>
</ol>
+
Log Output filenames
+
* <code>FILELIST</code> - writes/appends the iteration's output file name into the specified file list.
+
** Typically will be used in a later "merge" step
+
** See below for temporary keys for step iteration that can be used in this filename
+
*** Temporary keys can be more general than those in OUTPUT, but cannot be more specific.
+

+

+
====Iterating a command for each Bam/Sample/Chromosome/Region====
+
Temporary keys are used when iterating a command per BAM/sample/chromosome/region.
+
* Specify using <code>?()</code> rather than <code>$()</code>
+
* Temporary keys can be used in:
+
** <code>OUTPUT</code>
+
** <code>CMD</code>
+
** <code>FILELIST</code>
+
* They will be substituted as it iterates
+
* How to iterate a command is determined by the temporary keys in <code>OUTPUT</code>
+
* Temporary Keys for determining iterations:
+
** <code>?(BAM)</code> - per BAM per sample
+
** <code>?(SAMPLE)</code> - per sample
+
** <code>?(CHR)</code> - per chromosome
+
** <code>?(START)</code> - Per region of a Chromosome (must also include <code>?(CHR)</code>):
+
* Additional Temporary Keys:
+
** <code>?(END)</code> - end of the region - only used if <code>?(START)</code> is also specified.
+
** <code>?(INPUT)</code>
+
** <code>?(${depend}/OUTPUT)</code>
+

+
'''Notes:'''
 +
* Currently each step iteration will:
 +
** be its own Makefile target/.OK file
 +
** run independently on the cluster
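
To make the iteration rule concrete, here is a minimal Python sketch of how the temporary keys found in an <code>OUTPUT</code> value could determine the iterations. It handles only <code>?(SAMPLE)</code> and <code>?(CHR)</code>; <code>?(BAM)</code> and <code>?(START)</code>/<code>?(END)</code> are left out, and the function name, sample names, and file pattern are assumptions rather than anything taken from pipeline.pl.

```python
import itertools


def expand_output(template, samples, chrs):
    """Expand ?(SAMPLE) and ?(CHR) in an OUTPUT template, one result per iteration."""
    values = {"SAMPLE": samples, "CHR": chrs}
    # Only the keys that actually appear in the template drive the iteration.
    keys = [key for key in ("SAMPLE", "CHR") if "?(" + key + ")" in template]
    results = []
    for combo in itertools.product(*(values[key] for key in keys)):
        out = template
        for key, value in zip(keys, combo):
            out = out.replace("?(" + key + ")", value)
        results.append(out)
    return results


# Hypothetical samples and chromosomes.
for name in expand_output("out/?(SAMPLE).chr?(CHR).vcf", ["s1", "s2"], ["1", "2"]):
    print(name)
```

With both keys present the sketch yields one output name per sample and chromosome pair; with only <code>?(SAMPLE)</code> it yields one per sample, and with no temporary keys it yields the template once.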
 +
 
 +
== Command Line Parameters ==
 +
Required Parameters:
 +
* <code>--name</code> <pipelineName> - name of the pipeline to run
 +
* <code>--conf</code> <configuration file> - configuration file to use
 +
 
 +
NOTE: Currently, any "overrides" are for the global setting only - not for the pipeline/step.
 +
* this needs to be fixed so they can override the pipeline settings
 +
 
 +
Optional Parameters:
 +
* <code>--ignoreSmCheck</code> - overrides <code>IGNORE_SM_CHECK</code>
 +
* <code>--ignoreRefChrCheck</code> - overrides <code>IGNORE_REF_CHR_CHECK</code>
 +
* <code>--verbose</code> <number> - verbose value passed to the loadConf method
 +
 
 +
Optional Parameters like SnpCall:
 +
* <code>--numjobs|numjobs</code> <number> - number of jobs to run in parallel
 +
* <code>--maxlocaljobs</code> <number> - maximum number of jobs allowed to run when the batch type is local (default 10); this limit is not checked for commands run with <code>LOCAL</code>
 +
* <code>--region</code> <region to process> - like snpcall, specifies a single region to process
 +
* <code>--bam_list|list|bamlist|bam_index|bamindex</code> <bam list file> - overrides <code>BAM_LIST</code>, the list of sample bam files to process
 +
* <code>--out_dir|outdir</code> <output directory> - overrides <code>OUT_DIR</code>
 +
* <code>--batchtype</code> <type> - overrides <code>BATCH_TYPE</code>
 +
* <code>--batchopts</code> <options> - overrides <code>BATCH_OPTS</code>
 +
* <code>--chrs|chroms</code> <comma separated chromosomes> - overrides <code>CHRS</code> (CHRS is space separated - commas are converted to spaces)
 +
* <code>--ref_dir|refdir</code> <reference directory> - overrides <code>REF_DIR</code>
 +
* <code>--ref_prefix|refprefix</code> <prefix> - overrides <code>REF_PREFIX</code>
 +
* <code>--bam_prefix|bamprefix</code> <prefix> - overrides <code>BAM_PREFIX</code>
 +
* <code>--base_prefix|baseprefix</code> <prefix> - overrides <code>BASE_PREFIX</code>
 +
* <code>--gotcloudroot|gcroot</code> <path to gotcloud> - by default gotcloud root is determined from the path to the pipeline script, but this setting overrides that.
 +
* <code>--help</code> - print Usage
 +
* <code>--test</code> <test directory> - run the test code (just for indel right now)
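
The comma-to-space conversion noted for <code>--chrs|chroms</code> can be sketched in Python as follows. The actual script is written in perl, so this uses <code>argparse</code> purely as a stand-in; the function name <code>parse_chrs</code> is hypothetical.

```python
import argparse


def parse_chrs(argv):
    """Return the CHRS override as a space-separated string, or None if absent."""
    parser = argparse.ArgumentParser()
    # Both spellings from the option list map to the same destination.
    parser.add_argument("--chrs", "--chroms", dest="chrs", default=None)
    args = parser.parse_args(argv)
    if args.chrs is None:
        return None
    # CHRS is space separated, so commas are converted to spaces.
    return args.chrs.replace(",", " ")


print(parse_chrs(["--chroms", "1,2,X"]))
```

For the arguments shown, the value stored for <code>CHRS</code> would be <code>1 2 X</code>.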
 +
 
 +
Unused command line options (in the code, but not actually used):
 +
* <code>--keeptmp</code> - overrides <code>KEEP_TMP</code>
 +
* <code>--keeplog</code> - overrides <code>KEEP_LOG</code>
 +
 
 +
== Example Pipelines Created ==
 +
Look for the sections and <code>STEPS</code> settings in the default configuration files:
 +
https://github.com/statgen/gotcloud/blob/master/bin/gotcloudDefaults.conf
 +
https://github.com/statgen/gotcloud/blob/alignPrep/bin/gotcloudDefaults.conf