Changes

Line 92: Line 92:     
==== Configure Each Step ====
 
==== Configure Each Step ====
<ol>
+
'''Create a section for each step'''
<li>Create a section for each step
+
* Example: <code>[stepName1]</code>
<ul><li> Example:
+
 
<dd> <pre>[stepName1]</pre>
+
 
</li>
+
====Required keys for each step:====
</ul>
+
 
<ol>
+
# <code>DEPEND</code> - dependencies for this step
<li>Set required keys for each step:
+
#: Valid Values (separate multiple dependencies with a space):
<ol>
+
#:*<code>BAM</code>
<li><code>DEPEND</code> - dependencies for this step
+
#:*Name of step that must complete prior to this step
<dd> Valid Values (separate multiple dependencies with a space):
+
#:*<code>PER_SAMPLE_BAM</code> (unverified; the special values may be limited to <code>BAM</code> or <code>PER_SAMPLE_BAM</code>)
<ul>
+
#<code>OUTPUT</code> - name of output file
<li><code>BAM</code></li>
+
#* See below for temporary keys for step iteration
<li>Name of step that must complete prior to this step</li>
+
#<code>CMD</code> - command for running the step
<li></li>
+
#* See below for temporary keys for step iteration
</ul>
+
 
</li>
+
 
<li><code>OUTPUT</code> - name of output file
+
====Optional Step Settings:====
</li>
+
General Settings:
<li><code>CMD</code> - command for running the step
+
* <code>LOCAL</code> - run the step locally rather than on the cluster
</li>
+
* <code>NEED_BAI</code> - Set if a step requires a BAI file
</ol>
+
** Per chromosome steps always require a BAI file
</li>
+
** Tells GotCloud to fail if a BAI can't be found
</ol>
+
* <code>BAM_DEPEND</code> - Add the BAM file as a Makefile dependency for this step
</li>
+
 
</ol>
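As an illustration of the required keys described above, here is a minimal Python sketch that parses a step section and reports any missing required key. The <code>KEY = value</code> line syntax, the tool name <code>someTool</code>, and every value in the sample text are assumptions made for this example; they are not taken from the GotCloud source, and the helper names are hypothetical.

```python
import configparser

# Required keys for each step, as listed in this section.
REQUIRED_KEYS = ("DEPEND", "OUTPUT", "CMD")

# Hypothetical step definition. The "KEY = value" syntax and all values
# (including someTool and the paths) are assumptions for illustration.
EXAMPLE_CONFIG = """
[stepName1]
DEPEND = BAM
OUTPUT = out/?(SAMPLE).txt
CMD = someTool --in ?(INPUT) > out/?(SAMPLE).txt
"""


def parse_steps(config_text):
    """Parse step sections without interpolation and with case-sensitive keys."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep DEPEND/OUTPUT/CMD in upper case
    parser.read_string(config_text)
    return parser


def missing_required_keys(config_text):
    """Return {step name: [missing keys]} for steps lacking a required key."""
    parser = parse_steps(config_text)
    missing = {}
    for step in parser.sections():
        absent = [key for key in REQUIRED_KEYS if key not in parser[step]]
        if absent:
            missing[step] = absent
    return missing


def dependencies(config_text, step):
    """Split DEPEND on whitespace, since multiple dependencies are space-separated."""
    return parse_steps(config_text)[step]["DEPEND"].split()


if __name__ == "__main__":
    print(missing_required_keys(EXAMPLE_CONFIG))
    print(dependencies(EXAMPLE_CONFIG, "stepName1"))
```

This is only a checking aid; GotCloud's own parser (pipeline.pl) may treat the file differently.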
+
Settings to limit which samples this step runs on:
 +
* <code>SAMPLES</code> - restricts the step to samples that have a single BAM or to samples that have multiple BAMs (for example, a merge step)
 +
*: Possible values:
 +
*:* <code>MULTI_BAM</code> - run the step only for samples that have multiple BAMs
 +
*:* <code>SINGLE_BAM</code> - run the step only for samples that have one BAM
 +
*Deprecated settings - still in pipeline.pl and may or may not work:
 +
** <code>MULTI_ONLY</code> - set to a non-blank value to run the step only when there is more than one input per output.
 +
** <code>SINGLE_ONLY</code> - set to a non-blank value to run the step only when there is exactly one input per output.
 +
 
 +
Joining multiple inputs for a single output:
 +
* Can occur if there are multiple dependencies
 +
* Can occur if a step runs at a more generic iteration level than a dependency
 +
* <code>INPUT_JOIN</code> - separator passed to Perl's <code>join</code> function to combine multiple inputs for each output.
 +
** Looks across all dependencies
 +
* <code>dependStepName_JOIN</code> - separator used to join the outputs of the dependency "dependStepName" on the command line of a step that depends on it, when that dependency produces multiple outputs for a single iteration of this step
 +
** Substitutes <code>?(${depend}/OUTPUT)</code> with perl "join" using the specified value to join multiple outputs for that dependency
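The joining behaviour described above can be sketched in Python, where <code>separator.join(values)</code> mirrors Perl's <code>join(separator, values)</code>. The helper name, the command templates, the file names, and the separators are hypothetical; that <code>INPUT_JOIN</code> fills the <code>?(INPUT)</code> placeholder is an assumption made for this example, not something the section states.

```python
def substitute_joined(cmd_template, placeholder, values, separator):
    """Replace a placeholder with the values joined by the given separator.

    separator.join(values) is the Python counterpart of Perl's
    join(separator, values).
    """
    return cmd_template.replace(placeholder, separator.join(values))


if __name__ == "__main__":
    # Hypothetical inputs feeding one output (e.g. several BAMs for one sample).
    inputs = ["a.bam", "b.bam", "c.bam"]

    # Assumed case: INPUT_JOIN is a single space.
    print(substitute_joined("someMerge ?(INPUT)", "?(INPUT)", inputs, " "))

    # Assumed case: a tool that wants a repeated flag between inputs.
    print(substitute_joined("someMerge -i ?(INPUT)", "?(INPUT)", inputs, " -i "))
```

The same function applies to a per-dependency separator: pass the dependency's output placeholder instead of <code>?(INPUT)</code>.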
 +
 
 +
Log output filenames:
 +
* <code>FILELIST</code> - writes/appends the iteration's output file name into the specified file list.
 +
** Typically will be used in a later "merge" step
 +
** See below for temporary keys for step iteration that can be used in this filename
 +
*** Temporary keys can be more general than those in OUTPUT, but cannot be more specific.
 +
 
 +
 
 +
====Iterating a command for each BAM/Sample/Chromosome/Region====
 +
Temporary keys are used when iterating a command per BAM/sample/chromosome/region.
 +
* Specify using <code>?()</code> rather than <code>$()</code>
 +
* Temporary keys can be used in:
 +
** <code>OUTPUT</code>
 +
** <code>CMD</code>
 +
** <code>FILELIST</code>
 +
* They are replaced with the current value on each iteration
 +
* How to iterate a command is determined by the temporary keys in <code>OUTPUT</code>
 +
* Temporary Keys for determining iterations:
 +
** <code>?(BAM)</code> - per BAM per sample
 +
** <code>?(SAMPLE)</code> - per sample
 +
** <code>?(CHR)</code> - per chromosome
 +
** <code>?(START)</code> - per region of a chromosome (<code>OUTPUT</code> must also include <code>?(CHR)</code>)
 +
* Additional Temporary Keys:
 +
** <code>?(END)</code> - end of the region - only used if <code>?(START)</code> is also specified.
 +
** <code>?(INPUT)</code>
 +
** <code>?(${depend}/OUTPUT)</code>
 +
 
 +
'''Notes:'''
 +
* Currently each step iteration will:
 +
** be its own Makefile target/.OK file
 +
** run independently on the cluster
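To illustrate how the temporary keys in <code>OUTPUT</code> determine the iteration, here is a minimal Python sketch. It handles only <code>?(SAMPLE)</code> and <code>?(CHR)</code>; the function names, the output templates, and the sample and chromosome lists are hypothetical, and this does not reproduce the logic of pipeline.pl.

```python
import itertools
import re

# Matches temporary keys written as ?(KEY), e.g. ?(SAMPLE) or ?(CHR).
TEMP_KEY = re.compile(r"\?\(([A-Z_]+)\)")


def temp_keys(template):
    """Return the set of temporary key names used in a template."""
    return set(TEMP_KEY.findall(template))


def expand(template, values):
    """Replace every ?(KEY) in the template with values[KEY]."""
    return TEMP_KEY.sub(lambda match: str(values[match.group(1)]), template)


def iterate_output(output_template, samples, chromosomes):
    """Expand OUTPUT once per iteration implied by its temporary keys.

    Only SAMPLE and CHR are handled in this sketch: the step iterates over
    a dimension only if the matching key appears in the template.
    """
    keys = temp_keys(output_template)
    sample_values = samples if "SAMPLE" in keys else [None]
    chr_values = chromosomes if "CHR" in keys else [None]

    outputs = []
    for sample, chrom in itertools.product(sample_values, chr_values):
        values = {}
        if sample is not None:
            values["SAMPLE"] = sample
        if chrom is not None:
            values["CHR"] = chrom
        outputs.append(expand(output_template, values))
    return outputs


if __name__ == "__main__":
    samples = ["s1", "s2"]   # hypothetical sample IDs
    chromosomes = ["1", "2"]  # hypothetical chromosome names
    for path in iterate_output("out/?(SAMPLE).chr?(CHR).vcf", samples, chromosomes):
        print(path)
```

In this sketch each returned path corresponds to one iteration, which matches the note above that every step iteration becomes its own Makefile target.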
