==== Configure Each Step ====
'''Create a section for each step'''
* Example: <code>[stepName1]</code>

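As a rough illustration, the Python sketch below reads step sections the way a generic INI parser would, treating each <code>[stepName]</code> section as one step. It assumes a <code>KEY = value</code> layout; the step names, the <code>$(OUT_DIR)</code> key, the commands, and the <code>load_steps</code> helper are all hypothetical, and this is not GotCloud's own parser.

```python
import configparser

# Hypothetical two-step configuration; names, keys, and commands are
# illustrative assumptions, not taken from a real GotCloud pipeline.
CONFIG_TEXT = """
[stepName1]
DEPEND = BAM
OUTPUT = $(OUT_DIR)/?(BAM).step1.bam
CMD = tool1 --in ?(INPUT)

[stepName2]
DEPEND = stepName1
OUTPUT = $(OUT_DIR)/?(BAM).step2.bam
CMD = tool2 --in ?(INPUT)
"""

def load_steps(text):
    """Return {step name: {key: value}}, one entry per [section]."""
    # interpolation=None keeps $(...) and ?(...) as literal text.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names upper-case (default lowercases them)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}

steps = load_steps(CONFIG_TEXT)
print(list(steps))  # ['stepName1', 'stepName2']
```

Disabling interpolation matters here because <code>$(...)</code> and <code>?(...)</code> placeholders are meant to be expanded later by the pipeline, not by the parser.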
==== Required Keys for Each Step ====

# <code>DEPEND</code> - dependencies for this step
#: Valid values (separate multiple dependencies with a space):
#:* <code>BAM</code>
#:* Name of a step that must complete before this step
#:* <code>PER_SAMPLE_BAM</code> (unconfirmed; the only non-step values may be <code>BAM</code> and <code>PER_SAMPLE_BAM</code>)
# <code>OUTPUT</code> - name of the output file
#* See below for the temporary keys used in step iteration
# <code>CMD</code> - command for running the step
#* See below for the temporary keys used in step iteration
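A minimal Python sketch of the required-key rules, assuming each step is held as a dictionary of its keys: it checks that <code>DEPEND</code>, <code>OUTPUT</code>, and <code>CMD</code> are present, splits <code>DEPEND</code> on spaces, and orders steps so each runs after the steps it depends on. The helper names and sample steps are hypothetical, <code>PER_SAMPLE_BAM</code> is accepted only because the list above mentions it, and <code>graphlib</code> requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

REQUIRED_KEYS = ("DEPEND", "OUTPUT", "CMD")
SPECIAL_DEPENDS = {"BAM", "PER_SAMPLE_BAM"}  # PER_SAMPLE_BAM is unconfirmed

def validate_steps(steps):
    """Check required keys and that each DEPEND entry is special or a known step."""
    for name, keys in steps.items():
        missing = [k for k in REQUIRED_KEYS if not keys.get(k)]
        if missing:
            raise ValueError(f"step {name!r} is missing {missing}")
        for dep in keys["DEPEND"].split():  # dependencies are space-separated
            if dep not in SPECIAL_DEPENDS and dep not in steps:
                raise ValueError(f"step {name!r} depends on unknown step {dep!r}")

def run_order(steps):
    """Order steps so every step comes after the steps it depends on."""
    graph = {
        name: [d for d in keys["DEPEND"].split() if d in steps]
        for name, keys in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())

steps = {
    "merge": {"DEPEND": "stepA stepB", "OUTPUT": "m.bam", "CMD": "merge"},
    "stepA": {"DEPEND": "BAM", "OUTPUT": "a.bam", "CMD": "a"},
    "stepB": {"DEPEND": "BAM", "OUTPUT": "b.bam", "CMD": "b"},
}
validate_steps(steps)
print(run_order(steps))
```

Only step names become graph edges; <code>BAM</code> is treated as an external input that is always available.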

==== Optional Step Settings ====
General settings:
* <code>LOCAL</code> - run the step locally rather than on the cluster
* <code>NEED_BAI</code> - set if the step requires a BAI file
** Per-chromosome steps always require a BAI file
** Tells GotCloud to fail if a BAI file cannot be found
* <code>BAM_DEPEND</code> - add the BAM file as a Makefile dependency for this step
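A sketch of how the <code>NEED_BAI</code> rule might be applied, written in Python. It assumes a step counts as per-chromosome when its <code>OUTPUT</code> contains <code>?(CHR)</code>, and that the index sits next to the BAM as <code>name.bam.bai</code> or <code>name.bai</code>; both are assumptions, and <code>requires_bai</code> and <code>check_bai</code> are hypothetical names.

```python
import os

def requires_bai(step):
    """True if NEED_BAI is set or the step iterates per chromosome."""
    # Assumption: a per-chromosome step is one whose OUTPUT contains ?(CHR).
    return bool(step.get("NEED_BAI")) or "?(CHR)" in step.get("OUTPUT", "")

def check_bai(step, bam_path, exists=os.path.exists):
    """Return the BAI path, None if no BAI is needed, or fail if it is missing."""
    if not requires_bai(step):
        return None
    # Assumption: the index is either <name>.bam.bai or <name>.bai.
    candidates = [bam_path + ".bai", os.path.splitext(bam_path)[0] + ".bai"]
    for path in candidates:
        if exists(path):
            return path
    raise FileNotFoundError(f"no BAI file found for {bam_path}")

per_chr_step = {"OUTPUT": "out/?(SAMPLE).?(CHR).vcf"}
print(check_bai(per_chr_step, "s1.bam", exists=lambda p: p == "s1.bam.bai"))  # s1.bam.bai
```

The <code>exists</code> parameter is only there so the check can be exercised without real files.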

Settings to limit which samples this step runs on:
* <code>SAMPLES</code> - restrict the step to samples with a single BAM or to samples with multiple BAMs (e.g., for merging)
*: Possible values:
*:* <code>MULTI_BAM</code> - run the step only for samples that have multiple BAMs
*:* <code>SINGLE_BAM</code> - run the step only for samples that have one BAM
* Deprecated settings (still in <code>pipeline.pl</code>, but may or may not work):
** <code>MULTI_ONLY</code> - set to a non-blank value if the step should run when there is more than one input per output
** <code>SINGLE_ONLY</code> - set to a non-blank value if the step should run when there is only one input per output
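The <code>SAMPLES</code> filter can be sketched in Python as below, assuming a mapping from each sample to its BAM files. The function name, the sample data, and the choice to reject unknown values are hypothetical.

```python
def samples_for_step(step, sample_bams):
    """Return the samples a step runs on, honoring the SAMPLES setting.

    sample_bams maps each sample name to its list of BAM files.
    """
    setting = step.get("SAMPLES", "")
    if setting == "MULTI_BAM":
        return [s for s, bams in sample_bams.items() if len(bams) > 1]
    if setting == "SINGLE_BAM":
        return [s for s, bams in sample_bams.items() if len(bams) == 1]
    if setting:
        # Assumption: treat any other non-blank value as a configuration error.
        raise ValueError(f"unknown SAMPLES value: {setting!r}")
    return list(sample_bams)  # SAMPLES not set: run for every sample

sample_bams = {"NA1": ["NA1.bam"], "NA2": ["NA2_lane1.bam", "NA2_lane2.bam"]}
print(samples_for_step({"SAMPLES": "MULTI_BAM"}, sample_bams))  # ['NA2']
```

A merge step would typically use <code>MULTI_BAM</code>, since a sample with a single BAM has nothing to merge.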

Joining multiple inputs for a single output:
* Can occur if a step has multiple dependencies
* Can occur if a step iterates at a more general level than one of its dependencies (for example, a per-sample step that depends on a per-BAM step)
* <code>INPUT_JOIN</code> - separator passed to Perl's <code>join</code> function when joining the multiple inputs for each output
** Applies across all dependencies
* <code>dependStepName_JOIN</code> - how to join the outputs of the dependency <code>dependStepName</code> into this step's command line when that dependency produces multiple outputs for a single iteration of this step
** Replaces <code>?(${depend}/OUTPUT)</code> with that dependency's outputs, joined by Perl's <code>join</code> using the specified value as the separator
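The joining rules can be mirrored in Python with <code>str.join</code>, which, like Perl's <code>join(SEPARATOR, LIST)</code>, places the separator between list items. This sketch assumes the dependency placeholder is written literally as <code>?(stepName/OUTPUT)</code> and that a single space is used when no <code>_JOIN</code> value is set; both are assumptions, and the function names and step names are hypothetical.

```python
import re

# Matches ?(stepName/OUTPUT) and captures the dependency name.
DEPEND_OUTPUT = re.compile(r"\?\(([^()/]+)/OUTPUT\)")

def substitute_depend_outputs(cmd, step, dep_outputs):
    """Replace ?(dep/OUTPUT) with that dependency's outputs joined together.

    dep_outputs maps a dependency step name to the list of its output files
    that feed this iteration; the separator comes from <dep>_JOIN.
    """
    def repl(match):
        dep = match.group(1)
        sep = step.get(f"{dep}_JOIN", " ")  # assumed default: one space
        return sep.join(dep_outputs[dep])
    return DEPEND_OUTPUT.sub(repl, cmd)

def join_inputs(step, dep_outputs):
    """Join inputs across all dependencies using INPUT_JOIN (assumed default: space)."""
    sep = step.get("INPUT_JOIN", " ")
    return sep.join(f for outs in dep_outputs.values() for f in outs)

step = {"bamStep_JOIN": " -I "}
cmd = "tool -I ?(bamStep/OUTPUT) -o out.bam"
print(substitute_depend_outputs(cmd, step, {"bamStep": ["a.bam", "b.bam"]}))
# tool -I a.bam -I b.bam -o out.bam
```

Because the separator only goes between items, a value such as <code>" -I "</code> turns a list of files into repeated command-line options.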

Logging output filenames:
* <code>FILELIST</code> - writes (appends) each iteration's output filename to the specified file list
** Typically used by a later "merge" step
** The temporary keys described below for step iteration can be used in this filename
*** These keys can be more general than those in <code>OUTPUT</code>, but not more specific
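A Python sketch of how a more general <code>FILELIST</code> groups outputs: when <code>FILELIST</code> uses only <code>?(SAMPLE)</code> but <code>OUTPUT</code> also uses <code>?(CHR)</code>, every chromosome's output for a sample lands in that sample's list. The key check is a simplification (every key in <code>FILELIST</code> must also appear in <code>OUTPUT</code>), and the helper names and paths are hypothetical.

```python
import re
from collections import defaultdict

TEMP_KEY = re.compile(r"\?\((\w+)\)")  # matches ?(KEY)

def fill(template, values):
    """Substitute ?(KEY) temporary keys from the current iteration's values."""
    return TEMP_KEY.sub(lambda m: values[m.group(1)], template)

def check_filelist_keys(step):
    """Simplified rule: FILELIST may not use keys that OUTPUT lacks."""
    out_keys = set(TEMP_KEY.findall(step["OUTPUT"]))
    list_keys = set(TEMP_KEY.findall(step["FILELIST"]))
    if not list_keys <= out_keys:
        raise ValueError(f"FILELIST is more specific than OUTPUT: {list_keys - out_keys}")

def build_file_lists(step, iterations):
    """Map each FILELIST path to the OUTPUT files appended to it, in order."""
    lists = defaultdict(list)
    for values in iterations:
        lists[fill(step["FILELIST"], values)].append(fill(step["OUTPUT"], values))
    return dict(lists)

step = {"OUTPUT": "out/?(SAMPLE).?(CHR).vcf", "FILELIST": "out/?(SAMPLE).list"}
iterations = [
    {"SAMPLE": "NA1", "CHR": "1"},
    {"SAMPLE": "NA1", "CHR": "2"},
    {"SAMPLE": "NA2", "CHR": "1"},
]
check_filelist_keys(step)
print(build_file_lists(step, iterations))
```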

==== Iterating a Command for Each BAM/Sample/Chromosome/Region ====
Temporary keys are used when iterating a command per BAM/sample/chromosome/region.
* Specify them with <code>?()</code> rather than <code>$()</code>
* Temporary keys can be used in:
** <code>OUTPUT</code>
** <code>CMD</code>
** <code>FILELIST</code>
* They are substituted with the current values on each iteration
* The temporary keys in <code>OUTPUT</code> determine how the command is iterated
* Temporary keys that determine iterations:
** <code>?(BAM)</code> - per BAM per sample
** <code>?(SAMPLE)</code> - per sample
** <code>?(CHR)</code> - per chromosome
** <code>?(START)</code> - per region of a chromosome (must also include <code>?(CHR)</code>)
* Additional temporary keys:
** <code>?(END)</code> - end of the region; only used if <code>?(START)</code> is also specified
** <code>?(INPUT)</code>
** <code>?(${depend}/OUTPUT)</code>
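The iteration rules can be sketched in Python: the keys found in <code>OUTPUT</code> pick the iteration level, and each combination of sample/BAM and chromosome/region yields one substituted filename. The inputs (a sample-to-BAM mapping, a chromosome list, and precomputed <code>(start, end)</code> regions per chromosome) and the function names are hypothetical; how GotCloud actually chooses region boundaries is not described here.

```python
import re
from itertools import product

TEMP_KEY = re.compile(r"\?\((\w+)\)")  # matches ?(KEY)

def iteration_keys(output):
    """Return the temporary keys in OUTPUT that drive iteration."""
    keys = set(TEMP_KEY.findall(output))
    if "START" in keys and "CHR" not in keys:
        raise ValueError("?(START) requires ?(CHR) in OUTPUT")
    return keys & {"BAM", "SAMPLE", "CHR", "START"}

def expand(output, samples, chroms, regions):
    """Yield one substituted OUTPUT per iteration.

    samples maps sample -> list of BAMs, chroms lists chromosome names, and
    regions maps chromosome -> list of (start, end) pairs.
    """
    keys = iteration_keys(output)
    if "BAM" in keys:  # per BAM per sample
        units = [{"SAMPLE": s, "BAM": b} for s, bams in samples.items() for b in bams]
    elif "SAMPLE" in keys:  # per sample
        units = [{"SAMPLE": s} for s in samples]
    else:
        units = [{}]
    if "START" in keys:  # per region of each chromosome
        loci = [
            {"CHR": c, "START": str(start), "END": str(end)}
            for c in chroms
            for start, end in regions[c]
        ]
    elif "CHR" in keys:  # per chromosome
        loci = [{"CHR": c} for c in chroms]
    else:
        loci = [{}]
    for unit, locus in product(units, loci):
        values = {**unit, **locus}
        yield TEMP_KEY.sub(lambda m: values[m.group(1)], output)

samples = {"NA1": ["a.bam"], "NA2": ["b.bam"]}
print(list(expand("out/?(SAMPLE).?(CHR).vcf", samples, ["1", "2"], {})))
```

Each yielded filename corresponds to one iteration of the step's command.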

'''Notes:'''
* Currently, each step iteration will:
** be its own Makefile target/.OK file
** run independently on the cluster
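To illustrate one Makefile target per iteration, here is a Python sketch that emits a rule for each iteration plus a phony target for the whole step. The <code>output.OK</code> naming, the <code>touch</code> idiom for marking completion, and the <code>make_rules</code> helper are assumptions, not the exact Makefile GotCloud generates.

```python
def make_rules(step_name, iterations):
    """Emit one Makefile rule per iteration, each with its own .OK target.

    iterations is a list of (output, command, prerequisites) tuples.
    """
    rules = []
    for output, command, prereqs in iterations:
        target = f"{output}.OK"  # assumed naming for the completion marker
        rules.append(
            f"{target}: {' '.join(prereqs)}\n"
            f"\t{command}\n"
            f"\ttouch {target}\n"  # mark this iteration as done
        )
    all_targets = " ".join(f"{out}.OK" for out, _, _ in iterations)
    header = f".PHONY: {step_name}\n{step_name}: {all_targets}\n\n"
    return header + "\n".join(rules)

print(make_rules("step1", [
    ("out/a.bam", "tool a", ["in/a.bam"]),
    ("out/b.bam", "tool b", ["in/b.bam"]),
]))
```

Because each iteration is a separate target, <code>make -j</code> (or a cluster-aware make wrapper) can run iterations independently, and a failed iteration can be rerun without repeating the others.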