
Genome Analysis Wiki β


GotCloud: Creating a New Pipeline

3,041 bytes added, 14:12, 30 June 2015
==== Configure Each Step ====
'''Create a section for each step'''
* Example: <code>[stepName1]</code>

====Required keys for each step:====
# <code>DEPEND</code> - dependencies for this step
#: Valid values (separate multiple dependencies with a space):
#:* <code>BAM</code>
#:* Name of a step that must complete before this step
#:* <code>PER_SAMPLE_BAM</code> (unconfirmed: <code>BAM</code> and <code>PER_SAMPLE_BAM</code> may be the only valid values other than step names)
# <code>OUTPUT</code> - name of the output file
#* See below for temporary keys for step iteration
# <code>CMD</code> - command for running the step
#* See below for temporary keys for step iteration

====Optional Step Settings:====
General settings:
* <code>LOCAL</code> - run the step locally rather than on the cluster
* <code>NEED_BAI</code> - set if a step requires a BAI file
** Per-chromosome steps always require a BAI file
** Tells GotCloud to fail if a BAI file cannot be found
* <code>BAM_DEPEND</code> - add the BAM file as a Makefile dependency for this step

Settings to limit which samples this step runs on:
* <code>SAMPLES</code> - restrict the step to samples with a single BAM or to samples with multiple BAMs (e.g. merging)
*: Possible values:
*:* <code>MULTI_BAM</code> - run the step only for samples that have multiple BAMs
*:* <code>SINGLE_BAM</code> - run the step only for samples that have one BAM
* Deprecated settings (still in pipeline.pl, but may or may not work):
** <code>MULTI_ONLY</code> - set to a non-blank value if the step should run only when there is more than one input per output
** <code>SINGLE_ONLY</code> - set to a non-blank value if the step should run only when there is exactly one input per output
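The required and optional keys above can be illustrated with a small Python sketch that reads step sections and checks them. It is a minimal sketch, not GotCloud's own parser in pipeline.pl: it assumes the step sections can be read as INI-style <code>KEY = value</code> blocks with Python's <code>configparser</code> (global keys outside a section are not handled), and the step name <code>mergeBam</code>, its <code>OUTPUT</code>/<code>CMD</code> values, and the tool <code>hypotheticalMergeTool</code> are hypothetical.

```python
import configparser

# Keys every step section must define (see "Required keys for each step").
REQUIRED_KEYS = ("DEPEND", "OUTPUT", "CMD")
# Valid values for the optional SAMPLES setting.
SAMPLES_VALUES = ("MULTI_BAM", "SINGLE_BAM")


def load_steps(conf_text):
    """Parse [stepName] sections into {stepName: {KEY: value}}.

    Assumption: step sections are INI-like; this sketch does not handle
    global keys that appear outside any section.
    """
    parser = configparser.ConfigParser(interpolation=None)  # leave $() and ?() untouched
    parser.optionxform = str  # keep key case (DEPEND, not depend)
    parser.read_string(conf_text)
    return {name: dict(parser[name]) for name in parser.sections()}


def dependencies(step):
    """DEPEND holds one or more dependencies separated by spaces."""
    return step.get("DEPEND", "").split()


def check_step(name, step):
    """Return a list of problems found in one step definition."""
    problems = [f"{name}: missing {key}" for key in REQUIRED_KEYS if not step.get(key)]
    samples = step.get("SAMPLES")
    if samples and samples not in SAMPLES_VALUES:
        problems.append(f"{name}: invalid SAMPLES value {samples!r}")
    return problems


def runs_for_sample(step, num_bams):
    """Apply the SAMPLES filter to a sample that has num_bams BAM files."""
    samples = step.get("SAMPLES")
    if samples == "MULTI_BAM":
        return num_bams > 1
    if samples == "SINGLE_BAM":
        return num_bams == 1
    return True  # no SAMPLES setting: run for every sample


# Hypothetical step: merge each sample's BAMs, only for samples with several BAMs.
CONF = """
[mergeBam]
DEPEND = BAM
SAMPLES = MULTI_BAM
OUTPUT = merged/?(SAMPLE).bam
CMD = hypotheticalMergeTool -o merged/?(SAMPLE).bam ?(INPUT)
"""

steps = load_steps(CONF)
for name, step in steps.items():
    print(name, dependencies(step), check_step(name, step))  # mergeBam ['BAM'] []
```

Here <code>SAMPLES = MULTI_BAM</code> means <code>runs_for_sample</code> returns <code>True</code> for a sample with two BAMs and <code>False</code> for a sample with one.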
Joining multiple inputs for a single output:
* Can occur if there are multiple dependencies
* Can occur if a step runs at a more general iteration level than one of its dependencies
* <code>INPUT_JOIN</code> - value passed to Perl's <code>join</code> to combine multiple inputs for each output
** Looks across all dependencies
* <code>dependStepName_JOIN</code> - how to join the output of <code>dependStepName</code> into the command line of a step that depends on it, when <code>dependStepName</code> produces multiple outputs for a single iteration of this step
** Substitutes <code>?(${depend}/OUTPUT)</code> with the result of Perl's <code>join</code>, using the specified value to join the multiple outputs of that dependency

Log output filenames:
* <code>FILELIST</code> - writes/appends the iteration's output file name to the specified file list
** Typically used in a later "merge" step
** See below for temporary keys for step iteration that can be used in this filename
*** Temporary keys here can be more general than those in <code>OUTPUT</code>, but cannot be more specific
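The joining and file-list rules can be sketched in Python as follows. Perl's <code>join(sep, @list)</code> behaves like Python's <code>sep.join(list)</code>, so the sketch uses that; it is an illustration, not GotCloud's pipeline.pl code. Assumptions: the placeholder is written with a literal step name, as in <code>?(callChr/OUTPUT)</code>, and step names consist of word characters; a single space is used when no <code>_JOIN</code> key is set; and a <code>FILELIST</code> holds one file name per line. The step name <code>callChr</code> and the tool <code>hypotheticalMergeTool</code> are hypothetical.

```python
import re

# Matches ?(stepName/OUTPUT), the per-dependency output placeholder.
DEP_OUTPUT = re.compile(r"\?\((\w+)/OUTPUT\)")


def perl_join(sep, items):
    """Same result as Perl's join(sep, @items)."""
    return sep.join(items)


def substitute_joins(cmd, step, dep_outputs, default_sep=" "):
    """Fill ?(INPUT) and ?(dep/OUTPUT) for one iteration of a step.

    dep_outputs maps each dependency's step name to the list of its output
    files that feed this single output. default_sep is an assumption used
    when no *_JOIN key is set.
    """
    # ?(INPUT): inputs across all dependencies, joined with INPUT_JOIN.
    all_inputs = [f for outs in dep_outputs.values() for f in outs]
    cmd = cmd.replace("?(INPUT)", perl_join(step.get("INPUT_JOIN", default_sep), all_inputs))

    # ?(dep/OUTPUT): that dependency's outputs, joined with dep_JOIN.
    def replace_dep(match):
        dep = match.group(1)
        return perl_join(step.get(dep + "_JOIN", default_sep), dep_outputs[dep])

    return DEP_OUTPUT.sub(replace_dep, cmd)


def append_to_filelist(filelist_path, output_file):
    """FILELIST: append this iteration's output name (assumed one per line)."""
    with open(filelist_path, "a") as fh:
        fh.write(output_file + "\n")


# Hypothetical example: a per-chromosome step "callChr" feeding a merge step.
step = {"callChr_JOIN": " -i "}
cmd = "hypotheticalMergeTool -i ?(callChr/OUTPUT) -o all.vcf"
print(substitute_joins(cmd, step, {"callChr": ["chr20.vcf", "chr21.vcf"]}))
# hypotheticalMergeTool -i chr20.vcf -i chr21.vcf -o all.vcf
```

Choosing <code>" -i "</code> as the join value repeats the option flag in front of every joined file, which is the usual reason to make the separator configurable per dependency.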
====Iterating a command for each BAM/Sample/Chromosome/Region====
Temporary keys are used when iterating a command per BAM/sample/chromosome/region.
* Specify them using <code>?()</code> rather than <code>$()</code>
* Temporary keys can be used in:
** <code>OUTPUT</code>
** <code>CMD</code>
** <code>FILELIST</code>
* They are substituted with the current values on each iteration
* How a command is iterated is determined by the temporary keys in <code>OUTPUT</code>
* Temporary keys that determine iterations:
** <code>?(BAM)</code> - per BAM per sample
** <code>?(SAMPLE)</code> - per sample
** <code>?(CHR)</code> - per chromosome
** <code>?(START)</code> - per region of a chromosome (<code>?(CHR)</code> must also be included)
* Additional temporary keys:
** <code>?(END)</code> - end of the region; only used if <code>?(START)</code> is also specified
** <code>?(INPUT)</code>
** <code>?(${depend}/OUTPUT)</code>

'''Notes:'''
* Currently each step iteration will:
** be its own Makefile target/.OK file
** run independently on the cluster
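The following Python sketch shows how the temporary keys in <code>OUTPUT</code> could determine the set of iterations; each returned dictionary would correspond to one Makefile target/.OK file. It is an illustration under stated assumptions, not pipeline.pl's algorithm: the sample, BAM, chromosome, and region lists are passed in directly, keys that do not drive iteration (such as <code>?(INPUT)</code>) are left untouched by the substitution, and the sample and file names are hypothetical.

```python
import itertools
import re

# Matches a temporary key such as ?(SAMPLE) or ?(CHR).
TEMP_KEY = re.compile(r"\?\((\w+)\)")


def iterations(output_template, samples, chroms, regions):
    """Return one dict of temporary-key values per iteration of a step.

    The iteration level comes only from the temporary keys in OUTPUT.
    samples: {sample: [bam, ...]}; chroms: [chr, ...];
    regions: {chr: [(start, end), ...]}, used only with ?(START).
    """
    keys = set(TEMP_KEY.findall(output_template))
    if "START" in keys and "CHR" not in keys:
        raise ValueError("?(START) must be used together with ?(CHR)")

    if "BAM" in keys:  # per BAM per sample
        sample_level = [{"SAMPLE": s, "BAM": b} for s, bams in samples.items() for b in bams]
    elif "SAMPLE" in keys:  # per sample
        sample_level = [{"SAMPLE": s} for s in samples]
    else:
        sample_level = [{}]

    if "START" in keys:  # per region of a chromosome
        chrom_level = [{"CHR": c, "START": st, "END": en}
                       for c in chroms for st, en in regions[c]]
    elif "CHR" in keys:  # per chromosome
        chrom_level = [{"CHR": c} for c in chroms]
    else:
        chrom_level = [{}]

    # Every sample-level choice combined with every chromosome-level choice.
    return [{**a, **b} for a, b in itertools.product(sample_level, chrom_level)]


def substitute(template, values):
    """Replace ?(KEY) with its value; unknown keys (e.g. ?(INPUT)) are left as-is."""
    return TEMP_KEY.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)


samples = {"sampleA": ["a1.bam", "a2.bam"], "sampleB": ["b1.bam"]}
output = "out/?(SAMPLE).?(CHR).vcf"
for values in iterations(output, samples, ["20", "21"], {}):
    print(substitute(output, values))
```

With <code>?(SAMPLE)</code> and <code>?(CHR)</code> in <code>OUTPUT</code>, the example yields one iteration per sample and chromosome pair, four in total; adding <code>?(BAM)</code> instead would iterate over the three BAMs.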