Changes

5,797 bytes added, 14:55, 31 August 2015

→Command Line Parameters: fix formatting about BAM_LIST

Line 47: Line 47:

# [[#Overall Pipeline Definition|Overall Pipeline Definition]]

#* Basics for the overall pipeline


#* '''NOTE: Currently, configurations set in the overall pipeline's section are not passed on to the steps' configurations by default'''

# [[#Configure Each Step|Configure Each Step]]

Line 68: Line 68:

* BATCH_TYPE

* IGNORE_SM_CHECK - turn off the default validation that the @RG SM tag matches the bam list sample name.


* IGNORE_REF_CHR_CHECK - turn off the default validation that checks that all of the BAM's chromosomes are in the reference file - eventually we may update to just validate those in CHRS.

* OUT_DIR

* BAM_LIST

Line 76: Line 76:

* UNIFORM_TARGET_BED

* OFFSET_OFF_TARGET


* CHRS - defines which chromosomes to run.

* UNIT_CHUNK

* NO_CRAM - do not allow CRAM files as input

* MAKE_BASE_NAME_PIPE - base makefile name


* MAKE_OPTS - options to pass to the make command that runs the jobs.

* BAM_DEPEND - set to TRUE if you want the BAM file to be included as a make dependency

Line 89: Line 89:

* By default, if a value is not defined in the section, it is looked up in the global settings.
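
The section-then-global lookup can be illustrated with a short Python sketch. It is not GotCloud's Perl code: it assumes the configuration has already been parsed into a dictionary of sections, with the global settings stored under a hypothetical "global" entry, and <code>get_conf</code> is an illustrative name only.

```python
from __future__ import annotations


def get_conf(conf: dict[str, dict[str, str]], section: str, key: str,
             default: str | None = None) -> str | None:
    """Return KEY from SECTION, falling back to the global settings.

    Hypothetical layout: `conf` maps section names to key/value dicts and the
    global (section-less) settings are stored under the name "global".
    """
    section_values = conf.get(section, {})
    if key in section_values:
        return section_values[key]
    # Not defined in the section: check the global settings instead.
    return conf.get("global", {}).get(key, default)


conf = {
    "global": {"OUT_DIR": "/data/out", "CHRS": "20 21"},
    "myPipeline": {"CHRS": "22"},
}

print(get_conf(conf, "myPipeline", "CHRS"))     # prints 22 (defined in the section)
print(get_conf(conf, "myPipeline", "OUT_DIR"))  # prints /data/out (from global)
```

Keeping the fallback in one helper means every section reads settings the same way; note that, per the NOTE above, this sketch models only section-to-global fallback and not any inheritance from the pipeline section into its steps.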

==== Configure Each Step ====

'''Create a section for each step'''
* Example: <code>[stepName1]</code>

====Required keys for each step:====
# <code>DEPEND</code> - dependencies for this step
#: Valid values (separate multiple dependencies with a space):
#:*<code>BAM</code>
#:*Name of a step that must complete prior to this step
#:*<code>PER_SAMPLE_BAM</code> (to be confirmed: can only be <code>BAM</code> or <code>PER_SAMPLE_BAM</code>)
# <code>OUTPUT</code> - name of the output file
#* See below for temporary keys for step iteration
# <code>CMD</code> - command for running the step
#* See below for temporary keys for step iteration
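
A minimal Python sketch of checking that a step section defines the three required keys and of splitting a space-separated <code>DEPEND</code> value into its dependencies. The function name <code>check_step</code>, the dictionary representation of a section, and the example values (<code>myTool</code>, <code>myMerge</code>, <code>stepName2</code>) are hypothetical; pipeline.pl's own validation may differ.

```python
from __future__ import annotations

REQUIRED_STEP_KEYS = ("DEPEND", "OUTPUT", "CMD")


def check_step(name: str, step: dict[str, str]) -> list[str]:
    """Validate a step section and return its list of dependencies.

    Raises ValueError if any required key is missing or blank.
    """
    missing = [key for key in REQUIRED_STEP_KEYS if not step.get(key, "").strip()]
    if missing:
        raise ValueError(f"[{name}] is missing required key(s): {', '.join(missing)}")
    # Multiple dependencies are separated by spaces, e.g. "BAM stepName1".
    return step["DEPEND"].split()


# Hypothetical steps, roughly equivalent to configuration sections such as:
#   [stepName1]
#   DEPEND = BAM
#   OUTPUT = $(OUT_DIR)/?(SAMPLE).txt
#   CMD = myTool ?(INPUT)
step1 = {"DEPEND": "BAM", "OUTPUT": "$(OUT_DIR)/?(SAMPLE).txt", "CMD": "myTool ?(INPUT)"}
step2 = {"DEPEND": "BAM stepName1", "OUTPUT": "$(OUT_DIR)/?(SAMPLE).merged.txt",
         "CMD": "myMerge ?(INPUT)"}

print(check_step("stepName1", step1))  # prints ['BAM']
print(check_step("stepName2", step2))  # prints ['BAM', 'stepName1']
```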

====Optional Step Settings:====

General Settings:
* <code>LOCAL</code> - run the step locally rather than on the cluster
* <code>NEED_BAI</code> - set if a step requires a BAI file
** Per-chromosome steps always require a BAI file
** Tells GotCloud to fail if a BAI file can't be found
* <code>BAM_DEPEND</code> - add the BAM file as a Makefile dependency for this step
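
The "fail if a BAI can't be found" behaviour of <code>NEED_BAI</code> can be sketched in Python as below. This page does not say which index file names GotCloud checks; the sketch assumes the two common conventions, <code>&lt;name&gt;.bam.bai</code> and <code>&lt;name&gt;.bai</code>, and <code>find_bai</code> is a hypothetical name.

```python
from pathlib import Path
import tempfile


def find_bai(bam_path: str) -> Path:
    """Return the BAI index for BAM_PATH, or fail if none exists (NEED_BAI).

    Assumes the two common naming conventions, <name>.bam.bai and <name>.bai;
    the names GotCloud actually checks may differ.
    """
    bam = Path(bam_path)
    for candidate in (bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai")):
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"NEED_BAI is set but no BAI was found for {bam}")


# Usage: create a throwaway BAM/BAI pair in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    bam = Path(tmp) / "sample1.bam"
    bam.touch()
    (Path(tmp) / "sample1.bai").touch()
    found = find_bai(str(bam)).name

print(found)  # prints sample1.bai
```

Failing early when the index is missing surfaces the problem at setup time rather than partway through a per-chromosome job.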

Settings to limit which samples this step runs on:
* <code>SAMPLES</code> - restrict the step to samples with a single BAM or to samples with multiple BAMs (e.g. for merging)
*: Possible values:
*:* <code>MULTI_BAM</code> - run the step only for samples that have multiple BAMs
*:* <code>SINGLE_BAM</code> - run the step only for samples that have one BAM
* Deprecated settings (still in pipeline.pl; they may or may not work):
** <code>MULTI_ONLY</code> - set to non-blank if the step should run only when there is more than one input per output.
** <code>SINGLE_ONLY</code> - set to non-blank if the step should run only when there is exactly one input per output.
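
The effect of <code>SAMPLES</code> can be sketched in Python as a filter over the samples read from <code>BAM_LIST</code>. The sketch assumes each BAM list line holds a sample name followed by that sample's BAM files (whitespace separated); the function names and file paths are hypothetical.

```python
from __future__ import annotations


def parse_bam_list(text: str) -> dict[str, list[str]]:
    """Parse BAM list lines of the assumed form: <sample> <bam1> [<bam2> ...]."""
    bams_by_sample: dict[str, list[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            bams_by_sample.setdefault(fields[0], []).extend(fields[1:])
    return bams_by_sample


def samples_for_step(bams_by_sample: dict[str, list[str]], samples_setting: str = "") -> list[str]:
    """Return the samples a step runs on, given its SAMPLES setting."""
    if samples_setting == "MULTI_BAM":
        return [s for s, bams in bams_by_sample.items() if len(bams) > 1]
    if samples_setting == "SINGLE_BAM":
        return [s for s, bams in bams_by_sample.items() if len(bams) == 1]
    if samples_setting:
        raise ValueError(f"unknown SAMPLES value: {samples_setting}")
    return list(bams_by_sample)  # SAMPLES not set: every sample


bam_list = "NA1\t/bams/NA1.bam\nNA2\t/bams/NA2.lane1.bam\t/bams/NA2.lane2.bam\n"
bams = parse_bam_list(bam_list)
print(samples_for_step(bams, "MULTI_BAM"))   # prints ['NA2']
print(samples_for_step(bams, "SINGLE_BAM"))  # prints ['NA1']
```

A typical use is a merge step with <code>SAMPLES = MULTI_BAM</code>, so that samples with a single BAM skip the merge entirely.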

Joining multiple inputs for a single output:
* Can occur if there are multiple dependencies
* Can occur if a step runs at a more generic iteration level than a dependency
* <code>INPUT_JOIN</code> - value passed to the perl "join" command for joining multiple inputs for each output.
** Looks across all dependencies
* <code>dependStepName_JOIN</code> - value used to join the outputs of step "dependStepName" into the command line of a step that depends on it, when that step receives multiple "dependStepName" outputs for one of its iterations
** Substitutes <code>?(${depend}/OUTPUT)</code> with the perl "join" of that dependency's outputs, using the specified value as the separator
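
Perl's <code>join(SEP, LIST)</code> behaves like Python's <code>SEP.join(list)</code>, so both join settings can be sketched in Python. Assumptions: a dependency named <code>stepA</code> is written literally as <code>?(stepA/OUTPUT)</code> in the command, <code>INPUT_JOIN</code> applies to <code>?(INPUT)</code>, and a single space is the default separator; <code>mergeTool</code>, <code>catTool</code> and the function names are hypothetical.

```python
from __future__ import annotations


def join_depend_outputs(cmd: str, depend: str, outputs: list[str], join_value: str = " ") -> str:
    """Substitute ?(<depend>/OUTPUT) with that dependency's outputs joined by JOIN_VALUE.

    Mirrors Perl's join(JOIN_VALUE, @outputs); the " " default is an assumption.
    """
    return cmd.replace(f"?({depend}/OUTPUT)", join_value.join(outputs))


def join_all_inputs(cmd: str, outputs_by_depend: dict[str, list[str]], input_join: str = " ") -> str:
    """Substitute ?(INPUT) with the outputs of every dependency joined by INPUT_JOIN."""
    all_inputs = [out for outs in outputs_by_depend.values() for out in outs]
    return cmd.replace("?(INPUT)", input_join.join(all_inputs))


# A merge step that expects each input as I=<file>; with stepA_JOIN set to " I="
# the joined list slots directly into the command line.
merge_cmd = "mergeTool O=merged.bam I=?(stepA/OUTPUT)"
print(join_depend_outputs(merge_cmd, "stepA", ["s1.bam", "s2.bam"], " I="))
# prints: mergeTool O=merged.bam I=s1.bam I=s2.bam

print(join_all_inputs("catTool ?(INPUT)", {"stepA": ["a1", "a2"], "stepB": ["b1"]}, ","))
# prints: catTool a1,a2,b1
```

Making the separator configurable per dependency lets a step adapt to tools that want space-separated files, comma-separated lists, or a repeated flag per input.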

Log output filenames:
* <code>FILELIST</code> - writes/appends the iteration's output file name to the specified file list.
** Typically used by a later "merge" step
** See below for temporary keys for step iteration that can be used in this filename
*** Temporary keys can be more general than those in <code>OUTPUT</code>, but cannot be more specific.

====Iterating a command for each Bam/Sample/Chromosome/Region====

Temporary keys are used when iterating a command per BAM/sample/chromosome/region.
* Specify them using <code>?()</code> rather than <code>$()</code>
* Temporary keys can be used in:
** <code>OUTPUT</code>
** <code>CMD</code>
** <code>FILELIST</code>
* They are substituted on each iteration
* How a command is iterated is determined by the temporary keys in <code>OUTPUT</code>
* Temporary keys for determining iterations:
** <code>?(BAM)</code> - per BAM per sample
** <code>?(SAMPLE)</code> - per sample
** <code>?(CHR)</code> - per chromosome
** <code>?(START)</code> - per region of a chromosome (<code>?(CHR)</code> must also be included)
* Additional temporary keys:
** <code>?(END)</code> - end of the region; only used if <code>?(START)</code> is also specified.
** <code>?(INPUT)</code>
** <code>?(${depend}/OUTPUT)</code>
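
How the keys in <code>OUTPUT</code> drive iteration, and how a more general <code>FILELIST</code> groups the outputs, can be sketched in Python. This is an illustration, not pipeline.pl's algorithm: it assumes the regions for <code>?(START)</code>/<code>?(END)</code> are supplied explicitly per chromosome and that <code>?(BAM)</code> is replaced by the BAM entry as listed; all function names are hypothetical.

```python
from __future__ import annotations

import re
from itertools import product

# Temporary keys that control iteration; ?(END) is only meaningful with ?(START).
TEMP_KEY = re.compile(r"\?\((BAM|SAMPLE|CHR|START|END)\)")


def substitute(template: str, values: dict) -> str:
    """Replace each ?(KEY) that has a value; leave any other ?(...) untouched."""
    return TEMP_KEY.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)


def expand_iterations(output_tmpl, bams_by_sample, chrs, regions=None):
    """Return (values, output) pairs, one per iteration implied by OUTPUT's keys."""
    keys = set(TEMP_KEY.findall(output_tmpl))
    if "START" in keys and ("CHR" not in keys or regions is None):
        raise ValueError("?(START) needs ?(CHR) in OUTPUT and a list of regions")

    # Sample dimension: per BAM per sample, per sample, or a single iteration.
    if "BAM" in keys:
        units = [{"SAMPLE": s, "BAM": b} for s, bams in bams_by_sample.items() for b in bams]
    elif "SAMPLE" in keys:
        units = [{"SAMPLE": s} for s in bams_by_sample]
    else:
        units = [{}]

    # Location dimension: per region of a chromosome, per chromosome, or none.
    if "START" in keys:
        locs = [{"CHR": c, "START": st, "END": en} for c in chrs for st, en in regions[c]]
    elif "CHR" in keys:
        locs = [{"CHR": c} for c in chrs]
    else:
        locs = [{}]

    iterations = []
    for unit, loc in product(units, locs):
        values = {**unit, **loc}
        iterations.append((values, substitute(output_tmpl, values)))
    return iterations


def build_filelists(iterations, filelist_tmpl):
    """Append each iteration's output to the file list named by FILELIST."""
    filelist_keys = set(TEMP_KEY.findall(filelist_tmpl))
    lists: dict[str, list[str]] = {}
    for values, output in iterations:
        # FILELIST may be more general than OUTPUT, never more specific.
        if not filelist_keys.issubset(values):
            raise ValueError("FILELIST uses a key that OUTPUT does not iterate over")
        lists.setdefault(substitute(filelist_tmpl, values), []).append(output)
    return lists


bams = {"NA1": ["NA1.bam"], "NA2": ["NA2.a.bam", "NA2.b.bam"]}
iters = expand_iterations("out/?(SAMPLE).chr?(CHR).vcf", bams, ["20", "21"])
print([out for _, out in iters])
# ['out/NA1.chr20.vcf', 'out/NA1.chr21.vcf', 'out/NA2.chr20.vcf', 'out/NA2.chr21.vcf']
print(build_filelists(iters, "out/chr?(CHR).list"))
# {'out/chr20.list': ['out/NA1.chr20.vcf', 'out/NA2.chr20.vcf'],
#  'out/chr21.list': ['out/NA1.chr21.vcf', 'out/NA2.chr21.vcf']}
```

Here <code>OUTPUT</code> iterates per sample per chromosome while <code>FILELIST</code> uses only <code>?(CHR)</code>, so each chromosome's list collects every sample's output, the shape a later per-chromosome merge step would consume.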

'''Notes:'''
* Currently, each step iteration will:
** be its own Makefile target/.OK file
** run independently on the cluster
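
"One Makefile target/.OK file per iteration" can be illustrated with a small Python generator of Makefile rules. The exact rule text GotCloud writes is not shown on this page: this sketch assumes each target is the iteration's output name plus <code>.OK</code>, touched only when the command succeeds, and it omits the wrapper that would submit each recipe to the cluster. <code>myCaller</code> and the function names are hypothetical.

```python
from __future__ import annotations


def make_rule(output: str, dep_ok_files: list[str], cmd: str) -> str:
    """Return one Makefile rule for one step iteration.

    The target is OUTPUT.OK, which is touched only if CMD succeeds, so make
    reruns the iteration whenever the .OK file is missing.
    """
    ok_file = f"{output}.OK"
    header = ok_file + ":" + "".join(f" {dep}" for dep in dep_ok_files)
    # Recipe lines must begin with a tab; a literal "$" in CMD would need "$$".
    return f"{header}\n\t{cmd} && touch {ok_file}\n"


def makefile_for(iterations: list[tuple[str, list[str], str]]) -> str:
    """Build a Makefile whose 'all' target depends on every iteration's .OK file."""
    all_rule = "all:" + "".join(f" {out}.OK" for out, _, _ in iterations) + "\n"
    return "\n".join([all_rule] + [make_rule(*it) for it in iterations])


iterations = [
    ("out/NA1.chr20.vcf", [], "myCaller NA1.bam 20 > out/NA1.chr20.vcf"),
    ("out/NA1.chr21.vcf", [], "myCaller NA1.bam 21 > out/NA1.chr21.vcf"),
]
print(makefile_for(iterations))
```

Because every iteration has its own target, <code>make -j</code> (see <code>MAKE_OPTS</code>) can run iterations independently and a rerun skips those whose <code>.OK</code> file already exists.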


== Command Line Parameters ==

Required Parameters:
* <code>--name</code> <pipelineName> - name of the pipeline to run
* <code>--conf</code> <configuration file> - configuration file to use

NOTE: Currently, command-line overrides apply only to the global settings, not to the pipeline/step settings.
* This needs to be fixed so that they can also override the pipeline settings.

Optional Parameters:
* <code>--ignoreSmCheck</code> - overrides <code>IGNORE_SM_CHECK</code>
* <code>--ignoreRefChrCheck</code> - overrides <code>IGNORE_REF_CHR_CHECK</code>
* <code>--verbose</code> <number> - verbosity value passed to the <code>loadConf</code> method

Optional Parameters (as in SnpCall):
* <code>--numjobs|numjobs</code> <number> - number of jobs to run in parallel
* <code>--maxlocaljobs</code> <number> - maximum number of jobs to run at once when the batch type is local (default 10); this limit is not checked for steps run with <code>LOCAL</code>
* <code>--region</code> <region to process> - as in snpcall, specifies a single region to process
* <code>--bam_list|list|bamlist|bam_index|bamindex</code> <bam list file> - overrides <code>BAM_LIST</code>, the list of sample BAM files to process
* <code>--out_dir|outdir</code> <output directory> - overrides <code>OUT_DIR</code>
* <code>--batchtype</code> <type> - overrides <code>BATCH_TYPE</code>
* <code>--batchopts</code> <options> - overrides <code>BATCH_OPTS</code>
* <code>--chrs|chroms</code> <comma separated chromosomes> - overrides <code>CHRS</code> (<code>CHRS</code> is space separated; commas are converted to spaces)
* <code>--ref_dir|refdir</code> <reference directory> - overrides <code>REF_DIR</code>
* <code>--ref_prefix|refprefix</code> <prefix> - overrides <code>REF_PREFIX</code>
* <code>--bam_prefix|bamprefix</code> <prefix> - overrides <code>BAM_PREFIX</code>
* <code>--base_prefix|baseprefix</code> <prefix> - overrides <code>BASE_PREFIX</code>
* <code>--gotcloudroot|gcroot</code> <path to gotcloud> - by default the GotCloud root is determined from the path to the pipeline script; this option overrides it.
* <code>--help</code> - print usage
* <code>--test</code> <test directory> - run the test code (currently only for indel)
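
The pipeline script is Perl (the page refers to pipeline.pl), and the <code>a|b</code> spellings above look like Getopt::Long alias declarations. The Python sketch below is only an analogue using <code>argparse</code>, where extra option strings act as aliases; it declares a subset of the options and maps them onto the global keys they override. The function names and the "1" value used to enable the checks are assumptions.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of the options above; alternate spellings become aliases."""
    parser = argparse.ArgumentParser(description="Sketch of pipeline options")
    parser.add_argument("--name", required=True, help="name of the pipeline to run")
    parser.add_argument("--conf", required=True, help="configuration file to use")
    parser.add_argument("--ignoreSmCheck", action="store_true")
    parser.add_argument("--ignoreRefChrCheck", action="store_true")
    parser.add_argument("--numjobs", type=int)
    parser.add_argument("--maxlocaljobs", type=int, default=10)
    parser.add_argument("--bam_list", "--list", "--bamlist", "--bam_index", "--bamindex",
                        dest="bam_list")
    parser.add_argument("--out_dir", "--outdir", dest="out_dir")
    parser.add_argument("--chrs", "--chroms", dest="chrs")
    return parser


def global_overrides(args: argparse.Namespace) -> dict[str, str]:
    """Map parsed options onto the configuration keys they override.

    Per the NOTE above, these overrides only apply to the global settings.
    """
    overrides = {}
    if args.ignoreSmCheck:
        overrides["IGNORE_SM_CHECK"] = "1"  # assumed: any non-blank value enables it
    if args.ignoreRefChrCheck:
        overrides["IGNORE_REF_CHR_CHECK"] = "1"
    if args.bam_list is not None:
        overrides["BAM_LIST"] = args.bam_list
    if args.out_dir is not None:
        overrides["OUT_DIR"] = args.out_dir
    if args.chrs is not None:
        overrides["CHRS"] = args.chrs.replace(",", " ")  # CHRS is space separated
    return overrides


args = build_parser().parse_args(
    ["--name", "myPipeline", "--conf", "my.conf", "--chroms", "20,21,X", "--outdir", "out"]
)
print(global_overrides(args))  # prints {'OUT_DIR': 'out', 'CHRS': '20 21 X'}
```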

Unused command line options:
* In the code, but not actually used:
** <code>--keeptmp</code> - overrides <code>KEEP_TMP</code>
** <code>--keeplog</code> - overrides <code>KEEP_LOG</code>

== Example Pipelines Created ==

Look for the pipeline sections and their <code>STEPS</code> in the default configuration files:
* master branch: https://github.com/statgen/gotcloud/blob/master/bin/gotcloudDefaults.conf
* alignPrep branch: https://github.com/statgen/gotcloud/blob/alignPrep/bin/gotcloudDefaults.conf

GotCloud: Creating a New Pipeline (revision by Pjvh)
