
Genome Analysis Wiki β

GotCloud: Creating a New Pipeline

Revision as of 13:08, 30 June 2015 by Mktrost


Creating a New BAM Processing Pipeline

GotCloud lets you define new basic BAM processing pipelines through configuration settings.

To define a new processing pipeline, you use configuration sections to describe both the pipeline itself and each of its steps, so you first need to understand how configuration sections work.

GotCloud Configuration Sections

GotCloud configuration files can be broken into sections:

  • Section names are specified between square brackets ([]):
    [sectionName]
    • Any configuration settings specified after the section header belong to that section
    • A section can be specified multiple times in the file and the configuration settings are accumulated
    • To access a value for a key defined in another section, use $(otherSectionName/keyName)


  • If a section is not specified, the configuration settings belong to the global section
    • The global section does not need to be specified at the beginning of the file (it is the default section).
    • Additional global settings can be set later in the file, after other sections, by explicitly starting a global section:
      [global]


  • Sections can be derived from another section
    • All sections automatically derive from [global]
    • A derived section inherits all the configuration settings from its parent sections
      • Parent settings are overridden by redefining the same key in the derived section
    • A parent section is specified after a colon (:) on the section definition line:
      [childSectionName] : parentSectionName


  • Section-specific configuration settings are specified on the lines following the section header:
    [section1]
    KEY1 = VAL1
    KEY2 = VAL2
    
    [section2]
    KEY1 = VAL1_2
    KEY3 = VAL3
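To make these rules concrete, here is a minimal Python sketch of how such a file could be parsed and queried. It is not GotCloud's own parser: the functions `parse_sections` and `lookup` and the sample values are hypothetical, and the sketch assumes one `KEY = VALUE` setting per line, word-character section names, and that a lookup walks the parent chain before falling back to [global].

```python
import re

# "[name]" or "[name] : parent" header lines.
HEADER = re.compile(r"^\[(\w+)\]\s*(?::\s*(\w+))?\s*$")
# "$(section/key)" references to a key defined in another section.
XREF = re.compile(r"\$\((\w+)/(\w+)\)")


def parse_sections(text):
    """Split config text into {section: {key: value}} and {child: parent}."""
    values = {"global": {}}
    parents = {}
    current = "global"  # settings before any header belong to [global]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current, parent = header.group(1), header.group(2)
            values.setdefault(current, {})  # a repeated section accumulates settings
            if parent:
                parents[current] = parent
            continue
        key, _, val = line.partition("=")
        values[current][key.strip()] = val.strip()
    return values, parents


def lookup(values, parents, section, key):
    """Find key in section, then its parents, then [global]; expand $(sec/key)."""
    chain = []
    while section is not None and section not in chain:
        chain.append(section)
        section = parents.get(section)
    if "global" not in chain:
        chain.append("global")  # every section derives from [global]
    for name in chain:
        if key in values.get(name, {}):
            val = values[name][key]  # nearest definition wins (child overrides parent)
            return XREF.sub(lambda m: lookup(values, parents, m.group(1), m.group(2)), val)
    raise KeyError(f"{key} is not defined")


# Hypothetical configuration exercising each rule above.
CONFIG = """
OUT_DIR = /data/out
[section1]
KEY1 = VAL1
KEY2 = VAL2
[section2] : section1
KEY1 = VAL1_2
KEY3 = $(section1/KEY2)
[section1]
KEY4 = VAL4
"""

values, parents = parse_sections(CONFIG)
print(lookup(values, parents, "section2", "KEY1"))  # VAL1_2 (overrides section1)
print(lookup(values, parents, "section2", "KEY3"))  # VAL2 (via $(section1/KEY2))
print(lookup(values, parents, "section2", "KEY4"))  # VAL4 (accumulated, then inherited)
```

The second `[section1]` header adds KEY4 to the existing section rather than replacing it, which is why `section2` can still inherit it.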

Defining a New Pipeline

Creating a new pipeline has two parts:

  1. Overall Pipeline Definition
    • Defines the pipeline itself: its name, its steps, and any pipeline-wide settings
    • NOTE: Currently, configuration settings in the overall pipeline's section are not passed on to the steps' configurations by default
  2. Configure Each Step

Overall Pipeline Definition

  1. Define a new configuration section for your pipeline
    • Example:
    [pipelineName]
  2. Define the steps in this pipeline using the key STEPS under that section
    • Example:
    [pipelineName]
    STEPS = stepName1 stepName2 stepName3
    • Note: each step must have its own configuration section
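A pipeline definition only works if every name listed in STEPS has a matching section. The hypothetical `check_steps` helper below sketches that validation over an already-parsed configuration (a dict mapping section names to their settings). It illustrates the rule and is not a GotCloud function.

```python
def check_steps(sections, pipeline):
    """Return the STEPS of `pipeline`, raising if any step lacks its own section."""
    steps = sections[pipeline].get("STEPS", "").split()  # space-separated step names
    if not steps:
        raise ValueError(f"[{pipeline}] does not define STEPS")
    missing = [s for s in steps if s not in sections]
    if missing:
        raise ValueError(f"steps without a configuration section: {' '.join(missing)}")
    return steps


# Hypothetical parsed configuration: section name -> {key: value}.
sections = {
    "global": {},
    "pipelineName": {"STEPS": "stepName1 stepName2 stepName3"},
    "stepName1": {},
    "stepName2": {},
    "stepName3": {},
}
print(check_steps(sections, "pipelineName"))  # ['stepName1', 'stepName2', 'stepName3']
```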

Optional Overall Pipeline Settings:

  • BATCH_OPTS
  • BATCH_TYPE
  • IGNORE_SM_CHECK - turn off the default validation that the @RG SM tag matches the sample name in the BAM list.
  • IGNORE_REF_CHR_CHECK
  • OUT_DIR
  • BAM_LIST
  • REF
  • REF_FAI
  • MULTIPLE_TARGET_MAP
  • UNIFORM_TARGET_BED
  • OFFSET_OFF_TARET
  • CHRS
  • UNIT_CHUNK
  • NO_CRAM - do not allow CRAM files as input
  • MAKE_BASE_NAME_PIPE - base makefile name
  • MAKE_OPTS - options to pass to the make command that runs the jobs.
  • BAM_DEPEND - set to TRUE if you want the BAM file to be included as a make dependency


NOTES:

  • The BAM_LIST file can contain configuration values; the overall pipeline section is checked for those values.
  • By default, if a value is not defined in the pipeline section, it is looked up in the global section.
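The NOTE under Defining a New Pipeline and the fallback rule above combine into the lookup order sketched below: a pipeline key is found in the pipeline section or in [global], while a step key is found in the step section or in [global], but not in the pipeline section. The `resolve` helper, section names, and paths are hypothetical, and the sketch assumes the step section declares no parent section.

```python
def resolve(sections, section, key):
    """Return `key` from `section`, else from [global], else None (sketch)."""
    for name in (section, "global"):
        if key in sections.get(name, {}):
            return sections[name][key]
    return None


# Hypothetical parsed configuration: section name -> {key: value}.
sections = {
    "global": {"OUT_DIR": "/data/out", "REF": "/ref/global.fa"},
    "myPipeline": {"STEPS": "myStep", "REF": "/ref/pipeline.fa"},
    "myStep": {"DEPEND": "BAM"},
}

print(resolve(sections, "myPipeline", "REF"))      # /ref/pipeline.fa
print(resolve(sections, "myPipeline", "OUT_DIR"))  # /data/out (from [global])
print(resolve(sections, "myStep", "REF"))          # /ref/global.fa: the pipeline's REF is not passed on
```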


Configure Each Step

  1. Create a section for each step
    • Example:
      [stepName1]
  2. Set the required keys in each step's section:
    1. DEPEND - dependencies for this step
      Valid values (separate multiple dependencies with a space):
      • BAM
      • The name of a step that must complete before this step
    2. OUTPUT - name of the step's output file
    3. CMD - command that runs the step
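Taken together, DEPEND, OUTPUT, and CMD describe a dependency graph. Because the jobs are run by make (see MAKE_OPTS above), one way to picture a step is as a Makefile-style rule whose target is OUTPUT, whose prerequisites are the outputs of the steps it depends on, and whose recipe is CMD. The sketch below is hypothetical: `make_rules`, the tool names, and the file names are invented, GotCloud's generated makefile may look different, and the `bam_depend` flag is only an assumed reading of BAM_DEPEND (add the input BAM as a prerequisite of steps that DEPEND on BAM).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def make_rules(steps, bam, bam_depend=False):
    """Turn DEPEND/OUTPUT/CMD step settings into Makefile-style rules (sketch)."""
    graph = {}
    for name, cfg in steps.items():
        deps = cfg["DEPEND"].split()  # multiple dependencies are space-separated
        unknown = [d for d in deps if d != "BAM" and d not in steps]
        if unknown:
            raise ValueError(f"{name}: DEPEND names unknown steps {unknown}")
        graph[name] = {d for d in deps if d != "BAM"}
    rules = []
    # static_order() yields each step after all the steps it depends on.
    for name in TopologicalSorter(graph).static_order():
        cfg = steps[name]
        deps = cfg["DEPEND"].split()
        prereqs = [steps[d]["OUTPUT"] for d in deps if d != "BAM"]
        if bam_depend and "BAM" in deps:
            prereqs.insert(0, bam)  # assumed effect of BAM_DEPEND = TRUE
        target = cfg["OUTPUT"] + ":" + "".join(" " + p for p in prereqs)
        rules.append(target + "\n\t" + cfg["CMD"])
    return rules


# Hypothetical three-step pipeline; tool1/tool2/tool3 are placeholder commands.
steps = {
    "stepName1": {"DEPEND": "BAM", "OUTPUT": "out/step1.bam",
                  "CMD": "tool1 sample.bam out/step1.bam"},
    "stepName2": {"DEPEND": "stepName1", "OUTPUT": "out/step2.bam",
                  "CMD": "tool2 out/step1.bam out/step2.bam"},
    "stepName3": {"DEPEND": "BAM stepName2", "OUTPUT": "out/step3.txt",
                  "CMD": "tool3 sample.bam out/step2.bam out/step3.txt"},
}

for rule in make_rules(steps, "sample.bam", bam_depend=True):
    print(rule)
```

A circular DEPEND chain makes `static_order()` raise `graphlib.CycleError`, so this sketch rejects cycles before any rule is produced.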