Changes

From Genome Analysis Wiki
This sub-pipeline takes in a single, recalibrated BAM file, creates an index file for it, and performs quality control (running qplot and verifyBamID). It differs from *bamQC* in that it does not require that the user already have a .bai file for the recalibrated BAM file.  
 
== recab ==
*What it does:
# merge BAMs for samples that have multiple BAMs
# dedup and recalibrate
# index the recalibrated BAM
====Inputs====
* BAM files (listed in a [[#BAM_LIST File|BAM_LIST]] file)
* Reference files
* (Optional) configuration file to override default options

=====BAM_LIST File=====
* Each line of the BAM list file represents a single individual

Columns:
# sample id
# comma separated population labels (optional column)
# BAM File 1 (full paths to BAM files are preferred)
# BAM File 2 (if more than 1 BAM per sample)
:...

: # BAM File N (if more than 1 BAM per sample)
[SAMPLE_ID]    [COMMA SEPARATED POPULATION LABELS] [BAM_FILE1] [BAM_FILE2] ...
or
[SAMPLE_ID] [BAM_FILE1] [BAM_FILE2] ...

* Notes:
** tab delimited
** multiple BAMs per individual may be provided, but should all be on the same line of the list file
** population label is optional - it will default to <code>ALL</code>
*** only used by Thunder (part of ldrefine pipeline)
*** if all samples are from the same population, population label can be skipped or you can just specify <code>ALL</code> for the population label for each sample.
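
The column rules above can be sketched as a small parser. This is only an illustration, not GotCloud's own code: the names <code>BamListEntry</code> and <code>parse_bam_list_line</code> are hypothetical, and the way the optional population column is detected (a second column that does not end in <code>.bam</code>) is an assumption made for this example.

```python
from typing import List, NamedTuple


class BamListEntry(NamedTuple):
    """One line of a BAM list file (hypothetical helper type)."""
    sample_id: str
    populations: List[str]
    bams: List[str]


def parse_bam_list_line(line: str) -> BamListEntry:
    """Split one tab-delimited BAM list line into sample id, labels, and BAMs."""
    # Drop empty fields so repeated or trailing tabs do not create blank columns.
    fields = [f for f in line.rstrip("\n").split("\t") if f]
    if len(fields) < 2:
        raise ValueError(f"expected a sample id and at least one BAM: {line!r}")
    sample_id, rest = fields[0], fields[1:]
    # ASSUMPTION: a second column that does not look like a BAM path holds
    # the comma separated population labels.
    if not rest[0].lower().endswith(".bam"):
        populations, bams = rest[0].split(","), rest[1:]
    else:
        # Population label omitted: fall back to the documented default.
        populations, bams = ["ALL"], rest
    if not bams:
        raise ValueError(f"no BAM files listed for sample {sample_id!r}")
    return BamListEntry(sample_id, populations, bams)


if __name__ == "__main__":
    print(parse_bam_list_line("Sample1\tEUR,CEU\t/data/a.bam\t/data/b.bam"))
    print(parse_bam_list_line("Sample2\t/data/c.bam"))
```

Both accepted layouts go through the same function, so a list file that mixes lines with and without population labels is handled line by line.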

====Outputs====
Upon successful completion of the *recab* sub-pipeline, you should see the following files/subdirectories under the user-specified output directory:
*'''recab/mergedBams/'''
** '''''*/SAMPLE.merged.bam'' - a merged BAM file'''
** ''*/SAMPLE.merged.bam.log'' - merge log
** ''*/SAMPLE.merged.bam.OK'' - temp file indicating the merge step completed successfully

* '''recab/'''
** '''''*/SAMPLE.recal.bam'' - a merged, recalibrated, and deduped BAM file'''
** '''''*/SAMPLE.recal.bam.bai'' - the index for the merged, recalibrated, and deduped BAM file'''
** ''*/SAMPLE.recal.bam.metrics'' - dedup & recalibration log
** ''*/SAMPLE.recal.bam.qemp'' - recalibration tables
** ''*/SAMPLE.recal.bam.done'' - temp file indicating the recalibration step completed successfully
** ''*/SAMPLE.recal.bam.bai.done'' - temp file indicating the indexing step completed successfully
You should see .done and .OK files for each SAMPLE in the index file. If you do not see the .done and .OK files, then your *recab* sub-pipeline failed.

'''On success, the recab/ folder contains the final BAMs and their .bai index files.'''
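
The check for .done and .OK files described above can be automated. The following is a minimal sketch, assuming the marker files sit somewhere below <code>recab/mergedBams/</code> and <code>recab/</code> as listed; the function name <code>missing_recab_markers</code> is hypothetical and is not part of GotCloud.

```python
from pathlib import Path
from typing import Dict, List


def missing_recab_markers(out_dir: str, samples: List[str]) -> Dict[str, List[str]]:
    """Return, per sample, the marker files that were not found under out_dir."""
    recab = Path(out_dir) / "recab"
    missing: Dict[str, List[str]] = {}
    for sample in samples:
        # Marker file name -> directory below which it is expected.
        expected = {
            f"{sample}.merged.bam.OK": recab / "mergedBams",
            f"{sample}.recal.bam.done": recab,
            f"{sample}.recal.bam.bai.done": recab,
        }
        # ASSUMPTION: the "*/" in the documented paths may be any
        # subdirectory, so the search is recursive.
        absent = [
            name
            for name, root in expected.items()
            if not root.is_dir() or not any(root.rglob(name))
        ]
        if absent:
            missing[sample] = absent
    return missing


if __name__ == "__main__":
    # Hypothetical output directory and sample ids, for illustration only.
    for sample, names in missing_recab_markers("/path/freeze5/output", ["Sample1"]).items():
        print(f"{sample}: missing {', '.join(names)}")
```

An empty result means every listed sample has all three markers; any entry in the result points at the step that did not complete for that sample.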

===Command-Line and Configuration Options===

*Required Options
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
! Command-line Flag !! Configuration Key !! Value Description !! Default Value
|-
| --list/--bam_list/--bamlist ''file'' || BAM_LIST || path to the [[#BAM_LIST File|BAM_LIST File]] || $(OUT_DIR)/bam.list
|-
| --numjobs ''#'' || || number of jobs to run in parallel || 0 (generate Makefile of steps, but do not run)
|}

*Common Options

{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
! colspan="4" | Common Options
|-
! Command-line Flag !! Configuration Key !! Value Description !! Default Value
|-
| --outdir ''path'' || OUT_DIR || output directory ||
|-
| --conf ''file'' || || configuration file to use ||
|-
|  || REF_DIR || where the reference/resource files are stored || gotcloud.ref subdirectory within the base GotCloud directory
|-
| || REF || [[GotCloud: Genetic Reference and Resource Files#Reference fasta Files|Reference fasta Files]] || $(REF_DIR)/human.g1k.v37.fa
|-
| || DBSNP_VCF || [[GotCloud: Genetic Reference and Resource Files#DBSNP VCF Files|DBSNP VCF Files]] || $(REF_DIR)/dbsnp_135.b37.vcf.gz
|}

==== Example Configuration File ====
Example configuration file where the reference files are stored in /path/reference and the BAM list file is /path/freeze5.bam.list:
BAM_LIST = /path/freeze5.bam.list
OUT_DIR = /path/freeze5/output
REF_DIR = /path/reference/
REF = $(REF_DIR)/hs37d5.fa
DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz
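
To see what the <code>$(REF_DIR)</code> references in such a file resolve to, the substitution can be sketched in a few lines. This is not GotCloud's configuration parser: the function <code>parse_config</code> is hypothetical, and treating lines that start with <code>#</code> as comments is an assumption made for this example.

```python
import re
from typing import Dict


def parse_config(text: str) -> Dict[str, str]:
    """Parse KEY = VALUE lines and expand $(KEY) references between them."""
    raw: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # ASSUMPTION: blank lines and lines starting with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            raw[key.strip()] = value.strip()

    def expand(value: str, depth: int = 0) -> str:
        # Guard against keys that refer to each other in a cycle.
        if depth > 10:
            raise ValueError("too many nested $(KEY) references")

        def substitute(match: "re.Match[str]") -> str:
            name = match.group(1)
            # Keys not defined in this file are left untouched.
            if name not in raw:
                return match.group(0)
            return expand(raw[name], depth + 1)

        return re.sub(r"\$\((\w+)\)", substitute, value)

    return {key: expand(value) for key, value in raw.items()}


if __name__ == "__main__":
    example = """
    REF_DIR = /path/reference/
    REF = $(REF_DIR)/hs37d5.fa
    """
    for key, value in parse_config(example).items():
        print(f"{key} = {value}")
```

Note that a trailing slash on <code>REF_DIR</code> combined with <code>$(REF_DIR)/...</code> produces a doubled slash in the expanded path, which is normally harmless on POSIX file systems.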

==== Example Command Line ====
gotcloud pipe --name recab --numjobs <N>

== recabQC ==  
 