This sub-pipeline takes in a single, recalibrated BAM file, creates an index file for it, and performs quality control (running qplot and verifyBamID). It differs from *bamQC* in that it does not require that the user already have a .bai file for the recalibrated BAM file.
== recab ==
* What it does:
# merge BAMs for samples that have multiple BAMs
# dedup and recalibrate
# index the recalibrated BAM

====Inputs====
* BAM files (listed in a [[#BAM_LIST File|BAM_LIST]] file)
* Reference files
* (Optional) configuration file to override default options

=====BAM_LIST File=====
* Each line of the BAM list file represents a single individual.

Columns:
# sample ID
# comma-separated population labels (optional column)
# BAM file 1 (full paths to BAM files are preferred)
# BAM files 2 through N (only if the sample has more than one BAM)

Each line therefore takes one of two forms:
 [SAMPLE_ID] [COMMA SEPARATED POPULATION LABELS] [BAM_FILE1] [BAM_FILE2] ...
or
 [SAMPLE_ID] [BAM_FILE1] [BAM_FILE2] ...
* Notes:
** The file is tab-delimited.
** Multiple BAMs per individual may be provided, but they should all be on the same line of the list file.
** The population label is optional; it defaults to <code>ALL</code>.
*** It is used only by Thunder (part of the ldrefine pipeline).
*** If all samples are from the same population, the population label can be omitted, or you can specify <code>ALL</code> as the population label for each sample.
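A BAM list can be sanity-checked against the format above before the pipeline is launched. The sketch below is a hypothetical helper, not part of GotCloud. GotCloud's own rule for telling the optional population column apart from a BAM path is not documented here, so the sketch assumes that a second column not ending in <code>.bam</code> is the population label column; the <code>ALL</code> default comes from the notes above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BamListEntry:
    sample_id: str
    populations: List[str]
    bams: List[str]


def parse_bam_list_line(line: str) -> BamListEntry:
    """Parse one tab-delimited BAM_LIST line (hypothetical helper)."""
    cols = [c.strip() for c in line.split("\t") if c.strip()]
    if len(cols) < 2:
        raise ValueError(f"need a sample id and at least one BAM: {line!r}")
    sample_id, rest = cols[0], cols[1:]
    # Assumption: if the second column does not end in ".bam", it is the
    # optional comma-separated population label column.
    if not rest[0].endswith(".bam"):
        populations = rest[0].split(",")
        rest = rest[1:]
    else:
        populations = ["ALL"]  # documented default population label
    if not rest:
        raise ValueError(f"no BAM files listed for sample {sample_id!r}")
    return BamListEntry(sample_id, populations, rest)


def parse_bam_list(text: str) -> List[BamListEntry]:
    """Parse a whole BAM_LIST file, skipping blank lines."""
    return [parse_bam_list_line(l) for l in text.splitlines() if l.strip()]
```

For example, <code>parse_bam_list(open("/path/freeze5.bam.list").read())</code> returns one entry per individual, with all of that individual's BAMs collected from the single line, which mirrors the "one line per individual" rule.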
====Outputs====
Upon successful completion of the *recab* sub-pipeline, you should see the following files and subdirectories under the user-specified output directory:
* '''recab/mergedBams/'''
** '''''*/SAMPLE.merged.bam'' - a merged BAM file'''
** ''*/SAMPLE.merged.bam.log'' - merge log
** ''*/SAMPLE.merged.bam.OK'' - temp file indicating the merge step completed successfully

* '''recab/'''
** '''''*/SAMPLE.recal.bam'' - a merged, deduped, and recalibrated BAM file'''
** '''''*/SAMPLE.recal.bam.bai'' - the index file for the merged, deduped, and recalibrated BAM file'''
** ''*/SAMPLE.recal.bam.metrics'' - dedup & recalibration log
** ''*/SAMPLE.recal.bam.qemp'' - recalibration tables
** ''*/SAMPLE.recal.bam.done'' - temp file indicating the recalibration step completed successfully
** ''*/SAMPLE.recal.bam.bai.done'' - temp file indicating the indexing step completed successfully
You should see .OK and .done files for each SAMPLE in the BAM list file. If any of them are missing, the corresponding step of the *recab* sub-pipeline failed.

'''On success, the recab/ folder contains the final BAMs and their .bai index files.'''
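The check for .OK and .done files can be scripted instead of done by eye. The sketch below is a hypothetical helper, not part of GotCloud. It assumes that the <code>*</code> in the paths listed above stands for exactly one subdirectory level under recab/mergedBams/ and recab/, and it reports, for each sample, which marker patterns matched no file.

```python
import glob
import os
from typing import Dict, Iterable, List

# Per-sample marker files listed above. "*" is assumed to stand for one
# subdirectory level; adjust the patterns if your layout differs.
MARKER_PATTERNS = [
    os.path.join("recab", "mergedBams", "*", "{sample}.merged.bam.OK"),
    os.path.join("recab", "*", "{sample}.recal.bam.done"),
    os.path.join("recab", "*", "{sample}.recal.bam.bai.done"),
]


def missing_markers(out_dir: str, samples: Iterable[str]) -> Dict[str, List[str]]:
    """Map each incomplete sample to the marker patterns that matched no file."""
    report: Dict[str, List[str]] = {}
    for sample in samples:
        missing = []
        for pattern in MARKER_PATTERNS:
            # Escape the literal parts so characters like "[" are not globbed.
            query = os.path.join(glob.escape(out_dir),
                                 pattern.format(sample=glob.escape(sample)))
            if not glob.glob(query):
                missing.append(pattern.format(sample=sample))
        if missing:
            report[sample] = missing
    return report
```

An empty result from <code>missing_markers("/path/freeze5/output", sample_ids)</code> means every listed sample has all three markers; any sample that appears in the result points to the step (merge, recalibration, or indexing) that did not complete.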
===Command-Line and Configuration Options===

* Required Options
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
! Command-line Flag !! Configuration Key !! Value Description !! Default Value
|-
| --list/--bam_list/--bamlist ''file'' || BAM_LIST || path to the [[#BAM_LIST File|BAM_LIST File]] || $(OUT_DIR)/bam.list
|-
| --numjobs ''#'' || || number of jobs to run in parallel || 0 (generate Makefile of steps, but do not run)
|}

* Common Options

{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
! colspan="4" | Common Options
|-
! Command-line Flag !! Configuration Key !! Value Description !! Default Value
|-
| --outdir ''path'' || OUT_DIR || output directory ||
|-
| --conf ''file'' || || configuration file to use ||
|-
| || REF_DIR || where the reference/resource files are stored || gotcloud.ref subdirectory within the base GotCloud directory
|-
| || REF || [[GotCloud: Genetic Reference and Resource Files#Reference fasta Files|Reference fasta Files]] || $(REF_DIR)/human.g1k.v37.fa
|-
| || DBSNP_VCF || [[GotCloud: Genetic Reference and Resource Files#DBSNP VCF Files|DBSNP VCF Files]] || $(REF_DIR)/dbsnp_135.b37.vcf.gz
|}
==== Example Configuration File ====
Example configuration file in which the reference files are stored in /path/reference/, the BAM list file is /path/freeze5.bam.list, and output is written to /path/freeze5/output:
 BAM_LIST = /path/freeze5.bam.list
 OUT_DIR = /path/freeze5/output
 REF_DIR = /path/reference/
 REF = $(REF_DIR)/hs37d5.fa
 DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz
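Values such as <code>$(REF_DIR)/hs37d5.fa</code> refer to other keys with <code>$(KEY)</code> syntax, as do the defaults in the option tables. The sketch below only illustrates how such references resolve; it is a hypothetical re-implementation, not GotCloud's actual configuration parser, and it assumes plain <code>KEY = VALUE</code> lines with <code>#</code> starting a comment.

```python
import re
from typing import Dict, FrozenSet, Optional

_REF = re.compile(r"\$\((\w+)\)")  # matches $(KEY)


def parse_conf(text: str) -> Dict[str, str]:
    """Read KEY = VALUE lines; '#' starts a comment (assumed syntax)."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank or non-assignment line
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def expand(conf: Dict[str, str], key: str,
           _seen: Optional[FrozenSet[str]] = None) -> str:
    """Resolve $(NAME) references in conf[key], recursively."""
    seen = frozenset() if _seen is None else _seen
    if key in seen:
        raise ValueError(f"circular reference involving {key}")
    seen = seen | {key}
    # An undefined $(NAME) raises KeyError rather than expanding to "".
    return _REF.sub(lambda m: expand(conf, m.group(1), seen), conf[key])
```

Applied to the example file, <code>expand(conf, "REF")</code> yields <code>/path/reference//hs37d5.fa</code>. The doubled slash comes from the trailing <code>/</code> on REF_DIR and is harmless on POSIX systems.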
==== Example Command Line ====
 gotcloud pipe --name recab --numjobs <N>
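When the pipeline is launched from a script, the command can be assembled as an argument list. The wrapper below is hypothetical, not part of GotCloud; it uses only the flags documented on this page (<code>--name</code>, <code>--numjobs</code>, <code>--conf</code>, <code>--outdir</code>, <code>--list</code>), assumes <code>gotcloud</code> is on the <code>PATH</code>, and runs nothing by itself.

```python
from typing import List, Optional


def recab_command(numjobs: int, conf: Optional[str] = None,
                  outdir: Optional[str] = None,
                  bam_list: Optional[str] = None) -> List[str]:
    """Build the argv for 'gotcloud pipe --name recab' (hypothetical helper)."""
    if numjobs < 0:
        raise ValueError("numjobs must be >= 0 (0 only generates the Makefile)")
    cmd = ["gotcloud", "pipe", "--name", "recab", "--numjobs", str(numjobs)]
    # Optional flags from the option tables; any omitted value falls back
    # to the configuration file or the documented default.
    for flag, value in (("--conf", conf), ("--outdir", outdir), ("--list", bam_list)):
        if value is not None:
            cmd += [flag, value]
    return cmd
```

Passing the result to <code>subprocess.run(..., check=True)</code> starts the run; keeping the default <code>--numjobs 0</code> semantics in mind, a value of 0 only generates the Makefile of steps.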
== recabQC ==