Line 28: |
Line 28: |
| | | |
| ====Inputs==== | | ====Inputs==== |
− | * Bam files (stored in a [[#BAM_LIST|BAM_LIST]] file) | + | * Bam files (stored in a [[#BAM_LIST File for recab|BAM_LIST File]]) |
| * Reference files | | * Reference files |
| * (Optional) configuration file to override default options | | * (Optional) configuration file to override default options |
| | | |
− | =====BAM_LIST File===== | + | =====BAM_LIST File for recab===== |
| * Each line of the BAM list file represents a single individual | | * Each line of the BAM list file represents a single individual |
| | | |
Line 78: |
Line 78: |
| ! Command-line Flag !! Configuration Key !! Value Description !! Default Value | | ! Command-line Flag !! Configuration Key !! Value Description !! Default Value |
| |- | | |- |
− | | --list/--bam_list/--bamlist ''file'' || BAM_LIST || path to the [[#BAM_LIST File|BAM_LIST File]] || $(OUT_DIR)/bam.list | + | | --list/--bam_list/--bamlist ''file'' || BAM_LIST || path to the [[#BAM_LIST File for recab|BAM_LIST File]] || $(OUT_DIR)/bam.list |
| |- | | |- |
| | --numjobs ''#'' || || number of jobs to run in parallel || 0 (generate Makefile of steps, but do not run) | | | --numjobs ''#'' || || number of jobs to run in parallel || 0 (generate Makefile of steps, but do not run) |
Line 119: |
Line 119: |
| | | |
| ====Inputs==== | | ====Inputs==== |
− | * Bam files (stored in a [[#BAM_LIST|BAM_LIST]] file) | + | * Bam files (stored in a [[#BAM_LIST File for recabQC|BAM_LIST]] file) |
| * Reference files | | * Reference files |
| * (Optional) configuration file to override default options | | * (Optional) configuration file to override default options |
| | | |
− | =====BAM_LIST File===== | + | =====BAM_LIST File for recabQC===== |
| * Each line of the BAM list file represents a single individual | | * Each line of the BAM list file represents a single individual |
| | | |
Line 146: |
Line 146: |
| | | |
| ====Outputs==== | | ====Outputs==== |
− | Upon successful completion of the *recab* sub-pipeline, you should see the following files/subdirectories under the user specified output directory: | + | Upon successful completion of the *recabQC* sub-pipeline, you should see the following files/subdirectories under the user specified output directory: |
| *'''recab/mergedBams/''' - contains merge results | | *'''recab/mergedBams/''' - contains merge results |
| ** '''''*/SAMPLE.merged.bam'' - a merged BAM file''' | | ** '''''*/SAMPLE.merged.bam'' - a merged BAM file''' |
Line 162: |
Line 162: |
| * '''QCFiles/''' - contains quality control results | | * '''QCFiles/''' - contains quality control results |
| ** VerifyBamID Output - see [[VerifyBamID#A_guideline_to_interpret_output_files|VerifyBamID: A guideline to interpret output files]] for more information | | ** VerifyBamID Output - see [[VerifyBamID#A_guideline_to_interpret_output_files|VerifyBamID: A guideline to interpret output files]] for more information |
− | *** */SAMPLE.genoCheck.depthRG - depth distribution of the sequence reads per read group | + | *** ''*/SAMPLE.genoCheck.depthRG'' - depth distribution of the sequence reads per read group |
− | *** */SAMPLE.genoCheck.depthSM - depth distribution of the sequence reads per sample | + | *** ''*/SAMPLE.genoCheck.depthSM'' - depth distribution of the sequence reads per sample |
| + | *** ''*/SAMPLE.genoCheck.err'' - log file |
| + | *** ''*/SAMPLE.genoCheck.log'' - log file |
| *** ''*/SAMPLE.genoCheck.OK'' - temp file indicating the VerifyBAMID step completed successfully | | *** ''*/SAMPLE.genoCheck.OK'' - temp file indicating the VerifyBAMID step completed successfully |
− | *** */SAMPLE.genoCheck.selfRG - per-readGroup statistics describing how well each lane matches to the annotated sample | + | *** ''*/SAMPLE.genoCheck.selfRG'' - per-readGroup statistics describing how well each lane matches to the annotated sample |
− | *** '''*/SAMPLE.genoCheck.selfSM''' - main output file containing the contamination estimate; per-sample statistics describing how well the sample matches to the annotated sample | + | *** '''''*/SAMPLE.genoCheck.selfSM'' - main output file containing the contamination estimate; per-sample statistics describing how well the sample matches to the annotated sample''' |
| **** Check the 'FREEMIX' column for the genotype-free estimate of contamination (0-1 scale; the lower, the better) | | **** Check the 'FREEMIX' column for the genotype-free estimate of contamination (0-1 scale; the lower, the better) |
| **** If [FREEMIX] >= 0.03 and [FREELK1]-[FREELK0] is large, possible contamination | | **** If [FREEMIX] >= 0.03 and [FREELK1]-[FREELK0] is large, possible contamination |
| ** Qplot Output - see: [[QPLOT#Diagnose_sequencing_quality|QPLOT: Diagnose sequencing quality]] for more info on how to use QPLOT results | | ** Qplot Output - see: [[QPLOT#Diagnose_sequencing_quality|QPLOT: Diagnose sequencing quality]] for more info on how to use QPLOT results |
− | *** ''*.qplot.done'' - temp file indicating this step completed successfully | + | *** ''*/SAMPLE.qplot.OK'' - temp file indicating the qplot step completed successfully |
− | *** '''*.qplot.R''' - Rscript that can be used to generate the pdf graphs | + | *** '''''*/SAMPLE.qplot.R'' - Rscript that can be used to generate the pdf graphs''' |
− | *** '''*.qplot.stats''' - sample statistics | + | *** '''''*/SAMPLE.qplot.stats'' - sample statistics''' |
| + | |
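The FREEMIX guideline in the list above can be checked with a short script. The following is a minimal sketch, not part of GotCloud: it assumes the ''selfSM'' file is tab-delimited with a header row that names the <code>FREEMIX</code>, <code>FREELK1</code>, and <code>FREELK0</code> columns and puts the sample id in the first column. The function name <code>flag_possible_contamination</code> is hypothetical. Because the text does not define what counts as a "large" likelihood difference, the sketch only reports the difference and leaves the judgment to the reader.

```python
import csv
import io

# Threshold taken from the guideline above: FREEMIX >= 0.03 suggests possible contamination.
FREEMIX_THRESHOLD = 0.03


def flag_possible_contamination(selfsm_text, threshold=FREEMIX_THRESHOLD):
    """Return (sample, FREEMIX, FREELK1 - FREELK0) for rows at or above the threshold.

    Assumes a tab-delimited table with a header row; the first column is
    treated as the sample id whatever its header is called.
    """
    reader = csv.DictReader(io.StringIO(selfsm_text), delimiter="\t")
    sample_column = reader.fieldnames[0]
    flagged = []
    for row in reader:
        freemix = float(row["FREEMIX"])
        if freemix >= threshold:
            lk_diff = float(row["FREELK1"]) - float(row["FREELK0"])
            flagged.append((row[sample_column], freemix, lk_diff))
    return flagged
```

To use it, read one ''SAMPLE.genoCheck.selfSM'' file into a string and pass it to the function. An empty result means no row reached the threshold.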
| + | You should see .done and .OK files for each SAMPLE in the index file. If you do not see the .done and .OK files, then your *recabQC* sub-pipeline failed. |
| + | |
| + | '''On success, the recab/ folder contains the final BAMs and bais, while the QCFiles/ folder contains the quality control output''' |
| + | |
| + | ===Command-Line and Configuration Options=== |
| + | |
| + | *Required Options |
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Command-line Flag !! Configuration Key !! Value Description !! Default Value |
| + | |- |
| + | | --list/--bam_list/--bamlist ''file'' || BAM_LIST || path to the [[#BAM_LIST File for recabQC|BAM_LIST File]] || $(OUT_DIR)/bam.list |
| + | |- |
| + | | --numjobs ''#'' || || number of jobs to run in parallel || 0 (generate Makefile of steps, but do not run) |
| + | |} |
| + | |
| + | *Common Options |
| + | |
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Command-line Flag !! Configuration Key !! Value Description !! Default Value |
| + | |- |
| + | | --outdir ''path'' || OUT_DIR || output directory || |
| + | |- |
| + | | --conf ''file'' || || configuration file to use || |
| + | |- |
| + | | || REF_DIR || where the reference/resource files are stored || gotcloud.ref subdirectory within the base GotCloud directory |
| + | |- |
| + | | || REF || [[GotCloud: Genetic Reference and Resource Files#Reference fasta Files|Reference fasta Files]] || $(REF_DIR)/human.g1k.v37.fa |
| + | |- |
| + | | || DBSNP_VCF || [[GotCloud: Genetic Reference and Resource Files#DBSNP VCF File|DBSNP VCF Files]] || $(REF_DIR)/dbsnp_135.b37.vcf.gz |
| + | |- |
| + | | || HM3_VCF || [[GotCloud: Genetic Reference and Resource Files#HapMap3 VCF File|HapMap3 VCF Files]] || $(REF_DIR)/hapmap_3.3.b37.sites.vcf.gz |
| + | |} |
| | | |
− | You should see .done and .OK files for each SAMPLE in the index file. If you do not see the .done and .OK files, then your *recab* sub-pipeline failed. | + | ==== Example Configuration File ==== |
| + | Example configuration file where reference files happen to be stored in /path/reference, and bam list file is stored in path/freeze5 |
| + | BAM_LIST = /path/freeze5.bam.list |
| + | OUT_DIR = /path/freeze5/output |
| + | REF_DIR = /path/reference/ |
| + | REF = $(REF_DIR)/hs37d5.fa |
| + | DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz |
| + | HM3_VCF = $(REF_DIR)/hapmap3_r3_b37.sites.vcf.gz |
| + | |
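The example configuration above uses <code>KEY = value</code> lines and <code>$(KEY)</code> references to other keys. As an illustration of how those references resolve, here is a minimal sketch that assumes exactly that syntax. It is not GotCloud's own parser, whose rules may differ, and the function name <code>parse_config</code> is hypothetical.

```python
import re

# Matches references of the form $(KEY), as used in the example configuration.
_VAR_REF = re.compile(r"\$\((\w+)\)")


def parse_config(text, max_depth=10):
    """Parse 'KEY = value' lines and expand $(KEY) references between keys."""
    raw = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip blank lines and anything without '='
        raw[key.strip()] = value.strip()

    def expand(value, depth=0):
        if depth > max_depth:
            raise ValueError("configuration references nest too deeply")

        def substitute(match):
            name = match.group(1)
            # Leave references to unknown keys untouched.
            return expand(raw[name], depth + 1) if name in raw else match.group(0)

        return _VAR_REF.sub(substitute, value)

    return {key: expand(value) for key, value in raw.items()}
```

Note that with <code>REF_DIR = /path/reference/</code>, plain text substitution turns <code>$(REF_DIR)/hs37d5.fa</code> into <code>/path/reference//hs37d5.fa</code>. The doubled slash is harmless on POSIX file systems.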
| + | ==== Example Command Line ==== |
| + | gotcloud pipe --name recabQC --numjobs <N> |
| + | |
| + | == bamQC == |
| + | *What it does: |
| + | # qplot |
| + | # verifyBamID |
| + | |
| + | ====Inputs==== |
| + | * Single merged, recalibrated, and deduped BAM file for each subject (stored in a [[#BAM_LIST File for bamQC|BAM_LIST File]]) |
| + | * BAI file for each subject |
| + | * Reference files |
| + | * (Optional) configuration file to override default options |
| + | |
| + | =====BAM_LIST File for bamQC===== |
| + | * Each line of the BAM list file represents a single individual |
| + | |
| + | Columns: |
| + | # sample id |
| + | # comma separated population labels (optional column) |
| + | # BAM File (preferable to have full path to BAM file) |
| + | # BAI File (preferable to have full path to BAI file) |
| + | |
| + | [SAMPLE_ID] [COMMA SEPARATED POPULATION LABELS] [BAM_FILE] [BAI_FILE] |
| + | or |
| + | [SAMPLE_ID] [BAM_FILE] [BAI_FILE] |
| + | |
| + | * Notes: |
| + | ** tab delimited |
| + | ** population label is optional - it will default to <code>ALL</code> |
| + | *** only used by Thunder (part of ldrefine pipeline) |
| + | *** if all samples are from the same population, population label can be skipped or you can just specify <code>ALL</code> for the population label for each sample. |
| + | |
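The two accepted line layouts for the bamQC BAM list can be told apart by their column count. The sketch below assumes only what the notes above state: tab-delimited fields, an optional population column that defaults to <code>ALL</code>, and a BAM and BAI path on every line. The names <code>BamListEntry</code> and <code>parse_bam_list_line</code> are hypothetical and not part of GotCloud.

```python
from typing import List, NamedTuple


class BamListEntry(NamedTuple):
    sample_id: str
    populations: List[str]
    bam: str
    bai: str


def parse_bam_list_line(line: str) -> BamListEntry:
    """Parse one tab-delimited bamQC BAM list line (3 or 4 columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 4:
        # [SAMPLE_ID] [COMMA SEPARATED POPULATION LABELS] [BAM_FILE] [BAI_FILE]
        sample_id, labels, bam, bai = fields
        populations = labels.split(",")
    elif len(fields) == 3:
        # [SAMPLE_ID] [BAM_FILE] [BAI_FILE]; population defaults to ALL
        sample_id, bam, bai = fields
        populations = ["ALL"]
    else:
        raise ValueError(f"expected 3 or 4 tab-delimited columns, got {len(fields)}")
    return BamListEntry(sample_id, populations, bam, bai)
```

Running this over every line of a BAM list before starting the pipeline is a cheap way to catch lines that use spaces instead of tabs or that omit the BAI column.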
| + | ====Outputs==== |
| + | Upon successful completion of the *bamQC* sub-pipeline, you should see the following files/subdirectories under the user-specified output directory: |
| + | |
| + | * '''QCFiles/''' - contains quality control results |
| + | ** VerifyBamID Output - see [[VerifyBamID#A_guideline_to_interpret_output_files|VerifyBamID: A guideline to interpret output files]] for more information |
| + | *** ''*/SAMPLE.genoCheck.depthRG'' - depth distribution of the sequence reads per read group |
| + | *** ''*/SAMPLE.genoCheck.depthSM'' - depth distribution of the sequence reads per sample |
| + | *** ''*/SAMPLE.genoCheck.err'' - log file |
| + | *** ''*/SAMPLE.genoCheck.log'' - log file |
| + | *** ''*/SAMPLE.genoCheck.OK'' - temp file indicating the VerifyBAMID step completed successfully |
| + | *** ''*/SAMPLE.genoCheck.selfRG'' - per-readGroup statistics describing how well each lane matches to the annotated sample |
| + | *** '''''*/SAMPLE.genoCheck.selfSM'' - main output file containing the contamination estimate; per-sample statistics describing how well the sample matches to the annotated sample''' |
| + | **** Check the 'FREEMIX' column for the genotype-free estimate of contamination (0-1 scale; the lower, the better) |
| + | **** If [FREEMIX] >= 0.03 and [FREELK1]-[FREELK0] is large, possible contamination |
| + | ** Qplot Output - see: [[QPLOT#Diagnose_sequencing_quality|QPLOT: Diagnose sequencing quality]] for more info on how to use QPLOT results |
| + | *** ''*/SAMPLE.qplot.OK'' - temp file indicating the qplot step completed successfully |
| + | *** '''''*/SAMPLE.qplot.R'' - Rscript that can be used to generate the pdf graphs''' |
| + | *** '''''*/SAMPLE.qplot.stats'' - sample statistics''' |
| + | |
| + | You should see .done and .OK files for each SAMPLE in the index file. If you do not see the .done and .OK files, then your *bamQC* sub-pipeline failed. |
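Checking for the marker files by hand gets tedious with many samples. The following is a minimal sketch of an automated check for the <code>.OK</code> files. It assumes the layout listed above, <code>QCFiles/*/SAMPLE.genoCheck.OK</code> and <code>QCFiles/*/SAMPLE.qplot.OK</code>, with the <code>*</code> standing for a single subdirectory level. The function name <code>missing_ok_files</code> is hypothetical, and sample ids are assumed not to contain glob metacharacters.

```python
from pathlib import Path


def missing_ok_files(out_dir, sample_ids, steps=("genoCheck", "qplot")):
    """Return (sample, step) pairs that have no matching .OK marker file."""
    qc_dir = Path(out_dir) / "QCFiles"
    missing = []
    for sample in sample_ids:
        for step in steps:
            # any() is False when the glob finds no file in any subdirectory.
            if not any(qc_dir.glob(f"*/{sample}.{step}.OK")):
                missing.append((sample, step))
    return missing
```

An empty list means every sample has both markers. Any returned pair names a sample and step to investigate.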
| | | |
− | '''On success, the recab/ folder contains the final BAMs and bais.''' | + | '''On success, the QCFiles/ folder contains the quality control output''' |
| | | |
| ===Command-Line and Configuration Options=== | | ===Command-Line and Configuration Options=== |
Line 184: |
Line 280: |
| ! Command-line Flag !! Configuration Key !! Value Description !! Default Value | | ! Command-line Flag !! Configuration Key !! Value Description !! Default Value |
| |- | | |- |
− | | --list/--bam_list/--bamlist ''file'' || BAM_LIST || path to the [[#BAM_LIST File|BAM_LIST File]] || $(OUT_DIR)/bam.list | + | | --list/--bam_list/--bamlist ''file'' || BAM_LIST || path to the [[#BAM_LIST File for bamQC|BAM_LIST File]] || $(OUT_DIR)/bam.list |
| |- | | |- |
| | --numjobs ''#'' || || number of jobs to run in parallel || 0 (generate Makefile of steps, but do not run) | | | --numjobs ''#'' || || number of jobs to run in parallel || 0 (generate Makefile of steps, but do not run) |
Line 202: |
Line 298: |
| | || REF || [[GotCloud: Genetic Reference and Resource Files#Reference fasta Files|Reference fasta Files]] || $(REF_DIR)/human.g1k.v37.fa | | | || REF || [[GotCloud: Genetic Reference and Resource Files#Reference fasta Files|Reference fasta Files]] || $(REF_DIR)/human.g1k.v37.fa |
| |- | | |- |
− | | || DBSNP_VCF || [[GotCloud: Genetic Reference and Resource Files#DBSNP VCF Files|DBSNP VCF Files]] || $(REF_DIR)/dbsnp_135.b37.vcf.gz | + | | || DBSNP_VCF || [[GotCloud: Genetic Reference and Resource Files#DBSNP VCF File|DBSNP VCF Files]] || $(REF_DIR)/dbsnp_135.b37.vcf.gz |
| + | |- |
| + | | || HM3_VCF || [[GotCloud: Genetic Reference and Resource Files#HapMap3 VCF File|HapMap3 VCF Files]] || $(REF_DIR)/hapmap_3.3.b37.sites.vcf.gz |
| |} | | |} |
| | | |
Line 212: |
Line 310: |
| REF = $(REF_DIR)/hs37d5.fa | | REF = $(REF_DIR)/hs37d5.fa |
| DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz | | DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz |
| + | HM3_VCF = $(REF_DIR)/hapmap3_r3_b37.sites.vcf.gz |
| | | |
| ==== Example Command Line ==== | | ==== Example Command Line ==== |
− | gotcloud pipe --name recab --numjobs <N> | + | gotcloud pipe --name bamQC --numjobs <N> |
| | | |
− | == bamQC ==
| |
| == bamQC_createIndex == | | == bamQC_createIndex == |
| + | *What it does: |
| + | # creates a BAI file for any BAM that is missing it |
| + | # qplot |
| + | # verifyBamID |
| + | |
| + | ====Inputs==== |
| + | * Single merged, recalibrated, and deduped BAM file for each subject (stored in a [[#BAM_LIST File for bamQC_createIndex|BAM_LIST File]]) |
| + | * Reference files |
| + | * (Optional) configuration file to override default options |
| + | |
| + | =====BAM_LIST File for bamQC_createIndex===== |
| + | * Each line of the BAM list file represents a single individual |
| + | |
| + | Columns: |
| + | # sample id |
| + | # comma separated population labels (optional column) |
| + | # BAM File (preferable to have full paths to BAM files) |
| + | |
| + | [SAMPLE_ID] [COMMA SEPARATED POPULATION LABELS] [BAM_FILE] |
| + | or |
| + | [SAMPLE_ID] [BAM_FILE] |
| + | |
| + | * Notes: |
| + | ** tab delimited |
| + | ** population label is optional - it will default to <code>ALL</code> |
| + | *** only used by Thunder (part of ldrefine pipeline) |
| + | *** if all samples are from the same population, population label can be skipped or you can just specify <code>ALL</code> for the population label for each sample. |
| + | |
| + | ====Outputs==== |
| + | Upon successful completion of the *bamQC_createIndex* sub-pipeline, you should see the following files/subdirectories under the user-specified output directory: |
| + | * A BAI file with the exact same path and name as the BAM file that was input, with *.bai on the end |
| + | * '''QCFiles/''' - contains quality control results |
| + | ** VerifyBamID Output - see [[VerifyBamID#A_guideline_to_interpret_output_files|VerifyBamID: A guideline to interpret output files]] for more information |
| + | *** ''*/SAMPLE.genoCheck.depthRG'' - depth distribution of the sequence reads per read group |
| + | *** ''*/SAMPLE.genoCheck.depthSM'' - depth distribution of the sequence reads per sample |
| + | *** ''*/SAMPLE.genoCheck.err'' - log file |
| + | *** ''*/SAMPLE.genoCheck.log'' - log file |
| + | *** ''*/SAMPLE.genoCheck.OK'' - temp file indicating the VerifyBAMID step completed successfully |
| + | *** ''*/SAMPLE.genoCheck.selfRG'' - per-readGroup statistics describing how well each lane matches to the annotated sample |
| + | *** '''''*/SAMPLE.genoCheck.selfSM'' - main output file containing the contamination estimate; per-sample statistics describing how well the sample matches to the annotated sample''' |
| + | **** Check the 'FREEMIX' column for the genotype-free estimate of contamination (0-1 scale; the lower, the better) |
| + | **** If [FREEMIX] >= 0.03 and [FREELK1]-[FREELK0] is large, possible contamination |
| + | ** Qplot Output - see: [[QPLOT#Diagnose_sequencing_quality|QPLOT: Diagnose sequencing quality]] for more info on how to use QPLOT results |
| + | *** ''*/SAMPLE.qplot.OK'' - temp file indicating the qplot step completed successfully |
| + | *** '''''*/SAMPLE.qplot.R'' - Rscript that can be used to generate the pdf graphs''' |
| + | *** '''''*/SAMPLE.qplot.stats'' - sample statistics''' |
| + | |
| + | You should see .done and .OK files for each SAMPLE in the index file. If you do not see the .done and .OK files, then your *bamQC_createIndex* sub-pipeline failed. |
| + | |
| + | '''On success, the QCFiles/ folder contains the quality control output''' |
| + | |
| + | ===Command-Line and Configuration Options=== |
| + | |
| + | *Required Options |
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Command-line Flag !! Configuration Key !! Value Description !! Default Value |
| + | |- |
| + | | --list/--bam_list/--bamlist ''file'' || BAM_LIST || path to the [[#BAM_LIST File for bamQC_createIndex|BAM_LIST File]] || $(OUT_DIR)/bam.list |
| + | |- |
| + | | --numjobs ''#'' || || number of jobs to run in parallel || 0 (generate Makefile of steps, but do not run) |
| + | |} |
| + | |
| + | *Common Options |
| + | |
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Command-line Flag !! Configuration Key !! Value Description !! Default Value |
| + | |- |
| + | | --outdir ''path'' || OUT_DIR || output directory || |
| + | |- |
| + | | --conf ''file'' || || configuration file to use || |
| + | |- |
| + | | || REF_DIR || where the reference/resource files are stored || gotcloud.ref subdirectory within the base GotCloud directory |
| + | |- |
| + | | || REF || [[GotCloud: Genetic Reference and Resource Files#Reference fasta Files|Reference fasta Files]] || $(REF_DIR)/human.g1k.v37.fa |
| + | |- |
| + | | || DBSNP_VCF || [[GotCloud: Genetic Reference and Resource Files#DBSNP VCF File|DBSNP VCF Files]] || $(REF_DIR)/dbsnp_135.b37.vcf.gz |
| + | |- |
| + | | || HM3_VCF || [[GotCloud: Genetic Reference and Resource Files#HapMap3 VCF File|HapMap3 VCF Files]] || $(REF_DIR)/hapmap_3.3.b37.sites.vcf.gz |
| + | |} |
| + | |
| + | ==== Example Configuration File ==== |
| + | Example configuration file where reference files happen to be stored in /path/reference, and bam list file is stored in path/freeze5 |
| + | BAM_LIST = /path/freeze5.bam.list |
| + | OUT_DIR = /path/freeze5/output |
| + | REF_DIR = /path/reference/ |
| + | REF = $(REF_DIR)/hs37d5.fa |
| + | DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz |
| + | HM3_VCF = $(REF_DIR)/hapmap3_r3_b37.sites.vcf.gz |
| + | |
| + | ==== Example Command Line ==== |
| + | gotcloud pipe --name bamQC_createIndex --numjobs <N> |