== bamQC_createIndex ==
 
 +
*What it does:
 +
# creates a BAI file for any BAM that is missing it
 +
# runs QPLOT to assess sequencing quality
 +
# runs VerifyBamID to estimate sample contamination
 +
 +
====Inputs====
 +
* Single merged, recalibrated, and deduped BAM file for each subject (stored in a [[#BAM_LIST File for bamQC|BAM_LIST File]])
 +
* Reference files
 +
* (Optional) configuration file to override default options
 +
 +
=====BAM_LIST File for bamQC=====
 +
* Each line of the BAM list file represents a single individual
 +
 +
Columns:
 +
# sample id
 +
# comma separated population labels (optional column)
 +
# BAM file (full paths to the BAM files are preferred)
 +
 +
[SAMPLE_ID] [COMMA SEPARATED POPULATION LABELS] [BAM_FILE1]
 +
or
 +
[SAMPLE_ID] [BAM_FILE1]
 +
 +
* Notes:
 +
** tab delimited
 +
** population label is optional - it will default to <code>ALL</code>
 +
*** only used by Thunder (part of ldrefine pipeline)
 +
*** if all samples are from the same population, population label can be skipped or you can just specify <code>ALL</code> for the population label for each sample.
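
The column rules above can be sketched in Python as follows. This is a minimal illustration, not GotCloud's own parser: the names <code>BamListEntry</code>, <code>parse_bam_list_line</code>, and <code>parse_bam_list</code> are hypothetical, and treating an empty label column as <code>ALL</code> is an assumption.

```python
from typing import List, NamedTuple


class BamListEntry(NamedTuple):
    sample_id: str
    populations: List[str]
    bam_path: str


def parse_bam_list_line(line: str) -> BamListEntry:
    """Parse one tab-delimited BAM list line with 2 or 3 columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 2:
        # [SAMPLE_ID] [BAM_FILE1]: the population label defaults to ALL
        sample_id, bam_path = fields
        populations = ["ALL"]
    elif len(fields) == 3:
        # [SAMPLE_ID] [COMMA SEPARATED POPULATION LABELS] [BAM_FILE1]
        sample_id, labels, bam_path = fields
        # Assumption: an empty label column is treated like a missing one.
        populations = [p for p in labels.split(",") if p] or ["ALL"]
    else:
        raise ValueError(f"expected 2 or 3 tab-separated columns, got {len(fields)}")
    return BamListEntry(sample_id, populations, bam_path)


def parse_bam_list(text: str) -> List[BamListEntry]:
    """Parse a whole BAM list, one individual per non-blank line."""
    return [parse_bam_list_line(l) for l in text.splitlines() if l.strip()]
```

For example, <code>parse_bam_list("S1\tEUR,AFR\t/data/S1.bam\nS2\t/data/S2.bam\n")</code> yields two entries, the second with the population label <code>ALL</code>. The sample IDs and paths here are made up.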
 +
 +
====Outputs====
 +
Upon successful completion of the ''bamQC'' sub-pipeline, you should see the following files/subdirectories under the user-specified output directory:
 +
 +
* '''QCFiles/''' - contains quality control results
 +
** VerifyBamID Output - see [[VerifyBamID#A_guideline_to_interpret_output_files|VerifyBamID: A guideline to interpret output files]] for more information
 +
*** ''*/SAMPLE.genoCheck.depthRG'' - depth distribution of the sequence reads per read group
 +
*** ''*/SAMPLE.genoCheck.depthSM'' - depth distribution of the sequence reads per sample
 +
*** ''*/SAMPLE.genoCheck.err'' - log file
 +
*** ''*/SAMPLE.genoCheck.log'' - log file
 +
*** ''*/SAMPLE.genoCheck.OK'' - temp file indicating the VerifyBamID step completed successfully
 +
*** ''*/SAMPLE.genoCheck.selfRG'' - per-readGroup statistics describing how well each lane matches to the annotated sample
 +
*** '''''*/SAMPLE.genoCheck.selfSM'' - main output file containing the contamination estimate; per-sample statistics describing how well the sample matches to the annotated sample'''
 +
**** Check the <code>FREEMIX</code> column for the genotype-free estimate of contamination, on a scale of 0 to 1; lower is better
 +
**** If [FREEMIX] >= 0.03 and [FREELK1]-[FREELK0] is large, the sample is possibly contaminated
 +
** Qplot Output - see: [[QPLOT#Diagnose_sequencing_quality|QPLOT: Diagnose sequencing quality]] for more info on how to use QPLOT results
 +
*** ''*/SAMPLE.qplot.OK'' - temp file indicating the qplot step completed successfully
 +
*** '''''*/SAMPLE.qplot.R'' - Rscript that can be used to generate the pdf graphs'''
 +
*** '''''*/SAMPLE.qplot.stats'' - sample statistics'''
 +
 +
You should see .done and .OK files for each SAMPLE in the index file. If you do not see the .done and .OK files, then your ''bamQC'' sub-pipeline failed.
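
A quick way to check for the .OK marker files is sketched below in Python. It relies only on the ''*/SAMPLE.genoCheck.OK'' and ''*/SAMPLE.qplot.OK'' patterns listed above; the function name <code>find_incomplete_samples</code> is hypothetical, and the sketch assumes that SAMPLE in the file name is exactly the sample ID you pass in. It does not look for .done files, since their location is not described here.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Marker-file suffixes taken from the output listing above.
STEP_SUFFIXES = ("genoCheck.OK", "qplot.OK")


def find_incomplete_samples(out_dir: str, samples: Iterable[str]) -> Dict[str, List[str]]:
    """Return {sample: [missing marker suffixes]} for samples lacking a .OK file."""
    qc_dir = Path(out_dir) / "QCFiles"
    incomplete: Dict[str, List[str]] = {}
    for sample in samples:
        # The subdirectory under QCFiles/ is matched with '*' because
        # its name is not specified in the listing.
        missing = [
            suffix
            for suffix in STEP_SUFFIXES
            if not any(qc_dir.glob(f"*/{sample}.{suffix}"))
        ]
        if missing:
            incomplete[sample] = missing
    return incomplete
```

An empty result means every listed sample has both marker files; any other result names the samples and steps to investigate.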
 +
 +
'''On success, the QCFiles/ folder contains the quality control output'''
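
The FREEMIX guideline for the ''selfSM'' file can be applied with a short Python sketch like the one below. It assumes the file is tab-delimited with a header row containing <code>FREEMIX</code>, <code>FREELK1</code>, and <code>FREELK0</code> columns, and that the first column identifies the sample. The function name <code>flag_possible_contamination</code> is hypothetical. Because "large" is not quantified here, the sketch only reports the likelihood difference and leaves the judgment to the reader.

```python
import csv
import io
from typing import Dict, List

FREEMIX_THRESHOLD = 0.03  # cutoff quoted in the guideline above


def flag_possible_contamination(selfsm_text: str,
                                threshold: float = FREEMIX_THRESHOLD) -> List[Dict]:
    """Return rows whose FREEMIX is at or above the threshold."""
    reader = csv.DictReader(io.StringIO(selfsm_text), delimiter="\t")
    # Assumption: the first header column holds the sample identifier.
    id_column = reader.fieldnames[0]
    flagged = []
    for row in reader:
        try:
            freemix = float(row["FREEMIX"])
            lk_diff = float(row["FREELK1"]) - float(row["FREELK0"])
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if freemix >= threshold:
            flagged.append({"sample": row[id_column],
                            "freemix": freemix,
                            "lk_diff": lk_diff})
    return flagged
```

Each flagged entry carries the [FREELK1]-[FREELK0] difference so it can be reviewed alongside the FREEMIX value.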
 +
 +
===Command-Line and Configuration Options===
 +
 +
*Required Options
 +
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 +
! Command-line Flag !! Configuration Key !! Value Description !! Default Value
 +
|-
 +
| --list/--bam_list/--bamlist ''file'' || BAM_LIST || path to the [[#BAM_LIST File for bamQC|BAM_LIST File]] || $(OUT_DIR)/bam.list
 +
|-
 +
| --numjobs ''#'' || || number of jobs to run in parallel || 0 (generate Makefile of steps, but do not run)
 +
|}
 +
 +
*Common Options
 +
 +
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 +
! Command-line Flag !! Configuration Key !! Value Description !! Default Value
 +
|-
 +
| --outdir ''path'' || OUT_DIR || output directory ||
 +
|-
 +
| --conf ''file'' || || configuration file to use ||
 +
|-
 +
|  || REF_DIR || where the reference/resource files are stored || gotcloud.ref subdirectory within the base GotCloud directory
 +
|-
 +
| || REF || [[GotCloud: Genetic Reference and Resource Files#Reference fasta Files|Reference fasta Files]] || $(REF_DIR)/human.g1k.v37.fa
 +
|-
 +
| || DBSNP_VCF || [[GotCloud: Genetic Reference and Resource Files#DBSNP VCF File|DBSNP VCF Files]] || $(REF_DIR)/dbsnp_135.b37.vcf.gz
 +
|-
 +
| || HM3_VCF || [[GotCloud: Genetic Reference and Resource Files#HapMap3 VCF File|HapMap3 VCF Files]] || $(REF_DIR)/hapmap_3.3.b37.sites.vcf.gz
 +
|}
 +
 +
==== Example Configuration File ====
 +
Example configuration file where the reference files are stored in /path/reference and the BAM list file is /path/freeze5.bam.list:
 +
BAM_LIST = /path/freeze5.bam.list
 +
OUT_DIR = /path/freeze5/output
 +
REF_DIR = /path/reference/
 +
REF = $(REF_DIR)/hs37d5.fa
 +
DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz
 +
HM3_VCF = $(REF_DIR)/hapmap3_r3_b37.sites.vcf.gz
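
To see how the <code>$(REF_DIR)</code> references in a file like this resolve, the following Python sketch reads <code>KEY = VALUE</code> lines and substitutes <code>$(KEY)</code> references. It is an illustration only, not GotCloud's actual configuration parser: the function name <code>parse_config</code> is hypothetical, and skipping lines that start with <code>#</code> is an assumption about the file syntax.

```python
import re
from typing import Dict

_VAR = re.compile(r"\$\((\w+)\)")  # matches $(KEY)


def parse_config(text: str) -> Dict[str, str]:
    """Read KEY = VALUE lines and expand $(KEY) references between them."""
    raw: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            raw[key.strip()] = value.strip()

    def expand(value: str, depth: int = 0) -> str:
        if depth > 10:
            raise ValueError("circular $(KEY) reference")
        # References to keys that are not defined are left untouched.
        new = _VAR.sub(lambda m: raw.get(m.group(1), m.group(0)), value)
        return new if new == value else expand(new, depth + 1)

    return {key: expand(value) for key, value in raw.items()}
```

With <code>REF_DIR = /path/reference</code> and <code>REF = $(REF_DIR)/hs37d5.fa</code>, the expanded value of <code>REF</code> is <code>/path/reference/hs37d5.fa</code>.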
 +
 +
==== Example Command Line ====
 +
gotcloud pipe --name bamQC --numjobs <N>