Difference between revisions of "GotCloud: Variant Calling Options"
Revision as of 14:28, 24 October 2014
Required Options
Command-line Flag | Configuration Key | Value Description | Default Value |
---|---|---|---|
--outdir path | OUT_DIR | output directory | |
--list/--bam_list/--bamlist file | BAM_LIST | path to the BAM List File | $(OUT_DIR)/bam.list |
--numjobs # | | number of jobs to run in parallel | 0 (generate Makefile of steps, but do not run) |
Common Options
Command-line Flag | Configuration Key | Value Description | Default Value |
---|---|---|---|
--conf file | | configuration file to use | |
Cluster Options
Command-line Flag | Configuration Key | Value Description | Default Value |
---|---|---|---|
--batchtype type | BATCH_TYPE | name of cluster type | local |
--batchopts opts | BATCH_OPTS | options to pass to the cluster command | |
--copyglf path | COPY_GLF | path to copy glfs to before processing them (path local to remote nodes, maybe in /tmp) |
Test/Debug Options
Command-line Flag | Configuration Key | Value Description | Default Value |
---|---|---|---|
--help | | print help information | |
--test path | | run the snpcall/ldrefine test and write output to the specified path | |
--verbose | | add additional messages when reading configuration | |
Reference/Resource Files
- See GotCloud: Genetic Reference and Resource Files for reference/resource file configuration settings
Analysis Region Options
Command-line Flag | Configuration Key | Value Description | Default Value |
---|---|---|---|
--chrs # # | CHRS | space-separated list of chromosomes to process | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X |
--region #:#-# | | call region: skip the parts of the chromosome outside the specified region; format (-end is optional): chr:start-end | |
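The region format above (chr:start-end, with -end optional) can be sketched as a small parser. This is only an illustration of the documented string format, not GotCloud's own parsing code; the names `Region` and `parse_region` are hypothetical, and rejecting an end position smaller than the start is an assumption.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int
    end: Optional[int]  # None when "-end" is omitted


def parse_region(text: str) -> Region:
    """Parse 'chr:start' or 'chr:start-end' into a Region."""
    match = re.fullmatch(r"([^:\s]+):(\d+)(?:-(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"not a chr:start-end region: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3)) if match.group(3) is not None else None
    # Assumption: an end position before the start is treated as an error.
    if end is not None and end < start:
        raise ValueError(f"region end precedes start: {text!r}")
    return Region(chrom, start, end)
```

For example, `parse_region("20:1000-2000")` yields chromosome `20` with both bounds, while `parse_region("X:500")` leaves the end unset.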
Chromosome X Calling
For proper Chromosome X calling, it is recommended to specify a PED file with sex information:
Configuration Key | Value Description |
---|---|
PED_INDEX | ped file containing sampleID (2nd column) and sex (5th column) |
Format of PED file:
familyID sampleID fatherID motherID sex
- Only sampleID and sex are used
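As a rough sketch of how the two used columns could be pulled out of such a file, the function below maps each sampleID (2nd column) to its sex field (5th column). The function name `read_sample_sex` is hypothetical, the sex value is kept as the raw string because its coding is not stated here, and skipping blank and `#` lines is an assumption.

```python
from typing import Dict


def read_sample_sex(ped_text: str) -> Dict[str, str]:
    """Map sampleID (column 2) to the sex field (column 5) of each PED line."""
    sex_by_sample: Dict[str, str] = {}
    for line_number, line in enumerate(ped_text.splitlines(), start=1):
        fields = line.split()
        # Assumption: blank lines and lines starting with '#' are ignored.
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) < 5:
            raise ValueError(f"line {line_number}: expected at least 5 columns")
        sex_by_sample[fields[1]] = fields[4]
    return sex_by_sample
```

The family, father, and mother columns are read but never stored, matching the note that only sampleID and sex are used.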
Targeted/Exome Sequencing Settings
If you are running Targeted/Exome Sequencing, you should specify:
Configuration Key | Value Description |
---|---|
UNIFORM_TARGET_BED | Bed file of targeted regions (same bed for all samples) |
MULTIPLE_TARGET_MAP | File mapping each sample ID to a BED file of targeted regions; each line of the file contains: [SM_ID] [TARGET_BED] |
OFFSET_OFF_TARGET | Number of bases by which to extend the target region (default is 0: do not extend the target region) |
SAMTOOLS_VIEW_TARGET_ONLY | true: speeds up processing by excluding off-target regions initially when performing samtools view; false (default): off-target regions are not excluded when performing samtools view, but are excluded at a later step. Warning: you may not want to set this to true because it may: |
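The MULTIPLE_TARGET_MAP layout described above (one `[SM_ID] [TARGET_BED]` pair per line) can be illustrated with a short writer and reader. This is a sketch only: the function names are hypothetical, and separating the two fields with whitespace is an assumption about the file format.

```python
from typing import Dict


def format_target_map(bed_by_sample: Dict[str, str]) -> str:
    """Render one '[SM_ID] [TARGET_BED]' line per sample."""
    return "".join(f"{sample} {bed}\n" for sample, bed in bed_by_sample.items())


def parse_target_map(text: str) -> Dict[str, str]:
    """Read '[SM_ID] [TARGET_BED]' lines back into a dictionary."""
    bed_by_sample: Dict[str, str] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected '[SM_ID] [TARGET_BED]'")
        bed_by_sample[fields[0]] = fields[1]
    return bed_by_sample
```

When every sample shares the same BED file, UNIFORM_TARGET_BED avoids the need for such a map.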
Path Options
Command-line Flag | Configuration Key | Value Description | Default Value |
---|---|---|---|
--makebasename name | MAKE_BASE_NAME | basename of the Makefile generated by GotCloud | umake |
--bamprefix prefix | BAM_PREFIX | path to prepend to relative BAM file paths in the BAM list | |
--refprefix prefix | REF_PREFIX | path to prepend to relative reference/resource file paths | |
--baseprefix prefix | BASE_PREFIX | path to prepend to relative paths for the BAM list file, PED_INDEX, BAM (if BAM_PREFIX isn't specified), reference/resource files (if REF_PREFIX isn't specified) | |
--refdir path | REF_DIR | value to use for REF_DIR key | $(GOTCLOUD_ROOT)/gotcloud.ref |
--gotcloudroot path | GOTCLOUD_ROOT | specify to use a different directory for finding GotCloud bins/scripts | based on the location of the gotcloud/umake.pl script |
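The prefix rules in this table can be read as a fallback chain: a relative path gets the more specific prefix (BAM_PREFIX or REF_PREFIX) if one is set, and BASE_PREFIX otherwise. The sketch below expresses that reading; `resolve_path` is a hypothetical helper, and leaving absolute paths untouched is an assumption drawn from the word "relative" in the descriptions.

```python
import posixpath
from typing import Optional


def resolve_path(path: str,
                 specific_prefix: Optional[str] = None,
                 base_prefix: Optional[str] = None) -> str:
    """Prepend the specific prefix, else the base prefix, to a relative path."""
    if posixpath.isabs(path):
        return path  # assumption: prefixes apply to relative paths only
    prefix = specific_prefix or base_prefix
    return posixpath.join(prefix, path) if prefix else path
```

For a BAM path the specific prefix would be the BAM_PREFIX value; for a reference or resource file it would be REF_PREFIX.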
Validation Adjustment Options
Command-line Flag | Configuration Key | Value Description | Default Value |
---|---|---|---|
--maxlocaljobs # | | maximum # of jobs that can run if batchtype is local (to prevent accidentally starting jobs locally that were meant to be on a cluster) | 10 |
--ignoresmcheck | IGNORE_SM_CHECK | disable the validation that the Sample name in the BAM file matches the one in the BAM list file |
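The --maxlocaljobs guard can be pictured as a simple pre-flight check. The following is a sketch of the described behavior under the assumption that the check compares the --numjobs value against the limit; `check_local_jobs` is a hypothetical name, and the actual error handling in GotCloud may differ.

```python
def check_local_jobs(num_jobs: int,
                     batch_type: str = "local",
                     max_local_jobs: int = 10) -> None:
    """Refuse to start more local jobs than the configured maximum."""
    if batch_type == "local" and num_jobs > max_local_jobs:
        raise ValueError(
            f"{num_jobs} jobs requested with batch type 'local', "
            f"but at most {max_local_jobs} local jobs are allowed"
        )
```

The limit only matters for the local batch type, so a large job count combined with a cluster batch type passes the check.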
Miscellaneous Options
Command-line Flag | Configuration Key | Value Description | Default Value |
---|---|---|---|
--nophonehome | | disable phonehome in GotCloud and the tools it calls | |
 | BAMUTIL_THINNING | thinning parameter for bamUtil programs (set to 0 if --nophonehome is specified) | --phoneHomeThinning 10 |
Directory Settings Options
These values set GotCloud output subdirectories (relative paths under the OUT_DIR directory). You should not need to change these from the defaults unless you want to use different sub-directory names.
Configuration Key | Value Description | Default Value |
---|---|---|
BAM_GLF_DIR | GLF outputs per BAM (if multiple BAMs per sample) (intermediate files) | glfs/bams |
SM_GLF_DIR | GLF outputs per sample (intermediate files) | glfs/samples |
VCF_DIR | unfiltered and filtered VCFs | vcfs |
PVCF_DIR | vcfPileup results (intermediate files) | pvcfs |
SPLIT_DIR | VCFs with PASS variants only & split into multiple files | split |
BEAGLE_DIR | beagle output | beagle |
SPLIT4_DIR | VCFs with PASS variants only & split into multiple files for running beagle4 | split4 |
BEAGLE4_DIR | beagle version 4 output | beagle4 |
THUNDER_DIR | thunder output | thunder |
TARGET_DIR | directory to store target information when running with a BED file | target |
GLF_INDEX | filename for index file needed for glfflex (file is created by GotCloud) | glfIndex.ped |
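Because the directory values are relative paths under OUT_DIR, the full output locations follow by joining each value onto the output directory. The sketch below uses the defaults from the table; `output_dirs` and the `overrides` parameter are hypothetical, and GLF_INDEX is left out because it names a file rather than a directory.

```python
import posixpath
from typing import Dict, Optional

# Default sub-directory names, copied from the table above.
DEFAULT_DIRS = {
    "BAM_GLF_DIR": "glfs/bams",
    "SM_GLF_DIR": "glfs/samples",
    "VCF_DIR": "vcfs",
    "PVCF_DIR": "pvcfs",
    "SPLIT_DIR": "split",
    "BEAGLE_DIR": "beagle",
    "SPLIT4_DIR": "split4",
    "BEAGLE4_DIR": "beagle4",
    "THUNDER_DIR": "thunder",
    "TARGET_DIR": "target",
}


def output_dirs(out_dir: str,
                overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Join each sub-directory setting onto OUT_DIR, applying any overrides."""
    settings = {**DEFAULT_DIRS, **(overrides or {})}
    return {key: posixpath.join(out_dir, sub) for key, sub in settings.items()}
```

Passing `overrides={"VCF_DIR": "calls"}` would move only the VCF output while keeping every other default.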
Tool Options
These values set the binaries GotCloud should use. You should not need to change these from the defaults unless you want to try a different version of one of the tools.
Some tools have their options specified together with the binary command, while others have them set separately or hard-coded.
Configuration Key | Program Description | Default Value |
---|---|---|
SAMTOOLS_FOR_PILEUP | samtools to use for pileup | $(BIN_DIR)/samtools-hybrid |
SAMTOOLS_FOR_OTHERS | samtools to use for view and calmd | $(BIN_DIR)/samtools-hybrid |
GLFMERGE | merge glf files when there are multiple BAMs per individual | $(BIN_DIR)/glfMerge |
GLFFLEX | perform glf-based variant calling (replacement for glfMultiples) | $(BIN_DIR)/glfFlex --minMapQuality 0 --minDepth 1 --maxDepth 10000000 --uniformTsTv --smartFilter |
VCFPILEUP | vcfPileup to generate rich per-site information | $(BIN_DIR)/vcfPileup |
INFOCOLLECTOR | gather filtering statistics | $(BIN_DIR)/infoCollector |
VCFMERGE | merge multiple VCFs separated by chunk of genomes | perl $(SCRIPT_DIR)/bams2vcfMerge.pl |
VCFCOOKER | vcfCooker program for filtering | $(BIN_DIR)/vcfCooker |
VCFSUMMARY | script to generate summary statistics of discovered sites | perl $(SCRIPT_DIR)/vcf-summary |
VCFSPLIT | splits VCF into overlapping chunks for genotype refinement | perl $(SCRIPT_DIR)/vcfSplit.pl |
VCFSPLIT4 | splits VCF into overlapping chunks for beagle version 4 genotype refinement | perl $(SCRIPT_DIR)/vcfSplit4.pl |
VCF_SPLIT_CHROM | splits VCF into per chromosome VCFs | perl $(SCRIPT_DIR)/vcfSplitChr.pl |
VCFPASTE | generate filtered genotype VCF | perl $(SCRIPT_DIR)/vcfPaste.pl |
BEAGLE | beagle program | java -Xmx4g -jar $(BIN_DIR)/beagle.20101226.jar seed=993478 gprobs=true niterations=50 lowmem=true |
BEAGLE4 | beagle version 4 program | java -Xmx4g -jar $(BIN_DIR)/b4.r1219.jar seed=993478 gprobs=true |
VCF2BEAGLE | convert VCF (with PL tag) into beagle input | perl $(SCRIPT_DIR)/vcf2Beagle.pl --PL |
BEAGLE2VCF | convert beagle output to VCF | perl $(SCRIPT_DIR)/beagle2Vcf.pl |
SVM_SCRIPT | SVM script | perl $(SCRIPT_DIR)/run_libsvm.pl |
SVMLEARN | SVM program | $(BIN_DIR)/svm-train |
SVMCLASSIFY | SVM program | $(BIN_DIR)/svm-predict |
INVNORM | SVM program | $(BIN_DIR)/invNorm |
THUNDER_STATES | flags for thunder states and weighted states | --states 400 --weightedStates 300 |
THUNDER | MaCH/Thunder genotype refinement step | $(BIN_DIR)/thunderVCF -r 30 --phase --dosage --compact --inputPhased $(THUNDER_STATES) |
LIGATEVCF | ligate multiple phased VCFs while resolving the phase between VCFs | perl $(SCRIPT_DIR)/ligateVcf.pl |
LIGATEVCF4 | ligate multiple phased VCFs while resolving the phase between VCFs | perl $(SCRIPT_DIR)/ligateVcf4.pl |
VCFCAT | concatenate multiple VCFs | perl $(SCRIPT_DIR)/vcfCat.pl |
BGZIP | bgzip program | $(BIN_DIR)/bgzip |
TABIX | tabix program | $(BIN_DIR)/tabix |
BAMUTIL | bam util program | $(BIN_DIR)/bam |
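Several defaults in this table refer to other keys with the `$(KEY)` notation, for example THUNDER, which embeds `$(BIN_DIR)` and `$(THUNDER_STATES)`. A minimal sketch of how such references could be expanded is shown below; it is not GotCloud's implementation, `expand` is a hypothetical helper, and the BIN_DIR value in the usage note is made up for illustration.

```python
import re
from typing import Dict, Tuple

_REFERENCE = re.compile(r"\$\((\w+)\)")


def expand(key: str, settings: Dict[str, str], _seen: Tuple[str, ...] = ()) -> str:
    """Return settings[key] with every $(OTHER_KEY) reference expanded."""
    if key in _seen:
        raise ValueError("circular reference: " + " -> ".join(_seen + (key,)))
    if key not in settings:
        raise KeyError(f"undefined configuration key: {key}")

    def substitute(match: "re.Match[str]") -> str:
        # Expand the referenced key recursively, remembering the path so far.
        return expand(match.group(1), settings, _seen + (key,))

    return _REFERENCE.sub(substitute, settings[key])
```

With `BIN_DIR` set to a hypothetical `/opt/gotcloud/bin`, expanding `THUNDER` gives the thunderVCF command line with the `--states 400 --weightedStates 300` flags appended.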
Options
Configuration Key | Program Description | Default Value |
---|---|---|
SLEEP_MULT | add sleep time prior to some steps; use only if too many steps are starting at the same time doing the same thing | 0 |
REMOTE_PREFIX | add a prefix to paths when sending across to a remote machine | |
Options
maxABL => "FILTER_MAX_ABL",
maxSTR => "FILTER_MAX_STR",
minSTR => "FILTER_MIN_STR",
winIndel => "FILTER_WIN_INDEL",
maxSTZ => "FILTER_MAX_STZ",
minSTZ => "FILTER_MIN_STZ",
maxAOI => "FILTER_MAX_AOI",
minFIC => "FILTER_MIN_FIC",
maxCBR => "FILTER_MAX_CBR",
maxLQR => "FILTER_MAX_LQR",
minQual => "FILTER_MIN_QUAL",
minMQ => "FILTER_MIN_MQ",
maxMQ0 => "FILTER_MAX_MQ0",
maxMQ30 => "FILTER_MAX_MQ30",
maxAOZ => "FILTER_MAX_AOZ",
maxIOR => "FILTER_MAX_IOR",
Configuration Key | Program Description | Default Value |
---|---|---|
SAMTOOLS_VIEW_FILTER | | |
NOBAQ_SUBSTRINGS | skip the BAQ step if the BAM filename contains the specified space-separated substrings | SOLID |
MODEL_GLFSINGLE | | |
MODEL_SKIP_DISCOVER | | |
MODEL_AF_PRIOR | | |
BAM_DEPEND | | |
WGS_SVM | | |
MAKE_OPTS | | |
USE_SVMMODEL | | |
SVM_CUTOFF | | |
SVMMODEL | | |
POS_SAMPLE | | |
NEG_SAMPLE | | |
KEEP_LOG | | |
FILTER_ADDITIONAL | | |
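The NOBAQ_SUBSTRINGS rule in the table above amounts to a substring test on the BAM filename. The sketch below shows one way to express it; `skip_baq` is a hypothetical name, and treating the match as case-sensitive and as "any one substring is enough" are assumptions.

```python
def skip_baq(bam_filename: str, nobaq_substrings: str = "SOLID") -> bool:
    """Return True if the filename contains any of the listed substrings."""
    # The setting is a space-separated list; an empty setting never matches.
    return any(sub in bam_filename for sub in nobaq_substrings.split())
```

With the default value, a file such as `NA12878.SOLID.bam` would skip the BAQ step, while `NA12878.illumina.bam` would not.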