Difference between revisions of "GotCloud: Variant Calling Options"

From Genome Analysis Wiki

Revision as of 14:29, 24 October 2014

Required Options

Command-line Flag Configuration Key Value Description Default Value
--outdir path OUT_DIR output directory
--list/--bam_list/--bamlist file BAM_LIST path to the BAM List File $(OUT_DIR)/bam.list
--numjobs # number of jobs to run in parallel 0 (generate Makefile of steps, but do not run)
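
Default values such as $(OUT_DIR)/bam.list refer to other configuration keys. The Python sketch below shows one way such references could be expanded. The helper name expand, the recursive lookup, and the example paths are illustrative assumptions, not GotCloud's actual implementation.

```python
import re

# Matches references of the form $(KEY), as used in the default values above.
_REF = re.compile(r"\$\((\w+)\)")

def expand(value, config, _depth=0):
    """Replace each $(KEY) in value with config[KEY], recursively (hypothetical helper)."""
    if _depth > 20:
        raise ValueError("configuration references nest too deeply (possible cycle)")

    def substitute(match):
        key = match.group(1)
        if key not in config:
            raise KeyError(f"undefined configuration key: {key}")
        return expand(config[key], config, _depth + 1)

    return _REF.sub(substitute, value)

# Example values are made up for illustration.
config = {"OUT_DIR": "/data/run1", "BAM_LIST": "$(OUT_DIR)/bam.list"}
bam_list = expand(config["BAM_LIST"], config)
print(bam_list)
```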

Common Options

Command-line Flag Configuration Key Value Description Default Value
--conf file configuration file to use

Cluster Options

Command-line Flag Configuration Key Value Description Default Value
--batchtype type BATCH_TYPE name of cluster type local
--batchopts opts BATCH_OPTS options to pass to the cluster command
--copyglf path COPY_GLF path to copy glfs to before processing them (path local to remote nodes, maybe in /tmp)

Test/Debug Options

Command-line Flag Configuration Key Value Description Default Value
--help print help information
--test path run the snpcall/ldrefine test and write output to the specified path
--verbose print additional messages when reading configuration

Reference/Resource Files

Analysis Region Options

Command-line Flag Configuration Key Value Description Default Value
--chrs # # CHRS space-separated list of chromosomes to process 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X
--region #:#-# call region - skip regions of chromosome outside of specified region

format (-end is optional): chr:start-end

UNIT_CHUNK chunk size of SNP calling (GotCloud breaks up each chromosome into regions of this size) 5000000
LD_NSNPS chunk size (number of SNPs) of genotype refinement 10000
LD_OVERLAP overlapping # of SNPs between chunks for genotype refinement 1000
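
The region format and the chunking settings above can be illustrated with a short Python sketch. It assumes 1-based inclusive coordinates for regions, chunks that start at the region start, and consecutive refinement chunks that share LD_OVERLAP SNPs. These are plausible readings of the descriptions, not a statement of how GotCloud splits the data internally, and all function names are hypothetical.

```python
def parse_region(region):
    """Parse 'chr:start-end' or 'chr:start' into (chrom, start, end); end is None if omitted."""
    chrom, _, span = region.partition(":")
    if not chrom or not span:
        raise ValueError(f"expected chr:start[-end], got {region!r}")
    start_text, _, end_text = span.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else None
    if end is not None and end < start:
        raise ValueError("region end precedes region start")
    return chrom, start, end

def unit_chunks(start, end, unit_chunk=5000000):
    """Yield inclusive (start, end) pieces of at most unit_chunk bases (assumed layout)."""
    pos = start
    while pos <= end:
        yield pos, min(pos + unit_chunk - 1, end)
        pos += unit_chunk

def ld_chunks(n_snps, ld_nsnps=10000, ld_overlap=1000):
    """Yield 0-based half-open SNP index ranges; neighbours share ld_overlap SNPs (assumed)."""
    if ld_overlap >= ld_nsnps:
        raise ValueError("overlap must be smaller than the chunk size")
    first = 0
    while first < n_snps:
        last = min(first + ld_nsnps, n_snps)
        yield first, last
        if last == n_snps:
            break
        first += ld_nsnps - ld_overlap

chrom, start, end = parse_region("20:2000000-14000000")
for piece in unit_chunks(start, end):
    print(chrom, piece)
```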

Chromosome X Calling

For proper Chromosome X calling, it is recommended to specify a PED file with sex information:

Configuration Key Value Description
PED_INDEX ped file containing sampleID (2nd column) and sex (5th column)

Format of PED file:

familyID sampleID fatherID motherID sex
  • Only sampleID and sex are used
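
As a minimal Python sketch of reading such a file, the function below collects the sex field for each sampleID. It assumes whitespace-separated columns and skips lines starting with #, neither of which is stated on this page. The sex coding is not described here either, so values are kept as raw strings. The sample IDs are made up.

```python
def read_sample_sex(lines):
    """Map sampleID (2nd column) to the sex field (5th column) of each PED line."""
    sex_by_sample = {}
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment (comment handling is an assumption)
        if len(fields) < 5:
            raise ValueError(f"line {line_number}: expected at least 5 columns, got {len(fields)}")
        sex_by_sample[fields[1]] = fields[4]
    return sex_by_sample

# Columns: familyID sampleID fatherID motherID sex (example rows are invented)
ped_lines = [
    "FAM1 SAMPLE_A 0 0 1",
    "FAM1 SAMPLE_B 0 0 2",
]
sex = read_sample_sex(ped_lines)
print(sex)
```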

Targeted/Exome Sequencing Settings

If you are running targeted/exome sequencing, specify:

Configuration Key Value Description
UNIFORM_TARGET_BED Bed file of targeted regions (same bed for all samples)
MULTIPLE_TARGET_MAP Filename of file mapping: sample id -> bed file of targeted regions

Each line of the file contains: [SM_ID] [TARGET_BED]

OFFSET_OFF_TARGET Number of bases by which to extend the target region

(default is 0, do not extend the target region)

SAMTOOLS_VIEW_TARGET_ONLY true: speeds up processing by excluding off-target regions initially when performing samtools view

false (default): off-target regions are not excluded when performing samtools view, but are excluded at a later step

Warning: You may not want to set this to true; see GotCloud: FAQs->Targetted/Exome for details.
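
The MULTIPLE_TARGET_MAP file format and the effect of OFFSET_OFF_TARGET can be sketched in Python as follows. The sketch assumes whitespace-separated columns and that the offset extends each target on both sides. The page only says the target region is extended, so the two-sided reading is an assumption, and both function names are hypothetical.

```python
def read_target_map(lines):
    """Map each sample ID to its BED file, one '[SM_ID] [TARGET_BED]' pair per line."""
    bed_by_sample = {}
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected '[SM_ID] [TARGET_BED]'")
        sample_id, bed_path = fields
        bed_by_sample[sample_id] = bed_path
    return bed_by_sample

def extend_targets(intervals, offset=0):
    """Extend (chrom, start, end) intervals by offset bases on both sides (assumed behaviour)."""
    return [(chrom, max(0, start - offset), end + offset) for chrom, start, end in intervals]

# Sample IDs and file names below are invented for the example.
target_map = read_target_map(["SAMPLE_A targets_a.bed", "SAMPLE_B targets_b.bed"])
extended = extend_targets([("1", 100, 200), ("1", 10, 20)], offset=50)
print(target_map, extended)
```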

Path Options

Command-line Flag Configuration Key Value Description Default Value
--makebasename name MAKE_BASE_NAME basename of the Makefile generated by GotCloud umake
--bamprefix prefix BAM_PREFIX path to prepend to relative BAM file paths in the BAM list
--refprefix prefix REF_PREFIX path to prepend to relative reference/resource file paths
--baseprefix prefix BASE_PREFIX path to prepend to relative paths for the BAM list file, PED_INDEX, BAM (if BAM_PREFIX isn't specified), reference/resource files (if REF_PREFIX isn't specified)
--refdir path REF_DIR value to use for REF_DIR key $(GOTCLOUD_ROOT)/gotcloud.ref
--gotcloudroot path GOTCLOUD_ROOT alternate directory in which to find GotCloud bins/scripts (default: determined from the location of the gotcloud/umake.pl script)
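
The fallback between the specific prefixes and BASE_PREFIX described above can be sketched in Python. The function name resolve, the use of a path join rather than plain string concatenation, and the example paths are assumptions made for illustration.

```python
import posixpath

def resolve(path, specific_prefix=None, base_prefix=None):
    """Prepend a prefix to a relative path.

    specific_prefix stands for BAM_PREFIX or REF_PREFIX; base_prefix stands for
    BASE_PREFIX and is used only when the specific prefix is not set.
    """
    if posixpath.isabs(path):
        return path  # absolute paths are left untouched
    prefix = specific_prefix or base_prefix
    if not prefix:
        return path
    return posixpath.join(prefix, path)

# BAM path with BAM_PREFIX set: BAM_PREFIX wins over BASE_PREFIX.
print(resolve("bams/a.bam", "/data/bams", "/data/base"))
# BAM path without BAM_PREFIX: falls back to BASE_PREFIX.
print(resolve("bams/a.bam", None, "/data/base"))
```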

Validation Adjustment Options

Command-line Flag Configuration Key Value Description Default Value
--maxlocaljobs # maximum # of jobs that can run if batchtype is local (to prevent accidentally starting jobs locally that were meant to be on a cluster) 10
--ignoresmcheck IGNORE_SM_CHECK disable the validation that the Sample name in the BAM file matches the one in the BAM list file
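
The --maxlocaljobs guard amounts to a simple check before jobs are started. The Python sketch below is one way to express it. Whether GotCloud stops with an error or reacts differently is not stated here, so raising an exception is an assumption, as is the function name.

```python
def check_local_jobs(numjobs, batchtype="local", maxlocaljobs=10):
    """Refuse to start more than maxlocaljobs jobs when the batch type is local."""
    if batchtype == "local" and numjobs > maxlocaljobs:
        raise ValueError(
            f"{numjobs} jobs requested with batchtype local, "
            f"but at most {maxlocaljobs} local jobs are allowed"
        )
    return numjobs

print(check_local_jobs(4))  # within the default limit of 10
```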

Miscellaneous Options

Command-line Flag Configuration Key Value Description Default Value
--nophonehome disable phonehome in GotCloud and the tools it calls
BAMUTIL_THINNING thinning parameter for bamUtil programs (set to 0 if --nophonehome is specified) --phoneHomeThinning 10


Directory Settings Options

These values set GotCloud output subdirectories (relative paths under the OUT_DIR directory). You should not need to change these from the defaults unless you want to use different sub-directory names.

Configuration Key Value Description Default Value
BAM_GLF_DIR GLF outputs per BAM (if multiple BAMs per sample) (intermediate files) glfs/bams
SM_GLF_DIR GLF outputs per sample (intermediate files) glfs/samples
VCF_DIR unfiltered and filtered VCFs vcfs
PVCF_DIR vcfPileup results (intermediate files) pvcfs
SPLIT_DIR VCFs with PASS variants only & split into multiple files split
BEAGLE_DIR beagle output beagle
SPLIT4_DIR VCFs with PASS variants only & split into multiple files for running beagle4 split4
BEAGLE4_DIR beagle version 4 output beagle4
THUNDER_DIR thunder output thunder
TARGET_DIR directory to store target information when running with a BED file target
GLF_INDEX filename for index file needed for glfflex (file is created by GotCloud) glfIndex.ped
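
Because these values are relative to OUT_DIR, the full output locations follow by joining the two. The Python sketch below uses the defaults from the table. The function name output_dirs, the override mechanism, and the example paths are illustrative assumptions.

```python
import posixpath

# Default sub-directory names, copied from the table above.
DEFAULT_SUBDIRS = {
    "BAM_GLF_DIR": "glfs/bams",
    "SM_GLF_DIR": "glfs/samples",
    "VCF_DIR": "vcfs",
    "PVCF_DIR": "pvcfs",
    "SPLIT_DIR": "split",
    "BEAGLE_DIR": "beagle",
    "SPLIT4_DIR": "split4",
    "BEAGLE4_DIR": "beagle4",
    "THUNDER_DIR": "thunder",
    "TARGET_DIR": "target",
}

def output_dirs(out_dir, overrides=None):
    """Return the full path of each output sub-directory under out_dir."""
    subdirs = dict(DEFAULT_SUBDIRS)
    subdirs.update(overrides or {})
    return {key: posixpath.join(out_dir, rel) for key, rel in subdirs.items()}

dirs = output_dirs("/data/run1", overrides={"VCF_DIR": "my_vcfs"})
print(dirs["VCF_DIR"], dirs["SM_GLF_DIR"])
```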

Tool Options

These values set the binaries GotCloud should use. You should not need to change these from the defaults unless you want to try a different version of one of the tools.

Some tools have their options specified together with the binary command, while others have them set separately or hard-coded.

Configuration Key Program Description Default Value
SAMTOOLS_FOR_PILEUP samtools to use for pileup $(BIN_DIR)/samtools-hybrid
SAMTOOLS_FOR_OTHERS samtools to use for view and calmd $(BIN_DIR)/samtools-hybrid
GLFMERGE merge glf files when there are multiple BAMs per individual $(BIN_DIR)/glfMerge
GLFFLEX perform glf-based variant calling (replacement for glfMultiples) $(BIN_DIR)/glfFlex --minMapQuality 0 --minDepth 1 --maxDepth 10000000 --uniformTsTv --smartFilter
VCFPILEUP vcfPileup to generate rich per-site information $(BIN_DIR)/vcfPileup
INFOCOLLECTOR gather filtering statistics $(BIN_DIR)/infoCollector
VCFMERGE merge multiple VCFs separated by chunk of genomes perl $(SCRIPT_DIR)/bams2vcfMerge.pl
VCFCOOKER vcfCooker program for filtering $(BIN_DIR)/vcfCooker
VCFSUMMARY script to generate summary statistics of discovered sites perl $(SCRIPT_DIR)/vcf-summary
VCFSPLIT splits VCF into overlapping chunks for genotype refinement perl $(SCRIPT_DIR)/vcfSplit.pl
VCFSPLIT4 splits VCF into overlapping chunks for beagle version 4 genotype refinement perl $(SCRIPT_DIR)/vcfSplit4.pl
VCF_SPLIT_CHROM splits VCF into per chromosome VCFs perl $(SCRIPT_DIR)/vcfSplitChr.pl
VCFPASTE generate filtered genotype VCF perl $(SCRIPT_DIR)/vcfPaste.p
BEAGLE beagle program java -Xmx4g -jar $(BIN_DIR)/beagle.20101226.jar seed=993478 gprobs=true niterations=50 lowmem=true
BEAGLE4 beagle version 4 program java -Xmx4g -jar $(BIN_DIR)/b4.r1219.jar seed=993478 gprobs=true
VCF2BEAGLE convert VCF (with PL tag) into beagle input perl $(SCRIPT_DIR)/vcf2Beagle.pl --PL
BEAGLE2VCF convert beagle output to VCF perl $(SCRIPT_DIR)/beagle2Vcf.pl
SVM_SCRIPT SVM script perl $(SCRIPT_DIR)/run_libsvm.pl
SVMLEARN SVM program $(BIN_DIR)/svm-train
SVMCLASSIFY SVM program $(BIN_DIR)/svm-predict
INVNORM SVM program $(BIN_DIR)/invNorm
THUNDER_STATES flags for thunder states and weighted states --states 400 --weightedStates 300
THUNDER MaCH/Thunder genotype refinement step $(BIN_DIR)/thunderVCF -r 30 --phase --dosage --compact --inputPhased $(THUNDER_STATES)
LIGATEVCF ligate multiple phased VCFs while resolving the phase between VCFs perl $(SCRIPT_DIR)/ligateVcf.pl
LIGATEVCF4 ligate multiple phased VCFs while resolving the phase between VCFs perl $(SCRIPT_DIR)/ligateVcf4.pl
VCFCAT concatenate multiple VCFs perl $(SCRIPT_DIR)/vcfCat.pl
BGZIP bgzip program $(BIN_DIR)/bgzip
TABIX tabix program $(BIN_DIR)/tabix
BAMUTIL bam util program $(BIN_DIR)/bam

Options

Configuration Key Program Description Default Value
SLEEP_MULT add sleep time prior to some steps; use only if too many steps are starting at the same time doing the same thing 0
REMOTE_PREFIX add a prefix to paths when sending across to a remote machine

Options

maxABL => "FILTER_MAX_ABL",
maxSTR => "FILTER_MAX_STR",
minSTR => "FILTER_MIN_STR",
winIndel => "FILTER_WIN_INDEL",
maxSTZ => "FILTER_MAX_STZ",
minSTZ => "FILTER_MIN_STZ",
maxAOI => "FILTER_MAX_AOI",
minFIC => "FILTER_MIN_FIC",
maxCBR => "FILTER_MAX_CBR",
maxLQR => "FILTER_MAX_LQR",
minQual => "FILTER_MIN_QUAL",
minMQ => "FILTER_MIN_MQ",
maxMQ0 => "FILTER_MAX_MQ0",
maxMQ30 => "FILTER_MAX_MQ30",
maxAOZ => "FILTER_MAX_AOZ",
maxIOR => "FILTER_MAX_IOR",
Configuration Key Program Description Default Value
SAMTOOLS_VIEW_FILTER
NOBAQ_SUBSTRINGS skip the BAQ step if the BAM filename contains the specified space-separated substrings SOLID
MODEL_GLFSINGLE
MODEL_SKIP_DISCOVER
MODEL_AF_PRIOR
BAM_DEPEND
WGS_SVM
MAKE_OPTS
USE_SVMMODEL
SVM_CUTOFF
SVMMODEL
POS_SAMPLE
NEG_SAMPLE
KEEP_LOG
FILTER_ADDITIONAL
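
The NOBAQ_SUBSTRINGS rule in the table above (skip the BAQ step when the BAM filename contains any of the listed substrings) can be sketched in Python as follows. Case-sensitive matching and the function name skip_baq are assumptions, and the file names are invented.

```python
def skip_baq(bam_filename, nobaq_substrings="SOLID"):
    """Return True if the filename contains any of the space-separated substrings."""
    return any(substring in bam_filename for substring in nobaq_substrings.split())

for name in ["sampleA.SOLID.bam", "sampleB.illumina.bam"]:
    print(name, "skip BAQ" if skip_baq(name) else "run BAQ")
```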