Required Options
Command-line Flag |
Configuration Key |
Value Description |
Default Value
|
--outdir path |
OUT_DIR |
output directory |
|
--list/--bam_list/--bamlist file |
BAM_LIST |
path to the BAM List File |
$(OUT_DIR)/bam.list
|
--numjobs # |
|
number of jobs to run in parallel |
0 (generate Makefile of steps, but do not run)
|
Common Options
|
Command-line Flag |
Configuration Key |
Value Description |
Default Value
|
--conf file |
|
configuration file to use |
|
Cluster Options
Command-line Flag |
Configuration Key |
Value Description |
Default Value
|
--batchtype type |
BATCH_TYPE |
name of cluster type |
local
|
--batchopts opts |
BATCH_OPTS |
options to pass to the cluster command |
|
--copyglf path |
COPY_GLF |
path to copy GLFs to before processing them (a path local to the remote nodes, for example under /tmp) |
|
Test/Debug Options
Command-line Flag |
Configuration Key |
Value Description |
Default Value
|
--help |
|
print help information |
|
--test path |
|
run the snpcall/ldrefine test and write output to the specified path |
|
--verbose |
|
print additional messages when reading the configuration |
|
Reference/Resource Files
Analysis Region Options
Command-line Flag |
Configuration Key |
Value Description |
Default Value
|
--chrs # # |
CHRS |
space-separated list of chromosomes to process |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X
|
--region #:#-# |
|
call only the specified region, skipping the parts of the chromosome outside it
format: chr:start-end (the -end part is optional)
|
|
|
UNIT_CHUNK |
chunk size of SNP calling (GotCloud breaks up each chromosome into regions of this size) |
5000000
|
|
LD_NSNPS |
chunk size (number of SNPs) of genotype refinement |
10000
|
|
LD_OVERLAP |
overlapping # of SNPs between chunks for genotype refinement |
1000
|
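The region and chunking settings above can be illustrated with a minimal Python sketch. It is not GotCloud's implementation: the function names are hypothetical, the chunk boundaries assume 1-based inclusive coordinates, and the refinement windows assume each new window starts LD_NSNPS minus LD_OVERLAP SNPs after the previous one.

```python
def parse_region(text):
    """Parse 'chr:start' or 'chr:start-end' into (chrom, start, end); end may be None."""
    chrom, _, span = text.partition(":")
    if not chrom or not span:
        raise ValueError(f"expected chr:start[-end], got {text!r}")
    start_text, _, end_text = span.partition("-")
    end = int(end_text) if end_text else None
    return chrom, int(start_text), end


def snp_call_chunks(chrom_length, unit_chunk=5_000_000):
    """Split a chromosome into 1-based inclusive regions of at most unit_chunk bases.

    Assumption: this mirrors the UNIT_CHUNK description, not GotCloud's exact boundaries.
    """
    return [
        (start, min(start + unit_chunk - 1, chrom_length))
        for start in range(1, chrom_length + 1, unit_chunk)
    ]


def refinement_windows(n_snps, ld_nsnps=10_000, ld_overlap=1_000):
    """Return 0-based half-open SNP index windows that share ld_overlap SNPs.

    Assumption: consecutive windows start ld_nsnps - ld_overlap SNPs apart.
    """
    if ld_overlap >= ld_nsnps:
        raise ValueError("LD_OVERLAP must be smaller than LD_NSNPS")
    windows = []
    start = 0
    while start < n_snps:
        end = min(start + ld_nsnps, n_snps)
        windows.append((start, end))
        if end == n_snps:
            break
        start += ld_nsnps - ld_overlap
    return windows
```

With the defaults, a 12,000,000-base chromosome yields three SNP-calling regions, and 25,000 SNPs yield three refinement windows whose neighbours overlap by 1,000 SNPs.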
Chromosome X Calling
For proper Chromosome X calling, it is recommended to specify a PED file with sex information:
Configuration Key |
Value Description
|
PED_INDEX |
ped file containing sampleID (2nd column) and sex (5th column)
|
Format of PED file:
familyID sampleID fatherID motherID sex
- Only sampleID and sex are used
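As an illustration of the format above, the following sketch reads the two columns that are used. The function name is hypothetical, skipping blank lines and lines starting with "#" is an assumption, and the sex code is returned unchanged because its encoding is not stated here (PED files conventionally use 1 for male and 2 for female).

```python
def read_sample_sex(ped_lines):
    """Map sampleID (2nd column) to the raw sex code (5th column)."""
    sex_by_sample = {}
    for line in ped_lines:
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no sample data.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"expected at least 5 columns, got: {line!r}")
        sex_by_sample[fields[1]] = fields[4]
    return sex_by_sample
```

For example, the lines `FAM1 NA12878 0 0 2` and `FAM2 NA12891 0 0 1` (made-up rows) map NA12878 to "2" and NA12891 to "1".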
Targeted/Exome Sequencing Settings
For Targeted/Exome Sequencing, specify the applicable settings:
Configuration Key |
Value Description
|
UNIFORM_TARGET_BED |
Bed file of targeted regions (same bed for all samples)
|
MULTIPLE_TARGET_MAP |
Filename of file mapping: sample id -> bed file of targeted regions
Each line of the file contains: [SM_ID] [TARGET_BED]
|
OFFSET_OFF_TARGET |
Number of bases by which to extend the target region
(default is 0, do not extend the target region)
|
SAMTOOLS_VIEW_TARGET_ONLY |
true: speeds up processing by excluding off-target regions initially when performing samtools view
false (default): off-target regions are not excluded when performing samtools view, but are excluded at a later step
Warning: setting this to true is not always advisable, because it may:
- make command line too long
- produce an error if reads overlap multiple targeted regions
|
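The MULTIPLE_TARGET_MAP layout and the effect of OFFSET_OFF_TARGET can be sketched as follows. Both helpers are hypothetical, and extending the region by the offset on both sides of each interval is an assumption about how the setting is applied. BED intervals are treated as 0-based and half-open.

```python
def read_target_map(lines):
    """Map each sample ID to its target BED path, one '[SM_ID] [TARGET_BED]' pair per line."""
    target_by_sample = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected '[SM_ID] [TARGET_BED]', got: {line!r}")
        sample_id, bed_path = fields
        target_by_sample[sample_id] = bed_path
    return target_by_sample


def extend_target(start, end, offset=0):
    """Extend a BED interval by offset bases on each side, never going below 0.

    Assumption: the extension is symmetric; the default of 0 leaves the interval unchanged.
    """
    return max(0, start - offset), end + offset
```

With an offset of 50, the interval 100-200 becomes 50-250, and an interval near the start of a chromosome is clamped at 0.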
Path Options
Command-line Flag |
Configuration Key |
Value Description |
Default Value
|
--makebasename name |
MAKE_BASE_NAME |
basename of the Makefile generated by GotCloud |
umake
|
--bamprefix prefix |
BAM_PREFIX |
path to prepend to relative BAM file paths in the BAM list |
|
--refprefix prefix |
REF_PREFIX |
path to prepend to relative reference/resource file paths |
|
--baseprefix prefix |
BASE_PREFIX |
path to prepend to relative paths for the BAM list file, PED_INDEX, BAM (if BAM_PREFIX isn't specified), reference/resource files (if REF_PREFIX isn't specified) |
|
--refdir path |
REF_DIR |
value to use for REF_DIR key |
$(GOTCLOUD_ROOT)/gotcloud.ref
|
--gotcloudroot path |
GOTCLOUD_ROOT |
specify to use a different directory for finding GotCloud bins/scripts |
based on the location of the gotcloud/umake.pl script
|
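The prefix rules in this table amount to a fallback chain: a relative path takes its specific prefix (BAM_PREFIX or REF_PREFIX) when one is set, and BASE_PREFIX otherwise. A minimal sketch of that reading, with a hypothetical function name and the assumption that absolute paths are left untouched:

```python
import posixpath


def resolve_path(path, specific_prefix="", base_prefix=""):
    """Prepend the specific prefix, or the base prefix as a fallback, to a relative path."""
    if posixpath.isabs(path):
        return path  # assumption: absolute paths are never prefixed
    prefix = specific_prefix or base_prefix
    return posixpath.join(prefix, path) if prefix else path
```

For example, `resolve_path("bams/a.bam", "", "/base")` falls back to the base prefix and gives `/base/bams/a.bam`, while a non-empty specific prefix takes precedence over it.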
Validation Adjustment Options
Command-line Flag |
Configuration Key |
Value Description |
Default Value
|
--maxlocaljobs # |
|
maximum # of jobs that can run if batchtype is local (to prevent accidentally starting jobs locally that were meant to be on a cluster) |
10
|
--ignoresmcheck |
IGNORE_SM_CHECK |
disable the validation that the Sample name in the BAM file matches the one in the BAM list file |
|
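The --maxlocaljobs guard can be pictured as a simple pre-flight check. This is only a sketch of the behaviour described in the table; the function name and error message are hypothetical, and GotCloud's actual check may differ.

```python
def check_local_jobs(batch_type, num_jobs, max_local_jobs=10):
    """Refuse to start more than max_local_jobs jobs when the batch type is 'local'."""
    if batch_type == "local" and num_jobs > max_local_jobs:
        raise ValueError(
            f"{num_jobs} local jobs requested, but at most {max_local_jobs} are allowed; "
            "raise --maxlocaljobs or set a cluster --batchtype"
        )
    return num_jobs
```

A request for 50 jobs with the default local batch type is rejected, while the same request with a cluster batch type passes through unchanged.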
Miscellaneous Options
Command-line Flag |
Configuration Key |
Value Description |
Default Value
|
--nophonehome |
|
disable phonehome in GotCloud and the tools it calls |
|
|
BAMUTIL_THINNING |
thinning parameter for bamUtil programs (set to 0 if --nophonehome is specified) |
--phoneHomeThinning 10
|
Directory Settings Options
These values set GotCloud output subdirectories (relative paths under the OUT_DIR directory). You should not need to change these from the defaults unless you want to use different sub-directory names.
Configuration Key |
Value Description |
Default Value
|
BAM_GLF_DIR |
GLF outputs per BAM (if multiple BAMs per sample) (intermediate files) |
glfs/bams
|
SM_GLF_DIR |
GLF outputs per sample (intermediate files) |
glfs/samples
|
VCF_DIR |
unfiltered and filtered VCFs |
vcfs
|
PVCF_DIR |
vcfPileup results (intermediate files) |
pvcfs
|
SPLIT_DIR |
VCFs with PASS variants only & split into multiple files |
split
|
BEAGLE_DIR |
beagle output |
beagle
|
SPLIT4_DIR |
VCFs with PASS variants only & split into multiple files for running beagle4 |
split4
|
BEAGLE4_DIR |
beagle version 4 output |
beagle4
|
THUNDER_DIR |
thunder output |
thunder
|
TARGET_DIR |
directory to store target information when running with a BED file |
target
|
GLF_INDEX |
filename for index file needed for glfflex (file is created by GotCloud) |
glfIndex.ped
|
Tool Options
These values set the binaries GotCloud should use. You should not need to change these from the defaults unless you want to try a different version of one of the tools.
Some tools have their options specified together with the binary command, while others have them in a separate key or hard-coded.
Configuration Key |
Program Description |
Default Value
|
SAMTOOLS_FOR_PILEUP |
samtools to use for pileup |
$(BIN_DIR)/samtools-hybrid
|
SAMTOOLS_FOR_OTHERS |
samtools to use for view and calmd |
$(BIN_DIR)/samtools-hybrid
|
GLFMERGE |
merge GLF files when there are multiple BAMs per individual |
$(BIN_DIR)/glfMerge
|
GLFFLEX |
perform glf-based variant calling (replacement for glfMultiples) |
$(BIN_DIR)/glfFlex --minMapQuality 0 --minDepth 1 --maxDepth 10000000 --uniformTsTv --smartFilter
|
VCFPILEUP |
vcfPileup to generate rich per-site information |
$(BIN_DIR)/vcfPileup
|
INFOCOLLECTOR |
gather filtering statistics |
$(BIN_DIR)/infoCollector
|
VCFMERGE |
merge multiple VCFs that were split by genome chunk |
perl $(SCRIPT_DIR)/bams2vcfMerge.pl
|
VCFCOOKER |
vcfCooker program for filtering |
$(BIN_DIR)/vcfCooker
|
VCFSUMMARY |
script to generate summary statistics of discovered sites |
perl $(SCRIPT_DIR)/vcf-summary
|
VCFSPLIT |
splits VCF into overlapping chunks for genotype refinement |
perl $(SCRIPT_DIR)/vcfSplit.pl
|
VCFSPLIT4 |
splits VCF into overlapping chunks for beagle version 4 genotype refinement |
perl $(SCRIPT_DIR)/vcfSplit4.pl
|
VCF_SPLIT_CHROM |
splits VCF into per chromosome VCFs |
perl $(SCRIPT_DIR)/vcfSplitChr.pl
|
VCFPASTE |
generate filtered genotype VCF |
perl $(SCRIPT_DIR)/vcfPaste.p
|
BEAGLE |
beagle program |
java -Xmx4g -jar $(BIN_DIR)/beagle.20101226.jar seed=993478 gprobs=true niterations=50 lowmem=true
|
BEAGLE4 |
beagle version 4 program |
java -Xmx4g -jar $(BIN_DIR)/b4.r1219.jar seed=993478 gprobs=true
|
VCF2BEAGLE |
convert VCF (with PL tag) into beagle input |
perl $(SCRIPT_DIR)/vcf2Beagle.pl --PL
|
BEAGLE2VCF |
convert beagle output to VCF |
perl $(SCRIPT_DIR)/beagle2Vcf.pl
|
SVM_SCRIPT |
SVM script |
perl $(SCRIPT_DIR)/run_libsvm.pl
|
SVMLEARN |
SVM program |
$(BIN_DIR)/svm-train
|
SVMCLASSIFY |
SVM program |
$(BIN_DIR)/svm-predict
|
INVNORM |
SVM program |
$(BIN_DIR)/invNorm
|
THUNDER_STATES |
flags for thunder states and weighted states |
--states 400 --weightedStates 300
|
THUNDER |
MaCH/Thunder genotype refinement step |
$(BIN_DIR)/thunderVCF -r 30 --phase --dosage --compact --inputPhased $(THUNDER_STATES)
|
LIGATEVCF |
ligate multiple phased VCFs while resolving the phase between VCFs |
perl $(SCRIPT_DIR)/ligateVcf.pl
|
LIGATEVCF4 |
ligate multiple phased VCFs while resolving the phase between VCFs |
perl $(SCRIPT_DIR)/ligateVcf4.pl
|
VCFCAT |
concatenate multiple VCFs |
perl $(SCRIPT_DIR)/vcfCat.pl
|
BGZIP |
bgzip program |
$(BIN_DIR)/bgzip
|
TABIX |
tabix program |
$(BIN_DIR)/tabix
|
BAMUTIL |
bam util program |
$(BIN_DIR)/bam
|
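Many of the defaults above refer to other keys with the `$(KEY)` syntax, for example `$(BIN_DIR)` and `$(THUNDER_STATES)`. The sketch below shows one way such references could be expanded. It is not GotCloud's own parser: the function name, the depth limit, and the choice to leave unknown keys untouched are all assumptions, and the BIN_DIR value in the usage note is made up.

```python
import re

# Matches references of the form $(KEY).
_REFERENCE = re.compile(r"\$\((\w+)\)")


def expand(value, settings, _depth=0):
    """Recursively replace $(KEY) references with values from settings."""
    if _depth > 20:
        raise ValueError("configuration references are nested too deeply or circular")

    def lookup(match):
        key = match.group(1)
        if key not in settings:
            return match.group(0)  # assumption: unknown keys are left as written
        return expand(settings[key], settings, _depth + 1)

    return _REFERENCE.sub(lookup, value)
```

Given a hypothetical `BIN_DIR` of `/opt/gotcloud/bin` and the THUNDER_STATES default from the table, expanding the THUNDER default yields a single command line that starts with `/opt/gotcloud/bin/thunderVCF` and ends with `--states 400 --weightedStates 300`.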
Options
Configuration Key |
Program Description |
Default Value
|
|
SLEEP_MULT |
add sleep time prior to some steps; use only if too many steps are starting at the same time doing the same thing |
0
|
|
REMOTE_PREFIX |
add a prefix to paths when sending across to a remote machine |
|
Options
maxABL => "FILTER_MAX_ABL",
maxSTR => "FILTER_MAX_STR",
minSTR => "FILTER_MIN_STR",
winIndel => "FILTER_WIN_INDEL",
maxSTZ => "FILTER_MAX_STZ",
minSTZ => "FILTER_MIN_STZ",
maxAOI => "FILTER_MAX_AOI",
minFIC => "FILTER_MIN_FIC",
maxCBR => "FILTER_MAX_CBR",
maxLQR => "FILTER_MAX_LQR",
minQual => "FILTER_MIN_QUAL",
minMQ => "FILTER_MIN_MQ",
maxMQ0 => "FILTER_MAX_MQ0",
maxMQ30 => "FILTER_MAX_MQ30",
maxAOZ => "FILTER_MAX_AOZ",
maxIOR => "FILTER_MAX_IOR",
Configuration Key |
Program Description |
Default Value
|
|
SAMTOOLS_VIEW_FILTER |
|
|
|
NOBAQ_SUBSTRINGS |
skip the BAQ step if the BAM filename contains the specified space-separated substrings |
SOLID
|
|
MODEL_GLFSINGLE |
|
|
|
MODEL_SKIP_DISCOVER |
|
|
|
MODEL_AF_PRIOR |
|
|
|
BAM_DEPEND |
|
|
|
WGS_SVM |
|
|
|
MAKE_OPTS |
|
|
|
USE_SVMMODEL |
|
|
SVM_CUTOFF |
|
|
SVMMODEL |
|
|
|
POS_SAMPLE |
|
|
|
NEG_SAMPLE |
|
|
|
KEEP_LOG |
|
|
|
FILTER_ADDITIONAL |
|
|