GotCloud: Variant Calling Pipeline
The Variant Calling Pipeline (UMAKE) takes recalibrated BAM files, detects SNPs, and calls their genotypes, producing VCF files.
Input Data:
- Aligned/Processed/Recalibrated BAM files
- Index file containing Sample IDs & BAM file names
- Reference files
- (Optional) Configuration file to override default options
BAM files
The BAM files need to be duplicate-marked and base-quality recalibrated in order to obtain high quality SNP calls.
FASTQs can be converted to this type of BAM using the Mapping Pipeline.
Additional input files may be provided: pedigree files (PED format) to specify gender information for chrX calling, and target information (UCSC's BED format) for targeted or whole exome capture sequencing.
The configuration file contains the core run-time options, including the software binaries and command-line arguments. Refer to the example configuration file for further information.
Index File
Each line of the index file describes one individual in the following format. Multiple BAMs may be provided per individual.
[SAMPLE_ID] [COMMA SEPARATED POPULATION LABELS] [BAM_FILE1] [BAM_FILE2] ...
Columns:
- sample id
- comma separated population labels
- BAM File 1
- BAM File 2 (if applicable)
- ...
- BAM File N
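As a rough illustration of the format above, the following Python sketch splits one index line into its columns. It assumes the columns are separated by whitespace and that BAM paths contain no spaces. The function name `parse_index_line`, the sample ID, the population labels, and the BAM paths are hypothetical and not part of GotCloud.

```python
def parse_index_line(line):
    """Split one index-file line into (sample_id, populations, bam_files).

    Assumption: columns are whitespace-separated and paths contain no spaces.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(
            "expected a sample id, population labels, and at least one BAM file"
        )
    sample_id, labels = fields[0], fields[1]
    populations = labels.split(",")  # labels are comma separated
    bam_files = fields[2:]           # one or more BAMs per individual
    return sample_id, populations, bam_files


# Hypothetical example line: one sample, two population labels, two BAM files.
example = "Sample1 ALL,EUR bams/Sample1.run1.bam bams/Sample1.run2.bam"
sample_id, populations, bam_files = parse_index_line(example)
print(sample_id, populations, bam_files)
```

A check like this can catch lines with a missing column before the pipeline is started.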
Reference Files
Reference files are required for variant calling.
See Configuration Files: Reference Files for information on how to specify the reference files in the configuration.
Configuration File
A default configuration file is loaded automatically. Users must supply their own configuration file containing only the values that differ from the defaults.
Comments begin with a #
Format: KEY = value
Where KEY is the item being set and value is its new value
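To make the KEY = value format concrete, here is a minimal Python sketch that reads such lines into a dictionary. It is not how umake itself parses its configuration. It assumes that `#` never appears inside a value and that a later setting overrides an earlier one with the same key. The function name `parse_config` and the example values are hypothetical.

```python
def parse_config(text):
    """Parse KEY = value lines, ignoring '#' comments, into a dict."""
    settings = {}
    for raw_line in text.splitlines():
        # Drop everything from the first '#' onward (full-line or trailing comment).
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical user configuration text.
example_conf = """
# Only values that differ from the defaults
CHRS = 20 X
BAM_INDEX = bams/index.txt   # path to the Index File of BAMs
"""
settings = parse_config(example_conf)
print(settings)
```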
Required User Config File Settings
The following Config File Settings must be specified by the user:
- CHRS = # space separated list of chromosomes you want
- BAM_INDEX = # path to the Index File of BAMs
Required on Command-Line or in Config File
The following Command-Line or Config File Settings must be specified by the user:
- --outdir (command line) or OUTDIR = (config file) # path to desired output directory
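The required settings listed above can be checked before a run with a small helper such as the following Python sketch. It assumes the configuration has already been read into a dictionary of strings. The names `REQUIRED_IN_CONFIG` and `missing_settings` are hypothetical and not part of umake.

```python
# Keys that must appear in the user's configuration file.
REQUIRED_IN_CONFIG = ("CHRS", "BAM_INDEX")


def missing_settings(config, outdir_on_command_line=False):
    """Return the required keys that are absent or empty in config.

    OUTDIR is only required in the config when --outdir is not
    given on the command line.
    """
    required = list(REQUIRED_IN_CONFIG)
    if not outdir_on_command_line:
        required.append("OUTDIR")
    return [key for key in required if not config.get(key)]


# Hypothetical configuration with no OUTDIR entry.
config = {"CHRS": "20", "BAM_INDEX": "bams/index.txt"}
print(missing_settings(config))
print(missing_settings(config, outdir_on_command_line=True))
```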
Targeted/Exome Sequencing Settings
If you are running Targeted/Exome Sequencing, you should specify:
- Write loci file when performing pileup
- WRITE_TARGET_LOCI = TRUE
- Specify the directory to store target information, for example: targetDir
- TARGET_DIR = targetDir
If all individuals have the same target:
- Specify the single bed file, for example: target.bed
- UNIFORM_TARGET_BED = target.bed
If not all individuals have the same target:
- Specify the file containing the sample id -> bed map, for example: targetMap.txt
- MULTIPLE_TARGET_MAP = targetMap.txt
- Each line of the file contains [SM_ID] [TARGET_BED]
Optional Settings:
- Extend the target region by a given number of bases, for example: 50
- OFFSET_OFF_TARGET = 50
- Exclude off-target regions when using samtools view (may make command line too long)
- SAMTOOLS_VIEW_TARGET_ONLY = TRUE
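The targeted/exome settings above can be assembled programmatically, as in this Python sketch. It treats the single-BED and per-sample-map cases as mutually exclusive, which is an assumption drawn from the two cases described above. The function name `target_settings` and its parameters are hypothetical, while the setting names and example values come from the list above.

```python
def target_settings(target_dir, uniform_bed=None, target_map=None, offset=None):
    """Return config lines for targeted/exome calling as a list of strings."""
    # Assumption: exactly one of the two target descriptions is given.
    if (uniform_bed is None) == (target_map is None):
        raise ValueError("give exactly one of uniform_bed or target_map")
    lines = ["WRITE_TARGET_LOCI = TRUE", f"TARGET_DIR = {target_dir}"]
    if uniform_bed is not None:
        lines.append(f"UNIFORM_TARGET_BED = {uniform_bed}")
    else:
        lines.append(f"MULTIPLE_TARGET_MAP = {target_map}")
    if offset is not None:
        # Optional: extend the target region by this many bases.
        lines.append(f"OFFSET_OFF_TARGET = {offset}")
    return lines


lines = target_settings("targetDir", uniform_bed="target.bed", offset=50)
print("\n".join(lines))
```

The printed lines can be pasted into the user configuration file.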
Reference Files
- Reference Sequence in fasta format.
- REF = path/file.fa
- Indel VCF File Prefix
- INDEL_PREFIX = path/indels.sites.hg19
- path/ contains one file per chromosome being processed, for example: indels.sites.hg19.chr20.vcf
- DBSNP File Prefix
- DBSNP_PREFIX = path/dbsnp_135_b37.rod
- path/ contains one file per chromosome being processed, for example: dbsnp_135_b37.rod.chr20.map
- HapMap3 polymorphic site prefix
- HM3_PREFIX = path/hapmap3.qc.poly
- path/ contains two files per chromosome being processed, for example: hapmap3.qc.poly.chr20.bim & hapmap3.qc.poly.chr20.frq
These files can be downloaded from: FTP Download of Full Resource Files
INDEL_PREFIX = $(UMAKE_ROOT)/ref/indels/1kg.pilot_release.merged.indels.sites.hg19 # 1000 Genomes Pilot 1 indel VCF prefix
DBSNP_PREFIX = $(UMAKE_ROOT)/ref/dbSNP/dbsnp_135_b37.rod # dbSNP file prefix
HM3_PREFIX = $(UMAKE_ROOT)/ref/HapMap3/hapmap3_r3_b37_fwd.consensus.qc.poly # HapMap3 polymorphic site prefix
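Because each prefix stands for a set of per-chromosome files, it can help to list the file names a given chromosome implies. The Python sketch below follows the naming pattern of the chr20 examples above (prefix, then `.chr`, the chromosome, and an extension). The function name `expected_reference_files` is hypothetical, and the assumption that every chromosome follows the same pattern as chr20 is mine.

```python
def expected_reference_files(chrom, indel_prefix, dbsnp_prefix, hm3_prefix):
    """List the per-chromosome files implied by each prefix for one chromosome."""
    return [
        f"{indel_prefix}.chr{chrom}.vcf",  # indel sites
        f"{dbsnp_prefix}.chr{chrom}.map",  # dbSNP positions
        f"{hm3_prefix}.chr{chrom}.bim",    # HapMap3 polymorphic sites
        f"{hm3_prefix}.chr{chrom}.frq",    # HapMap3 frequencies
    ]


files = expected_reference_files(
    "20",
    "path/indels.sites.hg19",
    "path/dbsnp_135_b37.rod",
    "path/hapmap3.qc.poly",
)
for name in files:
    print(name)
```

Checking each returned name with `os.path.exists` before a run would flag a chromosome listed in CHRS that has no matching reference files.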
Chromosome X Calling
- Specify the PED file that provides gender information, for example: pedfile.ped
- PED_INDEX = pedfile.ped
Running
Running umake is straightforward:
cd ~/myseq
/usr/local/biopipe/bin/umake --conf myconf ???
make -f [out-prefix].Makefile -j [# parallel jobs]
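When the make step is launched from a script, the command can be built as an argument list, as in this Python sketch. It only constructs and prints the command and does not run it. The function name `make_command` and the example prefix `out/myrun` are hypothetical.

```python
import shlex


def make_command(out_prefix, jobs):
    """Build the make invocation for the Makefile that umake generated."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["make", "-f", f"{out_prefix}.Makefile", "-j", str(jobs)]


# Hypothetical output prefix and job count.
cmd = make_command("out/myrun", 4)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to start the run.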