GotCloud: Variant Calling Pipeline

The Variant Calling Pipeline (UMAKE) takes recalibrated BAM files, detects SNPs, calls their genotypes, and produces VCF files.

Input Data:

Aligned/Processed/Recalibrated BAM files
Index file containing Sample IDs & BAM file names
Reference files
(Optional) Configuration file to override default options

BAM files

The BAM files need to be duplicate-marked and base-quality recalibrated in order to obtain high quality SNP calls.

FASTQs can be converted to this type of BAM using the Mapping Pipeline.

Additional input files may be provided: pedigree files (PED format), which specify gender information for chrX calling, and target information (UCSC's BED format) for targeted or whole-exome capture sequencing.

The configuration file contains the core run-time options, including the software binaries and command-line arguments. Refer to the example configuration file for further information.

Index File

Each line of the index file describes one individual in the following format. Multiple BAMs may be provided per individual.

[SAMPLE_ID]    [COMMA SEPARATED POPULATION LABELS] [BAM_FILE1] [BAM_FILE2] ...

Columns:

sample id
comma separated population labels
BAM File 1
BAM File 2 (if applicable)
...
BAM File N
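
The index format above can be read with a few lines of Python. This is a minimal sketch, not part of the pipeline: it assumes columns are separated by tabs or spaces (so file paths must not contain whitespace), and the names `IndexEntry`, `parse_index_line`, and `parse_index`, as well as the sample IDs, labels, and paths, are hypothetical.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    sample_id: str
    populations: List[str]
    bam_files: List[str]


def parse_index_line(line: str) -> IndexEntry:
    """Parse one index line: sample ID, population labels, then one or more BAMs."""
    fields = line.split()  # assumption: columns are separated by tabs or spaces
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}: {line!r}")
    sample_id, labels = fields[0], fields[1]
    return IndexEntry(sample_id, labels.split(","), fields[2:])


def parse_index(text: str) -> List[IndexEntry]:
    """Parse the contents of an index file, skipping blank lines."""
    return [parse_index_line(ln) for ln in text.splitlines() if ln.strip()]


# Hypothetical sample IDs, labels, and paths, for illustration only.
example = (
    "NA12878\tCEU,EUR\tbams/NA12878.run1.bam\tbams/NA12878.run2.bam\n"
    "NA19240\tYRI\tbams/NA19240.bam\n"
)
entries = parse_index(example)
for entry in entries:
    print(entry.sample_id, entry.populations, len(entry.bam_files))
```

Running a check like this before starting the pipeline can catch lines that are missing the population column or list no BAM file.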

Reference Files

Reference files are required for doing Variant Calling.

See Configuration Files: Reference Files for information on how to specify the reference files in the configuration.

Configuration File

A default configuration file is automatically loaded. Users must provide their own configuration file, which needs to contain only the values that differ from the defaults.

Comments begin with a #

Format: KEY = value

Where KEY is the item being set and value is its new value
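
To illustrate the KEY = value format and how user values override defaults, here is a minimal Python sketch. It is not the pipeline's own parser: `parse_config` and `merge_configs` are hypothetical names, the sample values are invented, and the sketch assumes that a # may also start a trailing comment after a value. It does not expand variables such as $(UMAKE_ROOT).

```python
from typing import Dict


def parse_config(text: str) -> Dict[str, str]:
    """Read KEY = value lines, ignoring blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        # Assumption: '#' starts a comment anywhere on the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"not a KEY = value line: {raw!r}")
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def merge_configs(defaults: Dict[str, str], user: Dict[str, str]) -> Dict[str, str]:
    """Return the defaults with every user-supplied key overriding them."""
    merged = dict(defaults)
    merged.update(user)
    return merged


# Hypothetical values, for illustration only.
default_text = "REF = ref/default.fa\nOFFSET_OFF_TARGET = 0\n"
user_text = "# my run\nCHRS = 20 22\nOFFSET_OFF_TARGET = 50  # extend targets\n"

config = merge_configs(parse_config(default_text), parse_config(user_text))
for key in sorted(config):
    print(key, "=", config[key])
```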

Required User Config File Settings

The following Config File Settings must be specified by the user:

CHRS = # space separated list of chromosomes you want
BAM_INDEX = # path to the Index File of BAMs

Required on Command-Line or in Config File

The following Command-Line or Config File Settings must be specified by the user:

--outdir/OUTDIR= # path to desired output directory
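
The required settings above can be verified before a run. The following Python sketch is only an illustration under stated assumptions: `missing_settings` is a hypothetical helper, the configuration is assumed to be already loaded into a dictionary, and the output directory is accepted either from the command line or from the OUTDIR key.

```python
from typing import Dict, List, Optional

# Settings that must appear in the user configuration file.
REQUIRED_IN_CONFIG = ("CHRS", "BAM_INDEX")


def missing_settings(config: Dict[str, str], cli_outdir: Optional[str] = None) -> List[str]:
    """List required settings that are absent or empty."""
    missing = [key for key in REQUIRED_IN_CONFIG if not config.get(key, "").strip()]
    # The output directory may come from --outdir or from OUTDIR in the config.
    if not cli_outdir and not config.get("OUTDIR", "").strip():
        missing.append("OUTDIR")
    return missing


# Hypothetical configuration values, for illustration only.
config = {"CHRS": "20 22", "BAM_INDEX": "bams/index.txt"}
print(missing_settings(config))
print(missing_settings(config, cli_outdir="out"))
```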

Targeted/Exome Sequencing Settings

If you are running Targeted/Exome Sequencing, specify the following:

Write loci file when performing pileup
- WRITE_TARGET_LOCI = TRUE
Specify the directory to store target information, for example: targetDir
- TARGET_DIR = targetDir

If all individuals have the same target:

Specify the single bed file, for example: target.bed
- UNIFORM_TARGET_BED = target.bed

If not all individuals have the same target:

Specify the file containing the sample id -> bed map, for example: targetMap.txt
- MULTIPLE_TARGET_MAP = targetMap.txt
  - Each line of the file contains [SM_ID] [TARGET_BED]
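
As an illustration of the [SM_ID] [TARGET_BED] map format, the Python sketch below reads such a file and reports samples that have no target assigned. It is an assumption-laden example rather than pipeline code: `parse_target_map` and `samples_without_target` are hypothetical names, columns are assumed to be whitespace-separated, and the cross-check against sample IDs is a suggested sanity check, not a documented pipeline step.

```python
from typing import Dict, Iterable, List


def parse_target_map(text: str) -> Dict[str, str]:
    """Map each sample ID to its target BED file."""
    mapping: Dict[str, str] = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split()  # assumption: whitespace-separated columns
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected [SM_ID] [TARGET_BED], got {line!r}")
        sample_id, bed_file = fields
        mapping[sample_id] = bed_file
    return mapping


def samples_without_target(sample_ids: Iterable[str], target_map: Dict[str, str]) -> List[str]:
    """Return the sample IDs that have no entry in the target map."""
    return [sample for sample in sample_ids if sample not in target_map]


# Hypothetical sample IDs and BED paths, for illustration only.
target_text = "NA12878\ttargets/kitA.bed\nNA19240\ttargets/kitB.bed\n"
target_map = parse_target_map(target_text)
print(samples_without_target(["NA12878", "NA19240", "NA18507"], target_map))
```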

Optional Settings:

Extend the target region by a given number of bases, for example: 50
- OFFSET_OFF_TARGET = 50
Exclude off-target regions when using samtools view (may make command line too long)
- SAMTOOLS_VIEW_TARGET_ONLY = TRUE

Reference Files

Reference Sequence in fasta format.
- REF = path/file.fa
Indel VCF File Prefix
- INDEL_PREFIX = path/indels.sites.hg19
- path/ contains indels.sites.hg19.chr20.vcf for each chromosome being processed
DBSNP File Prefix
- DBSNP_PREFIX = path/dbsnp_135_b37.rod
- path/ contains dbsnp_135_b37.rod.chr20.map for each chromosome being processed
HapMap3 polymorphic site prefix
- HM3_PREFIX = path/hapmap3.qc.poly
- path/ contains hapmap3.qc.poly.chr20.bim & hapmap3.qc.poly.chr20.frq for each chromosome being processed

Can be downloaded from: FTP Download of Full Resource Files

INDEL_PREFIX = $(UMAKE_ROOT)/ref/indels/1kg.pilot_release.merged.indels.sites.hg19 # 1000 Genomes Pilot 1 indel VCF prefix
DBSNP_PREFIX = $(UMAKE_ROOT)/ref/dbSNP/dbsnp_135_b37.rod # dbSNP file prefix
HM3_PREFIX = $(UMAKE_ROOT)/ref/HapMap3/hapmap3_r3_b37_fwd.consensus.qc.poly # HapMap3 polymorphic site prefix
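
Because each prefix expands to one or more per-chromosome files, it can help to list the expected file names and check that they exist before running. The Python sketch below follows the naming shown above (`.chrN.vcf`, `.chrN.map`, `.chrN.bim`, and `.chrN.frq`); the helper names `reference_paths` and `missing_files` are hypothetical, and the prefixes are the placeholder paths from this section.

```python
import os
from typing import Dict, List


def reference_paths(chrom: str, indel_prefix: str, dbsnp_prefix: str, hm3_prefix: str) -> Dict[str, List[str]]:
    """Expand each prefix to the per-chromosome files described above."""
    return {
        "INDEL_PREFIX": [f"{indel_prefix}.chr{chrom}.vcf"],
        "DBSNP_PREFIX": [f"{dbsnp_prefix}.chr{chrom}.map"],
        "HM3_PREFIX": [
            f"{hm3_prefix}.chr{chrom}.bim",
            f"{hm3_prefix}.chr{chrom}.frq",
        ],
    }


def missing_files(paths: Dict[str, List[str]]) -> List[str]:
    """Return every expected file that is not present on disk."""
    return [p for group in paths.values() for p in group if not os.path.exists(p)]


# Placeholder prefixes taken from the examples in this section.
paths = reference_paths(
    "20",
    indel_prefix="path/indels.sites.hg19",
    dbsnp_prefix="path/dbsnp_135_b37.rod",
    hm3_prefix="path/hapmap3.qc.poly",
)
for setting, files in paths.items():
    print(setting, files)
print("missing:", missing_files(paths))
```

Calling `reference_paths` once per chromosome listed in CHRS gives the full set of files the run is expected to need.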

Chromosome X Calling

Specify the PED file that provides gender information for chrX calling, for example: pedfile.ped
- PED_INDEX = pedfile.ped

Running

Running umake is straightforward:

cd ~/myseq
/usr/local/biopipe/bin/umake --conf myconf ???
make -f [out-prefix].Makefile -j [# parallel jobs]
