GotCloud: Variant Calling Pipeline
Revision as of 00:50, 6 November 2012
The Variant Calling Pipeline (UMAKE) takes recalibrated BAM files, detects SNPs, calls their genotypes, and produces VCF files.
Input Data:
- Aligned/Processed/Recalibrated BAM files
- Index file containing Sample IDs & BAM file names
- Reference files
- (Optional) Configuration file to override default options
BAM files
Index file
Reference Files
Configuration File
A default configuration file is loaded automatically. Users must provide their own configuration file, which needs to contain only the values that differ from the defaults.
Comments begin with a #
Format: KEY = value
Where KEY is the item being set and value is its new value
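The format above can be illustrated with a minimal Python sketch. This is not the parser UMAKE itself uses; the function name parse_config and the example path are hypothetical, and the sketch assumes that a # anywhere on a line starts a comment and that values never contain a # character.

```python
def parse_config(text):
    """Parse 'KEY = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        # Assumption: everything from the first '#' onward is a comment.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"expected KEY = value, got: {raw_line!r}")
        key, value = line.split("=", 1)
        # A later assignment of the same key overrides an earlier one.
        settings[key.strip()] = value.strip()
    return settings


example = """
# chromosomes to call
CHRS = 20 X
BAM_INDEX = bams/index.txt  # hypothetical path
"""
config = parse_config(example)
print(config)
```

Because later assignments win, the same function could be applied to the default settings followed by the user's overrides, which mirrors the "defaults plus differences" behaviour described above.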
Required User Config File Settings
The following Config File Settings must be specified by the user:
- CHRS = # space separated list of chromosomes you want
- BAM_INDEX = # path to the Index File of BAMs
Required on Command-Line or in Config File
The following Command-Line or Config File Settings must be specified by the user:
- --outdir/OUTDIR= # path to desired output directory
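As a rough illustration of the two groups of required settings, the following Python sketch reports which ones are still missing. It is only a sketch: the function name missing_settings and the outdir_from_cli argument are hypothetical, and the settings are assumed to have already been read into a dictionary.

```python
# Settings that must appear in the user's configuration file.
REQUIRED_IN_CONFIG = ("CHRS", "BAM_INDEX")


def missing_settings(config, outdir_from_cli=None):
    """Return the names of required settings that have no value."""
    missing = [key for key in REQUIRED_IN_CONFIG if not config.get(key)]
    # OUTDIR may come from --outdir on the command line or from the config file.
    if not (outdir_from_cli or config.get("OUTDIR")):
        missing.append("OUTDIR")
    return missing


user_config = {"CHRS": "20 X", "BAM_INDEX": "bams/index.txt"}  # hypothetical values
print(missing_settings(user_config))                 # OUTDIR not given anywhere
print(missing_settings(user_config, "out/umake"))    # OUTDIR given on the command line
```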
Targeted/Exome Sequencing Settings
If you are running targeted/exome sequencing, specify:
- Write loci file when performing pileup
  - WRITE_TARGET_LOCI = TRUE
- Specify the directory to store target information, for example: targetDir
  - TARGET_DIR = targetDir
If all individuals have the same target:
- Specify the single BED file, for example: target.bed
  - UNIFORM_TARGET_BED = target.bed
If not all individuals have the same target:
- Specify the file containing the sample ID -> BED file map, for example: targetMap.txt
  - MULTIPLE_TARGET_MAP = targetMap.txt
    - Each line of the file contains [SM_ID] [TARGET_BED]
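The [SM_ID] [TARGET_BED] layout of the map file can be sketched in Python as follows. This assumes the two fields are separated by whitespace, which the description above does not state explicitly; the function name parse_target_map, the sample IDs, and the BED file names are hypothetical.

```python
def parse_target_map(text):
    """Map each sample ID to its target BED file, one pair per line."""
    targets = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # assumption: whitespace-separated fields
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected [SM_ID] [TARGET_BED], got {line!r}"
            )
        sample_id, bed_file = fields
        targets[sample_id] = bed_file
    return targets


target_map_text = """\
Sample1 exomeA.bed
Sample2 exomeB.bed
"""
target_map = parse_target_map(target_map_text)
print(target_map["Sample2"])
```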
Optional Settings:
- Extend the target region by a given number of bases, for example: 50
  - OFFSET_OFF_TARGET = 50
- Exclude off-target regions when using samtools view (may make command line too long)
  - SAMTOOLS_VIEW_TARGET_ONLY = TRUE
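To make the effect of OFFSET_OFF_TARGET concrete, here is a small Python sketch of extending one target interval. It rests on two assumptions that are not confirmed above: that the region is extended on both sides, and that intervals follow the usual BED convention of 0-based start and exclusive end. The function name extend_interval is hypothetical.

```python
def extend_interval(start, end, offset):
    """Widen a target interval by `offset` bases on each side.

    Assumes a BED-style interval (0-based start, exclusive end) and
    clamps the new start so it never falls before the chromosome start.
    """
    return max(0, start - offset), end + offset


print(extend_interval(1000, 1200, 50))
print(extend_interval(20, 80, 50))
```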
Reference Files
- Reference sequence in FASTA format
  - REF = path/file.fa
- Indel VCF file prefix
  - INDEL_PREFIX = path/indels.sites.hg19
  - path/ contains indels.sites.hg19.chr20.vcf for each chromosome being processed
- dbSNP file prefix
  - DBSNP_PREFIX = path/dbsnp_135_b37.rod
  - path/ contains dbsnp_135_b37.rod.chr20.map for each chromosome being processed
- HapMap3 polymorphic site prefix
  - HM3_PREFIX = path/hapmap3.qc.poly
  - path/ contains hapmap3.qc.poly.chr20.bim & hapmap3.qc.poly.chr20.frq for each chromosome being processed
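The chr20 examples above suggest that each per-chromosome file is named prefix.chrN.extension. The Python sketch below builds the expected file names under that inferred pattern; the pattern is an assumption drawn from the examples, and the helper names per_chromosome_file and expected_reference_files are hypothetical.

```python
def per_chromosome_file(prefix, chrom, extension):
    """Build a per-chromosome file name, e.g. prefix.chr20.vcf."""
    return f"{prefix}.chr{chrom}.{extension}"


def expected_reference_files(config):
    """List the per-chromosome files implied by the prefixes and CHRS."""
    # Extensions taken from the chr20 examples in the list above.
    extensions = {
        "INDEL_PREFIX": ["vcf"],
        "DBSNP_PREFIX": ["map"],
        "HM3_PREFIX": ["bim", "frq"],
    }
    files = []
    for chrom in config["CHRS"].split():  # CHRS is a space-separated list
        for key, exts in extensions.items():
            for ext in exts:
                files.append(per_chromosome_file(config[key], chrom, ext))
    return files


reference_config = {
    "CHRS": "20",
    "INDEL_PREFIX": "path/indels.sites.hg19",
    "DBSNP_PREFIX": "path/dbsnp_135_b37.rod",
    "HM3_PREFIX": "path/hapmap3.qc.poly",
}
for name in expected_reference_files(reference_config):
    print(name)
```

A list like this could be checked against the file system before starting a run, so that a missing per-chromosome file is noticed early.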
The full resource files can be downloaded from: ftp://share.sph.umich.edu/1000genomes/umake-resources/ (FTP Download of Full Resource Files)
INDEL_PREFIX = $(UMAKE_ROOT)/ref/indels/1kg.pilot_release.merged.indels.sites.hg19 # 1000 Genomes Pilot 1 indel VCF prefix
DBSNP_PREFIX = $(UMAKE_ROOT)/ref/dbSNP/dbsnp_135_b37.rod # dbSNP file prefix
HM3_PREFIX = $(UMAKE_ROOT)/ref/HapMap3/hapmap3_r3_b37_fwd.consensus.qc.poly # HapMap3 polymorphic site prefix
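The settings above refer to $(UMAKE_ROOT), which reads like a Make-style variable reference. As a sketch of how such a reference resolves to a concrete path, the Python snippet below substitutes $(NAME) placeholders from a dictionary. How UMAKE actually performs this expansion is not described here, so this is only an illustration; the function name expand_variables and the /opt/umake install location are hypothetical.

```python
import re


def expand_variables(value, variables):
    """Replace each $(NAME) in `value` with variables[NAME]."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return variables[name]

    return re.sub(r"\$\((\w+)\)", substitute, value)


dbsnp_prefix = expand_variables(
    "$(UMAKE_ROOT)/ref/dbSNP/dbsnp_135_b37.rod",
    {"UMAKE_ROOT": "/opt/umake"},  # hypothetical install location
)
print(dbsnp_prefix)
```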
Chromosome X Calling
- PED_INDEX = pedfile.ped
Running
Running umake is straightforward:
cd ~/myseq
/usr/local/biopipe/bin/umake --conf myconf ???
make -f [out-prefix].Makefile -j [# parallel jobs]