Difference between revisions of "GotCloud: Variant Calling Pipeline"

From Genome Analysis Wiki

Revision as of 00:50, 6 November 2012

Back to the beginning [1]

The Variant Calling Pipeline (UMAKE) takes recalibrated BAM files, detects SNPs, calls their genotypes, and produces VCF files.

Input Data:

  • Aligned/Processed/Recalibrated BAM files
  • Index file containing Sample IDs & BAM file names
  • Reference files
  • (Optional) Configuration file to override default options

BAM files

Index file

Reference Files

Configuration File

A default configuration file is loaded automatically. Users specify their own configuration file, which needs to contain only the values that differ from the defaults.

Comments begin with a #

Format: KEY = value

Where KEY is the item being set and value is its new value
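
As a minimal sketch of this format, the fragment below shows a full-line comment and a setting followed by a trailing comment. The REF path is the placeholder used elsewhere on this page, not a real location.

```
# Lines starting with '#' are comments
REF = path/file.fa   # reference sequence (placeholder path)
```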


Required User Config File Settings

The following Config File Settings must be specified by the user:

  • CHRS = # space separated list of chromosomes you want
  • BAM_INDEX = # path to the Index File of BAMs

Required on Command-Line or in Config File

The following setting must be specified by the user, either on the command line or in the config file:

  • --outdir/OUTDIR= # path to desired output directory
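
Putting the required settings together, a minimal user configuration file might look like the sketch below. All values are hypothetical. In particular, writing chromosomes as bare numbers and both paths are assumptions made for illustration.

```
# Hypothetical minimal configuration
CHRS = 20 21 22                  # space separated list of chromosomes
BAM_INDEX = /path/to/bam.index   # Index File of BAMs (placeholder path)
OUTDIR = /path/to/output         # may instead be given as --outdir
```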

Targeted/Exome Sequencing Settings

If you are running targeted/exome sequencing, specify the following:

  • Write loci file when performing pileup
    • WRITE_TARGET_LOCI = TRUE
  • Specify the directory to store target information, for example: targetDir
    • TARGET_DIR = targetDir

If all individuals have the same target:

  • Specify the single bed file, for example: target.bed
    • UNIFORM_TARGET_BED = target.bed

If not all individuals have the same target:

  • Specify the file containing the sample id -> bed map, for example: targetMap.txt
    • MULTIPLE_TARGET_MAP = targetMap.txt
      • Each line of the file contains [SM_ID] [TARGET_BED]

Optional Settings:

  • Extend the target region by a given number of bases, for example: 50
    • OFFSET_OFF_TARGET = 50
  • Exclude off-target regions when using samtools view (this may make the command line too long)
    • SAMTOOLS_VIEW_TARGET_ONLY = TRUE
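
For illustration, a configuration for the case where samples have different targets might look like the sketch below. The directory and map file names are the example names used above. The sample IDs and BED file names in the map are hypothetical.

```
WRITE_TARGET_LOCI = TRUE
TARGET_DIR = targetDir
MULTIPLE_TARGET_MAP = targetMap.txt
OFFSET_OFF_TARGET = 50   # optional: extend each target region by 50 bases
```

A matching targetMap.txt, assuming the two fields on each line are separated by whitespace, could contain:

```
Sample1 targetA.bed
Sample2 targetB.bed
```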


Reference Files

  • Reference Sequence in fasta format.
    • REF = path/file.fa
  • Indel VCF File Prefix
    • INDEL_PREFIX = path/indels.sites.hg19
    • path/ contains indels.sites.hg19.chr20.vcf for each chromosome being processed
  • DBSNP File Prefix
    • DBSNP_PREFIX = path/dbsnp_135_b37.rod
    • path/ contains dbsnp_135_b37.rod.chr20.map for each chromosome being processed
  • HapMap3 polymorphic site prefix
    • HM3_PREFIX = path/hapmap3.qc.poly
    • path/ contains hapmap3.qc.poly.chr20.bim & hapmap3.qc.poly.chr20.frq for each chromosome being processed
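
To illustrate how the prefixes resolve, suppose chromosomes 20 and 21 are being processed with the example prefixes above. Extrapolating from the chr20 file names listed, the pipeline would then look for files like the following. The chr21 names are an assumption based on that pattern.

```
path/indels.sites.hg19.chr20.vcf
path/indels.sites.hg19.chr21.vcf
path/dbsnp_135_b37.rod.chr20.map
path/dbsnp_135_b37.rod.chr21.map
path/hapmap3.qc.poly.chr20.bim
path/hapmap3.qc.poly.chr20.frq
path/hapmap3.qc.poly.chr21.bim
path/hapmap3.qc.poly.chr21.frq
```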

The reference files can be downloaded from: FTP Download of Full Resource Files (ftp://share.sph.umich.edu/1000genomes/umake-resources/)

INDEL_PREFIX = $(UMAKE_ROOT)/ref/indels/1kg.pilot_release.merged.indels.sites.hg19 # 1000 Genomes Pilot 1 indel VCF prefix
DBSNP_PREFIX = $(UMAKE_ROOT)/ref/dbSNP/dbsnp_135_b37.rod # dbSNP file prefix
HM3_PREFIX = $(UMAKE_ROOT)/ref/HapMap3/hapmap3_r3_b37_fwd.consensus.qc.poly # HapMap3 polymorphic site prefix

Chromosome X Calling

  • PED_INDEX = pedfile.ped


Running

Running umake is straightforward:

cd ~/myseq
/usr/local/biopipe/bin/umake --conf myconf ???
make -f [out-prefix].Makefile -j [# parallel jobs]