

This is the configuration file for the variant calling pipeline in the GotCloud Tutorial.

The configuration file contains KEY = VALUE settings that override defaults and set specific values for the given run.

CHRS = 20
BAM_INDEX = GBR60bam.index
# References
REF_ROOT = chr20Ref
REF = $(REF_ROOT)/human_g1k_v37_chr20.fa
INDEL_PREFIX = $(REF_ROOT)/1kg.pilot_release.merged.indels.sites.hg19
DBSNP_VCF =  $(REF_ROOT)/dbsnp135_chr20.vcf.gz
HM3_VCF =  $(REF_ROOT)/hapmap_3.3.b37.sites.chr20.vcf.gz

This configuration file sets:

  • CHRS - this specifies which chromosomes to process
    • Leave this out of your configuration file if you want to process all chromosomes (1-22, X, Y)
  • BAM_INDEX - file containing the samples & BAMs to be processed
  • Reference Information:
    • REF_ROOT - the directory containing the reference files
    • REF - the reference file (.fa extension); the additional files should be at the same location:
      • human_g1k_v37_chr20-bs.umfa
      • human_g1k_v37_chr20.fa
      • human_g1k_v37_chr20.fa.fai
    • INDEL_PREFIX - indel information
    • DBSNP_VCF - a vcf containing the dbsnp positions
    • HM3_VCF - hapmap vcf
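
The KEY = VALUE format can be illustrated with a short Python sketch. It is not GotCloud's own parser. It assumes that lines starting with # are comments and that $(NAME) is replaced by the value of the key NAME, as the REF_ROOT references in the example suggest. The function names parse_config and expand are hypothetical.

```python
import re

def parse_config(text):
    """Collect KEY = VALUE pairs, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY = VALUE line
        settings[key.strip()] = value.strip()
    return settings

def expand(settings):
    """Replace $(NAME) with the value of NAME (assumed substitution rule)."""
    pattern = re.compile(r"\$\((\w+)\)")
    resolved = dict(settings)
    for _ in range(10):  # bounded passes guard against circular references
        changed = False
        for key, value in resolved.items():
            # Unknown names are left untouched rather than replaced.
            new_value = pattern.sub(
                lambda m: resolved.get(m.group(1), m.group(0)), value
            )
            if new_value != value:
                resolved[key] = new_value
                changed = True
        if not changed:
            break
    return resolved

# A subset of the tutorial configuration shown above.
EXAMPLE = """\
CHRS = 20
BAM_INDEX = GBR60bam.index
# References
REF_ROOT = chr20Ref
REF = $(REF_ROOT)/human_g1k_v37_chr20.fa
DBSNP_VCF =  $(REF_ROOT)/dbsnp135_chr20.vcf.gz
"""

config = expand(parse_config(EXAMPLE))
print(config["REF"])  # chr20Ref/human_g1k_v37_chr20.fa
```

Under these assumptions, changing REF_ROOT alone moves every reference path that is defined in terms of it.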

To run your own test, update BAM_INDEX to point to your index file and the reference values to point to your references.

This example uses chr20-only reference files to speed up processing of the tutorial data. If you are using the default references, you may only need to update REF_DIR to the directory where they are installed. Full reference files can be downloaded from GotCloudReference.

Absolute paths are recommended. (This example uses relative paths so that it works wherever the data is installed, but relative paths require the pipeline to be run from the correct directory.)
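
As a minimal sketch of turning a relative setting into an absolute one, the Python helper below resolves a path against a chosen base directory. The helper name to_absolute and the directory /data/gotcloudTutorial are hypothetical and used only for illustration.

```python
import os

def to_absolute(path, base_dir):
    """Return path unchanged if already absolute, else resolve it against base_dir."""
    if os.path.isabs(path):
        return path
    return os.path.normpath(os.path.join(base_dir, path))

# Hypothetical install location; substitute the directory that holds your data.
BASE_DIR = "/data/gotcloudTutorial"

print(to_absolute("chr20Ref", BASE_DIR))
print(to_absolute("GBR60bam.index", BASE_DIR))
```

Writing the resolved values into the configuration file removes the dependency on the directory the pipeline is started from.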


The index file contains at least 3 tab-separated columns:

  1. Sample name
  2. Population
    • can be a comma separated list of populations
    • specify ALL if you don't know the population or if you aren't interested in population specific information
  3. BAM file name

If a sample has more than one BAM file, list each additional BAM in its own tab-separated column.

HG00096	GBR	bams/HG00096.bam
HG00100	GBR	bams/HG00100.bam
HG00103	GBR	bams/HG00103.bam
HG00106	GBR	bams/HG00106.bam
HG00108	GBR	bams/HG00108.bam
HG00111	GBR	bams/HG00111.bam
HG00112	GBR	bams/HG00112.bam
HG00114	GBR	bams/HG00114.bam
HG00115	GBR	bams/HG00115.bam
HG00116	GBR	bams/HG00116.bam
HG00117	GBR	bams/HG00117.bam
HG00118	GBR	bams/HG00118.bam
HG00119	GBR	bams/HG00119.bam
HG00120	GBR	bams/HG00120.bam
HG00122	GBR	bams/HG00122.bam
HG00123	GBR	bams/HG00123.bam
HG00124	GBR	bams/HG00124.bam
HG00125	GBR	bams/HG00125.bam
HG00126	GBR	bams/HG00126.bam
HG00127	GBR	bams/HG00127.bam
HG00131	GBR	bams/HG00131.bam
HG00133	GBR	bams/HG00133.bam
HG00136	GBR	bams/HG00136.bam
HG00137	GBR	bams/HG00137.bam
HG00138	GBR	bams/HG00138.bam
HG00139	GBR	bams/HG00139.bam
HG00140	GBR	bams/HG00140.bam
HG00141	GBR	bams/HG00141.bam
HG00142	GBR	bams/HG00142.bam
HG00143	GBR	bams/HG00143.bam
HG00145	GBR	bams/HG00145.bam
HG00146	GBR	bams/HG00146.bam
HG00148	GBR	bams/HG00148.bam
HG00149	GBR	bams/HG00149.bam
HG00150	GBR	bams/HG00150.bam
HG00151	GBR	bams/HG00151.bam
HG00152	GBR	bams/HG00152.bam
HG00154	GBR	bams/HG00154.bam
HG00155	GBR	bams/HG00155.bam
HG00156	GBR	bams/HG00156.bam
HG00157	GBR	bams/HG00157.bam
HG00158	GBR	bams/HG00158.bam
HG00159	GBR	bams/HG00159.bam
HG00160	GBR	bams/HG00160.bam
HG00231	GBR	bams/HG00231.bam
HG00232	GBR	bams/HG00232.bam
HG00233	GBR	bams/HG00233.bam
HG00239	GBR	bams/HG00239.bam
HG00242	GBR	bams/HG00242.bam
HG00243	GBR	bams/HG00243.bam
HG00244	GBR	bams/HG00244.bam
HG00245	GBR	bams/HG00245.bam
HG00246	GBR	bams/HG00246.bam
HG00247	GBR	bams/HG00247.bam
HG00249	GBR	bams/HG00249.bam
HG00250	GBR	bams/HG00250.bam
HG00251	GBR	bams/HG00251.bam
HG00252	GBR	bams/HG00252.bam
HG00253	GBR	bams/HG00253.bam
HG00254	GBR	bams/HG00254.bam

This example uses relative paths, but for greatest flexibility, absolute paths are recommended.
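
The index layout described above (sample name, population list, one or more BAM columns) can be read with a short Python sketch such as the following. It is an illustration under the stated column rules, not code from GotCloud. The function name parse_index and the second example row, with two populations and two BAMs, are hypothetical.

```python
def parse_index(text):
    """Return one record per non-empty line of a tab-separated index."""
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {number}: expected at least 3 tab-separated columns"
            )
        records.append({
            "sample": fields[0],
            # Column 2 may hold a comma-separated list of populations.
            "populations": fields[1].split(","),
            # Column 3 onward holds one BAM per column; empty columns are dropped.
            "bams": [f.strip() for f in fields[2:] if f.strip()],
        })
    return records

# The first row comes from the tutorial index; the second row is hypothetical.
EXAMPLE_INDEX = (
    "HG00096\tGBR\tbams/HG00096.bam\n"
    "SAMPLE2\tGBR,EUR\tbams/a.bam\tbams/b.bam\n"
)

records = parse_index(EXAMPLE_INDEX)
for record in records:
    print(record["sample"], record["populations"], record["bams"])
```

A check like this catches rows that were separated with spaces instead of tabs, since those rows produce fewer than three columns.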