Revision as of 14:14, 23 October 2014
Genetic Reference and Resource Files
In order to run GotCloud, you need to provide Genetic Reference and Resource Files.
You can generate your own files or use the set available for download.
- By default, GotCloud looks for the reference/resource files in the `gotcloud.ref` subdirectory within the base GotCloud directory.
- To look in a different directory, set either of the following to that directory's path:
  - `REF_DIR` in your configuration file
  - `--ref_dir` on the command line
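A minimal sketch of the two options follows. The path `/data/gotcloud.ref`, the file name `my.conf`, the `KEY = value` line format, and the `gotcloud snpcall --conf` invocation are assumptions for illustration; only `REF_DIR` and `--ref_dir` come from this page.

```shell
# Hypothetical location of the reference/resource files.
ref_path=/data/gotcloud.ref

# Option 1: set REF_DIR in the configuration file.
# Assumption: the configuration file uses "KEY = value" lines; my.conf is a hypothetical name.
printf 'REF_DIR = %s\n' "$ref_path" >> my.conf

# Option 2: pass --ref_dir on the command line instead.
# Assumption: a pipeline is launched as "gotcloud <pipeline> --conf <file>".
# gotcloud snpcall --conf my.conf --ref_dir "$ref_path"
```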
Description | Configuration Key | Default Value | Pipelines | Special Info |
---|---|---|---|---|
Reference fasta | REF | $(REF_DIR)/human.g1k.v37.fa | align, snpcall, indel | Additional Files Required |
DBSNP VCF File | DBSNP_VCF | $(REF_DIR)/dbsnp_135.b37.vcf.gz | align, snpcall | Must be tabixed |
HapMap3 VCF File | HM3_VCF | $(REF_DIR)/hapmap_3.3.b37.sites.vcf.gz | align, snpcall | Must be tabixed |
OMNI VCF File | OMNI_VCF | $(REF_DIR)/1000G_omni2.5.b37.sites.PASS.vcf.gz | snpcall | Must be tabixed |
INDEL VCF File(s) | INDEL_PREFIX | $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 | snpcall | Must be tabixed |
INDEL VCF File (single file) | INDEL_VCF | | snpcall | Alternate to INDEL_PREFIX if all INDEL sites are in a single VCF rather than broken up by chromosome |
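Before running a pipeline, it can help to confirm that the reference directory holds the default files named in the table. The sketch below uses a hypothetical helper, `check_ref_files`, and assumes the default file names plus a `.tbi` index next to each tabixed file (the name tabix gives its index by default). It skips the `INDEL_PREFIX` entry, which names a set of per-chromosome files rather than a single file.

```shell
# Hypothetical helper: report default reference/resource files missing from a directory.
# Returns 0 if everything checked is present, 1 otherwise.
check_ref_files() {
    ref_dir=$1
    status=0
    for f in human.g1k.v37.fa \
             dbsnp_135.b37.vcf.gz \
             hapmap_3.3.b37.sites.vcf.gz \
             1000G_omni2.5.b37.sites.PASS.vcf.gz; do
        if [ ! -f "$ref_dir/$f" ]; then
            echo "missing: $ref_dir/$f"
            status=1
        fi
        # Tabixed files are expected to have their index at <file>.tbi.
        case $f in
            *.vcf.gz)
                if [ ! -f "$ref_dir/$f.tbi" ]; then
                    echo "missing index: $ref_dir/$f.tbi"
                    status=1
                fi
                ;;
        esac
    done
    return $status
}
```

For example, `check_ref_files gotcloud/gotcloud.ref` prints nothing and returns 0 when every checked file is present.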
Reference fasta Files
Reference Sequence in fasta format
- Contains reference base at each reference position
Configuration Key | Default Value |
---|---|
REF | $(REF_DIR)/human.g1k.v37.fa |
Pipeline | Use |
---|---|
align | mapping to reference, recalibration, quality control |
snpcall | pileup & identify variants, summarize filtered variants |
indel | discovery, genotyping |
Additional files generated from the reference fasta
In addition to the fasta file, a few additional files generated from the fasta are required:
- They are already included with the default reference files
- If you are using your own reference files, you will need to create these files
- They are expected to be in the same location as the reference file
- Create them using the versions of the tools run by GotCloud (by default, these are in the `gotcloud/bin/` directory)
- In the commands below, replace `ref.fa` with the path/name of the reference fasta file
Pipeline | Step | Required Extensions | Command to Create | More Information |
---|---|---|---|---|
align, snpcall, indel | | .fai | bin/samtools faidx ref.fa | |
align, snpcall, indel | | -bs.umfa | If it does not already exist, GotCloud automatically creates this file in the same directory as the REF file | |
align | bwa mapping | .amb, .ann, .bwt, .pac, .sa | bin/bwa index ref.fa | http://bio-bwa.sourceforge.net/bwa.shtml |
align | qplot | .winsize100.gc | bin/qplot --reference ref.fa | NOTE: Ignore the error at the end of qplot that says "FATAL ERROR - No SAM/BAM files provided, stopped!" This error is due to using qplot only to generate a GC content file and not to process a BAM file. |
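The commands in the table can be scripted as below. `missing_ref_aux` and `make_ref_aux` are hypothetical helpers. The sketch assumes samtools and bwa append their extensions to the full fasta name (their default behavior, e.g. `ref.fa.fai`, `ref.fa.bwt`), and it does not check the `.winsize100.gc` or `-bs.umfa` files, whose exact names are not given here.

```shell
# Hypothetical helper: print each samtools/bwa auxiliary file missing next to a fasta.
missing_ref_aux() {
    fa=$1
    for ext in .fai .amb .ann .bwt .pac .sa; do
        [ -f "$fa$ext" ] || echo "$fa$ext"
    done
}

# Hypothetical helper: build the auxiliary files with the tools in <gotcloud>/bin,
# using the commands from the table. $1 = GotCloud install directory, $2 = fasta.
make_ref_aux() {
    gc=$1
    fa=$2
    "$gc/bin/samtools" faidx "$fa" || return 1
    "$gc/bin/bwa" index "$fa" || return 1
    # qplot is expected to stop with "FATAL ERROR - No SAM/BAM files provided,
    # stopped!" here, so its exit status is ignored.
    "$gc/bin/qplot" --reference "$fa" || true
}
```

For example, `make_ref_aux /path/to/gotcloud ref.fa` followed by `missing_ref_aux ref.fa` should print nothing if the samtools and bwa steps succeeded.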
DBSNP VCF File
VCF file containing known dbsnp variant positions
- Must be bgzip'd and tabix'd
Configuration Key | Default Value |
---|---|
DBSNP_VCF | $(REF_DIR)/dbsnp_135.b37.vcf.gz |
Pipeline | Use |
---|---|
align | recalibration (exclude known dbsnps when generating recalibration tables) & qplot |
snpcall | generating filtered VCF summary statistics |
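If you supply your own VCF, preparing it might look like the sketch below, assuming `bgzip` and `tabix` (from htslib) are on your `PATH` and the VCF is already sorted by position, which tabix requires. `bgzip_and_tabix` and `has_tabix_index` are hypothetical helper names. The same preparation applies to the HapMap3, OMNI, and INDEL VCF files.

```shell
# Hypothetical helper: compress a position-sorted VCF with bgzip and index it
# with tabix, producing <file>.gz and <file>.gz.tbi.
bgzip_and_tabix() {
    vcf=$1
    bgzip -c "$vcf" > "$vcf.gz" || return 1
    tabix -p vcf "$vcf.gz"
}

# Hypothetical helper: succeed only if a .vcf.gz file and its .tbi index both exist.
has_tabix_index() {
    [ -f "$1" ] && [ -f "$1.tbi" ]
}
```

For example, `bgzip_and_tabix dbsnp_135.b37.vcf` would produce `dbsnp_135.b37.vcf.gz` and `dbsnp_135.b37.vcf.gz.tbi`.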
HapMap3 VCF File
HapMap3 Polymorphic Sites VCF File
- Must be bgzip'd and tabix'd
Configuration Key | Default Value |
---|---|
HM3_VCF | $(REF_DIR)/hapmap_3.3.b37.sites.vcf.gz |
Pipeline | Use |
---|---|
align | verifyBamID (contamination checking) |
snpcall | generating filtered VCF summary statistics & positive example sites for SVM filtering |
OMNI VCF File
VCF file containing OMNI positions
- Must be bgzip'd and tabix'd
Configuration Key | Default Value |
---|---|
OMNI_VCF | $(REF_DIR)/1000G_omni2.5.b37.sites.PASS.vcf.gz |
Pipeline | Use |
---|---|
snpcall | positive example sites for SVM filtering |
INDEL VCF File(s)
VCF file containing known INDEL positions
- Must be bgzip'd and tabix'd
Configuration Key | Default Value |
---|---|
INDEL_PREFIX | $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 |
INDEL_VCF | Alternate setting, used instead of INDEL_PREFIX if all INDEL sites are in a single VCF rather than broken up by chromosome |
Pipeline | Use |
---|---|
snpcall | used to filter variants that are too close to a known indel |
- Use `INDEL_PREFIX` if `path/` contains a separate file for each chromosome being processed, in the format `indels.sites.hg19.chr#.vcf` (where `#` is the chromosome)
- Use `INDEL_VCF` if you have all chromosomes in a single VCF file (it can be, but does not have to be, a gz file)
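To see which per-chromosome files a given `INDEL_PREFIX` points to, a sketch like the following could help. It assumes each file is named `<INDEL_PREFIX>.chr<#>.vcf`, reading the format above as the prefix followed by `.chr#.vcf`; adjust the suffix if your files carry a different extension (e.g. `.vcf.gz`). `indel_files_for` is a hypothetical helper.

```shell
# Hypothetical helper: print the per-chromosome INDEL VCF paths implied by a prefix,
# marking any that do not exist. Assumed naming: <prefix>.chr<#>.vcf
indel_files_for() {
    prefix=$1
    shift
    for chr in "$@"; do
        f="$prefix.chr$chr.vcf"
        if [ -f "$f" ]; then
            echo "$f"
        else
            echo "$f (missing)"
        fi
    done
}
```

For example, `indel_files_for gotcloud/gotcloud.ref/1kg.pilot_release.merged.indels.sites.hg19 20 21 22` lists the expected files for chromosomes 20 to 22.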
Downloadable Reference and Resource Files
Installing Genetic Reference and Resource Files
Choose a destination for these files and install them as shown below. We'll assume you will use `gotcloud/gotcloud.ref`. Replace `gotcloud` with the path to where you installed GotCloud.
cd gotcloud # path to where you installed gotcloud
mkdir -p gotcloud.ref
cd gotcloud.ref
If you use a path other than a `gotcloud.ref` subdirectory of gotcloud, note this path, as you will need to set either of the following to the installation path:
- `REF_DIR` in your configuration file
- `--ref_dir` on the command line
Get the Resource Files
GotCloud makes use of various reference and other genetic resource files. You are free to use your own files, of course, but we are also making the files we use available.
wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db135-v3.tgz
Install the Resource Files
tar xzf h37-db135-v3.tgz
rm -f h37-db135-v3.tgz
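The tar and rm steps above can be wrapped so that the archive is deleted only when extraction succeeds. `install_ref` is a hypothetical helper; it extracts into a destination directory with `tar -C` instead of changing into it.

```shell
# Hypothetical helper: unpack a resource tarball into a destination directory,
# then remove the archive only if extraction succeeded.
install_ref() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1
    tar xzf "$tarball" -C "$dest" || return 1
    rm -f "$tarball"
}
```

Run from inside `gotcloud.ref` after the `wget` step, `install_ref h37-db135-v3.tgz .` does the same as the two commands shown, but keeps the archive if `tar` fails.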