Revision as of 13:47, 23 October 2014
Genetic Reference and Resource Files
In order to run GotCloud, you need to provide Genetic Reference and Resource Files.
You can generate your own files or use the set available for download.
Description | Configuration Key | Default Value | Pipelines | Special Info |
---|---|---|---|---|
Reference fasta | REF | $(REF_DIR)/human.g1k.v37.fa | align, snpcall, indel | Additional Files Required |
DBSNP VCF File | DBSNP_VCF | $(REF_DIR)/dbsnp_135.b37.vcf.gz | align, snpcall | Must be tabixed |
HapMap3 VCF File | HM3_VCF | $(REF_DIR)/hapmap_3.3.b37.sites.vcf.gz | align, snpcall | Must be tabixed |
OMNI VCF File | OMNI_VCF | $(REF_DIR)/1000G_omni2.5.b37.sites.PASS.vcf.gz | snpcall | Must be tabixed |
INDEL VCF File(s) | INDEL_PREFIX | $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 | snpcall | Must be tabixed |
INDEL VCF File(s) (alternate) | INDEL_VCF | Alternate setting if all INDEL sites are in a single VCF rather than split by chromosome | | |
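As a quick sanity check before running the pipelines, this bash sketch confirms that each default resource VCF in the table exists under a given REF_DIR, has a tabix index (.tbi), and has no index older than its data file. The function names are illustrative and not part of GotCloud. The per-chromosome INDEL_PREFIX files are left out because their exact file names depend on your download.

```shell
#!/usr/bin/env bash
# Sketch: check that each default resource VCF exists under a REF_DIR and has a
# tabix index (.tbi) that is not older than the data file. Function names are
# illustrative only; they are not part of GotCloud.

check_tabixed_vcf() {
  local vcf="$1"
  if [[ ! -f "$vcf" ]]; then
    echo "MISSING: $vcf" >&2
    return 1
  fi
  if [[ ! -f "$vcf.tbi" ]]; then
    echo "NO INDEX: $vcf.tbi" >&2
    return 1
  fi
  if [[ "$vcf" -nt "$vcf.tbi" ]]; then
    echo "STALE INDEX: $vcf is newer than $vcf.tbi" >&2
    return 1
  fi
  echo "OK: $vcf"
}

# Checks the DBSNP_VCF, HM3_VCF and OMNI_VCF defaults; INDEL_PREFIX files are
# split by chromosome and are not covered here.
check_default_vcfs() {
  local ref_dir="$1" status=0 f
  for f in dbsnp_135.b37.vcf.gz \
           hapmap_3.3.b37.sites.vcf.gz \
           1000G_omni2.5.b37.sites.PASS.vcf.gz; do
    check_tabixed_vcf "$ref_dir/$f" || status=1
  done
  return "$status"
}
```

For example, check_default_vcfs /usr/local/gotcloud.ref prints one OK line per file and returns a nonzero status if any file or index is missing or stale.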
Reference fasta Files
Reference Sequence in fasta format
- Contains reference base at each reference position
Configuration Key | Default Value |
---|---|
REF | $(REF_DIR)/human.g1k.v37.fa |
Pipeline | Use |
---|---|
align | mapping to reference, recalibration, quality control |
snpcall | pileup & identify variants, summarize filtered variants |
indel | discovery, genotyping |
Additional files generated from the reference fasta
In addition to the fasta file, a few files generated from the fasta are required:
- Already included with the default reference files
- If you are using your own reference files, you will need to create these files yourself
- Expected to be in the same directory as the reference file
- Be sure to create these files with the same tool versions that GotCloud runs (by default, those in the gotcloud/bin/ directory)
- In the commands below, replace ref.fa with the path/name of the reference fasta file
Pipeline | Step | Required Extensions | Command to Create | More Information |
---|---|---|---|---|
all | | .fai | bin/samtools faidx ref.fa | |
all | | -bs.umfa | Automatically created by GotCloud in the same directory as the REF file | |
align | bwa mapping | .amb, .ann, .bwt, .pac, .sa | bin/bwa index ref.fa | http://bio-bwa.sourceforge.net/bwa.shtml |
align | qplot | .winsize100.gc | bin/qplot --reference ref.fa | NOTE: Ignore the error at the end of qplot that says "FATAL ERROR - No SAM/BAM files provided, stopped!" It occurs because qplot is being used only to generate a GC content file, not to process a BAM file. |
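If you supply your own reference, the commands in the table can be scripted. This is a minimal bash sketch, assuming GotCloud's bundled tools live in the directory named by a hypothetical GOTCLOUD_BIN variable (default gotcloud/bin). It runs the three commands from the table, deliberately ignores qplot's expected "No SAM/BAM files" failure, and then checks for the .fai and BWA index files, whose names are fixed by samtools and bwa. The -bs.umfa file is not built here, since GotCloud creates it automatically.

```shell
#!/usr/bin/env bash
# Sketch: build the reference-derived files with GotCloud's bundled tools.
# GOTCLOUD_BIN is a hypothetical variable pointing at the gotcloud/bin directory.

build_ref_indexes() {
  local ref="$1" bin="${GOTCLOUD_BIN:-gotcloud/bin}"

  # .fai: samtools faidx writes ref.fa.fai next to the fasta
  "$bin/samtools" faidx "$ref" || return 1

  # .amb .ann .bwt .pac .sa: BWA index files, also written next to the fasta
  "$bin/bwa" index "$ref" || return 1

  # .winsize100.gc: qplot is run without any BAM input, so it ends with
  # "FATAL ERROR - No SAM/BAM files provided, stopped!"; that is expected here,
  # so its exit status is deliberately ignored.
  "$bin/qplot" --reference "$ref" || true

  # Confirm the files whose names are fixed by samtools and bwa.
  local ext
  for ext in fai amb ann bwt pac sa; do
    [[ -f "$ref.$ext" ]] || { echo "missing $ref.$ext" >&2; return 1; }
  done
}
```

For example: GOTCLOUD_BIN=/path/to/gotcloud/bin build_ref_indexes /path/to/my_ref.fa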
DBSNP VCF File
VCF file containing known dbsnp variant positions
- Must be bgzip'd and tabix'd
Configuration Key | Default Value |
---|---|
DBSNP_VCF | $(REF_DIR)/dbsnp_135.b37.vcf.gz |
Pipeline | Use |
---|---|
align | recalibration (exclude known dbsnps when generating recalibration tables) & qplot |
snpcall | generating filtered VCF summary statistics |
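If you build your own DBSNP file (or any of the other resource VCFs), it must be compressed with bgzip rather than plain gzip before tabix can index it, and tabix also requires the records to be sorted by chromosome and position. A minimal sketch, assuming the bgzip and tabix programs from htslib are on your PATH and the input is an uncompressed, sorted VCF; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: turn a plain-text, position-sorted VCF into the bgzip'd and tabix'd
# form GotCloud expects. Assumes bgzip and tabix (htslib) are on PATH.

bgzip_and_tabix() {
  local vcf="$1" out="${1%.vcf}.vcf.gz"
  # bgzip (not plain gzip) writes the block-compressed BGZF format tabix requires
  bgzip -c "$vcf" > "$out" || return 1
  # -p vcf selects the VCF preset; the index is written to $out.tbi
  tabix -p vcf "$out" || return 1
  echo "$out"
}
```

For example, bgzip_and_tabix my_dbsnp.vcf writes my_dbsnp.vcf.gz and my_dbsnp.vcf.gz.tbi; DBSNP_VCF would then point at the .vcf.gz file.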
HapMap3 VCF File
HapMap3 Polymorphic Sites VCF File
- Must be bgzip'd and tabix'd
Configuration Key | Default Value |
---|---|
HM3_VCF | $(REF_DIR)/hapmap_3.3.b37.sites.vcf.gz |
Pipeline | Use |
---|---|
align | verifyBamID (contamination checking) |
snpcall | generating filtered VCF summary statistics & positive example sites for SVM filtering |
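The default files are on build 37 (b37), so chromosome names in a replacement HapMap3 VCF need to match the reference fasta exactly ("20" rather than "chr20"); a mismatch is an easy mistake when mixing files from different sources. This sketch compares the sequence names stored in the VCF's tabix index (tabix -l) against the first column of the reference .fai. It assumes tabix is on your PATH and that the .fai already exists; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report sequence names present in an indexed resource VCF but absent
# from the reference .fai (for the b37 defaults, "20" rather than "chr20").

vcf_contigs_in_reference() {
  local vcf="$1" fai="$2" names missing
  # tabix -l lists the sequence names recorded in the VCF's .tbi index
  names=$(tabix -l "$vcf") || return 1
  # comm -23 keeps names found in the VCF index but not in column 1 of the .fai
  missing=$(comm -23 <(printf '%s\n' "$names" | sort -u) <(cut -f1 "$fai" | sort -u))
  if [[ -n "$missing" ]]; then
    echo "Sequences in $vcf not found in $fai:" >&2
    printf '%s\n' "$missing" >&2
    return 1
  fi
}
```

For example: vcf_contigs_in_reference hapmap_3.3.b37.sites.vcf.gz human.g1k.v37.fa.fai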
OMNI VCF File
VCF file containing OMNI positions
- Must be bgzip'd and tabix'd
Configuration Key | Default Value |
---|---|
OMNI_VCF | $(REF_DIR)/1000G_omni2.5.b37.sites.PASS.vcf.gz |
Pipeline | Use |
---|---|
snpcall | positive example sites for SVM filtering |
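Judging by its name, the default OMNI file is a sites-only VCF restricted to PASS records. If you want to build a similar file from your own genotype VCF, this awk-based sketch (hypothetical function name) keeps the header, drops the FORMAT and per-sample columns, and keeps only records whose FILTER column is PASS. It reads and writes uncompressed VCF, so the result still has to be bgzip'd and tabix'd.

```shell
#!/usr/bin/env bash
# Sketch: derive a sites-only, PASS-only VCF from a full VCF.
# Reads uncompressed VCF on stdin and writes uncompressed VCF on stdout.

pass_sites_only() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^##/     { print; next }                                  # meta-information lines unchanged
    /^#CHROM/ { print $1, $2, $3, $4, $5, $6, $7, $8; next }   # header without FORMAT/sample columns
    $7 == "PASS" { print $1, $2, $3, $4, $5, $6, $7, $8 }      # PASS records only, first 8 columns
  '
}
```

For example (file names hypothetical): zcat my_omni.vcf.gz | pass_sites_only | bgzip -c > my_omni.sites.PASS.vcf.gz && tabix -p vcf my_omni.sites.PASS.vcf.gz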
INDEL VCF File(s)
VCF file(s) containing known INDEL sites
- Must be bgzip'd and tabix'd
Configuration Key | Default Value |
---|---|
INDEL_PREFIX | $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 |
INDEL_VCF | alternate configuration setting if all INDEL sites are in a single VCF rather than broken up by chromosome |
Pipeline | Use |
---|---|
snpcall | positive example sites for SVM filtering |
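With INDEL_PREFIX, the sites are split into one file per chromosome, and each of those files needs its own tabix index. This sketch assumes the per-chromosome files are named <INDEL_PREFIX>.<something>.vcf.gz (check the names in your download and adjust the glob if they differ). It (re)indexes any file whose .tbi is missing or older than the data, using tabix -f to overwrite a stale index. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: make sure every per-chromosome file sharing INDEL_PREFIX is indexed.
# ASSUMPTION: files are named "<INDEL_PREFIX>.<something>.vcf.gz".

index_indel_prefix_files() {
  local prefix="$1" f found=0
  for f in "$prefix".*vcf.gz; do
    [[ -e "$f" ]] || continue            # the glob matched nothing
    found=1
    if [[ ! -f "$f.tbi" || "$f" -nt "$f.tbi" ]]; then
      tabix -f -p vcf "$f" || return 1   # -f overwrites an existing (stale) index
    fi
  done
  (( found )) || { echo "no files match $prefix.*vcf.gz" >&2; return 1; }
}
```

For example: index_indel_prefix_files /usr/local/gotcloud.ref/1kg.pilot_release.merged.indels.sites.hg19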
Downloadable Reference and Resource Files
Installing Genetic Reference and Resource Files
Choose a destination for these files and install them as shown below. We'll assume you will use /usr/local/gotcloud.ref; if you use a different directory, replace /usr/local/gotcloud.ref with your path.
mkdir -p /usr/local/gotcloud.ref # Where you want the files installed
cd /usr/local/gotcloud.ref
Note this path: you will need to set the variable REF_DIR to it in the GotCloud configuration file.
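One way to record the path is to write it into your own configuration file. This is a sketch only: it assumes the configuration format is one KEY = value setting per line with # comments, and the function and file names are hypothetical; compare it against the default configuration shipped with your GotCloud version before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal configuration that sets REF_DIR.
# ASSUMPTION: GotCloud reads "KEY = value" lines and treats "#" lines as comments.

write_ref_conf() {
  local ref_dir="$1" conf="$2"
  cat > "$conf" <<EOF
REF_DIR = $ref_dir
# Other keys default to files under REF_DIR, e.g. REF = \$(REF_DIR)/human.g1k.v37.fa
EOF
}
```

For example: write_ref_conf /usr/local/gotcloud.ref my_gotcloud.conf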
Get the Resource Files
The GotCloud aligner and Umake make use of various reference and other genetic resource files.
You are free to use your own files, of course, but we also make the files we use available for download.
# The easiest way to get the data:
wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db135-v3.tgz
# Another way:
ftp share.sph.umich.edu
Connected to share.sph.umich.edu.
220 (vsFTPd 2.3.5)
Name (share.sph.umich.edu:tpg): anonymous
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> prompt
Interactive mode off.
ftp> cd gotcloud
250 Directory successfully changed.
ftp> mget ref/h37-db135-v3.tgz
ftp> quit
221 Goodbye.
Install the Resource Files
tar xzf h37-db135-v3.tgz
rm -f h37-db135-v3.tgz
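After extracting, you can confirm that the default files named in the tables above are present. Point the check at the directory that actually contains human.g1k.v37.fa, which may be a subdirectory created by the archive rather than /usr/local/gotcloud.ref itself; that directory is what REF_DIR should be set to. The function name is hypothetical, and the per-chromosome INDEL files, the .winsize100.gc file, and the -bs.umfa file are not checked because their exact names are not listed on this page.

```shell
#!/usr/bin/env bash
# Sketch: list any default reference or resource file that is missing from REF_DIR.

check_ref_install() {
  local ref_dir="$1" missing=0 f
  local -a files
  files=(
    human.g1k.v37.fa
    human.g1k.v37.fa.fai
    human.g1k.v37.fa.{amb,ann,bwt,pac,sa}      # BWA index files
    dbsnp_135.b37.vcf.gz{,.tbi}                # each VCF plus its tabix index
    hapmap_3.3.b37.sites.vcf.gz{,.tbi}
    1000G_omni2.5.b37.sites.PASS.vcf.gz{,.tbi}
  )
  for f in "${files[@]}"; do
    if [[ ! -e "$ref_dir/$f" ]]; then
      echo "missing: $ref_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_ref_install /usr/local/gotcloud.ref && echo "reference files look complete"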