Line 6: |
Line 6: |
| | | |
| You can generate your own files or use the set available for [[#Downloadable Reference and Resource Files|download]]. | | You can generate your own files or use the set available for [[#Downloadable Reference and Resource Files|download]]. |
| + | * By default, GotCloud looks for the reference/resource files in the <code>gotcloud.ref</code> subdirectory within the base GotCloud directory |
| + | * To look in a different directory, set your reference/resource file location by setting either of the following to that path: |
| + | ** <code>REF_DIR</code> in your configuration file |
| + | ** <code>--ref_dir</code> on the command-line |
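As a rough sketch of the two options above, the snippet below writes a minimal configuration file that sets <code>REF_DIR</code> and prints a command line that uses <code>--ref_dir</code> instead. The path <code>/data/gotcloud.ref</code>, the file name <code>example.conf</code>, the <code>KEY = value</code> layout, and the rest of the <code>gotcloud</code> invocation are assumptions for illustration; check them against your own installation.

```shell
#!/bin/sh
# Hypothetical location of your reference/resource files (assumption).
ref_dir="/data/gotcloud.ref"
# Hypothetical configuration file, written to a temporary directory.
conf="${TMPDIR:-/tmp}/example.conf"

# Option 1: set REF_DIR in the configuration file.
# The "KEY = value" layout is assumed here.
printf 'REF_DIR = %s\n' "$ref_dir" > "$conf"
cat "$conf"

# Option 2: pass --ref_dir on the command line instead.
# The command is only printed, not run; everything except --ref_dir is an assumption.
echo "gotcloud align --conf $conf --ref_dir $ref_dir"
```

Only one of the two settings is needed.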
| | | |
| {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" | | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
Line 22: |
Line 26: |
| | snpcall || Must be tabixed | | | snpcall || Must be tabixed |
| |- | | |- |
− | | rowspan="2"|[[#INDEL VCF File(s)|INDEL VCF File(s)]] || INDEL_PREFIX || $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 || rowspan="2"|snpcall || rowspan="2"|Must be tabixed | + | | rowspan="2"|[[#INDEL VCF File(s)|INDEL VCF File(s)]] || INDEL_PREFIX || $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 || rowspan="2"|snpcall || .chr#.vcf extension will be appended |
| |- | | |- |
− | | INDEL_VCF || ''alternate configuration setting if all INDEL sites are in a single VCF rather than broken up by chromosome'' | + | | INDEL_VCF || ''alternate configuration setting if all INDEL sites are in a single VCF rather than broken up by chromosome''||Must be tabixed |
| |} | | |} |
| | | |
Line 57: |
Line 61: |
| ! Pipeline !! Step !! Required Extensions !! Command to Create !! More Information | | ! Pipeline !! Step !! Required Extensions !! Command to Create !! More Information |
| |- | | |- |
− | | all || ||.fai || <code>bin/samtools faidx ref.fa</code> | + | | align, snpcall, indel || ||.fai || <code>bin/samtools faidx ref.fa</code> |
| |- | | |- |
− | | all || || -bs.umfa || automatically created in same directory as REF file by GotCloud | + | | align, snpcall, indel || || -bs.umfa || || If it does not already exist, GotCloud automatically creates this file in the same directory as the REF file |
| |- | | |- |
| | align || bwa mapping || .amb, .ann, .bwt, .pac, .sa || <code>bin/bwa index ref.fa</code> || http://bio-bwa.sourceforge.net/bwa.shtml | | | align || bwa mapping || .amb, .ann, .bwt, .pac, .sa || <code>bin/bwa index ref.fa</code> || http://bio-bwa.sourceforge.net/bwa.shtml |
Line 73: |
Line 77: |
| === DBSNP VCF File === | | === DBSNP VCF File === |
| VCF file containing known dbSNP variant positions | | VCF file containing known dbSNP variant positions |
− | * Must be gzip'd and tabix'd | + | * Must be bgzip'd and tabix'd |
| | | |
| {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" | | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| ! Configuration Key !! Default Value | | ! Configuration Key !! Default Value |
| |- | | |- |
− | | DBSNP_VCF || $(REF_DIR)/dbsnp_135.b37.vcf.gz | + | | DBSNP_VCF || $(REF_DIR)/dbsnp_142.b37.vcf.gz |
| |} | | |} |
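tabix can only index files compressed in the BGZF block format that <code>bgzip</code> produces, which is why a plain <code>gzip</code> file is not sufficient. The sketch below wraps the standard <code>bgzip</code> and <code>tabix -p vcf</code> commands and adds a small header check. The function names <code>compress_and_index</code> and <code>is_bgzf</code> are hypothetical helpers, not part of GotCloud.

```shell
#!/bin/sh
# compress_and_index: bgzip a plain-text VCF and build its tabix index.
# Requires the bgzip and tabix tools (distributed with tabix/htslib) on PATH.
compress_and_index() {
    vcf="$1"                        # e.g. a hypothetical my_sites.vcf
    bgzip -f "$vcf" || return 1     # writes my_sites.vcf.gz, removes my_sites.vcf
    tabix -f -p vcf "$vcf.gz"       # writes my_sites.vcf.gz.tbi
}

# is_bgzf: succeed if a file starts with a BGZF header rather than a plain gzip one.
# A BGZF block is a gzip member with the FEXTRA flag set (bytes 1f 8b 08 04)
# and a "BC" extra subfield at byte offsets 12-13.
is_bgzf() {
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    tag=$(od -An -c -j12 -N2 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "1f8b0804" ] && [ "$tag" = "BC" ]
}
```

For example, <code>is_bgzf dbsnp_142.b37.vcf.gz || echo "recompress with bgzip"</code> flags a file that was compressed with plain gzip.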
| {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" | | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
Line 90: |
Line 94: |
| === HapMap3 VCF File === | | === HapMap3 VCF File === |
| HapMap3 Polymorphic Sites VCF File | | HapMap3 Polymorphic Sites VCF File |
− | * Must be gzip'd and tabix'd | + | * Must be bgzip'd and tabix'd |
| | | |
| {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" | | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
Line 107: |
Line 111: |
| === OMNI VCF File === | | === OMNI VCF File === |
| VCF file containing OMNI positions | | VCF file containing OMNI positions |
− | * Must be gzip'd and tabix'd | + | * Must be bgzip'd and tabix'd |
| | | |
| {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" | | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
Line 121: |
Line 125: |
| | | |
| === INDEL VCF File(s) === | | === INDEL VCF File(s) === |
− | VCF file containing OMNI positions | + | VCF file containing known INDEL positions |
− | * Must be gzip'd and tabix'd
| |
| | | |
| {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" | | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
Line 134: |
Line 137: |
| ! Pipeline !! Use | | ! Pipeline !! Use |
| |- | | |- |
− | | snpcall || positive example sites for SVM filtering | + | | snpcall || used to filter variants that are too close to a known indel |
| |} | | |} |
| + | |
| + | * Use <code>INDEL_PREFIX</code> if <code>path/</code> contains a separate file for each chromosome in the format: <code>indels.sites.hg19.chr#.vcf</code> for each <code>#</code> chromosome being processed |
| + | * Use <code>INDEL_VCF</code> if you have all chromosomes in a single VCF file (it can be, but does not have to be, a gz file) |
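To illustrate the <code>INDEL_PREFIX</code> naming rule, the sketch below prints the per-chromosome file names that result from appending <code>.chr#.vcf</code> to a prefix. The helper name <code>indel_files</code> and the chromosomes chosen are hypothetical, and the prefix is a placeholder.

```shell
#!/bin/sh
# indel_files: print the per-chromosome VCF names implied by an INDEL_PREFIX value.
# Usage: indel_files PREFIX CHR [CHR ...]
indel_files() {
    prefix="$1"
    shift
    for chr in "$@"; do
        # ".chr#.vcf" is appended to the prefix, with # replaced by the chromosome.
        echo "${prefix}.chr${chr}.vcf"
    done
}

# Hypothetical prefix and chromosomes, for illustration only.
indel_files path/indels.sites.hg19 20 X
```

With these arguments the function prints <code>path/indels.sites.hg19.chr20.vcf</code> and <code>path/indels.sites.hg19.chrX.vcf</code>, one per line.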
| | | |
| == Downloadable Reference and Resource Files == | | == Downloadable Reference and Resource Files == |
| + | * When running on Amazon, a default set of reference files is included in the GotCloud AMI in the default <code>REF_DIR</code> |
| + | |
| | | |
| '''Installing Genetic Reference and Resource Files''' | | '''Installing Genetic Reference and Resource Files''' |
− | Choose a destination for these files and install them as shown below. We'll assume you will use '''/usr/local/gotcloud.ref'''. If you use a different directory, replace /usr/local/gotcloud.ref with your path. | + | |
| + | Choose a destination for these files and install them as shown below. We'll assume you will use '''gotcloud/gotcloud.ref'''. Replace <code>gotcloud</code> with the path to where you installed gotcloud. |
| | | |
| <code> | | <code> |
− | <b>mkdir -p /usr/local/gotcloud.ref</b> # Where you want the files installed | + | <b>cd gotcloud</b> # path to where you installed gotcloud |
− | <b>cd /usr/local/gotcloud.ref</b>
| |
| </code> | | </code> |
| | | |
− | Note this path as you will need to set the variable '''REF_DIR''' in the configuration file for gotcloud.
| + | If you install the files anywhere other than the gotcloud.ref subdirectory of gotcloud, note the path, as you will need to set either of the following to it: |
| + | * <code>REF_DIR</code> in your configuration file |
| + | * <code>--ref_dir</code> on the command-line |
| + | |
| | | |
| + | '''Get & Install the Resource Files''' |
| | | |
− | '''Get the Resource Files'''
| + | GotCloud makes use of various reference and other genetic resource files. |
− | The GotCloud Aligner and Umake makes use of various reference and other genetic resource files.
| |
| You are free to use your own files, of course, but we also are making the files we use available. | | You are free to use your own files, of course, but we also are making the files we use available. |
| | | |
− | <code> | + | <ul> |
− | # The easiest way to get the data:
| + | <li> <div id="h37-db135">Human reference 37, dbsnp 135:</div></li> |
| <b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db135-v3.tgz</b> | | <b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db135-v3.tgz</b> |
− |
| |
− | # Another way:
| |
− | <b>ftp share.sph.umich.edu</b>
| |
− | Connected to share.sph.umich.edu.
| |
− | 220 (vsFTPd 2.3.5)
| |
− | Name (share.sph.umich.edu:tpg): <b>anonymous</b>
| |
− | 230 Login successful.
| |
− | Remote system type is UNIX.
| |
− | Using binary mode to transfer files.
| |
− | ftp> <b>prompt</b>
| |
− | Interactive mode off.
| |
− | ftp> <b>cd gotcloud</b>
| |
− | 250 Directory successfully changed.
| |
− | ftp> <b>mget ref/h37-db135-v3.tgz</b>
| |
− | ftp> <b>quit</b>
| |
− | 221 Goodbye.
| |
− | </code>
| |
− |
| |
− | '''Install the Resource Files'''
| |
− |
| |
− | <code>
| |
| <b>tar xzf h37-db135-v3.tgz</b> | | <b>tar xzf h37-db135-v3.tgz</b> |
| <b>rm -f h37-db135-v3.tgz</b> | | <b>rm -f h37-db135-v3.tgz</b> |
− | </code> | + | <li><div id="h37-db142">Human reference 37, dbsnp 142:</div></li> |
| + | <b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db142-v1.tgz</b> |
| + | <b>tar xzf h37-db142-v1.tgz</b> |
| + | <b>rm -f h37-db142-v1.tgz</b> |
| + | <li><div id="hs37d5-db142">Human reference 37 with decoy, dbsnp 142:</div></li> |
| + | <b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/hs37d5-db142-v1.tgz</b> |
| + | <b>tar xzf hs37d5-db142-v1.tgz</b> |
| + | <b>rm -f hs37d5-db142-v1.tgz</b> |
| + | </ul> |
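The three download sequences above differ only in the bundle name, so they can be wrapped in one helper. This is a minimal sketch; the function names <code>bundle_url</code> and <code>install_bundle</code> are hypothetical, while the FTP location and bundle names are taken from the commands above.

```shell
#!/bin/sh
# Base FTP location taken from the wget commands above.
base_url="ftp://anonymous@share.sph.umich.edu/gotcloud/ref"

# bundle_url: print the download URL for a named bundle, e.g. h37-db142-v1.
bundle_url() {
    echo "$base_url/$1.tgz"
}

# install_bundle: download, unpack, and clean up one bundle in the current directory.
# Stops at the first failing step so a partial download is not unpacked.
install_bundle() {
    name="$1"
    wget "$(bundle_url "$name")" || return 1
    tar xzf "$name.tgz" || return 1
    rm -f "$name.tgz"
}

# Example (not run here): cd gotcloud && install_bundle hs37d5-db142-v1
```

Unlike the plain command sequence, the helper keeps the archive in place if extraction fails, so the download does not have to be repeated.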