Line 1: |
Line 1: |
− | = Genetic Reference and Resource Files = | + | == Genetic Reference and Resource Files == |
| | | |
| Back to parent: [[GotCloud]] | | Back to parent: [[GotCloud]] |
Line 6: |
Line 6: |
| | | |
| You can generate your own files or use the set available for [[#Downloadable Reference and Resource Files|download]]. | | You can generate your own files or use the set available for [[#Downloadable Reference and Resource Files|download]]. |
| + | * By default, GotCloud looks for the reference/resource files in the <code>gotcloud.ref</code> subdirectory within the base GotCloud directory |
| + | * To use a different directory, set either of the following to its path:
| + | ** <code>REF_DIR</code> in your configuration file |
| + | ** <code>--ref_dir</code> on the command-line |
| | | |
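The lookup rule above can be sketched as a small shell function. This only illustrates the documented behaviour and is not GotCloud's own code: the function name <code>resolve_ref_dir</code> and its arguments are hypothetical, and the second argument stands in for whatever value you would give <code>REF_DIR</code> or <code>--ref_dir</code>.

```shell
#!/bin/sh
# Hypothetical helper (not part of GotCloud): print the directory that would
# hold the reference/resource files under the rule described above.
#   $1 = base GotCloud directory
#   $2 = optional override (the value you would give REF_DIR or --ref_dir)
resolve_ref_dir() {
    base_dir=$1
    override=${2:-}
    if [ -n "$override" ]; then
        # An explicit setting wins over the default location.
        printf '%s\n' "$override"
    else
        # Default: the gotcloud.ref subdirectory of the base directory.
        printf '%s\n' "${base_dir%/}/gotcloud.ref"
    fi
}
```

Calling <code>resolve_ref_dir /opt/gotcloud</code> prints the default location, while passing a second argument prints that path instead.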
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Description !! Configuration Key !! Default Value !! Pipelines !! Special Info |
| + | |- |
| + | | [[#Reference fasta Files| Reference fasta]] || REF || $(REF_DIR)/human.g1k.v37.fa |
| + | | align, snpcall, indel || [[#Additional files generated from the reference fasta|Additional Files Required]] |
| + | |- |
| + | | [[#DBSNP VCF File|DBSNP VCF File]] || DBSNP_VCF || $(REF_DIR)/dbsnp_135.b37.vcf.gz |
| + | | align, snpcall || Must be tabixed |
| + | |- |
| + | | [[#HapMap3 VCF File|HapMap3 VCF File]] || HM3_VCF || $(REF_DIR)/hapmap_3.3.b37.sites.vcf.gz |
| + | | align, snpcall || Must be tabixed |
| + | |- |
| + | | [[#OMNI VCF File|OMNI VCF File]] || OMNI_VCF || $(REF_DIR)/1000G_omni2.5.b37.sites.PASS.vcf.gz |
| + | | snpcall || Must be tabixed |
| + | |- |
| + | | rowspan="2"|[[#INDEL VCF File(s)|INDEL VCF File(s)]] || INDEL_PREFIX || $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 || rowspan="2"|snpcall || .chr#.vcf extension will be appended |
| + | |- |
| + | | INDEL_VCF || ''alternate configuration setting if all INDEL sites are in a single VCF rather than broken up by chromosome''||Must be tabixed |
| + | |} |
| | | |
− | == Required Files ==
| |
| | | |
− | === Human Reference Files === | + | === Reference fasta Files === |
| Reference Sequence in fasta format | | Reference Sequence in fasta format |
| + | * Contains reference base at each reference position |
| + | |
| {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" | | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| ! Configuration Key !! Default Value | | ! Configuration Key !! Default Value |
Line 31: |
Line 55: |
| * Already included with default reference files | | * Already included with default reference files |
| * If you are using your own reference files, you must create these files yourself | | * If you are using your own reference files, you must create these files yourself |
| + | ** Expected to be at the same location as the reference file |
| ** Be sure to create these additional files using the version of each tool that GotCloud runs (by default they are in the <code>gotcloud/bin/</code> directory) | | ** Be sure to create these additional files using the version of each tool that GotCloud runs (by default they are in the <code>gotcloud/bin/</code> directory) |
| ** In the commands below, replace <code>ref.fa</code> with the path/name of the reference fasta file | | ** In the commands below, replace <code>ref.fa</code> with the path/name of the reference fasta file |
Line 36: |
Line 61: |
| ! Pipeline !! Step !! Required Extensions !! Command to Create !! More Information | | ! Pipeline !! Step !! Required Extensions !! Command to Create !! More Information |
| |- | | |- |
− | | all || ||.fai || <code>bin/samtools faidx ref.fa</code> | + | | align, snpcall, indel || ||.fai || <code>bin/samtools faidx ref.fa</code> |
| |- | | |- |
| − | | all || || -bs.umfa || automatically created in same directory as REF file by GotCloud | + | | align, snpcall, indel || || -bs.umfa || || If it does not already exist, GotCloud automatically creates this file in the same directory as the REF file
| |- | | |- |
| | align || bwa mapping || .amb, .ann, .bwt, .pac, .sa || <code>bin/bwa index ref.fa</code> || http://bio-bwa.sourceforge.net/bwa.shtml | | | align || bwa mapping || .amb, .ann, .bwt, .pac, .sa || <code>bin/bwa index ref.fa</code> || http://bio-bwa.sourceforge.net/bwa.shtml |
Line 51: |
Line 76: |
| | | |
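If you supply your own reference fasta, a quick check that the index files exist next to it can catch problems before a pipeline run. The sketch below covers only the extensions shown in the rows of the reference fasta table visible in this diff (<code>.fai</code> from <code>samtools faidx</code>, and <code>.amb</code>, <code>.ann</code>, <code>.bwt</code>, <code>.pac</code>, <code>.sa</code> from <code>bwa index</code>). The function name <code>missing_ref_indexes</code> is hypothetical, and the <code>-bs.umfa</code> file is left out because GotCloud creates it automatically.

```shell
#!/bin/sh
# Hypothetical helper: print every index file that is missing next to the
# reference fasta given as $1. Returns non-zero if anything is missing.
missing_ref_indexes() {
    ref=$1
    missing=0
    # samtools faidx and bwa index both append their extension to the
    # full fasta name, e.g. ref.fa -> ref.fa.fai
    for ext in .fai .amb .ann .bwt .pac .sa; do
        if [ ! -e "${ref}${ext}" ]; then
            printf '%s\n' "${ref}${ext}"
            missing=1
        fi
    done
    return $missing
}
```

An empty output with a zero exit status means all six files were found. Any printed path can be regenerated with the matching command from the table.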
| === DBSNP VCF File === | | === DBSNP VCF File === |
| + | VCF file containing known dbsnp variant positions |
| + | * Must be bgzip'd and tabix'd |
| | | |
− | === HapMap3 Polymorphic Sites VCF File === | + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Configuration Key !! Default Value |
| + | |- |
| + | | DBSNP_VCF || $(REF_DIR)/dbsnp_142.b37.vcf.gz |
| + | |} |
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Pipeline !! Use |
| + | |- |
| + | | align || recalibration (exclude known dbsnps when generating recalibration tables) & qplot |
| + | |- |
| + | | snpcall || generating filtered VCF summary statistics |
| + | |} |
| + | |
| + | === HapMap3 VCF File === |
| + | HapMap3 Polymorphic Sites VCF File |
| + | * Must be bgzip'd and tabix'd |
| + | |
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Configuration Key !! Default Value |
| + | |- |
| + | | HM3_VCF || $(REF_DIR)/hapmap_3.3.b37.sites.vcf.gz |
| + | |} |
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Pipeline !! Use |
| + | |- |
| + | | align || verifyBamID (contamination checking) |
| + | |- |
| + | | snpcall || generating filtered VCF summary statistics & positive example sites for SVM filtering |
| + | |} |
| + | |
| + | === OMNI VCF File === |
| + | VCF file containing OMNI positions |
| + | * Must be bgzip'd and tabix'd |
| + | |
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Configuration Key !! Default Value |
| + | |- |
| + | | OMNI_VCF || $(REF_DIR)/1000G_omni2.5.b37.sites.PASS.vcf.gz |
| + | |} |
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Pipeline !! Use |
| + | |- |
| + | | snpcall || positive example sites for SVM filtering |
| + | |} |
| | | |
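The DBSNP, HapMap3 and OMNI VCF files above must all be bgzip'd and tabix'd. A minimal sketch of preparing and checking your own file follows. It assumes the <code>bgzip</code> and <code>tabix</code> tools are on your <code>PATH</code>. The function names are hypothetical, and <code>is_tabixed</code> only checks file names, not file contents.

```shell
#!/bin/sh
# Compress and index a plain-text VCF so it can be used as DBSNP_VCF,
# HM3_VCF or OMNI_VCF. Assumes bgzip and tabix are installed.
prepare_vcf() {
    vcf=$1                      # path to an uncompressed .vcf file
    bgzip "$vcf" || return 1    # writes "$vcf.gz" and removes the original
    tabix -p vcf "${vcf}.gz"    # writes "$vcf.gz.tbi"
}

# Name-based check only: succeeds if $1 ends in .gz, exists, and has a
# .tbi index beside it. It does not inspect the file contents.
is_tabixed() {
    case $1 in
        *.gz) [ -f "$1" ] && [ -f "$1.tbi" ] ;;
        *)    return 1 ;;
    esac
}
```

Running <code>is_tabixed</code> on the path you plan to assign to a configuration key is a cheap check before starting a pipeline.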
| === INDEL VCF File(s) === | | === INDEL VCF File(s) === |
| + | VCF file containing known INDEL positions |
| + | |
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Configuration Key !! Default Value |
| + | |- |
| + | | INDEL_PREFIX || $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 |
| + | |- |
| + | | INDEL_VCF || ''alternate configuration setting if all INDEL sites are in a single VCF rather than broken up by chromosome'' |
| + | |} |
| + | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1" |
| + | ! Pipeline !! Use |
| + | |- |
| + | | snpcall || used to filter variants that are too close to a known indel |
| + | |} |
| + | |
| + | * Use <code>INDEL_PREFIX</code> if the INDEL sites are split into a separate file for each chromosome. GotCloud appends <code>.chr#.vcf</code> to the prefix, so a prefix of <code>path/indels.sites.hg19</code> requires a file named <code>path/indels.sites.hg19.chr#.vcf</code> for each chromosome <code>#</code> being processed
| + | * Use <code>INDEL_VCF</code> if you have all chromosomes in a single VCF file (the file may be gzipped, but does not have to be)
| | | |
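To make the <code>INDEL_PREFIX</code> naming rule concrete, the sketch below builds the per-chromosome file name by appending <code>.chr#.vcf</code> to the prefix, as described above. The function name <code>indel_vcf_for_chr</code> is hypothetical and only illustrates the naming convention.

```shell
#!/bin/sh
# Hypothetical helper: print the per-chromosome INDEL VCF path that
# corresponds to an INDEL_PREFIX value.
#   $1 = INDEL_PREFIX value
#   $2 = chromosome name (e.g. 20 or X)
indel_vcf_for_chr() {
    printf '%s.chr%s.vcf\n' "$1" "$2"
}
```

Looping over the chromosomes you plan to process and testing each printed path with <code>[ -f ... ]</code> shows whether the per-chromosome set is complete.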
− | === OMNI VCF File === | + | == Downloadable Reference and Resource Files == |
| + | * When running on Amazon, a default set of reference files is included in the GotCloud AMI in the default <code>REF_DIR</code>
| | | |
− | = Downloadable Reference and Resource Files =
| |
| | | |
| '''Installing Genetic Reference and Resource Files''' | | '''Installing Genetic Reference and Resource Files''' |
− | Choose a destination for these files and install them as shown below. We'll assume you will use '''/usr/local/gotcloud.ref'''. If you use a different directory, replace /usr/local/gotcloud.ref with your path. | + | |
| + | Choose a destination for these files and install them as shown below. These instructions assume you will use '''gotcloud/gotcloud.ref'''; replace <code>gotcloud</code> with the path where you installed GotCloud.
| | | |
| <code> | | <code> |
− | <b>mkdir -p /usr/local/gotcloud.ref</b> # Where you want the files installed | + | <b>cd gotcloud</b> # path to where you installed gotcloud |
− | <b>cd /usr/local/gotcloud.ref</b>
| |
| </code> | | </code> |
| | | |
− | Note this path as you will need to set the variable '''REF_DIR''' in the configuration file for gotcloud.
| + | If you install the files anywhere other than the <code>gotcloud.ref</code> subdirectory of your GotCloud directory, note the path, because you must set either of the following to it:
| + | * <code>REF_DIR</code> in your configuration file |
| + | * <code>--ref_dir</code> on the command-line |
| | | |
| | | |
− | '''Get the Resource Files''' | + | '''Get & Install the Resource Files''' |
− | The GotCloud Aligner and Umake makes use of various reference and other genetic resource files.
| + | |
| + | GotCloud makes use of various reference and other genetic resource files. |
| You are free to use your own files, but the files we use are also available for download. | | You are free to use your own files, but the files we use are also available for download. |
| | | |
− | <code> | + | <ul> |
− | # The easiest way to get the data:
| + | <li> <div id="h37-db135">Human reference 37, dbsnp 135:</div></li> |
| <b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db135-v3.tgz</b> | | <b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db135-v3.tgz</b> |
− |
| |
− | # Another way:
| |
− | <b>ftp share.sph.umich.edu</b>
| |
− | Connected to share.sph.umich.edu.
| |
− | 220 (vsFTPd 2.3.5)
| |
− | Name (share.sph.umich.edu:tpg): <b>anonymous</b>
| |
− | 230 Login successful.
| |
− | Remote system type is UNIX.
| |
− | Using binary mode to transfer files.
| |
− | ftp> <b>prompt</b>
| |
− | Interactive mode off.
| |
− | ftp> <b>cd gotcloud</b>
| |
− | 250 Directory successfully changed.
| |
− | ftp> <b>mget ref/h37-db135-v3.tgz</b>
| |
− | ftp> <b>quit</b>
| |
− | 221 Goodbye.
| |
− | </code>
| |
− |
| |
− | '''Install the Resource Files'''
| |
− |
| |
− | <code>
| |
| <b>tar xzf h37-db135-v3.tgz</b> | | <b>tar xzf h37-db135-v3.tgz</b> |
| <b>rm -f h37-db135-v3.tgz</b> | | <b>rm -f h37-db135-v3.tgz</b> |
− | </code> | + | <li><div id="h37-db142">Human reference 37, dbsnp 142:</div></li> |
| + | <b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db142-v1.tgz</b> |
| + | <b>tar xzf h37-db142-v1.tgz</b> |
| + | <b>rm -f h37-db142-v1.tgz</b> |
| + | <li><div id="hs37d5-db142">Human reference 37 with decoy, dbsnp 142:</div></li> |
| + | <b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/hs37d5-db142-v1.tgz</b> |
| + | <b>tar xzf hs37d5-db142-v1.tgz</b> |
| + | <b>rm -f hs37d5-db142-v1.tgz</b> |
| + | </ul> |