Changes

From Genome Analysis Wiki
Line 6: Line 6:     
You can generate your own files or use the set available for [[#Downloadable Reference and Resource Files|download]].
 
You can generate your own files or use the set available for [[#Downloadable Reference and Resource Files|download]].
 +
* By default, GotCloud looks for the reference/resource files in the <code>gotcloud.ref</code> subdirectory within the base GotCloud directory
 +
* To look in a different directory, set either of the following to that path:
 +
** <code>REF_DIR</code> in your configuration file
 +
** <code>--ref_dir</code> on the command-line
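
The lookup rule in the bullets above can be sketched in shell. This is only an illustration: <code>ref_dir_for</code> is a hypothetical helper, not part of GotCloud, and <code>/opt/gotcloud</code> and <code>/data/ref</code> are placeholder paths.

```shell
# Hypothetical helper (not part of GotCloud): print the directory that would
# hold the reference/resource files.
#   $1 = base GotCloud directory
#   $2 = optional custom path, i.e. the value you would give to REF_DIR
#        in the configuration file or to --ref_dir on the command line
ref_dir_for() {
    base_dir=$1
    override=${2:-}
    if [ -n "$override" ]; then
        printf '%s\n' "$override"
    else
        # Default: the gotcloud.ref subdirectory of the base directory
        printf '%s\n' "$base_dir/gotcloud.ref"
    fi
}

ref_dir_for /opt/gotcloud             # default location
ref_dir_for /opt/gotcloud /data/ref   # custom location
```

The sketch does not model what happens when both <code>REF_DIR</code> and <code>--ref_dir</code> are set, since the text above does not say which one wins.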
    
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
Line 22: Line 26:  
| snpcall || Must be tabixed
 
| snpcall || Must be tabixed
 
|-
 
|-
| rowspan="2"|[[#INDEL VCF File(s)|INDEL VCF File(s)]] || INDEL_PREFIX || $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 || rowspan="2"|snpcall || rowspan="2"|Must be tabixed
+
| rowspan="2"|[[#INDEL VCF File(s)|INDEL VCF File(s)]] || INDEL_PREFIX || $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 || rowspan="2"|snpcall || .chr#.vcf extension will be appended
 
|-
 
|-
| INDEL_VCF || ''alternate configuration setting if all INDEL sites are in a single VCF rather than broken up by chromosome''
+
| INDEL_VCF || ''alternate configuration setting if all INDEL sites are in a single VCF rather than broken up by chromosome''||Must be tabixed
 
|}
 
|}
   Line 57: Line 61:  
! Pipeline !! Step !! Required Extensions !! Command to Create !! More Information
 
! Pipeline !! Step !! Required Extensions !! Command to Create !! More Information
 
|-
 
|-
| all || ||.fai || <code>bin/samtools faidx ref.fa</code>
+
| align, snpcall, indel || ||.fai || <code>bin/samtools faidx ref.fa</code>
 
|-
 
|-
| all || || -bs.umfa || automatically created in same directory as REF file by GotCloud
+
| align, snpcall, indel || || -bs.umfa || || If it does not already exist, GotCloud automatically creates this file in the same directory as the REF file
 
|-
 
|-
 
| align || bwa mapping || .amb, .ann, .bwt, .pac, .sa || <code>bin/bwa index ref.fa</code> || http://bio-bwa.sourceforge.net/bwa.shtml  
 
| align || bwa mapping || .amb, .ann, .bwt, .pac, .sa || <code>bin/bwa index ref.fa</code> || http://bio-bwa.sourceforge.net/bwa.shtml  
Line 73: Line 77:  
=== DBSNP VCF File ===
 
=== DBSNP VCF File ===
 
VCF file containing known dbsnp variant positions
 
VCF file containing known dbsnp variant positions
* Must be gzip'd and tabix'd
+
* Must be bgzip'd and tabix'd
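
The wording change from gzip to bgzip matters because tabix can only index BGZF-compressed files, which <code>bgzip</code> produces and plain <code>gzip</code> does not. The sketch below assumes the htslib <code>bgzip</code> and <code>tabix</code> tools are on your path. <code>sites.vcf</code> is a placeholder file name and <code>is_bgzf</code> is a hypothetical helper that only inspects the first four bytes, so treat it as a heuristic.

```shell
# Heuristic check: BGZF (bgzip) files are gzip files with the FEXTRA flag set,
# so they start with the bytes 1f 8b 08 04. Plain gzip output normally has a
# different flag byte.
is_bgzf() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b0804" ]
}

# Compress and index a VCF only when the tools and the input file are present.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1 \
        && [ -f sites.vcf ]; then
    bgzip -f sites.vcf            # writes sites.vcf.gz
    tabix -p vcf sites.vcf.gz     # writes sites.vcf.gz.tbi
    if is_bgzf sites.vcf.gz; then
        echo "sites.vcf.gz looks BGZF-compressed"
    fi
fi
```

The same two commands apply to the HapMap3 and OMNI VCF files described on this page.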
    
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
! Configuration Key !! Default Value
 
! Configuration Key !! Default Value
 
|-
 
|-
| DBSNP_VCF || $(REF_DIR)/dbsnp_135.b37.vcf.gz
+
| DBSNP_VCF || $(REF_DIR)/dbsnp_142.b37.vcf.gz
 
|}
 
|}
 
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
Line 90: Line 94:  
=== HapMap3 VCF File ===
 
=== HapMap3 VCF File ===
 
HapMap3 Polymorphic Sites VCF File
 
HapMap3 Polymorphic Sites VCF File
* Must be gzip'd and tabix'd
+
* Must be bgzip'd and tabix'd
    
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
Line 107: Line 111:  
=== OMNI VCF File ===
 
=== OMNI VCF File ===
 
VCF file containing OMNI positions
 
VCF file containing OMNI positions
* Must be gzip'd and tabix'd
+
* Must be bgzip'd and tabix'd
    
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
Line 121: Line 125:     
=== INDEL VCF File(s) ===
 
=== INDEL VCF File(s) ===
VCF file containing OMNI positions
+
VCF file containing known INDEL positions
* Must be gzip'd and tabix'd
      
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
Line 134: Line 137:  
! Pipeline !! Use
 
! Pipeline !! Use
 
|-
 
|-
| snpcall || positive example sites for SVM filtering
+
| snpcall || used to filter variants that are too close to a known indel
 
|}
 
|}
 +
 +
* Use <code>INDEL_PREFIX</code> if <code>path/</code> contains a separate file for each chromosome in the format: <code>indels.sites.hg19.chr#.vcf</code> for each <code>#</code> chromosome being processed
 +
* Use <code>INDEL_VCF</code> if you have all chromosomes in a single VCF file (it can be, but does not have to be, a gz file)
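
To illustrate the per-chromosome naming used with <code>INDEL_PREFIX</code>, here is a minimal sketch that lists which expected files are missing. <code>indel_vcf_for</code> is a hypothetical helper, and the prefix and chromosome list are placeholders you would replace with your own values.

```shell
# Hypothetical helper: build the per-chromosome file name by appending
# .chr#.vcf to the INDEL_PREFIX value, as described above.
indel_vcf_for() {
    printf '%s.chr%s.vcf\n' "$1" "$2"
}

# Placeholder prefix and chromosomes; adjust to the data you are processing.
prefix="gotcloud.ref/1kg.pilot_release.merged.indels.sites.hg19"
for chr in 20 21 22; do
    vcf=$(indel_vcf_for "$prefix" "$chr")
    [ -f "$vcf" ] || echo "missing: $vcf"
done
```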
    
== Downloadable Reference and Resource Files ==
 
== Downloadable Reference and Resource Files ==
 +
* When running on Amazon, a default set of reference files is included in the GotCloud AMI in the default <code>REF_DIR</code>
 +
    
'''Installing Genetic Reference and Resource Files'''
 
'''Installing Genetic Reference and Resource Files'''
Choose a destination for these files and install them as shown below.  We'll assume you will use '''/usr/local/gotcloud.ref'''.  If you use a different directory, replace /usr/local/gotcloud.ref with your path.
+
 
 +
Choose a destination for these files and install them as shown below.  We'll assume you will use '''gotcloud/gotcloud.ref'''.  Replace <code>gotcloud</code> with the path to where you installed gotcloud.
    
<code>
 
<code>
  <b>mkdir -p /usr/local/gotcloud.ref</b>   # Where you want the files installed
+
  <b>cd gotcloud</b>   # path to where you installed gotcloud
<b>cd /usr/local/gotcloud.ref</b>
   
</code>
 
</code>
   −
Note this path as you will need to set the variable '''REF_DIR''' in the configuration file for gotcloud.
+
If you use a path other than a gotcloud.ref subdirectory of gotcloud, note this path as you will need to set either of the following to the installation path:
 +
* <code>REF_DIR</code> in your configuration file
 +
* <code>--ref_dir</code> on the command-line
 +
 
    +
'''Get & Install the Resource Files'''
   −
'''Get the Resource Files'''
+
GotCloud makes use of various reference and other genetic resource files.
The GotCloud Aligner and Umake makes use of various reference and other genetic resource files.
   
You are free to use your own files, of course, but we also are making the files we use available.
 
You are free to use your own files, of course, but we also are making the files we use available.
   −
<code>
+
<ul>
#  The easiest way to get the data:
+
<li> <div id="h37-db135">Human reference 37, dbsnp 135:</div></li>
 
  <b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db135-v3.tgz</b>
 
  <b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db135-v3.tgz</b>
  −
#  Another way:
  −
<b>ftp share.sph.umich.edu</b>
  −
Connected to share.sph.umich.edu.
  −
220 (vsFTPd 2.3.5)
  −
Name (share.sph.umich.edu:tpg): <b>anonymous</b>
  −
230 Login successful.
  −
Remote system type is UNIX.
  −
Using binary mode to transfer files.
  −
ftp> <b>prompt</b>
  −
Interactive mode off.
  −
ftp> <b>cd gotcloud</b>
  −
250 Directory successfully changed.
  −
ftp> <b>mget ref/h37-db135-v3.tgz</b>
  −
ftp> <b>quit</b>
  −
221 Goodbye.
  −
</code>
  −
  −
'''Install the Resource Files'''
  −
  −
<code>
   
  <b>tar xzf h37-db135-v3.tgz</b>
 
  <b>tar xzf h37-db135-v3.tgz</b>
 
  <b>rm -f h37-db135-v3.tgz</b>
 
  <b>rm -f h37-db135-v3.tgz</b>
</code>
+
<li><div id="h37-db142">Human reference 37, dbsnp 142:</div></li>
 +
<b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db142-v1.tgz</b>
 +
<b>tar xzf h37-db142-v1.tgz</b>
 +
<b>rm -f h37-db142-v1.tgz</b>
 +
<li><div id="hs37d5-db142">Human reference 37 with decoy, dbsnp 142:</div></li>
 +
<b>wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/hs37d5-db142-v1.tgz</b>
 +
<b>tar xzf hs37d5-db142-v1.tgz</b>
 +
<b>rm -f hs37d5-db142-v1.tgz</b>
 +
</ul>
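
The three bundles follow the same download, unpack, and clean-up pattern, so the steps can be wrapped in a small script. This is a sketch under stated assumptions: <code>bundle_url</code> and <code>install_bundle</code> are hypothetical helper names, and the base URL is copied from the <code>wget</code> commands in the list.

```shell
# Base URL taken from the wget commands listed above.
GOTCLOUD_REF_URL="ftp://anonymous@share.sph.umich.edu/gotcloud/ref"

# Hypothetical helper: build the download URL for a bundle name
# such as h37-db142-v1.
bundle_url() {
    printf '%s/%s.tgz\n' "$GOTCLOUD_REF_URL" "$1"
}

# Hypothetical helper: download, unpack, and clean up one bundle
# in the current directory. Stops at the first failing step.
install_bundle() {
    wget "$(bundle_url "$1")" || return 1
    tar xzf "$1.tgz" || return 1
    rm -f "$1.tgz"
}

bundle_url hs37d5-db142-v1
# To actually install (requires network access), run from your GotCloud directory:
#   install_bundle hs37d5-db142-v1
```

Keeping the archive until <code>tar</code> succeeds avoids having to download it again if extraction fails partway.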
