GotCloud: Genetic Reference and Resource Files

From Genome Analysis Wiki

Revision as of 11:51, 23 October 2014

Genetic Reference and Resource Files


To run GotCloud, you must provide genetic reference and resource files.

You can generate your own files or download the set we provide.


Required Files

Human Reference Files

Reference sequence in FASTA format

Configuration Key    Default Value
REF                  $(REF_DIR)/human.g1k.v37.fa

Pipeline    Use
align       mapping to reference, recalibration, quality control
snpcall     pileup & identify variants, summarize filtered variants
indel       discovery, genotyping

All pipelines require additional files:

Step    Required Extensions    Command to Create
all     -bs.umfa               created automatically by GotCloud in the same directory as the REF file


The alignment pipeline requires additional files:

Step           Required Extensions            Command to Create
bwa mapping    .amb, .ann, .bwt, .pac, .sa    bin/bwa index ref.fa
qplot          .fa.GCcontent
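To confirm that a reference is ready for the alignment pipeline, you can check that the BWA index files listed above sit next to it. The sketch below uses a hypothetical helper, check_bwa_files, which is not part of GotCloud; it assumes "bwa index ref.fa" names each file by appending the extension to the full reference file name (for example ref.fa.amb).

```shell
# Hypothetical helper, not part of GotCloud: report any BWA index file that
# is missing or empty next to the given reference FASTA.
# Assumes the index files are named <reference>.<ext>, e.g. ref.fa.amb.
check_bwa_files() {
    ref=$1
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$ref.$ext" ]; then
            echo "missing: $ref.$ext" >&2
            missing=1
        fi
    done
    # 0 if all five files are present, 1 otherwise
    return "$missing"
}

# Usage (example path): check_bwa_files /usr/local/gotcloud.ref/human.g1k.v37.fa
```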


DBSNP VCF File

HapMap3 Polymorphic Sites VCF File

INDEL VCF File(s)

OMNI VCF File

Downloadable Reference and Resource Files

Installing Genetic Reference and Resource Files

Choose a destination for these files and install them as shown below. These instructions assume /usr/local/gotcloud.ref; if you use a different directory, replace /usr/local/gotcloud.ref with your path.

mkdir -p /usr/local/gotcloud.ref    # Where you want the files installed
cd /usr/local/gotcloud.ref

Note this path, because you will need to set REF_DIR to it in your GotCloud configuration file.
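For example, you could record the install location in your configuration file as below. This is a sketch that assumes GotCloud's "KEY = value" configuration syntax (the same style in which REF defaults to $(REF_DIR)/human.g1k.v37.fa); the file name my_gotcloud.conf is hypothetical, so substitute the configuration file you actually pass to GotCloud.

```shell
# my_gotcloud.conf is a hypothetical name; use your own configuration file.
CONF=my_gotcloud.conf
# Assumed "KEY = value" syntax. REF defaults to $(REF_DIR)/human.g1k.v37.fa,
# so setting REF_DIR is enough when you use the downloaded files.
echo "REF_DIR = /usr/local/gotcloud.ref" >> "$CONF"
```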


Get the Resource Files

The GotCloud aligner and Umake use various reference and other genetic resource files. You are free to use your own files, but we also make the files we use available for download.

#  The easiest way to get the data:
wget ftp://anonymous@share.sph.umich.edu/gotcloud/ref/h37-db135-v3.tgz
#  Alternatively, use an interactive FTP session and log in as anonymous:
ftp share.sph.umich.edu
Connected to share.sph.umich.edu.
220 (vsFTPd 2.3.5)
Name (share.sph.umich.edu:tpg): anonymous
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> prompt
Interactive mode off.
ftp> cd gotcloud
250 Directory successfully changed.
ftp> mget ref/h37-db135-v3.tgz
ftp> quit
221 Goodbye.
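Before unpacking, you may want to check that the download completed intact. The sketch below uses the standard "gzip -t" and "tar tzf" commands to test the archive without extracting it; verify_archive is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the file is a readable gzipped tar
# archive. gzip -t checks the compressed stream; tar tzf lists the members.
verify_archive() {
    gzip -t "$1" && tar tzf "$1" > /dev/null
}

# Usage: verify_archive h37-db135-v3.tgz && echo "archive OK"
```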

Install the Resource Files

tar xzf h37-db135-v3.tgz
rm -f h37-db135-v3.tgz
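Run as two separate commands, the steps above delete the archive even if extraction fails. A slightly safer variant, sketched below as a hypothetical helper, extracts into the destination directory with "tar -C" and removes the archive only when tar succeeds.

```shell
# Hypothetical helper: unpack an archive into a destination directory and
# delete the archive only if extraction succeeded.
install_ref_archive() {
    archive=$1
    dest=$2
    mkdir -p "$dest" &&
        tar xzf "$archive" -C "$dest" &&
        rm -f "$archive"
}

# Usage: install_ref_archive h37-db135-v3.tgz /usr/local/gotcloud.ref
```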

Using Your Own Reference Files

Human Reference

Generating BWA Reference Files

Use "bwa index" to generate the index files for your human reference, with the following extensions:

  • .amb
  • .ann
  • .bwt
  • .pac
  • .sa

See http://bio-bwa.sourceforge.net/bwa.shtml for more information about using "bwa index".
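A minimal sketch that runs "bwa index" only when some of the index files are missing. It assumes bwa is on your PATH (for example, GotCloud's bin/bwa) and that the index files are named by appending each extension to the reference file name; ensure_bwa_index is a hypothetical helper.

```shell
# Hypothetical helper: build the BWA index for a reference FASTA unless all
# five index files (<ref>.amb .ann .bwt .pac .sa) already exist.
ensure_bwa_index() {
    ref=$1
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$ref.$ext" ]; then
            echo "indexing $ref with bwa index" >&2
            bwa index "$ref"
            # return with bwa's exit status
            return
        fi
    done
    echo "BWA index for $ref already present" >&2
}

# Usage (example path): ensure_bwa_index /data/ref/my_reference.fa
```

Skipping the rebuild matters because indexing a whole human reference takes a long time.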

Generating Reference Index Files

Use "samtools faidx" to generate the index file for your human reference, with the following extension:

  • .fai
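For example, assuming samtools is installed and on your PATH, the hypothetical helper below runs "samtools faidx" and then confirms that the .fai file was written next to the reference.

```shell
# Hypothetical helper: index a reference FASTA with samtools faidx and
# verify that <ref>.fai was written next to it.
make_fai() {
    ref=$1
    # on failure, return with samtools' exit status
    samtools faidx "$ref" || return
    if [ ! -s "$ref.fai" ]; then
        echo "failed to create $ref.fai" >&2
        return 1
    fi
}

# Usage (example path): make_fai /data/ref/my_reference.fa
```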

Generating GC Content File

QPLOT uses the GC content file and expects it to be in the same directory as the reference file.

If the reference file is at path/ref.fa, the GC content file is expected to be at path/ref.winsize100.gc.


To generate the GC content file, run qplot:

GOTCLOUD_DIR/bin/qplot --reference reference.fa --winsize windowSize
  • Replace reference.fa with the name of your human reference fasta file.
  • Replace windowSize with your desired window size, or leave out --winsize to use the default (100).
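Because QPLOT looks for the GC content file by name, it can help to compute the expected path before running it. The sketch below follows the naming rule stated above (path/ref.fa becomes path/ref.winsize100.gc). It assumes the reference name ends in .fa and that the number in the file name tracks the --winsize value; gc_file_for is a hypothetical helper.

```shell
# Hypothetical helper: print the GC content file name QPLOT is expected to
# look for, given a reference path and an optional window size (default 100).
# Assumes the reference ends in .fa and the window size appears in the name.
gc_file_for() {
    ref=$1
    winsize=${2:-100}
    echo "${ref%.fa}.winsize${winsize}.gc"
}

# Usage: gc_file_for /usr/local/gotcloud.ref/human.g1k.v37.fa
```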

NOTE: You will get an error at the end of qplot that says:

FATAL ERROR - 
No SAM/BAM files provided, stopped!

This error occurs because qplot is being used only to generate a GC content file, without also processing a BAM file.

The run was still successful as long as you see the following message (where reference is the name of your reference file):

GC content file [ reference.winsize100.gc ] created.
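Because qplot reports a fatal error here even when the GC content file was written, looking for the success message is a more reliable check than treating the error as a failure. One option, sketched below, is to save qplot's output and search it; GOTCLOUD_DIR, reference.fa, the log name qplot_gc.log, and the helper gc_created are placeholders.

```shell
# Hypothetical helper: succeed if a saved qplot log contains the
# "GC content file [ ... ] created." message.
gc_created() {
    grep -q 'GC content file \[ .* ] created\.' "$1"
}

# Example run (GOTCLOUD_DIR and reference.fa are placeholders); qplot's
# output, including the expected FATAL ERROR, is saved to qplot_gc.log:
#   "$GOTCLOUD_DIR/bin/qplot" --reference reference.fa > qplot_gc.log 2>&1
#   gc_created qplot_gc.log && echo "GC content file created"
```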


See QPLOT: InputFiles for more information.