Difference between revisions of "GotCloud: Genetic Reference and Resource Files"

From Genome Analysis Wiki
Revision as of 12:36, 5 April 2013

Genetic Reference and Resource Files

Back to parent: GotCloud

In order to run GotCloud, you need to provide Genetic Reference and Resource Files.

You can generate your own files or use the set available for download.


Required Files

Human Reference Files

DBSNP VCF File

Downloadable Reference and Resource Files

Installing Genetic Reference and Resource Files


Get the Resource Files

The GotCloud Aligner and Umake make use of various reference and other genetic resource files. You are free to use your own files, but the files we use are also available for download.

#  The easiest way to get the data:
cd /tmp
wget ftp://share.sph.umich.edu/gotcloud/hs37-db132.tar.gz
#  Another way:
cd /tmp
ftp share.sph.umich.edu
Connected to share.sph.umich.edu.
220 (vsFTPd 2.3.5)
Name (share.sph.umich.edu:tpg): anonymous
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> prompt
Interactive mode off.
ftp> cd gotcloud
250 Directory successfully changed.
ftp> mget hs37-db132.tar.gz
ftp> quit
221 Goodbye.
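
Before installing, it can be worth confirming that the download completed. The following is a minimal sketch, not part of GotCloud: check_tarball is a hypothetical helper name, and it relies only on the standard gzip and tar tools.

```shell
# Hypothetical helper (not part of GotCloud): succeeds only if the
# given file exists and is a readable gzip-compressed tar archive.
check_tarball() {
    archive="$1"
    if [ ! -f "$archive" ]; then
        echo "missing: $archive" >&2
        return 1
    fi
    # gzip -t tests the compressed data; tar tzf lists the contents
    # without extracting anything.
    gzip -t "$archive" 2>/dev/null && tar tzf "$archive" >/dev/null 2>&1
}
```

For example, check_tarball /tmp/hs37-db132.tar.gz && echo "archive looks intact" prints the message only when both checks pass.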

Install the Resource Files

Choose a destination for these files and install them as shown below (we'll assume you will use /usr/local/gotcloud.ref).

mkdir -p /usr/local/gotcloud.ref    # Where you want the files installed
cd /usr/local/gotcloud.ref
tar xzvf /tmp/hs37-db132.tar.gz     # The tarball was downloaded to /tmp above
 ref/
 ref/hs37d5.fa.fai
 ref/metabochip.batch2.broken.b37.chr2.plink.MAF01.bed
 ref/hs37d5-bs.umfa
 ref/metabochip.batch2.broken.b37.chr2.plink.MAF01.fam
 ref/dbsnp_132.b37.vcf.gz.tbi
 ref/dbsnp_132.UCSC.coordinates.tbl
   [lines deleted]
rm -f /tmp/hs37-db132.tar.gz

Note this path: you will need to set the variable REF_DIR to it, either in the configuration file or in the options to gen_biopipeline.pl and umake.pl.


Using Your Own Reference Files

Human Reference

Generating BWA Reference Files

Use "bwa index" to generate the human reference files with the required extensions:

  • .amb
  • .ann
  • .bwt
  • .fai
  • .pac
  • .rbwt
  • .rpac
  • .rsa
  • .sa

See http://bio-bwa.sourceforge.net/bwa.shtml for more information about using "bwa index".
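
After indexing, a quick check can confirm that every extension in the list above is present. This is a sketch under stated assumptions: missing_index_files is a hypothetical helper, and it assumes the index files sit next to the FASTA file and use its full name as the prefix (for example reference.fa.bwt). The exact set of files written by "bwa index" depends on your BWA version, and the .fai file is normally written by "samtools faidx" rather than by "bwa index", so adjust the list to your setup.

```shell
# Hypothetical helper (not part of GotCloud or BWA): print every
# expected index file that is missing for the given reference FASTA.
# Prints nothing when all of them are present.
missing_index_files() {
    ref="$1"
    for ext in amb ann bwt fai pac rbwt rpac rsa sa; do
        [ -e "$ref.$ext" ] || echo "$ref.$ext"
    done
}
```

Running missing_index_files reference.fa lists one missing file per line, so empty output means the reference has the full set.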

Generating GCContent File

To generate the GC content file, run qplot:

GOTCLOUD_DIR/bin/qplot --reference reference.fa --winsize windowSize --create_gc reference.gc
  • Replace reference.fa with the name of your human reference fasta file.
  • Replace windowSize with your desired window size, or leave out --winsize to use the default.

For more information about qplot input files, see http://genome.sph.umich.edu/wiki/QPLOT#Input_files
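
The optional --winsize argument described above can be handled with a small wrapper. This is only a sketch: run_qplot_gc and the QPLOT override variable are hypothetical names introduced here, and the only qplot options used are the three shown in the command above. GOTCLOUD_DIR is assumed to be set to your GotCloud installation directory.

```shell
# Hypothetical wrapper (not part of GotCloud): create a GC content file.
# Arguments: reference FASTA, output .gc file, optional window size.
run_qplot_gc() {
    reference="$1"
    output="$2"
    winsize="${3:-}"
    # QPLOT is an assumed override for testing; by default the qplot
    # binary under GOTCLOUD_DIR is used.
    qplot="${QPLOT:-$GOTCLOUD_DIR/bin/qplot}"
    set -- --reference "$reference"
    # Pass --winsize only when a window size was supplied, so that
    # qplot falls back to its default otherwise.
    if [ -n "$winsize" ]; then
        set -- "$@" --winsize "$winsize"
    fi
    "$qplot" "$@" --create_gc "$output"
}
```

Calling run_qplot_gc reference.fa reference.gc uses the default window size, while a third argument is passed through as --winsize.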