Changes

Karma-colorspace (view source)

Revision as of 21:14, 19 November 2009

82 bytes added, 21:14, 19 November 2009

m

→‎Build Binary Reference Genome and Word Index

Line 20: Line 20:

= Build Binary Reference Genome and Word Index =

−

First, build a binary version of the genome reference sequence as nucleotides (option: --createReference). Suppose that NCBI36.fa is a FASTA file which contains ~~the~~ nucleotide sequences for all chromosomes.

+

First, build a binary version of the genome reference sequence as nucleotides (option: --createReference).  Suppose that NCBI36.fa is a FASTA file which contains nucleotide sequences for all chromosomes.

The command to invoke is:

Line 27: Line 27:

(To let KARMA map nucleotide space reads, one would use instead ''--createIndex'' to create both a binary sequence and the word index files.)

−

Second, we also need to build ~~a binary version~~ of the genome reference sequence (option: --createReference) and the word index files (option: --createIndex) ~~in color space~~. The same nucleotide FASTA file is needed. However, to avoid naming conflicts among the resulting binary files, we suggest appending "CS" to the base file name for clarity. The command to invoke is:

+

Second, we also need to build color space versions of both the genome reference sequence (option: --createReference) and the word index files (option: --createIndex).  The same nucleotide FASTA file is needed.  However, to avoid naming conflicts among the resulting binary files, we suggest appending "CS" to the base file name for clarity.  The command to invoke is:

ln -s NCBI36.fa NCBI36CS.fa

karma --colorSpace --createReference --createIndex --reference NCBI36CS.fa

−

~~An important parameter is~~ the word length for indexing. We recommend N = 15 (the default value) for the human genome on a machine with at least 20 Gb of RAM. Shorter words will decrease the memory footprint at the cost of increased run time. However, the word length must not be longer than half the length of the color space reads you intend to map, minus 1. See [[#Choose_an_appropriate_size_for_word_index|Choose an appropriate size for word index]] for more discussion. Specify ``--wordSize N`` in order to use ''N'' as the word size.

+

When building the index files one can set the word length for indexing.  We recommend N = 15 (the default value) for the human genome on a machine with at least 20 Gb of RAM.  Shorter index words will decrease the memory footprint at the cost of increased run time.  However, the word length must not be longer than half the length of the color space reads you intend to map, minus 1.  (See [[#Choose_an_appropriate_size_for_word_index|Choose an appropriate size for word index]] for more discussion.)  Specify ``--wordSize N`` in order to use ''N'' as the word size.
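The word length constraint above can be checked before building the index. The following is a minimal sketch, not part of KARMA itself: the helper name `max_word_size`, the variables, and the 35-color read length are illustrative assumptions, and "half the read length, minus 1" is read here with integer division. The position of `--wordSize` among the other options is also assumed. The script only prints the resulting command, so it can be tried without KARMA installed.

```shell
#!/bin/sh
# Hypothetical helper (not a KARMA feature): largest word size allowed for a
# given read length, interpreting "half the read length, minus 1" with
# integer division. That interpretation is an assumption.
max_word_size() {
    read_len=$1
    echo $(( read_len / 2 - 1 ))
}

READ_LEN=35     # illustrative color space read length (assumption)
WORD_SIZE=15    # default word size recommended in the text above
LIMIT=$(max_word_size "$READ_LEN")

# Fall back to the largest permitted value if the requested size is too long.
if [ "$WORD_SIZE" -gt "$LIMIT" ]; then
    WORD_SIZE=$LIMIT
fi

# Print the command instead of running it; the option order is assumed.
echo "karma --colorSpace --createReference --createIndex --wordSize $WORD_SIZE --reference NCBI36CS.fa"
```

With shorter reads the limit drops quickly: under the same reading, 25-color reads would allow a word size of at most 11, so the default of 15 would need to be lowered.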

Tblackw (29 edits)
