Changes

Karma-colorspace (view source)

Revision as of 21:05, 19 November 2009

70 bytes added, 21:05, 19 November 2009

m

→‎Build Binary Reference Genome and Word Index

Line 20: Line 20:

= Build Binary Reference Genome and Word Index =

−

~~ ~~ First, ~~we need to~~ build binary reference ~~genome~~ (option: --createReference)~~   (To let KARMA map nucleotide space reads, you need to use ''--createIndex'' to create the word index file~~.~~) ~~

+

First, build a binary version of the genome reference sequence as nucleotides (option: --createReference). Suppose that NCBI36.fa is a FASTA file which contains the nucleotide sequences for all chromosomes.

−

+

The command to invoke is:

−

~~  in nucleotide space. Assume~~ NCBI36.fa is a FASTA file contains sequences of all chromosomes. ~~ ~~ The command to invoke is:

karma --createReference --reference NCBI36.fa

−

+

(To let KARMA map nucleotide space reads, one would instead use ''--createIndex'' to create both the binary sequence and the word index files.)

−

~~ ~~ Second, we need to build binary reference ~~genome~~ (option: --createReference) and word index (option: --createIndex)~~  ~~ in color space. The same FASTA file is needed. However, to avoid naming conflicts, we suggest ~~using word~~ "CS" ~~   appending~~ to the base file name for clarity. The command to invoke is:

+

Second, we also need to build a binary version of the genome reference sequence (option: --createReference) and the word index files (option: --createIndex) in color space. The same nucleotide FASTA file is needed. However, to avoid naming conflicts among the resulting binary files, we suggest appending "CS" to the base file name for clarity. The command to invoke is:

ln -s NCBI36.fa NCBI36CS.fa

karma --colorSpace --createReference --createIndex --reference NCBI36CS.fa
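The build steps above can be collected into one small script. The following is only a minimal sketch, assuming the FASTA file name ends in ".fa" and that ''karma'' is on the PATH; the function name `build_commands` is hypothetical and not part of KARMA. It prints the commands as a dry run rather than executing them, and uses only the options already shown on this page.

```shell
#!/bin/sh
# Dry run: print the commands for the nucleotide space and color space builds.
# build_commands is an illustrative helper, not a KARMA feature.
build_commands() {
    fasta="$1"              # nucleotide FASTA file, e.g. NCBI36.fa
    base="${fasta%.fa}"     # strip the .fa suffix (assumed extension)
    cs="${base}CS.fa"       # "CS" appended to the base name, as suggested above

    # Step 1: binary reference in nucleotide space
    echo "karma --createReference --reference $fasta"
    # Step 2: symbolic link, then binary reference and word index in color space
    echo "ln -s $fasta $cs"
    echo "karma --colorSpace --createReference --createIndex --reference $cs"
}

build_commands NCBI36.fa
```

To actually run the printed commands, the output could be piped to a shell, for example `build_commands NCBI36.fa | sh`.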

−

+

−

~~ ~~ An important parameter is the ~~size of words~~ for indexing.~~  ~~ We ~~recommand~~ 15 (default value) for human ~~reference~~ genome.~~   Specifiy ``--wordSize N`` if~~ you ~~like~~ to ~~use ''N'' as word size~~.~~   Typically you will observe performance change (see~~ [[#Choose_an_appropriate_size_for_word_index|Choose an appropriate size for word index]] for more discussion).~~       Note, multiple chromosomes are supported.   In current version, KARMA can take one FASTA file which contains sequences of all chromosomes~~.

+

An important parameter is the word length for indexing. We recommend N = 15 (the default value) for the human genome on a machine with at least 20 GB of RAM. Shorter words will decrease the memory footprint at the cost of increased run time. However, the word length must not be longer than half the length of the color space reads you intend to map, minus 1. See [[#Choose_an_appropriate_size_for_word_index|Choose an appropriate size for word index]] for more discussion. Specify ``--wordSize N`` in order to use ''N'' as the word size.
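The constraint on word length can be checked with a little arithmetic before building the index. The sketch below reads the rule as N ≤ (read length / 2) − 1, rounded down to a whole number; that rounding and the helper names `max_word_size` and `pick_word_size` are assumptions made for illustration and are not part of KARMA.

```shell
#!/bin/sh
# Largest word size permitted for a given color space read length,
# interpreting the rule as: wordSize <= readLength / 2 - 1 (integer division).
max_word_size() {
    read_len="$1"
    echo $(( read_len / 2 - 1 ))
}

# Use the recommended default of 15 unless the read length forces a smaller value.
pick_word_size() {
    max="$(max_word_size "$1")"
    if [ "$max" -lt 15 ]; then
        echo "$max"
    else
        echo 15
    fi
}

max_word_size 35     # prints 16
max_word_size 50     # prints 24
pick_word_size 25    # prints 11
pick_word_size 50    # prints 15
```

The value printed by `pick_word_size` is the ''N'' one would then pass as ``--wordSize N`` when creating the index.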

Tblackw

29

edits
