Changes

Karma-colorspace

Revision as of 06:59, 15 November 2009

150 bytes added, 06:59, 15 November 2009

no edit summary

Line 6: Line 6:

*  Binary reference genome in nucleotide space (see [[#Input file requirement|Input file requirement]])

−

*  Binary reference genome and word index in color space (see ~~\ref{sec:2}~~)

+

*  Binary reference genome and word index in color space (see [[#Build Binary Reference Genome and Word Index|Build Binary Reference Genome and Word Index]])

*  Color space reads in valid color space FASTQ format (see [[#Input file requirement|Input file requirement]] for file specification)

*  Color space reads longer than the minimum length requirement (see [[#Minimum read length requirement|Minimum read length requirement]])

−

*  Specify color space parameter when starting KARMA (see ~~\ref{sec:3}~~)

+

*  Specify color space parameter when starting KARMA (see [[#Map Color Space Reads|Map Color Space Reads]])
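The read requirements above can be illustrated with a minimal Python sketch. It assumes the common SOLiD convention that a color space read is a primer base (A, C, G or T) followed by color calls 0 to 3, with '.' marking a missing call. The function name `looks_like_color_read` and the `min_colors` threshold are hypothetical; the authoritative rules and the actual minimum length are the ones in the Input file requirement and Minimum read length requirement sections.

```python
# Minimal sketch, NOT KARMA's actual input validator.
# Assumption: a color space read = one primer base (A/C/G/T) followed by
# color calls '0'-'3', with '.' used for a missing call.
# min_colors is a hypothetical threshold, not KARMA's real minimum length.

VALID_PRIMERS = set("ACGT")
VALID_COLORS = set("0123.")


def looks_like_color_read(seq: str, min_colors: int) -> bool:
    """Return True if seq is a primer base followed by at least min_colors color calls."""
    if len(seq) < 2 or seq[0] not in VALID_PRIMERS:
        return False
    colors = seq[1:]
    # Every remaining character must be a color call (or '.') and the read must be long enough.
    return len(colors) >= min_colors and set(colors) <= VALID_COLORS


if __name__ == "__main__":
    print(looks_like_color_read("T0123012301", min_colors=10))  # True
    print(looks_like_color_read("ACGTACGTAC", min_colors=5))    # False: bases, not colors
```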

Please note that the hardware requirements for KARMA are:

Line 16: Line 16:

*30G disk space

−

A complete example that walks through the whole procedure, from building the word index to mapping color space reads, is given in ~~\ref{sec:5}~~.

+

A complete example that walks through the whole procedure, from building the word index to mapping color space reads, is given in [[#A Complete Example|A Complete Example]].

= Build Binary Reference Genome and Word Index =

−

First, we need to build the binary reference genome (option: --createReference) \footnote{To let KARMA map nucleotide space reads, you need to use ``--createIndex'' to create the word index file.}

+

First, we need to build the binary reference genome (option: --createReference)

+

\footnote{To let KARMA map nucleotide space reads, you need to use ``--createIndex'' to create the word index file.}

in nucleotide space. Assume NCBI36.fa is a FASTA file containing the sequences of all chromosomes. The command to invoke is:

Line 35: Line 36:

karma --colorSpace --createReference --createIndex --reference NCBI36CS.fa
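As background on what "color space" means here, the standard SOLiD two-base encoding can be sketched in Python: each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3), so a color describes a transition between bases rather than a base. This illustrates the encoding only; it is not taken from KARMA's source, and `to_color_space` and `decode_color_space` are hypothetical names.

```python
# Illustrative sketch of SOLiD two-base ("color space") encoding.
# Standard encoding, but NOT code from KARMA.
# Ambiguous bases such as N are not handled (they raise KeyError).

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def to_color_space(seq: str) -> str:
    """Encode a nucleotide string as colors; the result has one fewer symbol than seq."""
    seq = seq.upper()
    # XOR of the 2-bit codes of each adjacent base pair gives a color 0-3.
    return "".join(str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:]))


def decode_color_space(first_base: str, colors: str) -> str:
    """Recover the bases from a known first base plus its color sequence."""
    bases = [first_base.upper()]
    for c in colors:
        # XOR is its own inverse, so previous_base XOR color gives the next base.
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ int(c)])
    return "".join(bases)


if __name__ == "__main__":
    cs = to_color_space("TACGGT")
    print(cs)                           # prints 31301
    print(decode_color_space("T", cs))  # prints TACGGT
```

Because decoding needs a known starting base, a single wrong color changes every decoded base after it, which is one reason color space reads are usually aligned in color space rather than converted back to bases first.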

−

An important parameter is the word size used for indexing. We recommend 15 (the default value) for the human reference genome. Specify ``--wordSize N`` if you want to use $N$ as the word size. The word size typically affects performance (see [[#Choose an appropriate size for word index|Choose an appropriate size for word index]] for more discussion). Note that multiple chromosomes are supported: in the current version, KARMA takes one FASTA file that contains the sequences of all chromosomes.

+

An important parameter is the word size used for indexing. We recommend 15 (the default value) for the human reference genome. Specify ``--wordSize N`` if you want to use ''N'' as the word size. The word size typically affects performance (see [[#Choose an appropriate size for word index|Choose an appropriate size for word index]] for more discussion). Note that multiple chromosomes are supported: in the current version, KARMA takes one FASTA file that contains the sequences of all chromosomes.
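To make the notion of a word index concrete, here is a minimal Python sketch that maps every word of length N in a reference string to its start positions. It assumes a simple in-memory dictionary; KARMA's binary index format is not described on this page and will differ, and `build_word_index` is a hypothetical name.

```python
# Minimal sketch of a word (k-mer) index using an in-memory dict.
# This is NOT KARMA's on-disk index format; it only shows the idea.
from collections import defaultdict


def build_word_index(reference: str, word_size: int = 15) -> dict[str, list[int]]:
    """Map every word of length word_size in reference to its 0-based start positions."""
    index = defaultdict(list)
    # Slide a window of width word_size across the reference.
    for pos in range(len(reference) - word_size + 1):
        index[reference[pos:pos + word_size]].append(pos)
    return dict(index)


if __name__ == "__main__":
    # A short color space string, indexed with a toy word size of 4.
    idx = build_word_index("0123012301", word_size=4)
    print(idx["0123"])  # prints [0, 4]
```

A read is then mapped by looking up its words in the index and checking the candidate positions they point to; the default of 15 mirrors the recommendation above.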

−

= Map Color Space Reads =

+

=Map Color Space Reads=

KARMA takes valid color space FASTQ files as input. We usually use the suffix .csfastq to distinguish them from nucleotide space reads. For single-end color space reads, invoke the command:

Line 88: Line 89:

Mapping performance is sensitive to the word size of the index. A small word size increases the number of computation cycles needed for a single read and the number of duplicate occurrences of each word. On the other hand, a large word size requires much more memory. Keep in mind that the appropriate size also depends on your hardware architecture. In practice, we found a word size of 15 to be optimal.
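The trade-off can be quantified with a back-of-the-envelope calculation. The sketch below assumes 4 possible symbols per position (colors 0 to 3, or bases), a uniformly random sequence, and a genome of about 3 × 10^9 positions (roughly human-sized); these are simplifying assumptions, not KARMA's internals, and `word_stats` is a hypothetical helper.

```python
# Back-of-the-envelope sketch of the word-size trade-off.
# Assumptions (not from KARMA): 4 symbols per position, a uniformly random
# genome, and a genome length of about 3e9 positions.

GENOME_LENGTH = 3_000_000_000  # assumed, roughly the size of the human genome


def word_stats(word_size: int, genome_length: int = GENOME_LENGTH) -> tuple[int, float]:
    """Return (number of possible words, expected occurrences of each word)."""
    possible_words = 4 ** word_size
    expected_hits = genome_length / possible_words
    return possible_words, expected_hits


if __name__ == "__main__":
    for n in (11, 13, 15, 17):
        words, hits = word_stats(n)
        print(f"word size {n:2d}: {words:>14,d} possible words, ~{hits:,.1f} hits per word")
```

Under these assumptions, a word size of 15 gives 4^15 = 1,073,741,824 possible words and about 2.8 expected occurrences per word. Each step down in word size multiplies the expected occurrences (and so the candidate positions to check per read) by 4, while each step up multiplies the number of possible words by 4, which matters for memory if the index keeps a slot per possible word.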

−

= A Complete Example =

+

=A Complete Example=

Zhanxw (255 edits)
