Changes

From Genome Analysis Wiki
82 bytes added, 21:14, 19 November 2009
m
Line 20: Line 20:  
= Build Binary Reference Genome and Word Index<br> =
 
= Build Binary Reference Genome and Word Index<br> =
   −
First, build a binary version of the genome reference sequence as nucleotides (option: --createReference).  Suppose that NCBI36.fa is a FASTA file which contains the nucleotide sequences for all chromosomes.<br>
+
First, build a binary version of the genome reference sequence as nucleotides (option: --createReference).&nbsp; Suppose that &nbsp; NCBI36.fa &nbsp; is a FASTA file which contains nucleotide sequences for all chromosomes.<br>
 
The command to invoke is:<br>
 
The command to invoke is:<br>
   Line 27: Line 27:  
(To let KARMA map nucleotide space reads, one would use instead ''--createIndex'' to create both a binary sequence and the word index files.)<br>
 
(To let KARMA map nucleotide space reads, one would use instead ''--createIndex'' to create both a binary sequence and the word index files.)<br>
   −
Second, we also need to build a binary version of the genome reference sequence (option: --createReference) and the word index files (option: --createIndex) in color space.  The same nucleotide FASTA file is needed.  However, to avoid naming conflicts among the resulting binary files, we suggest appending "CS" to the base file name for clarity.  The command to invoke is:<br>
+
Second, we also need to build color space versions of both the genome reference sequence (option: --createReference) and the word index files (option: --createIndex).&nbsp; The same nucleotide FASTA file is needed.&nbsp; However, to avoid naming conflicts among the resulting binary files, we suggest appending "CS" to the base file name for clarity.&nbsp; The command to invoke is:<br>
    
   ln -s NCBI36.fa NCBI36CS.fa
 
   ln -s NCBI36.fa NCBI36CS.fa
 
   karma --colorSpace --createReference --createIndex --reference NCBI36CS.fa
 
   karma --colorSpace --createReference --createIndex --reference NCBI36CS.fa
 
<br>
 
<br>
   −
An important parameter is the word length for indexing.  We recommend N = 15 (the default value) for the human genome on a machine with at least 20 Gb of RAM.  Shorter words will decrease the memory footprint at the cost of increased run time.  However, the word length must not be longer than half the length of the color space reads you intend to map, minus 1.  See [[#Choose_an_appropriate_size_for_word_index|Choose an appropriate size for word index]] for more discussion.  Specify ``--wordSize N`` in order to use ''N'' as the word size.<br>
+
When building the index files one can set the word length for indexing.&nbsp; We recommend N = 15 (the default value) for the human genome on a machine with at least 20 Gb of RAM.&nbsp; Shorter index words will decrease the memory footprint at the cost of increased run time.&nbsp; However, the word length must not be longer than half the length of the color space reads you intend to map, minus 1.&nbsp; (See [[#Choose_an_appropriate_size_for_word_index|Choose an appropriate size for word index]] for more discussion.)&nbsp; Specify ``--wordSize N`` in order to use ''N'' as the word size.<br>
    
<br>
 
<br>
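The word-length rule in the revised paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of KARMA: the helper names (`max_word_size`, `choose_word_size`, `build_index_command`) are hypothetical, and it assumes that "half the length of the reads, minus 1" means floor(read length / 2) - 1. It also assumes that `--wordSize` can be passed in the same invocation as the `--colorSpace --createReference --createIndex` options shown above; the page does not confirm that combination.

```python
DEFAULT_WORD_SIZE = 15  # default (and recommended) value stated on the page


def max_word_size(read_length: int) -> int:
    """Largest index word size allowed for reads of the given length.

    Assumption: the rule "half the read length, minus 1" is interpreted
    as floor(read_length / 2) - 1.
    """
    if read_length < 4:
        raise ValueError("read length is too short to allow any index word")
    return read_length // 2 - 1


def choose_word_size(read_length: int, preferred: int = DEFAULT_WORD_SIZE) -> int:
    """Use the preferred word size unless the read length forces a smaller one."""
    return min(preferred, max_word_size(read_length))


def build_index_command(reference: str, read_length: int) -> list[str]:
    """Assemble a color space index-building command line as an argument list.

    Assumption: --wordSize may be combined with the other options in a
    single karma invocation.
    """
    word_size = choose_word_size(read_length)
    return [
        "karma", "--colorSpace", "--createReference", "--createIndex",
        "--wordSize", str(word_size),
        "--reference", reference,
    ]


if __name__ == "__main__":
    # 25-color reads force a word size below the default of 15.
    print(" ".join(build_index_command("NCBI36CS.fa", 25)))
```

Under this reading, reads of 32 colors or more can use the default of 15, while shorter reads need a smaller word size and therefore a separately built index.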