Changes

From Genome Analysis Wiki
32 bytes added, 07:01, 15 November 2009
no edit summary
Line 5: Line 5:  
We summarize the software requirements as follows:
   −
*  Binary reference genome in nucleotide space (see [[#Input file requirement|Input file requirement]])  
+
*  Binary reference genome in nucleotide space (see [[#Input_file_requirement|Input file requirement]])  
   −
*  Binary reference genome and word index in color space (see [[#Build Binary Reference Genome and Word Index|Build Binary Reference Genome and Word Index]])  
+
*  Binary reference genome and word index in color space (see [[#Build_Binary_Reference_Genome_and_Word_Index|Build Binary Reference Genome and Word Index]])  
   −
*  Color space reads in valid color space FASTQ format (see [[#Input file requirement|Input file requirement]] for file specification)  
+
*  Color space reads in valid color space FASTQ format (see [[#Input_file_requirement|Input file requirement]] for file specification)  
   −
*  Color space reads are longer than the minimum length requirement (see [[#Minimum read length requirement|Minimum read length requirement]])  
+
*  Color space reads are longer than the minimum length requirement (see [[#Minimum_read_length_requirement|Minimum read length requirement]])  
   −
*&nbsp; Specify the color space parameter when starting KARMA (see [[#Map Color Space Reads|Map Color Space Reads]])<br>  
+
*&nbsp; Specify the color space parameter when starting KARMA (see [[#Map_Color_Space_Reads|Map Color Space Reads]])<br>  
    
Please note that the hardware requirements for KARMA are:
Line 16: Line 16:  
*30G disk space  
   −
<br> A complete example covering the whole procedure, from building the word index to mapping color space reads, is given in [[#A Complete Example|A Complete Example]].
+
<br> A complete example covering the whole procedure, from building the word index to mapping color space reads, is given in [[#A_Complete_Example|A Complete Example]].
    
= Build Binary Reference Genome and Word Index<br> =
   −
&nbsp; First, we need to build the binary reference genome (option: --createReference)<br> &nbsp; 
   −
\footnote{To let KARMA map nucleotide space reads, you need to use ``--createIndex'' to create the word index file.}''<br>
+
&nbsp; First, we need to build the binary reference genome (option: --createReference)<br> &nbsp; &lt;ref&gt;To let KARMA map nucleotide space reads, you need to use ``--createIndex'' to create the word index file.&lt;/ref&gt;''<br>
      
&nbsp; in nucleotide space. Assume NCBI36.fa is a FASTA file that contains the sequences of all chromosomes.<br> &nbsp; The command to invoke is:<br>
Line 36: Line 35:  
   karma --colorSpace --createReference --createIndex --reference NCBI36CS.fa
   −
&nbsp; An important parameter is the size of words for indexing.<br> &nbsp; We recommend 15 (the default value) for the human reference genome.<br> &nbsp; Specify ``--wordSize N`` if you would like to use ''N'' as the word size.<br> &nbsp; Changing the word size typically changes performance (see [[#Choose an appropriate size for word index|Choose an appropriate size for word index]] for more discussion).<br> &nbsp;<br> &nbsp;<br> &nbsp; Note that multiple chromosomes are supported.<br> &nbsp; In the current version, KARMA can take one FASTA file that contains the sequences of all chromosomes.<br>
+
&nbsp; An important parameter is the size of words for indexing.<br> &nbsp; We recommend 15 (the default value) for the human reference genome.<br> &nbsp; Specify ``--wordSize N`` if you would like to use ''N'' as the word size.<br> &nbsp; Changing the word size typically changes performance (see [[#Choose_an_appropriate_size_for_word_index|Choose an appropriate size for word index]] for more discussion).<br> &nbsp;<br> &nbsp;<br> &nbsp; Note that multiple chromosomes are supported.<br> &nbsp; In the current version, KARMA can take one FASTA file that contains the sequences of all chromosomes.<br>
    
<br>
   −
=Map Color Space Reads=
+
= Map Color Space Reads =
    
&nbsp; KARMA takes valid color space FASTQ files as input.<br> &nbsp; We usually use the suffix .csfastq to distinguish them from nucleotide space reads.<br> &nbsp; For single-end color space reads, we can invoke the command:<br>
Line 65: Line 64:  
= <br> Additional Information<br> =
   −
==Input file requirement==
+
== Input file requirement ==
    
&nbsp; KARMA requires input files in valid color space FASTQ format.<br> &nbsp; The length of each read (including the leading primer) must equal the length of its quality string.<br> &nbsp;<br> &nbsp; A valid example of a color space FASTQ file:<br>
 
   −
@Chromosome_20_048435095_Genome_2757096147
+
<pre>@Chromosome_20_048435095_Genome_2757096147
A02232200222021320012102212311002212
Line 75: Line 73:  
+
   −
!!1111111111111111111111111111111111
+
!!1111111111111111111111111111111111</pre>
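The length rule above can be checked before running KARMA. The following is a minimal sketch, not part of KARMA: the function name <code>check_csfastq_record</code> is hypothetical, and the accepted alphabet (a primer base from ACGT followed by colors 0-3, with "." allowed for a missing call) is our assumption rather than something KARMA documents here.

```python
def check_csfastq_record(header, read, plus, quality):
    """Return a list of problems found in one 4-line color space FASTQ record.

    Hypothetical helper for illustration; KARMA's own validation may differ.
    """
    problems = []
    if not header.startswith("@"):
        problems.append("header line must start with '@'")
    if not plus.startswith("+"):
        problems.append("third line must start with '+'")
    # Assumed alphabet: one primer base, then colors 0-3 ('.' = missing call).
    if len(read) < 2 or read[0] not in "ACGT":
        problems.append("read must start with a primer base followed by colors")
    elif any(c not in "0123." for c in read[1:]):
        problems.append("colors must be 0, 1, 2, 3 or '.'")
    # The rule stated in the text: read length (primer included)
    # must equal the length of the quality string.
    if len(read) != len(quality):
        problems.append(
            "read length %d != quality length %d" % (len(read), len(quality))
        )
    return problems


if __name__ == "__main__":
    print(check_csfastq_record("@r1", "T0123", "+", "!!111"))
```

An empty list means the record passed these checks; each string in a non-empty list names one violated rule.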
 
   −
==Minimum read length requirement==
+
== Minimum read length requirement ==
      
&nbsp; Keep in mind that the minimum color space read length for KARMA is<br> &nbsp; twice the word size plus two (including the leading primer) \footnote{For nucleotide space,<br> &nbsp; the minimum length requirement is twice the word size.}.<br> &nbsp; For example, KARMA uses a word size of 15 by default (2 × 15 + 2 = 32), so it will try to map color space<br> &nbsp; reads that are longer than 32 base pairs.<br>
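The arithmetic above can be written as a small helper. This is only a sketch of the stated formula; the function name <code>min_read_length</code> is hypothetical, and the text does not say whether a read of exactly this length is accepted, so the helper returns the threshold without deciding that.

```python
def min_read_length(word_size=15, color_space=True):
    """Minimum read length threshold as described in the text.

    Color space: 2 * word_size + 2 (leading primer included).
    Nucleotide space: 2 * word_size.
    """
    if word_size <= 0:
        raise ValueError("word_size must be positive")
    return 2 * word_size + (2 if color_space else 0)


if __name__ == "__main__":
    for w in (11, 13, 15):
        print(w, min_read_length(w), min_read_length(w, color_space=False))
```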
   −
==Auxiliary tools==
+
== Auxiliary tools ==
    
&nbsp; The ABI SOLiD platform generates a FASTA file (e.g. XXX.csfasta) and a quality file (e.g. XXX_QV.qual) separately. We wrote a script, ''solid2csfastq.py'', to convert them to a single color space FASTQ file (e.g. XXX.csfastq). We believe a single color space FASTQ file will simplify post-processing.<br>
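To illustrate the kind of conversion involved, here is a minimal sketch. It is not the ''solid2csfastq.py'' script itself, and every name in it is hypothetical. It assumes one sequence line per record, space-separated integer qualities with one value per color, Phred+33 encoding, and a placeholder "!" quality for the leading primer so that the read and quality strings have equal length.

```python
def records(lines):
    """Yield (name, body) pairs from FASTA-like lines, skipping '#' comments.

    Assumes each record is a '>' header followed by a single body line.
    """
    name = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def to_csfastq(csfasta_lines, qual_lines):
    """Merge csfasta and qual records into color space FASTQ text."""
    out = []
    pairs = zip(records(csfasta_lines), records(qual_lines))
    for (name, read), (qname, quals) in pairs:
        if name != qname:
            raise ValueError("record order differs: %s vs %s" % (name, qname))
        # Clamp to the printable Phred+33 range; negative values become 0.
        scores = [min(max(int(q), 0), 93) for q in quals.split()]
        # Assumption: '!' (quality 0) stands in for the leading primer.
        quality = "!" + "".join(chr(s + 33) for s in scores)
        if len(quality) != len(read):
            raise ValueError("length mismatch in record %s" % name)
        out.append("@%s\n%s\n+\n%s\n" % (name, read, quality))
    return "".join(out)


if __name__ == "__main__":
    print(to_csfastq([">r1", "T0123"], [">r1", "16 16 -1 40"]), end="")
```

Reading both files in lockstep keeps memory use flat, at the cost of requiring that the two files list records in the same order.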
   −
==Choose an appropriate size for word index==
+
== Choose an appropriate size for word index ==
    
Mapping performance is sensitive to the size of the word index. A small word size increases the number of calculation cycles for a single read and the number of duplicate occurrences of a single word. On the other hand, a large word size requires much more memory. Please also keep in mind that the appropriate size depends on your hardware architecture. For practical purposes, we found a size of 15 to be optimal.
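The trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes a four-letter alphabet with uniformly random sequence, the function names and the genome length of 3,000,000,000 are hypothetical, and it says nothing about how KARMA actually lays out its index in memory.

```python
def possible_words(word_size):
    """Number of distinct words of the given size over a 4-letter alphabet."""
    return 4 ** word_size


def expected_occurrences(genome_length, word_size):
    """Average number of positions per word, assuming uniform random sequence."""
    positions = max(genome_length - word_size + 1, 0)
    return positions / possible_words(word_size)


if __name__ == "__main__":
    genome_length = 3_000_000_000  # hypothetical, roughly human-sized
    for w in (11, 13, 15, 17):
        print(w, possible_words(w), expected_occurrences(genome_length, w))
```

Under these assumptions each extra base multiplies the number of possible words by four, while dividing the average number of occurrences per word by four.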
   −
=A Complete Example=
+
= A Complete Example =
    
<br>