Changes

From Genome Analysis Wiki
186 bytes added, 16:19, 2 December 2009 (minor edit, no edit summary)
Line 24:
= Build Binary Reference Genome and Word Index =
− First, build a binary version of the genome reference sequence as nucleotides (option: --createReference). Suppose that NCBI36.fa is a FASTA file which contains nucleotide sequences for all chromosomes.
+ First, build a binary version of the genome reference sequence as nucleotides (option: --createReference). Suppose that NCBI36.fa is a FASTA file which contains the nucleotide sequences for all chromosomes.
    
The command to invoke is:
Line 30:
   karma --createReference --reference NCBI36.fa
− (To let KARMA map nucleotide space reads, one would use instead ''--createIndex'' to create both a binary sequence and the word index files.)
+ (To let KARMA map nucleotide space reads, one would use instead ''--createIndex'' to create both a packed binary sequence file and the word index files.)
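The build step assumes that NCBI36.fa holds every chromosome. It may therefore be worth listing the sequence names in the file before running the command above. The following is a minimal shell sketch. It assumes standard FASTA headers that begin with `>`, where the first word after `>` is the sequence name. The function name `fasta_names` is hypothetical and is not part of KARMA.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KARMA): print one sequence name per line.
# Assumption: header lines start with '>' and the name is the first
# whitespace-delimited word after '>'.
fasta_names() {
  grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*//'
}

# Usage with the file name from the text:
#   fasta_names NCBI36.fa          # list the sequence names
#   fasta_names NCBI36.fa | wc -l  # count the sequences
```

Comparing this list against the chromosomes you expect is a quick sanity check before building the binary files.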
    
Second, one also needs to build color space versions of both the genome reference sequence (option: --createReference) and the word index files (option: --createIndex). The same nucleotide FASTA file is used. However, to avoid naming conflicts among the resulting binary files, we suggest appending "CS" to the base file name for clarity. The command to invoke is:
Line 43:
= Map Color Space Reads =
− KARMA expects valid color space FASTQ files as input. We often use the suffix .csfastq to distinguish these from nucleotide space reads. For a .csfastq file of single end color space reads named single.csfastq, invoke the command:
+ KARMA expects valid color space FASTQ files as input. We often use the suffix .csfastq to distinguish these from nucleotide space reads. With a .csfastq file of single end color space reads named single.csfastq, invoke the command:
    
   karma --reference NCBI36.fa --csReference NCBI36CS.fa --colorSpace single.csfastq
− This command line specifies both the nucleotide and color space reference sequences (and the word indexes, invisibly). The output will be written to a file in .sam format named "single.sam".
+ This command line specifies both the nucleotide and color space reference sequences (and the word indexes, invisibly). The output will be written to a file in .sam format named "single.sam" derived from the .fastq file name.
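The examples suggest a simple naming rule: single.csfastq becomes single.sam, and pair.1.csfastq becomes pair.1.sam. In other words, the last extension of the input name is replaced with `.sam`. Below is a sketch under that assumption. `sam_name` is a hypothetical helper, and the rule is inferred from the examples, not taken from KARMA's source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: predict the .sam output name for an input read file.
# Assumption: only the final extension is replaced, as in the documented
# examples; any directory part of the path is kept unchanged.
sam_name() {
  local input="$1"
  echo "${input%.*}.sam"   # strip the shortest ".*" suffix, then add .sam
}

sam_name single.csfastq   # prints single.sam
sam_name pair.1.csfastq   # prints pair.1.sam
```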
 
− Multiple input files are also acceptable, e.g.
+ Multiple input files are also acceptable and will produce multiple .sam output files, e.g.
    
   karma --reference NCBI36.fa --csReference NCBI36CS.fa --colorSpace \
Line 60:
   --pairedReads  pair.1.csfastq  pair.2.csfastq
− The mapping results will be stored in a .sam file named "pair.1.sam", which contains reads from both files. If multiple paired end read files are specified on the command line, KARMA will pair the 1st and 2nd files, 3rd and 4th files and etc.
+ The mapping results will be stored in a .sam file named "pair.1.sam", which contains reads from both files. If multiple paired end read files are specified on the command line, KARMA will pair the 1st and 2nd files, 3rd and 4th files, etc. and write output files "pair.1.sam", "pair.3.sam", etc.
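The pairing order described here works as follows: the 1st file is paired with the 2nd, the 3rd with the 4th, and so on. Each output file is named after the first file of its pair. This can be sketched as a loop that consumes the file list two at a time. `list_pairs` is hypothetical. Rejecting an odd number of files is an assumption of this sketch, not documented KARMA behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show how paired-read files would be grouped and which
# .sam file each pair maps to, following the documented examples.
list_pairs() {
  # Assumption: an odd file count is treated as an error.
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "error: an even number of paired-read files is required" >&2
    return 1
  fi
  while [ "$#" -gt 0 ]; do
    # Output is named after the first file of the pair, last extension -> .sam
    echo "$1 + $2 -> ${1%.*}.sam"
    shift 2
  done
}

list_pairs pair.1.csfastq pair.2.csfastq pair.3.csfastq pair.4.csfastq
# pair.1.csfastq + pair.2.csfastq -> pair.1.sam
# pair.3.csfastq + pair.4.csfastq -> pair.3.sam
```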
    
   karma --reference NCBI36.fa --csReference NCBI36CS.fa --colorSpace \
Line 78:
== Minimum read length requirement ==
− Keep in mind that KARMA requires color space reads that are at least twice as long as the index word size plus two (including the leading primer base). (For nucleotide space, the minimum read length is twice the word size.) For example, KARMA uses an index word size of 15 by default, so it will only map color space reads that are 32 base pairs or longer.
+ Keep in mind that KARMA requires color space reads that are at least twice as long as the index word size plus two (including the leading primer base). (For nucleotide space, the minimum read length is twice the word size.) For example, KARMA uses an index word size of 15 by default, so it will only map color space reads that are 32 colors or longer (including the primer base).
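The rule reduces to simple arithmetic. With word size w, a color space read needs at least 2w + 2 characters, counting the primer base. A nucleotide read needs at least 2w. The shell sketch below reproduces the figure of 32 for the default w = 15. The function names are hypothetical. It also assumes a color space sequence line holds a primer base followed by color digits (e.g. "T0120...") with no trailing whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating KARMA's minimum read length rule.
# min_read_length W SPACE: SPACE is "color" (2*W + 2, primer base included)
# or anything else for nucleotide space (2*W).
min_read_length() {
  local w="$1" space="$2"
  if [ "$space" = "color" ]; then
    echo $(( 2 * w + 2 ))
  else
    echo $(( 2 * w ))
  fi
}

# Succeeds (exit status 0) if a color space sequence, primer base included,
# meets the minimum length for word size W.
# Assumption: SEQ is the bare sequence line, e.g. "T0120...".
cs_read_long_enough() {
  local seq="$1" w="$2"
  [ "${#seq}" -ge "$(min_read_length "$w" color)" ]
}

min_read_length 15 color       # prints 32
min_read_length 15 nucleotide  # prints 30
```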
    
== Auxiliary tools ==