Changes

From Genome Analysis Wiki
169 bytes removed ,  14:36, 8 April 2010
add mapping examples
Line 59: Line 59:  
# the word size
 
# the word size
   −
Although the input reference is always expected to be base space and in FASTA format, the binary version of the reference, and the corresponding index and hash files, can be in either color space (ABI SOLiD) or base space (Illumina or LS454).  For a given reference FASTA file, you may have either a color or base space binary reference, as well as either color or base space index/hash files.
+
Although the input reference is always expected to be base space and in [http://en.wikipedia.org/wiki/FASTA_format FASTA] format, the binary version of the reference, and the corresponding index and hash files, can be in either color space (ABI SOLiD) or base space (Illumina or LS 454).  For a given reference [http://en.wikipedia.org/wiki/FASTA_format FASTA] file, you may have either a color or base space binary reference, as well as either color or base space index/hash files, each of which may be built with different word sizes or occurrence cutoffs.
    
Because the index and hash files are dependent on the occurrence cutoff parameter and the word size, the output files created by karma have those values in the file name.  This allows you to create a variety of index/hash tables, depending on your expected use (ABI SOLiD, in particular, is sensitive to read length).
 
Because the index and hash files are dependent on the occurrence cutoff parameter and the word size, the output files created by karma have those values in the file name.  This allows you to create a variety of index/hash tables, depending on your expected use (ABI SOLiD, in particular, is sensitive to read length).
   −
== Options for building reference ==
+
== Options for building reference index and hash ==
    +
-r ''reference''          Reference file in [http://en.wikipedia.org/wiki/FASTA_format FASTA] format
 
  -w ''word size''          Word size for index and hash (default 15, typically 10-16)
 
  -w ''word size''          Word size for index and hash (default 15, typically 10-16)
 
  -O ''occurrence cutoff''  Upper count of number of word positions to store in word positions table (default 5000)
 
  -O ''occurrence cutoff''  Upper count of number of word positions to store in word positions table (default 5000)
Line 82: Line 83:     
In both of the above examples, the -r option names the reference originally used to build the index/hash, and the -w 11 specifies that we are using the index/hash built for 11-mer words.  Although you can use the default word size of 15 for phiX, the index would then be 4^15 entries * 4 bytes = 4 GBytes, so a shorter word size is prudent.
 
In both of the above examples, the -r option names the reference originally used to build the index/hash, and the -w 11 specifies that we are using the index/hash built for 11-mer words.  Although you can use the default word size of 15 for phiX, the index would then be 4^15 entries * 4 bytes = 4 GBytes, so a shorter word size is prudent.
 +
 +
Since Karma uses the word size and occurrence cutoff to help construct the actual index and hash filenames, you must specify them the same way you did when you created the reference index and hash.
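
The 4^15 * 4 = 4 GBytes estimate above generalizes to other word sizes. The following is a minimal Python sketch of that arithmetic, not part of Karma itself. It assumes the factor of 4 in the estimate is a 4-byte entry per possible word, and it covers only that one table, not the hash or word positions files. The function names are hypothetical.

```python
def index_bytes(word_size: int, bytes_per_entry: int = 4) -> int:
    """Estimate the size in bytes of a table with one entry per possible word.

    bytes_per_entry=4 is an assumption taken from the "* 4" factor in the text.
    """
    if word_size < 1:
        raise ValueError("word_size must be positive")
    # Four bases (A, C, G, T) give 4**word_size distinct words.
    return 4 ** word_size * bytes_per_entry


def human(n_bytes: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    units = ["bytes", "KBytes", "MBytes", "GBytes"]
    value = float(n_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1024


if __name__ == "__main__":
    for w in (10, 11, 15, 16):
        print(f"-w {w}: {human(index_bytes(w))}")
```

Under these assumptions, a word size of 11 needs about 16 MBytes for this table, compared with 4 GBytes for the default of 15, which is why a shorter word size suits a small reference such as phiX.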
    
== Aligning Reads (Illumina) ==
 
== Aligning Reads (Illumina) ==
Line 95: Line 98:  
Karma has been designed to align LS 454 reads.  However, in Karma 0.9.0, this functionality is not working.
 
Karma has been designed to align LS 454 reads.  However, in Karma 0.9.0, this functionality is not working.
   −
== Options  ==
+
= Modifying the Reference Header =
   −
Command line
     −
Usage:
+
To facilitate SAM RG values being set automatically in a production environment, we keep a header in the binary version of the reference.  The header can be viewed and edited using the header subcommands here.
   −
Karma expects the sub command to be the first argument on the command line.  Currently, this includes: map, create, header, check and test.
+
To view the header:
   −
To align reads, you first create an index:
+
  karma header -r phiX.fa
  karma create [options...] somereference.fa
     −
A simple example is:
+
To view and edit the header:
karma create -i phiX.fa
     −
To actually align reads, use the map command:
+
  karma header -r phiX.fa -e
  karma map [options...] mate1.fastq.gz [mate2.fastq.gz]
     −
A simple example is:
+
= Other test and check capabilities =
karma map -r phiX.fa -o phiX.sam mate1.fastq.gz mate2.fastq.gz
  −
 
  −
To facilitate SAM RG values being set automatically in a production environment, we keep a header in the reference.  The header can be viewed and edited using the header subcommand:
  −
 
  −
karma header -r phiX.fa
      
Due to the size and complexity of Karma input, output and index files, various checks and tests are useful, so we include some diagnostic capabilities:
 
Due to the size and complexity of Karma input, output and index files, various checks and tests are useful, so we include some diagnostic capabilities:
Line 130: Line 124:  
  -s [int] -> set random number seed [12345]
 
  -s [int] -> set random number seed [12345]
   −
== File structure  ==
+
= Karma File structure  =
    
Upon successfully building references, you will obtain a list of reference files like below:  
 
Upon successfully building references, you will obtain a list of reference files like below:  
Line 185: Line 179:  
<br>  
 
<br>  
   −
= Align Illumina Reads =
+
= Karma TODO List =
 
  −
Command line:
     −
<pre>
+
= Karma CHANGELOG =
karma map -r reference.fa -o output.sam read1.fastq read2.fastq
  −
</pre>
  −
 
  −
= Align ABI SOLiD Reads =  
  −
 
  −
Command line:
  −
 
  −
<pre>
  −
karma map -r reference.fa -c -o output.sam read1.fastq read2.fastq
  −
</pre>
      
= Other useful links =
 
= Other useful links =
  −
[http://www.broadinstitute.org/files/shared/mpg/nextgen2010/nextgen_li.pdf Introduction of BWA usage]
      
[http://lh3lh3.users.sourceforge.net/bioinfo.shtml Heng Li's thoughts about aligner]  
 
[http://lh3lh3.users.sourceforge.net/bioinfo.shtml Heng Li's thoughts about aligner]  
    
[http://lh3lh3.users.sourceforge.net/udb.shtml Benchmark of Dictionary Structures]
 
[http://lh3lh3.users.sourceforge.net/udb.shtml Benchmark of Dictionary Structures]