K-tuple Alignment with Rapid Matching Algorithm
Karma uses an existing reference to align short reads, such as those generated by Illumina sequencers.
The current version, 0.9.0, is optimized to rapidly map base space reads from Illumina sequencers.
Color space and LS454 sequence alignments are not working. These features will return in Karma 0.9.1.
- 1 Download Karma
- 2 Build Karma
- 3 Normal Workflow
- 4 Build reference index and hash
- 5 Aligning Reads
- 6 Align Illumina Reads
- 7 Align ABI SOLiD Reads
- 8 Other useful links
To get a copy, go to Karma Download.
Testing the build
To test karma, go to the subdirectory named karma, and type the command:
The test script builds a reference for the small phiX genome, then runs single-end and paired-end alignments and compares the output with known good results. Differences are printed to the console, and currently look something like this:
diff phiX.sam.good phiX.sam
3c3
< @RG DT:2010-04-08T17:29Z ID:boingboing SM:NA12345
---
> @RG DT:2010-04-08T18:13Z ID:boingboing SM:NA12345
Only the timestamp (the DT field) on the @RG header line is expected to differ. Any other difference is an error and needs to be fixed by the author.
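The comparison described above can be approximated outside the test script. The following is a minimal sketch in Python, not Karma's own test code: it assumes that the only acceptable difference is the DT field on @RG header lines, as in the example output, and the function names are hypothetical.

```python
import re
from itertools import zip_longest

# Matches a DT:<value> field, e.g. DT:2010-04-08T17:29Z
_DT_FIELD = re.compile(r"\bDT:\S+")


def normalize_sam_line(line):
    """Mask the DT (date) field on @RG header lines; leave other lines alone."""
    if line.startswith("@RG"):
        return _DT_FIELD.sub("DT:*", line)
    return line


def unexpected_differences(expected_text, actual_text):
    """Return (line_number, expected, actual) for every line that differs
    once @RG timestamps have been masked. An empty list means a pass."""
    expected = [normalize_sam_line(l) for l in expected_text.splitlines()]
    actual = [normalize_sam_line(l) for l in actual_text.splitlines()]
    diffs = []
    # zip_longest pads the shorter file with None, so a missing line
    # is reported as a difference as well.
    for number, (e, a) in enumerate(zip_longest(expected, actual), start=1):
        if e != a:
            diffs.append((number, e, a))
    return diffs
```

To apply it to the files from the test run, read phiX.sam.good and phiX.sam into strings and pass them as the two arguments.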
Karma works using a set of index and hash files created from an existing reference. Once created, this set of reference index and hash files must always be specified in the command line when aligning reads.
In concept, the simplest workflow is to first create a reference index using karma create, then align reads using karma map. You only have to build the index and hash once.
Because the reference can be large, and because Karma will share the reference among many running instances of Karma, it is useful to put well known references in a common location readily accessible to you and your collaborators.
Build reference index and hash
Building a reference index and hash with Karma is straightforward, but because it is time consuming for longer genomes, you typically save the reference index between runs.
The simplest example for creating a reference and index with a word size of 11 (11-mer words) is:
karma create -i -w 11 phiX.fa
More generally, three primary parameters are necessary for building a Karma reference index:
- a boolean flag indicating base or color space
- the index table word occurrence cutoff value
- the word size
Although the input reference is always expected to be base space and in FASTA format, the binary version of the reference, and the corresponding index and hash files, can be in either color space (ABI SOLiD) or base space (Illumina or LS454). For a given reference FASTA file, you may have either a color or base space binary reference, as well as either color or base space index/hash files.
Because the index and hash files are dependent on the occurrence cutoff parameter and the word size, the output files created by karma have those values in the file name. This allows you to create a variety of index/hash tables, depending on your expected use (ABI SOLiD, in particular, is sensitive to read length).
Options for building reference
-w word size            Word size for index and hash (default 15, typically 10-16)
-O occurrence cutoff    Upper count of number of word positions to store in word positions table (default 5000)
-c                      Creates a color space reference and index/hash
-i                      Create the index and hash as well as the binary reference
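The options above map directly onto the three primary parameters. As an illustration, here is a small Python sketch that assembles a karma create command line from them. The IndexParams class and create_command function are hypothetical helpers, not part of Karma; the defaults are the ones documented above, and passing the -O value as a separate argument is an assumption that mirrors the -w usage shown earlier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexParams:
    """Hypothetical container for the three primary index parameters."""
    color_space: bool = False       # -c
    occurrence_cutoff: int = 5000   # -O (documented default)
    word_size: int = 15             # -w (documented default)


def create_command(reference_fasta: str, params: IndexParams,
                   build_index: bool = True) -> List[str]:
    """Build the argument list for `karma create` without running it."""
    if params.word_size < 1:
        raise ValueError("word size must be a positive integer")
    if params.occurrence_cutoff < 1:
        raise ValueError("occurrence cutoff must be a positive integer")
    cmd = ["karma", "create"]
    if build_index:
        cmd.append("-i")            # also create the index and hash
    if params.color_space:
        cmd.append("-c")            # color space reference and index/hash
    cmd += ["-w", str(params.word_size)]
    cmd += ["-O", str(params.occurrence_cutoff)]
    cmd.append(reference_fasta)     # input is always base space FASTA
    return cmd


print(" ".join(create_command("phiX.fa", IndexParams(word_size=11))))
```

Building the argument list separately from running it makes it easy to log or review the exact command before starting a long index build.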
Aligning reads to the reference is easy:
karma map -r phiX.fa -w 11 phiX.fastq
or for paired reads:
karma map -r phiX.fa -w 11 phiX-mate1.fastq phiX-mate2.fastq
In both of the above examples, the -r option names the reference originally used to build the index/hash, and -w 11 specifies that we are using the index/hash built for 11-mer words. Although you can use the default word size of 15 for phiX, the index would then take 4^15 * 4 bytes = 4 GBytes, so a shorter word size is prudent.
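The size estimate above generalizes to other word sizes. The sketch below evaluates the same formula in Python; the 4 bytes per table entry is an assumption inferred from the "* 4" factor in the text, and the figure covers only the index table, not the hash or the binary reference.

```python
def index_table_bytes(word_size: int, bytes_per_entry: int = 4) -> int:
    """Size of a table with one entry per possible word over {A, C, G, T}.

    There are 4**word_size distinct words; bytes_per_entry=4 is an
    assumption taken from the formula in the text.
    """
    return 4 ** word_size * bytes_per_entry


for w in (11, 13, 15):
    mib = index_table_bytes(w) / 2 ** 20
    print(f"word size {w}: {mib:g} MiB")
```

Each extra base in the word multiplies the table size by four, which is why a small genome such as phiX is better served by an 11-mer index.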
Aligning Reads (Illumina)
Karma is set up so that the default options work well for mapping Illumina reads to the human genome.
Aligning Reads (ABI SOLiD)
Karma has been designed to align color space reads. However, in Karma 0.9.0, this functionality is not working.
Aligning Reads (LS 454)
Karma has been designed to align LS 454 reads. However, in Karma 0.9.0, this functionality is not working.
Karma expects the subcommand to be the first argument on the command line. The available subcommands are currently map, create, header, check and test.
To align reads, you first create an index:
karma create [options...] somereference.fa
A simple example is:
karma create -i phiX.fa
To actually align reads, use the map command:
karma map [options...] mate1.fastq.gz [mate2.fastq.gz]
A simple example is:
karma map -r phiX.fa -o phiX.sam mate1.fastq.gz mate2.fastq.gz
To facilitate SAM RG values being set automatically in a production environment, we keep a header in the reference. The header can be viewed and edited using the header subcommand:
karma header -r phiX.fa
Due to the size and complexity of Karma input, output and index files, various checks and tests are useful, so we include some diagnostics capabilities:
Tests for external files:
karma check [options...] file.bam file.fastq file.sam file.fa file.umfa
Tests internal to Karma:
karma test [options...]
-d -> debug
-s [int] -> set random number seed
After successfully building a reference, you will obtain a set of reference files like the following:
Word Hash (Left)
Word Hash (Right)
Align Illumina Reads
karma map -r reference.fa -o output.sam read1.fastq read2.fastq
Align ABI SOLiD Reads
karma map -r reference.fa -c -o output.sam read1.fastq read2.fastq
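The two invocations above differ only in the -c flag, and the single-end and paired-end cases differ only in the number of read files. A possible Python wrapper that captures this is sketched below. The map_command function is a hypothetical helper, not part of Karma, and it uses only the -r, -w, -c and -o options shown in this document. Note that color space alignment is stated above as not working in Karma 0.9.0.

```python
from typing import List, Optional, Sequence


def map_command(reference: str, reads: Sequence[str],
                output: Optional[str] = None,
                word_size: Optional[int] = None,
                color_space: bool = False) -> List[str]:
    """Build the argument list for `karma map` without running it."""
    if len(reads) not in (1, 2):
        raise ValueError("expected one read file (single end) or two (paired end)")
    cmd = ["karma", "map", "-r", reference]
    if word_size is not None:
        cmd += ["-w", str(word_size)]   # must match an index built with this size
    if color_space:
        cmd.append("-c")                # ABI SOLiD reads
    if output is not None:
        cmd += ["-o", output]
    cmd += list(reads)
    return cmd


# Paired-end Illumina reads, matching the example above
print(" ".join(map_command("reference.fa", ["read1.fastq", "read2.fastq"],
                           output="output.sam")))
```

To actually run the alignment, the returned list can be passed to subprocess.run(cmd, check=True), assuming the karma executable is on the PATH.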