Difference between revisions of "BamUtil"

From Genome Analysis Wiki
(Split into multiple pages and add missing tools)
(Reorganize the tools)
Line 62:

Older revision:

The bam executable has the following functions.
* [[BamUtil: validate|validate|validate - Read and Validate a SAM/BAM file]]
* [[BamUtil: convert|convert - Read a SAM/BAM file and write as a SAM/BAM file (optionally converts between '=' & bases in the sequence)]]
* [[BamUtil: dumpHeader|dumpHeader - Print SAM/BAM header]]
* [[BamUtil: splitChromosome|splitChromosome - Split BAM by Chromosome]]
* [[BamUtil: writeRegion|writeRegion - Write the alignments in the indexed BAM file that fall into the specified region]]
* [[BamUtil: dumpRefInfo|dumpRefInfo - Print SAM/BAM Reference Information]]
* [[BamUtil: dumpIndex|dumpIndex - Dump a BAM index file into an easy to read text version]]
* [[BamUtil: readIndexedBam|readIndexedBam - Read an indexed BAM file reference by reference id -1 to the max reference id and write it out as a SAM/BAM file]]
* [[BamUtil: filter|filter - Filter reads by clipping ends with too high of a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high]]
* [[BamUtil: readReference|readReference - Print the reference string for the specified region]]
* [[BamUtil: diff|diff - Print the diffs between 2 bams]]
* [[BamUtil: stats|stats - Print the diffs between 2 bams]]
* [[BamUtil: revert|revert - Revert SAM/BAM replacing the specified fields with their previous values (if known).]]
* [[BamUtil: squeeze|squeeze - reduces files size by dropping OQ fields, duplicates, specified tags, using '=' when a base matches the reference, binning quality scores.]]
* [[BamUtil: findCigars|findCigars - Output just the reads that contain any of the specified CIGAR operations.]]

This executable is built using [[C++ Library: libStatGen]].

Just running ./bam will print the Usage information for the bam executable.

Newer revision:

The bam executable has the following functions.
* Rewrite SAM/BAM Files
** [[BamUtil: convert|convert - Read a SAM/BAM file and write as a SAM/BAM file (optionally converts between '=' & bases in the sequence)]]
** [[BamUtil: splitChromosome|splitChromosome - Split BAM by Chromosome]]
** [[BamUtil: writeRegion|writeRegion - Write the alignments in the indexed BAM file that fall into the specified region and/or have the specified read name]]
** [[BamUtil: filter|filter - Filter reads by clipping ends with too high of a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high]]
** [[BamUtil: revert|revert - Revert SAM/BAM replacing the specified fields with their previous values (if known) and removes specified tags]]
** [[BamUtil: squeeze|squeeze - reduces files size by dropping OQ fields, duplicates, specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers]]
** [[BamUtil: findCigars|findCigars - Output just the reads that contain any of the specified CIGAR operations]]
** [[BamUtil: readIndexedBam|readIndexedBam - Read an indexed BAM file reference by reference id -1 to the max reference id and write it out as a SAM/BAM file]]

* Informational Tools
** [[BamUtil: validate|validate|validate - Read and Validate a SAM/BAM file]]
** [[BamUtil: diff|diff - Print the diffs between 2 bams]]
** [[BamUtil: stats|stats - Print the diffs between 2 bams]]

* Print Information in Readable Form:
** [[BamUtil: dumpHeader|dumpHeader - Print SAM/BAM header]]
** [[BamUtil: dumpRefInfo|dumpRefInfo - Print SAM/BAM Reference Information]]
** [[BamUtil: dumpIndex|dumpIndex - Dump a BAM index file into an easy to read text version]]
** [[BamUtil: readReference|readReference - Print the reference string for the specified region]]

This executable is built using [[C++ Library: libStatGen]].

Just running ./bam will print the Usage information for the bam executable.
Revision as of 14:58, 2 September 2011


bamUtil Overview

bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam.


Where to Find It

The bamUtil repository is available via GitHub; release downloads are coming soon.

On GitHub, you can browse and download the latest version of the repository and explore its history of changes.

You can access the latest version with or without git.

If you download from GitHub or use git to keep up to date, you also need to download our library, libStatGen.

The releases will be available both with and without libStatGen included. If you download the version without libStatGen included, you will also need to download libStatGen separately. (The version without libStatGen is offered in case you already have a downloaded copy of libStatGen that you want to use.)

Releases

Release downloads are coming soon.


Using GitHub

Using Git To Track the Current Development Version

Clone (get your own copy)

You can create your own git clone (copy) using:

git clone https://github.com/statgen/bamUtil.git

or

git clone git://github.com/statgen/bamUtil.git

Either of these commands creates a directory called bamUtil in the current directory.

Then cd into bamUtil and compile.
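
Combining the clone step with the libStatGen requirement noted earlier, a first-time setup might look like the sketch below. The function name setup_bamutil is illustrative. The libStatGen clone URL, the side-by-side directory layout, and building libStatGen before bamUtil are assumptions that this page does not state, so adjust them to match your environment.

```shell
#!/bin/sh
# First-time setup sketch: clone libStatGen and bamUtil, then build both.
# ASSUMPTIONS (not stated on this page): the libStatGen URL, placing the two
# clones side by side, and running "make all" in libStatGen first.
# The function body runs in a subshell, so the caller's directory is unchanged.
setup_bamutil() (
    cd "${1:-.}" &&
    git clone https://github.com/statgen/libStatGen.git &&
    git clone https://github.com/statgen/bamUtil.git &&
    (cd libStatGen && make all) &&
    (cd bamUtil && make all)
)
```

Calling setup_bamutil with a directory argument (or no argument for the current directory) runs the steps in order and stops at the first one that fails.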

Get the latest Updates (update your copy)

To update your copy to the latest version (a major advantage of using git):

  1. cd pathToYourCopy/bamUtil
  2. make clean
  3. git pull
  4. make all
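
The four update steps above can be wrapped in a small shell function. The function name update_bamutil and its path argument are illustrative; the commands themselves are the ones listed above, chained with && so that a failure stops the sequence.

```shell
#!/bin/sh
# Update sketch: run the documented steps in order inside an existing clone.
# The function name and argument are hypothetical; the commands come from
# the numbered steps. The subshell body leaves the caller's directory alone.
update_bamutil() (
    cd "$1" &&      # 1. go to your copy
    make clean &&   # 2. remove old build products
    git pull &&     # 3. fetch and merge the latest changes
    make all        # 4. rebuild
)
```

For example, update_bamutil pathToYourCopy/bamUtil returns a non-zero status if any step fails.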

Git Refresher

If you decide to use git but need a refresher, see How To Use Git or Notes on how to use git (if you have access).


Downloading From GitHub Without Git

Periodically download the latest copy from the "Downloads" link on the GitHub page: https://github.com/statgen/bamUtil/archives/master.

The downloaded tar file is named "statgen-bamUtil-someHexNumber.tar.gz". The directory created when it is untarred shares the same base name. I recommend that you do not rename this directory. If you want one called bamUtil, create a link to it instead. The hex number in the directory name identifies the version of the repository that you downloaded and makes it much easier to troubleshoot any issues you encounter. If you must rename the directory, be sure to record the hex number from the download for future reference.
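
As a sketch of the linking advice above, the helper below keeps the untarred directory under its original name, creates a symbolic link called bamUtil that points to it, and prints the hex number so it can be noted down. The function name link_bamutil is hypothetical, and the sketch assumes no file or directory named bamUtil already exists in the current directory.

```shell
#!/bin/sh
# Link sketch: leave statgen-bamUtil-<hex> untouched and point "bamUtil" at it.
# The function name is illustrative; it assumes "bamUtil" does not exist yet.
link_bamutil() {
    dir=${1%/}                      # drop a trailing slash, if any
    hex=${dir##*statgen-bamUtil-}   # text after the fixed name prefix
    if [ "$hex" = "$dir" ]; then
        echo "not a statgen-bamUtil-<hex> directory: $dir" >&2
        return 1
    fi
    ln -s "$dir" bamUtil || return 1
    echo "$hex"                     # the version identifier to keep on record
}
```

Run it from the directory that contains the untarred download, passing the statgen-bamUtil-someHexNumber directory as the argument.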

Building

After obtaining the bamUtil repository (either by download or with git), compile the code using make all. This creates the executable, bam, in the bamUtil/bin/ directory, the debug executable in the bamUtil/bin/debug/ directory, and the profiling executable in the bamUtil/bin/profile/ directory.
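
A quick way to confirm that the build produced what is described above is to check for those paths. The sketch below is illustrative only: the function name check_bamutil_build is hypothetical, and because this page does not name the debug and profiling executables, it checks only that their directories exist.

```shell
#!/bin/sh
# Build check sketch: verify the locations that "make all" is described as
# creating. Only bin/bam is checked as an executable; the debug and profile
# executables are not named on this page, so only their directories are tested.
check_bamutil_build() {
    repo=${1:-bamUtil}
    status=0
    if [ -x "$repo/bin/bam" ]; then
        echo "ok: $repo/bin/bam"
    else
        echo "missing: $repo/bin/bam"
        status=1
    fi
    for d in debug profile; do
        if [ -d "$repo/bin/$d" ]; then
            echo "ok: $repo/bin/$d/"
        else
            echo "missing: $repo/bin/$d/"
            status=1
        fi
    done
    return $status
}
```

The function returns 0 only when all three locations are present, so it can be used after make all in a script.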


Programs

The software reads the beginning of an input file to determine whether it is SAM or BAM. To determine the format of the output file, the software checks the output file's extension: if the extension is ".bam" it writes a BAM file; otherwise it writes a SAM file.
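
The output rule can be illustrated with a few lines of shell. This is not bamUtil's own code, only a sketch of the rule as described above; the function name output_format is hypothetical, and treating the extension as case-sensitive is an assumption.

```shell
#!/bin/sh
# Sketch of the documented output rule: a ".bam" extension selects BAM,
# any other output file name selects SAM.
# ASSUMPTION: the match is case-sensitive; the page does not say.
output_format() {
    case "$1" in
        *.bam) echo BAM ;;
        *)     echo SAM ;;
    esac
}

output_format aligned.bam   # prints BAM
output_format aligned.sam   # prints SAM
output_format aligned.txt   # prints SAM
```

The practical consequence is that the output file name alone controls the output format, so an output name without a ".bam" extension yields SAM.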

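The bam executable has the following functions:

  * Rewrite SAM/BAM Files
    * convert - Read a SAM/BAM file and write as a SAM/BAM file (optionally converts between '=' & bases in the sequence)
    * splitChromosome - Split BAM by Chromosome
    * writeRegion - Write the alignments in the indexed BAM file that fall into the specified region and/or have the specified read name
    * filter - Filter reads by clipping ends with too high of a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high
    * revert - Revert SAM/BAM replacing the specified fields with their previous values (if known) and removes specified tags
    * squeeze - reduces files size by dropping OQ fields, duplicates, specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers
    * findCigars - Output just the reads that contain any of the specified CIGAR operations
    * readIndexedBam - Read an indexed BAM file reference by reference id -1 to the max reference id and write it out as a SAM/BAM file

  * Informational Tools
    * validate - Read and Validate a SAM/BAM file
    * diff - Print the diffs between 2 bams
    * stats - Print the diffs between 2 bams

  * Print Information in Readable Form:
    * dumpHeader - Print SAM/BAM header
    * dumpRefInfo - Print SAM/BAM Reference Information
    * dumpIndex - Dump a BAM index file into an easy to read text version
    * readReference - Print the reference string for the specified region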



This executable is built using C++ Library: libStatGen.

Running ./bam with no arguments prints the usage information for the bam executable.