BamUtil
bam Executable
When the pipeline is compiled, the SAM/BAM executable, "bam", is generated in the pipeline/bam/ directory.
The software reads the beginning of an input file to determine if it is SAM/BAM. To determine the format (SAM/BAM) of the output file, the software checks the output file's extension. If the extension is ".bam" it writes a BAM file, otherwise it writes a SAM file.
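The output-format rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the tool's actual code: the function name `output_format` is hypothetical, and the case-sensitive suffix match is an assumption. It covers only the ".bam" rule stated here, not the "ubam" extension mentioned in the convert usage.

```python
def output_format(path: str) -> str:
    """Return "BAM" if the output path ends in ".bam", otherwise "SAM".

    Assumption: the extension is compared case-sensitively.
    """
    return "BAM" if path.endswith(".bam") else "SAM"
```

For example, `output_format("sample.bam")` gives "BAM", while `output_format("sample.sam")` and `output_format("sample.txt")` both give "SAM".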
The bam executable has the following functions.
- validate - Read and Validate a SAM/BAM file
- convert - Read a SAM/BAM file and write as a SAM/BAM file
- dumpHeader - Print SAM/BAM header
- splitChromosome - Split BAM by Chromosome
- writeRegion - Write the alignments in the indexed BAM file that fall into the specified region
- dumpIndex - Dump a BAM index file into an easy-to-read text version
- readIndexedBam - Read an indexed BAM file one reference id at a time (-1 to 22) and write it out as a SAM/BAM file
This executable is built using the bam library.
Running ./bam with no arguments prints the usage information for the bam executable.
validate
The validate option on the bam executable reads and validates a SAM/BAM file.
The validation checks that the file is sorted as specified in the user options. The default is unsorted, in which case no order validation is done.
SAM fields are validated against: SAM Validation Criteria
NOTE: Currently only minimal validation is done.
Parameters
Required Parameters:
    --in : the SAM/BAM file to be validated
Optional Parameters:
    --noeof : do not expect an EOF block on a bam file.
    --so_flag : validate the file is sorted based on the header's @HD SO flag.
    --so_coord : validate the file is sorted based on the coordinate.
    --so_query : validate the file is sorted based on the query name.
    --quitAfterErrorNum : Number of records with errors/invalids to allow before quitting.
                          -1 (default) indicates to not quit until the entire file is validated.
                          0 indicates not to read/validate anything.
    --maxReportedErrors : Maximum number of errors to print (defaults to 100)
    --disableStatistics : Turn off statistic generation
Usage
./bam validate --in <inputFile> [--noeof] [--so_flag|--so_coord|--so_query] [--quitAfterErrorNum <numErrors>] [--maxReportedErrors <numReportedErrors>] [--disableStatistics]
Return Value
- 0: all records are successfully read, are valid, and are properly sorted.
- non-0: at least one record was not successfully read, not valid, or not properly sorted.
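When validate is driven from a script, the command line and the return value are the two things to get right. The following is a minimal sketch using only the flags listed in the parameters above. The helper names (`build_validate_command`, `file_is_valid`) and the `bam_path` default are hypothetical and chosen for illustration.

```python
from typing import List, Optional

# Sort-order flags from the parameter list; at most one is passed.
SORT_FLAGS = {"flag": "--so_flag", "coord": "--so_coord", "query": "--so_query"}


def build_validate_command(in_file: str,
                           sort_order: Optional[str] = None,
                           max_reported_errors: Optional[int] = None,
                           disable_statistics: bool = False,
                           bam_path: str = "./bam") -> List[str]:
    """Build the argument list for `bam validate` (hypothetical helper)."""
    cmd = [bam_path, "validate", "--in", in_file]
    if sort_order is not None:
        cmd.append(SORT_FLAGS[sort_order])  # KeyError on an unknown order
    if max_reported_errors is not None:
        cmd += ["--maxReportedErrors", str(max_reported_errors)]
    if disable_statistics:
        cmd.append("--disableStatistics")
    return cmd


def file_is_valid(returncode: int) -> bool:
    """0 means every record was read, valid, and properly sorted."""
    return returncode == 0
```

A caller could pass the list to `subprocess.run(...)` and feed the resulting `returncode` to `file_is_valid`.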
Example Output
./bam validate --in t.sam --disableStatistics
The following parameters are in effect:
Input Parameters
--in [t.sam], --noeof, --quitAfterErrorNum [-1], --maxReportedErrors [100], --disableStatistics [ON]
SortOrder : --so_flag, --so_coord, --so_query
Record 1
FAIL_PARSE: Too few columns in the Record
Record 2
FAIL_PARSE: Too few columns in the Record
Number of records read = 2
Number of valid records = 0
Returning: 5 (FAIL_PARSE)
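The summary at the end of this output can be picked apart with a few regular expressions. The sketch below assumes the wording shown in the example ("Number of records read", "Number of valid records", "Returning: N (NAME)"). Other versions of the tool may phrase these lines differently, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional, Union


def parse_validate_summary(text: str) -> Dict[str, Optional[Union[int, str]]]:
    """Extract record counts and the returned status from validate output."""
    read = re.search(r"Number of records read = (\d+)", text)
    valid = re.search(r"Number of valid records = (\d+)", text)
    status = re.search(r"Returning: (-?\d+) \((\w+)\)", text)
    return {
        "records_read": int(read.group(1)) if read else None,
        "records_valid": int(valid.group(1)) if valid else None,
        "status_code": int(status.group(1)) if status else None,
        "status_name": status.group(2) if status else None,
    }
```

Applied to the example output above, this yields 2 records read, 0 valid records, and status 5 (FAIL_PARSE).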
Statistics Generated
The following statistics are generated when the --disableStatistics option is not used:
- TotalReads
- MappedReads
- PairedReads
- ProperPair
- DuplicateReads
- QCFailureReads
- MappingRate(%)
- PairedReads(%)
- ProperPair(%)
- DupRate(%)
- QCFailRate(%)
- TotalBases
- BasesInMappedReads
convert
The convert option on the bam executable reads a SAM/BAM file and writes it as a SAM/BAM file.
The executable converts the input file into the format of the output file. For example, to convert a BAM file to a SAM file, run the following from the pipeline/bam/ directory:
./bam convert --in <bamFile>.bam --out <newSamFile>.sam
Remember to include the paths to the executable and to your files.
Parameters
Required Parameters:
    --in : the SAM/BAM file to be read
    --out : the SAM/BAM file to be written
Optional Parameters:
    --noeof : do not expect an EOF block on a bam file.
Usage
./bam convert --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--noeof]
Return Value
Returns the SamStatus for the reads/writes.
Example Output
Number of records read = 10
Number of records written = 10
dumpHeader
The dumpHeader option on the bam executable prints the header of the specified SAM/BAM file to cout.
Parameters
Required Parameters:
    filename : the SAM/BAM file whose header should be printed.
Usage
./bam dumpHeader <inputFile>
Return Value
- 0: the header was successfully read and printed.
- non-0: the header was not successfully read or was not printed. (Returns the SamStatus.)
Example Output
@SQ SN:1 LN:247249719
@SQ SN:2 LN:242951149
@SQ SN:3 LN:199501827
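Header output like this is straightforward to post-process. As a rough sketch, the hypothetical function below collects the reference names and lengths from @SQ lines. It relies on the SAM convention that header fields are tab-separated TAG:VALUE pairs, so it expects real tab characters rather than the spaces shown in the rendering above.

```python
from typing import Dict


def parse_sq_lines(header_text: str) -> Dict[str, int]:
    """Map each @SQ reference name (SN) to its length (LN)."""
    lengths: Dict[str, int] = {}
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue  # skip @HD, @RG, @PG, @CO and blank lines
        # Each remaining field is TAG:VALUE; split on the first colon only.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if "SN" in tags and "LN" in tags:
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths
```

Given the three @SQ lines from the example, the result maps "1", "2", and "3" to 247249719, 242951149, and 199501827.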
splitChromosome
The splitChromosome option on the bam executable splits an indexed BAM file into multiple files based on the Chromosome (Reference Name).
The output files share the same base name, followed by _# where # is the associated reference id from the BAM file.
Parameters
Required Parameters:
    --in : the SAM/BAM file to be split
    --out : the base filename for the SAM/BAM files to write into. Does not include the extension.
            _N will be appended to the basename where N indicates the Chromosome.
Optional Parameters:
    --noeof : do not expect an EOF block on a bam file.
    --bamIndex : the path/name of the bam index file (if not specified, uses the --in value + ".bai")
    --bamout : write the output files in BAM format (default).
    --samout : write the output files in SAM format.
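To locate the split files afterwards, a script needs to reproduce the naming scheme. The sketch below shows one plausible reading of the rule and rests on two assumptions that are not confirmed here: that the extension is ".bam" or ".sam" according to --bamout/--samout, and that N is the numeric reference id written as-is (how the unmapped id -1 is named is not stated). The function name is hypothetical.

```python
def split_output_name(base: str, ref_id: int, sam: bool = False) -> str:
    """Guess the file name written for one reference id.

    Assumptions: "_<ref_id>" is appended to the base name, followed by
    ".sam" when --samout is used and ".bam" otherwise.
    """
    extension = ".sam" if sam else ".bam"
    return f"{base}_{ref_id}{extension}"
```

With `--out chromosome`, this predicts names such as chromosome_0.bam and chromosome_1.bam. Check the result against the files actually produced before relying on it.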
Usage
./bam splitChromosome --in <inputFilename> --out <outputFileBaseName> [--bamIndex <bamIndexFile>] [--noeof] [--bamout|--samout]
Return Value
- 0: all records are successfully read and written.
- non-0: at least one record was not successfully read or written.
Example Output
The following parameters are in effect:
Input Parameters
--in [test/testFiles/sortedBam.bam], --out [chromosome], --bamIndex [], --noeof
Output Type : --bamout [ON], --samout
Reference ID -1 has 2 records
Reference ID 0 has 5 records
Reference ID 1 has 2 records
Reference ID 2 has 1 records
Reference ID 3 has 0 records
Reference ID 4 has 0 records
Reference ID 5 has 0 records
Reference ID 6 has 0 records
Reference ID 7 has 0 records
Reference ID 8 has 0 records
Reference ID 9 has 0 records
Reference ID 10 has 0 records
Reference ID 11 has 0 records
Reference ID 12 has 0 records
Reference ID 13 has 0 records
Reference ID 14 has 0 records
Reference ID 15 has 0 records
Reference ID 16 has 0 records
Reference ID 17 has 0 records
Reference ID 18 has 0 records
Reference ID 19 has 0 records
Reference ID 20 has 0 records
Reference ID 21 has 0 records
Reference ID 22 has 0 records
Number of records = 10
Returning: 0 (SUCCESS)
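A quick sanity check on this output is to confirm that the per-reference counts add up to the reported total. The sketch below assumes the exact wording shown in the example. Both function names are hypothetical.

```python
import re
from typing import Dict, Optional


def parse_reference_counts(text: str) -> Dict[int, int]:
    """Map each reference id to the record count reported for it."""
    pairs = re.findall(r"Reference ID (-?\d+) has (\d+) records?", text)
    return {int(ref_id): int(count) for ref_id, count in pairs}


def reported_total(text: str) -> Optional[int]:
    """Return the value of the "Number of records = N" line, if present."""
    match = re.search(r"Number of records = (\d+)", text)
    return int(match.group(1)) if match else None
```

For the example above, the counts for reference ids -1, 0, 1, and 2 are 2, 5, 2, and 1, which sum to the reported total of 10.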
writeRegion
The writeRegion option on the bam executable writes the alignments in the indexed BAM file that fall into the specified region (reference id and start/end position).
Parameters
Required Parameters:
    --in : the BAM file to be read
    --out : the SAM/BAM file to write to
Optional Parameters:
    --noeof : do not expect an EOF block on a bam file.
    --bamIndex : the path/name of the bam index file (if not specified, uses the --in value + ".bai")
    --refID : the BAM reference ID to read (defaults to -1: unmapped)
    --start : the 0-based start position (defaults to -1)
    --end : the 0-based end position (defaults to -1: meaning until the end of the reference)
Usage
./bam writeRegion --in <inputFilename> --out <outputFilename> [--bamIndex <bamIndexFile>] [--noeof] [--refID <refID>] [--start <startPos>] [--end <endPos>]
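Because --start and --end are 0-based while the POS column of a SAM record is 1-based, scripts that take positions from SAM records need a conversion step. The sketch below assembles a writeRegion command line from the parameters listed above. The helper names and the `bam_path` default are hypothetical, and the sketch passes --end through unchanged because it is not stated here whether the end position is inclusive.

```python
from typing import List, Optional


def sam_pos_to_zero_based(pos: int) -> int:
    """Convert a 1-based SAM POS value to a 0-based position."""
    if pos < 1:
        raise ValueError("SAM positions of mapped reads start at 1")
    return pos - 1


def build_write_region_command(in_file: str, out_file: str,
                               ref_id: Optional[int] = None,
                               start: Optional[int] = None,
                               end: Optional[int] = None,
                               bam_index: Optional[str] = None,
                               bam_path: str = "./bam") -> List[str]:
    """Build the argument list for `bam writeRegion` (hypothetical helper).

    Options left as None are omitted so the tool's defaults apply.
    """
    cmd = [bam_path, "writeRegion", "--in", in_file, "--out", out_file]
    if bam_index is not None:
        cmd += ["--bamIndex", bam_index]
    for flag, value in (("--refID", ref_id), ("--start", start), ("--end", end)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

For instance, a region starting at SAM position 2 on reference id 0 would be requested with `start=sam_pos_to_zero_based(2)`, which is 1.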
Return Value
- 0: all records are successfully read and written.
- non-0: at least one record was not successfully read or written.
Example Output
The following parameters are in effect:
Input Parameters
--in [test/testFiles/sortedBam.bam], --out [t.sam], --bamIndex [], --refID, --start [1], --end [100], --noeof
Wrote t.sam with 2 records.
dumpIndex
The dumpIndex option on the bam executable prints a BAM index file in an easy-to-read format.
Parameters
Required Parameters:
    bamIndexFile - path/name of the index file to display
Optional Parameters:
    ref# - the reference number to print; defaults to printing all
Usage
./bam dumpIndex <bamIndexFile> [<ref#>]
Return Value
- -1 if the bam index file could not be opened.
- 0 if the bam index file could be opened.
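One practical caveat when checking for -1 from a script: on POSIX systems only the low 8 bits of a process's return value reach the caller, so a program that returns -1 is seen with an exit status of 255. A minimal sketch of mapping the observed status back to the signed value follows. The function names are hypothetical.

```python
def to_signed_exit_status(returncode: int) -> int:
    """Map an 8-bit exit status (0..255) back to a signed value (-128..127)."""
    code = returncode & 0xFF
    return code - 256 if code >= 128 else code


def index_was_opened(returncode: int) -> bool:
    """True when the status corresponds to 0 (the index file could be opened)."""
    return to_signed_exit_status(returncode) == 0
```

With this mapping, an observed status of 255 is reported as -1 and 0 stays 0.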
readIndexedBam
The readIndexedBam option on the bam executable reads an indexed BAM file one reference id at a time, from -1 to 22, and writes it out as a SAM/BAM file.
Parameters
Required Parameters:
    inputFilename - path/name of the input BAM file
    outputFile.sam/bam - path/name of the output file
    bamIndexFile - path/name of the BAM index file
Optional Parameters:
    ref# - the reference number to print; defaults to printing all
Usage
./bam readIndexedBam <inputFilename> <outputFile.sam/bam> <bamIndexFile>
Return Value
- 0