BamUtil

From Genome Analysis Wiki

Revision as of 17:59, 11 May 2010

bam Executable

When the pipeline is compiled, the SAM/BAM executable, "bam", is generated in the pipeline/bam/ directory.

The software reads the beginning of an input file to determine whether it is SAM or BAM. To determine the format of the output file, the software checks the output file's extension: if the extension is ".bam", it writes a BAM file; otherwise, it writes a SAM file. (The conversion usage below also accepts ".ubam" for uncompressed BAM output.)
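The format rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the bam library's implementation: the helper names are hypothetical, and the input check assumes that BAM files start with the gzip magic bytes 0x1f 0x8b (BAM is stored in BGZF, a gzip-compatible block format) while SAM files are plain text. The ".ubam" branch comes from the conversion usage line later on this page.

```python
# Minimal sketch of the SAM/BAM format decisions; NOT the bam library's code.
# Assumption: BAM input is BGZF-compressed and therefore begins with the gzip
# magic bytes 0x1f 0x8b, whereas SAM input is uncompressed text.
GZIP_MAGIC = b"\x1f\x8b"


def detect_input_format(first_bytes: bytes) -> str:
    """Guess whether an input file is SAM or BAM from its first bytes."""
    return "BAM" if first_bytes[:2] == GZIP_MAGIC else "SAM"


def output_format_for(filename: str) -> str:
    """Choose the output format from the output file's extension."""
    if filename.endswith(".ubam"):
        return "uncompressed BAM"  # .ubam is listed in the conversion usage
    if filename.endswith(".bam"):
        return "BAM"
    return "SAM"  # any other extension falls back to SAM
```

For instance, output_format_for("reads.txt") falls through to SAM, matching the "otherwise" rule above.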

The bam executable has the following functions.

This executable is built using the bam library.


Read and Validate a SAM/BAM file

The validate option on the bam executable validates a SAM/BAM file.

The validation checks that the file is sorted as specified by the user options. The default is unsorted, in which case no sort-order validation is done.

NOTE: Currently only minimal validation is done.
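To make the sort-order checks concrete, here is a hedged Python sketch of what "sorted by coordinate" and "sorted by query name" can mean. Records are modeled as hypothetical (query name, reference index, position) tuples. Placing unmapped reads (reference index -1) last follows the SAM specification's convention; comparing query names as plain strings is an assumption, since the exact name ordering bam uses is not stated here. The header helper reads the SO tag of an @HD line, which is the information --so_flag relies on.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical record model: (query_name, reference_index, position).
# reference_index is -1 for reads not placed on any reference.
Record = Tuple[str, int, int]


def sort_order_from_header(hd_line: str) -> str:
    """Return the SO value of an @HD header line ("unknown" if absent)."""
    for field in hd_line.rstrip("\n").split("\t")[1:]:
        if field.startswith("SO:"):
            return field[3:]
    return "unknown"


def coordinate_key(rec: Record) -> Tuple[int, int, int]:
    """Sort key for coordinate order: reference index, then position."""
    _, ref_index, pos = rec
    if ref_index < 0:
        return (1, 0, 0)  # unmapped reads sort after all mapped reads
    return (0, ref_index, pos)


def first_unsorted(records: Iterable[Record], order: str) -> Optional[int]:
    """Return the 1-based number of the first out-of-order record, or None.

    order is "coordinate" or "queryname"; any other value (for example
    "unsorted" or "unknown") means no order check is done.
    """
    if order not in ("coordinate", "queryname"):
        return None
    prev = None
    for number, rec in enumerate(records, start=1):
        # Assumption: query names are compared as plain strings.
        key = coordinate_key(rec) if order == "coordinate" else rec[0]
        if prev is not None and key < prev:
            return number
        prev = key
    return None
```

Calling first_unsorted(records, sort_order_from_header(hd_line)) approximates --so_flag, while passing "coordinate" or "queryname" directly approximates --so_coord or --so_query.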

Parameters

    Required Parameters:
	--in : the SAM/BAM file to be validated
    Optional Parameters:
	--noeof             : do not expect an EOF block on a bam file.
	--so_flag           : validate the file is sorted based on the header's @HD SO flag.
	--so_coord          : validate the file is sorted based on the coordinate.
	--so_query          : validate the file is sorted based on the query name.
	--quitAfterErrorNum : Number of invalid records (records with errors) to allow before quitting.
	                      -1 (default) indicates to not quit until the entire file is validated.
	                      0 indicates not to read/validate anything.
	--maxReportedErrors : Maximum number of errors to print (defaults to 100)
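One reading of how --quitAfterErrorNum and --maxReportedErrors interact is sketched below in Python. It is a model, not bam's code: the input is a hypothetical list of per-record validity flags, and stopping as soon as the number of invalid records reaches a positive limit is an assumption about where "allow N errors before quitting" draws the line.

```python
from typing import Iterable, List, Tuple


def validate_records(results: Iterable[bool], quit_after_error_num: int = -1,
                     max_reported_errors: int = 100) -> Tuple[int, int, List[int]]:
    """Model the two limits over hypothetical per-record results.

    results holds one bool per record (True = valid record).
    Returns (records_read, valid_records, reported_record_numbers).
    """
    read = valid = invalid = 0
    reported: List[int] = []
    if quit_after_error_num == 0:
        return read, valid, reported  # 0: read/validate nothing
    for ok in results:
        read += 1
        if ok:
            valid += 1
            continue
        invalid += 1
        if len(reported) < max_reported_errors:
            reported.append(read)  # 1-based number of the failing record
        # -1 never stops early; assumption: a positive limit stops once reached.
        if quit_after_error_num > 0 and invalid >= quit_after_error_num:
            break
    return read, valid, reported
```

With the defaults, every record is read and at most 100 failing records are reported, in line with the "Number of records read" and "Number of valid records" summary in the example output below.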

Usage

./bam validate --in <inputFile> [--noeof] [--so_flag|--so_coord|--so_query] [--quitAfterErrorNum <numErrors>] [--maxReportedErrors <numReportedErrors>]


Return Value

  • 0: all records are successfully read, are valid, and are properly sorted.
  • non-0: at least one record was not successfully read, not valid, or not properly sorted.
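When the validator is driven from a script, the return value above is what to check. The following Python sketch assembles a command line from the documented options and treats exit status 0 as "valid and properly sorted". The function names and the "./bam" default path are assumptions; adjust the path to wherever your executable lives.

```python
import subprocess
from typing import List, Optional

SORT_OPTIONS = ("--so_flag", "--so_coord", "--so_query")


def build_validate_cmd(in_file: str, bam_path: str = "./bam",
                       noeof: bool = False,
                       sort_option: Optional[str] = None,
                       quit_after_error_num: Optional[int] = None,
                       max_reported_errors: Optional[int] = None) -> List[str]:
    """Build a "bam validate" argument list from the documented options."""
    cmd = [bam_path, "validate", "--in", in_file]
    if noeof:
        cmd.append("--noeof")
    if sort_option is not None:
        if sort_option not in SORT_OPTIONS:
            raise ValueError(f"unknown sort option: {sort_option}")
        cmd.append(sort_option)
    if quit_after_error_num is not None:
        cmd += ["--quitAfterErrorNum", str(quit_after_error_num)]
    if max_reported_errors is not None:
        cmd += ["--maxReportedErrors", str(max_reported_errors)]
    return cmd


def file_is_valid(cmd: List[str]) -> bool:
    """Run the validator; 0 means all records were read, valid and sorted."""
    return subprocess.run(cmd).returncode == 0
```

For example, file_is_valid(build_validate_cmd("t.sam", sort_option="--so_coord")) runs the validator with a coordinate-order check and returns False for any non-0 status.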


Example Output

The following parameters are in effect:

Input Parameters
 --in [t.sam], --noeof, --quitAfterErrorNum [-1], --maxReportedErrors [100]
   SortOrder : --so_flag, --so_coord, --so_query

Record 1
FAIL_PARSE: Too few columns in the Record

Record 2
FAIL_PARSE: Too few columns in the Record


Number of records read = 2
Number of valid records = 0
Returning: 5 (FAIL_PARSE)


Read a SAM/BAM file and write as a SAM/BAM file

This executable takes two or three arguments. The first argument is the input file. The second argument is the output file. The executable converts the first file into the format of the second file. For example, to convert a BAM file to a SAM file, run the following from the pipeline/bam/ directory:

./bam <bamFile>.bam <newSamFile>.sam

Remember to include the paths to the executable and to your input and output files. The optional third argument, NOEOF, specifies that the End-Of-File (EOF) block should not be checked for when opening the file.

Usage

./bam <inputFile> <outputFile.sam/bam/ubam> [NOEOF]

where "ubam" denotes an uncompressed BAM file.
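The conversion command can also be scripted. This Python sketch (hypothetical helper name, "./bam" path assumed) only assembles the argument list from the usage line above; the bam executable itself chooses the output format from the output file's extension.

```python
from typing import List


def build_convert_cmd(in_file: str, out_file: str, noeof: bool = False,
                      bam_path: str = "./bam") -> List[str]:
    """Build the "bam <inputFile> <outputFile> [NOEOF]" argument list."""
    cmd = [bam_path, in_file, out_file]
    if noeof:
        cmd.append("NOEOF")  # optional third argument: skip the EOF-block check
    return cmd
```

Passing the result to subprocess.run(..., check=True) raises CalledProcessError on a non-zero exit status; the meaning of specific non-zero values for this command is not documented here.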


Return Value

Example Output

Dump a BAM index file

Usage

./bam dump_index <bamIndexFile>

Return Value

  • -1 if the BAM index file could not be opened.
  • 0 if the BAM index file was opened successfully.
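On POSIX systems only the low 8 bits of a program's exit status are kept, so the -1 documented above will typically be seen by a shell, or by Python's subprocess module, as 255. A small Python sketch (hypothetical function name, POSIX assumed) for interpreting the status of "./bam dump_index":

```python
def describe_dump_index_status(returncode: int) -> str:
    """Interpret the exit status of "./bam dump_index" (POSIX assumed)."""
    if returncode == 0:
        return "index file opened"
    if returncode == 255:
        # A -1 return from main() shows up as 255: only the low 8 bits survive.
        return "index file could not be opened"
    if returncode < 0:
        # subprocess reports termination by signal N as returncode -N.
        return f"terminated by signal {-returncode}"
    return f"unexpected exit status {returncode}"
```

A typical call would be describe_dump_index_status(subprocess.run(["./bam", "dump_index", index_path]).returncode), where index_path is your BAM index file.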

Example Output

Read and write an indexed BAM file

Usage

./bam read_indexed_bam <inputFilename> <outputFile.sam/bam> <bamIndexFile>

Return Value

  • 0

Example Output