Line 1: |
Line 1: |
− | [[Category:Software|BamValidator]] | + | [[Category:BamUtil|validate]] |
− | == Status ==
| + | [[Category:BAM Software]] |
| + | [[Category:Software]] |
| + | |
| + | = Status = |
| | | |
| The initial version of a SAM/BAM Validator is complete, but does not yet validate all fields or produce all desired statistics. Future releases will add more validation and more statistics. | | The initial version of a SAM/BAM Validator is complete, but does not yet validate all fields or produce all desired statistics. Future releases will add more validation and more statistics. |
| | | |
− | == Download ==
| + | = Download = |
− | Click the link to download the tar of the source code: [[Media:bam.0.0.2.tgz|bam.0.0.2.tgz]]
| + | http://genome.sph.umich.edu/wiki/BamUtil |
| + | After compiling, the BAM Validator is found in bamUtil/bin/bam and is the "validate" subprogram (bamUtil/bin/bam validate). |
| | | |
− | If you use this software, please e-mail me, Mary Kate Trost, at mktrost@umich.edu
| + | = Purpose = |
− | | |
− | This version is recommended for Unix users with access to the GNU C++ compiler.
| |
− | | |
− | To install the BAM Library and the BAM Validator, unpack the downloaded file (tar xvf) and type make. The BAM Validator is found in pipeline/bam and is called bam (pipeline/bam/bam).
| |
− | | |
− | == Purpose ==
| |
| | | |
| The BamValidator processes the specified SAM/BAM file: | | The BamValidator processes the specified SAM/BAM file: |
Line 22: |
Line 20: |
| | | |
| | | |
− | === Valid SAM/BAM File Requirements ===
| + | == Valid SAM/BAM File Requirements == |
| | | |
| A valid SAM/BAM file meets the validation criteria specified in [[SAM Validation Criteria]]. | | A valid SAM/BAM file meets the validation criteria specified in [[SAM Validation Criteria]]. |
| | | |
− | === Statistic Generation ===
| + | == Statistic Generation == |
− | [[C++ Class: SamFile#Statistic Generation]] | + | |
| + | Statistics are generated by the BAM Validator unless the <code>--disableStatistics</code> option is set. A description of the generated statistics is found at: [[C++ Class: SamFile#Statistic Generation|Sam File Statistics]] |
| + | |
| + | = Usage = |
| + | |
| + | ./bam validate --in <inputFile> [--noeof] [--so_flag|--so_coord|--so_query] [--maxErrors <numErrors>] [--verbose] [--printableErrors <numReportedErrors>] [--disableStatistics] [--params] |
| + | |
| + | == Recommended Usage == |
| + | If you don't want the file statistics, use --disableStatistics. |
| + | |
| + | If you want to validate that the file is sorted, use the appropriate sorting flag. If you trust the @HD SO flag, use <code>--so_flag</code>; otherwise, to check that the file is sorted by coordinate, use <code>--so_coord</code>. |
| + | |
| + | If you want to see the error details, use --verbose, but if you want to limit the number of errors displayed, use --printableErrors. |
| + | |
| + | If you just want to know whether or not the file is validly formatted, use --maxErrors 1. |
| + | |
| + | The following will give the most information (without validating that the file is sorted): |
| + | ./bam validate --in <inputFile> --verbose |
| | | |
− | == How to Use the Bam Validator Executable ==
| + | = Parameters = |
− | === Parameters ===
| |
| <pre> | | <pre> |
| Required Parameters: | | Required Parameters: |
Line 36: |
Line 50: |
| Optional Parameters: | | Optional Parameters: |
| --noeof : do not expect an EOF block on a bam file. | | --noeof : do not expect an EOF block on a bam file. |
| + | --refFile : the reference file |
| --so_flag : validate the file is sorted based on the header's @HD SO flag. | | --so_flag : validate the file is sorted based on the header's @HD SO flag. |
| --so_coord : validate the file is sorted based on the coordinate. | | --so_coord : validate the file is sorted based on the coordinate. |
Line 46: |
Line 61: |
| before suppressing them when in verbose (defaults to 100) | | before suppressing them when in verbose (defaults to 100) |
| --disableStatistics : Turn off statistic generation | | --disableStatistics : Turn off statistic generation |
| + | --params : Print the parameter settings |
| </pre> | | </pre> |
| + | {{PhoneHomeParamDesc}} |
| + | |
| + | == Required Parameters == |
| + | {{inBAMInputFile|hdr======}} |
| + | |
| + | == Optional Parameters == |
| + | {{noeofBGZFParameter}} |
| + | {{refFile}} |
| + | |
| + | === Validate Sort Order (<code>--so_flag</code>, <code>--so_coord</code>,<code>--so_query</code>)=== |
| + | Validate the sort order of the file: |
| + | * <code>--so_flag</code> - based on the flag in the header |
| + | * <code>--so_coord</code> - based on the coordinates/positions |
| + | * <code>--so_query</code> - based on the query/read names |
| + | |
| + | === Maximum Number of Errors (<code>--maxErrors</code>)=== |
| + | Use <code>--maxErrors</code> followed by a number to specify the maximum number of records with errors to process before quitting. |
| + | |
| + | -1 (the default) means the validator does not quit until the entire file has been validated. |
| + | |
| + | 0 means nothing is read or validated. |
| + | |
| + | === Print Specific Errors (<code>--verbose</code>)=== |
| + | Use <code>--verbose</code> to print specific error details rather than just a summary. |
| | | |
− | === Usage === | + | === Maximum Number of Record Error Details to Print (<code>--printableErrors</code>)=== |
| + | Use <code>--printableErrors</code> followed by a number to specify the maximum number of records with errors to print the details of before suppressing them. This parameter is only valid when [[#Print Specific Errors (--verbose)|<code>--verbose</code>]] is also specified. |
| | | |
− | ./bam validate --in <inputFile> [--noeof] [--so_flag|--so_coord|--so_query] [--maxErrors <numErrors>] [--verbose] [--printableErrors <numReportedErrors>] [--disableStatistics]
| + | The default is 100. |
| | | |
− | ==== Recommended Usage ==== | + | === Disable Statistic Generation (<code>--disableStatistics</code>)=== |
− | If you don't want the file statistics, use --disableStatistics.
| + | Use <code>--disableStatistics</code> to turn off statistic generation (statistics are generated by default). |
| | | |
− | If you want to validate that the file is sorted, use the appropriate sorting flag. If you trust the @HD SO flag, use <code>so_flag</code>, otherwise if you want to check that it is sorted by coordinate, use <code>--so_coord</code>.
| + | {{paramsParameter}} |
| | | |
− | If you want to see the error details, use --verbose, but if you want to limit the number of errors displayed, use --printableErrors.
| + | {{PhoneHomeParameters}} |
| | | |
− | If you just want to know if the file is validly formatted or not, use --maxErrors 1 | + | = Output = |
| + | The error details (--verbose) and the statistics are printed to stderr. To send them to a file, redirect stderr. |
| | | |
− | The following will give the most information (without validating that the file is sorted):
| + | In a bash shell, redirect stderr to a file with: |
− | ./bam validate --in <inputFile> --verbose | + | ./bam validate --in <inputFile> --verbose 2> outputFile.txt |
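When several files each need their own report, the stderr redirection can be wrapped in a small helper. The following is only a sketch: <code>BAM_BIN</code>, <code>validate_to_report</code>, and the report file argument are hypothetical names chosen for illustration; only the <code>validate</code>, <code>--in</code>, and <code>--verbose</code> arguments come from this page.

```shell
#!/usr/bin/env bash
# BAM_BIN is a hypothetical variable holding the path to the compiled executable.
BAM_BIN="${BAM_BIN:-./bam}"

# validate_to_report INPUT REPORT
# Runs the validator in verbose mode and writes everything it prints to
# stderr (error details and statistics) into REPORT.
# The function's exit status is the validator's own exit status.
validate_to_report() {
    local input="$1" report="$2"
    "$BAM_BIN" validate --in "$input" --verbose 2> "$report"
}
```

For example, <code>validate_to_report sample.bam sample.validate.txt</code> would leave the report in <code>sample.validate.txt</code>, assuming <code>./bam</code> exists in the current directory.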
| | | |
| | | |
− | === Return Value ===
| + | = Return Value = |
| * 0: all records are successfully read, are valid, and are properly sorted. | | * 0: all records are successfully read, are valid, and are properly sorted. |
| * non-0: at least one record was not successfully read, not valid, or not properly sorted. | | * non-0: at least one record was not successfully read, not valid, or not properly sorted. |
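Because the return value separates valid files from invalid ones, it can drive a shell script directly. The sketch below is one hedged way to do that: <code>BAM_BIN</code>, <code>is_valid_bam</code>, and <code>report_validity</code> are hypothetical names, and the option combination (<code>--maxErrors 1</code> for a yes/no answer, <code>--disableStatistics</code> to skip the statistics) follows the recommended usage on this page.

```shell
#!/usr/bin/env bash
# BAM_BIN is a hypothetical variable holding the path to the compiled executable.
BAM_BIN="${BAM_BIN:-./bam}"

# is_valid_bam FILE
# Succeeds (status 0) only if the validator returns 0 for FILE.
# All validator output is discarded because only the status is needed.
is_valid_bam() {
    "$BAM_BIN" validate --in "$1" --maxErrors 1 --disableStatistics > /dev/null 2>&1
}

# report_validity FILE...
# Prints one line per file and fails if any file did not validate.
report_validity() {
    local file
    local failures=0
    for file in "$@"; do
        if is_valid_bam "$file"; then
            echo "OK      $file"
        else
            echo "INVALID $file"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}
```

Since no sort-order option is passed, this check does not verify that the file is sorted; adding one of the sort-order options to the command line would include that in the result.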
| | | |
− | === Example Outputs ===
| + | = Example Outputs = |
| | | |
− | ==== Valid File ====
| + | == Valid File == |
| <pre> | | <pre> |
| ./bam validate --in ~/data/bamExample/37mer_alt.bwa.bam | | ./bam validate --in ~/data/bamExample/37mer_alt.bwa.bam |
| | | |
− | The following parameters are available. Ones with "[]" are in effect:
| |
− |
| |
− | Input Parameters
| |
− | --in [/home/mktrost/data/bamExample/37mer_alt.bwa.bam], --noeof,
| |
− | --maxErrors [-1], --verbose, --printableErrors [100],
| |
− | --disableStatistics
| |
− | SortOrder : --so_flag, --so_coord, --so_query
| |
− |
| |
− | '
| |
| Number of records read = 18900000 | | Number of records read = 18900000 |
| Number of valid records = 18900000 | | Number of valid records = 18900000 |
Line 105: |
Line 138: |
| </pre> | | </pre> |
| | | |
− | ==== Invalid File ====
| + | == Invalid File == |
| <pre> | | <pre> |
| ./bam validate --in test/testFiles/testInvalid.sam | | ./bam validate --in test/testFiles/testInvalid.sam |
− |
| |
− | The following parameters are available. Ones with "[]" are in effect:
| |
− |
| |
− | Input Parameters
| |
− | --in [test/testFiles/testInvalid.sam], --noeof, --maxErrors [-1], --verbose,
| |
− | --printableErrors [100], --disableStatistics
| |
− | SortOrder : --so_flag, --so_coord, --so_query
| |
− |
| |
| | | |
| Number of records read = 32 | | Number of records read = 32 |
Line 147: |
Line 172: |
| </pre> | | </pre> |
| | | |
− | ==== Invalid File with Verbose ====
| + | == Invalid File with Verbose == |
| <code>--printableErrors</code> is specified to keep this example short, rather than printing the details of all the errors. | | <code>--printableErrors</code> is specified to keep this example short, rather than printing the details of all the errors. |
| | | |
| <pre> | | <pre> |
| ./bam validate --in test/testFiles/testInvalid.sam --verbose --printableErrors 5 | | ./bam validate --in test/testFiles/testInvalid.sam --verbose --printableErrors 5 |
− |
| |
− | The following parameters are available. Ones with "[]" are in effect:
| |
− |
| |
− | Input Parameters
| |
− | --in [test/testFiles/testInvalid.sam], --noeof, --maxErrors [-1],
| |
− | --verbose [ON], --printableErrors [5], --disableStatistics
| |
− | SortOrder : --so_flag, --so_coord, --so_query
| |
| | | |
| Record 1 | | Record 1 |
Line 207: |
Line 225: |
| Returning: 7 (INVALID) | | Returning: 7 (INVALID) |
| </pre> | | </pre> |
− |
| |
− |
| |
− | == Libraries ==
| |
− | *[[C++ Library: libbam|libbam.a]]
| |
− | *[[C++ Library: libcsg|libcsg.a]]
| |