Changes

From Genome Analysis Wiki
795 bytes added ,  14:05, 6 January 2014
no edit summary
Line 1: Line 1: −
[[Category:Software|BamValidator]]
+
[[Category:BamUtil|validate]]
== Status  ==
+
[[Category:BAM Software]]
 +
[[Category:Software]]
 +
 
 +
= Status  =
    
The initial version of a SAM/BAM Validator is complete, but does not yet validate all fields or produce all desired statistics.  Future releases will add more validation and more statistics.
   −
== Download ==
+
= Download =
Click the link to download the tar of the source code: [[Media:bam.0.0.2.tgz|bam.0.0.2.tgz]]
+
http://genome.sph.umich.edu/wiki/BamUtil
 +
After compiling, the BAM Validator is found in bamUtil/bin/bam and is the "validate" subprogram (bamUtil/bin/bam validate).  
   −
If you use this software, please e-mail me, Mary Kate Trost, at mktrost@umich.edu
+
= Purpose =
 
  −
This version is recommended for Unix users with access to the GNU C++ compiler.
  −
 
  −
To install the BAM Library and the BAM Validator, unpack the downloaded file (tar xvf) and type make. The BAM Validator is found in pipeline/bam and is called bam (pipeline/bam/bam).
  −
 
  −
== Purpose ==
      
The BamValidator processes the specified SAM/BAM file:
Line 22: Line 20:       −
=== Valid SAM/BAM File Requirements ===
+
== Valid SAM/BAM File Requirements ==
    
A valid SAM/BAM file meets the validation criteria specified in [[SAM Validation Criteria]].
   −
=== Statistic Generation ===
+
== Statistic Generation ==
[[C++ Class: SamFile#Statistic Generation]]
+
 
 +
Statistics are generated by the BAM Validator unless the <code>--disableStatistics</code> option is set.  A description of the generated statistics is found at: [[C++ Class: SamFile#Statistic Generation|Sam File Statistics]]
 +
 
 +
= Usage =
 +
 
 +
./bam validate --in <inputFile> [--noeof] [--so_flag|--so_coord|--so_query] [--maxErrors <numErrors>] [--verbose] [--printableErrors <numReportedErrors>] [--disableStatistics] [--params]
 +
 
 +
== Recommended Usage ==
 +
If you don't want the file statistics, use --disableStatistics.
 +
 
 +
If you want to validate that the file is sorted, use the appropriate sorting flag. If you trust the @HD SO flag, use <code>--so_flag</code>; otherwise, if you want to check that it is sorted by coordinate, use <code>--so_coord</code>.
 +
 
 +
If you want to see the error details, use --verbose, but if you want to limit the number of errors displayed, use --printableErrors.
 +
 
 +
If you just want to know whether the file is validly formatted, use <code>--maxErrors 1</code>.
 +
 
 +
The following will give the most information (without validating that the file is sorted):
 +
./bam validate --in <inputFile> --verbose
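The recommendations above can be wrapped in two small bash helpers. This is only a sketch: the function names and the <code>BAM_BIN</code> variable are assumptions made for illustration and are not part of bamUtil; only the options themselves come from this page.

```shell
#!/usr/bin/env bash
# Path to the bamUtil executable. BAM_BIN is a name made up for this sketch.
BAM_BIN="${BAM_BIN:-./bam}"

# quick_check FILE
# Quits after the first record with an error and skips statistic generation.
# The function's exit status is the validator's exit status.
quick_check() {
    "$BAM_BIN" validate --in "$1" --maxErrors 1 --disableStatistics
}

# full_report FILE
# Prints the error details, without validating the sort order.
full_report() {
    "$BAM_BIN" validate --in "$1" --verbose
}
```

For example, <code>quick_check myFile.bam && echo "valid"</code> prints "valid" only when the validator exits with 0.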
   −
== How to Use the Bam Validator Executable ==
+
= Parameters =
=== Parameters ===
   
<pre>
 
<pre>
 
Required Parameters:
 
Required Parameters:
Line 36: Line 50:  
Optional Parameters:
 
Optional Parameters:
 
--noeof            : do not expect an EOF block on a bam file.
 
--noeof            : do not expect an EOF block on a bam file.
 +
--refFile          : the reference file
 
--so_flag          : validate the file is sorted based on the header's @HD SO flag.
 
--so_flag          : validate the file is sorted based on the header's @HD SO flag.
 
--so_coord          : validate the file is sorted based on the coordinate.
 
--so_coord          : validate the file is sorted based on the coordinate.
Line 46: Line 61:  
                      before suppressing them when in verbose (defaults to 100)
 
                      before suppressing them when in verbose (defaults to 100)
 
--disableStatistics : Turn off statistic generation
 
--disableStatistics : Turn off statistic generation
 +
--params            : Print the parameter settings
 
</pre>
 
</pre>
 +
{{PhoneHomeParamDesc}}
 +
 +
== Required Parameters ==
 +
{{inBAMInputFile|hdr======}}
 +
 +
== Optional Parameters ==
 +
{{noeofBGZFParameter}}
 +
{{refFile}}
 +
 +
=== Validate Sort Order (<code>--so_flag</code>, <code>--so_coord</code>,<code>--so_query</code>)===
 +
Validate the sort order of the file:
 +
* <code>--so_flag</code> - based on the flag in the header
 +
* <code>--so_coord</code> - based on the coordinates/positions
 +
* <code>--so_query</code> - based on the query/read names
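When the expected sort order comes from a script or configuration value, a small lookup can translate it into one of the three options listed above. This is a sketch only: the function name <code>so_option</code> and the labels <code>header</code>, <code>coordinate</code> and <code>queryname</code> are chosen for illustration and are not defined by bamUtil.

```shell
# so_option ORDER
# Prints the validator option matching a sort-order label.
# The labels are hypothetical names used only in this sketch.
so_option() {
    case "$1" in
        header)     echo "--so_flag" ;;   # trust the @HD SO flag in the header
        coordinate) echo "--so_coord" ;;  # check coordinates/positions
        queryname)  echo "--so_query" ;;  # check query/read names
        *)          echo "unknown sort order: $1" >&2; return 1 ;;
    esac
}
```

It could then be used as <code>./bam validate --in <inputFile> "$(so_option coordinate)"</code>.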
 +
 +
=== Maximum Number of Errors (<code>--maxErrors</code>)===
 +
Use <code>--maxErrors</code> followed by a number to specify the maximum number of records with errors/invalids to process before quitting.
 +
 +
-1 (the default) means do not quit until the entire file has been validated.
 +
 +
0 indicates not to read/validate anything.
 +
 +
=== Print Specific Errors (<code>--verbose</code>)===
 +
Use <code>--verbose</code> to print specific error details rather than just a summary.
   −
=== Usage ===
+
=== Maximum Number of Record Error Details to Print (<code>--printableErrors</code>)===
 +
Use <code>--printableErrors</code> followed by a number to specify the maximum number of records with errors to print the details of before suppressing them.  This parameter is only valid when [[#Print Specific Errors (--verbose)|<code>--verbose</code>]] is also specified.
   −
./bam validate --in <inputFile> [--noeof] [--so_flag|--so_coord|--so_query] [--maxErrors <numErrors>] [--verbose] [--printableErrors <numReportedErrors>] [--disableStatistics]
+
The default is 100.
   −
==== Recommended Usage ====
+
=== Disable Statistic Generation (<code>--disableStatistics</code>)===
If you don't want the file statistics, use --disableStatistics.
+
Use <code>--disableStatistics</code> to turn off statistic generation (statistics are generated by default).
   −
If you want to validate that the file is sorted, use the appropriate sorting flag. If you trust the @HD SO flag, use <code>--so_flag</code>; otherwise, if you want to check that it is sorted by coordinate, use <code>--so_coord</code>.
+
{{paramsParameter}}
   −
If you want to see the error details, use --verbose, but if you want to limit the number of errors displayed, use --printableErrors.
+
{{PhoneHomeParameters}}
   −
If you just want to know if the file is validly formatted or not, use --maxErrors 1
+
= Output =
 +
The error details (<code>--verbose</code>) and the statistics are printed to stderr.  To save them to a file, redirect stderr.
   −
The following will give the most information (without validating that the file is sorted):
+
In a bash shell, redirect stderr to a file with:
  ./bam validate --in <inputFile> --verbose
+
  ./bam validate --in <inputFile> --verbose 2> outputFile.txt
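The same redirection can be put in a reusable bash function that also passes the validator's exit status through to the caller. This is a sketch: the function name <code>validate_to_file</code> and the <code>BAM_BIN</code> variable are assumptions made for illustration.

```shell
# Path to the bamUtil executable. BAM_BIN is a name made up for this sketch.
BAM_BIN="${BAM_BIN:-./bam}"

# validate_to_file INPUT REPORT
# Sends stderr (the error details and the statistics) to REPORT.
# Only stderr is redirected; stdout is left untouched.
# The function's exit status is the validator's exit status.
validate_to_file() {
    "$BAM_BIN" validate --in "$1" --verbose 2> "$2"
}
```

For example, <code>validate_to_file myFile.bam outputFile.txt</code> leaves the report in outputFile.txt.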
      −
=== Return Value ===
+
= Return Value =
 
*    0: all records are successfully read, are valid, and are properly sorted.
 
* non-0: at least one record was not successfully read, not valid, or not properly sorted.
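Because only 0 means success, a calling script can branch on the exit status directly. The following bash sketch does that; the function name <code>check_and_report</code> and the <code>BAM_BIN</code> variable are assumptions made for illustration, and the meaning of individual non-0 values is not interpreted here.

```shell
# Path to the bamUtil executable. BAM_BIN is a name made up for this sketch.
BAM_BIN="${BAM_BIN:-./bam}"

# check_and_report FILE
# Prints a one-line verdict and returns the validator's exit status.
check_and_report() {
    local rc
    if "$BAM_BIN" validate --in "$1" --disableStatistics > /dev/null 2>&1; then
        echo "$1: OK"
    else
        rc=$?   # exit status of the validator command tested by the if
        echo "$1: FAILED (exit status $rc)"
        return "$rc"
    fi
}
```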
   −
=== Example Outputs ===
+
= Example Outputs =
   −
==== Valid File ====
+
== Valid File ==
 
<pre>
 
<pre>
 
./bam validate --in ~/data/bamExample/37mer_alt.bwa.bam
 
./bam validate --in ~/data/bamExample/37mer_alt.bwa.bam
   −
The following parameters are available.  Ones with "[]" are in effect:
  −
  −
Input Parameters
  −
--in [/home/mktrost/data/bamExample/37mer_alt.bwa.bam], --noeof,
  −
              --maxErrors [-1], --verbose, --printableErrors [100],
  −
              --disableStatistics
  −
  SortOrder : --so_flag, --so_coord, --so_query
  −
  −
'
   
Number of records read = 18900000
 
Number of records read = 18900000
 
Number of valid records = 18900000
 
Number of valid records = 18900000
Line 105: Line 138:  
</pre>
 
</pre>
   −
==== Invalid File ====
+
== Invalid File ==
 
<pre>
 
<pre>
 
./bam validate --in test/testFiles/testInvalid.sam  
 
./bam validate --in test/testFiles/testInvalid.sam  
  −
The following parameters are available.  Ones with "[]" are in effect:
  −
  −
Input Parameters
  −
--in [test/testFiles/testInvalid.sam], --noeof, --maxErrors [-1], --verbose,
  −
              --printableErrors [100], --disableStatistics
  −
  SortOrder : --so_flag, --so_coord, --so_query
  −
      
Number of records read = 32
 
Number of records read = 32
Line 147: Line 172:  
</pre>
 
</pre>
   −
==== Invalid File with Verbose ====  
+
== Invalid File with Verbose ==  
 
<code>--printableErrors</code> is specified here to keep the example short; printing every error would take up more space.
    
<pre>
 
<pre>
 
./bam validate --in test/testFiles/testInvalid.sam --verbose --printableErrors 5
 
./bam validate --in test/testFiles/testInvalid.sam --verbose --printableErrors 5
  −
The following parameters are available.  Ones with "[]" are in effect:
  −
  −
Input Parameters
  −
--in [test/testFiles/testInvalid.sam], --noeof, --maxErrors [-1],
  −
              --verbose [ON], --printableErrors [5], --disableStatistics
  −
  SortOrder : --so_flag, --so_coord, --so_query
      
Record 1
 
Record 1
Line 207: Line 225:  
Returning: 7 (INVALID)
 
Returning: 7 (INVALID)
 
</pre>
 
</pre>
  −
  −
== Libraries ==
  −
*[[C++ Library: libbam|libbam.a]]
  −
*[[C++ Library: libcsg|libcsg.a]]
 
