BamUtil: squeeze
From Genome Analysis Wiki
Revision as of 13:19, 27 September 2011 by Mktrost (Parameters: Fix typo in usage)
Overview of the squeeze function of bamUtil
The squeeze option on the bamUtil executable reduces file size by optionally:
- dropping OQ fields
- dropping duplicates
- dropping specified tags
- using '=' when a base matches the reference
- binning quality scores
- replacing readNames with unique integers
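The Parameters section describes two ways of replacing read names: --sReadName, which relies on the input being sorted by read name, and --readName, which stores every read name in memory. The sketch below is not bamUtil's implementation; it only illustrates the trade-off between the two approaches. The function names and the choice to start numbering at 1 are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def rename_sorted(read_names: Iterable[str]) -> Tuple[List[int], List[Tuple[str, int]]]:
    """Streaming variant: assumes identical names are adjacent (sorted input).

    Only the previous name is remembered, so memory use is constant.
    """
    new_ids: List[int] = []
    mapping: List[Tuple[str, int]] = []
    previous = None
    next_id = 0
    for name in read_names:
        if name != previous:
            # A new run of names starts, so hand out the next integer.
            next_id += 1
            mapping.append((name, next_id))
            previous = name
        new_ids.append(next_id)
    return new_ids, mapping


def rename_unsorted(read_names: Iterable[str]) -> Tuple[List[int], List[Tuple[str, int]]]:
    """Dictionary variant: works for any order but keeps every name in memory."""
    seen: Dict[str, int] = {}
    new_ids: List[int] = []
    for name in read_names:
        if name not in seen:
            seen[name] = len(seen) + 1
        new_ids.append(seen[name])
    return new_ids, list(seen.items())
```

With the unsorted input ["r1", "r2", "r1"], rename_sorted assigns r1 two different integers, which mirrors the warning that an unsorted file causes a read name to be mapped to multiple new values, while rename_unsorted assigns r1 the same integer both times.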
Parameters
 Required Parameters:
     --in : the SAM/BAM file to be read
     --out : the SAM/BAM file to be written
 Optional Parameters:
     --refFile : reference file name used to convert any bases that match the reference to '='
     --keepOQ : keep the OQ tag rather than removing it. Default is to remove it.
     --keepDups : keep duplicates rather than removing records marked duplicate. Default is to remove them.
     --sReadName : Replace read names with unique integers and write the mapping to the specified file.
                   This version requires the input file to have been presorted by readname, but no validation is done to ensure this.
                   If it is not sorted, a readname will get mapped to multiple new values.
     --readName : Replace read names with unique integers and write the mapping to the specified file.
                  This version does not require the input file to have been presorted by readname, but uses a lot of memory since it stores all the read names.
     --rmTags : Remove the specified Tags formatted as Tag:Type;Tag:Type;Tag:Type...
 Quality Binning Parameters (optional):
   Bin qualities by phred score, into the ranges specified by binQualS or binQualF (both cannot be used)
   Ranges are specified by comma separated minimum phred score for the bin, example: 1,17,20,30,40,50,70
   The first bin always starts at 0, so does not need to be specified.
   By default, the bin value is the low end of the range.
     --binQualS : Bin the Qualities as specified (phred): minQualOfBin2, minQualofBin3...
     --binQualF : Bin the Qualities based on the specified file
     --binMid : Use the mid point of the quality bin range for the quality value of the bin.
     --binHigh : Use the high end of the quality bin range for the quality value of the bin.
     --noeof : do not expect an EOF block on a bam file.
     --params : print the parameter settings
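The quality binning rules can be illustrated with a short sketch. This is not bamUtil's code. It assumes that the upper end of the last bin is 93 (the highest Phred value representable in the SAM QUAL encoding) and that the mid point is rounded down; neither detail is stated on this page. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import List

MAX_PHRED = 93  # assumption: upper end of the last bin (max Phred in Phred+33 QUAL)


def parse_bins(spec: str) -> List[int]:
    """Turn '1,17,20' into sorted bin lower bounds; the first bin always starts at 0."""
    return sorted({0, *(int(x) for x in spec.split(","))})


def bin_quality(q: int, lows: List[int], mode: str = "low") -> int:
    """Map one Phred score to the representative value of its bin."""
    if q < 0:
        raise ValueError("Phred scores cannot be negative")
    i = bisect_right(lows, q) - 1           # index of the bin containing q
    low = lows[i]
    high = lows[i + 1] - 1 if i + 1 < len(lows) else MAX_PHRED
    if mode == "low":                       # default behaviour described above
        return low
    if mode == "mid":                       # corresponds to --binMid (rounding assumed)
        return (low + high) // 2
    if mode == "high":                      # corresponds to --binHigh
        return high
    raise ValueError(f"unknown mode: {mode}")


def bin_qual_string(qual: str, lows: List[int], mode: str = "low") -> str:
    """Apply the binning to a Phred+33 encoded QUAL string."""
    return "".join(chr(bin_quality(ord(c) - 33, lows, mode) + 33) for c in qual)
```

For the example ranges 1,17,20,30,40,50,70, a quality of 25 falls in the bin covering 20 to 29, so it becomes 20 by default, 24 with the mid point, and 29 with the high end.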
Usage
./bam squeeze --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--refFile <refFilePath/Name>] [--keepOQ] [--keepDups] [--readName <readNameMapFile.txt>] [--sReadName <readNameMapFile.txt>] [--binQualS <minQualBin2>,<minQualBin3><...>] [--binQualF <filename>] [--rmTags <Tag:Type[;Tag:Type]*>] [--noeof] [--params]
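When squeeze is driven from a script, the argument list can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of bamUtil; it uses only flags listed in the usage line above, and the file names in the example are placeholders.

```python
from typing import List, Optional, Sequence, Tuple


def squeeze_command(
    in_path: str,
    out_path: str,
    ref_file: Optional[str] = None,
    keep_oq: bool = False,
    keep_dups: bool = False,
    bin_qual_s: Optional[Sequence[int]] = None,
    rm_tags: Optional[Sequence[Tuple[str, str]]] = None,
) -> List[str]:
    """Build the argument list for './bam squeeze' (hypothetical helper)."""
    cmd = ["./bam", "squeeze", "--in", in_path, "--out", out_path]
    if ref_file:
        cmd += ["--refFile", ref_file]
    if keep_oq:
        cmd.append("--keepOQ")
    if keep_dups:
        cmd.append("--keepDups")
    if bin_qual_s:
        # Comma separated minimum Phred score of each bin after the first.
        cmd += ["--binQualS", ",".join(str(q) for q in bin_qual_s)]
    if rm_tags:
        # Tags are formatted as Tag:Type;Tag:Type...
        cmd += ["--rmTags", ";".join(f"{tag}:{typ}" for tag, typ in rm_tags)]
    return cmd


# Example with placeholder file names:
# cmd = squeeze_command("in.bam", "out.bam", bin_qual_s=[1, 17, 20],
#                       rm_tags=[("XT", "A"), ("NM", "i")])
# subprocess.run(cmd, check=True)  # requires 'import subprocess' and a built ./bam
```

Passing the arguments as a list rather than a single shell string means the ';' separators in the --rmTags value do not need shell quoting.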
Return Value
Returns the SamStatus for the reads/writes.
Example Output
Number of records read = 13
Number of records written = 10