Line 5: |
Line 5: |
| = Overview of the <code>squeeze</code> function of <code>bamUtil</code> = | | = Overview of the <code>squeeze</code> function of <code>bamUtil</code> = |
| The <code>squeeze</code> option on the [[bamUtil]] executable reduces file size by optionally: | | The <code>squeeze</code> option on the [[bamUtil]] executable reduces file size by optionally: |
− | * dropping OQ fields | + | * dropping OQ fields (default, disable using <code>--keepOQ</code>) |
− | * dropping duplicates | + | * dropping duplicates (default, disable using <code>--keepDups</code>) |
− | * dropping specified tags | + | * dropping specified tags (<code>--rmTags "Tag1:Type1;Tag2:Type2"</code>) |
− | * using '=' when a base matches the reference | + | * using '=' when a base matches the reference (<code>--refFile refFileName.fa</code>) |
− | * binning quality scores | + | * binning quality scores (<code>--binQualS</code>/<code>--binQualF</code>) |
− | * replacing readNames with unique integers | + | * replacing readNames with unique integers (<code>--readName</code>/<code>--sReadName</code>) |
| + | |
| + | |
| + | = Usage = |
| + | ./bam squeeze --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--refFile <refFilePath/Name>] [--keepOQ] [--keepDups] [--readName <readNameMapFile.txt>] [--sReadName <readNameMapFile.txt>] [--binQualS <minQualBin2>,<minQualBin3><...>] [--binQualF <filename>] [--rmTags <"Tag:Type[;Tag:Type]*">] [--noeof] [--params] |
| + | |
| | | |
| = Parameters = | | = Parameters = |
Line 18: |
Line 23: |
| --out : the SAM/BAM file to be written | | --out : the SAM/BAM file to be written |
| Optional Parameters: | | Optional Parameters: |
− | --refFile : reference file name used to convert any bases that match the reference to '-' | + | --refFile : reference file name used to convert any bases that match the reference to '=' |
| --keepOQ : keep the OQ tag rather than removing it. Default is to remove it. | | --keepOQ : keep the OQ tag rather than removing it. Default is to remove it. |
| --keepDups : keep duplicates rather than removing records marked duplicate. Default is to remove them. | | --keepDups : keep duplicates rather than removing records marked duplicate. Default is to remove them. |
Line 28: |
Line 33: |
| This version does not require the input file to have been presorted by readname, | | This version does not require the input file to have been presorted by readname, |
| but uses a lot of memory since it stores all the read names. | | but uses a lot of memory since it stores all the read names. |
− | --rmTags : Remove the specified Tags formatted as Tag:Type;Tag:Type;Tag:Type... | + | --rmTags : Remove the specified Tags formatted as "Tag:Type;Tag:Type;Tag:Type"... |
| + | --noeof : do not expect an EOF block on a bam file. |
| + | --params : print the parameter settings |
| Quality Binning Parameters (optional): | | Quality Binning Parameters (optional): |
| Bin qualities by phred score, into the ranges specified by binQualS or binQualF (both cannot be used) | | Bin qualities by phred score, into the ranges specified by binQualS or binQualF (both cannot be used) |
Line 38: |
Line 45: |
| --binMid : Use the mid point of the quality bin range for the quality value of the bin. | | --binMid : Use the mid point of the quality bin range for the quality value of the bin. |
| --binHigh : Use the high end of the quality bin range for the quality value of the bin. | | --binHigh : Use the high end of the quality bin range for the quality value of the bin. |
− | --noeof : do not expect an EOF block on a bam file.
| |
− | --params : print the parameter settings
| |
| </pre> | | </pre> |
| + | {{PhoneHomeParamDesc}} |
| + | |
| + | == Required Parameters == |
| + | {{inBAMInputFile}} |
| + | {{outBAMOutputFile}} |
| + | |
| + | ==Optional Parameters== |
| + | {{refFile}} |
| | | |
| + | === Keep OQ Tag (<code>--keepOQ</code>) === |
| + | Use <code>--keepOQ</code> to keep the OQ tag rather than removing it. By default, the OQ tag is removed. |
| + | |
| + | === Keep Duplicates (<code>--keepDups</code>) === |
| + | Use <code>--keepDups</code> to keep records that are marked as duplicate (in the flag). By default, records marked as duplicate are removed. |
| + | |
| + | === Replace Read Names with Unique Integers (<code>--sReadName</code>, <code>--readName</code>) === |
| + | Use <code>--sReadName</code> or <code>--readName</code> to replace read names with unique integers and write the mapping to the specified file. |
| + | |
| + | <code>--sReadName</code> requires the input file to have been presorted by readname, but no validation is done to ensure proper sorting. If it is not sorted, a readname will get mapped to multiple new values. |
| + | |
| + | <code>--readName</code> does not require the input file to have been presorted by readname, but uses a lot of memory since it stores all the read names in memory. |
| + | |
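The difference between the two read-name options can be illustrated with a small Python sketch. This is not bamUtil's implementation: the function names <code>rename_sorted</code> and <code>rename_unsorted</code> are hypothetical, and numbering the new names from 1 is an assumption. It only shows why a streaming approach gives one original read name several new values when the input is not sorted by readname, while a stored mapping avoids that at the cost of memory.

```python
def rename_sorted(read_names):
    """Streaming idea behind --sReadName (illustrative only).

    Only the previous name is remembered, so a new integer is issued
    whenever the name changes. Starting at 1 is an assumption.
    """
    new_ids = []
    mapping = []  # (new_id, original_name) pairs, as would go to the map file
    prev = None
    next_id = 0
    for name in read_names:
        if name != prev:
            next_id += 1
            mapping.append((next_id, name))
            prev = name
        new_ids.append(next_id)
    return new_ids, mapping


def rename_unsorted(read_names):
    """Stored-mapping idea behind --readName (illustrative only).

    Every distinct name is kept in a dictionary, so input order does not
    matter, but memory grows with the number of distinct read names.
    """
    seen = {}
    new_ids = []
    for name in read_names:
        if name not in seen:
            seen[name] = len(seen) + 1
        new_ids.append(seen[name])
    return new_ids, seen


# Sorted by readname: both approaches agree.
print(rename_sorted(["r1", "r1", "r2", "r2"])[0])
# Not sorted: the streaming approach maps "r1" to two different integers.
print(rename_sorted(["r1", "r2", "r1"])[0])
print(rename_unsorted(["r1", "r2", "r1"])[0])
```

In the unsorted case the streaming version issues a fresh integer for the second occurrence of <code>r1</code>, which is the multiple-mapping behavior described above for <code>--sReadName</code>.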
| + | === Remove Tags (<code>--rmTags</code>) === |
| + | Use <code>--rmTags</code> followed by a list of tags separated by ';' to remove the specified tags. The tags should be formatted as: <code>"Tag:Type"</code>. Note: when using the ';' to specify multiple tags, be sure to put the whole string in quotes - otherwise the ';' will be interpreted as the end of the command. Example: <code>--rmTags "OQ:Z;MD:Z"</code> or <code>--rmTags 'OQ:Z;MD:Z'</code> |
| + | |
| + | {{noeofBGZFParameter}} |
| + | {{paramsParameter}} |
| + | |
| + | ==Optional Quality Binning Parameters== |
| + | Optionally, quality scores can be binned to reduce the number of distinct quality values. |
| + | |
| + | === Quality Score Bins (<code>--binQualS</code>, <code>--binQualF</code>)=== |
| + | Use <code>--binQualS</code> or <code>--binQualF</code> to bin qualities by phred score, into the specified ranges (only one of the two options can be specified). |
| + | |
| + | The ranges are specified as a comma-separated list of the minimum phred score of each bin, for example: <code>1,17,20,30,40,50,70</code> |
| + | |
| + | The first bin always starts at 0, so does not need to be specified. |
| + | |
| + | By default, the bin value is the low end of the range. Use [[#Quality Score Bin Value (--binMid, --binHigh)|<code>--binMid</code> or <code>--binHigh</code>]] to change the value for the bin. |
| + | |
| + | Use <code>--binQualS</code> followed by the comma-separated bin minimum phred scores to specify the ranges on the command line. |
| + | |
| + | Use <code>--binQualF</code> followed by the filename to specify the ranges in a file. |
| + | |
| + | === Quality Score Bin Value (<code>--binMid</code>, <code>--binHigh</code>)=== |
| + | By default the lowest number in a bin is used as the bin's value. |
| + | |
| + | Use <code>--binMid</code> to use the mid point of the quality bin range for the quality value of the bin. |
| + | |
| + | Use <code>--binHigh</code> to use the highest number in the quality bin for the quality value of the bin. |
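The binning rules described in this section can be sketched in Python as follows. This is a minimal illustration, not bamUtil's code: the function name <code>bin_quality</code>, the <code>max_qual</code> upper end used for the last bin (93 here), and rounding the mid point down are all assumptions that the text above does not specify.

```python
from bisect import bisect_right


def bin_quality(qual, bin_starts, mode="low", max_qual=93):
    """Return the representative value of the bin containing `qual`.

    bin_starts: ascending minimum phred scores of bins 2..n, as passed to
                --binQualS (the first bin always starts at 0).
    mode:       "low" (default behavior), "mid" (--binMid), "high" (--binHigh).
    max_qual:   assumed upper end of the last bin (hypothetical value).
    """
    starts = [0] + list(bin_starts)
    idx = bisect_right(starts, qual) - 1  # index of the last start <= qual
    low = starts[idx]
    high = starts[idx + 1] - 1 if idx + 1 < len(starts) else max_qual
    if mode == "low":
        return low
    if mode == "high":
        return high
    if mode == "mid":
        return (low + high) // 2  # rounding down is an assumption
    raise ValueError("mode must be 'low', 'mid', or 'high'")


bins = [1, 17, 20, 30, 40, 50, 70]
print(bin_quality(25, bins))          # low end of the 20..29 bin
print(bin_quality(25, bins, "mid"))   # mid point of the 20..29 bin
print(bin_quality(25, bins, "high"))  # high end of the 20..29 bin
```

With the example ranges, a quality of 25 falls into the bin covering 20 through 29, so the three modes give 20, 24, and 29 under the assumptions stated above.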
| | | |
− | = Usage =
| |
− | ./bam squeeze --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--refFile <refFilePath/Name>] [--keepOQ] [--keepDups] [--readName <readNameMapFile.txt>] [--sReadName <readNameMapFile.txt>] [--binQualS <minQualBin2>,<minQualBin3><...>] [--binQualF <filename>] [--rmTags <Tag:Type[;Tag:Type]*>] [--noeof] [--params]
| |
| | | |
| + | {{PhoneHomeParameters}} |
| | | |
| = Return Value = | | = Return Value = |
− | Returns the SamStatus for the reads/writes. | + | Returns the SamStatus for the reads/writes (0 for success, non-0 for failure). |
| | | |
| | | |