Changes

From Genome Analysis Wiki
3,246 bytes added, 11:58, 26 March 2014
= Overview of the <code>squeeze</code> function of <code>bamUtil</code> =
 
The <code>squeeze</code> option on the [[bamUtil]] executable reduces file size by optionally:
* dropping OQ fields (default, disable using <code>--keepOQ</code>)
* dropping duplicates (default, disable using <code>--keepDups</code>)
* dropping specified tags (<code>--rmTags "Tag1:Type1;Tag2:Type2"</code>)
* using '=' when a base matches the reference (<code>--refFile refFileName.fa</code>)
* binning quality scores (<code>--binQualS</code>/<code>--binQualF</code>)
* replacing read names with unique integers (<code>--readName</code>/<code>--sReadName</code>)
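
The '=' substitution can be illustrated with a minimal Python sketch. It assumes the read lies entirely within the supplied reference sequence and is aligned with a CIGAR made only of matches (for example <code>6M</code>), so read base i lines up with reference base pos + i; a real implementation also has to walk insertions, deletions, and clipping. <code>equals_for_matches</code> is a hypothetical helper, not part of bamUtil.

```python
def equals_for_matches(read_seq: str, ref_seq: str, pos: int) -> str:
    """Replace read bases that match the reference with '='.

    Hypothetical helper: assumes an all-match CIGAR, so read base i is
    aligned to reference base pos + i (pos is 0-based here).
    """
    out = []
    for i, base in enumerate(read_seq):
        ref_base = ref_seq[pos + i]
        # Case-insensitive comparison; a mismatching base is kept as-is.
        out.append("=" if base.upper() == ref_base.upper() else base)
    return "".join(out)


print(equals_for_matches("ACGTTA", "GGACGATA", 2))  # ===T==
```

Only the mismatched base is stored explicitly, so long runs of matching bases become runs of a single repeated character.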
= Usage =
./bam squeeze --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--refFile <refFilePath/Name>] [--keepOQ] [--keepDups] [--readName <readNameMapFile.txt>] [--sReadName <readNameMapFile.txt>] [--binQualS <minQualBin2>,<minQualBin3><...>] [--binQualF <filename>] [--rmTags "<Tag:Type[;Tag:Type]*>"] [--noeof] [--params]
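
The ';' in an <code>--rmTags</code> value only needs quoting because a shell would otherwise treat it as a command separator. A script that builds the argument list itself and runs it without a shell avoids the problem. A minimal Python sketch, assuming <code>./bam</code> is the bamUtil executable as in the usage line; <code>build_squeeze_args</code> is a hypothetical helper that covers only some of the flags listed above.

```python
from typing import List, Optional, Sequence, Tuple


def build_squeeze_args(
    in_file: str,
    out_file: str,
    ref_file: Optional[str] = None,
    keep_oq: bool = False,
    keep_dups: bool = False,
    rm_tags: Sequence[Tuple[str, str]] = (),
    bin_qual_s: Sequence[int] = (),
    bam: str = "./bam",  # assumption: path to the bamUtil executable
) -> List[str]:
    """Assemble an argv list for 'bam squeeze' (hypothetical helper)."""
    args = [bam, "squeeze", "--in", in_file, "--out", out_file]
    if ref_file:
        args += ["--refFile", ref_file]
    if keep_oq:
        args.append("--keepOQ")
    if keep_dups:
        args.append("--keepDups")
    if rm_tags:
        # No shell is involved, so the ';' needs no quoting here.
        args += ["--rmTags", ";".join(f"{tag}:{typ}" for tag, typ in rm_tags)]
    if bin_qual_s:
        args += ["--binQualS", ",".join(str(q) for q in bin_qual_s)]
    return args


print(build_squeeze_args("in.bam", "out.bam", rm_tags=[("OQ", "Z"), ("MD", "Z")]))
```

Passing the resulting list to <code>subprocess.run(args, check=True)</code> runs the command without a shell, so <code>OQ:Z;MD:Z</code> arrives as a single argument.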
= Parameters =
<pre>
...
--out        : the SAM/BAM file to be written
Optional Parameters:
--refFile    : reference file name used to convert any bases that match the reference to '='
--keepOQ    : keep the OQ tag rather than removing it.  Default is to remove it.
--keepDups  : keep duplicates rather than removing records marked duplicate.  Default is to remove them.
...
                   This version does not require the input file to have been presorted by readname,
                   but uses a lot of memory since it stores all the read names.
--rmTags    : Remove the specified Tags formatted as "Tag:Type;Tag:Type;Tag:Type"...
--noeof      : do not expect an EOF block on a bam file.
--params    : print the parameter settings
Quality Binning Parameters (optional):
  Bin qualities by phred score, into the ranges specified by binQualS or binQualF (only one of the two can be used)
...
--binMid    : Use the mid point of the quality bin range for the quality value of the bin.
--binHigh    : Use the high end of the quality bin range for the quality value of the bin.
</pre>
{{PhoneHomeParamDesc}}

== Required Parameters ==
{{inBAMInputFile}}
{{outBAMOutputFile}}

== Optional Parameters ==
{{refFile}}
=== Keep OQ Tag (<code>--keepOQ</code>) ===
Use <code>--keepOQ</code> to keep the OQ tag rather than removing it.  By default, the OQ tag is removed.
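
At the record level, removing the OQ tag means dropping one optional field. A minimal Python sketch on a single SAM text line, where the first 11 tab-separated fields are mandatory and the rest are <code>TAG:TYPE:VALUE</code> optional fields; <code>drop_oq</code> is a hypothetical helper that works on SAM text purely for illustration.

```python
def drop_oq(sam_line: str) -> str:
    """Remove the OQ optional field from one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    mandatory, optional = fields[:11], fields[11:]
    # Optional fields have the form TAG:TYPE:VALUE, e.g. OQ:Z:####
    kept = [f for f in optional if not f.startswith("OQ:")]
    return "\t".join(mandatory + kept)


line = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tOQ:Z:####\tNM:i:0"
print(drop_oq(line).split("\t")[11:])  # ['NM:i:0']
```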
=== Keep Duplicates (<code>--keepDups</code>) ===
Use <code>--keepDups</code> to keep records that are marked as duplicates (bit 0x400 of the FLAG field).  By default, records marked as duplicates are removed.
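
Records are identified as duplicates by bit 0x400 (PCR or optical duplicate) of the SAM FLAG field. A minimal Python sketch of the default filtering over SAM text lines, with header lines passed through unchanged; <code>drop_duplicates</code> is a hypothetical helper.

```python
from typing import Iterable, Iterator

DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def drop_duplicates(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines, skipping alignments whose FLAG has the duplicate bit set."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines are kept as-is
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        if not flag & DUPLICATE_FLAG:
            yield line


lines = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t1024\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",  # 1024 = 0x400
    "r3\t1040\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",  # 1040 = 0x400 | 0x10
]
print([l.split("\t")[0] for l in drop_duplicates(lines)])  # ['@HD', 'r1']
```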
=== Replace Read Names with Unique Integers (<code>--sReadName</code>, <code>--readName</code>) ===
Use <code>--sReadName</code> or <code>--readName</code> to replace read names with unique integers and write the mapping to the specified file.

<code>--sReadName</code> requires the input file to have been presorted by read name, but no validation is done to ensure proper sorting.  If the file is not sorted, a read name will be mapped to multiple new values.
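
Why <code>--sReadName</code> needs name-sorted input shows up in a minimal streaming sketch: only the previous read name is compared, so a name that reappears after a different one receives a new integer. <code>map_sorted_names</code> is hypothetical, numbering from 0 is an assumption, and the sketch does not reproduce the format of bamUtil's mapping file.

```python
from typing import Iterable, List, Optional, Tuple


def map_sorted_names(names: Iterable[str]) -> Tuple[List[int], List[Tuple[str, int]]]:
    """Streaming read-name replacement that assumes name-sorted input.

    Returns the integer name for each record plus the (old, new) mapping
    entries. Only the previous name is needed for the comparison; the mapping
    entries could be written to the mapping file as they are produced.
    """
    new_names: List[int] = []
    mapping: List[Tuple[str, int]] = []
    prev: Optional[str] = None
    next_id = -1
    for name in names:
        if name != prev:
            # A new name, or a repeat that is out of order, gets a new integer.
            next_id += 1
            mapping.append((name, next_id))
            prev = name
        new_names.append(next_id)
    return new_names, mapping
```

With sorted input <code>["a", "a", "b"]</code> the integer names are <code>[0, 0, 1]</code>; with unsorted input <code>["a", "b", "a"]</code> they are <code>[0, 1, 2]</code>, so read <code>a</code> ends up with two values.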

<code>--readName</code> does not require the input file to have been presorted by read name, but it uses a lot of memory because it stores all of the read names.
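
For contrast, a minimal sketch of the <code>--readName</code> approach keeps a dictionary of every distinct name seen, so input order does not matter but memory grows with the number of distinct read names. <code>map_any_order_names</code> is a hypothetical helper, and numbering from 0 is again an assumption.

```python
from typing import Dict, Iterable, List


def map_any_order_names(names: Iterable[str]) -> List[int]:
    """Replace read names with integers without requiring sorted input."""
    ids: Dict[str, int] = {}  # every distinct read name seen so far stays in memory
    out: List[int] = []
    for name in names:
        if name not in ids:
            ids[name] = len(ids)  # next unused integer
        out.append(ids[name])
    return out


print(map_any_order_names(["a", "b", "a"]))  # [0, 1, 0]
```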
=== Remove Tags (<code>--rmTags</code>) ===
Use <code>--rmTags</code> followed by a ';'-separated list of tags to remove the specified tags.  Each tag should be formatted as <code>"Tag:Type"</code>.  Note: when using ';' to specify multiple tags, be sure to put the whole string in quotes; otherwise the shell will interpret the ';' as the end of the command.  Example: <code>--rmTags "OQ:Z;MD:Z"</code> or <code>--rmTags 'OQ:Z;MD:Z'</code>
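
A minimal Python sketch of parsing a <code>Tag:Type</code> list such as <code>OQ:Z;MD:Z</code> and applying it to the optional fields of one SAM text line. Requiring both the tag and the type to match is an assumption about how the list is interpreted; <code>parse_rm_tags</code> and <code>remove_tags</code> are hypothetical helpers.

```python
from typing import Set, Tuple


def parse_rm_tags(spec: str) -> Set[Tuple[str, str]]:
    """Parse 'Tag:Type;Tag:Type' into {(tag, type), ...}."""
    tags: Set[Tuple[str, str]] = set()
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        tag, typ = item.split(":")
        tags.add((tag, typ))
    return tags


def remove_tags(sam_line: str, spec: str) -> str:
    """Drop optional fields (TAG:TYPE:VALUE) whose tag and type are listed in spec."""
    to_remove = parse_rm_tags(spec)
    fields = sam_line.rstrip("\n").split("\t")
    kept = [f for f in fields[11:] if tuple(f.split(":", 2)[:2]) not in to_remove]
    return "\t".join(fields[:11] + kept)
```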

{{noeofBGZFParameter}}
{{paramsParameter}}

== Optional Quality Binning Parameters ==
Optionally, quality scores can be binned to reduce the number of distinct quality values.

=== Quality Score Bins (<code>--binQualS</code>, <code>--binQualF</code>) ===
Use <code>--binQualS</code> or <code>--binQualF</code> to bin qualities by Phred score into the specified ranges (only one of the two options can be specified).

The ranges are specified as a comma-separated list of the minimum Phred score of each bin, for example: <code>1,17,20,30,40,50,70</code>

The first bin always starts at 0, so it does not need to be specified.  With the example list above, the bins are 0, 1-16, 17-19, 20-29, 30-39, 40-49, 50-69, and 70 and higher.
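
Assigning a score to a bin amounts to finding the largest bin minimum that does not exceed it, which Python's <code>bisect</code> module does directly. A minimal sketch using the default (low-end) bin value, working on integer Phred scores and on SAM quality strings, which use Phred+33 ASCII encoding; <code>parse_bin_mins</code>, <code>bin_quality</code>, and <code>bin_qual_string</code> are hypothetical helpers.

```python
import bisect
from typing import List, Sequence


def parse_bin_mins(spec: str) -> List[int]:
    """Parse a list such as '1,17,20,30' and add the implicit first bin at 0."""
    mins = sorted(int(x) for x in spec.split(",") if x.strip())
    return mins if mins and mins[0] == 0 else [0] + mins


def bin_quality(q: int, mins: Sequence[int]) -> int:
    """Low end of the bin containing Phred score q (mins sorted, starting at 0)."""
    return mins[bisect.bisect_right(mins, q) - 1]


def bin_qual_string(qual: str, mins: Sequence[int]) -> str:
    """Bin every character of a SAM QUAL string (Phred+33 ASCII).

    A QUAL of '*' (qualities not stored) is not special-cased in this sketch.
    """
    return "".join(chr(bin_quality(ord(c) - 33, mins) + 33) for c in qual)


mins = parse_bin_mins("1,17,20,30,40,50,70")
print(bin_quality(25, mins), bin_qual_string("I54", mins))  # 20 I52
```

With the example ranges, a score of 25 falls in the 20-29 bin and becomes 20, and the quality string <code>I54</code> (Phred 40, 20, 19) becomes <code>I52</code> (Phred 40, 20, 17).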

By default, the bin value is the low end of the range.  Use [[#Quality Score Bin Value (--binMid, --binHigh)|<code>--binMid</code> or <code>--binHigh</code>]] to change the value for the bin.

Use <code>--binQualS</code> followed by the comma-separated bin minimum Phred scores to specify the ranges on the command line.

Use <code>--binQualF</code> followed by the filename to specify the ranges in a file.

=== Quality Score Bin Value (<code>--binMid</code>, <code>--binHigh</code>) ===
By default, the lowest number in a bin is used as the bin's value.

Use <code>--binMid</code> to use the midpoint of the quality bin range as the quality value of the bin.

Use <code>--binHigh</code> to use the highest number in the quality bin as the quality value of the bin.
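
The three choices of bin value can be sketched as follows. Two details are assumptions not stated on this page: the top of the last bin (taken here as Phred 93, the highest score representable in SAM's Phred+33 encoding) and how the midpoint is rounded (rounded down here). <code>bin_value</code> is a hypothetical helper.

```python
from typing import Sequence

MAX_PHRED = 93  # assumption: upper end of the last bin ('~' in Phred+33)


def bin_value(q: int, mins: Sequence[int], mode: str = "low") -> int:
    """Value of the bin holding Phred score q under the low, mid, or high rule.

    mins: sorted bin minimums starting at 0, e.g. [0, 1, 17, 20, 30, 40, 50, 70].
    """
    idx = max(i for i, m in enumerate(mins) if m <= q)  # last minimum <= q
    low = mins[idx]
    high = mins[idx + 1] - 1 if idx + 1 < len(mins) else MAX_PHRED
    if mode == "low":   # default behavior
        return low
    if mode == "mid":   # like --binMid; rounding down is an assumption
        return (low + high) // 2
    if mode == "high":  # like --binHigh
        return high
    raise ValueError(f"unknown mode: {mode}")


mins = [0, 1, 17, 20, 30, 40, 50, 70]
print([bin_value(25, mins, m) for m in ("low", "mid", "high")])  # [20, 24, 29]
```

With the example ranges, a score of 25 lies in the 20-29 bin, giving 20 by default, 24 with the midpoint rule, and 29 with the high end.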
{{PhoneHomeParameters}}
    
= Return Value =
 
Returns the SamStatus for the reads/writes (0 for success, non-0 for failure).
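
Because a zero exit status means success, a pipeline script can check squeeze like any other command. A minimal Python sketch; <code>run_squeeze</code> is a hypothetical wrapper that takes the full argument list, including the path to the executable.

```python
import subprocess
from typing import Sequence


def run_squeeze(args: Sequence[str]) -> None:
    """Run a squeeze command and raise if its exit status is non-zero."""
    result = subprocess.run(list(args))
    if result.returncode != 0:
        raise RuntimeError(f"squeeze failed with exit status {result.returncode}")


# Example (assumes ./bam is the bamUtil executable):
# run_squeeze(["./bam", "squeeze", "--in", "in.bam", "--out", "out.bam"])
```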
     
