Difference between revisions of "BamUtil: squeeze"

From Genome Analysis Wiki
(→‎Parameters: Fix typo in usage)
(Update usage and start adding parameters)

Revision as of 17:35, 7 October 2011


Overview of the squeeze function of bamUtil

The squeeze option on the bamUtil executable reduces file size by optionally:

  • dropping OQ fields (default, disable using --keepOQ)
  • dropping duplicates (default, disable using --keepDups)
  • dropping specified tags (--rmTags Tag1:Type1;Tag2:Type2)
  • using '=' when a base matches the reference (--refFile refFileName.fa)
  • binning quality scores (--binQualS/--binQualF)
  • replacing readNames with unique integers (--readName/--sReadName)
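
The difference between the two read-name options (described under Parameters) can be illustrated with a minimal Python sketch. This is not bamUtil's code: the function names rename_any_order and rename_presorted are hypothetical, and numbering from 0 is an assumption. The first keeps every name in memory, as --readName is described as doing; the second remembers only the previous name, which is why unsorted input under --sReadName gives one read name several integers.

```python
def rename_any_order(names):
    """--readName-style sketch: store every name seen, so input order is irrelevant."""
    mapping = {}                      # original name -> integer (grows with the input)
    new_ids = []
    for name in names:
        if name not in mapping:
            mapping[name] = len(mapping)   # assumption: IDs start at 0
        new_ids.append(mapping[name])
    return new_ids, mapping


def rename_presorted(names):
    """--sReadName-style sketch: compare only with the previous name (constant memory)."""
    new_ids = []
    pairs = []                        # (integer, original name) rows for the map file
    prev = None
    next_id = -1
    for name in names:
        if name != prev:              # a new run of identical names starts here
            next_id += 1
            pairs.append((next_id, name))
            prev = name
        new_ids.append(next_id)
    return new_ids, pairs


if __name__ == "__main__":
    unsorted_names = ["r1", "r2", "r1"]
    print(rename_any_order(unsorted_names)[0])   # r1 keeps a single integer
    print(rename_presorted(unsorted_names)[1])   # r1 appears under two integers
```

On input sorted by read name both functions produce the same numbering; they diverge only when the same name recurs after a different one.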


Usage

./bam squeeze --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--refFile <refFilePath/Name>] [--keepOQ] [--keepDups] [--readName <readNameMapFile.txt>] [--sReadName <readNameMapFile.txt>] [--binQualS <minQualBin2>,<minQualBin3><...>] [--binQualF <filename>] [--rmTags <Tag:Type[;Tag:Type]*>] [--noeof] [--params]


Parameters

	Required Parameters:
		--in         : the SAM/BAM file to be read
		--out        : the SAM/BAM file to be written
	Optional Parameters:
		--refFile    : reference file name used to convert any bases that match the reference to '='
		--keepOQ     : keep the OQ tag rather than removing it.  Default is to remove it.
		--keepDups   : keep duplicates rather than removing records marked duplicate.  Default is to remove them.
		--sReadName  : Replace read names with unique integers and write the mapping to the specified file.
                   This version requires the input file to have been presorted by readname, but
                   no validation is done to ensure this.  If it is not sorted, a readname will
                   get mapped to multiple new values.
		--readName   : Replace read names with unique integers and write the mapping to the specified file.
                   This version does not require the input file to have been presorted by readname,
                   but uses a lot of memory since it stores all the read names.
		--rmTags     : Remove the specified Tags formatted as Tag:Type;Tag:Type;Tag:Type...
	Quality Binning Parameters (optional):
	  Bin qualities by phred score, into the ranges specified by binQualS or binQualF (the two cannot be used together)
	  Ranges are specified by comma separated minimum phred score for the bin, example: 1,17,20,30,40,50,70
	  The first bin always starts at 0, so does not need to be specified.
	  By default, the bin value is the low end of the range.
		--binQualS   : Bin the Qualities as specified (phred): minQualOfBin2, minQualofBin3...
		--binQualF   : Bin the Qualities based on the specified file
		--binMid     : Use the mid point of the quality bin range for the quality value of the bin.
		--binHigh    : Use the high end of the quality bin range for the quality value of the bin.
		--noeof      : do not expect an EOF block on a bam file.
		--params     : print the parameter settings
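
The quality binning described above can be sketched in Python as follows. This is an illustration of the documented behaviour, not bamUtil's implementation: the function name bin_quality, the mode argument (standing in for the default, --binMid and --binHigh), the max_qual cap of 93 for the last bin, and the integer rounding of the mid point are all assumptions.

```python
from bisect import bisect_right


def bin_quality(qual, bin_starts, mode="low", max_qual=93):
    """Return the representative phred value of the bin that contains qual.

    bin_starts lists the minimum phred score of bin 2, bin 3, ...
    (the --binQualS format); the first bin implicitly starts at 0.
    """
    if qual < 0:
        raise ValueError("phred scores cannot be negative")
    starts = sorted(set([0, *bin_starts]))
    i = bisect_right(starts, qual) - 1       # index of the bin containing qual
    low = starts[i]
    # Assumption: a bin ends one below the next bin's start; the last bin
    # ends at max_qual (a hypothetical cap, not taken from bamUtil).
    high = starts[i + 1] - 1 if i + 1 < len(starts) else max_qual
    if mode == "low":                        # default: low end of the range
        return low
    if mode == "high":                       # --binHigh
        return high
    return (low + high) // 2                 # --binMid; rounding is an assumption


if __name__ == "__main__":
    bins = [1, 17, 20, 30, 40, 50, 70]       # the example ranges from the help text
    for q in (0, 5, 17, 25, 75):
        print(q, bin_quality(q, bins), bin_quality(q, bins, "mid"), bin_quality(q, bins, "high"))
```

With the example ranges 1,17,20,30,40,50,70, a phred score of 25 falls in the bin that starts at 20, so the default output value in this sketch is 20.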


Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

When reading from a file, the program determines automatically whether the input is SAM, BAM, or uncompressed BAM; only the file name is needed.

To read from stdin, use - as the file name. In that case the extension following the - determines the file type (no extension indicates SAM).

SAM/BAM/Uncompressed BAM from file   --in yourFileName
SAM from stdin                       --in -
BAM from stdin                       --in -.bam
Uncompressed BAM from stdin          --in -.ubam


Note: Uncompressed BAM is written with compression level 0, so it is not an entirely uncompressed file. This matches the samtools implementation, so pipes between our tools and samtools are supported.

Output File (--out)

Use --out followed by your file name to specify the SAM/BAM output file.

The file extension determines whether SAM, BAM, or uncompressed BAM is written. To write to stdout, use - as the file name; the extension following the - determines the file type (no extension indicates SAM).

SAM to file                   --out yourFileName.sam
BAM to file                   --out yourFileName.bam
Uncompressed BAM to file      --out yourFileName.ubam
SAM to stdout                 --out -
BAM to stdout                 --out -.bam
Uncompressed BAM to stdout    --out -.ubam
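
The naming convention in the table above can be expressed as a small Python sketch. It is only an illustration of the rules listed here, not code from bamUtil: the function name parse_out_arg is hypothetical, and treating any unrecognized extension as SAM is an assumption that goes beyond the "no extension" case stated above.

```python
def parse_out_arg(arg):
    """Return (destination, file_type) for a value passed to --out."""
    # "-" alone or "-" followed by an extension means stdout.
    to_stdout = arg == "-" or arg.startswith("-.")
    if arg.endswith(".ubam"):
        file_type = "ubam"            # uncompressed BAM
    elif arg.endswith(".bam"):
        file_type = "bam"
    else:
        file_type = "sam"             # assumption: anything else is treated as SAM
    return ("stdout" if to_stdout else arg), file_type


if __name__ == "__main__":
    for value in ("yourFileName.sam", "yourFileName.bam", "-", "-.bam", "-.ubam"):
        print(value, parse_out_arg(value))
```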


Note: Uncompressed BAM is written with compression level 0, so it is not an entirely uncompressed file. This matches the samtools implementation, so pipes between our tools and samtools are supported.



Return Value

Returns the SamStatus for the reads/writes.


Example Output

Number of records read = 13
Number of records written = 10