BamUtil: convert

From Genome Analysis Wiki
Revision as of 12:35, 2 September 2011 by Mktrost (Move the Sequence Representation Description to be a subsection of Parameters)

Overview of the convert function of bamUtil

The convert option on the bamUtil executable reads a SAM/BAM file and writes it as a SAM/BAM file.

The executable converts the input file into the format of the output file. So if you want to convert a BAM file to a SAM file, just call:

<pathToExe>/bam convert --in <bamFile>.bam --out <newSamFile>.sam

Replace the placeholders with the paths to the executable and to your input and output files.
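As a rough illustration of "the format of the output file", the sketch below maps an output file name to a format. It assumes the format is chosen from the file extension, using only the three extensions named in the usage line (`.sam`, `.bam`, `.ubam`). The helper name `output_format` is hypothetical and is not part of bamUtil. What the tool does with any other file name is not covered here.

```python
import os

# Assumed mapping, limited to the extensions named in the usage line.
_FORMATS = {
    ".sam": "SAM",
    ".bam": "BAM",
    ".ubam": "uncompressed BAM",
}

def output_format(path):
    """Return the format implied by the file extension, or None if it is not listed."""
    extension = os.path.splitext(path)[1]
    return _FORMATS.get(extension)

print(output_format("sample.sam"))   # SAM
print(output_format("sample.ubam"))  # uncompressed BAM
```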


    Required Parameters:
        --in        : the SAM/BAM file to be read
        --out       : the SAM/BAM file to be written
    Optional Parameters:
        --refFile   : reference file name
        --noeof     : do not expect an EOF block on a BAM file
        --params    : print the parameter settings
    Optional Sequence Parameters (only specify one):
        --seqOrig   : leave the sequence as is (default; also used if no reference is specified)
        --seqBases  : convert any '=' in the sequence to the appropriate base using the reference (requires --refFile)
        --seqEquals : convert any bases that match the reference to '=' (requires --refFile)

Sequence Representation Parameters

The sequence parameter options specify how the sequence is represented when a reference is specified (refFile option). If the reference is not specified or seqOrig is specified, the sequence is not modified. If the reference and seqBases are specified, every match between the sequence and the reference is written as the appropriate base. If the reference and seqEquals are specified, every match between the sequence and the reference is written as '='.


Sequence:      AATAA  CTAGA   T AGGG
Reference:       TAACCCTA ACCCT A
Sequence with Orig:   AATAACTAGATAGGG
Sequence with Bases:  AATAACTAGATAGGG
Sequence with Equals: AA======G===GGG

Sequence:      AATGA  CTGGA   T AGGG
Reference:       TAACCCTA ACCCT A
Sequence with Orig:   AATGACTGGATAGGG
Sequence with Bases:  AATGACTGGATAGGG
Sequence with Equals: AA=G===GG===GGG

Sequence:      AAT=A  CT=GA   T AGGG
Reference:       TAACCCTA ACCCT A
Sequence with Orig:   AAT=ACT=GATAGGG
Sequence with Bases:  AATAACTAGATAGGG
Sequence with Equals: AA======G===GGG

Sequence:      AA===  ===G=   = =GGG
Reference:       TAACCCTA ACCCT A
Sequence with Orig:   AA======G===GGG
Sequence with Bases:  AATAACTAGATAGGG
Sequence with Equals: AA======G===GGG
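The conversions in these examples can be sketched in a few lines of Python. This is only an illustration of the rule, not bamUtil's implementation. It assumes the read and the reference are given as column-aligned rows, exactly as drawn above with the `Sequence:` and `Reference:` labels removed, and that a space means "nothing in this column". The function names are hypothetical, and characters are compared exactly (how the tool treats case is not stated here).

```python
from itertools import zip_longest

def _columns(seq_row, ref_row):
    """Yield (read_char, ref_char) for every column that holds a read character."""
    for read_char, ref_char in zip_longest(seq_row, ref_row, fillvalue=" "):
        if read_char != " ":
            yield read_char, ref_char

def to_bases(seq_row, ref_row):
    """Replace each '=' that sits over a reference base with that base."""
    out = []
    for read_char, ref_char in _columns(seq_row, ref_row):
        if read_char == "=" and ref_char != " ":
            out.append(ref_char)
        else:
            out.append(read_char)
    return "".join(out)

def to_equals(seq_row, ref_row):
    """Replace each read base that equals the reference base below it with '='."""
    out = []
    for read_char, ref_char in _columns(seq_row, ref_row):
        if ref_char != " " and read_char == ref_char:
            out.append("=")
        else:
            out.append(read_char)
    return "".join(out)

REF = "  TAACCCTA ACCCT A"  # reference row from the examples, label removed

print(to_bases("AA===  ===G=   = =GGG", REF))   # AATAACTAGATAGGG
print(to_equals("AATGA  CTGGA   T AGGG", REF))  # AA=G===GG===GGG
```

Read characters with no reference base in the same column (the leading `AA`, the inserted `G`, and the trailing `GGG`) are passed through unchanged, which is why they never become '='.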


Usage

./bam convert --in <inputFile> --out <outputFile.sam/bam/ubam (ubam is uncompressed bam)> [--refFile <reference filename>] [--seqBases|--seqEquals|--seqOrig] [--noeof] [--params]
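When the tool is driven from a script, the command line above can be assembled programmatically. The following is a minimal sketch: `build_convert_command` is a hypothetical helper (not part of bamUtil), the file names are placeholders, and only the flags listed on this page are used. Taking a single `seq_mode` argument enforces the "only specify one" rule for the sequence options.

```python
def build_convert_command(exe, in_file, out_file, ref_file=None,
                          seq_mode=None, noeof=False, params=False):
    """Return the argument list for one `bam convert` call (hypothetical helper)."""
    if seq_mode not in (None, "seqOrig", "seqBases", "seqEquals"):
        raise ValueError(f"unknown sequence option: {seq_mode}")
    cmd = [exe, "convert", "--in", in_file, "--out", out_file]
    if ref_file is not None:
        cmd += ["--refFile", ref_file]
    if seq_mode is not None:
        cmd.append("--" + seq_mode)
    if noeof:
        cmd.append("--noeof")
    if params:
        cmd.append("--params")
    return cmd

# Placeholder file names; substitute real paths.
cmd = build_convert_command("./bam", "reads.bam", "reads.sam")
print(" ".join(cmd))  # ./bam convert --in reads.bam --out reads.sam
```

The resulting list can be passed to `subprocess.run(cmd)` to run the conversion.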

Return Value

Returns the SamStatus for the reads/writes.

Example Output

Number of records read = 10
Number of records written = 10
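A script that runs the conversion may want to confirm that every record read was also written. The sketch below parses the two summary lines shown above. `parse_record_counts` is a hypothetical helper, and it assumes the lines have exactly the form shown; which output stream the tool prints them to is not stated here, so the function simply takes the captured text.

```python
import re

_COUNT_LINE = re.compile(r"Number of records (read|written) = (\d+)")

def parse_record_counts(text):
    """Return a dict such as {'read': 10, 'written': 10} from the summary lines."""
    return {kind: int(count) for kind, count in _COUNT_LINE.findall(text)}

summary = "Number of records read = 10\nNumber of records written = 10\n"
counts = parse_record_counts(summary)
print(counts["read"] == counts["written"])  # True
```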