Difference between revisions of "BamUtil: mergeBam"

From Genome Analysis Wiki
Line 1: Line 1:
= Overview of the <code>rgMergeBam</code> function of <code>bamUtil</code> =
+
= Overview of the <code>mergeBam</code> function of <code>bamUtil</code> =
The <code>rgMergeBam</code> option on the [[bamUtil]] executable merges multiple BAM files appending ReadGroup IDs.
+
The <code>mergeBam</code> option on the [[bamUtil]] executable merges multiple BAM files appending ReadGroup IDs if necessary.
  
rgMergeBam merges multiple sorted BAM files into one BAM file like 'samtools merge' command, but merges BAM headers.
+
As of version 1.0.7, this program was renamed from rgMergeBam to mergeBam.
* Checks that the HD and SQ tags are identical across the BAM files
+
 
* Adds @RG headers from a tabular input file containing the fields' info
+
mergeBam merges multiple sorted SAM/BAM files into one BAM file, like the 'samtools merge' command, but also merges the BAM headers.
* Adds RG:Z:[RGID] tag for each record based on the source BAM file
+
* Checks that the non RG header fields are identical across the BAM files
* Ensures that the headers are identical across the input files and that input/output BAM records are sorted
+
* Checks that the input SAM/BAM records are sorted
 +
* If --list option is used:
 +
** Ensures that the headers are identical across the input files
 +
** Adds @RG headers from a tabular input file containing the fields' info
 +
** Adds RG:Z:[RGID] tag for each record based on the source BAM file
 +
* If --in is used:
 +
** Merges the RG headers from the files, checking that the RG IDs are unique or, if they are the same, that the rest of the fields are the same
  
  
 
= Usage=
 
= Usage=
 
<pre>
 
<pre>
./bam rgMergeBam [-v] [--log logFile] --list <listFile> --out <outFile>
+
./bam mergeBam [-v] [--log logFile] [--list <listFile>|--in <inputFile> --in <inputFile>] --out <outFile>
 
</pre>
 
</pre>
  
Line 18: Line 24:
 
Required parameters :
 
Required parameters :
 
--out/-o : Output BAM file (sorted)
 
--out/-o : Output BAM file (sorted)
 +
--in/-i  : Input BAM file. Specify this option more than once (one per input file).
 +
            Cannot be used with --list/-l
 
--list/-l : RGAList File. Tab-delimited list consisting of following columns (with headers):
 
--list/-l : RGAList File. Tab-delimited list consisting of following columns (with headers):
 
BAM* : Input BAM file name to be merged
 
BAM* : Input BAM file name to be merged
Line 35: Line 43:
 
</pre>
 
</pre>
  
[[Category:BamUtil|rgMergeBam]]
+
[[Category:BamUtil|mergeBam]]
 
[[Category:BAM Software]]
 
[[Category:BAM Software]]
 
[[Category:Software]]
 
[[Category:Software]]

Revision as of 23:44, 29 January 2013

Overview of the mergeBam function of bamUtil

The mergeBam option on the bamUtil executable merges multiple BAM files appending ReadGroup IDs if necessary.

As of version 1.0.7, this program was renamed from rgMergeBam to mergeBam.

mergeBam merges multiple sorted SAM/BAM files into one BAM file, like the 'samtools merge' command, but also merges the BAM headers.

  • Checks that the non RG header fields are identical across the BAM files
  • Checks that the input SAM/BAM records are sorted
  • If --list option is used:
    • Ensures that the headers are identical across the input files
    • Adds @RG headers from a tabular input file containing the fields' info
    • Adds RG:Z:[RGID] tag for each record based on the source BAM file
  • If --in is used:
    • Merges the RG headers from the files, checking that the RG IDs are unique or, if they are the same, that the rest of the fields are the same
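
The --in rule for read groups can be illustrated with a short Python sketch. This is not bamUtil's implementation: the function name merge_read_groups and the representation of each @RG line as a dictionary are assumptions made only for illustration.

```python
def merge_read_groups(headers):
    """Merge @RG entries from several input files (illustrative only).

    `headers` holds one list per input file; each list contains dicts
    with the fields of one @RG line, e.g. {"ID": "rg1", "SM": "s1"}.
    """
    merged = {}
    for file_index, read_groups in enumerate(headers):
        for rg in read_groups:
            rg_id = rg["ID"]
            if rg_id not in merged:
                # First time this ID is seen: keep it.
                merged[rg_id] = dict(rg)
            elif merged[rg_id] != rg:
                # Same ID, different fields: the inputs cannot be merged.
                raise ValueError(
                    f"read group {rg_id!r} in input {file_index} conflicts "
                    "with an earlier definition"
                )
            # Same ID and identical fields: nothing to add.
    return list(merged.values())
```

A read group that appears in two files with identical fields is kept once; the same ID with different fields is treated as an error.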


Usage

./bam mergeBam [-v] [--log logFile] [--list <listFile>|--in <inputFile> --in <inputFile>] --out <outFile>
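
As a sketch of how the synopsis above might be assembled from a script, the following Python helper builds the argument list and rejects combinations the synopsis does not allow. The helper name build_merge_command and the file names in the usage note are hypothetical; only the flags shown in the synopsis are used.

```python
def build_merge_command(out_file, inputs=None, list_file=None,
                        log_file=None, verbose=False, bam="./bam"):
    """Return the mergeBam argument list (hypothetical helper)."""
    inputs = list(inputs or [])
    # --list and --in are alternatives: exactly one must be given.
    if bool(inputs) == bool(list_file):
        raise ValueError("specify exactly one of --in or --list")
    # --in must appear more than once.
    if inputs and len(inputs) < 2:
        raise ValueError("--in must be given more than once")
    cmd = [bam, "mergeBam"]
    if verbose:
        cmd.append("-v")
    if log_file:
        cmd += ["--log", log_file]
    if list_file:
        cmd += ["--list", list_file]
    for path in inputs:
        cmd += ["--in", path]
    cmd += ["--out", out_file]
    return cmd
```

For example, build_merge_command("merged.bam", inputs=["a.bam", "b.bam"]) yields a list that can be passed to subprocess.run(cmd, check=True).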

Parameters

Required parameters :
--out/-o : Output BAM file (sorted)
--in/-i  : Input BAM file. Specify this option more than once (one per input file).
            Cannot be used with --list/-l
--list/-l : RGAList File. Tab-delimited list consisting of the following columns (with headers):
	BAM* : Input BAM file name to be merged
	ID* : Unique read group identifier
	SM* : Sample name
	LB : Library name
	DS : Description
	PU : Platform unit
	PI : Predicted median insert size
	CN : Name of sequencing center producing the read
	DT : Date the run was produced
	PL : Platform/technology used to produce the read
	* (Required fields)
Optional parameters : 
--log/-L : Log file
--verbose/-v : Turn on verbose mode
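
The RGAList file described under --list/-l could be generated with a short Python sketch like the one below. The function name write_rg_list and the sample values are hypothetical. Optional columns without a value are written as empty fields, which is an assumption: how mergeBam treats blank optional fields is not stated here.

```python
import csv
import io

# Column names as documented for --list/-l; BAM, ID and SM are required.
COLUMNS = ["BAM", "ID", "SM", "LB", "DS", "PU", "PI", "CN", "DT", "PL"]
REQUIRED = ("BAM", "ID", "SM")


def write_rg_list(rows, handle):
    """Write a tab-delimited RGAList with a header line (illustrative only)."""
    seen_ids = set()
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    for row in rows:
        missing = [name for name in REQUIRED if not row.get(name)]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        if row["ID"] in seen_ids:
            # ID is documented as a unique read group identifier.
            raise ValueError(f"duplicate read group ID: {row['ID']}")
        seen_ids.add(row["ID"])
        writer.writerow(row)


# Example with made-up values; write to a real file by passing an open handle.
buffer = io.StringIO()
write_rg_list(
    [{"BAM": "a.bam", "ID": "rg1", "SM": "s1", "PL": "ILLUMINA"}],
    buffer,
)
```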