Changes

From Genome Analysis Wiki
4,856 bytes added, 17:00, 6 January 2014
Line 11: Line 11:     
By default, just the chromosome/position and cigar are compared for each record.
 
 +
 +
Note: The headers are not compared.
    
Options are available to compare:
 
Line 24: Line 26:  
* turn off position comparison
 
 
* turn off cigar comparison
 
 +
 +
 +
= Usage =
 +
./bam diff --in1 <inputFile> --in2 <inputFile> [--out <outputFile>] [--all] [--flag] [--mapQual] [--mate] [--isize] [--seq] [--baseQual] [--tags <Tag:Type[;Tag:Type]*>] [--everyTag] [--noCigar] [--noPos] [--onlyDiffs] [--recPoolSize <int>] [--posDiff <int>] [--noeof] [--params]
 +
    
= Parameters =
 
Line 49: Line 56:  
--onlyDiffs  : only print the fields that are different, otherwise for any diff all the fields that are compared are printed.
 
 
--recPoolSize : number of records to allow to be stored at a time, default value: 1000000
 
--posDiff    : max base pair difference between possibly matching records, default value: 100000
+
                Set to -1 for unlimited number of records
 +
--posDiff    : max base pair difference between possibly matching records, default value: 100000
 
--noeof      : do not expect an EOF block on a bam file.
 
 
--params      : print the parameter settings
 
 
</pre>
 
</pre>
 +
{{PhoneHomeParamDesc}}
 +
 +
== Required Parameters ==
 +
 +
=== input Files 1 & 2 (<code>--in1</code> and <code>--in2</code>)  ===
   −
= Usage =
+
Use <code>--in1</code> and <code>--in2</code> followed by your file names to specify the SAM/BAM input files to compare.  They are both required.
./bam diff --in1 <inputFile> --in2 <inputFile> [--out <outputFile>] [--all] [--flag] [--mapQual] [--mate] [--isize] [--seq] [--baseQual] [--tags <Tag:Type[;Tag:Type]*>] [--everyTag] [--noCigar] [--noPos] [--onlyDiffs] [--recPoolSize <int>] [--posDiff <int>] [--noeof] [--params]
+
 
 +
The program automatically determines if your input files are SAM/BAM/uncompressed BAM unless your input file is stdin.
 +
 
 +
Use <code>-</code> to read from stdin; the extension that follows it determines the file type (no extension indicates SAM).
 +
 
 +
{|border="1" cellspacing="0" cellpadding="2"
 +
|SAM/BAM/Uncompressed BAM from file
 +
| <code>--in1 yourFileName</code>
 +
|-
 +
|SAM from stdin
 +
| <code>--in1 -</code>
 +
|-
 +
|BAM from stdin
 +
| <code>--in1 -.bam</code>
 +
|-
 +
|Uncompressed BAM from stdin
 +
| <code>--in1 -.ubam</code>
 +
|}
 +
 
 +
 
 +
Note: Uncompressed BAM is compressed using compression level-0 (so it is not an entirely uncompressed file).  This matches the <code>samtools</code> implementation so pipes between our tools and <code>samtools</code> are supported.
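
The stdin naming convention in the table above can be summarized as a small lookup. The following is only an illustrative Python sketch, not part of the tool: the function name <code>describe_input</code> and the returned labels are hypothetical, and only the three stdin forms listed in the table are recognized.

```python
# The three stdin forms listed in the table; anything else is treated as a file name.
STDIN_FORMS = {
    "-": "SAM",
    "-.bam": "BAM",
    "-.ubam": "uncompressed BAM",
}


def describe_input(arg):
    """Return (source, file_type) for a value passed to --in1 or --in2.

    For a regular file name the program inspects the file itself, so this
    sketch only reports that the type is auto-detected.
    """
    if arg in STDIN_FORMS:
        return ("stdin", STDIN_FORMS[arg])
    return ("file", "auto-detected")


if __name__ == "__main__":
    for value in ("yourFileName", "-", "-.bam", "-.ubam"):
        print(value, describe_input(value))
```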
 +
 
 +
== Optional Parameters ==
 +
=== output File (<code>--out</code>)  ===
 +
Use <code>--out</code> (optional) to specify the name of the output file.
 +
 
 +
The output is written in [[Diff Format]] by default.  Specify the filename with a .bam, .sam, or .ubam extension to output in [[SAM/BAM Format]].
 +
 
 +
=== Fields to Diff (<code>--all</code>, <code>--flag</code>, <code>--mapQual</code>, <code>--mate</code>, <code>--isize</code>, <code>--seq</code>, <code>--baseQual</code>, <code>--tags</code>, <code>--everyTag</code>, <code>--noCigar</code>, <code>--noPos</code>) ===
 +
 
 +
By default only the chromosome/position and cigar are compared for each record.
 +
 
 +
SAM/BAM Record fields:
 +
{|border="1" cellspacing="0" cellpadding="2"
 +
! Field Name !! Flag to Enable !! Flag to Disable
 +
|-
 +
|Read Name ||colspan="2"|used to match records between files
 +
|-
 +
|Flag Fragment bit || colspan="2"|used to match records between files
 +
|-
 +
| Flag other bits || --flag ||
 +
|-
 +
| Reference (chrom) Name
 +
| rowspan="2" |  ''(on by default)'' || rowspan="2" |--noPos
 +
|-
 +
| Position
 +
|-
 +
|Mapping Quality || --mapQual ||
 +
|-
 +
| Cigar || ''(on by default)'' || --noCigar
 +
|-
 +
| Mate Reference (chrom) || rowspan="2" | --mate ||
 +
|-
 +
| Mate Position ||
 +
|-
 +
| Insert Size || --isize ||
 +
|-
 +
| Sequence || --seq ||
 +
|-
 +
| Quality || --baseQual ||
 +
|-
 +
|}
 +
 
 +
To diff all tags, use <code>--everyTag</code>.  To diff only certain tags, use <code>--tags Tag1:Type1;Tag2:Type2;Tag3:Type3</code>, specifying a semicolon-separated list of tag/type pairs (each pair separated by a colon).
 +
 
 +
'''OR''' use <code>--all</code> to diff all SAM/BAM record fields.
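
The <code>Tag:Type[;Tag:Type]*</code> list format accepted by <code>--tags</code> can be illustrated with a short Python sketch. This is not the program's own parser: the helper names <code>parse_tag_list</code> and <code>format_tag_list</code> are hypothetical, and <code>NM:i</code> and <code>MD:Z</code> are used only as example tag/type pairs.

```python
def parse_tag_list(spec):
    """Split a 'Tag1:Type1;Tag2:Type2' string into a list of (tag, type) pairs."""
    pairs = []
    for item in spec.split(";"):
        tag, sep, tag_type = item.partition(":")
        # Every item needs a tag, a colon, and a type.
        if not sep or not tag or not tag_type:
            raise ValueError(f"expected Tag:Type, got {item!r}")
        pairs.append((tag, tag_type))
    return pairs


def format_tag_list(pairs):
    """Build the semicolon-separated string from (tag, type) pairs."""
    return ";".join(f"{tag}:{tag_type}" for tag, tag_type in pairs)


if __name__ == "__main__":
    print(parse_tag_list("NM:i;MD:Z"))
    print(format_tag_list([("NM", "i"), ("MD", "Z")]))
```

When the list is typed on a shell command line, the value usually needs to be quoted, because most shells treat an unquoted <code>;</code> as a command separator.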
 +
 
 +
=== Only print different fields (<code>--onlyDiffs</code>)===
 +
 
 +
Specify <code>--onlyDiffs</code> to print only the fields that differ; otherwise, all of the compared fields are printed for any record that has a difference.  The read name is always printed.
 +
 
 +
=== Maximum Number of Records That Can be Allocated (<code>--recPoolSize</code>)===
 +
When comparing the files, matching reads may not have the same positions and thus may not be at the same location in each file.  In this case, a read must be stored until its match is found in the other file.
 +
 
 +
<code>--recPoolSize</code> is used to specify the number of records allowed to be allocated at one time by the program.  Set it to -1 to allow unlimited records.  Note: If the number of allocated records is large, it will use up a large amount of memory.
 +
 
 +
The default pool size is 1000000.
 +
 
 +
Records are released when the match is found in the other file or when the opposite file is [[Maximum Base Pair Difference Between Possibly Matching Records (<code>--posDiff</code>)|--posDiff]] number of positions past the position in the record.
 +
 
 +
When the pool size is exceeded, the oldest stored record from whichever file has more records stored is released and treated as unique to that file.  If the matching record is later found in the other file, it is also treated as unique to its file.  At the end of the run, a warning message reports the number of times the pool size was hit and records were forced to be released.
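
The forced-release rule described above can be sketched in Python. This is a simplified model under stated assumptions, not the program's actual implementation: the class <code>RecordPool</code> and its methods are hypothetical, records are matched on read name only (the tool also uses the fragment bit of the flag), ties between the two files are broken in favor of the first file, and the <code>--posDiff</code> release rule is left out.

```python
from collections import OrderedDict


class RecordPool:
    """Toy model of a bounded pool of unmatched records from two files."""

    def __init__(self, pool_size=1000000):
        if pool_size != -1 and pool_size < 1:
            raise ValueError("pool_size must be -1 (unlimited) or at least 1")
        self.pool_size = pool_size
        # Unmatched records per file, kept in arrival order (oldest first).
        self.pending = (OrderedDict(), OrderedDict())
        self.unique = ([], [])      # read names reported as unique to each file
        self.matched = []           # read names found in both files
        self.forced_releases = 0    # how often the pool limit forced a release

    def add(self, file_idx, read_name, record=None):
        """Process one record read from file 0 or file 1."""
        other = self.pending[1 - file_idx]
        if read_name in other:
            # Match found: the stored record is released.
            del other[read_name]
            self.matched.append(read_name)
            return
        stored = len(self.pending[0]) + len(self.pending[1])
        if self.pool_size != -1 and stored >= self.pool_size:
            self._force_release()
        self.pending[file_idx][read_name] = record

    def _force_release(self):
        # Release the oldest record of the file holding more records
        # (assumption: file 0 wins a tie).
        idx = 0 if len(self.pending[0]) >= len(self.pending[1]) else 1
        name, _ = self.pending[idx].popitem(last=False)
        self.unique[idx].append(name)
        self.forced_releases += 1

    def finish(self):
        """At end of input, everything still stored is unique to its file."""
        for idx in (0, 1):
            self.unique[idx].extend(self.pending[idx])
            self.pending[idx].clear()
```

In this model, a record that was force-released can no longer be matched, so its partner in the other file stays unmatched until <code>finish()</code> and is then reported as unique as well.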
 +
 
 +
=== Maximum Base Pair Difference Between Possibly Matching Records (<code>--posDiff</code>)===
 +
To limit the number of records that are held while looking for matching records, a maximum difference in position between the matches is used.  This value defaults to 100000 and can be modified using <code>--posDiff</code>.  Any matching pairs that are further apart than <code>--posDiff</code> are treated as unique to their files.
 +
 
 +
Note: No warning message is printed about <code>--posDiff</code> affecting your output since the software doesn't know if the matching records don't exist or are just further away.
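
The release condition tied to <code>--posDiff</code> can be written as a one-line check. The Python sketch below is an assumption-laden illustration: the function name <code>can_release</code> is hypothetical, both positions are assumed to be on the same chromosome, and the exact boundary behavior (strictly greater than <code>--posDiff</code>) is a guess, since the text does not specify it.

```python
DEFAULT_POS_DIFF = 100000  # default value stated in the parameter description


def can_release(record_pos, other_file_pos, pos_diff=DEFAULT_POS_DIFF):
    """Return True once the other file has advanced more than pos_diff
    bases past the stored record's position (boundary is an assumption)."""
    return other_file_pos - record_pos > pos_diff


if __name__ == "__main__":
    print(can_release(1000, 101000))
    print(can_release(1000, 101001))
```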
 +
 
 +
{{noeofBGZFParameter}}
 +
{{paramsParameter}}
 +
 
 +
{{PhoneHomeParameters}}
    
= Return Value =
 
Line 132: Line 235:  
</pre>
 
</pre>
   −
== SAM/Bam Format ==
+
== SAM/BAM Format ==
 
Use a .sam/.bam extension to output in SAM/BAM format instead of diff format.
 
  
