
From Genome Analysis Wiki
4,583 bytes added ,  13:41, 4 May 2012
[[Category:BamUtil|gapInfo]]
[[Category:BAM Software]]
[[Category:Software]]

= Overview of the <code>gapInfo</code> function of <code>bamUtil</code> =
The <code>gapInfo</code> option of [[bamUtil]] prints information about the gap between the two reads of each read pair in a SAM/BAM file.

There are two modes: standard and detailed. To run in detailed mode, use the <code>--detailed</code> option.

Standard output prints the number of pairs that have a given gap size.

The gap size is the number of bases between the clipped end of the first read and the clipped start of the second read: <code>mate0BasedClippedStart - 0BasedPositionClippedEnd - 1</code>

The gap size will be negative if the reads overlap.
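
The calculation above can be sketched in Python as follows. The function and argument names are illustrative only and are not part of bamUtil; both positions are assumed to be 0-based and inclusive.

```python
def gap_size(clipped_end_0based, mate_clipped_start_0based):
    """Number of bases strictly between the first read's clipped end
    and the mate's clipped start (both 0-based, inclusive positions)."""
    return mate_clipped_start_0based - clipped_end_0based - 1


# The read ends at 99 and its mate starts at 170: 70 bases lie between them.
print(gap_size(99, 170))  # 70
# The mate starts at 95, before the read ends at 99: the reads overlap.
print(gap_size(99, 95))   # -5
# The mate starts immediately after the read ends: the gap is 0.
print(gap_size(99, 100))  # 0
```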


= Rules =
gapInfo skips any record to which one of the following applies, based on its flag and position fields:
* unmapped
* not paired
* mate is unmapped
* secondary alignment (not primary alignment)
* duplicates
* QC Failure
* mate is on a different chromosome
* chromosome is unknown (-1/*)
* mate starts before this record
* mate starts at the same location as this record and this record is on the reverse strand
* record is on the reverse strand (unless <code>--detailed</code> is specified)

When <code>--refFile</code> is specified and <code>--detailed</code> is not, gaps that contain the reference base 'N' are skipped.
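
As a rough illustration of the record-level rules listed above, the following Python sketch applies them to the fields of a single record. It is not bamUtil's implementation: the function name and arguments are hypothetical, and only the flag bit values are taken from the SAM specification. The reference 'N' check is not included here.

```python
# Standard SAM flag bits (from the SAM specification).
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400


def should_skip(flag, ref_id, pos, mate_ref_id, mate_pos, detailed=False):
    """Return True if the record would be skipped under the listed rules.

    ref_id and mate_ref_id are reference indices (-1 means unknown, '*');
    pos and mate_pos are start positions in the same coordinate system.
    """
    if flag & UNMAPPED or not flag & PAIRED or flag & MATE_UNMAPPED:
        return True
    if flag & (SECONDARY | DUPLICATE | QC_FAIL):
        return True
    if ref_id == -1 or mate_ref_id != ref_id:
        return True
    if mate_pos < pos:
        return True
    reverse = bool(flag & REVERSE)
    if mate_pos == pos and reverse:
        return True
    # Reverse-strand records are kept only when --detailed is given.
    if reverse and not detailed:
        return True
    return False


# Flag 99: paired, forward strand, mate downstream on the same chromosome.
print(should_skip(99, 0, 100, 0, 250))                 # False
# Flag 83: paired, reverse strand; kept only in detailed mode.
print(should_skip(83, 0, 100, 0, 250))                 # True
print(should_skip(83, 0, 100, 0, 250, detailed=True))  # False
```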

= Usage =
./bam gapInfo --in <inputFile> --out <outputFile> [--refFile <refFile>] [--detailed] [--checkFirst] [--checkStrand] [--noeof] [--params]


= Parameters =
<pre>
Required Parameters:
    --in          : the SAM/BAM file to print read pair gap info for
    --out         : the output file to be written
Optional Parameters:
    --refFile     : reference file, used to skip gaps that include reference base 'N' (for runs without --detailed)
    --detailed    : Print the details for each read pair
    Optional Parameters for the Detailed Option:
        --checkFirst  : Check the first in pair flag and print "NotFirst" if it isn't first
        --checkStrand : Check the strand flag and print "Reverse" if it is reverse complemented
    --noeof       : Do not expect an EOF block on a bam file.
    --params      : Print the parameter settings to stderr

</pre>
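
When calling <code>gapInfo</code> from a script, the command line can be assembled from the parameters above. The Python helper below is only a sketch: the function name, its arguments, and the file names are hypothetical, and raising an error when <code>--checkFirst</code> or <code>--checkStrand</code> is given without <code>--detailed</code> is a choice made by this wrapper, not documented bamUtil behavior.

```python
def build_gapinfo_command(in_file, out_file, ref_file=None, detailed=False,
                          check_first=False, check_strand=False,
                          noeof=False, params=False, bam="./bam"):
    """Build the argument list for a gapInfo run from the documented options."""
    # These two options are only applicable together with --detailed.
    if (check_first or check_strand) and not detailed:
        raise ValueError("--checkFirst and --checkStrand require --detailed")
    cmd = [bam, "gapInfo", "--in", in_file, "--out", out_file]
    if ref_file is not None:
        cmd += ["--refFile", ref_file]
    if detailed:
        cmd.append("--detailed")
    if check_first:
        cmd.append("--checkFirst")
    if check_strand:
        cmd.append("--checkStrand")
    if noeof:
        cmd.append("--noeof")
    if params:
        cmd.append("--params")
    return cmd


# Hypothetical file names, for illustration only.
print(build_gapinfo_command("sample.bam", "gaps.txt", detailed=True,
                            check_strand=True))
```

The returned list can be passed to Python's <code>subprocess.run</code> to run the tool.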


{{inBAMInputFile}}

== Output File (<code>--out</code>) ==
Use <code>--out</code> followed by a file name to specify the output file to write.

Standard output contains one two-column, tab-separated line for each gap size found in the SAM/BAM file. The first column contains the gap size and the second column contains the number of pairs that have that gap size. The first line is a header line describing the columns.

Detailed output does not have a header line and is described below under the [[#Print Detailed Per-Pair Information (<code>--detailed</code>) | --detailed]] parameter.

{{RefFile}}

When this option is specified, the counter for a gap size is not incremented if any of the reference bases in the gap is an 'N'. (Not applicable if <code>--detailed</code> is specified.)
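
The 'N' check can be pictured with the short Python sketch below. It is illustrative only: the function name is hypothetical, the reference is assumed to be an in-memory string for one chromosome, only an uppercase 'N' is tested, and overlapping reads (which have no gap bases) are assumed never to be skipped by this check.

```python
def gap_contains_n(reference, clipped_end_0based, mate_clipped_start_0based):
    """Return True if any reference base strictly between the read's clipped
    end and the mate's clipped start (0-based positions) is 'N'."""
    gap = reference[clipped_end_0based + 1:mate_clipped_start_0based]
    return "N" in gap


reference = "ACGTNACGTA"  # toy reference sequence with an 'N' at index 4
print(gap_contains_n(reference, 1, 6))  # True: the gap bases are "GTNA"
print(gap_contains_n(reference, 5, 9))  # False: the gap bases are "CGT"
print(gap_contains_n(reference, 6, 3))  # False: overlapping reads, empty gap
```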

== Print Detailed Per-Pair Information (<code>--detailed</code>) ==
With this option, for every record processed per the above rules, the following information is printed on one line as tab-separated columns:
* Reference/Chromosome Name
* 1-based read end position (clipped)
* gap size
Additional columns are printed if <code>--checkFirst</code> and/or <code>--checkStrand</code> are specified.

Detailed output does not have a header line.

== (<code>--checkFirst</code>) ==
Only applicable if <code>--detailed</code> is also provided.

When specified along with <code>--detailed</code>, the output for each record processed also includes "NotFirst" if it is not marked as FirstFragment in the flags.

== (<code>--checkStrand</code>) ==
Only applicable if <code>--detailed</code> is also provided.

When specified along with <code>--detailed</code>, the output for each record processed also includes "Reverse" if it is marked as the reverse strand in the flags.

{{noeofBGZFParameter}}
{{paramsParameter}}

= Return Value =

Returns -1 if input parameters are invalid.

Returns the SamStatus for the reads/writes (0 on success).


= Output =

All status messages are written to stderr.

The output file contains tab-delimited columns as described above.

== Example Output ==
For standard output:
<pre>
GapSize NumPairs
-23 3
-21 3
-20 4
-5 1
30 1
70 3
</pre>
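
The standard output is easy to post-process. The Python sketch below (the function name is illustrative) reads the example above into a dictionary and derives two summary numbers. It splits on any whitespace so that it works for both tab-separated and space-separated text.

```python
def parse_gap_counts(text):
    """Parse standard gapInfo output into {gap_size: number_of_pairs}."""
    counts = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # the first line is the header
        if not line.strip():
            continue
        gap, num_pairs = line.split()
        counts[int(gap)] = int(num_pairs)
    return counts


example = """GapSize NumPairs
-23 3
-21 3
-20 4
-5 1
30 1
70 3
"""

counts = parse_gap_counts(example)
total_pairs = sum(counts.values())
overlapping = sum(n for gap, n in counts.items() if gap < 0)
print(total_pairs)   # 15
print(overlapping)   # 11
```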

For detailed output with both <code>--checkFirst</code> and <code>--checkStrand</code> specified:
<pre>
1 28 70
1 10028 71 NotFirst Reverse
1 10028 70
1 10028 70
1 10028 30
1 10028 -19 NotFirst Reverse
1 10028 -19 NotFirst Reverse
1 10028 -19 NotFirst Reverse
1 10030 -21
1 10030 -20
1 10030 -20
1 10030 -21
1 10030 -21
1 10030 -20
1 10030 -20
2 32 -18 NotFirst Reverse
4 24 -23
4 27 -23
4 30 -23
4 34 -5
</pre>
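
Detailed output lines can be parsed in a similar way. In the Python sketch below, the function name and dictionary keys are illustrative; the optional "NotFirst" and "Reverse" markers are detected by name, so it does not matter which of the two optional columns is present.

```python
def parse_detailed_line(line):
    """Parse one line of detailed gapInfo output into a dictionary."""
    fields = line.split()
    extras = fields[3:]  # optional --checkFirst / --checkStrand columns
    return {
        "chrom": fields[0],
        "end_pos": int(fields[1]),  # 1-based clipped read end position
        "gap": int(fields[2]),
        "not_first": "NotFirst" in extras,
        "reverse": "Reverse" in extras,
    }


print(parse_detailed_line("1 10028 71 NotFirst Reverse"))
print(parse_detailed_line("4 34 -5"))
```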
