BamUtil: gapInfo
Overview of the gapInfo function of bamUtil
The gapInfo option of bamUtil prints information on the gap between read pairs in a SAM/BAM file.
There are two modes: standard and detailed. To run in detailed mode, use the --detailed option.
Standard output prints the number of pairs that have a given gap size.
The gap size is the number of bases between the clipped end of the first read and the clipped start of the second read: mate0BasedClippedStart - 0BasedPositionClippedEnd - 1.
The gap size will be negative if the reads overlap.
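The formula above can be illustrated with a minimal Python sketch. The function and argument names are hypothetical and are not part of bamUtil; both positions are assumed to be 0-based, with the clipped end being the last aligned base of the first read.

```python
def gap_size(clipped_end_0based: int, mate_clipped_start_0based: int) -> int:
    """Bases strictly between the first read's clipped end and the mate's
    clipped start; the result is negative when the reads overlap."""
    return mate_clipped_start_0based - clipped_end_0based - 1


# First read ends at position 149 and the mate starts at 180:
# positions 150..179 lie in between, so the gap is 30.
print(gap_size(149, 180))  # 30

# The mate starts immediately after the first read: no gap.
print(gap_size(149, 150))  # 0

# The mate starts at 140, inside the first read: positions 140..149 overlap.
print(gap_size(149, 140))  # -10
```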
Rules
gapInfo skips any records that are marked in the flag as:
- unmapped
- not paired
- mate is unmapped
- secondary alignment (not primary alignment)
- supplementary alignment
- duplicates
- QC Failure
- mate is on a different chromosome
- chromosome is unknown (-1/*)
- mate starts before this record
- mate starts at the same location as this record and this record is on the reverse strand
- reverse strand (unless --detailed is specified)
When --refFile is specified and --detailed is not, gaps that contain reference base 'N' are skipped.
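As an illustration of the flag-based rules above, the following Python sketch applies them to the fields of a single record. It is not the bamUtil implementation: the function name and arguments are hypothetical, the flag bit values are those defined by the SAM format specification, and the reference 'N' rule is left out.

```python
# SAM flag bits, as defined in the SAM format specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

# Any one of these bits causes the record to be skipped.
ALWAYS_SKIP = (UNMAPPED | MATE_UNMAPPED | SECONDARY
               | SUPPLEMENTARY | DUPLICATE | QC_FAIL)


def is_skipped(flag, ref_id, pos, mate_ref_id, mate_pos, detailed=False):
    """Return True if a record would be skipped under the rules listed above.

    ref_id / mate_ref_id are reference indexes (-1 means unknown, i.e. '*');
    pos / mate_pos are the start positions of the record and its mate.
    """
    if not flag & PAIRED:
        return True
    if flag & ALWAYS_SKIP:
        return True
    if ref_id == -1 or mate_ref_id != ref_id:
        return True
    if mate_pos < pos:
        return True
    if mate_pos == pos and flag & REVERSE:
        return True
    if flag & REVERSE and not detailed:
        return True
    return False


# Flag 99: paired, properly paired, mate reverse, first in pair.
print(is_skipped(99, 0, 100, 0, 200))
# Flag 1123 is flag 99 plus the duplicate bit.
print(is_skipped(1123, 0, 100, 0, 200))
```

With these inputs the first call reports that the record is kept and the second that it is skipped.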
Usage
./bam gapInfo --in <inputFile> --out <outputFile> [--noeof] [--params]
Parameters
Required Parameters:
    --in : the SAM/BAM file to print read pair gap info for
    --out : the output file to be written
Optional Parameters:
    --refFile : reference file, used to skip gaps that include reference base 'N' (for runs without --detailed)
    --detailed : Print the details for each read pair
    Optional Parameters for the Detailed Option:
        --checkFirst : Check the first in pair flag and print "NotFirst" if it isn't first
        --checkStrand : Check the strand flag and print "Reverse" if it is reverse complemented
    --noeof : Do not expect an EOF block on a bam file.
    --params : Print the parameter settings to stderr
PhoneHome:
    --noPhoneHome : disable PhoneHome (default enabled)
    --phoneHomeThinning : adjust the PhoneHome thinning parameter (default 50)
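When gapInfo is driven from a script, the command line can be assembled from the parameters listed above. The Python helper below is a hypothetical convenience wrapper, not part of bamUtil. It uses only the flags documented here and assumes the executable is reachable as ./bam. Rejecting --checkFirst and --checkStrand without --detailed is a choice made in this sketch; the behavior of the tool itself in that case is not described here.

```python
def build_gapinfo_command(in_file, out_file, ref_file=None, detailed=False,
                          check_first=False, check_strand=False,
                          noeof=False, params=False, bam="./bam"):
    """Build the argument list for a gapInfo run (hypothetical helper)."""
    if (check_first or check_strand) and not detailed:
        # Design choice of this sketch: these options only apply with --detailed.
        raise ValueError("--checkFirst/--checkStrand require --detailed")
    cmd = [bam, "gapInfo", "--in", in_file, "--out", out_file]
    if ref_file is not None:
        cmd += ["--refFile", ref_file]
    if detailed:
        cmd.append("--detailed")
    if check_first:
        cmd.append("--checkFirst")
    if check_strand:
        cmd.append("--checkStrand")
    if noeof:
        cmd.append("--noeof")
    if params:
        cmd.append("--params")
    return cmd


# File names here are placeholders.
print(" ".join(build_gapinfo_command("sample.bam", "gaps.txt",
                                     detailed=True, check_strand=True)))
```

The returned list can be passed to subprocess.run; a non-zero return code indicates a failure.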
Required Parameters
Input File (--in)
Use --in followed by your file name to specify the SAM/BAM input file.
The program automatically determines whether the input file is SAM, BAM, or uncompressed BAM; nothing other than the file name is needed from the user, unless the input is stdin.
Use - to read from stdin. In that case the extension following the - determines the file type (no extension indicates SAM).
SAM/BAM/Uncompressed BAM from file | --in yourFileName |
SAM from stdin | --in - |
BAM from stdin | --in -.bam |
Uncompressed BAM from stdin | --in -.ubam |
Note: Uncompressed BAM is compressed using compression level 0 (so it is not an entirely uncompressed file). This matches the samtools implementation, so pipes between our tools and samtools are supported.
Output File (--out)
Use --out followed by a file name to specify the output file to write.
Standard output contains one tab-separated, two-column line for each gap size found in the SAM/BAM file. The first column contains the gap size and the second column contains the number of pairs that have that gap size. The first line is a header line describing the columns.
Detailed output does not have a header line and is described below under the --detailed parameter.
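The standard output format is straightforward to post-process. The Python sketch below reads it into a dictionary and derives two summary numbers. The function name is hypothetical, and the sample text is a shortened hand-written input in the documented layout (a header line followed by tab-separated gap size and pair count).

```python
import io


def read_gap_histogram(lines):
    """Parse standard gapInfo output into {gap_size: num_pairs}."""
    counts = {}
    rows = iter(lines)
    next(rows, None)  # skip the header line
    for line in rows:
        line = line.rstrip("\n")
        if not line:
            continue
        gap, num_pairs = line.split("\t")
        counts[int(gap)] = int(num_pairs)
    return counts


sample = "GapSize\tNumPairs\n-23\t3\n30\t1\n70\t3\n"
hist = read_gap_histogram(io.StringIO(sample))

total_pairs = sum(hist.values())
overlapping_pairs = sum(n for gap, n in hist.items() if gap < 0)
print(total_pairs, overlapping_pairs)  # 7 3
```

For a real run, pass an open file handle for the --out file instead of the in-memory sample.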
Optional Parameters
Reference File (--refFile)
Use --refFile followed by the reference file name to specify the reference sequence file.
With this option specified, the counter for a gap size is not incremented if any of the reference bases in the gap is an 'N'. (N/A if --detailed is specified.)
Print Detailed Per-Pair Information (--detailed)
With this option, for every record processed per the above rules, the following information is printed on a line as tab-separated columns:
- Reference/Chromosome Name
- 1-based read end position (clipped)
- gap size
Additional columns are printed if --checkFirst and/or --checkStrand are specified.
Detailed output does not have a header line.
See Optional Parameters for --detailed for additional options related to --detailed.
Do not require BGZF EOF block (--noeof)
Use --noeof if you do not expect a trailing EOF block in your BGZF file.
By default, the trailing empty block is expected and checked for.
Print the Program Parameters (--params)
Use --params to print the parameters for your program to stderr.
Optional Parameters for --detailed
Check First (--checkFirst)
Only applicable if --detailed is also provided.
When specified along with --detailed, the output for each record processed also includes "NotFirst" if it is not marked as FirstFragment in the flags.
Check Strand (--checkStrand)
Only applicable if --detailed is also provided.
When specified along with --detailed, the output for each record processed also includes "Reverse" if it is marked as the reverse strand in the flags.
PhoneHome Parameters
See PhoneHome for more information on how PhoneHome works and what it does.
Turn off PhoneHome (--noPhoneHome)
Use the --noPhoneHome option to completely disable PhoneHome. PhoneHome is enabled by default based on the thinning parameter.
Adjust the Frequency of PhoneHome (--phoneHomeThinning)
Use --phoneHomeThinning to modify the percentage of the time that PhoneHome will run (0-100).
- By default, --phoneHomeThinning is set to 50, running 50% of the time.
- PhoneHome will only occur if the run's random number modulo 100 is less than the --phoneHomeThinning value.
- N/A if --noPhoneHome is set.
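The thinning rule can be sketched in Python as follows. This illustrates only the stated condition (random number modulo 100 less than the thinning value); how bamUtil actually draws its random number is not described here, so the random source and the function name are assumptions.

```python
import random


def should_phone_home(thinning=50, no_phone_home=False, rng=random):
    """Decide whether PhoneHome runs, following the rule described above.

    thinning is the --phoneHomeThinning value (0-100); rng is any object
    with a randrange() method (an assumption of this sketch).
    """
    if no_phone_home:
        return False
    return rng.randrange(2**31) % 100 < thinning


rng = random.Random(0)
runs = sum(should_phone_home(50, rng=rng) for _ in range(10000))
# With thinning 50, roughly half of the simulated runs phone home.
print(runs)
```

A thinning value of 0 never triggers PhoneHome and a value of 100 always does, since the remainder modulo 100 is always between 0 and 99.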
Return Value
Returns -1 if input parameters are invalid.
Returns the SamStatus for the reads/writes (0 on success, non-0 on failure).
Output
All status messages are written to stderr.
The output file contains tab-delimited columns as described above.
Example Output
For standard output:
GapSize  NumPairs
-23      3
-21      3
-20      4
-5       1
30       1
70       3
For detailed output with both --checkFirst and --checkStrand specified:
1  28     70
1  10028  71   NotFirst  Reverse
1  10028  70
1  10028  70
1  10028  30
1  10028  -19  NotFirst  Reverse
1  10028  -19  NotFirst  Reverse
1  10028  -19  NotFirst  Reverse
1  10030  -21
1  10030  -20
1  10030  -20
1  10030  -21
1  10030  -21
1  10030  -20
1  10030  -20
2  32     -18  NotFirst  Reverse
4  24     -23
4  27     -23
4  30     -23
4  34     -5
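The two example outputs are consistent with each other: counting the detailed rows by gap size, while leaving out the rows marked "Reverse" (which standard mode skips), reproduces the standard counts. The Python sketch below performs that check. It assumes both examples were produced from the same input and that no --refFile filtering was applied; the function name is hypothetical.

```python
from collections import Counter


def histogram_from_detailed(lines):
    """Count forward-strand detailed rows per gap size (third column)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if "Reverse" in fields[3:]:
            continue  # standard mode skips reverse-strand records
        counts[int(fields[2])] += 1
    return dict(sorted(counts.items()))


detailed_rows = """\
1\t28\t70
1\t10028\t71\tNotFirst\tReverse
1\t10028\t70
1\t10028\t70
1\t10028\t30
1\t10028\t-19\tNotFirst\tReverse
1\t10028\t-19\tNotFirst\tReverse
1\t10028\t-19\tNotFirst\tReverse
1\t10030\t-21
1\t10030\t-20
1\t10030\t-20
1\t10030\t-21
1\t10030\t-21
1\t10030\t-20
1\t10030\t-20
2\t32\t-18\tNotFirst\tReverse
4\t24\t-23
4\t27\t-23
4\t30\t-23
4\t34\t-5
""".splitlines()

hist = histogram_from_detailed(detailed_rows)
for gap, num_pairs in hist.items():
    print(f"{gap}\t{num_pairs}")
```

The printed rows match the standard example: -23 (3), -21 (3), -20 (4), -5 (1), 30 (1), 70 (3).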