Overview of the
indelDiscordance function of
indelDiscordance option on the bamUtil looks at insertion/deletion discordance.
By default it looks only at the non-pseudoautosomal region of the X-Chromosome.
- Only works on one chromosome at a time
- Repeats are single-base only
- An unrepeated base has repeat count = 0 ('A')
- A base repeated once has repeat count = 1 ('AA')
- Skips reference positions with base 'N'
What is a Discordance
A position is considered to have a Deletion Discordance if at least 1 read has a match/mismatch AND at least 1 read has a deletion AND there are at least the "minimum depth" reads at this position.
A position is considered to have an Insertion Discordance if at least 1 read has an insertion following this position AND at least 1 read does not AND there are at least the "minimum depth" reads at this position.
Error Rate Algorithm
The weighted average error rate, weighted average deletion error rate, and weighted average insertion error rate are calculated for each repeat count.
The error rate is weighted by the depth, so the discordant counts are
For each Repeat Count in the File For each Depth at this Repeat Count in the file if (depth > MaxAllowedDepth) skip calculating error rate for this depth else // Note: numDiscordant is not equivalent to numDeleteDiscordant + numInsertDiscordant since a position could have both types of discordants numDiscordant = number of discordant positions with this repeat count and depth numDeleteDiscordant = number of deletion discordant positions with this repeat count and depth numInsertDiscordant = number of insertion discordant positions with this repeat count and depth count = number of positions with this repeat count and depth sumErrorRates += errorRate * count * (depth-1) numErrorRates += count * (depth-1) Repeat Count Weighted Error Rate = sumErrorRates/numErrorRates
./bam indelDiscordance --in <inputFile> [--bamIndex <bamIndexFile] [--refFile <filename>] [--umRef] [--depth minDepth] [--minRepeatLen len] [--sumRepeatLen len] [--printPos] [--chrom <name>] [--start 0basedPos] [--end 0basedPos] [--noeof] [--params]
Required Parameters: --in : the SAM/BAM file to calculate indelDiscordance for Optional Parameters: --bamIndex : The path/name of the bam index file (if required and not specified, uses the --in value + ".bai") --refFile : reference file for determining repeat counts --umRef : use the reference at the default UofM location, /data/local/ref/karma.ref/human.g1k.v37.umfa --depth : min depth at which to report indel discordance, DEFAULT >= 2 --minRepeatLen : min repeat length for printing repeat info, DEFAULT = 1 --sumRepeatLen : all repeats this length and longer will be accumulated, DEFAULT = 5 --avgDepthMult : max depth used is the average depth * this multiplier, DEFAULT = 3 --printPos : print details for each position --printCounts : print counts of occurrances of each repeat count and of discordant cigars for each repeat count --sample : output the specified sample name as part of the error rate/depth table --gender : output the specified gender as part of the error rate/depth table --chrom : chromosome name other than X --start : use a 0-based inclusive start position other than the default, 2699520 --end : use a 0-based exclusive end position other than the default, 154931043 --noeof : Do not expect an EOF block on a bam file. --params : Print the parameter settings
Input File (
--in followed by your file name to specify the SAM/BAM input file.
The program automatically determines if your input file is SAM/BAM/uncompressed BAM without any input other than a filename from the user, unless your input file is stdin.
- is used to indicate to read from stdin and the extension is used to determine the file type (no extension indicates SAM).
|SAM/BAM/Uncompressed BAM from file|| |
|SAM from stdin|
|BAM from stdin|
|Uncompressed BAM from stdin|
Note: Uncompressed BAM is compressed using compression level-0 (so it is not an entirely uncompressed file). This matches the
samtools implementation so pipes between our tools and
samtools are supported.
Bam Index File (
--bamIndex followed by your file name to specify the BAM index file to use for reading the BAM file.
If this file is required but not specified, it will use the input file name + ".bai".
Reference File (
--refFile followed by the reference file name to specify the reference sequence file.
Do not require BGZF EOF block (
--noeof if you do not expect a trailing eof block in your bgzf file.
By default, the trailing empty block is expected and checked for.
Print the Program Parameters (
--params to print the parameters for your program to stderr.
All status messages are written to stderr.