Line 4: |
Line 4: |
| | | |
| = Overview of the <code>indelDiscordance</code> function of <code>bamUtil</code> = | | = Overview of the <code>indelDiscordance</code> function of <code>bamUtil</code> = |
− | The <code>indelDiscordance</code> option on the [[bamUtil]] looks at discordance at sites on the male X chromosome. | + | The <code>indelDiscordance</code> option of [[bamUtil]] reports insertion/deletion discordance. |
| + | |
| + | By default it looks only at the non-pseudoautosomal region of the X chromosome. |
| + | |
| | | |
| == ASSUMPTIONS/RESTRICTIONS == | | == ASSUMPTIONS/RESTRICTIONS == |
| + | * Only works on one chromosome at a time |
| + | * Repeats are single-base only |
| + | ** An unrepeated base has repeat count = 0 ('A') |
| + | ** A base repeated once has repeat count = 1 ('AA') |
| + | * Skips reference positions with base 'N' |
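The repeat-count convention above can be illustrated with a short Python sketch. This is not bamUtil code: the function name <code>repeat_runs</code> is hypothetical, and assigning one repeat count per run (run length minus one) is an assumption drawn only from the 'A' = 0 and 'AA' = 1 examples above.

```python
from itertools import groupby


def repeat_runs(reference):
    """Yield (start, base, repeat_count) for each single-base run.

    repeat_count is the run length minus one, following the convention
    above ('A' -> 0, 'AA' -> 1). Runs of 'N' are skipped.
    Positions are 0-based offsets into the given string.
    """
    pos = 0
    for base, group in groupby(reference.upper()):
        length = sum(1 for _ in group)
        if base != "N":
            yield pos, base, length - 1
        pos += length


# Hypothetical toy reference, not real sequence data.
runs = list(repeat_runs("ACCNNGTTT"))
print(runs)
```

In this toy input the two 'N' bases are skipped, and the run 'TTT' gets repeat count 2.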
| + | |
| + | == What is a Discordance == |
| + | |
| + | A position is considered to have a Deletion Discordance if at least 1 read has a match/mismatch AND at least 1 read has a deletion AND the depth at this position is at least the minimum depth (<code>--depth</code>). |
| + | |
| + | A position is considered to have an Insertion Discordance if at least 1 read has an insertion following this position AND at least 1 read does not AND the depth at this position is at least the minimum depth (<code>--depth</code>). |
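The two definitions above can be written as a small Python check. This is only a sketch: the function name and its arguments are hypothetical, the per-position read tallies are assumed to be computed elsewhere, and <code>depth</code> is assumed to be the number of reads covering the position.

```python
def classify_position(num_match, num_deletion, num_insert_after, depth, min_depth=2):
    """Return (deletion_discordant, insertion_discordant) for one position.

    num_match        -- reads with a match/mismatch at this position
    num_deletion     -- reads with a deletion at this position
    num_insert_after -- reads with an insertion following this position
    depth            -- reads covering this position (assumed definition)
    min_depth        -- minimum depth; 2 is the default listed under --depth
    """
    deep_enough = depth >= min_depth
    deletion_discordant = deep_enough and num_match >= 1 and num_deletion >= 1
    # "At least 1 read does not" is taken to mean that not every
    # covering read carries the insertion.
    insertion_discordant = (
        deep_enough and num_insert_after >= 1 and depth - num_insert_after >= 1
    )
    return deletion_discordant, insertion_discordant


# Three matching reads plus one read with a deletion.
example = classify_position(num_match=3, num_deletion=1, num_insert_after=0, depth=4)
print(example)
```

A position can satisfy both definitions at once, which is why the two flags are returned separately.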
| + | |
| + | == Error Rate Algorithm == |
| + | |
| + | The weighted average error rate, weighted average deletion error rate, and weighted average insertion error rate are calculated for each repeat count. |
| + | |
| + | The error rate for each depth is weighted by count * (depth-1), as shown in the pseudocode: |
| + | |
| + | |
| + | For each Repeat Count in the File |
| + | For each Depth at this Repeat Count in the file |
| + | if (depth > MaxAllowedDepth) |
| + | skip calculating error rate for this depth |
| + | else |
| + | // Note: numDiscordant is not equivalent to numDeleteDiscordant + numInsertDiscordant since a position could have both types of discordants |
| + | numDiscordant = number of discordant positions with this repeat count and depth |
| + | numDeleteDiscordant = number of deletion discordant positions with this repeat count and depth |
| + | numInsertDiscordant = number of insertion discordant positions with this repeat count and depth |
| + | count = number of positions with this repeat count and depth |
| + | |
| + | <math>errorRate = 1 - (\tfrac{numDiscordant}{count})^{({\tfrac{1}{depth}})}</math> |
| + | sumErrorRates += errorRate * count * (depth-1) |
| + | numErrorRates += count * (depth-1) |
| + | Repeat Count Weighted Error Rate = sumErrorRates/numErrorRates |
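The pseudocode above can be transcribed into Python for a single repeat count. This sketch follows the formula exactly as written on this page and has not been checked against the bamUtil source. The function name, the input layout (a dict from depth to a pair of counts), the skipping of empty entries, and the <code>None</code> return value are all assumptions made for illustration. <code>max_allowed_depth</code> is passed in directly; per the <code>--avgDepthMult</code> description it would be the average depth times that multiplier.

```python
def weighted_error_rate(counts_by_depth, max_allowed_depth):
    """Weighted average error rate for one repeat count.

    counts_by_depth maps depth -> (num_discordant, count), where count is
    the number of positions with this repeat count and depth.
    Returns None if no depth contributes any weight.
    """
    sum_error_rates = 0.0
    num_error_rates = 0
    for depth, (num_discordant, count) in counts_by_depth.items():
        # Skip depths above the cap; skipping empty entries is an added
        # guard against division by zero, not part of the pseudocode.
        if depth > max_allowed_depth or count == 0:
            continue
        # Formula transcribed as written above.
        error_rate = 1 - (num_discordant / count) ** (1 / depth)
        weight = count * (depth - 1)
        sum_error_rates += error_rate * weight
        num_error_rates += weight
    if num_error_rates == 0:
        return None
    return sum_error_rates / num_error_rates


# Hypothetical tallies; the depth-50 entry exceeds the cap and is skipped.
rate = weighted_error_rate({2: (1, 4), 4: (1, 16), 50: (5, 10)}, max_allowed_depth=10)
print(rate)
```

The same function could be called with deletion-only or insertion-only discordant counts to obtain the other two weighted averages.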
| | | |
| | | |
| = Usage = | | = Usage = |
− | ./bam indelDiscordance --in <inputFile> [--bamIndex <bamIndexFile>] [--refFile <filename>] [--umRef] [--depth minDepth] [--minRepeatLen len] [--sumRepeatLen len] [--printPos] [--chrom <name>] [--start 0basedPos] [--end 0basedPos] [--noeof] [--params]
| + | ./bam indelDiscordance --in <inputFile> [--bamIndex <bamIndexFile>] [--refFile <filename>] [--umRef] [--depth minDepth] [--minRepeatLen len] [--sumRepeatLen len] [--printPos] [--chrom <name>] [--start 0basedPos] [--end 0basedPos] [--noeof] [--params] |
| | | |
| = Parameters = | | = Parameters = |
Line 21: |
Line 58: |
| --refFile : reference file for determining repeat counts | | --refFile : reference file for determining repeat counts |
| --umRef : use the reference at the default UofM location, /data/local/ref/karma.ref/human.g1k.v37.umfa | | --umRef : use the reference at the default UofM location, /data/local/ref/karma.ref/human.g1k.v37.umfa |
− | --depth : min depth at which to report indel discordance, DEFAULT >= 3 | + | --depth : min depth at which to report indel discordance, DEFAULT >= 2 |
| --minRepeatLen : min repeat length for printing repeat info, DEFAULT = 1 | | --minRepeatLen : min repeat length for printing repeat info, DEFAULT = 1 |
| --sumRepeatLen : all repeats this length and longer will be accumulated, | | --sumRepeatLen : all repeats this length and longer will be accumulated, |
| DEFAULT = 5 | | DEFAULT = 5 |
| + | --avgDepthMult : max depth used is the average depth * this multiplier, |
| + | DEFAULT = 3 |
| --printPos : print details for each position | | --printPos : print details for each position |
| + | --printCounts : print counts of occurrences of each repeat count and of discordant CIGARs for each repeat count |
| + | --sample : output the specified sample name as part of the error rate/depth table |
| + | --gender : output the specified gender as part of the error rate/depth table |
| --chrom : chromosome name other than X | | --chrom : chromosome name other than X |
| --start : use a 0-based inclusive start position other than the default, 2699520 | | --start : use a 0-based inclusive start position other than the default, 2699520 |