Changes

From Genome Analysis Wiki
no edit summary
Line 4: Line 4:     
= Overview of the <code>indelDiscordance</code> function of <code>bamUtil</code> =
 
The <code>indelDiscordance</code> option on the [[bamUtil]] looks at discordance at sites on the male X chromosome.
+
The <code>indelDiscordance</code> option of [[bamUtil]] reports insertion/deletion discordance.
 +
 
 +
By default it looks only at the non-pseudoautosomal region of the X chromosome.
 +
 
    
== ASSUMPTIONS/RESTRICTIONS ==
 
 +
* Only works on one chromosome at a time
 +
* Repeats are single-base only
 +
** An unrepeated base has repeat count = 0 ('A')
 +
** A base repeated once has repeat count = 1 ('AA')
 +
* Skips reference positions with base 'N'
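The repeat-count convention above can be illustrated with a short Python sketch. It is not bamUtil code: the function name <code>repeat_counts</code> and the choice to report one entry per run (rather than per position) are assumptions made only for illustration.

```python
def repeat_counts(reference):
    """Return (0-based start, base, repeat count) for each single-base run.

    Follows the convention listed above: 'A' -> 0, 'AA' -> 1.
    Runs of 'N' are skipped. Reporting one tuple per run is an
    assumption of this sketch, not documented tool behavior.
    """
    runs = []
    i = 0
    while i < len(reference):
        base = reference[i].upper()
        j = i + 1
        # Extend the run while the same base repeats.
        while j < len(reference) and reference[j].upper() == base:
            j += 1
        if base != "N":
            runs.append((i, base, j - i - 1))
        i = j
    return runs
```

For example, the reference string <code>AAGNNT</code> yields a run of 'A' with repeat count 1, then 'G' and 'T' with repeat count 0, and no entry for the 'N' positions.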
 +
 +
== What is a Discordance ==
 +
 +
A position is considered to have a Deletion Discordance if at least 1 read has a match/mismatch AND at least 1 read has a deletion AND there are at least the "minimum depth" reads at this position.
 +
 +
A position is considered to have an Insertion Discordance if at least 1 read has an insertion following this position AND at least 1 read does not AND there are at least the "minimum depth" reads at this position.
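The two definitions above can be sketched as a small Python check. This is an illustration under stated assumptions, not the tool's implementation: the <code>PositionCounts</code> type and <code>classify</code> function are hypothetical, depth is assumed to be the number of reads with a match/mismatch or a deletion at the position, and the minimum depth of 2 is taken from the <code>--depth</code> default in this revision.

```python
from dataclasses import dataclass


@dataclass
class PositionCounts:
    """Hypothetical per-position read tallies (not a bamUtil type)."""
    num_match: int      # reads with a match/mismatch at this position
    num_deletion: int   # reads with a deletion at this position
    num_insertion: int  # reads with an insertion following this position


def classify(counts, min_depth=2):
    """Return (deletion_discordant, insertion_discordant) for one position."""
    # Assumption: depth counts every read covering the position.
    depth = counts.num_match + counts.num_deletion
    enough = depth >= min_depth
    deletion = enough and counts.num_match >= 1 and counts.num_deletion >= 1
    insertion = (enough and counts.num_insertion >= 1
                 and depth - counts.num_insertion >= 1)
    return deletion, insertion
```

A position can satisfy both definitions at once, which is why the two flags are returned separately.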
 +
 +
== Error Rate Algorithm ==
 +
 +
The weighted average error rate, weighted average deletion error rate, and weighted average insertion error rate are calculated for each repeat count. 
 +
 +
The error rate is weighted by depth: each depth's error rate contributes with weight count * (depth - 1), as in the pseudocode below.
 +
 +
 +
For each Repeat Count in the File
 +
  For each Depth at this Repeat Count in the file
 +
      if (depth > MaxAllowedDepth)
 +
        skip calculating error rate for this depth
 +
      else
 +
        // Note: numDiscordant is not equivalent to numDeleteDiscordant + numInsertDiscordant since a position could have both types of discordants
 +
        numDiscordant = number of discordant positions with this repeat count and depth
 +
        numDeleteDiscordant = number of deletion discordant positions with this repeat count and depth
 +
        numInsertDiscordant = number of insertion discordant positions with this repeat count and depth
 +
        count = number of positions with this repeat count and depth
 +
 +
        <math>errorRate = 1 - (\tfrac{numDiscordant}{count})^{({\tfrac{1}{depth}})}</math>
 +
        sumErrorRates += errorRate * count * (depth-1)
 +
        numErrorRates += count * (depth-1)
 +
  Repeat Count Weighted Error Rate = sumErrorRates/numErrorRates
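The pseudocode above can be sketched in Python for a single repeat count. The function name <code>weighted_error_rate</code> and the input layout (a dict mapping depth to a pair of discordant count and position count) are assumptions for illustration. The error-rate expression is copied as written in the pseudocode. Skipping depths with no positions and returning <code>None</code> when nothing contributes are additions of this sketch.

```python
def weighted_error_rate(depth_table, max_allowed_depth):
    """Weighted average error rate for one repeat count.

    depth_table maps depth -> (num_discordant, count), where count is the
    number of positions with this repeat count and depth (hypothetical
    layout, chosen for this sketch).
    """
    sum_error_rates = 0.0
    num_error_rates = 0
    for depth, (num_discordant, count) in depth_table.items():
        # Skip depths above the cap; also skip empty rows (sketch-only guard).
        if depth > max_allowed_depth or count == 0:
            continue
        # Expression as given in the pseudocode above.
        error_rate = 1 - (num_discordant / count) ** (1 / depth)
        weight = count * (depth - 1)
        sum_error_rates += error_rate * weight
        num_error_rates += weight
    if num_error_rates == 0:
        return None  # nothing contributed, so no rate is defined
    return sum_error_rates / num_error_rates
```

The same loop would apply to the deletion and insertion error rates by passing the deletion or insertion discordant counts in place of <code>num_discordant</code>.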
       
= Usage =
 
./bam indelDiscordance --in <inputFile> [--bamIndex <bamIndexFile>] [--refFile <filename>] [--umRef] [--depth minDepth] [--minRepeatLen len] [--sumRepeatLen len] [--printPos] [--chrom <name>] [--start 0basedPos] [--end 0basedPos] [--noeof] [--params]
+
./bam indelDiscordance --in <inputFile> [--bamIndex <bamIndexFile>] [--refFile <filename>] [--umRef] [--depth minDepth] [--minRepeatLen len] [--sumRepeatLen len] [--printPos] [--chrom <name>] [--start 0basedPos] [--end 0basedPos] [--noeof] [--params]
    
= Parameters =
 
Line 21: Line 58:  
--refFile      : reference file for determining repeat counts
 
 
--umRef        : use the reference at the default UofM location,                 /data/local/ref/karma.ref/human.g1k.v37.umfa
 
--depth        : min depth at which to report indel discordance, DEFAULT >= 3
+
--depth        : min depth at which to report indel discordance, DEFAULT >= 2
 
--minRepeatLen : min repeat length for printing repeat info, DEFAULT = 1
 
 
--sumRepeatLen : all repeats this length and longer will be accumulated,
                DEFAULT = 5
 +
--avgDepthMult : max depth used is the average depth * this multiplier,
 +
                DEFAULT = 3
 
--printPos    : print details for each position
 
 +
--printCounts  : print counts of occurrences of each repeat count and of discordant cigars for each repeat count
 +
--sample      : output the specified sample name as part of the error rate/depth table
 +
--gender      : output the specified gender as part of the error rate/depth table
 
--chrom        : chromosome name other than X
 
 
 
--start        : use a 0-based inclusive start position other than the default, 2699520
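As a small illustration of the <code>--avgDepthMult</code> description above, the maximum depth can be sketched as the average depth times the multiplier. The function name <code>max_allowed_depth</code> is hypothetical, and computing the average as a plain mean over per-position depths is an assumption. The tool may compute its average differently.

```python
def max_allowed_depth(depths, avg_depth_mult=3):
    """Return average depth * multiplier (default 3, per --avgDepthMult).

    Assumption: the average is a plain mean over per-position depths.
    """
    average_depth = sum(depths) / len(depths)
    return average_depth * avg_depth_mult


# Example: keep only depths at or below the cap.
example_depths = [10, 20, 30]
cap = max_allowed_depth(example_depths)
kept = [d for d in example_depths + [75] if d <= cap]
```

Depths above this cap would be the ones skipped when the error rates are calculated.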
