Difference between revisions of "MutationFilter"

From Genome Analysis Wiki

Revision as of 14:28, 1 April 2013

Introduction

  • The tool mutfilter generates diagnostic statistics from a sequence alignment and flags alignment artifacts according to user-provided criteria.
  • It takes as input a SAM/BAM file (through --bam) and a BED-like file (through --bed) and prints its output to the screen.
  • Filtering options can also be provided; for each option given, mutfilter reports a flag in the FILTER column when a candidate fails it. See details below.

Usage

  • Typing mutfilter without any options displays the following message:


The following parameters are available.  Ones with "[]" are in effect:
             Input Files : --bam [], --bed []
         Cycle Bias (CB) : --mut_median_cycles2ends [-1]
        Strand Bias (SB) : --SB_OR [-1.0e+00]
   Nearby Indels (INDEL) : --indel_winsize [30], --indel_cnt [-1],
                           --indel_pct [-1.0e+00]
          Head Clip (HC) : --mut_clip_cnt [-1], --mut_clip_pct [-1.0e+00]
      Other Alleles (OA) : --other_allele_cnt [-1],
                           --other_allele_pct [-1.0e+00]
    Allelic Balance (AB) : --mut_base_cnt [-1], --mut_base_pct [-1.0e+00]
   Low Map Quality (LMQ) : --low_mapq_cutoff [-1.0e+00], --low_mapq_cnt [-1],
                           --low_mapq_pct [-1.0e+00], --mut_low_mapq_cnt [-1],
                           --mut_low_mapq_pct [-1.0e+00]
  Low Base Quality (LBQ) : --low_baseq_cutoff [-1.0e+00],
                           --low_baseq_cnt [-1], --low_baseq_pct [-1.0e+00],
                           --mut_low_baseq_cnt [-1],
                           --mut_low_baseq_pct [-1.0e+00]
    Improper Paired (IP) : --improper_paried_cnt [-1],
                           --improper_paried_pct [-1.0e+00],
                           --mut_improper_paried_cnt [-1],
                           --mut_improper_paried_pct [-1.0e+00]
NOTE: When parameters are negative these filters are NOT in effect!
     When filters have 'mut_' the filters are for the statistis calculated for MUTANT alleles only!

Some examples

1. Filter based on nearby indels (INDEL): filtered if there are >=3 reads with indels in a window of 20 bp upstream and downstream of the mutation candidate
mutfilter --bam in.bam  --bed in.bed --indel_winsize 20 --indel_cnt 3 
2. Filter based on cycle bias (CB): filtered if the median distance from the mutant allele to the nearest end of the read is >=5
mutfilter --bam in.bam  --bed in.bed --indel_winsize 20 --indel_cnt 3 --mut_median_cycles2ends 5
3. Filter based on head clipping (HC): filtered if the percentage of reads carrying the mutant allele that are clipped at the head of the read is >=20%
mutfilter --bam in.bam  --bed in.bed --indel_winsize 20 --indel_cnt 3 --mut_median_cycles2ends 5 --mut_clip_pct 20
4. Filter based on Allelic Balance (AB): filtered if the percentage of the reads carrying the mutant allele is <30%
mutfilter --bam in.bam  --bed in.bed --indel_winsize 20 --indel_cnt 3 --mut_median_cycles2ends 5 --mut_clip_pct 20 --mut_base_pct 30
5. Filter based on low Map Quality (LMQ): filtered if the percentage of reads with low map quality (defined as map quality below 10 via --low_mapq_cutoff) is >=15%
mutfilter --bam in.bam  --bed in.bed --indel_winsize 20 --indel_cnt 3 --mut_median_cycles2ends 5 --low_mapq_cutoff 10 --low_mapq_pct 15
6. Filter based on low Map Quality (LMQ): filtered if the percentage of reads carrying the mutant allele with low map quality (defined as map quality below 10 via --low_mapq_cutoff) is >=10%
mutfilter --bam in.bam  --bed in.bed --indel_winsize 20 --indel_cnt 3 --mut_median_cycles2ends 5 --low_mapq_cutoff 10 --mut_low_mapq_pct 10
7. Filter based on low Base Quality (LBQ): filtered if the percentage of reads carrying the mutant allele with low Base Quality (defined as base quality below 30 via --low_baseq_cutoff) is >=50%
mutfilter --bam in.bam  --bed in.bed --indel_winsize 20 --indel_cnt 3 --mut_median_cycles2ends 5  --low_baseq_cutoff 30 --mut_low_baseq_pct 50
8. Filter based on improper pairing (IP): filtered if the percentage of improperly paired reads is >=10%
mutfilter --bam in.bam  --bed in.bed --indel_winsize 20 --indel_cnt 3 --mut_median_cycles2ends 5  --improper_paired_pct 10
9. Filter based on improper pairing (IP): filtered if the percentage of improperly paired reads carrying the mutant allele is >=5%
mutfilter --bam in.bam  --bed in.bed --indel_winsize 20 --indel_cnt 3 --mut_median_cycles2ends 5  --mut_improper_paired_pct 5
10. Filter based on strand bias (SB): filtered if the odds ratio of the strand bias is >=2
mutfilter --bam in.bam  --bed in.bed --indel_winsize 20 --indel_cnt 3 --mut_median_cycles2ends 5  --SB_OR 2
You can combine any of these filters, and mutfilter will report in the FILTER column which criteria each mutation candidate failed.
  • NOTE1: The filtering criteria only change the FILTER column, which lists the criteria a candidate failed. The output statistics are not affected by these criteria.
  • NOTE2: Most of the filtering can also be done post hoc from the output statistics, so thresholds can be tuned after the run without rerunning mutfilter.
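
As an illustration of the post-hoc filtering mentioned in NOTE2, the following Python sketch parses mutfilter output and applies thresholds to the reported statistics. It is not part of mutfilter. The function names and the thresholds (min_mut, max_indel_reads) are hypothetical, and it assumes the output is whitespace-delimited with a non-empty value in every column.

```python
def parse_mutfilter_output(text):
    """Parse mutfilter output text into a list of dicts keyed by column name.

    Assumes whitespace-delimited columns, a header line starting with '#',
    and no empty fields (an assumption, not documented behaviour).
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None and fields[0].startswith("#"):
            # '#CHR' becomes 'CHR'; remaining names are used as-is.
            header = [fields[0].lstrip("#")] + fields[1:]
            continue
        if header is None:
            raise ValueError("no header line found before data rows")
        rows.append(dict(zip(header, fields)))
    return rows


def passes_posthoc(row, min_mut=3, max_indel_reads=2):
    """Hypothetical post-hoc rule: enough mutant reads, few nearby indel reads."""
    n_mut = int(row["nMut"])
    n_indel = int(row["nINS"]) + int(row["nDEL"])
    return n_mut >= min_mut and n_indel <= max_indel_reads


if __name__ == "__main__":
    import sys

    # Usage sketch: mutfilter --bam in.bam --bed in.bed | python posthoc.py
    for row in parse_mutfilter_output(sys.stdin.read()):
        if passes_posthoc(row):
            print(row["CHR"], row["START"], row["REF"], row["MUT"], sep="\t")
```

Because the statistics do not depend on the filtering options, the same output can be re-filtered with different thresholds without rerunning the tool.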

Input

  • SAM/BAM file
  • BED-like file: It requires 5 columns (CHR, START, END, REF, MUTANT) in that order. START must equal END, since mutfilter deals with single sites only. For example:
1    10000    10000      A        C
1    20000    20000      G        T
2    15000    15000      A        G
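
The input requirements above can be checked before running mutfilter. The following Python sketch is a hypothetical helper, not part of the tool. It assumes whitespace-delimited columns and only checks the two stated rules: at least 5 columns, and START equal to END.

```python
def validate_bed_like(lines):
    """Return a list of (line_number, message) for lines that break the rules.

    Rules checked (taken from the input description): at least 5 columns,
    integer START and END, and START == END.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) < 5:
            problems.append((lineno, "expected 5 columns, found %d" % len(fields)))
            continue
        start, end = fields[1], fields[2]
        if not (start.isdigit() and end.isdigit()):
            problems.append((lineno, "START and END must be integers"))
        elif int(start) != int(end):
            problems.append((lineno, "START (%s) differs from END (%s)" % (start, end)))
    return problems


if __name__ == "__main__":
    import sys

    # Usage sketch: python check_bed.py in.bed
    with open(sys.argv[1]) as handle:
        for lineno, message in validate_bed_like(handle):
            print("line %d: %s" % (lineno, message))
```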

Output

#CHR    START   END     REF     MUT     FILTER  nRef    nMut    nOA     MP:MM:RP:RM     nINS    nDEL    nClip_mut       Cycle_mut       SB      nIP     nIP_mut nLMQ    nLMQ_mut        nLBQ    nLBQ_mut
11      190324  190324  A       C       INDEL,AB,BQM     0       5       0       2:3:0:0 0       1       0       20      .       0       0       0       0       5       5

The first 5 columns are the same as in the input BED-like file. The remaining columns are statistics on which filtering can be based. The meaning of each column is as follows:
  • FILTER: the filtering criteria a candidate failed, separated by commas
  • nRef: # of REF alleles
  • nMut: # of MUT alleles
  • nOA: # of other alleles different from the REF and MUT alleles
  • MP:MM:RP:RM: #Mut on plus strand : #Mut on minus strand : #Ref on plus strand : #Ref on minus strand
  • nINS: # of reads with insertions in a window of the size specified by --indel_winsize
  • nDEL: # of reads with deletions in the same window as nINS
  • nClip_mut: # of reads clipped at the head of the read
  • Cycle_mut: the median distance of the mutant allele to the nearest end of a read
  • SB: strand bias odds ratio, which can be derived from MP:MM:RP:RM
  • nIP: # of improperly paired reads
  • nIP_mut: # of improperly paired reads carrying the mutant allele
  • nLMQ: # of reads with low mapping quality
  • nLMQ_mut: # of low mapping quality reads that carry the mutant allele
  • nLBQ: # of low quality bases
  • nLBQ_mut: # of low quality mutant bases
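
The page does not give the formula mutfilter uses for SB. As a sketch only, the following Python function computes the standard 2x2 odds ratio (MP x RM) / (MM x RP) from the MP:MM:RP:RM field. Both that formula and returning None when the ratio is undefined are assumptions. They are consistent with the example row above, where 2:3:0:0 is reported with SB as ".", but the tool's actual calculation may differ.

```python
def strand_odds_ratio(field):
    """Compute an odds ratio from an 'MP:MM:RP:RM' string.

    Uses the textbook 2x2 odds ratio (MP * RM) / (MM * RP). This is an
    assumed definition; the formula mutfilter itself uses is not documented here.
    Returns None when the denominator is zero (ratio undefined).
    """
    parts = field.split(":")
    if len(parts) != 4:
        raise ValueError("expected four colon-separated counts, got %r" % field)
    mp, mm, rp, rm = (int(p) for p in parts)
    denominator = mm * rp
    if denominator == 0:
        return None
    return (mp * rm) / denominator
```

For instance, strand_odds_ratio("2:3:0:0") returns None because no reference reads were observed on either strand.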