Changes

From Genome Analysis Wiki
Line 240: Line 240:     
==== Looking at final INDEL VCF ====
 
Check the number of passing INDEL variants:
  −
$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS"
  −
Gives something like:
  −
      no. Indels                        :    570661
  −
          2 alleles (ins/del)            :          570661 (0.87) [265448/305213]
  −
          >=3 alleles (ins/del)          :              0 (-nan) [0/0]
     −
Check the number of passing INDEL's with allele count > 0:
+
Note that because this is single-sample calling, many of the INFO fields are less meaningful: values such as HWE p-values, allele frequencies, and the inbreeding coefficient are functions of a population.
 +
Nonetheless, we may examine the results.  First, we see how many indels were discovered for your genome:
 +
 
 +
$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz
 +
      no. Indels                        :    588566
 +
          2 alleles (ins/del)            :          588566 (0.87) [273261/315305]
 +
 
 +
This gives us 588,566 indels with an insertion/deletion ratio of 0.87.
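The ratio in parentheses can be reproduced from the bracketed counts, which the summary lists as [insertions/deletions]. A minimal sketch, assuming `awk` is available; `indel_ratio` is a hypothetical helper for illustration and not part of vt:

```shell
# Hypothetical helper (not a vt command): insertion/deletion ratio
# from the two bracketed counts in the vt peek summary.
indel_ratio() {
    awk -v ins="$1" -v del="$2" 'BEGIN { printf "%.2f\n", ins / del }'
}

# Counts taken from the unfiltered summary: [273261/315305]
indel_ratio 273261 315305   # prints 0.87
```

The same helper can be applied to the bracketed counts of any later filtered summary to check the reported ratio.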
 +
 
 +
We next look at the filtered set. The PASS filter extracts all non-overlapping variants, and INFO.AC!=0 extracts all indels that are either heterozygous or homozygous alternative.
 +
Some indels that were originally discovered were found to have the homozygous reference genotype.  Invariably, these are relatively high-depth calls where the alternative allele is observed less often or is mis-specified.
 +
 
 
  $GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AC!=0"
 
Gives something like:
+
 
 
       no. Indels                        :    549963
 
 
           2 alleles (ins/del)            :          549963 (0.91) [261480/288483]
 
          >=3 alleles (ins/del)          :              0 (-nan) [0/0]
  −
Some INDELs had allele count 0.
  −
  −
  −
Check the number of passing INDEL's with allele count 2:
  −
$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AC==2"
  −
Gives something like:
  −
      no. Indels                        :    216134
  −
          2 alleles (ins/del)            :          216134 (1.17) [116511/99623]
  −
          >=3 alleles (ins/del)          :              0 (-nan) [0/0]
      +
About 38K indels were removed, and the insertion/deletion ratio increases to 0.91.  Note that in general, for high-depth data, discovered indels are reported with insertion/deletion ratios close to 1, so this is a good sign.  Next-generation sequencing errors are biased toward deletions.
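The "about 38K" figure follows from the two indel totals reported above (588566 before filtering, 549963 after). A small sketch of that arithmetic; `count_removed` is a hypothetical helper name used only for illustration:

```shell
# Hypothetical helper: number of indels dropped by a filter,
# given the totals before and after filtering.
count_removed() {
    echo $(( $1 - $2 ))
}

# Totals from the unfiltered and "PASS&&INFO.AC!=0" summaries.
count_removed 588566 549963   # prints 38603
```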
   −
Check the number of passing INDEL's with allele balance > 0.5:
+
It is possible to filter slightly more stringently using allele balance.  The allele balance estimator is still meaningful for an individual in this case because it is a function of read depth.
  $GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AB>0.5"
+
Note that AB>0.5 denotes reference bias and AB<0.5 denotes alternative allele bias.
Gives something like:
  −
      no. Indels                        :    132878
  −
          2 alleles (ins/del)            :          132878 (0.68) [53714/79164]
  −
          >=3 alleles (ins/del)          :              0 (-nan) [0/0]
      +
$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AB<0.7&&INFO.AB>0.3"
 +
      no. Indels                        :    490965
 +
          2 alleles (ins/del)            :          490965 (0.92) [235254/255711]
   −
Check the number of passing INDEL's with allele balance < 0.5:
+
The insertion/deletion ratio increases from 0.91 to 0.92.
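To make the allele balance window concrete, the sketch below classifies a single AB value using the conventions stated above (AB>0.5 as reference bias, AB<0.5 as alternative allele bias) and the 0.3 to 0.7 window from the filter expression. `classify_ab` is a hypothetical helper, not a vt feature, and it assumes AB is passed as a plain decimal number:

```shell
# Hypothetical helper (not part of vt): report the direction of bias for an
# allele balance value and whether it falls inside the 0.3 < AB < 0.7 window.
classify_ab() {
    awk -v ab="$1" 'BEGIN {
        if (ab > 0.5)      bias = "reference bias"
        else if (ab < 0.5) bias = "alternative allele bias"
        else               bias = "balanced"
        status = (ab > 0.3 && ab < 0.7) ? "kept" : "dropped"
        print bias ", " status
    }'
}

classify_ab 0.62   # prints: reference bias, kept
classify_ab 0.20   # prints: alternative allele bias, dropped
```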
$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AB<0.5"
  −
Gives something like:
  −
      no. Indels                        :    169198
  −
          2 alleles (ins/del)            :          169198 (0.89) [79504/89694]
  −
          >=3 alleles (ins/del)          :              0 (-nan) [0/0]
      
</div>
 
</div>
 
</div>
 
</div>