Changes

SeqShop: Calling Your Own Genome, December 2014 (view source)

Revision as of 10:36, 9 December 2014

178 bytes added , 10:36, 9 December 2014

→‎Looking at final INDEL VCF

Line 240: Line 240:

==== Looking at final INDEL VCF ====

−

~~Check the number of passing INDEL variants:~~

−

~~$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS"~~

−

~~Gives something like:~~

−

~~no. Indels : 570661~~

−

~~2 alleles (ins/del) : 570661 (0.87) [265448/305213]~~

−

~~>=3 alleles (ins/del) : 0 (-nan) [0/0]~~

−

~~Check~~ the ~~number~~ of ~~passing INDEL's~~ with allele ~~count > 0:~~

+

Note that because this is single-sample calling, many of the INFO fields are less meaningful: values such as HWE p-values, allele frequencies, and the inbreeding coefficient are functions of a population.

+

Nonetheless, we may examine the results. First, we see how many indels were discovered for your genome:

+

$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz

+

no. Indels : 588566

+

2 alleles (ins/del) : 588566 (0.87) [273261/315305]

+

This gives us 588,566 indels with an insertion-to-deletion ratio of 0.87.
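As a quick sanity check on the summary above, the reported ratio can be recomputed from the bracketed insertion and deletion counts. The following is a minimal sketch; the regular expression and the helper name `parse_summary` are assumptions made for illustration and only match the summary line as it is printed here.

```python
import re

# Matches the biallelic summary line exactly as shown in the output above.
# This pattern is an assumption based on that one line, not a vt specification.
SUMMARY = re.compile(
    r"2 alleles \(ins/del\)\s*:\s*(\d+)\s*\(([\d.]+)\)\s*\[(\d+)/(\d+)\]"
)

def parse_summary(line):
    """Return (total, reported_ratio, insertions, deletions) from a summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("not a biallelic indel summary line: %r" % line)
    total, ratio, ins, dels = m.groups()
    return int(total), float(ratio), int(ins), int(dels)

line = "2 alleles (ins/del) : 588566 (0.87) [273261/315305]"
total, reported, n_ins, n_del = parse_summary(line)

# Insertions plus deletions should account for every indel.
assert n_ins + n_del == total

# Insertion-to-deletion ratio, rounded the same way as the reported value.
recomputed = round(n_ins / n_del, 2)
print(total, reported, recomputed)
```

The same helper can be pointed at the summary lines of the filtered sets further down to confirm their ratios.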

+

We next look at the filtered set. The PASS filter extracts all non-overlapping variants, and INFO.AC!=0 extracts all indels that are either heterozygous or homozygous for the alternative allele.

+

Some indels that were originally discovered were found to have the homozygous reference genotype. Invariably, these are relatively high-depth calls where the alternative allele is observed less often or is mis-specified.

+

$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AC!=0"

−

~~Gives something like:~~

+

no. Indels : 549963

2 alleles (ins/del) : 549963 (0.91) [261480/288483]
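To make the meaning of the `PASS&&INFO.AC!=0` expression concrete, here is a rough sketch of the same test applied to individual VCF data lines. It is not how vt evaluates filters; the helper names, the toy records, and the `overlap` filter label are all hypothetical and exist only for illustration.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict of key -> raw string value."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields

def passes_and_nonref(vcf_line):
    """True if FILTER is PASS and the (first) alternate allele count is non-zero."""
    cols = vcf_line.rstrip("\n").split("\t")
    filt = cols[6]                      # FILTER is the 7th VCF column
    info = parse_info(cols[7])          # INFO is the 8th VCF column
    ac = int(info.get("AC", "0").split(",")[0])
    return filt == "PASS" and ac != 0

# Hypothetical records, not taken from any real call set.
records = [
    "20\t1000\t.\tAT\tA\t50\tPASS\tAC=1;AN=2",      # heterozygous, passing
    "20\t2000\t.\tG\tGC\t50\tPASS\tAC=0;AN=2",      # homozygous reference
    "20\t3000\t.\tCA\tC\t50\toverlap\tAC=2;AN=2",   # fails the filter column
]
kept = [passes_and_nonref(r) for r in records]
print(kept)
```

For a single sample, AC can only be 0, 1, or 2, so the `AC != 0` test keeps exactly the heterozygous and homozygous alternative calls.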

−

~~>=3 alleles (ins/del) : 0 (-nan) [0/0]~~

−

~~Some INDELs had allele count 0.~~

−

~~Check the number of passing INDEL's with allele count 2:~~

−

~~$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AC==2"~~

−

~~Gives something like:~~

−

~~no. Indels : 216134~~

−

~~2 alleles (ins/del) : 216134 (1.17) [116511/99623]~~

−

~~>=3 alleles (ins/del) : 0 (-nan) [0/0]~~

+

About 38K indels were removed, and the insertion-to-deletion ratio increases to 0.91. Note that in general, for high-depth data, discovered indels are reported with insertion-to-deletion ratios close to 1, so this is a good sign. Next-generation sequencing errors are biased toward deletions.
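The "about 38K" figure follows directly from the two `vt peek` summaries. A small sketch of the arithmetic, using only the counts reported above (the variable names are chosen here for illustration):

```python
# Counts copied from the two vt peek summaries above.
discovered = 588566          # all indels in the final VCF
kept = 549963                # indels with PASS and INFO.AC != 0

removed = discovered - kept
percent_removed = 100.0 * removed / discovered

# Insertion and deletion counts of the filtered set.
ins_kept, del_kept = 261480, 288483
ratio_kept = round(ins_kept / del_kept, 2)

print(removed, round(percent_removed, 1), ratio_kept)
```

The filtered set therefore drops a bit under 7% of the discovered indels while moving the ratio closer to 1.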

−

~~Check the number of passing INDEL's with~~ allele balance ~~> 0~~.5:

+

It is possible to perform slightly more stringent filtering using allele balance. The allele balance estimator is still meaningful for an individual in this case because it is a function of read depth.

−

~~$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO~~.AB>0.5"

+

Note that AB>0.5 denotes reference bias and AB<0.5 denotes alternative allele bias.

−

~~Gives something like:~~

−

no. ~~Indels : 132878~~

−

~~2 alleles (ins/del) : 132878 (0~~.~~68) [53714/79164]~~

−

~~>=3 alleles (ins/del) : 0 (-nan) [0/0]~~

+

$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AB<0.7&&INFO.AB>0.3"

+

no. Indels : 490965

+

2 alleles (ins/del) : 490965 (0.92) [235254/255711]

−

~~Check the number of passing INDEL's with allele balance <~~ 0.5:

+

The insertion deletion ratio increases from 0.91 to 0.92.
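The allele balance window used in the command above can be read as a simple classification of each AB value. The sketch below follows the convention stated in this section (AB>0.5 is reference bias, AB<0.5 is alternative allele bias) and the 0.3 to 0.7 window from the filter expression; the function names and the example AB values are hypothetical.

```python
def in_ab_window(ab, low=0.3, high=0.7):
    """Mirror the INFO.AB<0.7 && INFO.AB>0.3 part of the filter (strict bounds)."""
    return low < ab < high

def bias_direction(ab):
    """Label an AB value using the convention given in this section."""
    if ab > 0.5:
        return "reference bias"
    if ab < 0.5:
        return "alternative allele bias"
    return "balanced"

# Illustrative AB values only, not taken from the call set.
examples = [0.10, 0.30, 0.45, 0.50, 0.65, 0.90]
window = [in_ab_window(ab) for ab in examples]
labels = [bias_direction(ab) for ab in examples]
print(list(zip(examples, window, labels)))
```

Because the bounds are strict, a value of exactly 0.3 or 0.7 falls outside the window, matching the `<` and `>` operators in the filter expression.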

−

~~$GC/bin/vt peek ~/$SAMPLE/output/indel/final/all.genotypes.vcf.gz -f "PASS&&INFO.AB<~~0.5"

−

~~Gives something like:~~

−

no. ~~Indels : 169198~~

−

~~2 alleles (ins/del) : 169198 (0.89) [79504/89694]~~

−

~~>=3 alleles (ins/del) : 0 (-nan) [0/0]~~

</div>

Atks

1,102

edits
