Difference between revisions of "Variant classification"
From Genome Analysis Wiki
Revision as of 10:13, 5 September 2014
Introduction
The Variant Call Format (VCF) is a flexible file format specification that allows us to represent many different variant types, ranging from SNPs and indels to copy number variations. However, variant representation in VCF is non-unique for variants that have explicitly expressed reference and alternate sequences.
On this wiki page, we describe a variant classification system for VCF variants.
Definitions
The normalization of a variant representation in VCF consists of two parts, parsimony and left alignment, which pertain to a variant's length and position respectively.
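As an illustration of the two parts, here is a minimal sketch of how a single biallelic variant could be made parsimonious and left aligned. It follows the widely used trim-and-shift procedure and is not code from this page; the function name `normalize_variant`, the 1-based `pos`, and passing the chromosome sequence as a plain Python string are assumptions made for the example, and vt's own implementation may differ.

```python
def normalize_variant(pos, ref, alt, chrom_seq):
    """Return (pos, ref, alt) trimmed and left aligned.

    pos is 1-based; chrom_seq is the reference sequence of the chromosome
    (hypothetical input format chosen for this sketch).
    """
    changed = True
    while changed:
        changed = False
        # Parsimony on the right: drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # Left alignment: if an allele became empty, extend both alleles
        # by one reference base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            base = chrom_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Parsimony on the left: drop shared leading bases, keeping at
    # least one base in each allele.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Deleting one "AC" unit inside a CA repeat, written at position 8.
print(normalize_variant(8, "CAC", "C", "GGGCACACACAGGG"))
```

In this toy sequence the deletion can be written at several positions inside the repeat; the sketch shifts it to the leftmost one and keeps a single anchoring base.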
Example
The vt peek command summarizes the variants in a VCF file.
#summarizes the variants found in mills.vcf
vt peek mills.vcf

usage : vt peek [options] <in.vcf>

options : -o  output VCF file [-]
          -I  file containing list of intervals []
          -i  intervals []
          -r  reference sequence fasta file []
          --  ignores the rest of the labeled arguments following this flag
          -h  displays help
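When peek is called from a script, the command line can be assembled from the options listed in the usage text. The following is a small sketch only: the helper name `build_peek_command` is hypothetical, it uses just the `-i` and `-r` flags shown above, and it builds the argument list without running anything.

```python
import shlex


def build_peek_command(vcf_path, intervals=None, reference=None):
    """Return the argument list for a `vt peek` call (nothing is executed)."""
    cmd = ["vt", "peek"]
    if intervals is not None:
        cmd += ["-i", intervals]   # -i intervals, as listed in the usage text
    if reference is not None:
        cmd += ["-r", reference]   # -r reference sequence fasta file
    cmd.append(vcf_path)           # the input VCF comes after the options
    return cmd


# Show the plain invocation used in the example above.
print(shlex.join(build_peek_command("mills.vcf")))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where vt is installed.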
#This is a sample output of a peek command which summarizes the variants found in a VCF file.
stats: no. of samples                    : 0
       no. of chromosomes                : 22

       ========== Micro variants ==========

       no. of SNPs                       : 77228885
           2 alleles (ts/tv)             : 77011302 (2.11) [52287790/24723512]
           3 alleles (ts/tv)             : 216560 (0.75) [185520/247600]
           4 alleles (ts/tv)             : 1023 (0.50) [1023/2046]

       no. of MNPs                       : 0
           2 alleles (ts/tv)             : 0 (-nan) [0/0]
           >=3 alleles (ts/tv)           : 0 (-nan) [0/0]

       no. Indels                        : 2147564
           2 alleles (ins/del)           : 2124842 (0.47) [683250/1441592]
           >=3 alleles (ins/del)         : 22722 (2.12) [32411/15286]

       no. SNP/MNP                       : 0
           3 alleles (ts/tv)             : 0 (-nan) [0/0]
           >=4 alleles (ts/tv)           : 0 (-nan) [0/0]

       no. SNP/Indels                    : 12913
           2 alleles (ts/tv) (ins/del)   : 412 (0.41) [120/292] (3.68) [324/88]
           >=3 alleles (ts/tv) (ins/del) : 12501 (0.43) [7670/17649] (18.64) [12434/667]

       no. MNP/Indels                    : 153
           2 alleles (ts/tv) (ins/del)   : 0 (-nan) [0/0] (-nan) [0/0]
           >=3 alleles (ts/tv) (ins/del) : 153 (0.30) [138/465] (0.27) [67/248]

       no. SNP/MNP/Indels                : 2
           3 alleles (ts/tv) (ins/del)   : 0 (-nan) [0/0] (-nan) [0/0]
           4 alleles (ts/tv) (ins/del)   : 2 (0.00) [3/5] (1.00) [3/3]
           >=5 alleles (ts/tv) (ins/del) : 0 (-nan) [0/0] (-nan) [0/0]

       no. of clumped variants           : 19025
           2 alleles                     : 0 (-nan) [0/0] (-nan) [0/0]
           3 alleles                     : 18508 (0.16) [12152/75366] (0.00) [93/18653]
           4 alleles                     : 451 (0.15) [369/2390] (0.33) [201/609]
           >=5 alleles                   : 66 (0.09) [37/414] (1.19) [107/90]

       ====== Other useful categories =====

       no. complex variants              : 32093
           2 alleles (ts/tv) (ins/del)   : 412 (0.41) [120/292] (3.68) [324/88]
           >=3 alleles (ts/tv) (ins/del) : 31681 (0.21) [20369/96289] (0.64) [12905/20270]

       ======= Structural variants ========

       no. of structural variants        : 41217
           2 alleles                     : 38079
               deletion                  : 13135
               insertion                 : 16451
                   mobile element        : 16253
                       ALU               : 12513
                       LINE1             : 2911
                       SVA               : 829
                   numt                  : 198
               duplication               : 664
               inversion                 : 100
               copy number variation     : 7729
           >=3 alleles                   : 3138
               copy number variation     : 3138

       ========= General summary ==========

       no. of reference                  : 0

       no. of observed variants          : 79449759
       no. of unclassified variants      : 0
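In the biallelic SNP and indel rows above, the value in parentheses appears to be the ratio of the two bracketed counts (for example, 52287790/24723512 is about 2.11), and rows with no observations print -nan. The sketch below shows one way such a ts/tv figure could be tallied from a list of SNPs. The names `is_transition`, `ratio` and `ts_tv` are hypothetical and this is not vt's code; it assumes every input pair is a true single-base substitution.

```python
# Purine <-> purine (A/G) and pyrimidine <-> pyrimidine (C/T) changes
# are transitions; every other single-base substitution is a transversion.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref, alt):
    """True if the single-base change ref -> alt is a transition."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def ratio(numerator, denominator):
    """Plain ratio; NaN when the denominator is zero (shown as -nan above)."""
    return numerator / denominator if denominator else float("nan")


def ts_tv(snps):
    """Return (ts, tv, ts/tv) for an iterable of (ref, alt) base pairs."""
    snps = list(snps)
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts, tv, ratio(ts, tv)


# Counts taken from the biallelic SNP row of the sample output.
print(f"{ratio(52287790, 24723512):.2f}")
# A tiny made-up list of substitutions, for illustration only.
print(ts_tv([("A", "G"), ("C", "T"), ("A", "C")]))
```

The same `ratio` helper applies to the ins/del columns, where the bracketed counts are insertions and deletions.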
Maintained by
This page is maintained by Adrian (atks@umich.edu).