Line 238: |
Line 238: |
| | | |
| <div> | | <div> |
− | [http://genome.sph.umich.edu/wiki/Variant_Normalization Normalize] variants in a [http://www.1000genomes.org/wiki/analysis/variant-call-format/vcf-variant-call-format-version-42 VCF] file. Normalized variants may have their positions changed; in such cases, the normalized variants
| + | Decompose multiallelic variants in a [http://www.1000genomes.org/wiki/analysis/variant-call-format/vcf-variant-call-format-version-42 VCF] file. If the VCF file has the genotype fields GT, PL, or GL, they are |
− | are reordered and output in an ordered fashion. The local reordering takes place over a window
| + | modified to reflect the change in alleles. All other genotype fields are removed. |
− | of 10000 base pairs.
| |
| </div> | | </div> |
| | | |
Line 247: |
Line 246: |
| vt decompose gatk.vcf -o gatk.decomposed.vcf | | vt decompose gatk.vcf -o gatk.decomposed.vcf |
| | | |
− | #variants that are normalized will be annotated with an OLD_VARIANT info tag. | + | #before decomposition |
− | #CHROM POS ID REF ALT QUAL FILTER INFO | + | #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2 |
− | 19 29238772 . C G . PASS VT=SNP;OLD_VARIANT=19:29238771:TC/TG | + | 1 3759889 . TA TAA,TAAA,T . PASS AF=0.342,0.173,0.037 GT:DP:PL 1/2:81:281,5,9,58,0,115,338,46,116,809 0/0:86:0,30,323,31,365,483,38,291,325,567 |
− | 20 60674709 . GCCCAGCCCCAC G . PASS VT=INDEL;OLD_VARIANT=20:60674718:CACCCCAGCCCC/C
| |
| | | |
− | #this shows a sample output with the normalization operations that were used | + | #after decomposition |
− | #categorized into 5 categories each for biallelic and multiallelic variants. <br> | + | #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2 |
− | stats: biallelic | + | 1 3759889 . TA TAA . PASS AF=0.342,0.173,0.037;OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T GT:PL 1/.:281,5,9 0/0:0,30,323 |
− | no. left trimmed : 156908
| + | 1 3759889 . TA TAAA . . OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T GT:PL ./1:281,58,115 0/0:0,31,483 |
− | no. right trimmed : 323
| + | 1 3759889 . TA T . . OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T GT:PL ./.:281,338,809 0/0:0,38,567 |
− | no. left and right trimmed : 33
| |
− | no. right trimmed and left aligned : 7
| |
− | no. left aligned : 12360 <br>
| |
− | total no. biallelic normalized : 169631 <br> <br>
| |
− | multiallelic
| |
− | no. left trimmed : 627189
| |
− | no. right trimmed : 2509
| |
− | no. left and right trimmed : 1498
| |
− | no. right trimmed and left aligned : 212
| |
− | no. left aligned : 1783 <br>
| |
− | total no. multiallelic normalized : 633191 <br>
| |
− | total no. variants normalized : 802822
| |
− | total no. variants observed : 88052639
| |
| | | |
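The GT and PL values in the "after decomposition" records can be reproduced from the multiallelic record with the standard VCF genotype ordering, in which genotype j/k (j <= k) sits at index k*(k+1)/2 + j of the PL array. The following is a minimal Python sketch inferred from the example output, not vt's actual implementation; the function names `pl_index`, `decompose_pl`, and `decompose_gt` are hypothetical, and diploid genotypes are assumed.

```python
def pl_index(j, k):
    """Index of the unordered diploid genotype j/k (j <= k) in a PL/GL array."""
    return k * (k + 1) // 2 + j


def decompose_pl(pl, alt_index):
    """Keep only the PL entries for 0/0, 0/i and i/i, where i = alt_index."""
    i = alt_index
    return [pl[pl_index(0, 0)], pl[pl_index(0, i)], pl[pl_index(i, i)]]


def decompose_gt(gt, alt_index):
    """Recode a genotype for the biallelic record of ALT allele alt_index.

    REF stays 0, the chosen ALT becomes 1, and every other allele
    (including an already-missing one) becomes '.'.
    """
    sep = "|" if "|" in gt else "/"
    recoded = []
    for allele in gt.split(sep):
        if allele == "0":
            recoded.append("0")
        elif allele == str(alt_index):
            recoded.append("1")
        else:
            recoded.append(".")
    return sep.join(recoded)


# Sample S1 from the "before decomposition" record (ALT = TAA,TAAA,T).
s1_gt = "1/2"
s1_pl = [281, 5, 9, 58, 0, 115, 338, 46, 116, 809]

for alt_index in (1, 2, 3):
    print(decompose_gt(s1_gt, alt_index), decompose_pl(s1_pl, alt_index))
```

Run on sample S1, the loop yields the three GT:PL pairs shown for S1 in the decomposed records (1/. with 281,5,9; ./1 with 281,58,115; ./. with 281,338,809).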
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |