Difference between revisions of "Vmatch"
Revision as of 11:27, 16 January 2012
vmatch is a variant matching program for MNPs, INDELs and precise SVs in VCF files.
Basic Usage Example
vmatch <vcf-file-1> <vcf-file-2> -g <genome-file> -w <int> -d
Here is an example of how vmatch works:
vmatch 1000g.vcf got2d.vcf -g hg18.fa -w 10 -d
Command Line Options
vcf-file     VCF file
genome-file  Memory Mapped Sequence file
w            window size to detect overlaps between variants
d            debug option to generate a match.log file that gives all the matches made
Output
atks@fantasia:~/data/got2d$ vmatch got2d.wg4x.1511samples.73976indels.gatk.chr20.sites.vcf got2d.wg4x.1514samples.71994indels.samtools.chr20.sites.vcf -w 10 -d
VCF file A  : got2d.wg4x.1511samples.73976indels.gatk.chr20.sites.vcf
VCF file B  : got2d.wg4x.1514samples.71994indels.samtools.chr20.sites.vcf
Genome file : /net/fantasia/home/atks/ref/genome/human.g1k.v37.fa
Window Size : 10
SRSA        : 8578
SRSAN       : 34522
SRDA        : 2363
SRDNA       : 888
DRDA        : 2322
DRDNA       : 439
#A Records  : 73976
#B Records  : 71994
Match %tage for VCF file A
Level 1 (SRSA, SRSAN)                          : 58.2621
Level 2 (SRSA, SRSAN, SRDA, SRDNA)             : 62.6568
Level 3 (SRSA, SRSAN, SRDA, SRDNA, DRDA, DRDNA): 66.3891
Match %tage for VCF file B
Level 1 (SRSA, SRSAN)                          : 59.8661
Level 2 (SRSA, SRSAN, SRDA, SRDNA)             : 64.3818
Level 3 (SRSA, SRSAN, SRDA, SRDNA, DRDA, DRDNA): 68.2168
Matched variants written to match.txt
Match logs written to match.log

atks@fantasia:~/data/got2d$ head match.txt
id1  id2  match_type  extended_no_bases  normalized
A4   B1   SRSAN       0                  1
A5   B2   SRSAN       0                  1
A6   B4   SRSA        0                  0
A7   B5   SRSAN       0                  1
A8   B6   SRSA        0                  0
A9   B7   SRDA        0                  1
A10  B8   SRSAN       0                  1
A11  B9   SRSAN       0                  1
A12  B10  SRSAN       0                  1
Description
vmatch outputs 2 files:
match.txt : gives the matched pairs, with the columns
  1) id1
  2) id2
  3) match type
  4) extended no. of bases
  5) normalized
match.log : details of the extension and normalization process for all compared pairs
vmatch matches the variants in 2 VCF files by choosing the best match for every possible variant pair. The percentage of matches is reported at 3 levels, separately for each VCF file, relative to that file's total number of variant records.
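As an illustration of how match.txt might be post-processed, the following minimal Python sketch tallies the matched pairs by match type. It assumes the file is whitespace-delimited with the header row shown in the sample output (id1, id2, match_type, extended_no_bases, normalized); the function name tally_match_types is hypothetical and not part of vmatch. The embedded sample is the nine rows from the head match.txt listing above.

```python
import io
from collections import Counter


def tally_match_types(handle):
    """Count rows per match_type in a match.txt-style table.

    Assumes whitespace-delimited columns and a header row that
    contains a 'match_type' column (an assumption based on the
    sample output, not on vmatch's source code).
    """
    lines = (line for line in handle if line.strip())
    header = next(lines).split()
    col = header.index("match_type")
    return Counter(line.split()[col] for line in lines)


# The rows printed by `head match.txt` in the Output section.
sample = io.StringIO(
    "id1 id2 match_type extended_no_bases normalized\n"
    "A4 B1 SRSAN 0 1\n"
    "A5 B2 SRSAN 0 1\n"
    "A6 B4 SRSA 0 0\n"
    "A7 B5 SRSAN 0 1\n"
    "A8 B6 SRSA 0 0\n"
    "A9 B7 SRDA 0 1\n"
    "A10 B8 SRSAN 0 1\n"
    "A11 B9 SRSAN 0 1\n"
    "A12 B10 SRSAN 0 1\n"
)
sample_counts = tally_match_types(sample)
print(sample_counts)
```

To run it on a real file, pass an open file handle instead, for example tally_match_types(open("match.txt")).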
The 3 match levels (in order of decreasing strictness) are given as:
Level 1) SRSA - Same Position, same REF and ALT
Level 1) SRSAN - Same Position, same REF and ALT after normalization
Level 2) SRDA - Same Position, same REF and different ALT
Level 2) SRDNA - Same Position, same REF and different number of ALT
Level 3) DRDA - Same Position, different REF and different ALT
Level 3) DRDNA - Same Position, different REF and different number of ALT
Level 1 represents matches in position and alleles
Level 2 represents matches in position and reference alleles but different alternate alleles
Level 3 represents matches only in position
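The reported percentages appear to be cumulative: each level adds its match types to those of the stricter levels, and the sum is divided by the number of records in the file. This is inferred from the sample output rather than taken from the vmatch source, so treat the sketch below as an assumption. The names LEVELS and match_percentages are hypothetical; the counts are the ones printed in the Output section.

```python
# Match-type counts and record totals copied from the sample run above.
counts = {
    "SRSA": 8578, "SRSAN": 34522,
    "SRDA": 2363, "SRDNA": 888,
    "DRDA": 2322, "DRDNA": 439,
}
records_a = 73976
records_b = 71994

# Cumulative grouping of match types per level, as printed by vmatch.
LEVELS = {
    1: ("SRSA", "SRSAN"),
    2: ("SRSA", "SRSAN", "SRDA", "SRDNA"),
    3: ("SRSA", "SRSAN", "SRDA", "SRDNA", "DRDA", "DRDNA"),
}


def match_percentages(counts, n_records):
    """Return {level: percentage of records matched at that level}."""
    return {
        level: 100.0 * sum(counts[t] for t in types) / n_records
        for level, types in LEVELS.items()
    }


pct_a = match_percentages(counts, records_a)
pct_b = match_percentages(counts, records_b)
for level in LEVELS:
    print(f"Level {level}: A={pct_a[level]:.4f}  B={pct_b[level]:.4f}")
```

Under this assumption the computed values agree with the figures in the sample run, for example (8578 + 34522) / 73976 gives the Level 1 figure for file A.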
Download
For the current lfSingle, please go to our GLF Tools Website.