Difference between revisions of "Vmatch"

From Genome Analysis Wiki

Revision as of 11:29, 16 January 2012

vmatch is a variant matching program for MNPs, INDELs and precise SVs in VCF files.

Basic Usage Example

 vmatch <vcf-file-1> <vcf-file-2> -g <genome-file> -w <int> -d

An example invocation:

  vmatch 1000g.vcf got2d.vcf -g hg18.fa -w 10 -d

Command Line Options

   vcf-file       VCF file
   genome-file    Memory Mapped Sequence file
   -w             window size to detect overlaps between variants
   -d             debug option to generate a match.log file that gives all the matches made

Output

  atks@fantasia:~/data/got2d$ vmatch got2d.wg4x.1511samples.73976indels.gatk.chr20.sites.vcf got2d.wg4x.1514samples.71994indels.samtools.chr20.sites.vcf -w 10 -d
  VCF file A  : got2d.wg4x.1511samples.73976indels.gatk.chr20.sites.vcf
  VCF file B  : got2d.wg4x.1514samples.71994indels.samtools.chr20.sites.vcf
  Genome file : /net/fantasia/home/atks/ref/genome/human.g1k.v37.fa
  Window Size : 10
  SRSA  : 8578
  SRSAN : 34522
  SRDA  : 2363
  SRDNA : 888
  DRDA  : 2322
  DRDNA : 439
  #A Records : 73976
  #B Records : 71994
  Match %tage for VCF file A
  Level 1 (SRSA, SRSAN)                          : 58.2621
  Level 2 (SRSA, SRSAN, SRDA, SRDNA)             : 62.6568
  Level 3 (SRSA, SRSAN, SRDA, SRDNA, DRDA, DRDNA): 66.3891
  Match %tage for VCF file B
  Level 1 (SRSA, SRSAN)                          : 59.8661
  Level 2 (SRSA, SRSAN, SRDA, SRDNA)             : 64.3818
  Level 3 (SRSA, SRSAN, SRDA, SRDNA, DRDA, DRDNA): 68.2168
  Matched variants written to match.txt
  Match logs written to match.log

  atks@fantasia:~/data/got2d$ head match.txt
  id1 id2 match_type extended_no_bases normalized
  A4 B1 SRSAN 0 1
  A5 B2 SRSAN 0 1
  A6 B4 SRSA 0 0
  A7 B5 SRSAN 0 1
  A8 B6 SRSA 0 0
  A9 B7 SRDA 0 1
  A10 B8 SRSAN 0 1
  A11 B9 SRSAN 0 1
  A12 B10 SRSAN 0 1
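The match.txt listing above can be summarized with a few lines of Python. This is only a sketch, not part of vmatch: it assumes the columns are whitespace-separated and that the first line is the header shown above. The function name tally_match_types and the SAMPLE string (copied from the head output) are illustrative.

```python
from collections import Counter

# Sample rows copied from the `head match.txt` output above.
SAMPLE = """id1 id2 match_type extended_no_bases normalized
A4 B1 SRSAN 0 1
A5 B2 SRSAN 0 1
A6 B4 SRSA 0 0
A7 B5 SRSAN 0 1
A8 B6 SRSA 0 0
A9 B7 SRDA 0 1
A10 B8 SRSAN 0 1
A11 B9 SRSAN 0 1
A12 B10 SRSAN 0 1
"""


def tally_match_types(lines):
    """Count how often each match_type occurs in match.txt-style lines.

    Assumption: fields are separated by whitespace and the first line
    is a header that contains a `match_type` column.
    """
    rows = iter(lines)
    header = next(rows).split()
    type_col = header.index("match_type")
    counts = Counter()
    for line in rows:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        counts[fields[type_col]] += 1
    return counts


if __name__ == "__main__":
    for match_type, n in tally_match_types(SAMPLE.splitlines()).most_common():
        print(match_type, n)
```

To run it on a real file, pass an open file handle instead of the sample, for example `tally_match_types(open("match.txt"))`.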

Description

   Outputs 2 files
     match.txt : gives the matched pairs
                 1)id1
                 2)id2
                 3)match type
                 4)extended no of bases
                 5)normalized
     match.log : Details of the extension and normalization process for all compared pairs
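   vmatch matches the variants in 2 VCF files by choosing the best match for every
   possible variant pair.  The percentage of matches is given at 3 levels, separately
   for each VCF file, as a fraction of the total number of records in that file.
   Each level also includes the match types of the stricter levels before it.
   The match types, grouped into 3 levels in order of decreasing strictness, are: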
      Level 1) SRSA    - Same Position, same REF and ALT
      Level 1) SRSAN   - Same Position, same REF and ALT after normalization
      Level 2) SRDA    - Same Position, same REF and different ALT
      Level 2) SRDNA   - Same Position, same REF and different number of ALT
      Level 3) DRDA    - Same Position, different REF and different ALT
      Level 3) DRDNA   - Same Position, different REF and different number of ALT
 
      Level 1 represents matches in position and alleles
      Level 2 represents matches in position and reference alleles but different alternate alleles
      Level 3 represents matches only in position
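The level percentages can be reproduced from the match-type counts. The sketch below is not vmatch's own code. It assumes each level is cumulative (it adds the match types of the stricter levels) and that the denominator is the record count of the file in question. Both assumptions are consistent with the numbers in the Output section. The names LEVEL_TYPES and match_percentages are illustrative.

```python
# Match types that each level adds, in order of decreasing strictness.
LEVEL_TYPES = {
    1: ("SRSA", "SRSAN"),
    2: ("SRDA", "SRDNA"),
    3: ("DRDA", "DRDNA"),
}


def match_percentages(counts, total_records):
    """Return {level: cumulative match percentage} for one VCF file.

    Assumption: level N counts every match type of levels 1..N,
    divided by the total number of records in that file.
    """
    percentages = {}
    matched = 0
    for level in sorted(LEVEL_TYPES):
        matched += sum(counts.get(t, 0) for t in LEVEL_TYPES[level])
        percentages[level] = 100.0 * matched / total_records
    return percentages


if __name__ == "__main__":
    # Counts and record totals taken from the Output section.
    counts = {"SRSA": 8578, "SRSAN": 34522, "SRDA": 2363,
              "SRDNA": 888, "DRDA": 2322, "DRDNA": 439}
    for name, total in (("A", 73976), ("B", 71994)):
        for level, pct in match_percentages(counts, total).items():
            print(f"VCF file {name} Level {level}: {pct:.4f}")
```

The same matched counts give different percentages for file A and file B only because the two files hold different numbers of records.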

Download

For the current lfSingle, please go to our GLF Tools Website.