Changes

From Genome Analysis Wiki
5,118 bytes added, 01:03, 2 March 2018
Line 54: Line 54:  
   </div>
 
   </div>
   −
Building has been tested on Linux and Mac systems on gcc 4.8.1 and clang 3.4. <br>
+
=== Mac ===
Some features of C++11 is used, thus there is a need for newer versions of gcc and clang.
+
 
 +
  You will need to install the package xz prior to installing vt.
   −
== Mac ==
+
  brew install xz
   −
You may also install vt on mac via homebrew.
     −
  brew install homebrew/science/vt
+
Building has been tested on Linux and Mac systems on gcc 4.8.1 and clang 3.4. <br>
 +
Some features of C++11 are used, thus there is a need for newer versions of gcc and clang.
    
= Updating =
 
= Updating =
Line 719: Line 720:     
<div class=" mw-collapsible mw-collapsed">
 
<div class=" mw-collapsible mw-collapsed">
   #converts in.bcf to tab format with selected INFO fields
+
   #converts in.bcf to tab format with selected INFO and FILTER fields
   vt info2tab in.bcf -v -t EX_RL,FZ_RL,MDUST,LOBSTR,VNTRSEEK,RMSK,EX_REPEAT_TRACT
+
   vt info2tab in.bcf -u PASS -t EX_RL,FZ_RL,MDUST,LOBSTR,VNTRSEEK,RMSK,EX_REPEAT_TRACT
 
   
   <div style="height:6em; overflow:auto; border: 2px solid #FFF">
 
   <div style="height:6em; overflow:auto; border: 2px solid #FFF">
 +
  INPUT
 +
  =====
 
   20 17548608 . A AC . PASS CENTERS=vbi;NCENTERS=1;OLD_MULTIALLELIC=20:17548598:GAAAAAAAAAAAAA/GAAAAAAAAAAAA/GAAAAAAAAAAAAAA/GAAAAAAAAAA/GAAAAAAAAAAA/GAAAAAAAAAACAAA;OLD_VARIANT=20:17548598:GAAAAAAAAAAAAAG/GAAAAAAAAAACAAAG;EX_MOTIF=C;EX_MLEN=1;EX_RU=C;EX_BASIS=C;EX_BLEN=1;EX_REPEAT_TRACT=17548608,17548609;EX_COMP=100,0,0,0;EX_ENTROPY=0;EX_ENTROPY2=0;EX_KL_DIVERGENCE=2;EX_KL_DIVERGENCE2=4;EX_REF=2;EX_RL=2;EX_LL=3;EX_RU_COUNTS=0,2;EX_SCORE=0;EX_TRF_SCORE=-14;FZ_MOTIF=A;FZ_MLEN=1;FZ_RU=A;FZ_BASIS=A;FZ_BLEN=1;FZ_REPEAT_TRACT=17548599,17548611;FZ_COMP=100,0,0,0;FZ_ENTROPY=0;FZ_ENTROPY2=0;FZ_KL_DIVERGENCE=2;FZ_KL_DIVERGENCE2=4;FZ_REF=13;FZ_RL=13;FZ_LL=14;FZ_RU_COUNTS=13,13;FZ_SCORE=1;FZ_TRF_SCORE=26;FLANKSEQ=GAAAAAAAAA[A]AAAGAAGGAA;MDUST;LOBSTR
 
   20 17548608 . A AC . PASS CENTERS=vbi;NCENTERS=1;OLD_MULTIALLELIC=20:17548598:GAAAAAAAAAAAAA/GAAAAAAAAAAAA/GAAAAAAAAAAAAAA/GAAAAAAAAAA/GAAAAAAAAAAA/GAAAAAAAAAACAAA;OLD_VARIANT=20:17548598:GAAAAAAAAAAAAAG/GAAAAAAAAAACAAAG;EX_MOTIF=C;EX_MLEN=1;EX_RU=C;EX_BASIS=C;EX_BLEN=1;EX_REPEAT_TRACT=17548608,17548609;EX_COMP=100,0,0,0;EX_ENTROPY=0;EX_ENTROPY2=0;EX_KL_DIVERGENCE=2;EX_KL_DIVERGENCE2=4;EX_REF=2;EX_RL=2;EX_LL=3;EX_RU_COUNTS=0,2;EX_SCORE=0;EX_TRF_SCORE=-14;FZ_MOTIF=A;FZ_MLEN=1;FZ_RU=A;FZ_BASIS=A;FZ_BLEN=1;FZ_REPEAT_TRACT=17548599,17548611;FZ_COMP=100,0,0,0;FZ_ENTROPY=0;FZ_ENTROPY2=0;FZ_KL_DIVERGENCE=2;FZ_KL_DIVERGENCE2=4;FZ_REF=13;FZ_RL=13;FZ_LL=14;FZ_RU_COUNTS=13,13;FZ_SCORE=1;FZ_TRF_SCORE=26;FLANKSEQ=GAAAAAAAAA[A]AAAGAAGGAA;MDUST;LOBSTR
 
   20 17548608 . AAAAG A . PASS CENTERS=ox1;NCENTERS=1;EX_MOTIF=AAAG;EX_MLEN=4;EX_RU=AAAG;EX_BASIS=AG;EX_BLEN=2;EX_REPEAT_TRACT=17548609,17548612;EX_COMP=100,0,0,0;EX_ENTROPY=0;EX_ENTROPY2=0;EX_KL_DIVERGENCE=2;EX_KL_DIVERGENCE2=4;EX_REF=0.75;EX_RL=4;EX_LL=4;EX_RU_COUNTS=0,1;EX_SCORE=0.75;EX_TRF_SCORE=-1;FZ_MOTIF=A;FZ_MLEN=1;FZ_RU=A;FZ_BASIS=A;FZ_BLEN=1;FZ_REPEAT_TRACT=17548599,17548611;FZ_COMP=100,0,0,0;FZ_ENTROPY=0;FZ_ENTROPY2=0;FZ_KL_DIVERGENCE=2;FZ_KL_DIVERGENCE2=4;FZ_REF=13;FZ_RL=13;FZ_LL=13;FZ_RU_COUNTS=13,13;FZ_SCORE=1;FZ_TRF_SCORE=26;FLANKSEQ=GAAAAAAAAA[AAAAG]AAGGAACTAC;MDUST;LOBSTR;OLD_VARIANT=20:17548598:GAAAAAAAAAAAAAG/GAAAAAAAAAA
 
   20 17548608 . AAAAG A . PASS CENTERS=ox1;NCENTERS=1;EX_MOTIF=AAAG;EX_MLEN=4;EX_RU=AAAG;EX_BASIS=AG;EX_BLEN=2;EX_REPEAT_TRACT=17548609,17548612;EX_COMP=100,0,0,0;EX_ENTROPY=0;EX_ENTROPY2=0;EX_KL_DIVERGENCE=2;EX_KL_DIVERGENCE2=4;EX_REF=0.75;EX_RL=4;EX_LL=4;EX_RU_COUNTS=0,1;EX_SCORE=0.75;EX_TRF_SCORE=-1;FZ_MOTIF=A;FZ_MLEN=1;FZ_RU=A;FZ_BASIS=A;FZ_BLEN=1;FZ_REPEAT_TRACT=17548599,17548611;FZ_COMP=100,0,0,0;FZ_ENTROPY=0;FZ_ENTROPY2=0;FZ_KL_DIVERGENCE=2;FZ_KL_DIVERGENCE2=4;FZ_REF=13;FZ_RL=13;FZ_LL=13;FZ_RU_COUNTS=13,13;FZ_SCORE=1;FZ_TRF_SCORE=26;FLANKSEQ=GAAAAAAAAA[AAAAG]AAGGAACTAC;MDUST;LOBSTR;OLD_VARIANT=20:17548598:GAAAAAAAAAAAAAG/GAAAAAAAAAA
   
   </div>
 
   </div>
 
+
  OUTPUT
   CHROM POS   REF   ALT N_ALLELE  EX_RL  FZ_RL MDUST LOBSTR VNTRSEEK  RMSK EX_REPEAT_TRACT_1 EX_REPEAT_TRACT_2
+
  ======
   20 17548608  A   AC 2        2 13 1 1 0   0    17548608                17548608
+
   CHROM POS   REF   ALT N_ALLELE PASS EX_RL  FZ_RL MDUST LOBSTR VNTRSEEK  RMSK EX_REPEAT_TRACT_1 EX_REPEAT_TRACT_2
   20 17548608  AAAAG  A 2        4      13     1 1      0        0    17548609                17548609
+
   20 17548608  A   AC 2        1    2     13 1 1 0   0    17548608                17548608
 +
   20 17548608  AAAAG  A 2        1    4      13       1       1      0        0    17548609                17548609
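
The mapping from an INFO string to the tab columns shown above can be sketched in a few lines of Python. This is only an illustration, not vt's implementation: the helper name info_to_row is hypothetical, and the rule that a bare flag becomes 1 and an absent tag becomes 0 is an assumption read off the sample output. Multi-valued tags such as EX_REPEAT_TRACT are not modelled.

```python
def info_to_row(info, tags):
    """Return one tab-delimited row holding the requested INFO tags.

    Assumption (inferred from the sample output, not from vt's source):
    KEY=VALUE yields VALUE, a bare flag yields 1, an absent tag yields 0.
    """
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # A bare flag has no "=", so sep is empty and the flag is recorded as "1".
        fields[key] = value if sep else "1"
    return "\t".join(fields.get(tag, "0") for tag in tags)


# Shortened INFO string based on the first input record above.
info = "CENTERS=vbi;NCENTERS=1;EX_RL=2;FZ_RL=13;MDUST;LOBSTR"
tags = ["EX_RL", "FZ_RL", "MDUST", "LOBSTR", "VNTRSEEK", "RMSK"]
row = info_to_row(info, tags)
print(row)
```

For the shortened record this gives the same EX_RL, FZ_RL, MDUST, LOBSTR, VNTRSEEK and RMSK values as the first output row.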
    
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
 
   usage : vt info2tab [options] <in.vcf>
 
   usage : vt info2tab [options] <in.vcf>
 
    
 
    
   options : -v  print variant CHROM,POS,REF,ALT,N_ALLELE [false]
+
   options : -d  debug [false]
            -d  debug [false]
   
             -f  filter expression []
 
             -f  filter expression []
             -t  list of info tags to be extracted []
+
             -u  list of filter tags to be extracted []
             -t  list of info tags to be extracted []
 
             -o  output tab delimited file [-]
 
             -o  output tab delimited file [-]
 
             -I  file containing list of intervals []
 
             -I  file containing list of intervals []
Line 1,056: Line 1,057:  
</div>
 
</div>
   −
=== Profile SNPs ===
+
=== Profile Mendelian Errors ===
   −
Profile SNPs.  The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].
+
Profile Mendelian errors
    
<div class=" mw-collapsible mw-collapsed">
 
<div class=" mw-collapsible mw-collapsed">
   #profile snps found in 20.sites.vcf
+
   #profile mendelian errors found in vt.genotypes.bcf, generate [[media:mendel.pdf|tables]] in the directory mendel, requires pdflatex.
   vt profile_snps -g snp.reference.txt 20.sites.vcf -r hs37d5.fa  -i 20
+
   vt profile_mendelian vt.genotypes.bcf -p trios.ped -x mendel
 +
 
 +
  pedigree file format is described in [[Vt#Pedigree File|here]].
   −
  #this is a sample output for indel profiling.
+
  #this is a sample output for mendelian error profiling.
  # square brackets contain the ts/tv ratio.
+
  #R and A stand for reference and alternate allele respectively.
  # The numbers in curved bracket are the counts of ts and tv SNPs respectively.
+
  #Error% - mendelian error (confounded with de novo mutation)
  # Low complexity shows what percent of the SNPs are in low complexity regions.
+
  #HomHet - Homozygous-Heterozygous genotype ratios
   data set
+
  #Het% - proportion of hets
    No. SNPs         :     508603 [2.09]
+
  Mendelian Errors <br>
         Low complexity :      0.08 (39837/508603) <br>
+
   Father Mother      R/R          R/A          A/A    Error(%) HomHet    Het(%)
  1000g
+
  R/R    R/R        14889         210          38     1.64      nan    nan
    A-B     109970 [1.39]
+
  R/R    R/A         3403        3497          74    1.06      0.97  50.68
     A&B     398633 [2.37]
+
  R/R    A/A          176        1482          155    18.26      nan    nan
    B-A    1340682 [2.26]
+
  R/A   R/R        3665        3652          68     0.92      1.00  49.91
     Precision   78.4%
+
  R/A    R/A        1015        3151          990     0.00      0.64  61.11
     Sensitivity  22.9% <br>
+
  R/A    A/A          43        1300        1401     1.57      1.08  48.13
  dbsnp
+
  A/A    R/R          172        1365          147    18.94      nan    nan
     A-B     324063 [1.99]
+
  A/A    R/A          47        1164        1183     1.96      1.02  49.60
     A&B     184540 [2.29]
+
   A/A    A/A          20          78         5637     1.71      nan    nan <br>
     B-A    103893 [2.60]
+
  Parental            R/R          R/A          A/A    Error(%) HomHet    Het(%)
     Precision   36.3%
+
  R/R    R/R        14889          210          38     1.64      nan    nan
     Sensitivity 64.0%
+
  R/R    R/A         7068        7149          142     0.99      0.99 50.28
 +
  R/R    A/A          348        2847          302    18.59      nan    nan
 +
  R/A    R/A        1015        3151          990     0.00      0.64  61.11
 +
  R/A    A/A           90        2464        2584     1.75      1.05  48.81
 +
  A/A    A/A          20          78        5637     1.71      nan    nan  <br>
 +
  Parental            R/R          R/A          A/A   Error(%) HomHet    Het(%)
 +
  HOM    HOM        14909          288        5675     1.66      nan    nan
 +
  HOM    HET        7158        9613        2726     1.19      1.00  49.90
 +
  HET   HET        1015        3151          990    0.00      0.64  61.11
 +
  HOMREF HOMALT      348        2847          302    18.59      nan    nan  <br>
 +
  total mendelian error :  2.505%  
 +
  no. of trios     : 2
 +
  no. of variants : 25346
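
The Error(%) column and the total mendelian error in the sample output can be reproduced from the genotype counts alone. The Python sketch below is a worked check, not vt's code. It assumes that an error is any child genotype that cannot be formed by taking one allele from each parent, and that the total is the sum of errors over the sum of all counts. Both assumptions are inferred from the numbers in the first table.

```python
# (father, mother) -> child genotype counts for R/R, R/A, A/A,
# copied from the first table of the sample output.
counts = {
    ("R/R", "R/R"): (14889, 210, 38),
    ("R/R", "R/A"): (3403, 3497, 74),
    ("R/R", "A/A"): (176, 1482, 155),
    ("R/A", "R/R"): (3665, 3652, 68),
    ("R/A", "R/A"): (1015, 3151, 990),
    ("R/A", "A/A"): (43, 1300, 1401),
    ("A/A", "R/R"): (172, 1365, 147),
    ("A/A", "R/A"): (47, 1164, 1183),
    ("A/A", "A/A"): (20, 78, 5637),
}


def error_count(parents):
    """Count children whose genotype the two parents cannot produce."""
    father, mother = parents
    # Number of A alleles a child can carry: one allele from each parent.
    possible = {
        (f == "A") + (m == "A")
        for f in father.split("/")
        for m in mother.split("/")
    }
    return sum(n for alt, n in enumerate(counts[parents]) if alt not in possible)


def error_pct(parents):
    """Mendelian error percentage for one parental genotype combination."""
    return 100.0 * error_count(parents) / sum(counts[parents])


total_errors = sum(error_count(p) for p in counts)
total_genotypes = sum(sum(row) for row in counts.values())
total_error_pct = 100.0 * total_errors / total_genotypes

print(round(error_pct(("R/R", "A/A")), 2))
print(round(total_error_pct, 3))
```

Under these assumptions the sketch agrees with the 18.26 reported for the R/R x A/A row and with the total of 2.505%.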
 +
 
 +
<div class="mw-collapsible-content">
 +
profile_mendelian v0.5
   −
   # This file contains information on how to process reference data sets.
+
   usage : vt profile_mendelian [options] <in.vcf>
  #
+
 
   # dataset - name of data set, this label will be printed.
+
   options : -q  minimum genotype quality
  # type    - True Positives (TP) and False Positives (FP)
+
            -d  minimum depth
  #          overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively
+
            -r  reference sequence fasta file []
  #        - annotation
+
            -x output latex directory []
  #          file is used for GENCODE annotation of frame shift and non frame shift Indels
+
            -p  pedigree file
  # filter - filter applied to variants for this particular data set
+
            -I  file containing list of intervals []
  # path    - path of indexed BCF file
+
             -i  intervals
  #dataset              type             filter                                path
+
          -?  displays help
  1000g                  TP              N_ALLELE==2&&VTYPE==SNP                /net/fantasia/home/atks/ref/vt/grch37/1000G.v5.snps.indels.complex.svs.sites.bcf
+
</div>
  dbsnp                  TP              N_ALLELE==2&&VTYPE==SNP                /net/fantasia/home/atks/ref/vt/grch37/dbSNP138.snps.indels.complex.sites.bcf
+
</div>
  GENCODE_V19            cds_annotation  .                                      /net/fantasia/home/atks/ref/vt/grch37/gencode.v19.cds.bed.gz
+
 
  DUST                  cplx_annotation  .                                      /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz
+
=== Profile SNPs ===
   −
<div class="mw-collapsible-content">
+
Profile SNPs.  The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].
  usage : vt profile_snps [options] <in.vcf>
     −
  options : -f  filter expression []
+
<div class=" mw-collapsible mw-collapsed">
            -g file containing list of reference datasets []
+
  #profile snps found in 20.sites.vcf
            -I file containing list of intervals []
+
  vt profile_snps -g snp.reference.txt 20.sites.vcf -r hs37d5.fa -i 20
            -i intervals []
  −
            -r  reference sequence fasta file []
  −
            -?  displays help
  −
</div>
  −
</div>
     −
=== Profile Indels ===
+
  #this is a sample output for SNP profiling.
 
+
  # square brackets contain the ts/tv ratio.   
Profile Indels.  The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].
+
  # The numbers in curved bracket are the counts of ts and tv SNPs respectively.
 +
  # Low complexity shows what percent of the SNPs are in low complexity regions.
 +
  data set
 +
    No. SNPs          :    508603 [2.09]
 +
        Low complexity :      0.08 (39837/508603) <br>
 +
  1000g
 +
    A-B    109970 [1.39]
 +
    A&B    398633 [2.37]
 +
    B-A    1340682 [2.26]
 +
    Precision    78.4%
 +
    Sensitivity  22.9% <br>
 +
  dbsnp
 +
    A-B    324063 [1.99]
 +
    A&B    184540 [2.29]
 +
    B-A    103893 [2.60]
 +
    Precision    36.3%
 +
    Sensitivity  64.0%
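
The Precision and Sensitivity figures follow from the three overlap counts, where A is the input call set and B is the reference data set. The Python sketch below is a worked check, not vt's code. The function name overlap_stats is hypothetical, and the formulas are inferred from the sample numbers: precision is A&B over all of A, sensitivity is A&B over all of B.

```python
def overlap_stats(a_minus_b, a_and_b, b_minus_a):
    """Return (precision %, sensitivity %) rounded to one decimal place.

    a_minus_b: variants only in the input call set (A-B)
    a_and_b:   variants shared by both sets (A&B)
    b_minus_a: variants only in the reference data set (B-A)
    """
    precision = 100.0 * a_and_b / (a_minus_b + a_and_b)
    sensitivity = 100.0 * a_and_b / (a_and_b + b_minus_a)
    return round(precision, 1), round(sensitivity, 1)


# Counts copied from the sample output above.
stats_1000g = overlap_stats(109970, 398633, 1340682)
stats_dbsnp = overlap_stats(324063, 184540, 103893)
print(stats_1000g)
print(stats_dbsnp)
```

Note that A-B plus A&B equals 508603 for both data sets, which is the No. SNPs total at the top of the output.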
   −
<div class=" mw-collapsible mw-collapsed">
+
  # This file contains information on how to process reference data sets.
  #profile indels found in mills.vcf
+
  #
  vt profile_indels -g indel.reference.txt mills.vcf -r hs37d5.fa  -i 20
+
   # dataset - name of data set, this label will be printed.
 
+
   # type    - True Positives (TP) and False Positives (FP)
   #this is a sample output for indel profiling.
+
   #           overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively
   # square brackets contain the ins/del ratio. 
+
  #        - annotation
   # for the FS/NFS field, that is the proportion of coding indels that are frame shifted. 
+
   #           file is used for GENCODE annotation of frame shift and non frame shift Indels
   # The numbers in curved bracket are the counts of frame shift and non frame shift indels respectively.
+
   # filter  - filter applied to variants for this particular data set  
   data set
+
  # path    - path of indexed BCF file
    No Indels :      46974 [0.89]
+
  #dataset              type            filter                                path
      FS/NFS :      0.26 (8/23) <br>
+
  1000g                  TP              N_ALLELE==2&&VTYPE==SNP                /net/fantasia/home/atks/ref/vt/grch37/1000G.v5.snps.indels.complex.svs.sites.bcf
  dbsnp
+
   dbsnp                  TP              N_ALLELE==2&&VTYPE==SNP                /net/fantasia/home/atks/ref/vt/grch37/dbSNP138.snps.indels.complex.sites.bcf
    A-B      30704 [0.92]
+
   GENCODE_V19            cds_annotation   .                                     /net/fantasia/home/atks/ref/vt/grch37/gencode.v19.cds.bed.gz
    A&B      16270 [0.83]
+
   DUST                  cplx_annotation  .                                     /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz
    B-A    2049488 [1.52]
+
 
    Precision    34.6%
+
<div class="mw-collapsible-content">
    Sensitivity   0.8% <br>
+
  usage : vt profile_snps [options] <in.vcf>
  mills
  −
    A-B      43234 [0.88]
  −
    A&B      3740 [1.00]
  −
    B-A    203278 [0.98]
  −
    Precision    8.0%
  −
    Sensitivity   1.8% <br>
  −
   mills.chip
  −
    A-B      46847 [0.89]
  −
    A&B        127 [0.90]
  −
    B-A      8777 [0.93]
  −
    Precision    0.3%
  −
    Sensitivity   1.4% <br>
  −
  affy.exome.chip
  −
    A-B      46911 [0.89]
  −
    A&B        63 [0.43]
  −
    B-A      33997 [0.47]
  −
    Precision    0.1%
  −
    Sensitivity  0.2% <br>
     −
   # This file contains information on how to process reference data sets.
+
   options : -f filter expression []
  # dataset - name of data set, this label will be printed.
+
            -g  file containing list of reference datasets []
  # type    - True Positives (TP) and False Positives (FP).
+
             -I  file containing list of intervals []
  #          overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively.
+
             -i  intervals []
  #        - annotation.
+
             -r  reference sequence fasta file []
  #          file is used for GENCODE annotation of frame shift and non frame shift Indels.
+
             -?  displays help
  # filter - filter applied to variants for this particular data set.
  −
  # path    - path of indexed BCF file.
  −
  #dataset    type            filter                       path
  −
  1000g        TP              N_ALLELE==2&&VTYPE==INDEL    /net/fantasia/home/atks/ref/vt/grch37/1000G.snps_indels.sites.bcf
  −
  mills        TP              N_ALLELE==2&&VTYPE==INDEL    /net/fantasia/home/atks/ref/vt/grch37/mills.208620indels.sites.bcf
  −
  dbsnp        TP              N_ALLELE==2&&VTYPE==INDEL    /net/fantasia/home/atks/ref/vt/grch37/dbsnp.13147541variants.sites.bcf
  −
  GENCODE_V19  cds_annotation  .                            /net/fantasia/home/atks/ref/vt/grch37/gencode.cds.bed.gz
  −
  DUST        cplx_annotation .                            /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz
  −
 
  −
<div class="mw-collapsible-content">
  −
  usage : vt profile_indels [options] <in.vcf>
  −
 
  −
  options : -g  file containing list of reference datasets []
  −
             -I  file containing list of intervals []
  −
             -i  intervals []
  −
             -r  reference sequence fasta file []
  −
             -?  displays help
   
  </div>
 
  </div>
 
</div>
 
</div>
   −
=== Profile VNTRs ===
+
=== Profile Indels ===
   −
Profile VNTRs.  The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].
+
Profile Indels.  The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].
    
<div class=" mw-collapsible mw-collapsed">
 
<div class=" mw-collapsible mw-collapsed">
 +
  #profile indels found in mills.vcf
 +
  vt profile_indels -g indel.reference.txt mills.vcf -r hs37d5.fa  -i 20
   −
   #profiles a set of VNTRs
+
   #this is a sample output for indel profiling.
   vt profile_vntrs vntrs.sites.bcf -g vntr.reference.txt
+
   # square brackets contain the ins/del ratio.
    
+
   # for the FS/NFS field, that is the proportion of coding indels that are frame shifted.
 
+
   # The numbers in curved bracket are the counts of frame shift and non frame shift indels respectively.
  profile_vntrs v0.5
+
  data set
    
+
     No Indels :      46974 [0.89]
    no VNTRs          5660874          #number of VNTRs in vntrs.sites.bcf
+
      FS/NFS :       0.26 (8/23) <br>
    no low complexity  2686460 (47.46%)  #number of VNTRs in low complexity region determined by MDUST
+
   dbsnp
     no coding          17911 (0.32%)    #number of VNTRs in coding regions determined by GENCODE v7
+
     A-B     30704 [0.92]
    no redundant       1312209 (23.18%) #number of VNTRs involved in overlapping with one another<br>
+
     A&B     16270 [0.83]
   trf_lobstr (1638516)  #TRF based reference set used in lobSTR, motif lengths 1 to 6.
+
     B-A   2049488 [1.52]
     A-B     3269285    #TRs specific to vntrs.sites.bcf
+
     Precision    34.6%
     A-B~    1666185    #TRs in vntrs.sites.bcf that overlap partially with at least one TR in TRF(lobSTR) but does not overlap exactly with another TR.
+
     Sensitivity   0.8% <br>
     A&B1    725404    #TRs in vntrs.sites.bcf that overlap exactly with at least one TR in TRF(lobSTR)
+
   mills
     A&B2    723195    #TRs in TRF(lobSTR) that overlap exactly with at least one TR in vntrs.sites.bcf
+
     A-B     43234 [0.88]
     B-A~    710075    #TRs in TRF(lobSTR) that overlap partially with at least one TR in vntrs.sites.bcf but does not overlap exactly with another TR.
+
     A&B       3740 [1.00]
    B-A      205246    #TRs specific to TRF(lobSTR)
+
     B-A    203278 [0.98]
   #note that the first 3 rows should sum up to the number of TRs in vntrs.sites.bcf
+
     Precision     8.0%
  #and the 4th to 6th rows should sum up to the number of TRs in TRF( lobSTR)
+
     Sensitivity  1.8% <br>
  #This basically allows us to see the m to n overlapping in overlapping TRs<br>
+
   mills.chip
   trf_repeatseq (1624553) #TRF based reference set used in repeatseq, motif lengths 1 to 6.
+
     A-B     46847 [0.89]
     A-B     3291652
+
     A&B       127 [0.90]
     A-B~    1650190
+
     B-A      8777 [0.93]
     A&B1     719032
+
     Precision     0.3%
     A&B2     716838
+
     Sensitivity  1.4% <br>
     B-A~    703948
+
   affy.exome.chip
    B-A      203767  <br>
+
     A-B     46911 [0.89]
   trf_vntrseek (230306)  #TRF based reference set used in vntrseek, motif lengths 7 to 2000.
+
     A&B        63 [0.43]
     A-B     5384453
+
     B-A     33997 [0.47]
     A-B~    271302
+
     Precision    0.1%
     A&B1       5119
+
     Sensitivity  0.2% <br>
     A&B2      4973
  −
     B-A~      92496
  −
     B-A      132837  <br>
  −
   codis+ (15)            #CODIS STRs + 2 STRs from PROMEGA
  −
     A-B     5660794
  −
     A-B~         79
  −
     A&B1          1
  −
     A&B2          1  
  −
     B-A~        14
  −
    B-A          0  
      
   # This file contains information on how to process reference data sets.
 
   # This file contains information on how to process reference data sets.
Line 1,230: Line 1,215:  
   #          overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively.
 
   #          overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively.
 
   #        - annotation.
 
   #        - annotation.
   #          file is used for GENCODE annotation of coding VNTRs.
+
   #          file is used for GENCODE annotation of frame shift and non frame shift Indels.
 
   # filter  - filter applied to variants for this particular data set.
 
   # filter  - filter applied to variants for this particular data set.
 
   # path    - path of indexed BCF file.
 
   # path    - path of indexed BCF file.
   #dataset     type            filter                      path
+
   #dataset     type            filter                      path
   trf_lobstr    TP              VTYPE==VNTR                  /net/fantasia/home/atks/ref/vt/grch37/trf.lobstr.sites.bcf
+
   1000g        TP              N_ALLELE==2&&VTYPE==INDEL    /net/fantasia/home/atks/ref/vt/grch37/1000G.snps_indels.sites.bcf
   trf_repeatseq TP              VTYPE==VNTR                  /net/fantasia/home/atks/ref/vt/grch37/trf.repeatseq.sites.bcf
+
   mills        TP              N_ALLELE==2&&VTYPE==INDEL    /net/fantasia/home/atks/ref/vt/grch37/mills.208620indels.sites.bcf
   trf_vntrseek  TP              VTYPE==VNTR                  /net/fantasia/home/atks/ref/vt/grch37/trf.vntrseek.sites.bcf
+
   dbsnp        TP              N_ALLELE==2&&VTYPE==INDEL    /net/fantasia/home/atks/ref/vt/grch37/dbsnp.13147541variants.sites.bcf
   codis+        TP              VTYPE==VNTR                  /net/fantasia/home/atks/ref/vt/grch37/codis.strs.sites.bcf
+
   GENCODE_V19  cds_annotation  .                            /net/fantasia/home/atks/ref/vt/grch37/gencode.cds.bed.gz
   GENCODE_V19  cds_annotation  .                            /net/fantasia/home/atks/ref/vt/grch37/gencode.v19.cds.bed.gz
+
   DUST        cplx_annotation .                            /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz
  DUST          cplx_annotation .                             
      
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
   usage : vt profile_vntrs [options] <in.vcf>
+
   usage : vt profile_indels [options] <in.vcf>
    
   options : -g  file containing list of reference datasets []
 
   options : -g  file containing list of reference datasets []
Line 1,252: Line 1,236:  
</div>
 
</div>
   −
=== Profile Mendelian Errors ===
+
=== Profile VNTRs ===
   −
Profile Mendelian errors
+
Profile VNTRs.  The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].
    
<div class=" mw-collapsible mw-collapsed">
 
<div class=" mw-collapsible mw-collapsed">
  #profile mendelian errors found in vt.genotypes.bcf, generate [[media:mendel.pdf|tables]] in the directory mendel, requires pdflatex.
  −
  vt profile_mendelian vt.genotypes.bcf -p trios.ped -x mendel
     −
  pedigree file format is described in [http://csg.sph.umich.edu//abecasis/merlin/tour/input_files.html here]
+
  #profiles a set of VNTRs
 +
  vt profile_vntrs vntrs.sites.bcf -g vntr.reference.txt
 +
 
   −
  #this is a sample output for mendelian error profiling.
+
  profile_vntrs v0.5
  #R and A stand for reference and alternate allele respectively.
+
 
  #Error% - mendelian error (confounded with de novo mutation)
+
    no VNTRs          5660874          #number of VNTRs in vntrs.sites.bcf
  #HomHet - Homozygous-Heterozygous genotype ratios
+
    no low complexity  2686460 (47.46%)  #number of VNTRs in low complexity region determined by MDUST
  #Het% - proportion of hets
+
    no coding          17911 (0.32%)     #number of VNTRs in coding regions determined by GENCODE v7
  Mendelian Errors <br>
+
    no redundant      1312209 (23.18%)  #number of VNTRs involved in overlapping with one another<br>
  Father Mother      R/R          R/A          A/A    Error(%) HomHet    Het(%)
+
  trf_lobstr (1638516) #TRF based reference set used in lobSTR, motif lengths 1 to 6.
  R/R    R/R        14889          210          38    1.64      nan    nan
+
    A-B    3269285     #TRs specific to vntrs.sites.bcf
  R/R    R/A         3403        3497          74     1.06      0.97  50.68
+
    A-B~   1666185     #TRs in vntrs.sites.bcf that overlap partially with at least one TR in TRF(lobSTR) but does not overlap exactly with another TR.
  R/R    A/A          176        1482          155    18.26      nan   nan
+
    A&B1    725404     #TRs in vntrs.sites.bcf that overlap exactly with at least one TR in TRF(lobSTR)
  R/A    R/R        3665        3652          68     0.92      1.00  49.91
+
    A&B2    723195     #TRs in TRF(lobSTR) that overlap exactly with at least one TR in vntrs.sites.bcf
  R/A   R/A        1015        3151          990     0.00      0.64  61.11
+
    B-A~    710075     #TRs in TRF(lobSTR) that overlap partially with at least one TR in vntrs.sites.bcf but does not overlap exactly with another TR.
  R/A   A/A          43        1300        1401     1.57      1.08  48.13
+
    B-A     205246     #TRs specific to TRF(lobSTR)
  A/A    R/R          172        1365          147    18.94      nan    nan
+
  #note that the first 3 rows should sum up to the number of TRs in vntrs.sites.bcf
  A/A    R/A          47        1164        1183     1.96      1.02  49.60
+
  #and the 4th to 6th rows should sum up to the number of TRs in TRF( lobSTR)
  A/A    A/A          20          78        5637     1.71      nan    nan <br>
+
  #This basically allows us to see the m to n overlapping in overlapping TRs<br>
  Parental            R/R          R/A          A/A    Error(%) HomHet    Het(%)
+
  trf_repeatseq (1624553) #TRF based reference set used in repeatseq, motif lengths 1 to 6.
  R/R    R/R        14889          210          38    1.64      nan    nan
+
    A-B     3291652
  R/R    R/A         7068        7149          142     0.99      0.99  50.28
+
    A-B~   1650190
  R/R    A/A          348        2847          302    18.59      nan   nan
+
    A&B1    719032
  R/A   R/A         1015        3151          990     0.00      0.64  61.11
+
    A&B2     716838
  R/A   A/A          90        2464        2584     1.75      1.05  48.81
+
    B-A~     703948
  A/A    A/A          20          78        5637    1.71      nan    nan <br>
+
    B-A     203767 <br>
  Parental            R/R          R/A         A/A   Error(%) HomHet    Het(%)
+
  trf_vntrseek (230306)  #TRF based reference set used in vntrseek, motif lengths 7 to 2000.
  HOM    HOM        14909          288        5675     1.66       nan    nan
+
    A-B    5384453
  HOM    HET        7158        9613        2726     1.19     1.00  49.90
+
    A-B~    271302
  HET    HET        1015        3151          990     0.00     0.64  61.11
+
    A&B1      5119
  HOMREF HOMALT      348        2847          302    18.59      nan    nan <br>
+
     A&B2       4973
  total mendelian error :   2.505%
+
     B-A~     92496
  no. of trios     : 2
+
     B-A     132837 <br>
  no. of variants  : 25346
+
   codis+ (15)            #CODIS STRs + 2 STRs from PROMEGA
 
+
    A-B    5660794
= Variant Calling =
+
    A-B~        79
 +
     A&B1          1
 +
    A&B2          1
 +
    B-A~        14
 +
    B-A          0
    +
  # This file contains information on how to process reference data sets.
 +
  # dataset - name of data set, this label will be printed.
 +
  # type    - True Positives (TP) and False Positives (FP).
 +
  #          overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively.
 +
  #        - annotation.
 +
  #          file is used for GENCODE annotation of coding VNTRs.
 +
  # filter  - filter applied to variants for this particular data set.
 +
  # path    - path of indexed BCF file.
 +
  #dataset      type            filter                      path
 +
  trf_lobstr    TP              VTYPE==VNTR                  /net/fantasia/home/atks/ref/vt/grch37/trf.lobstr.sites.bcf
 +
  trf_repeatseq TP              VTYPE==VNTR                  /net/fantasia/home/atks/ref/vt/grch37/trf.repeatseq.sites.bcf
 +
  trf_vntrseek  TP              VTYPE==VNTR                  /net/fantasia/home/atks/ref/vt/grch37/trf.vntrseek.sites.bcf
 +
  codis+        TP              VTYPE==VNTR                  /net/fantasia/home/atks/ref/vt/grch37/codis.strs.sites.bcf
 +
  GENCODE_V19  cds_annotation  .                            /net/fantasia/home/atks/ref/vt/grch37/gencode.v19.cds.bed.gz
 +
  DUST          cplx_annotation .                             
   −
=== Discover ===
  −
  −
Discovers variants from reads in a BAM/CRAM file.
  −
  −
<div class=" mw-collapsible mw-collapsed">
  −
  #discover variants from NA12878.bam and write to stdout
  −
  vt discover -b NA12878.bam -s NA12878 -r hs37d5.fa -i 20
   
<div class="mw-collapsible-content">
 
<div class="mw-collapsible-content">
   usage : vt discover2 [options]  
+
   usage : vt profile_vntrs [options] <in.vcf>
   −
   options : -b input BAM/CRAM file
+
   options : -g file containing list of reference datasets []
          -y  soft clipped unique sequences cutoff [0]
+
            -I file containing list of intervals []
          -x soft clipped mean quality cutoff [0]
+
            -i intervals []
          -w insertion desired type II error [0.0]
+
            -r reference sequence fasta file []
          -c insertion desired type I error [0.0]
+
            -? displays help
          -h insertion fractional evidence cutoff [0]
+
  </div>
          -g insertion count cutoff [1]
+
</div>
          -n deletion desired type II error [0.0]
+
 
          -m  deletion desired type I error [0.0]
+
=== Profile NA12878 ===
          -v  deletion fractional evidence cutoff [0]
+
 
          -u  deletion count cutoff [1]
+
Profile NA12878 calls against the Broad knowledgebase and Illumina Platinum Genomes.
          -k  snp desired type II error [0.0]
+
 
          -j  snp desired type I error [0.0]
+
<div class=" mw-collapsible mw-collapsed">
          -f  snp fractional evidence cutoff [0]
+
  #profile NA12878 overlap with broad knowledgebase and illumina platinum genomes for the file vt.genotypes.bcf for chromosome 20.
          -e  snp evidence count cutoff [1]
+
  vt profile_na12878 vt.genotypes.bcf -g na12878.reference.txt -r hs37d5.fa -i 20
          -q  base quality cutoff for bases [0]
+
 
          -C likelihood ratio cutoff [0]
+
  #this is a sample output for NA12878 profiling.
          -B  reference bias [0]
+
  #R and A stand for reference and alternate allele respectively.
          -a  read exclude flag [0x0704]
+
  #Error% - mendelian error (confounded with de novo mutation)
           -l  ignore overlapping reads [false]
+
  #HomHet - Homozygous-Heterozygous genotype ratios
           -t  MAPQ cutoff for alignments [0]
+
  #Het% - proportion of hets
           -p  ploidy [2]
+
    data set
          -s sample ID
+
    No Indels :      27770 [0.94]
          -r  reference sequence fasta file []
+
      FS/NFS :      0.26 (8/23) <br>
          -o  output VCF file [-]
+
  broad.kb
          -z  ignore MD tags [0]
+
    A-B      13071 [1.19]
          -d  debug [0]
+
    A&B      14699 [0.76]
          -I  file containing list of intervals []
+
    B-A      21546 [0.62]
          -i intervals []
+
    Precision    52.9%
          -?  displays help
+
    Sensitivity  40.6% <br>
 +
  illumina.platinum
 +
    A-B      17952 [0.88]
 +
    A&B      9818 [1.07]
 +
    B-A      2418 [0.88]
 +
    Precision    35.4%
 +
    Sensitivity 80.2% <br>
 +
  broad.kb
 +
                R/R      R/A      A/A      ./.
 +
    R/R        346      145        3      5473
 +
    R/A           3      4133        9      758
 +
    A/A           2      136      2186      956
 +
    ./.           2       139        86      322 <br>
 +
    Total genotype pairs :      6963
 +
    Concordance          : 95.72% (6665)
 +
    Discordance          :  4.28% (298) <br>
 +
  illumina.platinum
 +
                R/R      R/A      A/A      ./.
 +
    R/R        1768        85        2        0
 +
    R/A          10      4479        14        0
 +
    A/A          13      180      3028        0
 +
    ./.          71        98        70        0<br>
 +
    Total genotype pairs :      9579
 +
    Concordance          : 96.83% (9275)
 +
    Discordance          :  3.17% (304)
   −
  </div>
+
  # This file contains information on how to process reference data sets.
</div>
+
  #
 +
  # dataset - name of data set, this label will be printed.
 +
  # type    - True Positives (TP) and False Positives (FP)
 +
  #          overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively
 +
  #        - annotation
 +
  #          file is used for GENCODE annotation of frame shift and non frame shift Indels
 +
  # filter - filter applied to variants for this particular data set
 +
  # path    - path of indexed BCF file
 +
  #dataset              type        filter    path
 +
  broad.kb              TP          PASS      /net/fantasia/home/atks/dev/vt/bundle/public/grch37/broad.kb.241365variants.genotypes.bcf
 +
  illumina.platinum    TP          PASS      /net/fantasia/home/atks/dev/vt/bundle/public/grch37/NA12878.illumina.platinum.5284448variants.genotypes.bcf
 +
  #gencode.v19          annotation  .        /net/fantasia/home/atks/dev/vt/bundle/public/grch37/gencode.v19.annotation.gtf.gz
 +
<div class="mw-collapsible-content">
 +
profile_na12878 v0.5
   −
=== Merge candidate variants ===
+
  usage : vt profile_na12878 [options] <in.vcf>
   −
 
+
   options : -g file containing list of reference datasets []
Merges candidate variants across samples.  Each VCF file is required to have the FORMAT flags E and N and must contain exactly one sample.
  −
 
  −
<div class=" mw-collapsible mw-collapsed">
  −
  #merge candidate variants from the VCFs listed in candidates.txt and write the output to candidate.sites.vcf
  −
  vt merge_candidate_variants candidates.txt -o candidate.sites.vcf
  −
<div class="mw-collapsible-content">
  −
  usage : vt merge_candidate_variants [options]
  −
 
  −
   options : -L file containing list of input VCF files
  −
            -o  output VCF file [-]
   
             -I  file containing list of intervals []
 
             -I  file containing list of intervals []
             -i  intervals
+
             -i  intervals []
             -- ignores the rest of the labeled arguments following this flag
+
             -r reference sequence fasta file []
             -h displays help
+
             -? displays help
 
  </div>
 
  </div>
 
</div>
 
</div>
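The percentages in the sample profile_na12878 output above can be reproduced from the raw counts. The sketch below is not vt code; it assumes that A is the input call set and B the reference data set, that precision is A&B / (A-B + A&B) and sensitivity is A&B / (A&B + B-A), and that genotype pairs with a missing call (./.) are excluded from the concordance totals. These assumptions were chosen because they match the printed figures; the function and variable names are illustrative only.

```python
# Counts copied from the sample output above (chromosome 20).
OVERLAP = {
    # data set: (A-B, A&B, B-A)
    "broad.kb": (13071, 14699, 21546),
    "illumina.platinum": (17952, 9818, 2418),
}

# Genotype tables; rows and columns are ordered R/R, R/A, A/A, ./.
GENOTYPES = {
    "broad.kb": [
        [346, 145, 3, 5473],
        [3, 4133, 9, 758],
        [2, 136, 2186, 956],
        [2, 139, 86, 322],
    ],
    "illumina.platinum": [
        [1768, 85, 2, 0],
        [10, 4479, 14, 0],
        [13, 180, 3028, 0],
        [71, 98, 70, 0],
    ],
}


def overlap_stats(a_only, both, b_only):
    """Return (precision, sensitivity) as fractions of the overlap counts."""
    precision = both / (a_only + both)
    sensitivity = both / (both + b_only)
    return precision, sensitivity


def concordance(table):
    """Return (total, concordant, discordant) over non-missing genotype pairs."""
    # Assumption: the last row and column (./.) are missing calls and are ignored.
    called = [row[:3] for row in table[:3]]
    total = sum(sum(row) for row in called)
    agree = sum(called[i][i] for i in range(3))
    return total, agree, total - agree


if __name__ == "__main__":
    for name, counts in OVERLAP.items():
        precision, sensitivity = overlap_stats(*counts)
        total, agree, disagree = concordance(GENOTYPES[name])
        print(name)
        print(f"  Precision   {100 * precision:.1f}%")
        print(f"  Sensitivity {100 * sensitivity:.1f}%")
        print(f"  Total genotype pairs : {total}")
        print(f"  Concordance : {100 * agree / total:.2f}% ({agree})")
        print(f"  Discordance : {100 * disagree / total:.2f}% ({disagree})")
```

Under these assumptions the computed values agree with the sample output, for example 6665 concordant pairs out of 6963 for broad.kb.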
   −
=== Remove overlap ===
+
= Variant Calling =
   −
Removes overlapping variants in a VCF file by tagging such variants with the FILTER flag overlap.
     −
<div class=" mw-collapsible mw-collapsed">
+
=== Discover ===
  #annotates variants that are overlapping 
  −
  vt remove_overlap in.vcf -r hs37d5.fa -o overlapped.tagged.vcf
     −
<div class="mw-collapsible-content">
+
Discovers variants from reads in a BAM/CRAM file.
  usage : vt remove_overlap [options] <in.vcf>
  −
 
  −
  options : -o  output VCF file [-]
  −
            -I  file containing list of intervals []
  −
            -i  intervals []
  −
            -?  displays help
  −
</div>
  −
</div>
  −
 
  −
=== Annotate Indels ===
  −
 
  −
Annotates indels with VNTR information and adds a VNTR record.  Facilitates the simultaneous calling of VNTRs together with indels and SNPs.
      
<div class=" mw-collapsible mw-collapsed">
 
<div class=" mw-collapsible mw-collapsed">
   #annotates indels from VCFs with VNTR information.
+
   #discover variants from NA12878.bam and write to stdout
   vt annotate_indels in.vcf -r hs37d5.fa -o annotated.sites.vcf
+
   vt discover -b NA12878.bam -s NA12878 -r hs37d5.fa -i 20
 +
<div class="mw-collapsible-content">
 +
  usage : vt discover2 [options]
   −
<div style="height:20em; overflow:auto; border: 2px solid #FFF">
+
  options : -b  input BAM/CRAM file
  CHROM  POS    ID      REF    ALT    QUAL    FILTER INFO
+
          -y soft clipped unique sequences cutoff [0]
  20      82079  .      G      A      1255.98 .      NSAMPLES=1;E=43;N=51;ESUM=43;NSUM=51;FLANKSEQ=GGAGCACGCC[G/A]CCATGCCCGG
+
          -x  soft clipped mean quality cutoff [0]
  20      82217  .       G      A      1632.77 .      NSAMPLES=1;E=56;N=61;ESUM=56;NSUM=61;FLANKSEQ=GAGCCACCGC[G/A]CCCGGCCCAG
+
          -w  insertion desired type II error [0.0]
  20      83250  .      CTGTGTGTG      C      .      .      NSAMPLES=1;E=18;N=35;ESUM=18;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT
+
          -c  insertion desired type I error [0.0]
  20      83250  .       CTGTGTGTGTG    C      .      .      NSAMPLES=1;E=3;N=35;ESUM=3;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT
+
          -h  insertion fractional evidence cutoff [0]
  20      83251  .      TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG    <VNTR> .       .      MOTIF=GT;RU=TG;FZ_CONCORDANCE=1;FZ_RL=52;FZ_LL=0;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FZ_RU_COUNTS=26,26;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG]TTTAGTATTT
+
          -g  insertion count cutoff [1]
  20      83252  .      G      C      359.204 .      NSAMPLES=1;E=13;N=14;ESUM=13;NSUM=14;FLANKSEQ=CTCTCTCTCT[G/C]TGTGTGTGTG
+
          -n  deletion desired type II error [0.0]
  20      83260  .       G      C      500.163 .      NSAMPLES=1;E=18;N=34;ESUM=18;NSUM=34;FLANKSEQ=CTGTGTGTGT[G/C]TGTGTGTGTG
+
          -m deletion desired type I error [0.0]
  20      83267  .      T      C      247.043 .      NSAMPLES=1;E=11;N=43;ESUM=11;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT
+
          -v  deletion fractional evidence cutoff [0]
  20      83275  .      T      C      609.669 .      NSAMPLES=1;E=24;N=43;ESUM=24;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT
+
          -u  deletion count cutoff [1]
  20      90008  .      C       A      1546.88 .      NSAMPLES=1;E=52;N=60;ESUM=52;NSUM=60;FLANKSEQ=AACAGAAAAC[C/A]AAATACTGTA
+
          -k  snp desired type II error [0.0]
  20      91088  .      C      T      1766.04 .      NSAMPLES=1;E=58;N=66;ESUM=58;NSUM=66;FLANKSEQ=CCCAGCATAC[C/T]ATGGTTGTGC
+
          -j  snp desired type I error [0.0]
  20      91508  .      G      A      1266.93 .      NSAMPLES=1;E=44;N=53;ESUM=44;NSUM=53;FLANKSEQ=AATTAGTAAG[G/A]CTTACGTAAG
+
          -f  snp fractional evidence cutoff [0]
  20      91707  .      C      T      888.134 .      NSAMPLES=1;E=30;N=53;ESUM=30;NSUM=53;FLANKSEQ=TGATTTTCTA[C/T]AGCAGGACCT
+
          -e  snp evidence count cutoff [1]
  20      92527  .      A      G      828.593 .      NSAMPLES=1;E=34;N=40;ESUM=34;NSUM=40;FLANKSEQ=ATTAATTGCC[A/G]TTCTCTCTTT
+
          -q  base quality cutoff for bases [0]
  20      93440  .      A      G      688.144 .      NSAMPLES=1;E=24;N=58;ESUM=24;NSUM=58;FLANKSEQ=TTGGATGCAT[A/G]GTCTGTAAAT
+
          -C likelihood ratio cutoff [0]
  20      93636  .      TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT    <VNTR> .      .      MOTIF=T;RU=T;FZ_CONCORDANCE=0.939394;FZ_RL=35;FZ_LL=0;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FZ_RU_COUNTS=31,33;FLANKSEQ=TCTAGGATTC[TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT
+
          -B  reference bias [0]
  20      93646  .      C      CT      .      .      NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FLANKSEQ=TTTTTCTTTC[TTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT;GMOTIF=T;TR=20:93636:TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT:<VNTR>:T
+
          -a  read exclude flag [0x0704]
  20      93717  .      A      T      31.7622 .      NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKSEQ=CAGTGGCGTG[A/T]TCTTAGATCA
+
          -l  ignore overlapping reads [false]
  20      93931  .      G      A      628.149 .      NSAMPLES=1;E=22;N=53;ESUM=22;NSUM=53;FLANKSEQ=GATTACAGGT[G/A]TGAGCCGCTG
+
          -t  MAPQ cutoff for alignments [0]
  20      100699 .      C      T      809.09  .      NSAMPLES=1;E=28;N=61;ESUM=28;NSUM=61;FLANKSEQ=GGTGAAAAAT[C/T]ACCTGTCAGT
+
          -p  ploidy [2]
  20      101362 .      G      A      1087.13 .      NSAMPLES=1;E=36;N=67;ESUM=36;NSUM=67;FLANKSEQ=TAATACTGAA[G/A]TTTACTTCTC
+
          -s  sample ID
 +
          -r reference sequence fasta file []
 +
          -o  output VCF file [-]
 +
          -z  ignore MD tags [0]
 +
          -d  debug [0]
 +
          -I file containing list of intervals []
 +
          -i intervals []
 +
          -?  displays help
    +
</div>
 
</div>
 
</div>
   −
  The following trace shows how the algorithm works:
+
=== Merge candidate variants ===
 +
 
 +
 
 +
Merges candidate variants across samples.  Each VCF file is required to have the FORMAT flags E and N and must contain exactly one sample.
   −
    ============================================
+
<div class=" mw-collapsible mw-collapsed">
    ANNOTATING INDEL FUZZILY
+
  #merge candidate variants from the VCFs listed in candidates.txt and write the output to candidate.sites.vcf
    ********************************************
+
  vt merge_candidate_variants candidates.txt -o candidate.sites.vcf
    EXTRACTIING REGION BY EXACT LEFT AND RIGHT ALIGNMENT
+
<div class="mw-collapsible-content">
   
+
  usage : vt merge_candidate_variants [options]
    20:131948:C/CCA
+
 
     EXACT REGION 131948-131965 (18)
+
  options : -L  file containing list of input VCF files
                CCACACACACACACACAA
+
            -o  output VCF file [-]
    FINAL EXACT REGION 131948-131965 (18)
+
            -I  file containing list of intervals []
                      CCACACACACACACACAA
+
            -i  intervals
    ********************************************
+
            --  ignores the rest of the labeled arguments following this flag
    PICK CANDIDATE MOTIFS
+
            -h  displays help
   
+
</div>
    Longest Allele : C[CA]CACACACACACACACAA
+
</div>
    detecting motifs for an str
+
 
    seq: CCACACACACACACACACAA
+
=== Remove overlap ===
    len : 20
+
 
    cmax_len : 10
+
Removes overlapping variants in a VCF file by tagging such variants with the FILTER flag overlap.
    candidate motifs: 25
+
 
    AC : 0.894737 2 0
+
<div class=" mw-collapsible mw-collapsed">
    AAC : 0.5 3 0.0555556
+
  #annotates variants that are overlapping 
    ACC : 0.5 3 0.0555556
+
  vt remove_overlap in.vcf -r hs37d5.fa -o overlapped.tagged.vcf
    AAAC : 0.0588235 4 0.125 (< 2 copies)
+
 
    ACCC : 0.0588235 4 0.125 (< 2 copies)
+
<div class="mw-collapsible-content">
    AACAC : 0.5 5 0.02
+
  usage : vt remove_overlap [options] <in.vcf>
    ACACC : 0.5 5 0.02
+
 
    AAACAC : 0.0666667 6 0.0555556 (< 2 copies)
+
  options : -o  output VCF file [-]
    ACACCC : 0.0666667 6 0.0555556 (< 2 copies)
+
            -I  file containing list of intervals []
    AACACAC : 0.5 7 0.0102041
+
            -i  intervals []
    ACACACC : 0.5 7 0.0102041
+
            -?  displays help
     AAACACAC : 0.0769231 8 0.03125 (< 2 copies)
+
</div>
    ACACACCC : 0.0769231 8 0.03125 (< 2 copies)
+
</div>
    AACACACAC : 0.5 9 0.00617284 (< 2 copies)
+
 
    ACACACACC : 0.5 9 0.00617284 (< 2 copies)
+
=== Annotate Indels ===
    AAACACACAC : 0.0909091 10 0.02 (< 2 copies)
+
 
    ACACACACCC : 0.0909091 10 0.02 (< 2 copies)
+
Annotates indels with VNTR information and adds a VNTR record.  Facilitates the simultaneous calling of VNTRs together with indels and SNPs.
    ********************************************
+
 
    PICKING NEXT BEST MOTIF
+
<div class=" mw-collapsible mw-collapsed">
      
+
  #annotates indels from VCFs with VNTR information.
     selected:        AC 0.89 0.00
+
  vt annotate_indels in.vcf -r hs37d5.fa -o annotated.sites.vcf
     ********************************************
+
 
     DETECTING REPEAT TRACT FUZZILY
+
<div style="height:20em; overflow:auto; border: 2px solid #FFF">
    ++++++++++++++++++++++++++++++++++++++++++++
+
  CHROM  POS    ID      REF    ALT    QUAL    FILTER  INFO
    Exact left/right alignment
+
  20      82079  .      G      A      1255.98 .      NSAMPLES=1;E=43;N=51;ESUM=43;NSUM=51;FLANKSEQ=GGAGCACGCC[G/A]CCATGCCCGG
   
+
  20      82217  .      G      A      1632.77 .      NSAMPLES=1;E=56;N=61;ESUM=56;NSUM=61;FLANKSEQ=GAGCCACCGC[G/A]CCCGGCCCAG
    repeat_tract              : CACACACACACACACA
+
  20      83250  .      CTGTGTGTG      C      .      .      NSAMPLES=1;E=18;N=35;ESUM=18;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT
    position                  : [131949,131964]
+
  20      83250  .      CTGTGTGTGTG     C       .      .      NSAMPLES=1;E=3;N=35;ESUM=3;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT
    motif_concordance        : 1
+
  20      83251  .      TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG    <VNTR>  .      .       MOTIF=GT;RU=TG;FZ_CONCORDANCE=1;FZ_RL=52;FZ_LL=0;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FZ_RU_COUNTS=26,26;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG]TTTAGTATTT
    repeat units              : 8
+
  20      83252  .      G      C      359.204 .       NSAMPLES=1;E=13;N=14;ESUM=13;NSUM=14;FLANKSEQ=CTCTCTCTCT[G/C]TGTGTGTGTG
    exact repeat units        : 8
+
  20      83260  .      G      C      500.163 .       NSAMPLES=1;E=18;N=34;ESUM=18;NSUM=34;FLANKSEQ=CTGTGTGTGT[G/C]TGTGTGTGTG
    total no. of repeat units : 8
+
  20      83267  .      T      C      247.043 .       NSAMPLES=1;E=11;N=43;ESUM=11;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT
   
+
  20      83275  .      T      C      609.669 .       NSAMPLES=1;E=24;N=43;ESUM=24;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT
    ++++++++++++++++++++++++++++++++++++++++++++
+
  20      90008  .      C      A      1546.88 .       NSAMPLES=1;E=52;N=60;ESUM=52;NSUM=60;FLANKSEQ=AACAGAAAAC[C/A]AAATACTGTA
    Fuzzy right alignment
+
  20      91088  .      C      T      1766.04 .       NSAMPLES=1;E=58;N=66;ESUM=58;NSUM=66;FLANKSEQ=CCCAGCATAC[C/T]ATGGTTGTGC
   
+
  20      91508  .      G      A      1266.93 .       NSAMPLES=1;E=44;N=53;ESUM=44;NSUM=53;FLANKSEQ=AATTAGTAAG[G/A]CTTACGTAAG
    repeat motif : CA
+
  20      91707  .      C      T      888.134 .       NSAMPLES=1;E=30;N=53;ESUM=30;NSUM=53;FLANKSEQ=TGATTTTCTA[C/T]AGCAGGACCT
    rflank      : AACTC
+
  20      92527  .      A      G      828.593 .       NSAMPLES=1;E=34;N=40;ESUM=34;NSUM=40;FLANKSEQ=ATTAATTGCC[A/G]TTCTCTCTTT
    mlen        : 2
+
  20      93440  .      A      G      688.144 .       NSAMPLES=1;E=24;N=58;ESUM=24;NSUM=58;FLANKSEQ=TTGGATGCAT[A/G]GTCTGTAAAT
    rflen        : 5
+
  20      93636  .      TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT     <VNTR>  .      .      MOTIF=T;RU=T;FZ_CONCORDANCE=0.939394;FZ_RL=35;FZ_LL=0;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FZ_RU_COUNTS=31,33;FLANKSEQ=TCTAGGATTC[TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT
    plen        : 111
+
  20      93646  .      C      CT      .      .       NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FLANKSEQ=TTTTTCTTTC[TTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT;GMOTIF=T;TR=20:93636:TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT:<VNTR>:T
 +
  20      93717  .      A      T      31.7622 .       NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKSEQ=CAGTGGCGTG[A/T]TCTTAGATCA
 +
  20      93931  .      G      A      628.149 .       NSAMPLES=1;E=22;N=53;ESUM=22;NSUM=53;FLANKSEQ=GATTACAGGT[G/A]TGAGCCGCTG
 +
  20      100699  .      C      T      809.09  .       NSAMPLES=1;E=28;N=61;ESUM=28;NSUM=61;FLANKSEQ=GGTGAAAAAT[C/T]ACCTGTCAGT
 +
  20      101362  .      G      A      1087.13 .       NSAMPLES=1;E=36;N=67;ESUM=36;NSUM=67;FLANKSEQ=TAATACTGAA[G/A]TTTACTTCTC
 +
 
 +
</div>
 +
 
 +
  The following trace shows how the algorithm works:
 +
 
 +
     ============================================
 +
     ANNOTATING INDEL FUZZILY
 +
     ********************************************
 +
     EXTRACTIING REGION BY EXACT LEFT AND RIGHT ALIGNMENT
 
      
 
      
     read        : AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACACCACACACACACACACAAACTC
+
     20:131948:C/CCA
    rlen        : 106
+
     EXACT REGION 131948-131965 (18)
      
+
                CCACACACACACACACAA
    optimal score: 50.5073
+
     FINAL EXACT REGION 131948-131965 (18)
     optimal state: MR
+
                      CCACACACACACACACAA
    optimal track: MR|r|0|5
+
     ********************************************
    optimal probe len: 25
+
     PICK CANDIDATE MOTIFS
    optimal path length : 107
  −
     max j: 106
  −
     probe: (1~82) [1~10] (1~5)
  −
    read : (1~82) [83~101] (102~106)
   
      
 
      
     motif #          : 10 [83,101]
+
     Longest Allele : C[CA]CACACACACACACACAA
     motif concordance : 95% (9/10)
+
    detecting motifs for an str
     motif discordance : 0|1|0|0|0|0|0|0|0|0
+
    seq: CCACACACACACACACACAA
      
+
    len : 20
     Model: ----------------------------------------------------------------------------------CACACACACACACACACACAAACTC
+
    cmax_len : 10
          SYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYMMMDMMMMMMMMMMMMMMMMMMMMME
+
    candidate motifs: 25
                                                                                              oo++oo++oo++oo++oo++RRRRR
+
    AC : 0.894737 2 0
     Read:   AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACAC-CACACACACACACACAAACTC
+
    AAC : 0.5 3 0.0555556
 +
     ACC : 0.5 3 0.0555556
 +
    AAAC : 0.0588235 4 0.125 (< 2 copies)
 +
    ACCC : 0.0588235 4 0.125 (< 2 copies)
 +
     AACAC : 0.5 5 0.02
 +
    ACACC : 0.5 5 0.02
 +
    AAACAC : 0.0666667 6 0.0555556 (< 2 copies)
 +
    ACACCC : 0.0666667 6 0.0555556 (< 2 copies)
 +
    AACACAC : 0.5 7 0.0102041
 +
    ACACACC : 0.5 7 0.0102041
 +
    AAACACAC : 0.0769231 8 0.03125 (< 2 copies)
 +
     ACACACCC : 0.0769231 8 0.03125 (< 2 copies)
 +
     AACACACAC : 0.5 9 0.00617284 (< 2 copies)
 +
    ACACACACC : 0.5 9 0.00617284 (< 2 copies)
 +
    AAACACACAC : 0.0909091 10 0.02 (< 2 copies)
 +
     ACACACACCC : 0.0909091 10 0.02 (< 2 copies)
 +
    ********************************************
 +
    PICKING NEXT BEST MOTIF
 
      
 
      
 +
    selected:        AC 0.89 0.00
 +
    ********************************************
 +
    DETECTING REPEAT TRACT FUZZILY
 
     ++++++++++++++++++++++++++++++++++++++++++++
 
     ++++++++++++++++++++++++++++++++++++++++++++
     Fuzzy left alignment
+
     Exact left/right alignment
 
      
 
      
     lflank      : ATCTTA
+
     repeat_tract              : CACACACACACACACA
     repeat motif : CA
+
    position                  : [131949,131964]
     lflen        : 6
+
    motif_concordance        : 1
 +
    repeat units              : 8
 +
    exact repeat units        : 8
 +
    total no. of repeat units : 8
 +
   
 +
    ++++++++++++++++++++++++++++++++++++++++++++
 +
    Fuzzy right alignment
 +
   
 +
     repeat motif : CA
 +
     rflank      : AACTC
 
     mlen        : 2
 
     mlen        : 2
 +
    rflen        : 5
 
     plen        : 111
 
     plen        : 111
 
      
 
      
     read        : ATCTTACACCACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT
+
     read        : AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACACCACACACACACACACAAACTC
     rlen        : 105
+
     rlen        : 106
 
      
 
      
     optimal score: 50.5858
+
     optimal score: 50.5073
     optimal state: Z
+
     optimal state: MR
     optimal track: Z|m|10|2
+
     optimal track: MR|r|0|5
     optimal probe len: 26
+
     optimal probe len: 25
     optimal path length : 106
+
     optimal path length : 107
     max j: 105
+
     max j: 106
     mismatch penalty: 3
+
     probe: (1~82) [1~10] (1~5)
 +
    read : (1~82) [83~101] (102~106)
 
      
 
      
    model: (1~6) [1~10]
+
     motif #          : 10 [83,101]
    read : (1~6) [7~25][26~106]
  −
   
  −
     motif #          : 10 [7,25]
   
     motif concordance : 95% (9/10)
 
     motif concordance : 95% (9/10)
 
     motif discordance : 0|1|0|0|0|0|0|0|0|0
 
     motif discordance : 0|1|0|0|0|0|0|0|0|0
 
      
 
      
     Model:  ATCTTACACACACACACACACACACA--------------------------------------------------------------------------------  
+
     Model:  ----------------------------------------------------------------------------------CACACACACACACACACACAAACTC
           SMMMMMMMMMDMMMMMMMMMMMMMMMMZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZE
+
           SYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYMMMDMMMMMMMMMMMMMMMMMMMMME
            LLLLLLoo++oo++oo++oo++oo++                                                                                
+
                                                                                              oo++oo++oo++oo++oo++RRRRR
     Read:  ATCTTACAC-CACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT
+
     Read:  AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACAC-CACACACACACACACAAACTC
 
      
 
      
     xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+
     ++++++++++++++++++++++++++++++++++++++++++++
     VNTR Summary
+
     Fuzzy left alignment
    rid          : 19
  −
    motif        : AC
  −
    ru          : CA
   
      
 
      
     Exact
+
     lflank      : ATCTTA
     repeat_tract                    : CACACACACACACACA
+
     repeat motif : CA
     position                        : [131949,131964]
+
     lflen        : 6
     reference repeat unit length    : 8
+
     mlen        : 2
     motif_concordance              : 1
+
     plen        : 111
     repeat units                    : 8
+
      
     exact repeat units              : 8
+
     read        : ATCTTACACCACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT
     total no. of repeat units      : 8
+
     rlen        : 105
 
      
 
      
     Fuzzy
+
    optimal score: 50.5858
     repeat_tract                    : CACCACACACACACACACA
+
    optimal state: Z
     position                        : [131946,131964]
+
    optimal track: Z|m|10|2
     reference repeat unit length    : 19
+
    optimal probe len: 26
     motif_concordance              : 0.95
+
    optimal path length : 106
     repeat units                    : 19
+
    max j: 105
     exact repeat units              : 9
+
    mismatch penalty: 3
     total no. of repeat units      : 10
+
   
     xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+
    model: (1~6) [1~10]
 
+
    read : (1~6) [7~25][26~106]
<div class="mw-collapsible-content">
+
   
   usage : vt annotate_indels [options] <in.vcf>
+
    motif #          : 10 [7,25]
 
+
    motif concordance : 95% (9/10)
   options : -v  add vntr record [false]
+
    motif discordance : 0|1|0|0|0|0|0|0|0|0
             -x  override tags [false]
+
   
             -f  filter expression []
+
    Model:  ATCTTACACACACACACACACACACA--------------------------------------------------------------------------------
             -d  debug [false]
+
          SMMMMMMMMMDMMMMMMMMMMMMMMMMZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZE
             -m  mode [f]
+
            LLLLLLoo++oo++oo++oo++oo++                                                                               
                 e : by exact alignment              f : by fuzzy alignment
+
    Read:  ATCTTACAC-CACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT
             -c  classification schemas of tandem repeat [6]
+
   
                 1 : lai2003     
+
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                 2 : kelkar2008   
+
    VNTR Summary
                 3 : fondon2012   
+
    rid          : 19
                 4 : ananda2013   
+
    motif        : AC
                 5 : willems2014  
+
    ru          : CA
                 6 : tan_kang2015
+
   
             -a  annotation type [v]
+
    Exact
                 v : a. output VNTR variant (defined by classification).
+
    repeat_tract                    : CACACACACACACACA
                       RU                    repeat unit on reference sequence (CA)
+
    position                        : [131949,131964]
                       MOTIF                canonical representation (AC)
+
    reference repeat unit length    : 8
                       RL                    repeat tract length in bases (11)
+
    motif_concordance              : 1
                       FLANKS                flanking positions of repeat tract determined by exact alignment
+
    repeat units                    : 8
                       RU_COUNTS            number of exact repeat units and total number of repeat units in
+
    exact repeat units              : 8
                                             repeat tract determined by exact alignment
+
    total no. of repeat units      : 8
                       FZ_RL                fuzzy repeat tract length in bases (11)
+
   
                       FZ_FLANKS            flanking positions of repeat tract determined by fuzzy alignment
+
     Fuzzy
                       FZ_RU_COUNTS          number of exact repeat units and total number of repeat units in
+
     repeat_tract                    : CACCACACACACACACACA
                                             repeat tract determined by fuzzy alignment
+
     position                        : [131946,131964]
                       FLANKSEQ              flanking sequence of indel
+
     reference repeat unit length    : 19
                       LARGE_REPEAT_REGION  repeat region exceeding 2000bp
+
     motif_concordance              : 0.95
                     b. mark indels with overlapping VNTR.
+
     repeat units                    : 19
                       FLANKS      flanking positions of repeat tract determined by exact alignment
+
     exact repeat units              : 9
                       FZ_FLANKS    flanking positions of repeat tract determined by fuzzy alignment
+
     total no. of repeat units      : 10
                       GMOTIF      generating motif used in fuzzy alignment
+
     xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                       TR    position and alleles of VNTR (20:23413:CACACACACAC:<VNTR>)
+
 
                 a : annotate each indel with RU, RL, MOTIF, REF.
+
<div class="mw-collapsible-content">
             -r  reference sequence fasta file []
+
   usage : vt annotate_indels [options] <in.vcf>
             -o  output VCF file [-]
+
 
             -I  file containing list of intervals []
+
   options : -v  add vntr record [false]
             -i  intervals
+
             -x  override tags [false]
             -?  displays help
+
             -f  filter expression []
  </div>
+
             -d  debug [false]
</div>
+
             -m  mode [f]
 
+
                 e : by exact alignment              f : by fuzzy alignment
=== Construct Probes ===
+
             -c  classification schemas of tandem repeat [6]
 +
                 1 : lai2003     
 +
                 2 : kelkar2008   
 +
                 3 : fondon2012   
 +
                 4 : ananda2013   
 +
                 5 : willems2014  
 +
                 6 : tan_kang2015
 +
             -a  annotation type [v]
 +
                 v : a. output VNTR variant (defined by classification).
 +
                       RU                    repeat unit on reference sequence (CA)
 +
                       MOTIF                canonical representation (AC)
 +
                       RL                    repeat tract length in bases (11)
 +
                       FLANKS                flanking positions of repeat tract determined by exact alignment
 +
                       RU_COUNTS            number of exact repeat units and total number of repeat units in
 +
                                             repeat tract determined by exact alignment
 +
                       FZ_RL                fuzzy repeat tract length in bases (11)
 +
                       FZ_FLANKS            flanking positions of repeat tract determined by fuzzy alignment
 +
                       FZ_RU_COUNTS          number of exact repeat units and total number of repeat units in
 +
                                             repeat tract determined by fuzzy alignment
 +
                       FLANKSEQ              flanking sequence of indel
 +
                       LARGE_REPEAT_REGION  repeat region exceeding 2000bp
 +
                     b. mark indels with overlapping VNTR.
 +
                       FLANKS      flanking positions of repeat tract determined by exact alignment
 +
                       FZ_FLANKS    flanking positions of repeat tract determined by fuzzy alignment
 +
                       GMOTIF      generating motif used in fuzzy alignment
 +
                       TR    position and alleles of VNTR (20:23413:CACACACACAC:<VNTR>)
 +
                 a : annotate each indel with RU, RL, MOTIF, REF.
 +
             -r  reference sequence fasta file []
 +
             -o  output VCF file [-]
 +
             -I  file containing list of intervals []
 +
             -i  intervals
 +
             -?  displays help
 +
  </div>
 +
</div>
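The annotate_indels help text distinguishes RU, the repeat unit as it appears on the reference (CA), from MOTIF, its canonical representation (AC). A minimal sketch of one way to derive such a canonical form is shown below. It assumes that "canonical" means the lexicographically smallest rotation of the repeat unit, which is consistent with the RU=CA/MOTIF=AC and RU=TG/MOTIF=GT pairs shown on this page; vt's own implementation may apply different rules, and the function name is hypothetical.

```python
def canonical_motif(repeat_unit):
    """Return the lexicographically smallest rotation of repeat_unit.

    Assumption: this is what "canonical representation" means here.
    It is not taken from the vt source code.
    """
    if not repeat_unit:
        return repeat_unit
    rotations = [
        repeat_unit[i:] + repeat_unit[:i] for i in range(len(repeat_unit))
    ]
    return min(rotations)


if __name__ == "__main__":
    for ru in ("CA", "TG", "T"):
        print(ru, "->", canonical_motif(ru))
```

With this definition every rotation of the same unit (CA and AC, or TG and GT) maps to a single motif, so repeat tracts that start at different phases of the repeat can be grouped together.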
 +
 
 +
=== Construct Probes ===
      Line 1,606: Line 1,696:  
   #construct probes from candidates.sites.bcf and write to standard output
 
   #construct probes from candidates.sites.bcf and write to standard output
 
   vt construct_probes candidates.sites.bcf -r ref.fa
 
   vt construct_probes candidates.sites.bcf -r ref.fa
<div class="mw-collapsible-content">
+
<div class="mw-collapsible-content">
   usage : vt construct_probes [options] <in.vcf>
+
   usage : vt construct_probes [options] <in.vcf>
 +
 
 +
  options : -o  output VCF file [-]
 +
            -f  minimum flank length [20]
 +
            -r  reference sequence fasta file []
 +
            -I  file containing list of intervals []
 +
            -i  intervals []
 +
            --  ignores the rest of the labeled arguments following this flag
 +
            -h  displays help
 +
</div>
 +
</div>
 +
 
 +
=== Genotype ===
 +
 
 +
Genotypes variants for each sample.
 +
 
 +
<div class=" mw-collapsible mw-collapsed">
 +
  #genotypes variants found in candidates.sites.vcf from sample.bam
 +
  vt genotype -r seq.fa -b sample.bam -i candidates.sites.vcf -o sample.sites.vcf
 +
<div class="mw-collapsible-content">
 +
  usage : vt genotype [options]
 +
 
 +
  options : -r  reference sequence fasta file []
 +
            -s  sample ID []
 +
            -o  output VCF file [-]
 +
            -b  input BAM file []
 +
            -i  input candidate VCF file []
 +
            --  ignores the rest of the labeled arguments following this flag
 +
            -h  displays help
 +
</div>
 +
</div>
 +
 
 +
= Pedigree File =
 +
 
 +
  vt understands an augmented version, introduced by [mailto:hmkang@umich.edu Hyun], of the PED format described by [http://zzz.bwh.harvard.edu/plink/data.shtml#ped plink].
 +
  The pedigree file has the following mandatory fields:
 +
       
 +
{| class="wikitable"
 +
|-
 +
! scope="col"| Field
 +
! scope="col"| Description
 +
! scope="col"| Valid Values
 +
! scope="col"| Missing Values
 +
|-
 +
| Family ID || ID of this family || [A-Za-z0-9_]+ || 0
|-
| Individual ID || ID(s) of this individual (comma separated) || [A-Za-z0-9_]+(,[A-Za-z0-9_]+)* || cannot be missing
|-
| Paternal ID || ID of the father || [A-Za-z0-9_]+ || 0
|-
| Maternal ID || ID of the mother || [A-Za-z0-9_]+ || 0
|-
| Sex || Sex of the individual || 1=male, 2=female, other, male, female || other
|-
| Phenotype || Phenotype || [A-Za-z0-9_]+ || -9
 +
|}
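
The field rules in the table above can be checked mechanically. The following is a minimal Python sketch, not part of vt: the names <code>PedRecord</code> and <code>parse_ped_line</code> are hypothetical, fields are assumed to be separated by whitespace, and the normalisation of sex codes (1 to male, 2 to female) and of missing values to <code>None</code> is an illustrative choice rather than documented vt behaviour.

```python
import re
from typing import List, NamedTuple, Optional

# Valid-value pattern for ID fields, taken from the table above.
ID_RE = re.compile(r"[A-Za-z0-9_]+")

# Accepted sex values from the table; mapping 1/2 to names is an assumption.
SEX_CODES = {"1": "male", "2": "female",
             "male": "male", "female": "female", "other": "other"}


class PedRecord(NamedTuple):  # hypothetical container, not a vt type
    family_id: Optional[str]
    individual_ids: List[str]
    paternal_id: Optional[str]
    maternal_id: Optional[str]
    sex: str
    phenotype: Optional[str]


def _id_or_none(value: str) -> Optional[str]:
    """Validate an ID field; the table lists 0 as its missing value."""
    if not ID_RE.fullmatch(value):
        raise ValueError(f"invalid ID: {value!r}")
    return None if value == "0" else value


def parse_ped_line(line: str) -> PedRecord:
    """Parse one pedigree line with the six mandatory fields."""
    fields = line.split()  # assumption: whitespace-separated fields
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, found {len(fields)}")
    family, individual, father, mother, sex, phenotype = fields

    # Individual ID may list several comma-separated IDs and cannot be missing.
    ids = individual.split(",")
    for sample_id in ids:
        if not ID_RE.fullmatch(sample_id):
            raise ValueError(f"invalid individual ID: {sample_id!r}")

    if sex not in SEX_CODES:
        raise ValueError(f"invalid sex: {sex!r}")

    # -9 marks a missing phenotype; anything else must match the ID pattern.
    if phenotype == "-9":
        parsed_phenotype = None
    elif ID_RE.fullmatch(phenotype):
        parsed_phenotype = phenotype
    else:
        raise ValueError(f"invalid phenotype: {phenotype!r}")

    return PedRecord(_id_or_none(family), ids, _id_or_none(father),
                     _id_or_none(mother), SEX_CODES[sex], parsed_phenotype)
```

For instance, <code>parse_ped_line("ceu NA12878,NA12878A NA12891 NA12892 female case")</code> yields a record whose <code>individual_ids</code> holds both sample IDs, which is the information a duplicate-concordance check would need.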
   −
   options : -o  output VCF file [-]
+
   Examples:    
            -f  minimum flank length [20]
  −
            -r  reference sequence fasta file []
  −
            -I  file containing list of intervals []
  −
            -i  intervals []
  −
            --  ignores the rest of the labeled arguments following this flag
  −
            -h  displays help
  −
</div>
  −
</div>
     −
=== Genotype ===
+
    ceu      NA12878    NA12891    NA12892    female    -9
 +
    yri      NA19240    NA19239    NA19238    female    -9
   −
Genotypes variants for each sample.
+
    ceu      NA12878    NA12891    NA12892    2    -9
 +
    yri      NA19240    NA19239    NA19238    2    -9
   −
<div class=" mw-collapsible mw-collapsed">
+
    #allows tools like profile_mendelian to detect duplicates and check for concordance
  #genotypes variants found in candidate.sites.vcf from sample.bam
+
    ceu      NA12878,NA12878A   NA12891    NA12892    female  case
   vt genotype -r seq.fa -b sample.bam -i candidates.sites.vcf -o sample.sites.vcf
+
    yri      NA19240            NA19239    NA19238    female   control
<div class="mw-collapsible-content">
  −
   usage : vt genotype [options]
     −
  options : -r  reference sequence fasta file []
+
    #individuals whose parents are unknown use 0 for the paternal and maternal IDs
            -s  sample ID []
+
    ceu      NA12412    0 0    female case
            -o  output VCF file [-]
+
    yri      NA19650    0 0    female control
            -b input BAM file []
  −
            -i input candidate VCF file []
  −
            -- ignores the rest of the labeled arguments following this flag
  −
            -h displays help
  −
</div>
  −
</div>
      
= Resource Bundle =
 
= Resource Bundle =