Line 54: |
Line 54: |
| </div> | | </div> |
| | | |
− | Building has been tested on Linux and Mac systems on gcc 4.8.1 and clang 3.4. <br>
| + | === Mac === |
− | Some features of C++11 is used, thus there is a need for newer versions of gcc and clang.
| + | |
| + | You will need to install the package xz prior to installing vt. |
| | | |
− | == Mac ==
| + | brew install xz |
| | | |
− | You may also install vt on mac via homebrew.
| |
| | | |
− | brew install homebrew/science/vt
| + | Building has been tested on Linux and Mac systems on gcc 4.8.1 and clang 3.4. <br> |
| + | Some features of C++11 are used, thus there is a need for newer versions of gcc and clang. |
| | | |
| = Updating = | | = Updating = |
Line 719: |
Line 720: |
| | | |
| <div class=" mw-collapsible mw-collapsed"> | | <div class=" mw-collapsible mw-collapsed"> |
− | #converts in.bcf to tab format with selected INFO fields | + | #converts in.bcf to tab format with selected INFO and FILTER fields |
− | vt info2tab in.bcf -v -t EX_RL,FZ_RL,MDUST,LOBSTR,VNTRSEEK,RMSK,EX_REPEAT_TRACT | + | vt info2tab in.bcf -u PASS -t EX_RL,FZ_RL,MDUST,LOBSTR,VNTRSEEK,RMSK,EX_REPEAT_TRACT |
− | | |
| <div style="height:6em; overflow:auto; border: 2px solid #FFF"> | | <div style="height:6em; overflow:auto; border: 2px solid #FFF"> |
| + | INPUT |
| + | ===== |
| 20 17548608 . A AC . PASS CENTERS=vbi;NCENTERS=1;OLD_MULTIALLELIC=20:17548598:GAAAAAAAAAAAAA/GAAAAAAAAAAAA/GAAAAAAAAAAAAAA/GAAAAAAAAAA/GAAAAAAAAAAA/GAAAAAAAAAACAAA;OLD_VARIANT=20:17548598:GAAAAAAAAAAAAAG/GAAAAAAAAAACAAAG;EX_MOTIF=C;EX_MLEN=1;EX_RU=C;EX_BASIS=C;EX_BLEN=1;EX_REPEAT_TRACT=17548608,17548609;EX_COMP=100,0,0,0;EX_ENTROPY=0;EX_ENTROPY2=0;EX_KL_DIVERGENCE=2;EX_KL_DIVERGENCE2=4;EX_REF=2;EX_RL=2;EX_LL=3;EX_RU_COUNTS=0,2;EX_SCORE=0;EX_TRF_SCORE=-14;FZ_MOTIF=A;FZ_MLEN=1;FZ_RU=A;FZ_BASIS=A;FZ_BLEN=1;FZ_REPEAT_TRACT=17548599,17548611;FZ_COMP=100,0,0,0;FZ_ENTROPY=0;FZ_ENTROPY2=0;FZ_KL_DIVERGENCE=2;FZ_KL_DIVERGENCE2=4;FZ_REF=13;FZ_RL=13;FZ_LL=14;FZ_RU_COUNTS=13,13;FZ_SCORE=1;FZ_TRF_SCORE=26;FLANKSEQ=GAAAAAAAAA[A]AAAGAAGGAA;MDUST;LOBSTR | | 20 17548608 . A AC . PASS CENTERS=vbi;NCENTERS=1;OLD_MULTIALLELIC=20:17548598:GAAAAAAAAAAAAA/GAAAAAAAAAAAA/GAAAAAAAAAAAAAA/GAAAAAAAAAA/GAAAAAAAAAAA/GAAAAAAAAAACAAA;OLD_VARIANT=20:17548598:GAAAAAAAAAAAAAG/GAAAAAAAAAACAAAG;EX_MOTIF=C;EX_MLEN=1;EX_RU=C;EX_BASIS=C;EX_BLEN=1;EX_REPEAT_TRACT=17548608,17548609;EX_COMP=100,0,0,0;EX_ENTROPY=0;EX_ENTROPY2=0;EX_KL_DIVERGENCE=2;EX_KL_DIVERGENCE2=4;EX_REF=2;EX_RL=2;EX_LL=3;EX_RU_COUNTS=0,2;EX_SCORE=0;EX_TRF_SCORE=-14;FZ_MOTIF=A;FZ_MLEN=1;FZ_RU=A;FZ_BASIS=A;FZ_BLEN=1;FZ_REPEAT_TRACT=17548599,17548611;FZ_COMP=100,0,0,0;FZ_ENTROPY=0;FZ_ENTROPY2=0;FZ_KL_DIVERGENCE=2;FZ_KL_DIVERGENCE2=4;FZ_REF=13;FZ_RL=13;FZ_LL=14;FZ_RU_COUNTS=13,13;FZ_SCORE=1;FZ_TRF_SCORE=26;FLANKSEQ=GAAAAAAAAA[A]AAAGAAGGAA;MDUST;LOBSTR |
| 20 17548608 . AAAAG A . PASS CENTERS=ox1;NCENTERS=1;EX_MOTIF=AAAG;EX_MLEN=4;EX_RU=AAAG;EX_BASIS=AG;EX_BLEN=2;EX_REPEAT_TRACT=17548609,17548612;EX_COMP=100,0,0,0;EX_ENTROPY=0;EX_ENTROPY2=0;EX_KL_DIVERGENCE=2;EX_KL_DIVERGENCE2=4;EX_REF=0.75;EX_RL=4;EX_LL=4;EX_RU_COUNTS=0,1;EX_SCORE=0.75;EX_TRF_SCORE=-1;FZ_MOTIF=A;FZ_MLEN=1;FZ_RU=A;FZ_BASIS=A;FZ_BLEN=1;FZ_REPEAT_TRACT=17548599,17548611;FZ_COMP=100,0,0,0;FZ_ENTROPY=0;FZ_ENTROPY2=0;FZ_KL_DIVERGENCE=2;FZ_KL_DIVERGENCE2=4;FZ_REF=13;FZ_RL=13;FZ_LL=13;FZ_RU_COUNTS=13,13;FZ_SCORE=1;FZ_TRF_SCORE=26;FLANKSEQ=GAAAAAAAAA[AAAAG]AAGGAACTAC;MDUST;LOBSTR;OLD_VARIANT=20:17548598:GAAAAAAAAAAAAAG/GAAAAAAAAAA | | 20 17548608 . AAAAG A . PASS CENTERS=ox1;NCENTERS=1;EX_MOTIF=AAAG;EX_MLEN=4;EX_RU=AAAG;EX_BASIS=AG;EX_BLEN=2;EX_REPEAT_TRACT=17548609,17548612;EX_COMP=100,0,0,0;EX_ENTROPY=0;EX_ENTROPY2=0;EX_KL_DIVERGENCE=2;EX_KL_DIVERGENCE2=4;EX_REF=0.75;EX_RL=4;EX_LL=4;EX_RU_COUNTS=0,1;EX_SCORE=0.75;EX_TRF_SCORE=-1;FZ_MOTIF=A;FZ_MLEN=1;FZ_RU=A;FZ_BASIS=A;FZ_BLEN=1;FZ_REPEAT_TRACT=17548599,17548611;FZ_COMP=100,0,0,0;FZ_ENTROPY=0;FZ_ENTROPY2=0;FZ_KL_DIVERGENCE=2;FZ_KL_DIVERGENCE2=4;FZ_REF=13;FZ_RL=13;FZ_LL=13;FZ_RU_COUNTS=13,13;FZ_SCORE=1;FZ_TRF_SCORE=26;FLANKSEQ=GAAAAAAAAA[AAAAG]AAGGAACTAC;MDUST;LOBSTR;OLD_VARIANT=20:17548598:GAAAAAAAAAAAAAG/GAAAAAAAAAA |
− |
| |
| </div> | | </div> |
− | | + | OUTPUT |
− | CHROM POS REF ALT N_ALLELE EX_RL FZ_RL MDUST LOBSTR VNTRSEEK RMSK EX_REPEAT_TRACT_1 EX_REPEAT_TRACT_2 | + | ====== |
− | 20 17548608 A AC 2 2 13 1 1 0 0 17548608 17548608 | + | CHROM POS REF ALT N_ALLELE PASS EX_RL FZ_RL MDUST LOBSTR VNTRSEEK RMSK EX_REPEAT_TRACT_1 EX_REPEAT_TRACT_2 |
− | 20 17548608 AAAAG A 2 4 13 1 1 0 0 17548609 17548609 | + | 20 17548608 A AC 2 1 2 13 1 1 0 0 17548608 17548608 |
| + | 20 17548608 AAAAG A 2 1 4 13 1 1 0 0 17548609 17548609 |
| | | |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
| usage : vt info2tab [options] <in.vcf> | | usage : vt info2tab [options] <in.vcf> |
| | | |
− | options : -v print variant CHROM,POS,REF,ALT,N_ALLELE [false] | + | options : -d debug [false] |
− | -d debug [false]
| |
| -f filter expression [] | | -f filter expression [] |
− | -t list of info tags to be extracted [] | + | -u list of filter tags to be extracted [] |
| + | -t list of info tags to be extracted [] |
| -o output tab delimited file [-] | | -o output tab delimited file [-] |
| -I file containing list of intervals [] | | -I file containing list of intervals [] |
Line 1,056: |
Line 1,057: |
| </div> | | </div> |
| | | |
− | === Profile SNPs === | + | === Profile Mendelian Errors === |
| | | |
− | Profile SNPs. The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]]. | + | Profile Mendelian errors |
| | | |
| <div class=" mw-collapsible mw-collapsed"> | | <div class=" mw-collapsible mw-collapsed"> |
− | #profile snps found in 20.sites.vcf | + | #profile mendelian errors found in vt.genotypes.bcf, generate [[media:mendel.pdf|tables]] in the directory mendel, requires pdflatex. |
− | vt profile_snps -g snp.reference.txt 20.sites.vcf -r hs37d5.fa -i 20 | + | vt profile_mendelian vt.genotypes.bcf -p trios.ped -x mendel |
| + | |
| + | pedigree file format is described in [[Vt#Pedigree File|here]]. |
| | | |
− | #this is a sample output for SNP profiling.
| + | #this is a sample output for mendelian error profiling. |
− | # square brackets contain the ts/tv ratio.
| + | #R and A stand for reference and alternate allele respectively. |
− | # The numbers in curved bracket are the counts of ts and tv SNPs respectively.
| + | #Error% - mendelian error (confounded with de novo mutation) |
− | # Low complexity shows what percent of the SNPs are in low complexity regions.
| + | #HomHet - Homozygous-Heterozygous genotype ratios |
− | data set | + | #Het% - proportion of hets |
− | No. SNPs : 508603 [2.09]
| + | Mendelian Errors <br> |
− | Low complexity : 0.08 (39837/508603) <br> | + | Father Mother R/R R/A A/A Error(%) HomHet Het(%) |
− | 1000g
| + | R/R R/R 14889 210 38 1.64 nan nan |
− | A-B 109970 [1.39]
| + | R/R R/A 3403 3497 74 1.06 0.97 50.68 |
− | A&B 398633 [2.37] | + | R/R A/A 176 1482 155 18.26 nan nan |
− | B-A 1340682 [2.26]
| + | R/A R/R 3665 3652 68 0.92 1.00 49.91 |
− | Precision 78.4% | + | R/A R/A 1015 3151 990 0.00 0.64 61.11 |
− | Sensitivity 22.9% <br> | + | R/A A/A 43 1300 1401 1.57 1.08 48.13 |
− | dbsnp
| + | A/A R/R 172 1365 147 18.94 nan nan |
− | A-B 324063 [1.99] | + | A/A R/A 47 1164 1183 1.96 1.02 49.60 |
− | A&B 184540 [2.29] | + | A/A A/A 20 78 5637 1.71 nan nan <br> |
− | B-A 103893 [2.60] | + | Parental R/R R/A A/A Error(%) HomHet Het(%) |
− | Precision 36.3% | + | R/R R/R 14889 210 38 1.64 nan nan |
− | Sensitivity 64.0% | + | R/R R/A 7068 7149 142 0.99 0.99 50.28 |
| + | R/R A/A 348 2847 302 18.59 nan nan |
| + | R/A R/A 1015 3151 990 0.00 0.64 61.11 |
| + | R/A A/A 90 2464 2584 1.75 1.05 48.81 |
| + | A/A A/A 20 78 5637 1.71 nan nan <br> |
| + | Parental R/R R/A A/A Error(%) HomHet Het(%) |
| + | HOM HOM 14909 288 5675 1.66 nan nan |
| + | HOM HET 7158 9613 2726 1.19 1.00 49.90 |
| + | HET HET 1015 3151 990 0.00 0.64 61.11 |
| + | HOMREF HOMALT 348 2847 302 18.59 nan nan <br> |
| + | total mendelian error : 2.505% |
| + | no. of trios : 2 |
| + | no. of variants : 25346 |
| + | |
| + | <div class="mw-collapsible-content"> |
| + | profile_mendelian v0.5 |
| | | |
− | # This file contains information on how to process reference data sets. | + | usage : vt profile_mendelian [options] <in.vcf> |
− | #
| + | |
− | # dataset - name of data set, this label will be printed. | + | options : -q minimum genotype quality |
− | # type - True Positives (TP) and False Positives (FP)
| + | -d minimum depth |
− | # overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively
| + | -r reference sequence fasta file [] |
− | # - annotation
| + | -x output latex directory [] |
− | # file is used for GENCODE annotation of frame shift and non frame shift Indels
| + | -p pedigree file |
− | # filter - filter applied to variants for this particular data set
| + | -I file containing list of intervals [] |
− | # path - path of indexed BCF file
| + | -i intervals |
− | #dataset type filter path
| + | -? displays help |
− | 1000g TP N_ALLELE==2&&VTYPE==SNP /net/fantasia/home/atks/ref/vt/grch37/1000G.v5.snps.indels.complex.svs.sites.bcf
| + | </div> |
− | dbsnp TP N_ALLELE==2&&VTYPE==SNP /net/fantasia/home/atks/ref/vt/grch37/dbSNP138.snps.indels.complex.sites.bcf
| + | </div> |
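The Error(%), HomHet and Het(%) columns in the sample output above can be recomputed from the three child genotype counts in each row. The following is a minimal Python sketch, not vt's own code: the function name `trio_stats` and the rule that the hom/het columns are `nan` whenever only one genotype class is consistent with inheritance are assumptions inferred from the sample table.

```python
import math

def trio_stats(rr, ra, aa, impossible):
    """Summarise child genotype counts for one parental genotype pair.

    rr, ra, aa  -- counts of R/R, R/A and A/A children
    impossible  -- set of child genotypes that violate Mendelian
                   inheritance for this parental pair, e.g. {"A/A"}
    Returns (error_pct, hom_het_ratio, het_pct).
    """
    counts = {"R/R": rr, "R/A": ra, "A/A": aa}
    total = sum(counts.values())
    errors = sum(n for g, n in counts.items() if g in impossible)
    error_pct = 100.0 * errors / total

    # Hom/het statistics use only genotypes consistent with inheritance.
    hom = sum(n for g, n in counts.items() if g != "R/A" and g not in impossible)
    het = 0 if "R/A" in impossible else ra
    if hom == 0 or het == 0:
        # Assumption: matches the "nan nan" cells in the sample table.
        return error_pct, math.nan, math.nan
    return error_pct, hom / het, 100.0 * het / (hom + het)

# Father R/R x Mother R/A: an A/A child is a Mendelian error.
err, hom_het, het_pct = trio_stats(3403, 3497, 74, {"A/A"})
print(f"{err:.2f} {hom_het:.2f} {het_pct:.2f}")
```

With the counts from the R/R x R/A row, this gives the 1.06, 0.97 and 50.68 shown in the table.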
− | GENCODE_V19 cds_annotation . /net/fantasia/home/atks/ref/vt/grch37/gencode.v19.cds.bed.gz
| + | |
− | DUST cplx_annotation . /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz
| + | === Profile SNPs === |
| | | |
− | <div class="mw-collapsible-content">
| + | Profile SNPs. The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]]. |
− | usage : vt profile_snps [options] <in.vcf>
| |
| | | |
− | options : -f filter expression []
| + | <div class=" mw-collapsible mw-collapsed"> |
− | -g file containing list of reference datasets []
| + | #profile snps found in 20.sites.vcf |
− | -I file containing list of intervals []
| + | vt profile_snps -g snp.reference.txt 20.sites.vcf -r hs37d5.fa -i 20 |
− | -i intervals []
| |
− | -r reference sequence fasta file []
| |
− | -? displays help
| |
− | </div>
| |
− | </div>
| |
| | | |
− | === Profile Indels ===
| + | #this is a sample output for SNP profiling. |
− | | + | # square brackets contain the ts/tv ratio. |
− | Profile Indels. The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].
| + | # The numbers in curved bracket are the counts of ts and tv SNPs respectively. |
| + | # Low complexity shows what percent of the SNPs are in low complexity regions. |
| + | data set |
| + | No. SNPs : 508603 [2.09] |
| + | Low complexity : 0.08 (39837/508603) <br> |
| + | 1000g |
| + | A-B 109970 [1.39] |
| + | A&B 398633 [2.37] |
| + | B-A 1340682 [2.26] |
| + | Precision 78.4% |
| + | Sensitivity 22.9% <br> |
| + | dbsnp |
| + | A-B 324063 [1.99] |
| + | A&B 184540 [2.29] |
| + | B-A 103893 [2.60] |
| + | Precision 36.3% |
| + | Sensitivity 64.0% |
| | | |
− | <div class=" mw-collapsible mw-collapsed">
| + | # This file contains information on how to process reference data sets. |
− | #profile indels found in mills.vcf
| + | # |
− | vt profile_indels -g indel.reference.txt mills.vcf -r hs37d5.fa -i 20
| + | # dataset - name of data set, this label will be printed. |
− | | + | # type - True Positives (TP) and False Positives (FP) |
− | #this is a sample output for indel profiling. | + | # overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively |
− | # square brackets contain the ins/del ratio. | + | # - annotation |
− | # for the FS/NFS field, that is the proportion of coding indels that are frame shifted. | + | # file is used for GENCODE annotation of frame shift and non frame shift Indels |
− | # The numbers in curved bracket are the counts of frame shift and non frame shift indels respectively. | + | # filter - filter applied to variants for this particular data set |
− | data set | + | # path - path of indexed BCF file |
− | No Indels : 46974 [0.89]
| + | #dataset type filter path |
− | FS/NFS : 0.26 (8/23) <br>
| + | 1000g TP N_ALLELE==2&&VTYPE==SNP /net/fantasia/home/atks/ref/vt/grch37/1000G.v5.snps.indels.complex.svs.sites.bcf |
− | dbsnp
| + | dbsnp TP N_ALLELE==2&&VTYPE==SNP /net/fantasia/home/atks/ref/vt/grch37/dbSNP138.snps.indels.complex.sites.bcf |
− | A-B 30704 [0.92]
| + | GENCODE_V19 cds_annotation . /net/fantasia/home/atks/ref/vt/grch37/gencode.v19.cds.bed.gz |
− | A&B 16270 [0.83]
| + | DUST cplx_annotation . /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz |
− | B-A 2049488 [1.52]
| + | |
− | Precision 34.6%
| + | <div class="mw-collapsible-content"> |
− | Sensitivity 0.8% <br>
| + | usage : vt profile_snps [options] <in.vcf> |
− | mills
| |
− | A-B 43234 [0.88]
| |
− | A&B 3740 [1.00]
| |
− | B-A 203278 [0.98]
| |
− | Precision 8.0%
| |
− | Sensitivity 1.8% <br>
| |
− | mills.chip | |
− | A-B 46847 [0.89]
| |
− | A&B 127 [0.90]
| |
− | B-A 8777 [0.93]
| |
− | Precision 0.3%
| |
− | Sensitivity 1.4% <br>
| |
− | affy.exome.chip
| |
− | A-B 46911 [0.89]
| |
− | A&B 63 [0.43]
| |
− | B-A 33997 [0.47]
| |
− | Precision 0.1%
| |
− | Sensitivity 0.2% <br>
| |
| | | |
− | # This file contains information on how to process reference data sets. | + | options : -f filter expression [] |
− | # dataset - name of data set, this label will be printed.
| + | -g file containing list of reference datasets [] |
− | # type - True Positives (TP) and False Positives (FP).
| + | -I file containing list of intervals [] |
− | # overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively.
| + | -i intervals [] |
− | # - annotation.
| + | -r reference sequence fasta file [] |
− | # file is used for GENCODE annotation of frame shift and non frame shift Indels.
| + | -? displays help |
− | # filter - filter applied to variants for this particular data set.
| |
− | # path - path of indexed BCF file.
| |
− | #dataset type filter path
| |
− | 1000g TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/1000G.snps_indels.sites.bcf
| |
− | mills TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/mills.208620indels.sites.bcf
| |
− | dbsnp TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/dbsnp.13147541variants.sites.bcf
| |
− | GENCODE_V19 cds_annotation . /net/fantasia/home/atks/ref/vt/grch37/gencode.cds.bed.gz
| |
− | DUST cplx_annotation . /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz
| |
− | | |
− | <div class="mw-collapsible-content">
| |
− | usage : vt profile_indels [options] <in.vcf>
| |
− | | |
− | options : -g file containing list of reference datasets []
| |
− | -I file containing list of intervals [] | |
− | -i intervals [] | |
− | -r reference sequence fasta file [] | |
− | -? displays help | |
| </div> | | </div> |
| </div> | | </div> |
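The Precision and Sensitivity figures in the profiling output can be derived from the A-B, A&B and B-A counts, where A is the input call set and B is the reference data set. The sketch below is illustrative Python rather than vt's implementation; the function name `overlap_stats` is hypothetical, and the formulas are inferred from the sample numbers.

```python
def overlap_stats(a_only, both, b_only):
    """Return (precision, sensitivity) in percent.

    a_only -- variants only in the input call set (A-B)
    both   -- variants shared with the reference set (A&B)
    b_only -- variants only in the reference set (B-A)
    """
    precision = 100.0 * both / (a_only + both)    # share of A confirmed by B
    sensitivity = 100.0 * both / (both + b_only)  # share of B recovered by A
    return precision, sensitivity

# 1000g row of the SNP sample output.
precision, sensitivity = overlap_stats(109970, 398633, 1340682)
print(f"Precision {precision:.1f}%  Sensitivity {sensitivity:.1f}%")
```

The same arithmetic matches the dbsnp rows of both the SNP and the indel sample outputs.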
| | | |
− | === Profile VNTRs === | + | === Profile Indels === |
| | | |
− | Profile VNTRs. The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]]. | + | Profile Indels. The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]]. |
| | | |
| <div class=" mw-collapsible mw-collapsed"> | | <div class=" mw-collapsible mw-collapsed"> |
| + | #profile indels found in mills.vcf |
| + | vt profile_indels -g indel.reference.txt mills.vcf -r hs37d5.fa -i 20 |
| | | |
− | #profiles a set of VNTRs | + | #this is a sample output for indel profiling. |
− | vt profile_vntrs vntrs.sites.bcf -g vntr.reference.txt | + | # square brackets contain the ins/del ratio. |
− | | + | # for the FS/NFS field, that is the proportion of coding indels that are frame shifted. |
− | | + | # The numbers in curved bracket are the counts of frame shift and non frame shift indels respectively. |
− | profile_vntrs v0.5
| + | data set |
− | | + | No Indels : 46974 [0.89] |
− | no VNTRs 5660874 #number of VNTRs in vntrs.sites.bcf
| + | FS/NFS : 0.26 (8/23) <br> |
− | no low complexity 2686460 (47.46%) #number of VNTRs in low complexity region determined by MDUST
| + | dbsnp |
− | no coding 17911 (0.32%) #number of VNTRs in coding regions determined by GENCODE v7 | + | A-B 30704 [0.92] |
− | no redundant 1312209 (23.18%) #number of VNTRs involved in overlapping with one another<br>
| + | A&B 16270 [0.83] |
− | trf_lobstr (1638516) #TRF based reference set used in lobSTR, motif lengths 1 to 6. | + | B-A 2049488 [1.52] |
− | A-B 3269285 #TRs specific to vntrs.sites.bcf | + | Precision 34.6% |
− | A-B~ 1666185 #TRs in vntrs.sites.bcf that overlap partially with at least one TR in TRF(lobSTR) but does not overlap exactly with another TR. | + | Sensitivity 0.8% <br> |
− | A&B1 725404 #TRs in vntrs.sites.bcf that overlap exactly with at least one TR in TRF(lobSTR) | + | mills |
− | A&B2 723195 #TRs in TRF(lobSTR) that overlap exactly with at least one TR in vntrs.sites.bcf | + | A-B 43234 [0.88] |
− | B-A~ 710075 #TRs in TRF(lobSTR) that overlap partially with at least one TR in vntrs.sites.bcf but does not overlap exactly with another TR. | + | A&B 3740 [1.00] |
− | B-A 205246 #TRs specific to TRF(lobSTR)
| + | B-A 203278 [0.98] |
− | #note that the first 3 rows should sum up to the number of TRs in vntrs.sites.bcf | + | Precision 8.0% |
− | #and the 4th to 6th rows should sum up to the number of TRs in TRF( lobSTR)
| + | Sensitivity 1.8% <br> |
− | #This basically allows us to see the m to n overlapping in overlapping TRs<br>
| + | mills.chip |
− | trf_repeatseq (1624553) #TRF based reference set used in repeatseq, motif lengths 1 to 6. | + | A-B 46847 [0.89] |
− | A-B 3291652 | + | A&B 127 [0.90] |
− | A-B~ 1650190 | + | B-A 8777 [0.93] |
− | A&B1 719032 | + | Precision 0.3% |
− | A&B2 716838 | + | Sensitivity 1.4% <br> |
− | B-A~ 703948 | + | affy.exome.chip |
− | B-A 203767 <br>
| + | A-B 46911 [0.89] |
− | trf_vntrseek (230306) #TRF based reference set used in vntrseek, motif lengths 7 to 2000. | + | A&B 63 [0.43] |
− | A-B 5384453 | + | B-A 33997 [0.47] |
− | A-B~ 271302 | + | Precision 0.1% |
− | A&B1 5119 | + | Sensitivity 0.2% <br> |
− | A&B2 4973 | |
− | B-A~ 92496 | |
− | B-A 132837 <br> | |
− | codis+ (15) #CODIS STRs + 2 STRs from PROMEGA | |
− | A-B 5660794 | |
− | A-B~ 79 | |
− | A&B1 1 | |
− | A&B2 1 | |
− | B-A~ 14 | |
− | B-A 0
| |
| | | |
| # This file contains information on how to process reference data sets. | | # This file contains information on how to process reference data sets. |
Line 1,230: |
Line 1,215: |
| # overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively. | | # overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively. |
| # - annotation. | | # - annotation. |
− | # file is used for GENCODE annotation of coding VNTRs. | + | # file is used for GENCODE annotation of frame shift and non frame shift Indels. |
| # filter - filter applied to variants for this particular data set. | | # filter - filter applied to variants for this particular data set. |
| # path - path of indexed BCF file. | | # path - path of indexed BCF file. |
− | #dataset type filter path | + | #dataset type filter path |
− | trf_lobstr TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/trf.lobstr.sites.bcf | + | 1000g TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/1000G.snps_indels.sites.bcf |
− | trf_repeatseq TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/trf.repeatseq.sites.bcf | + | mills TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/mills.208620indels.sites.bcf |
− | trf_vntrseek TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/trf.vntrseek.sites.bcf | + | dbsnp TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/dbsnp.13147541variants.sites.bcf |
− | codis+ TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/codis.strs.sites.bcf | + | GENCODE_V19 cds_annotation . /net/fantasia/home/atks/ref/vt/grch37/gencode.cds.bed.gz |
− | GENCODE_V19 cds_annotation . /net/fantasia/home/atks/ref/vt/grch37/gencode.v19.cds.bed.gz | + | DUST cplx_annotation . /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz |
− | DUST cplx_annotation .
| |
| | | |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | usage : vt profile_vntrs [options] <in.vcf> | + | usage : vt profile_indels [options] <in.vcf> |
| | | |
| options : -g file containing list of reference datasets [] | | options : -g file containing list of reference datasets [] |
Line 1,252: |
Line 1,236: |
| </div> | | </div> |
| | | |
− | === Profile Mendelian Errors === | + | === Profile VNTRs === |
| | | |
− | Profile Mendelian errors | + | Profile VNTRs. The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]]. |
| | | |
| <div class=" mw-collapsible mw-collapsed"> | | <div class=" mw-collapsible mw-collapsed"> |
− | #profile mendelian errors found in vt.genotypes.bcf, generate [[media:mendel.pdf|tables]] in the directory mendel, requires pdflatex.
| |
− | vt profile_mendelian vt.genotypes.bcf -p trios.ped -x mendel
| |
| | | |
− | pedigree file format is described in [http://csg.sph.umich.edu//abecasis/merlin/tour/input_files.html here]
| + | #profiles a set of VNTRs |
| + | vt profile_vntrs vntrs.sites.bcf -g vntr.reference.txt |
| + | |
| | | |
− | #this is a sample output for mendelian error profiling.
| + | profile_vntrs v0.5 |
− | #R and A stand for reference and alternate allele respectively.
| + | |
− | #Error% - mendelian error (confounded with de novo mutation)
| + | no VNTRs 5660874 #number of VNTRs in vntrs.sites.bcf |
− | #HomHet - Homozygous-Heterozygous genotype ratios
| + | no low complexity 2686460 (47.46%) #number of VNTRs in low complexity region determined by MDUST |
− | #Het% - proportion of hets
| + | no coding 17911 (0.32%) #number of VNTRs in coding regions determined by GENCODE v7 |
− | Mendelian Errors <br>
| + | no redundant 1312209 (23.18%) #number of VNTRs involved in overlapping with one another<br> |
− | Father Mother R/R R/A A/A Error(%) HomHet Het(%)
| + | trf_lobstr (1638516) #TRF based reference set used in lobSTR, motif lengths 1 to 6. |
− | R/R R/R 14889 210 38 1.64 nan nan
| + | A-B 3269285 #TRs specific to vntrs.sites.bcf |
− | R/R R/A 3403 3497 74 1.06 0.97 50.68
| + | A-B~ 1666185 #TRs in vntrs.sites.bcf that overlap partially with at least one TR in TRF(lobSTR) but do not overlap exactly with another TR. |
− | R/R A/A 176 1482 155 18.26 nan nan
| + | A&B1 725404 #TRs in vntrs.sites.bcf that overlap exactly with at least one TR in TRF(lobSTR) |
− | R/A R/R 3665 3652 68 0.92 1.00 49.91
| + | A&B2 723195 #TRs in TRF(lobSTR) that overlap exactly with at least one TR in vntrs.sites.bcf |
− | R/A R/A 1015 3151 990 0.00 0.64 61.11
| + | B-A~ 710075 #TRs in TRF(lobSTR) that overlap partially with at least one TR in vntrs.sites.bcf but do not overlap exactly with another TR. |
− | R/A A/A 43 1300 1401 1.57 1.08 48.13
| + | B-A 205246 #TRs specific to TRF(lobSTR) |
− | A/A R/R 172 1365 147 18.94 nan nan
| + | #note that the first 3 rows should sum up to the number of TRs in vntrs.sites.bcf |
− | A/A R/A 47 1164 1183 1.96 1.02 49.60
| + | #and the 4th to 6th rows should sum up to the number of TRs in TRF(lobSTR) |
− | A/A A/A 20 78 5637 1.71 nan nan <br>
| + | #This basically allows us to see the m to n overlapping in overlapping TRs<br> |
− | Parental R/R R/A A/A Error(%) HomHet Het(%)
| + | trf_repeatseq (1624553) #TRF based reference set used in repeatseq, motif lengths 1 to 6. |
− | R/R R/R 14889 210 38 1.64 nan nan
| + | A-B 3291652 |
− | R/R R/A 7068 7149 142 0.99 0.99 50.28
| + | A-B~ 1650190 |
− | R/R A/A 348 2847 302 18.59 nan nan
| + | A&B1 719032 |
− | R/A R/A 1015 3151 990 0.00 0.64 61.11
| + | A&B2 716838 |
− | R/A A/A 90 2464 2584 1.75 1.05 48.81
| + | B-A~ 703948 |
− | A/A A/A 20 78 5637 1.71 nan nan <br>
| + | B-A 203767 <br> |
− | Parental R/R R/A A/A Error(%) HomHet Het(%)
| + | trf_vntrseek (230306) #TRF based reference set used in vntrseek, motif lengths 7 to 2000. |
− | HOM HOM 14909 288 5675 1.66 nan nan
| + | A-B 5384453 |
− | HOM HET 7158 9613 2726 1.19 1.00 49.90
| + | A-B~ 271302 |
− | HET HET 1015 3151 990 0.00 0.64 61.11
| + | A&B1 5119 |
− | HOMREF HOMALT 348 2847 302 18.59 nan nan <br>
| + | A&B2 4973 |
− | total mendelian error : 2.505%
| + | B-A~ 92496 |
− | no. of trios : 2
| + | B-A 132837 <br> |
− | no. of variants : 25346
| + | codis+ (15) #CODIS STRs + 2 STRs from PROMEGA |
− | | + | A-B 5660794 |
− | = Variant Calling =
| + | A-B~ 79 |
| + | A&B1 1 |
| + | A&B2 1 |
| + | B-A~ 14 |
| + | B-A 0 |
| | | |
| + | # This file contains information on how to process reference data sets. |
| + | # dataset - name of data set, this label will be printed. |
| + | # type - True Positives (TP) and False Positives (FP). |
| + | # overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively. |
| + | # - annotation. |
| + | # file is used for GENCODE annotation of coding VNTRs. |
| + | # filter - filter applied to variants for this particular data set. |
| + | # path - path of indexed BCF file. |
| + | #dataset type filter path |
| + | trf_lobstr TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/trf.lobstr.sites.bcf |
| + | trf_repeatseq TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/trf.repeatseq.sites.bcf |
| + | trf_vntrseek TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/trf.vntrseek.sites.bcf |
| + | codis+ TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/codis.strs.sites.bcf |
| + | GENCODE_V19 cds_annotation . /net/fantasia/home/atks/ref/vt/grch37/gencode.v19.cds.bed.gz |
| + | DUST cplx_annotation . |
| | | |
− | === Discover ===
| |
− |
| |
− | Discovers variants from reads in a BAM/CRAM file.
| |
− |
| |
− | <div class=" mw-collapsible mw-collapsed">
| |
− | #discover variants from NA12878.bam and write to stdout
| |
− | vt discover -b NA12878.bam -s NA12878 -r hs37d5.fa -i 20
| |
| <div class="mw-collapsible-content"> | | <div class="mw-collapsible-content"> |
− | usage : vt discover2 [options] | + | usage : vt profile_vntrs [options] <in.vcf> |
| | | |
− | options : -b input BAM/CRAM file | + | options : -g file containing list of reference datasets [] |
− | -y soft clipped unique sequences cutoff [0]
| + | -I file containing list of intervals [] |
− | -x soft clipped mean quality cutoff [0]
| + | -i intervals [] |
− | -w insertion desired type II error [0.0]
| + | -r reference sequence fasta file [] |
− | -c insertion desired type I error [0.0]
| + | -? displays help |
− | -h insertion fractional evidence cutoff [0]
| + | </div> |
− | -g insertion count cutoff [1]
| + | </div> |
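The note in the sample output says that the first three rows of each reference block sum to the number of TRs in the input file and the last three sum to the number of TRs in the reference set. A short Python sketch of that sanity check follows; the helper name `check_partition` and the dictionary layout are hypothetical and only serve to illustrate the arithmetic.

```python
def check_partition(rows, n_input, n_reference):
    """Check that the six overlap rows partition both call sets.

    rows        -- dict with keys "A-B", "A-B~", "A&B1", "A&B2", "B-A~", "B-A"
    n_input     -- number of TRs in the input file
    n_reference -- number of TRs in the reference data set
    Returns a pair of booleans (input_ok, reference_ok).
    """
    input_total = rows["A-B"] + rows["A-B~"] + rows["A&B1"]
    reference_total = rows["A&B2"] + rows["B-A~"] + rows["B-A"]
    return input_total == n_input, reference_total == n_reference

# trf_lobstr block of the sample output.
trf_lobstr = {
    "A-B": 3269285, "A-B~": 1666185, "A&B1": 725404,
    "A&B2": 723195, "B-A~": 710075, "B-A": 205246,
}
print(check_partition(trf_lobstr, n_input=5660874, n_reference=1638516))
```

Both checks hold for the trf_lobstr and trf_repeatseq blocks shown in the sample output.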
− | -n deletion desired type II error [0.0]
| + | |
− | -m deletion desired type I error [0.0]
| + | === Profile NA12878 === |
− | -v deletion fractional evidence cutoff [0]
| + | |
− | -u deletion count cutoff [1]
| + | Profile NA12878 genotypes against reference call sets such as the Broad knowledgebase and Illumina Platinum Genomes. |
− | -k snp desired type II error [0.0]
| + | |
− | -j snp desired type I error [0.0]
| + | <div class=" mw-collapsible mw-collapsed"> |
− | -f snp fractional evidence cutoff [0]
| + | #profile NA12878 overlap with broad knowledgebase and illumina platinum genomes for the file vt.genotypes.bcf for chromosome 20. |
− | -e snp evidence count cutoff [1]
| + | vt profile_na12878 vt.genotypes.bcf -g na12878.reference.txt -r hs37d5.fa -i 20 |
− | -q base quality cutoff for bases [0]
| + | |
− | -C likelihood ratio cutoff [0]
| + | #this is a sample output for NA12878 profiling. |
− | -B reference bias [0]
| + | #R and A stand for reference and alternate allele respectively. |
− | -a read exclude flag [0x0704]
| + | #Error% - mendelian error (confounded with de novo mutation) |
− | -l ignore overlapping reads [false] | + | #HomHet - Homozygous-Heterozygous genotype ratios |
− | -t MAPQ cutoff for alignments [0] | + | #Het% - proportion of hets |
− | -p ploidy [2] | + | data set |
− | -s sample ID
| + | No Indels : 27770 [0.94] |
− | -r reference sequence fasta file []
| + | FS/NFS : 0.26 (8/23) <br> |
− | -o output VCF file [-]
| + | broad.kb |
− | -z ignore MD tags [0]
| + | A-B 13071 [1.19] |
− | -d debug [0]
| + | A&B 14699 [0.76] |
− | -I file containing list of intervals []
| + | B-A 21546 [0.62] |
− | -i intervals []
| + | Precision 52.9% |
− | -? displays help
| + | Sensitivity 40.6% <br> |
| + | illumina.platinum |
| + | A-B 17952 [0.88] |
| + | A&B 9818 [1.07] |
| + | B-A 2418 [0.88] |
| + | Precision 35.4% |
| + | Sensitivity 80.2% <br> |
| + | broad.kb |
| + | R/R R/A A/A ./. |
| + | R/R 346 145 3 5473 |
| + | R/A 3 4133 9 758 |
| + | A/A 2 136 2186 956 |
| + | ./. 2 139 86 322 <br> |
| + | Total genotype pairs : 6963 |
| + | Concordance : 95.72% (6665) |
| + | Discordance : 4.28% (298) <br> |
| + | illumina.platinum |
| + | R/R R/A A/A ./. |
| + | R/R 1768 85 2 0 |
| + | R/A 10 4479 14 0 |
| + | A/A 13 180 3028 0 |
| + | ./. 71 98 70 0<br> |
| + | Total genotype pairs : 9579 |
| + | Concordance : 96.83% (9275) |
| + | Discordance : 3.17% (304) |
| | | |
− | </div> | + | # This file contains information on how to process reference data sets. |
− | </div> | + | # |
| + | # dataset - name of data set, this label will be printed. |
| + | # type - True Positives (TP) and False Positives (FP) |
| + | # overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively |
| + | # - annotation |
| + | # file is used for GENCODE annotation of frame shift and non frame shift Indels |
| + | # filter - filter applied to variants for this particular data set |
| + | # path - path of indexed BCF file |
| + | #dataset type filter path |
| + | broad.kb TP PASS /net/fantasia/home/atks/dev/vt/bundle/public/grch37/broad.kb.241365variants.genotypes.bcf |
| + | illumina.platinum TP PASS /net/fantasia/home/atks/dev/vt/bundle/public/grch37/NA12878.illumina.platinum.5284448variants.genotypes.bcf |
| + | #gencode.v19 annotation . /net/fantasia/home/atks/dev/vt/bundle/public/grch37/gencode.v19.annotation.gtf.gz |
| + | <div class="mw-collapsible-content"> |
| + | profile_na12878 v0.5 |
| | | |
− | === Merge candidate variants ===
| + | usage : vt profile_na12878 [options] <in.vcf> |
| | | |
− | | + | options : -g file containing list of reference datasets [] |
− | Merge candidate variants across samples. Each VCF file is required to have the FORMAT flags E and N and should have exactly one sample.
| |
− | | |
− | <div class=" mw-collapsible mw-collapsed">
| |
− | #merge candidate variants from VCFs in candidate.txt and output in candidate.sites.vcf
| |
− | vt merge_candidate_variants candidates.txt -o candidate.sites.vcf
| |
− | <div class="mw-collapsible-content">
| |
− | usage : vt merge_candidate_variants [options]
| |
− | | |
− | options : -L file containing list of input VCF files | |
− | -o output VCF file [-]
| |
| -I file containing list of intervals [] | | -I file containing list of intervals [] |
− | -i intervals | + | -i intervals [] |
− | -- ignores the rest of the labeled arguments following this flag | + | -r reference sequence fasta file [] |
− | -h displays help | + | -? displays help |
| </div> | | </div> |
| </div> | | </div> |
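The concordance summary under each genotype table can be recomputed from the 4x4 matrix by ignoring the missing-genotype (./.) row and column. The Python sketch below shows that calculation under this assumption; the function name `concordance` is hypothetical, and the sample output does not say which axis holds the input calls, which does not affect the result.

```python
def concordance(matrix):
    """Compute genotype concordance from a 4x4 count matrix.

    Rows and columns are ordered R/R, R/A, A/A, ./.
    Pairs involving a missing genotype (./.) are excluded.
    Returns (total_pairs, concordant_pairs, concordance_pct).
    """
    called = range(3)  # indices of R/R, R/A, A/A
    total = sum(matrix[i][j] for i in called for j in called)
    agree = sum(matrix[i][i] for i in called)
    return total, agree, 100.0 * agree / total

# broad.kb genotype table from the sample output.
broad_kb = [
    [346, 145, 3, 5473],
    [3, 4133, 9, 758],
    [2, 136, 2186, 956],
    [2, 139, 86, 322],
]
total, agree, pct = concordance(broad_kb)
print(f"Total genotype pairs : {total}")
print(f"Concordance : {pct:.2f}% ({agree})")
print(f"Discordance : {100 - pct:.2f}% ({total - agree})")
```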
| | | |
− | === Remove overlap === | + | = Variant Calling = |
| | | |
− | Removes overlapping variants in a VCF file by tagging such variants with the FILTER flag overlap.
| |
| | | |
− | <div class=" mw-collapsible mw-collapsed">
| + | === Discover === |
− | #annotates variants that are overlapping
| |
− | vt remove_overlap in.vcf -r hs37d5.fa -o overlapped.tagged..vcf
| |
| | | |
− | <div class="mw-collapsible-content">
| + | Discovers variants from reads in a BAM/CRAM file. |
− | usage : vt remove_overlap [options] <in.vcf>
| |
− | | |
− | options : -o output VCF file [-]
| |
− | -I file containing list of intervals []
| |
− | -i intervals []
| |
− | -? displays help
| |
− | </div>
| |
− | </div>
| |
− | | |
− | === Annotate Indels ===
| |
− | | |
− | Annotates indels with VNTR information and adds a VNTR record. Facilitates the simultaneous calling of VNTR together with Indels and SNPs.
| |
| | | |
| <div class=" mw-collapsible mw-collapsed"> | | <div class=" mw-collapsible mw-collapsed"> |
− | #annotates indels from VCFs with VNTR information. | + | #discover variants from NA12878.bam and write to stdout |
− | vt annotate_indels in.vcf -r hs37d5.fa -o annotated.sites.vcf | + | vt discover -b NA12878.bam -s NA12878 -r hs37d5.fa -i 20 |
| + | <div class="mw-collapsible-content"> |
| + | usage : vt discover2 [options] |
| | | |
− | <div style="height:20em; overflow:auto; border: 2px solid #FFF">
| + | options : -b input BAM/CRAM file |
− | CHROM POS ID REF ALT QUAL FILTER INFO
| + | -y soft clipped unique sequences cutoff [0] |
− | 20 82079 . G A 1255.98 . NSAMPLES=1;E=43;N=51;ESUM=43;NSUM=51;FLANKSEQ=GGAGCACGCC[G/A]CCATGCCCGG
| + | -x soft clipped mean quality cutoff [0] |
− | 20 82217 . G A 1632.77 . NSAMPLES=1;E=56;N=61;ESUM=56;NSUM=61;FLANKSEQ=GAGCCACCGC[G/A]CCCGGCCCAG
| + | -w insertion desired type II error [0.0] |
− | 20 83250 . CTGTGTGTG C . . NSAMPLES=1;E=18;N=35;ESUM=18;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT
| + | -c insertion desired type I error [0.0] |
− | 20 83250 . CTGTGTGTGTG C . . NSAMPLES=1;E=3;N=35;ESUM=3;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT
| + | -h insertion fractional evidence cutoff [0] |
− | 20 83251 . TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG <VNTR> . . MOTIF=GT;RU=TG;FZ_CONCORDANCE=1;FZ_RL=52;FZ_LL=0;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FZ_RU_COUNTS=26,26;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG]TTTAGTATTT
| + | -g insertion count cutoff [1] |
− | 20 83252 . G C 359.204 . NSAMPLES=1;E=13;N=14;ESUM=13;NSUM=14;FLANKSEQ=CTCTCTCTCT[G/C]TGTGTGTGTG
| + | -n deletion desired type II error [0.0] |
− | 20 83260 . G C 500.163 . NSAMPLES=1;E=18;N=34;ESUM=18;NSUM=34;FLANKSEQ=CTGTGTGTGT[G/C]TGTGTGTGTG
| + | -m deletion desired type I error [0.0] |
− | 20 83267 . T C 247.043 . NSAMPLES=1;E=11;N=43;ESUM=11;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT
| + | -v deletion fractional evidence cutoff [0] |
− | 20 83275 . T C 609.669 . NSAMPLES=1;E=24;N=43;ESUM=24;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT
| + | -u deletion count cutoff [1] |
− | 20 90008 . C A 1546.88 . NSAMPLES=1;E=52;N=60;ESUM=52;NSUM=60;FLANKSEQ=AACAGAAAAC[C/A]AAATACTGTA
| + | -k snp desired type II error [0.0] |
− | 20 91088 . C T 1766.04 . NSAMPLES=1;E=58;N=66;ESUM=58;NSUM=66;FLANKSEQ=CCCAGCATAC[C/T]ATGGTTGTGC
| + | -j snp desired type I error [0.0] |
− | 20 91508 . G A 1266.93 . NSAMPLES=1;E=44;N=53;ESUM=44;NSUM=53;FLANKSEQ=AATTAGTAAG[G/A]CTTACGTAAG
| + | -f snp fractional evidence cutoff [0] |
− | 20 91707 . C T 888.134 . NSAMPLES=1;E=30;N=53;ESUM=30;NSUM=53;FLANKSEQ=TGATTTTCTA[C/T]AGCAGGACCT
| + | -e snp evidence count cutoff [1] |
− | 20 92527 . A G 828.593 . NSAMPLES=1;E=34;N=40;ESUM=34;NSUM=40;FLANKSEQ=ATTAATTGCC[A/G]TTCTCTCTTT
| + | -q base quality cutoff for bases [0] |
− | 20 93440 . A G 688.144 . NSAMPLES=1;E=24;N=58;ESUM=24;NSUM=58;FLANKSEQ=TTGGATGCAT[A/G]GTCTGTAAAT
| + | -C likelihood ratio cutoff [0] |
− | 20 93636 . TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT <VNTR> . . MOTIF=T;RU=T;FZ_CONCORDANCE=0.939394;FZ_RL=35;FZ_LL=0;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FZ_RU_COUNTS=31,33;FLANKSEQ=TCTAGGATTC[TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT
| + | -B reference bias [0] |
− | 20 93646 . C CT . . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FLANKSEQ=TTTTTCTTTC[TTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT;GMOTIF=T;TR=20:93636:TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT:<VNTR>:T
| + | -a read exclude flag [0x0704] |
− | 20 93717 . A T 31.7622 . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKSEQ=CAGTGGCGTG[A/T]TCTTAGATCA
| + | -l ignore overlapping reads [false] |
− | 20 93931 . G A 628.149 . NSAMPLES=1;E=22;N=53;ESUM=22;NSUM=53;FLANKSEQ=GATTACAGGT[G/A]TGAGCCGCTG
| + | -t MAPQ cutoff for alignments [0] |
− | 20 100699 . C T 809.09 . NSAMPLES=1;E=28;N=61;ESUM=28;NSUM=61;FLANKSEQ=GGTGAAAAAT[C/T]ACCTGTCAGT
| + | -p ploidy [2] |
− | 20 101362 . G A 1087.13 . NSAMPLES=1;E=36;N=67;ESUM=36;NSUM=67;FLANKSEQ=TAATACTGAA[G/A]TTTACTTCTC
| + | -s sample ID |
| + | -r reference sequence fasta file [] |
| + | -o output VCF file [-] |
| + | -z ignore MD tags [0] |
| + | -d debug [0] |
| + | -I file containing list of intervals [] |
| + | -i intervals [] |
| + | -? displays help |
| | | |
| + | </div> |
| </div> | | </div> |
| | | |
− | The following shows the trace of how the algorithm works
| + | === Merge candidate variants === |
| + | |
| + | |
| + | Merge candidate variants across samples. Each VCF file is required to have the FORMAT flags E and N and should have exactly one sample. |
| | | |
− | ============================================
| + | <div class=" mw-collapsible mw-collapsed"> |
− | ANNOTATING INDEL FUZZILY
| + | #merge candidate variants from VCFs in candidates.txt and output in candidate.sites.vcf |
− | ********************************************
| + | vt merge_candidate_variants candidates.txt -o candidate.sites.vcf |
− | EXTRACTIING REGION BY EXACT LEFT AND RIGHT ALIGNMENT
| + | <div class="mw-collapsible-content"> |
− |
| + | usage : vt merge_candidate_variants [options] |
− | 20:131948:C/CCA
| + | |
− | EXACT REGION 131948-131965 (18) | + | options : -L file containing list of input VCF files |
− | CCACACACACACACACAA
| + | -o output VCF file [-] |
− | FINAL EXACT REGION 131948-131965 (18)
| + | -I file containing list of intervals [] |
− | CCACACACACACACACAA
| + | -i intervals |
− | ********************************************
| + | -- ignores the rest of the labeled arguments following this flag |
− | PICK CANDIDATE MOTIFS
| + | -h displays help |
− |
| + | </div> |
− | Longest Allele : C[CA]CACACACACACACACAA
| + | </div> |
− | detecting motifs for an str
| + | |
− | seq: CCACACACACACACACACAA
| + | === Remove overlap === |
− | len : 20
| + | |
− | cmax_len : 10
| + | Removes overlapping variants in a VCF file by tagging such variants with the FILTER flag overlap. |
− | candidate motifs: 25
| + | |
− | AC : 0.894737 2 0
| + | <div class=" mw-collapsible mw-collapsed"> |
− | AAC : 0.5 3 0.0555556
| + | #annotates variants that are overlapping |
− | ACC : 0.5 3 0.0555556
| + | vt remove_overlap in.vcf -r hs37d5.fa -o overlapped.tagged.vcf |
− | AAAC : 0.0588235 4 0.125 (< 2 copies)
| + | |
− | ACCC : 0.0588235 4 0.125 (< 2 copies)
| + | <div class="mw-collapsible-content"> |
− | AACAC : 0.5 5 0.02
| + | usage : vt remove_overlap [options] <in.vcf> |
− | ACACC : 0.5 5 0.02
| + | |
− | AAACAC : 0.0666667 6 0.0555556 (< 2 copies)
| + | options : -o output VCF file [-] |
− | ACACCC : 0.0666667 6 0.0555556 (< 2 copies)
| + | -I file containing list of intervals [] |
− | AACACAC : 0.5 7 0.0102041
| + | -i intervals [] |
− | ACACACC : 0.5 7 0.0102041
| + | -? displays help |
− | AAACACAC : 0.0769231 8 0.03125 (< 2 copies) | + | </div> |
− | ACACACCC : 0.0769231 8 0.03125 (< 2 copies)
| + | </div> |
− | AACACACAC : 0.5 9 0.00617284 (< 2 copies)
| + | |
− | ACACACACC : 0.5 9 0.00617284 (< 2 copies)
| + | === Annotate Indels === |
− | AAACACACAC : 0.0909091 10 0.02 (< 2 copies)
| + | |
− | ACACACACCC : 0.0909091 10 0.02 (< 2 copies)
| + | Annotates indels with VNTR information and adds a VNTR record. Facilitates the simultaneous calling of VNTR together with Indels and SNPs. |
− | ********************************************
| + | |
− | PICKING NEXT BEST MOTIF
| + | <div class=" mw-collapsible mw-collapsed"> |
− | | + | #annotates indels from VCFs with VNTR information. |
− | selected: AC 0.89 0.00 | + | vt annotate_indels in.vcf -r hs37d5.fa -o annotated.sites.vcf |
− | ******************************************** | + | |
− | DETECTING REPEAT TRACT FUZZILY | + | <div style="height:20em; overflow:auto; border: 2px solid #FFF"> |
− | ++++++++++++++++++++++++++++++++++++++++++++
| + | CHROM POS ID REF ALT QUAL FILTER INFO |
− | Exact left/right alignment
| + | 20 82079 . G A 1255.98 . NSAMPLES=1;E=43;N=51;ESUM=43;NSUM=51;FLANKSEQ=GGAGCACGCC[G/A]CCATGCCCGG |
− |
| + | 20 82217 . G A 1632.77 . NSAMPLES=1;E=56;N=61;ESUM=56;NSUM=61;FLANKSEQ=GAGCCACCGC[G/A]CCCGGCCCAG |
− | repeat_tract : CACACACACACACACA
| + | 20 83250 . CTGTGTGTG C . . NSAMPLES=1;E=18;N=35;ESUM=18;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT |
− | position : [131949,131964]
| + | 20 83250 . CTGTGTGTGTG C . . NSAMPLES=1;E=3;N=35;ESUM=3;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT |
− | motif_concordance : 1
| + | 20 83251 . TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG <VNTR> . . MOTIF=GT;RU=TG;FZ_CONCORDANCE=1;FZ_RL=52;FZ_LL=0;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FZ_RU_COUNTS=26,26;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG]TTTAGTATTT |
− | repeat units : 8
| + | 20 83252 . G C 359.204 . NSAMPLES=1;E=13;N=14;ESUM=13;NSUM=14;FLANKSEQ=CTCTCTCTCT[G/C]TGTGTGTGTG |
− | exact repeat units : 8
| + | 20 83260 . G C 500.163 . NSAMPLES=1;E=18;N=34;ESUM=18;NSUM=34;FLANKSEQ=CTGTGTGTGT[G/C]TGTGTGTGTG |
− | total no. of repeat units : 8
| + | 20 83267 . T C 247.043 . NSAMPLES=1;E=11;N=43;ESUM=11;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT |
− |
| + | 20 83275 . T C 609.669 . NSAMPLES=1;E=24;N=43;ESUM=24;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT |
− | ++++++++++++++++++++++++++++++++++++++++++++
| + | 20 90008 . C A 1546.88 . NSAMPLES=1;E=52;N=60;ESUM=52;NSUM=60;FLANKSEQ=AACAGAAAAC[C/A]AAATACTGTA |
− | Fuzzy right alignment
| + | 20 91088 . C T 1766.04 . NSAMPLES=1;E=58;N=66;ESUM=58;NSUM=66;FLANKSEQ=CCCAGCATAC[C/T]ATGGTTGTGC |
− |
| + | 20 91508 . G A 1266.93 . NSAMPLES=1;E=44;N=53;ESUM=44;NSUM=53;FLANKSEQ=AATTAGTAAG[G/A]CTTACGTAAG |
− | repeat motif : CA
| + | 20 91707 . C T 888.134 . NSAMPLES=1;E=30;N=53;ESUM=30;NSUM=53;FLANKSEQ=TGATTTTCTA[C/T]AGCAGGACCT |
− | rflank : AACTC
| + | 20 92527 . A G 828.593 . NSAMPLES=1;E=34;N=40;ESUM=34;NSUM=40;FLANKSEQ=ATTAATTGCC[A/G]TTCTCTCTTT |
− | mlen : 2
| + | 20 93440 . A G 688.144 . NSAMPLES=1;E=24;N=58;ESUM=24;NSUM=58;FLANKSEQ=TTGGATGCAT[A/G]GTCTGTAAAT |
− | rflen : 5
| + | 20 93636 . TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT <VNTR> . . MOTIF=T;RU=T;FZ_CONCORDANCE=0.939394;FZ_RL=35;FZ_LL=0;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FZ_RU_COUNTS=31,33;FLANKSEQ=TCTAGGATTC[TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT |
− | plen : 111
| + | 20 93646 . C CT . . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FLANKSEQ=TTTTTCTTTC[TTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT;GMOTIF=T;TR=20:93636:TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT:<VNTR>:T |
| + | 20 93717 . A T 31.7622 . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKSEQ=CAGTGGCGTG[A/T]TCTTAGATCA |
| + | 20 93931 . G A 628.149 . NSAMPLES=1;E=22;N=53;ESUM=22;NSUM=53;FLANKSEQ=GATTACAGGT[G/A]TGAGCCGCTG |
| + | 20 100699 . C T 809.09 . NSAMPLES=1;E=28;N=61;ESUM=28;NSUM=61;FLANKSEQ=GGTGAAAAAT[C/T]ACCTGTCAGT |
| + | 20 101362 . G A 1087.13 . NSAMPLES=1;E=36;N=67;ESUM=36;NSUM=67;FLANKSEQ=TAATACTGAA[G/A]TTTACTTCTC |
| + | |
| + | </div> |
| + | |
| + | The following trace shows how the algorithm works: |
| + | |
| + | ============================================ |
| + | ANNOTATING INDEL FUZZILY |
| + | ******************************************** |
| + | EXTRACTIING REGION BY EXACT LEFT AND RIGHT ALIGNMENT |
| | | |
− | read : AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACACCACACACACACACACAAACTC | + | 20:131948:C/CCA |
− | rlen : 106
| + | EXACT REGION 131948-131965 (18) |
− | | + | CCACACACACACACACAA |
− | optimal score: 50.5073
| + | FINAL EXACT REGION 131948-131965 (18) |
− | optimal state: MR | + | CCACACACACACACACAA |
− | optimal track: MR|r|0|5
| + | ******************************************** |
− | optimal probe len: 25
| + | PICK CANDIDATE MOTIFS |
− | optimal path length : 107
| |
− | max j: 106 | |
− | probe: (1~82) [1~10] (1~5) | |
− | read : (1~82) [83~101] (102~106)
| |
| | | |
− | motif # : 10 [83,101] | + | Longest Allele : C[CA]CACACACACACACACAA |
− | motif concordance : 95% (9/10) | + | detecting motifs for an str |
− | motif discordance : 0|1|0|0|0|0|0|0|0|0 | + | seq: CCACACACACACACACACAA |
− | | + | len : 20 |
− | Model: ----------------------------------------------------------------------------------CACACACACACACACACACAAACTC | + | cmax_len : 10 |
− | SYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYMMMDMMMMMMMMMMMMMMMMMMMMME
| + | candidate motifs: 25 |
− | oo++oo++oo++oo++oo++RRRRR
| + | AC : 0.894737 2 0 |
− | Read: AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACAC-CACACACACACACACAAACTC | + | AAC : 0.5 3 0.0555556 |
| + | ACC : 0.5 3 0.0555556 |
| + | AAAC : 0.0588235 4 0.125 (< 2 copies) |
| + | ACCC : 0.0588235 4 0.125 (< 2 copies) |
| + | AACAC : 0.5 5 0.02 |
| + | ACACC : 0.5 5 0.02 |
| + | AAACAC : 0.0666667 6 0.0555556 (< 2 copies) |
| + | ACACCC : 0.0666667 6 0.0555556 (< 2 copies) |
| + | AACACAC : 0.5 7 0.0102041 |
| + | ACACACC : 0.5 7 0.0102041 |
| + | AAACACAC : 0.0769231 8 0.03125 (< 2 copies) |
| + | ACACACCC : 0.0769231 8 0.03125 (< 2 copies) |
| + | AACACACAC : 0.5 9 0.00617284 (< 2 copies) |
| + | ACACACACC : 0.5 9 0.00617284 (< 2 copies) |
| + | AAACACACAC : 0.0909091 10 0.02 (< 2 copies) |
| + | ACACACACCC : 0.0909091 10 0.02 (< 2 copies) |
| + | ******************************************** |
| + | PICKING NEXT BEST MOTIF |
| | | |
| + | selected: AC 0.89 0.00 |
| + | ******************************************** |
| + | DETECTING REPEAT TRACT FUZZILY |
| ++++++++++++++++++++++++++++++++++++++++++++ | | ++++++++++++++++++++++++++++++++++++++++++++ |
− | Fuzzy left alignment | + | Exact left/right alignment |
| | | |
− | lflank : ATCTTA | + | repeat_tract : CACACACACACACACA |
− | repeat motif : CA | + | position : [131949,131964] |
− | lflen : 6 | + | motif_concordance : 1 |
| + | repeat units : 8 |
| + | exact repeat units : 8 |
| + | total no. of repeat units : 8 |
| + | |
| + | ++++++++++++++++++++++++++++++++++++++++++++ |
| + | Fuzzy right alignment |
| + | |
| + | repeat motif : CA |
| + | rflank : AACTC |
| mlen : 2 | | mlen : 2 |
| + | rflen : 5 |
| plen : 111 | | plen : 111 |
| | | |
− | read : ATCTTACACCACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT | + | read : AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACACCACACACACACACACAAACTC |
− | rlen : 105 | + | rlen : 106 |
| | | |
− | optimal score: 50.5858 | + | optimal score: 50.5073 |
− | optimal state: Z | + | optimal state: MR |
− | optimal track: Z|m|10|2 | + | optimal track: MR|r|0|5 |
− | optimal probe len: 26 | + | optimal probe len: 25 |
− | optimal path length : 106 | + | optimal path length : 107 |
− | max j: 105 | + | max j: 106 |
− | mismatch penalty: 3 | + | probe: (1~82) [1~10] (1~5) |
| + | read : (1~82) [83~101] (102~106) |
| | | |
− | model: (1~6) [1~10]
| + | motif # : 10 [83,101] |
− | read : (1~6) [7~25][26~106]
| |
− |
| |
− | motif # : 10 [7,25] | |
| motif concordance : 95% (9/10) | | motif concordance : 95% (9/10) |
| motif discordance : 0|1|0|0|0|0|0|0|0|0 | | motif discordance : 0|1|0|0|0|0|0|0|0|0 |
| | | |
− | Model: ATCTTACACACACACACACACACACA-------------------------------------------------------------------------------- | + | Model: ----------------------------------------------------------------------------------CACACACACACACACACACAAACTC |
− | SMMMMMMMMMDMMMMMMMMMMMMMMMMZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZE | + | SYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYMMMDMMMMMMMMMMMMMMMMMMMMME |
− | LLLLLLoo++oo++oo++oo++oo++
| + | oo++oo++oo++oo++oo++RRRRR |
− | Read: ATCTTACAC-CACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT | + | Read: AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACAC-CACACACACACACACAAACTC |
| | | |
− | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | + | ++++++++++++++++++++++++++++++++++++++++++++ |
− | VNTR Summary | + | Fuzzy left alignment |
− | rid : 19
| |
− | motif : AC
| |
− | ru : CA
| |
| | | |
− | Exact | + | lflank : ATCTTA |
− | repeat_tract : CACACACACACACACA | + | repeat motif : CA |
− | position : [131949,131964] | + | lflen : 6 |
− | reference repeat unit length : 8 | + | mlen : 2 |
− | motif_concordance : 1 | + | plen : 111 |
− | repeat units : 8 | + | |
− | exact repeat units : 8 | + | read : ATCTTACACCACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT |
− | total no. of repeat units : 8 | + | rlen : 105 |
| | | |
− | Fuzzy | + | optimal score: 50.5858 |
− | repeat_tract : CACCACACACACACACACA | + | optimal state: Z |
− | position : [131946,131964] | + | optimal track: Z|m|10|2 |
− | reference repeat unit length : 19 | + | optimal probe len: 26 |
− | motif_concordance : 0.95 | + | optimal path length : 106 |
− | repeat units : 19 | + | max j: 105 |
− | exact repeat units : 9 | + | mismatch penalty: 3 |
− | total no. of repeat units : 10 | + | |
− | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | + | model: (1~6) [1~10] |
− | | + | read : (1~6) [7~25][26~106] |
− | <div class="mw-collapsible-content"> | + | |
− | usage : vt annotate_indels [options] <in.vcf> | + | motif # : 10 [7,25] |
− | | + | motif concordance : 95% (9/10) |
− | options : -v add vntr record [false] | + | motif discordance : 0|1|0|0|0|0|0|0|0|0 |
− | -x override tags [false] | + | |
− | -f filter expression [] | + | Model: ATCTTACACACACACACACACACACA-------------------------------------------------------------------------------- |
− | -d debug [false] | + | SMMMMMMMMMDMMMMMMMMMMMMMMMMZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZE |
− | -m mode [f] | + | LLLLLLoo++oo++oo++oo++oo++ |
− | e : by exact alignment f : by fuzzy alignment | + | Read: ATCTTACAC-CACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT |
− | -c classification schemas of tandem repeat [6] | + | |
− | 1 : lai2003 | + | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
− | 2 : kelkar2008 | + | VNTR Summary |
− | 3 : fondon2012 | + | rid : 19 |
− | 4 : ananda2013 | + | motif : AC |
− | 5 : willems2014 | + | ru : CA |
− | 6 : tan_kang2015 | + | |
− | -a annotation type [v] | + | Exact |
− | v : a. output VNTR variant (defined by classification). | + | repeat_tract : CACACACACACACACA |
− | RU repeat unit on reference sequence (CA) | + | position : [131949,131964] |
− | MOTIF canonical representation (AC) | + | reference repeat unit length : 8 |
− | RL repeat tract length in bases (11) | + | motif_concordance : 1 |
− | FLANKS flanking positions of repeat tract determined by exact alignment | + | repeat units : 8 |
− | RU_COUNTS number of exact repeat units and total number of repeat units in | + | exact repeat units : 8 |
− | repeat tract determined by exact alignment | + | total no. of repeat units : 8 |
− | FZ_RL fuzzy repeat tract length in bases (11) | + | |
− | FZ_FLANKS flanking positions of repeat tract determined by fuzzy alignment | + | Fuzzy |
− | FZ_RU_COUNTS number of exact repeat units and total number of repeat units in | + | repeat_tract : CACCACACACACACACACA |
− | repeat tract determined by fuzzy alignment | + | position : [131946,131964] |
− | FLANKSEQ flanking sequence of indel | + | reference repeat unit length : 19 |
− | LARGE_REPEAT_REGION repeat region exceeding 2000bp | + | motif_concordance : 0.95 |
− | b. mark indels with overlapping VNTR. | + | repeat units : 19 |
− | FLANKS flanking positions of repeat tract determined by exact alignment | + | exact repeat units : 9 |
− | FZ_FLANKS flanking positions of repeat tract determined by fuzzy alignment | + | total no. of repeat units : 10 |
− | GMOTIF generating motif used in fuzzy alignment | + | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
− | TR position and alleles of VNTR (20:23413:CACACACACAC:<VNTR>) | + | |
− | a : annotate each indel with RU, RL, MOTIF, REF. | + | <div class="mw-collapsible-content"> |
− | -r reference sequence fasta file [] | + | usage : vt annotate_indels [options] <in.vcf> |
− | -o output VCF file [-] | + | |
− | -I file containing list of intervals [] | + | options : -v add vntr record [false] |
− | -i intervals | + | -x override tags [false] |
− | -? displays help | + | -f filter expression [] |
− | </div> | + | -d debug [false] |
− | </div> | + | -m mode [f] |
− | | + | e : by exact alignment f : by fuzzy alignment |
− | === Construct Probes === | + | -c classification schemas of tandem repeat [6] |
| + | 1 : lai2003 |
| + | 2 : kelkar2008 |
| + | 3 : fondon2012 |
| + | 4 : ananda2013 |
| + | 5 : willems2014 |
| + | 6 : tan_kang2015 |
| + | -a annotation type [v] |
| + | v : a. output VNTR variant (defined by classification). |
| + | RU repeat unit on reference sequence (CA) |
| + | MOTIF canonical representation (AC) |
| + | RL repeat tract length in bases (11) |
| + | FLANKS flanking positions of repeat tract determined by exact alignment |
| + | RU_COUNTS number of exact repeat units and total number of repeat units in |
| + | repeat tract determined by exact alignment |
| + | FZ_RL fuzzy repeat tract length in bases (11) |
| + | FZ_FLANKS flanking positions of repeat tract determined by fuzzy alignment |
| + | FZ_RU_COUNTS number of exact repeat units and total number of repeat units in |
| + | repeat tract determined by fuzzy alignment |
| + | FLANKSEQ flanking sequence of indel |
| + | LARGE_REPEAT_REGION repeat region exceeding 2000bp |
| + | b. mark indels with overlapping VNTR. |
| + | FLANKS flanking positions of repeat tract determined by exact alignment |
| + | FZ_FLANKS flanking positions of repeat tract determined by fuzzy alignment |
| + | GMOTIF generating motif used in fuzzy alignment |
| + | TR position and alleles of VNTR (20:23413:CACACACACAC:<VNTR>) |
| + | a : annotate each indel with RU, RL, MOTIF, REF. |
| + | -r reference sequence fasta file [] |
| + | -o output VCF file [-] |
| + | -I file containing list of intervals [] |
| + | -i intervals |
| + | -? displays help |
| + | </div> |
| + | </div> |
| + | |
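The usage text above distinguishes RU (the repeat unit as it appears on the reference, e.g. CA) from MOTIF (its canonical representation, e.g. AC). A minimal sketch of that relationship follows. It assumes the canonical form is the lexicographically smallest rotation of the repeat unit, which matches the AC/CA and GT/TG pairs shown above but is not taken from vt's source. The function name <code>canonical_motif</code> is hypothetical.

```python
def canonical_motif(repeat_unit: str) -> str:
    """Return the lexicographically smallest rotation of a repeat unit.

    Assumption: this is how a canonical MOTIF is derived from an RU;
    vt's actual implementation may differ.
    """
    if not repeat_unit:
        raise ValueError("repeat unit must not be empty")
    # Every cyclic rotation of the repeat unit, e.g. "CA" -> ["CA", "AC"].
    rotations = [repeat_unit[i:] + repeat_unit[:i] for i in range(len(repeat_unit))]
    return min(rotations)


if __name__ == "__main__":
    for ru in ("CA", "TG", "T"):
        print(ru, canonical_motif(ru))
```

Under this assumption, CA and AC describe the same tandem repeat, so records can be grouped by MOTIF regardless of where the tract happens to start on the reference.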
| + | === Construct Probes === |
| | | |
| | | |
Line 1,606: |
Line 1,696: |
| #construct probes from candidates.sites.bcf and output to standard out | | #construct probes from candidates.sites.bcf and output to standard out |
| vt construct_probes candidates.sites.bcf -r ref.fa | | vt construct_probes candidates.sites.bcf -r ref.fa |
− | <div class="mw-collapsible-content"> | + | <div class="mw-collapsible-content"> |
− | usage : vt construct_probes [options] <in.vcf> | + | usage : vt construct_probes [options] <in.vcf> |
| + | |
| + | options : -o output VCF file [-] |
| + | -f minimum flank length [20] |
| + | -r reference sequence fasta file [] |
| + | -I file containing list of intervals [] |
| + | -i intervals [] |
| + | -- ignores the rest of the labeled arguments following this flag |
| + | -h displays help |
| + | </div> |
| + | </div> |
| + | |
| + | === Genotype === |
| + | |
| + | Genotypes variants for each sample. |
| + | |
| + | <div class=" mw-collapsible mw-collapsed"> |
| + | #genotypes variants found in candidates.sites.vcf from sample.bam |
| + | vt genotype -r seq.fa -b sample.bam -i candidates.sites.vcf -o sample.sites.vcf |
| + | <div class="mw-collapsible-content"> |
| + | usage : vt genotype [options] |
| + | |
| + | options : -r reference sequence fasta file [] |
| + | -s sample ID [] |
| + | -o output VCF file [-] |
| + | -b input BAM file [] |
| + | -i input candidate VCF file [] |
| + | -- ignores the rest of the labeled arguments following this flag |
| + | -h displays help |
| + | </div> |
| + | </div> |
| + | |
| + | = Pedigree File = |
| + | |
| + | vt understands an augmented version of the PED format described by [http://zzz.bwh.harvard.edu/plink/data.shtml#ped plink]; the augmentation was introduced by [mailto:hmkang@umich.edu Hyun]. |
| + | The pedigree file has the following mandatory fields: |
| + | |
| + | {| class="wikitable" |
| + | |- |
| + | ! scope="col"| Field |
| + | ! scope="col"| Description |
| + | ! scope="col"| Valid Values |
| + | ! scope="col"| Missing Values |
| + | |- |
| + | |Family ID<br> |
| + | Individual ID<br> |
| + | Paternal ID<br> |
| + | Maternal ID<br> |
| + | Sex<br> |
| + | Phenotype |
| + | |ID of this family <br> |
| + | ID(s) of this individual (comma separated) <br> |
| + | ID of the father <br> |
| + | ID of the mother <br> |
| + | Sex of the individual<br> |
| + | Phenotype |
| + | |[A-Za-z0-9_]+<br> |
| + | [A-Za-z0-9_]+(,[A-Za-z0-9_]+)* <br> |
| + | [A-Za-z0-9_]+ <br> |
| + | [A-Za-z0-9_]+<br> |
| + | 1=male, 2=female, other, male, female<br> |
| + | [A-Za-z0-9_]+ |
| + | | 0 <br> |
| + | cannot be missing <br> |
| + | 0 <br> |
| + | 0 <br> |
| + | other<br> |
| + | -9 |
| + | |} |
| | | |
− | options : -o output VCF file [-] | + | Examples: |
− | -f minimum flank length [20]
| |
− | -r reference sequence fasta file []
| |
− | -I file containing list of intervals []
| |
− | -i intervals []
| |
− | -- ignores the rest of the labeled arguments following this flag
| |
− | -h displays help
| |
− | </div>
| |
− | </div>
| |
| | | |
− | === Genotype ===
| + | ceu NA12878 NA12891 NA12892 female -9 |
| + | yri NA19240 NA19239 NA19238 female -9 |
| | | |
− | Genotypes variants for each sample.
| + | ceu NA12878 NA12891 NA12892 2 -9 |
| + | yri NA19240 NA19239 NA19238 2 -9 |
| | | |
− | <div class=" mw-collapsible mw-collapsed">
| + | #allows tools like profile_mendelian to detect duplicates and check for concordance |
− | #genotypes variants found in candidates.sites.vcf from sample.bam
| + | ceu NA12878,NA12878A NA12891 NA12892 female case |
− | vt genotype -r seq.fa -b sample.bam -i candidates.sites.vcf -o sample.sites.vcf | + | yri NA19240 NA19239 NA19238 female control |
− | <div class="mw-collapsible-content">
| |
− | usage : vt genotype [options] | |
| | | |
− | options : -r reference sequence fasta file []
| + | #individuals without known parents use 0 for the paternal and maternal IDs |
− | -s sample ID []
| + | ceu NA12412 0 0 female case |
− | -o output VCF file [-]
| + | yri NA19650 0 0 female control |
− | -b input BAM file []
| |
− | -i input candidate VCF file []
| |
− | -- ignores the rest of the labeled arguments following this flag
| |
− | -h displays help
| |
− | </div>
| |
− | </div>
| |
| | | |
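The field table above can be turned into a small parser. The following is a minimal sketch in Python, not vt's own reader. It assumes whitespace-separated columns, as in the examples, and maps the documented missing values (0, other, -9) to <code>None</code> or <code>"other"</code>. The names <code>PedRecord</code> and <code>parse_ped_line</code> are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Accepted sex encodings from the table: 1=male, 2=female, other, male, female.
SEX_CODES = {"1": "male", "2": "female", "male": "male",
             "female": "female", "other": "other"}


@dataclass
class PedRecord:
    family: Optional[str]      # None when the family ID is the missing value 0
    individuals: List[str]     # one or more IDs; extra IDs mark duplicate samples
    father: Optional[str]      # None when the paternal ID is 0
    mother: Optional[str]      # None when the maternal ID is 0
    sex: str                   # normalized to "male", "female" or "other"
    phenotype: Optional[str]   # None when the phenotype is -9


def parse_ped_line(line: str) -> PedRecord:
    """Parse one line of the augmented PED format (assumed whitespace separated)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    family, individual, father, mother, sex, phenotype = fields
    if sex not in SEX_CODES:
        raise ValueError(f"unrecognized sex value: {sex!r}")
    return PedRecord(
        family=None if family == "0" else family,
        individuals=individual.split(","),   # the individual ID cannot be missing
        father=None if father == "0" else father,
        mother=None if mother == "0" else mother,
        sex=SEX_CODES[sex],
        phenotype=None if phenotype == "-9" else phenotype,
    )


if __name__ == "__main__":
    print(parse_ped_line("ceu NA12878,NA12878A NA12891 NA12892 female case"))
```

Keeping the comma-separated individual IDs as a list makes it straightforward for a downstream check to compare duplicate samples of the same individual.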
| = Resource Bundle = | | = Resource Bundle = |