Changes

Vt (view source)

Revision as of 01:03, 2 March 2018

5,118 bytes added, 01:03, 2 March 2018

→Mac

Line 54: Line 54:

</div>

−

~~Building has been tested on Linux and~~ Mac ~~systems on gcc 4.8.1 and clang 3.4. <br>~~

+

=== Mac ===

−

~~Some features of C++11 is used, thus there is a~~ need ~~for newer versions of gcc and clang~~.

+

You will need to install the package xz prior to installing vt.

−

~~== Mac ==~~

+

brew install xz

−

~~You may also install vt on mac via homebrew.~~

−

~~brew install homebrew/science/vt~~

+

Building has been tested on Linux and Mac systems with gcc 4.8.1 and clang 3.4. <br>

+

Some C++11 features are used, so newer versions of gcc and clang are required.

= Updating =

Line 719: Line 720:

−

#converts in.bcf to tab format with selected INFO fields

+

#converts in.bcf to tab format with selected INFO and FILTER fields

−

vt info2tab in.bcf -v -t EX_RL,FZ_RL,MDUST,LOBSTR,VNTRSEEK,RMSK,EX_REPEAT_TRACT

+

vt info2tab in.bcf -u PASS -t EX_RL,FZ_RL,MDUST,LOBSTR,VNTRSEEK,RMSK,EX_REPEAT_TRACT

−

+

INPUT

+

=====

20 17548608 . A AC . PASS CENTERS=vbi;NCENTERS=1;OLD_MULTIALLELIC=20:17548598:GAAAAAAAAAAAAA/GAAAAAAAAAAAA/GAAAAAAAAAAAAAA/GAAAAAAAAAA/GAAAAAAAAAAA/GAAAAAAAAAACAAA;OLD_VARIANT=20:17548598:GAAAAAAAAAAAAAG/GAAAAAAAAAACAAAG;EX_MOTIF=C;EX_MLEN=1;EX_RU=C;EX_BASIS=C;EX_BLEN=1;EX_REPEAT_TRACT=17548608,17548609;EX_COMP=100,0,0,0;EX_ENTROPY=0;EX_ENTROPY2=0;EX_KL_DIVERGENCE=2;EX_KL_DIVERGENCE2=4;EX_REF=2;EX_RL=2;EX_LL=3;EX_RU_COUNTS=0,2;EX_SCORE=0;EX_TRF_SCORE=-14;FZ_MOTIF=A;FZ_MLEN=1;FZ_RU=A;FZ_BASIS=A;FZ_BLEN=1;FZ_REPEAT_TRACT=17548599,17548611;FZ_COMP=100,0,0,0;FZ_ENTROPY=0;FZ_ENTROPY2=0;FZ_KL_DIVERGENCE=2;FZ_KL_DIVERGENCE2=4;FZ_REF=13;FZ_RL=13;FZ_LL=14;FZ_RU_COUNTS=13,13;FZ_SCORE=1;FZ_TRF_SCORE=26;FLANKSEQ=GAAAAAAAAA[A]AAAGAAGGAA;MDUST;LOBSTR

20 17548608 . AAAAG A . PASS CENTERS=ox1;NCENTERS=1;EX_MOTIF=AAAG;EX_MLEN=4;EX_RU=AAAG;EX_BASIS=AG;EX_BLEN=2;EX_REPEAT_TRACT=17548609,17548612;EX_COMP=100,0,0,0;EX_ENTROPY=0;EX_ENTROPY2=0;EX_KL_DIVERGENCE=2;EX_KL_DIVERGENCE2=4;EX_REF=0.75;EX_RL=4;EX_LL=4;EX_RU_COUNTS=0,1;EX_SCORE=0.75;EX_TRF_SCORE=-1;FZ_MOTIF=A;FZ_MLEN=1;FZ_RU=A;FZ_BASIS=A;FZ_BLEN=1;FZ_REPEAT_TRACT=17548599,17548611;FZ_COMP=100,0,0,0;FZ_ENTROPY=0;FZ_ENTROPY2=0;FZ_KL_DIVERGENCE=2;FZ_KL_DIVERGENCE2=4;FZ_REF=13;FZ_RL=13;FZ_LL=13;FZ_RU_COUNTS=13,13;FZ_SCORE=1;FZ_TRF_SCORE=26;FLANKSEQ=GAAAAAAAAA[AAAAG]AAGGAACTAC;MDUST;LOBSTR;OLD_VARIANT=20:17548598:GAAAAAAAAAAAAAG/GAAAAAAAAAA

−

</div>

−

+

OUTPUT

−

CHROM POS REF ALT N_ALLELE EX_RL FZ_RL MDUST LOBSTR VNTRSEEK RMSK EX_REPEAT_TRACT_1 EX_REPEAT_TRACT_2

+

======

−

20 17548608 A AC 2 2 13 1 1 0 0 17548608 17548608

+

CHROM POS REF ALT N_ALLELE PASS EX_RL FZ_RL MDUST LOBSTR VNTRSEEK RMSK EX_REPEAT_TRACT_1 EX_REPEAT_TRACT_2

−

20 17548608 AAAAG A 2 4 13 1 1 0 0 17548609 17548609

+

20 17548608 A AC 2 1 2 13 1 1 0 0 17548608 17548608

+

20 17548608 AAAAG A 2 1 4 13 1 1 0 0 17548609 17548609
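The mapping from a record's FILTER and INFO columns to the tab-delimited row above can be sketched in Python. This is only an illustration, not vt's implementation: the function name `info_to_row` is hypothetical, the INFO string is abridged from the first INPUT record, and writing `0` for any absent tag is an assumption based on the sample OUTPUT. Multi-valued tags such as EX_REPEAT_TRACT are left out.

```python
def info_to_row(info, filters, info_tags, filter_tags):
    """Flatten one record's FILTER and INFO columns into table values."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # A tag without "=" is a flag; record it as present.
        fields[key] = value if sep else "1"
    applied = set(filters.split(";"))
    # One 0/1 column per requested filter tag, then one column per INFO tag.
    row = ["1" if tag in applied else "0" for tag in filter_tags]
    # Assumption: absent tags are written as 0, as in the sample OUTPUT.
    row += [fields.get(tag, "0") for tag in info_tags]
    return row


# INFO abridged from the first INPUT record.
info = "CENTERS=vbi;NCENTERS=1;EX_RL=2;FZ_RL=13;MDUST;LOBSTR"
row = info_to_row(
    info,
    "PASS",
    ["EX_RL", "FZ_RL", "MDUST", "LOBSTR", "VNTRSEEK", "RMSK"],
    ["PASS"],
)
print("\t".join(row))
```

For the abridged record this gives the PASS, EX_RL, FZ_RL, MDUST, LOBSTR, VNTRSEEK and RMSK values of the first OUTPUT row.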

usage : vt info2tab [options] <in.vcf>

−

options : ~~-v print variant CHROM,POS,REF,ALT,N_ALLELE [false]~~

+

options : -d debug [false]

−

-d debug [false]

-f filter expression []

−

-t list of info tags to be extracted []

+

-u list of filter tags to be extracted []

-t list of info tags to be extracted []

-o output tab delimited file [-]

-I file containing list of intervals []

Line 1,056: Line 1,057:

</div>

−

=== Profile ~~SNPs~~ ===

+

=== Profile Mendelian Errors ===

−

Profile ~~SNPs. The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].~~

+

Profile Mendelian errors

−

#profile ~~snps~~ found in 20.~~sites~~.~~vcf~~

+

#profile mendelian errors found in vt.genotypes.bcf, generate [[media:mendel.pdf|tables]] in the directory mendel, requires pdflatex.

−

vt ~~profile_snps -g snp~~.~~reference~~.~~txt 20~~.~~sites.vcf~~ -~~r hs37d5~~.~~fa -i 20~~

+

vt profile_mendelian vt.genotypes.bcf -p trios.ped -x mendel

+

The pedigree file format is described [[Vt#Pedigree File|here]].

−

#this is a sample output for ~~indel~~ profiling.

+

#this is a sample output for mendelian error profiling.

−

# ~~square brackets contain the ts/tv ratio~~.

+

#R and A stand for reference and alternate allele respectively.

−

# ~~The numbers in curved bracket are the counts of ts and tv SNPs respectively.~~

+

#Error% - mendelian error (confounded with de novo mutation)

−

# ~~Low complexity shows what percent~~ of ~~the SNPs are in low complexity regions.~~

+

#HomHet - Homozygous-Heterozygous genotype ratios

−

~~data set~~

+

#Het% - proportion of hets

−

~~No. SNPs~~ : ~~508603 [2~~.~~09]~~

+

Mendelian Errors <br>

−

~~Low complexity :~~ 0.~~08 (39837~~/~~508603) <br>~~

+

Father Mother R/R R/A A/A Error(%) HomHet Het(%)

−

~~1000g~~

+

R/R R/R 14889 210 38 1.64 nan nan

−

A-B ~~109970 [~~1.~~39]~~

+

R/R R/A 3403 3497 74 1.06 0.97 50.68

−

A&B ~~398633 [2~~.~~37]~~

+

R/R A/A 176 1482 155 18.26 nan nan

−

B-A ~~1340682 [2~~.~~26]~~

+

R/A R/R 3665 3652 68 0.92 1.00 49.91

−

~~Precision~~ 78~~.4%~~

+

R/A R/A 1015 3151 990 0.00 0.64 61.11

−

~~Sensitivity 22~~.9% <br>

+

R/A A/A 43 1300 1401 1.57 1.08 48.13

−

~~dbsnp~~

+

A/A R/R 172 1365 147 18.94 nan nan

−

A-B ~~324063 [1~~.99]

+

A/A R/A 47 1164 1183 1.96 1.02 49.60

−

A&B ~~184540 [2~~.~~29]~~

+

A/A A/A 20 78 5637 1.71 nan nan <br>

−

B-A ~~103893 [2~~.~~60]~~

+

Parental R/R R/A A/A Error(%) HomHet Het(%)

−

~~Precision~~ 36.3%

+

R/R R/R 14889 210 38 1.64 nan nan

−

~~Sensitivity~~ 64.0%

+

R/R R/A 7068 7149 142 0.99 0.99 50.28

+

R/R A/A 348 2847 302 18.59 nan nan

+

R/A R/A 1015 3151 990 0.00 0.64 61.11

+

R/A A/A 90 2464 2584 1.75 1.05 48.81

+

A/A A/A 20 78 5637 1.71 nan nan <br>

+

Parental R/R R/A A/A Error(%) HomHet Het(%)

+

HOM HOM 14909 288 5675 1.66 nan nan

+

HOM HET 7158 9613 2726 1.19 1.00 49.90

+

HET HET 1015 3151 990 0.00 0.64 61.11

+

HOMREF HOMALT 348 2847 302 18.59 nan nan <br>

+

total mendelian error : 2.505%

+

no. of trios : 2

+

no. of variants : 25346
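The Error(%), HomHet and Het(%) columns of the per-genotype rows can be reproduced from the offspring counts. The Python sketch below is inferred from the sample numbers and is not vt's code: the function name `trio_stats` and the rule for reporting `nan` (whenever the parental pair cannot produce both homozygous and heterozygous offspring) are assumptions.

```python
import math


def trio_stats(rr, ra, aa, impossible):
    """Return (error %, HomHet, Het %) for one parental genotype pair.

    rr, ra, aa are offspring genotype counts; impossible lists the offspring
    genotypes that violate Mendelian inheritance for this parental pair.
    """
    counts = {"R/R": rr, "R/A": ra, "A/A": aa}
    total = rr + ra + aa
    errors = sum(counts[g] for g in impossible)
    error_pct = 100.0 * errors / total

    valid_hom = [g for g in ("R/R", "A/A") if g not in impossible]
    if "R/A" in impossible or not valid_hom:
        # Assumption: ratios are undefined unless both classes are possible.
        return error_pct, math.nan, math.nan

    hom = sum(counts[g] for g in valid_hom)
    het = counts["R/A"]
    return error_pct, hom / het, 100.0 * het / (hom + het)


# Father R/R x Mother R/A row from the sample output: A/A offspring are errors.
error_pct, hom_het, het_pct = trio_stats(3403, 3497, 74, ["A/A"])
print(f"{error_pct:.2f} {hom_het:.2f} {het_pct:.2f}")
```

With the R/R by R/A counts this matches the 1.06, 0.97 and 50.68 shown in that row; the ratios exclude the offspring counted as errors.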

+

+

profile_mendelian v0.5

−

~~# This file contains information on how to process reference data sets~~.

+

usage : vt profile_mendelian [options] <in.vcf>

−

#

+

−

~~# dataset~~ - ~~name of data set, this label will be printed.~~

+

options : -q minimum genotype quality

−

~~# type~~ - ~~True Positives (TP) and False Positives (FP)~~

+

-d minimum depth

−

~~# overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively~~

+

-r reference sequence fasta file []

−

# - ~~annotation~~

+

-x output latex directory []

−

# file ~~is used for GENCODE annotation of frame shift and non frame shift Indels~~

+

-p pedigree file

−

~~# filter~~ - ~~filter applied to variants for this particular data set~~

+

-I file containing list of intervals []

−

~~# path~~ - ~~path~~ of ~~indexed BCF file~~

+

-i intervals

−

~~#dataset type~~ ~~filter path~~

+

-? displays help

−

~~1000g TP N_ALLELE==2&&VTYPE==SNP~~ /~~net~~/~~fantasia/home/atks/ref/vt/grch37/1000G.v5.snps.indels.complex.svs.sites.bcf~~

+

</div>

−

~~dbsnp TP N_ALLELE~~==~~2&&VTYPE~~==~~SNP /net/fantasia/home/atks/ref/vt/grch37/dbSNP138.snps.indels.complex.sites.bcf~~

+

</div>

−

~~GENCODE_V19 cds_annotation . /net/fantasia/home/atks/ref/vt/grch37/gencode.v19.cds.bed.gz~~

+

−

~~DUST cplx_annotation . /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz~~

+

=== Profile SNPs ===

−

~~<div class="mw-collapsible-content">~~

+

Profile SNPs. The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].

−

~~usage :~~ vt ~~profile_snps [options~~] ~~<in~~.~~vcf>~~

−

~~options :~~ -~~f filter expression []~~

+

−

-g ~~file containing list of~~ reference ~~datasets []~~

+

#profile snps found in 20.sites.vcf

−

-I ~~file containing list of intervals []~~

+

vt profile_snps -g snp.reference.txt 20.sites.vcf -r hs37d5.fa -i 20

−

-i ~~intervals []~~

−

~~-r reference sequence fasta file []~~

−

~~-? displays help~~

−

~~</div>~~

−

~~</div>~~

−

~~=== Profile Indels ===~~

+

#this is a sample output for SNP profiling.

−

+

# square brackets contain the ts/tv ratio.

−

~~Profile Indels~~. The ~~reference~~ data ~~sets can be obtained from~~ [[~~Vt#Resource_Bundle|vt resource bundle~~]].

+

# The numbers in curved bracket are the counts of ts and tv SNPs respectively.

+

# Low complexity shows what percent of the SNPs are in low complexity regions.

+

data set

+

No. SNPs : 508603 [2.09]

+

Low complexity : 0.08 (39837/508603) <br>

+

1000g

+

A-B 109970 [1.39]

+

A&B 398633 [2.37]

+

B-A 1340682 [2.26]

+

Precision 78.4%

+

Sensitivity 22.9% <br>

+

dbsnp

+

A-B 324063 [1.99]

+

A&B 184540 [2.29]

+

B-A 103893 [2.60]

+

Precision 36.3%

+

Sensitivity 64.0%
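The Precision and Sensitivity lines follow from the three overlap counts, reading A as the input call set and B as the reference data set. That reading is inferred from the numbers (A-B plus A&B equals the reported number of SNPs) rather than stated in the text, and the function name `overlap_stats` is hypothetical. A minimal Python sketch:

```python
def overlap_stats(a_only, both, b_only):
    """Return (precision %, sensitivity %) from A-B, A&B and B-A counts."""
    precision = 100.0 * both / (a_only + both)    # share of A found in B
    sensitivity = 100.0 * both / (both + b_only)  # share of B found in A
    return precision, sensitivity


# Counts taken from the 1000g block of the sample output.
precision, sensitivity = overlap_stats(109970, 398633, 1340682)
print(f"Precision {precision:.1f}%  Sensitivity {sensitivity:.1f}%")
```

Applied to the 1000g counts this reproduces the 78.4% and 22.9% in the sample output; the dbsnp counts give 36.3% and 64.0% in the same way.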

−

~~<div class=" mw-collapsible mw-collapsed">~~

+

# This file contains information on how to process reference data sets.

−

#~~profile indels found in mills.vcf~~

+

#

−

~~vt profile_indels -g indel.~~reference.~~txt mills.vcf -r hs37d5.fa -i 20~~

+

# dataset - name of data set, this label will be printed.

−

+

# type - True Positives (TP) and False Positives (FP)

−

#this ~~is a sample output for indel profiling~~.

+

# overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively

−

# ~~square brackets contain the ins/del ratio.~~

+

# - annotation

−

# ~~for the FS/NFS field~~, ~~that is the proportion of coding indels that are frame shifted.~~

+

# file is used for GENCODE annotation of frame shift and non frame shift Indels

−

# ~~The numbers in curved bracket are the counts~~ of frame shift and non frame shift ~~indels respectively.~~

+

# filter - filter applied to variants for this particular data set

−

data set

+

# path - path of indexed BCF file

−

~~No Indels : 46974 [0.89]~~

+

#dataset type filter path

−

FS/~~NFS : 0~~.~~26 (8/23) <br>~~

+

1000g TP N_ALLELE==2&&VTYPE==SNP /net/fantasia/home/atks/ref/vt/grch37/1000G.v5.snps.indels.complex.svs.sites.bcf

−

~~dbsnp~~

+

dbsnp TP N_ALLELE==2&&VTYPE==SNP /net/fantasia/home/atks/ref/vt/grch37/dbSNP138.snps.indels.complex.sites.bcf

−

~~A-B 30704 [0~~.~~92]~~

+

GENCODE_V19 cds_annotation . /net/fantasia/home/atks/ref/vt/grch37/gencode.v19.cds.bed.gz

−

~~A&B 16270 [0~~.~~83]~~

+

DUST cplx_annotation . /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz

−

~~B-A 2049488 [1~~.~~52]~~

+

−

~~Precision 34~~.6%

+

−

~~Sensitivity~~ 0.~~8% <br>~~

+

usage : vt profile_snps [options] <in.vcf>

−

~~mills~~

−

~~A-B 43234 [0~~.~~88]~~

−

~~A&B 3740 [1~~.~~00]~~

−

~~B-A 203278 [0~~.~~98]~~

−

~~Precision 8~~.0%

−

~~Sensitivity~~ ~~1.8% <br>~~

−

~~mills~~.~~chip~~

−

~~A-B 46847 [0~~.~~89]~~

−

~~A&B 127 [0~~.~~90]~~

−

~~B-A 8777 [0~~.~~93]~~

−

~~Precision 0~~.3%

−

~~Sensitivity~~ 1.~~4% <br>~~

−

~~affy~~.~~exome~~.~~chip~~

−

A-~~B 46911 [0.89]~~

−

~~A&B 63 [0.43]~~

−

~~B-A 33997~~ [~~0.47~~]

−

~~Precision 0~~.1%

−

~~Sensitivity 0.2% <br~~>

−

~~# This file contains information on how to process reference data sets.~~

+

options : -f filter expression []

−

~~# dataset~~ - ~~name of data set, this label will be printed.~~

+

-g file containing list of reference datasets []

−

~~# type - True Positives (TP) and False Positives (FP).~~

+

-I file containing list of intervals []

−

~~# overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively.~~

+

-i intervals []

−

~~# - annotation.~~

+

-r reference sequence fasta file []

−

~~# file is used for GENCODE annotation of frame shift and non frame shift Indels.~~

+

-? displays help

−

~~# filter~~ ~~- filter applied to variants for this particular data set.~~

−

~~# path - path of indexed BCF file.~~

−

~~#dataset type~~ filter ~~path~~

−

~~1000g TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/1000G.snps_indels.sites.bcf~~

−

~~mills TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/mills.208620indels.sites.bcf~~

−

~~dbsnp TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/dbsnp.13147541variants.sites.bcf~~

−

~~GENCODE_V19 cds_annotation . /net/fantasia/home/atks/ref/vt/grch37/gencode.cds.bed.gz~~

−

~~DUST cplx_annotation . /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz~~

−

~~<div class="mw-collapsible-content">~~

−

~~usage : vt profile_indels~~ [~~options~~] ~~<in.vcf>~~

−

~~options :~~ -g file containing list of reference datasets []

−

-I file containing list of intervals []

−

-i intervals []

−

-r reference sequence fasta file []

−

-? displays help

</div>

−

=== Profile ~~VNTRs~~ ===

+

=== Profile Indels ===

−

Profile ~~VNTRs~~. The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].

+

Profile Indels. The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].

+

#profile indels found in mills.vcf

+

vt profile_indels -g indel.reference.txt mills.vcf -r hs37d5.fa -i 20

−

#~~profiles~~ a ~~set of VNTRs~~

+

#this is a sample output for indel profiling.

−

~~vt profile_vntrs vntrs~~.~~sites.bcf -g vntr.reference.txt~~

+

# square brackets contain the ins/del ratio.

−

+

# for the FS/NFS field, that is the proportion of coding indels that are frame shifted.

−

+

# The numbers in curved bracket are the counts of frame shift and non frame shift indels respectively.

−

~~profile_vntrs v0~~.5

+

data set

−

+

No Indels : 46974 [0.89]

−

~~no VNTRs 5660874~~ #~~number~~ of ~~VNTRs in vntrs~~.~~sites.bcf~~

+

FS/NFS : 0.26 (8/23) <br>

−

~~no low complexity 2686460 (47.46%) #number of VNTRs in low complexity region determined by MDUST~~

+

dbsnp

−

~~no coding 17911 (~~0.~~32%) #number of VNTRs in coding regions determined by GENCODE v7~~

+

A-B 30704 [0.92]

−

~~no redundant~~ ~~1312209~~ (23~~.18%~~) ~~#number of VNTRs involved in overlapping with one another~~<br>

+

A&B 16270 [0.83]

−

~~trf_lobstr (1638516) #TRF based reference set used in lobSTR, motif lengths 1 to 6.~~

+

B-A 2049488 [1.52]

−

A-B ~~3269285 #TRs specific to vntrs~~.~~sites.bcf~~

+

Precision 34.6%

−

A-B~~~ 1666185 #TRs in vntrs.sites.bcf that overlap partially with at least one TR in TRF(lobSTR) but does not overlap exactly with another TR~~.

+

Sensitivity 0.8% <br>

−

A~~&B1 725404 #TRs in vntrs~~.~~sites.bcf that overlap exactly with at least one TR in TRF(lobSTR)~~

+

mills

−

~~A&B2 723195 #TRs in TRF(lobSTR) that overlap exactly with at least one TR in vntrs~~.~~sites.bcf~~

+

A-B 43234 [0.88]

−

~~B-A~ 710075 #TRs in TRF(lobSTR) that overlap partially with at least one TR in vntrs.sites.bcf but does not overlap exactly with another TR.~~

+

A&B 3740 [1.00]

−

~~B-A 205246 #TRs specific to TRF(lobSTR)~~

+

B-A 203278 [0.98]

−

~~#note that the first 3 rows should sum up to the number of TRs in vntrs~~.~~sites.bcf~~

+

Precision 8.0%

−

~~#and the 4th to 6th rows should sum up to the number of TRs in TRF( lobSTR)~~

+

Sensitivity 1.8% <br>

−

~~#This basically allows us to see the m to n overlapping in overlapping TRs~~<br>

+

mills.chip

−

~~trf_repeatseq (1624553) #TRF based reference set used in repeatseq, motif lengths 1 to 6.~~

+

A-B 46847 [0.89]

−

A-B ~~3291652~~

+

A&B 127 [0.90]

−

A-B~~~ 1650190~~

+

B-A 8777 [0.93]

−

A~~&B1~~ ~~719032~~

+

Precision 0.3%

−

~~A&B2~~ ~~716838~~

+

Sensitivity 1.4% <br>

−

~~B-A~ 703948~~

+

affy.exome.chip

−

~~B-A 203767~~ <br>

+

A-B 46911 [0.89]

−

~~trf_vntrseek (230306) #TRF based reference set used in vntrseek, motif lengths 7 to 2000~~.

+

A&B 63 [0.43]

−

A-B ~~5384453~~

+

B-A 33997 [0.47]

−

A-B~~~ 271302~~

+

Precision 0.1%

−

A~~&B1~~ ~~5119~~

+

Sensitivity 0.2% <br>
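The FS/NFS value is the proportion of coding indels that are frame shifted, so it divides the frame shift count by the sum of both counts rather than by the non frame shift count. A small Python check against the sample line, with a hypothetical function name:

```python
def frameshift_proportion(fs, nfs):
    """Proportion of coding indels that are frame shifted."""
    return fs / (fs + nfs)


# Counts from the "FS/NFS : 0.26 (8/23)" line of the sample output.
fs_nfs = frameshift_proportion(8, 23)
print(f"{fs_nfs:.2f}")
```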

−

~~A&B2 4973~~

−

~~B-A~ 92496~~

−

~~B-A 132837~~ <br>

−

~~codis+ (15) #CODIS STRs + 2 STRs from PROMEGA~~

−

A-B ~~5660794~~

−

A-B~ 79

−

A~~&B1 1~~

−

~~A&B2~~ 1

−

~~B-A~ 14~~

−

~~B-A~~ 0

# This file contains information on how to process reference data sets.

Line 1,230: Line 1,215:

# overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively.

# - annotation.

−

# file is used for GENCODE annotation of ~~coding VNTRs~~.

+

# file is used for GENCODE annotation of frame shift and non frame shift Indels.

# filter - filter applied to variants for this particular data set.

# path - path of indexed BCF file.

−

#dataset type filter path

+

#dataset type filter path

−

~~trf_lobstr~~ TP VTYPE==~~VNTR~~ /net/fantasia/home/atks/ref/vt/grch37/~~trf~~.~~lobstr~~.sites.bcf

+

1000g TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/1000G.snps_indels.sites.bcf

−

~~trf_repeatseq~~ TP VTYPE==~~VNTR~~ /net/fantasia/home/atks/ref/vt/grch37/~~trf~~.~~repeatseq~~.sites.bcf

+

mills TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/mills.208620indels.sites.bcf

−

~~trf_vntrseek~~ TP VTYPE==~~VNTR~~ /net/fantasia/home/atks/ref/vt/grch37/~~trf~~.~~vntrseek~~.sites.bcf

+

dbsnp TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/dbsnp.13147541variants.sites.bcf

−

~~codis+ TP VTYPE==VNTR~~ /net/fantasia/home/atks/ref/vt/grch37/~~codis~~.~~strs~~.~~sites~~.~~bcf~~

+

GENCODE_V19 cds_annotation . /net/fantasia/home/atks/ref/vt/grch37/gencode.cds.bed.gz

−

~~GENCODE_V19 cds_annotation~~ . /net/fantasia/home/atks/ref/vt/grch37/~~gencode.v19.cds~~.bed.gz

+

DUST cplx_annotation . /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz
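A reference file of this shape can be read as whitespace-separated columns, skipping the `#` comment lines. The Python sketch below is a hypothetical reader, not vt's parser; it assumes exactly four columns per data row (dataset, type, filter, path) and that none of them contains whitespace.

```python
from collections import namedtuple

ReferenceSet = namedtuple("ReferenceSet", "dataset type filter path")


def parse_reference_list(text):
    """Parse the rows of a reference data set list, skipping # comment lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dataset, kind, filt, path = line.split()
        rows.append(ReferenceSet(dataset, kind, filt, path))
    return rows


example = """\
#dataset type filter path
mills TP N_ALLELE==2&&VTYPE==INDEL /net/fantasia/home/atks/ref/vt/grch37/mills.208620indels.sites.bcf
DUST cplx_annotation . /net/fantasia/home/atks/ref/vt/grch37/mdust.bed.gz
"""
references = parse_reference_list(example)
for ref in references:
    print(ref.dataset, ref.type, ref.filter)
```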

−

~~DUST cplx_annotation .~~

−

usage : vt ~~profile_vntrs~~ [options] <in.vcf>

+

usage : vt profile_indels [options] <in.vcf>

options : -g file containing list of reference datasets []

Line 1,252: Line 1,236:

</div>

−

=== Profile ~~Mendelian Errors~~ ===

+

=== Profile VNTRs ===

−

Profile ~~Mendelian errors~~

+

Profile VNTRs. The reference data sets can be obtained from [[Vt#Resource_Bundle|vt resource bundle]].

−

~~#profile mendelian errors found in vt.genotypes.bcf, generate [[media:mendel.pdf|tables]] in the directory mendel, requires pdflatex.~~

−

~~vt profile_mendelian vt.genotypes.bcf -p trios.ped -x mendel~~

−

~~pedigree file format is described in [http://csg~~.~~sph~~.~~umich~~.~~edu//abecasis/merlin/tour/input_files~~.~~html here]~~

+

#profiles a set of VNTRs

+

vt profile_vntrs vntrs.sites.bcf -g vntr.reference.txt

+

−

~~#this is a sample output for mendelian error profiling~~.

+

profile_vntrs v0.5

−

#~~R and A stand for reference and alternate allele respectively~~.

+

−

#~~Error~~% ~~- mendelian error (confounded with de novo mutation~~)

+

no VNTRs 5660874 #number of VNTRs in vntrs.sites.bcf

−

#~~HomHet - Homozygous-Heterozygous genotype ratios~~

+

no low complexity 2686460 (47.46%) #number of VNTRs in low complexity region determined by MDUST

−

#~~Het% - proportion~~ of ~~hets~~

+

no coding 17911 (0.32%) #number of VNTRs in coding regions determined by GENCODE v7

−

~~Mendelian Errors~~ <br>

+

no redundant 1312209 (23.18%) #number of VNTRs involved in overlapping with one another<br>

−

~~Father Mother R/R R/A A/A Error~~(%) ~~HomHet Het(%)~~

+

trf_lobstr (1638516) #TRF based reference set used in lobSTR, motif lengths 1 to 6.

−

~~R/R R/R 14889 210 38~~ 1.~~64 nan nan~~

+

A-B 3269285 #TRs specific to vntrs.sites.bcf

−

~~R/R R/~~A ~~3403 3497 74~~ 1.~~06 0~~.~~97 50.68~~

+

A-B~ 1666185 #TRs in vntrs.sites.bcf that overlap partially with at least one TR in TRF(lobSTR) but do not overlap exactly with another TR.

−

~~R/R~~ A~~/A 176 1482 155 18.26 nan~~ ~~nan~~

+

A&B1 725404 #TRs in vntrs.sites.bcf that overlap exactly with at least one TR in TRF(lobSTR)

−

~~R/A R/R 3665 3652 68~~ 0.~~92 1~~.~~00 49~~.91

+

A&B2 723195 #TRs in TRF(lobSTR) that overlap exactly with at least one TR in vntrs.sites.bcf

−

R/A ~~R/A 1015 3151 990~~ 0.~~00 0~~.~~64 61.11~~

+

B-A~ 710075 #TRs in TRF(lobSTR) that overlap partially with at least one TR in vntrs.sites.bcf but do not overlap exactly with another TR.

−

R/A ~~A/A 43 1300 1401~~ 1.~~57 1~~.~~08 48.13~~

+

B-A 205246 #TRs specific to TRF(lobSTR)

−

A~~/A R/R 172 1365 147 18.94 nan nan~~

+

#note that the first 3 rows should sum up to the number of TRs in vntrs.sites.bcf

−

~~A/A R/A 47 1164 1183~~ 1.~~96 1~~.~~02 49~~.60

+

#and the 4th to 6th rows should sum up to the number of TRs in TRF( lobSTR)

−

A~~/A A/A 20 78 5637~~ 1.~~71 nan nan~~ <br>

+

#This basically allows us to see the m to n overlapping in overlapping TRs<br>

−

~~Parental R/R R/A A/A Error~~(%) ~~HomHet Het(%)~~

+

trf_repeatseq (1624553) #TRF based reference set used in repeatseq, motif lengths 1 to 6.

−

~~R/R R/R 14889 210 38~~ 1.~~64 nan nan~~

+

A-B 3291652

−

~~R/R R/~~A ~~7068 7149 142~~ ~~0.99 0.99 50.28~~

+

A-B~ 1650190

−

~~R/R~~ A~~/A 348 2847 302 18.59 nan~~ ~~nan~~

+

A&B1 719032

−

R/A R/A ~~1015 3151 990~~ ~~0.00 0.64 61.11~~

+

A&B2 716838

−

R/A ~~A/A 90 2464 2584~~ ~~1.75 1.05 48.81~~

+

B-A~ 703948

−

A~~/A A/A 20 78 5637 1.71 nan nan~~ <br>

+

B-A 203767 <br>

−

~~Parental R/R R/~~A A/A ~~Error(%) HomHet Het(%)~~

+

trf_vntrseek (230306) #TRF based reference set used in vntrseek, motif lengths 7 to 2000.

−

~~HOM HOM 14909 288 5675~~ ~~1.66~~ ~~nan nan~~

+

A-B 5384453

−

~~HOM HET 7158 9613 2726~~ ~~1.19~~ ~~1.00 49.90~~

+

A-B~ 271302

−

~~HET HET 1015 3151 990~~ ~~0.00~~ ~~0.64 61.11~~

+

A&B1 5119

−

~~HOMREF HOMALT 348 2847 302 18.59 nan nan~~ <br>

+

A&B2 4973

−

~~total mendelian error :~~ 2~~.505%~~

+

B-A~ 92496

−

~~no. of trios~~ ~~: 2~~

+

B-A 132837 <br>

−

~~no. of variants : 25346~~

+

codis+ (15) #CODIS STRs + 2 STRs from PROMEGA

−

+

A-B 5660794

−

~~= Variant Calling =~~

+

A-B~ 79

+

A&B1 1

+

A&B2 1

+

B-A~ 14

+

B-A 0
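The note in the sample output says the first three rows of each block should sum to the number of TRs in vntrs.sites.bcf and the last three to the size of the reference set. This can be checked directly; the Python sketch below uses the trf_lobstr counts, and the function name `check_overlap_rows` is hypothetical.

```python
def check_overlap_rows(n_query, n_reference, rows):
    """Check that the six overlap rows partition both call sets."""
    query_total = rows["A-B"] + rows["A-B~"] + rows["A&B1"]
    reference_total = rows["A&B2"] + rows["B-A~"] + rows["B-A"]
    return query_total == n_query and reference_total == n_reference


# Counts from the trf_lobstr block; 5660874 is the "no VNTRs" total and
# 1638516 is the size of the trf_lobstr reference set.
trf_lobstr = {
    "A-B": 3269285, "A-B~": 1666185, "A&B1": 725404,
    "A&B2": 723195, "B-A~": 710075, "B-A": 205246,
}
consistent = check_overlap_rows(5660874, 1638516, trf_lobstr)
print(consistent)
```

The trf_repeatseq, trf_vntrseek and codis+ blocks satisfy the same two sums.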

+

# This file contains information on how to process reference data sets.

+

# dataset - name of data set, this label will be printed.

+

# type - True Positives (TP) and False Positives (FP).

+

# overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively.

+

# - annotation.

+

# file is used for GENCODE annotation of coding VNTRs.

+

# filter - filter applied to variants for this particular data set.

+

# path - path of indexed BCF file.

+

#dataset type filter path

+

trf_lobstr TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/trf.lobstr.sites.bcf

+

trf_repeatseq TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/trf.repeatseq.sites.bcf

+

trf_vntrseek TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/trf.vntrseek.sites.bcf

+

codis+ TP VTYPE==VNTR /net/fantasia/home/atks/ref/vt/grch37/codis.strs.sites.bcf

+

GENCODE_V19 cds_annotation . /net/fantasia/home/atks/ref/vt/grch37/gencode.v19.cds.bed.gz

+

DUST cplx_annotation .

−

~~=== Discover ===~~

−

~~Discovers variants from reads in a BAM/CRAM file.~~

−

~~<div class=" mw-collapsible mw-collapsed">~~

−

~~#discover variants from NA12878.bam and write to stdout~~

−

~~vt discover -b NA12878.bam -s NA12878 -r hs37d5.fa -i 20~~

−

usage : vt ~~discover2~~ [options]

+

usage : vt profile_vntrs [options] <in.vcf>

−

options : -b ~~input BAM/CRAM~~ file

+

options : -g file containing list of reference datasets []

−

~~-y soft clipped unique sequences cutoff~~ [0]

+

-I file containing list of intervals []

−

-x ~~soft clipped mean quality cutoff~~ [0]

+

-i intervals []

−

-w ~~insertion desired type II error~~ [~~0.0~~]

+

-r reference sequence fasta file []

−

-c ~~insertion desired type I error~~ [~~0.0~~]

+

-? displays help

−

-h ~~insertion fractional evidence cutoff [0]~~

+

</div>

−

-g ~~insertion count cutoff [1]~~

+

</div>

−

-n ~~deletion desired type II~~ error [0.0]

+

−

-~~m deletion desired type I~~ error [0.0]

+

=== Profile NA12878 ===

−

~~-v deletion fractional evidence cutoff [~~0]

+

−

-~~u deletion count cutoff~~ [1]

+

Profile NA12878 genotypes against reference data sets.

−

~~-k snp desired type II error~~ [0.0]

+

−

-~~j snp desired type I error~~ [0.0]

+

−

-~~f snp fractional evidence cutoff~~ [0]

+

#profile NA12878 overlap with broad knowledgebase and illumina platinum genomes for the file vt.genotypes.bcf for chromosome 20.

−

~~-e snp evidence count cutoff~~ [1]

+

vt profile_na12878 vt.genotypes.bcf -g na12878.reference.txt -r hs37d5.fa -i 20

−

-~~q base quality cutoff for bases~~ [0]

+

−

-C ~~likelihood ratio cutoff [0]~~

+

#this is a sample output for NA12878 profiling.

−

~~-B reference bias [0]~~

+

#R and A stand for reference and alternate allele respectively.

−

~~-a read exclude flag [0x0704]~~

+

#Error% - mendelian error (confounded with de novo mutation)

−

~~-l ignore overlapping reads [false]~~

+

#HomHet - Homozygous-Heterozygous genotype ratios

−

~~-t MAPQ cutoff for alignments [0]~~

+

#Het% - proportion of hets

−

~~-p ploidy [~~2]

+

data set

−

-s ~~sample ID~~

+

No Indels : 27770 [0.94]

−

~~-r reference sequence fasta file []~~

+

FS/NFS : 0.26 (8/23) <br>

−

~~-o output VCF file [-]~~

+

broad.kb

−

~~-z ignore MD tags [~~0]

+

A-B 13071 [1.19]

−

~~-d debug [~~0]

+

A&B 14699 [0.76]

−

~~-I file containing list of intervals []~~

+

B-A 21546 [0.62]

−

-i ~~intervals []~~

+

Precision 52.9%

−

~~-? displays help~~

+

Sensitivity 40.6% <br>

+

illumina.platinum

+

A-B 17952 [0.88]

+

A&B 9818 [1.07]

+

B-A 2418 [0.88]

+

Precision 35.4%

+

Sensitivity 80.2% <br>

+

broad.kb

+

R/R R/A A/A ./.

+

R/R 346 145 3 5473

+

R/A 3 4133 9 758

+

A/A 2 136 2186 956

+

./. 2 139 86 322 <br>

+

Total genotype pairs : 6963

+

Concordance : 95.72% (6665)

+

Discordance : 4.28% (298) <br>

+

illumina.platinum

+

R/R R/A A/A ./.

+

R/R 1768 85 2 0

+

R/A 10 4479 14 0

+

A/A 13 180 3028 0

+

./. 71 98 70 0<br>

+

Total genotype pairs : 9579

+

Concordance : 96.83% (9275)

+

Discordance : 3.17% (304)
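The concordance figures can be recomputed from each 4 by 4 genotype table by dropping the missing-genotype (./.) row and column and comparing the diagonal with the remaining total. The Python sketch below does this for the broad.kb table; the function name `concordance` is hypothetical, and excluding ./. is inferred from the reported totals.

```python
def concordance(matrix):
    """Return (total pairs, concordant pairs, concordance %).

    matrix is the 4x4 table in R/R, R/A, A/A, ./. order; the ./. row and
    column are ignored.
    """
    called = [row[:3] for row in matrix[:3]]
    total = sum(sum(row) for row in called)
    agree = sum(called[i][i] for i in range(3))
    return total, agree, 100.0 * agree / total


# broad.kb genotype table from the sample output.
broad_kb = [
    [346, 145, 3, 5473],
    [3, 4133, 9, 758],
    [2, 136, 2186, 956],
    [2, 139, 86, 322],
]
total, agree, pct = concordance(broad_kb)
print(f"Total genotype pairs : {total}")
print(f"Concordance : {pct:.2f}% ({agree})")
print(f"Discordance : {100 - pct:.2f}% ({total - agree})")
```

For broad.kb this gives 6963 pairs with 6665 concordant, which agrees with the totals printed above; the illumina.platinum table yields 9579 and 9275 in the same way.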

−

</~~div>~~

+

# This file contains information on how to process reference data sets.

−

</div>

+

#

+

# dataset - name of data set, this label will be printed.

+

# type - True Positives (TP) and False Positives (FP)

+

# overlap percentages labeled as (Precision, Sensitivity) and (False Discovery Rate, Type I Error) respectively

+

# - annotation

+

# file is used for GENCODE annotation of frame shift and non frame shift Indels

+

# filter - filter applied to variants for this particular data set

+

# path - path of indexed BCF file

+

#dataset type filter path

+

broad.kb TP PASS /net/fantasia/home/atks/dev/vt/bundle/public/grch37/broad.kb.241365variants.genotypes.bcf

+

illumina.platinum TP PASS /net/fantasia/home/atks/dev/vt/bundle/public/grch37/NA12878.illumina.platinum.5284448variants.genotypes.bcf

+

#gencode.v19 annotation . /net/fantasia/home/atks/dev/vt/bundle/public/grch37/gencode.v19.annotation.gtf.gz

+

+

profile_na12878 v0.5

−

~~=== Merge candidate variants ===~~

+

usage : vt profile_na12878 [options] <in.vcf>

−

+

options : -g file containing list of reference datasets []

−

~~Merge candidate variants across samples. Each VCF file is required to have the FORMAT flags E and N and should have exactly one sample.~~

−

~~<div class=" mw-collapsible mw-collapsed">~~

−

~~#merge candidate variants from VCFs in candidate.txt and output in candidate.sites.vcf~~

−

~~vt merge_candidate_variants candidates.txt -o candidate.sites.vcf~~

−

~~<div class="mw-collapsible-content">~~

−

~~usage : vt merge_candidate_variants [options]~~

−

options : -L file containing list of ~~input VCF files~~

−

~~-o output VCF file~~ [-]

-I file containing list of intervals []

−

-i intervals

+

-i intervals []

−

-- ~~ignores the rest of the labeled arguments following this flag~~

+

-r reference sequence fasta file []

−

-h displays help

+

-? displays help

</div>

−

=~~== Remove overlap ==~~=

+

= Variant Calling =

−

~~Removes overlapping variants in a VCF file by tagging such variants with the FILTER flag overlap.~~

−

~~<div class~~=~~" mw-collapsible mw-collapsed">~~

+

=== Discover ===

−

~~#annotates variants that are overlapping~~

−

~~vt remove_overlap in.vcf -r hs37d5.fa -o overlapped.tagged..vcf~~

−

~~<div class="mw-collapsible-content">~~

+

Discovers variants from reads in a BAM/CRAM file.

−

~~usage : vt remove_overlap [options] <~~in~~.vcf>~~

−

~~options : -o output VCF file [-]~~

−

-I file ~~containing list of intervals []~~

−

~~-i intervals []~~

−

~~-? displays help~~

−

~~</div>~~

−

~~</div>~~

−

~~=== Annotate Indels ===~~

−

~~Annotates indels with VNTR information and adds a VNTR record. Facilitates the simultaneous calling of VNTR together with Indels and SNPs~~.

−

#~~annotates indels~~ from ~~VCFs with VNTR information~~.

+

#discover variants from NA12878.bam and write to stdout

−

vt ~~annotate_indels in~~.~~vcf~~ -r hs37d5.fa -~~o annotated.sites.vcf~~

+

vt discover -b NA12878.bam -s NA12878 -r hs37d5.fa -i 20

+

+

usage : vt discover2 [options]

−

~~<div style="height~~:~~20em; overflow:auto; border: 2px solid #FFF">~~

+

options : -b input BAM/CRAM file

−

~~CHROM POS ID REF ALT QUAL FILTER~~ ~~INFO~~

+

-y soft clipped unique sequences cutoff [0]

−

~~20 82079 . G A 1255.98 . NSAMPLES=1;E=43;N=51;ESUM=43;NSUM=51;FLANKSEQ=GGAGCACGCC~~[~~G/A~~]~~CCATGCCCGG~~

+

-x soft clipped mean quality cutoff [0]

−

~~20 82217~~ . ~~G A 1632~~.~~77 . NSAMPLES=1;E=56;N=61;ESUM=56;NSUM=61;FLANKSEQ=GAGCCACCGC~~[~~G/A~~]~~CCCGGCCCAG~~

+

-w insertion desired type II error [0.0]

−

~~20 83250 . CTGTGTGTG C . . NSAMPLES=~~1~~;E=18;N=35;ESUM=18;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT~~]~~TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT~~

+

-c insertion desired type I error [0.0]

−

~~20 83250~~ . ~~CTGTGTGTGTG C . . NSAMPLES=1;E=3;N=35;ESUM=3;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT~~]~~TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT~~

+

-h insertion fractional evidence cutoff [0]

−

~~20 83251 . TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG <VNTR>~~ . ~~. MOTIF=GT;RU=TG;FZ_CONCORDANCE=1;FZ_RL=52;FZ_LL=~~0~~;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FZ_RU_COUNTS=26,26;FLANKSEQ=TCTCTCTCTC~~[~~TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG~~]~~TTTAGTATTT~~

+

-g insertion count cutoff [1]

−

~~20 83252 . G C 359.204 . NSAMPLES=~~1~~;E=13;N=14;ESUM=13;NSUM=14;FLANKSEQ=CTCTCTCTCT~~[~~G/C~~]~~TGTGTGTGTG~~

+

-n deletion desired type II error [0.0]

−

~~20 83260~~ . ~~G C 500.163 . NSAMPLES=1;E=18;N=34;ESUM=18;NSUM=34;FLANKSEQ=CTGTGTGTGT~~[~~G/C~~]~~TGTGTGTGTG~~

+

-m deletion desired type I error [0.0]

−

~~20 83267 . T C 247.043 . NSAMPLES=~~1~~;E=11;N=43;ESUM=11;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C~~]~~GTGTGTGTGT~~

+

-v deletion fractional evidence cutoff [0]

−

~~20 83275 . T C 609.669 . NSAMPLES=1;E=24;N=43;ESUM=24;NSUM=43;FLANKSEQ=TGTGTGTGTG~~[~~T/C~~]~~GTGTGTGTGT~~

+

-u deletion count cutoff [1]

−

~~20 90008 .~~ C ~~A 1546.88 . NSAMPLES=1;E=52;N=60;ESUM=52;NSUM=60;FLANKSEQ=AACAGAAAAC~~[~~C/A~~]~~AAATACTGTA~~

+

-k snp desired type II error [0.0]

−

~~20 91088 . C T 1766.04 . NSAMPLES=1;E=58;N=66;ESUM=58;NSUM=66;FLANKSEQ=CCCAGCATAC~~[~~C/T~~]~~ATGGTTGTGC~~

+

-j snp desired type I error [0.0]

−

~~20 91508 . G A 1266.93 . NSAMPLES=1;E=44;N=53;ESUM=44;NSUM=53;FLANKSEQ=AATTAGTAAG~~[~~G/A~~]~~CTTACGTAAG~~

+

-f snp fractional evidence cutoff [0]

−

~~20 91707 . C T 888.134 . NSAMPLES=1;E=30;N=53;ESUM=30;NSUM=53;FLANKSEQ=TGATTTTCTA~~[~~C/T~~]~~AGCAGGACCT~~

+

-e snp evidence count cutoff [1]

−

~~20 92527 . A G 828.593 . NSAMPLES=1;E=34;N=40;ESUM=34;NSUM=40;FLANKSEQ=ATTAATTGCC~~[~~A/G~~]~~TTCTCTCTTT~~

+

-q base quality cutoff for bases [0]

−

~~20 93440 . A G 688.144 . NSAMPLES=1;E=24;N=58;ESUM=24;NSUM=58;FLANKSEQ=TTGGATGCAT~~[~~A/G~~]~~GTCTGTAAAT~~

+

-C likelihood ratio cutoff [0]

−

~~20 93636 . TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT <VNTR>~~ ~~. . MOTIF=T;RU=T;FZ_CONCORDANCE=0.939394;FZ_RL=35;FZ_LL=0;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FZ_RU_COUNTS=31,33;FLANKSEQ=TCTAGGATTC~~[~~TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT~~]~~GAGATGGAGT~~

+

-B reference bias [0]

−

~~20 93646 . C CT . . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FLANKSEQ=TTTTTCTTTC~~[~~TTTTTTTTTTTTTTTTTTTTTTTT~~]~~GAGATGGAGT;GMOTIF=T;TR=20:93636:TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT:<VNTR>:T~~

+

-a read exclude flag [0x0704]

−

~~20 93717 . A T 31.7622 . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKSEQ=CAGTGGCGTG~~[~~A/T~~]~~TCTTAGATCA~~

+

-l ignore overlapping reads [false]

−

~~20 93931 . G A 628.149 . NSAMPLES=1;E=22;N=53;ESUM=22;NSUM=53;FLANKSEQ=GATTACAGGT~~[~~G/A~~]~~TGAGCCGCTG~~

+

-t MAPQ cutoff for alignments [0]

−

~~20 100699~~ ~~. C T 809.09 . NSAMPLES=1;E=28;N=61;ESUM=28;NSUM=61;FLANKSEQ=GGTGAAAAAT~~[~~C/T~~]~~ACCTGTCAGT~~

+

-p ploidy [2]

−

~~20 101362~~ ~~. G A 1087.13 . NSAMPLES=1;E=36;N=67;ESUM=36;NSUM=67;FLANKSEQ=TAATACTGAA~~[~~G/A~~]~~TTTACTTCTC~~

+

-s sample ID

+

-r reference sequence fasta file []

+

-o output VCF file [-]

+

-z ignore MD tags [0]

+

-d debug [0]

+

-I file containing list of intervals []

+

-i intervals []

+

-? displays help

+

</div>
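The default for `-a`, 0x0704, is a bit mask over the SAM FLAG field. The sketch below decomposes that mask using the bit names from the SAM specification. The helpers `decode_flags` and `is_excluded` are hypothetical and are not part of vt; the rule that a read is skipped when it shares any bit with the mask is an assumption about how the option is applied.

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode_flags(mask):
    """Return the names of the SAM FLAG bits set in mask (hypothetical helper)."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if mask & bit]

def is_excluded(read_flag, exclude_mask=0x0704):
    """Assumed behaviour: skip a read that has any bit of the mask set."""
    return (read_flag & exclude_mask) != 0

print(decode_flags(0x0704))
```

Under this reading, the default mask drops unmapped, secondary, QC-failed and duplicate reads.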

−

~~The following shows~~ the ~~trace of how the algorithm works~~

+

=== Merge candidate variants ===

+

Merge candidate variants across samples. Each VCF file is required to have the FORMAT flags E and N and should have exactly one sample.

−

========================================~~====~~

+

−

~~ANNOTATING INDEL FUZZILY~~

+

#merge candidate variants from the VCF files listed in candidates.txt and output to candidate.sites.vcf

−

~~********************************************~~

+

vt merge_candidate_variants candidates.txt -o candidate.sites.vcf

−

~~EXTRACTIING REGION BY EXACT LEFT AND RIGHT ALIGNMENT~~

+

−

+

usage : vt merge_candidate_variants [options]

−

20:~~131948~~:~~C/CCA~~

+

−

~~EXACT REGION 131948-131965 (18)~~

+

options : -L file containing list of input VCF files

−

~~CCACACACACACACACAA~~

+

-o output VCF file [-]

−

~~FINAL EXACT REGION 131948-131965 (18)~~

+

-I file containing list of intervals []

−

~~CCACACACACACACACAA~~

+

-i intervals

−

~~********************************************~~

+

-- ignores the rest of the labeled arguments following this flag

−

~~PICK CANDIDATE MOTIFS~~

+

-h displays help

−

+

</div>

−

~~Longest Allele :~~ C[CA]~~CACACACACACACACAA~~

+

</div>
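The site records shown elsewhere on this page carry NSAMPLES, ESUM and NSUM in the INFO field. A rough Python sketch of one way such totals could be accumulated, assuming that merging simply sums E and N per site over the single-sample inputs. The function `merge_candidates` is hypothetical, is not vt code, and the values for the second sample are made up for illustration.

```python
from collections import defaultdict

def merge_candidates(per_sample_sites):
    """per_sample_sites: one dict per sample mapping
    (chrom, pos, ref, alt) -> (E, N). Returns site -> summary dict."""
    merged = defaultdict(lambda: {"NSAMPLES": 0, "ESUM": 0, "NSUM": 0})
    for sites in per_sample_sites:
        for key, (e, n) in sites.items():
            rec = merged[key]
            rec["NSAMPLES"] += 1   # one more sample carries this site
            rec["ESUM"] += e       # running sum of E over samples
            rec["NSUM"] += n       # running sum of N over samples
    return dict(merged)

# sample_a uses E and N from a record on this page; sample_b is invented.
sample_a = {("20", 82079, "G", "A"): (43, 51)}
sample_b = {("20", 82079, "G", "A"): (10, 40),
            ("20", 82217, "G", "A"): (56, 61)}
merged = merge_candidates([sample_a, sample_b])
for key in sorted(merged):
    print(key, merged[key])
```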

−

~~detecting motifs for an str~~

+

−

~~seq~~: ~~CCACACACACACACACACAA~~

+

=== Remove overlap ===

−

~~len~~ : 20

+

−

~~cmax_len~~ : 10

+

Removes overlapping variants in a VCF file by tagging such variants with the FILTER flag overlap.

−

~~candidate motifs~~: 25

+

−

~~AC : 0~~.~~894737 2~~ 0

+

−

~~AAC : 0~~.~~5 3 0~~.~~0555556~~

+

#annotates variants that are overlapping

−

~~ACC : 0~~.~~5 3 0~~.~~0555556~~

+

vt remove_overlap in.vcf -r hs37d5.fa -o overlapped.tagged.vcf

−

~~AAAC : 0~~.~~0588235 4 0~~.~~125 (< 2 copies)~~

+

−

~~ACCC : 0~~.~~0588235 4 0~~.~~125 (< 2 copies)~~

+

−

~~AACAC : 0~~.~~5 5 0~~.02

+

usage : vt remove_overlap [options] <in.vcf>

−

~~ACACC : 0~~.~~5 5 0~~.02

+

−

~~AAACAC : 0~~.~~0666667 6 0~~.~~0555556 (< 2 copies)~~

+

options : -o output VCF file [-]

−

~~ACACCC : 0~~.~~0666667 6 0~~.~~0555556 (< 2 copies)~~

+

-I file containing list of intervals []

−

~~AACACAC : 0~~.~~5 7 0~~.~~0102041~~

+

-i intervals []

−

~~ACACACC : 0~~.~~5 7 0~~.~~0102041~~

+

-? displays help

−

~~AAACACAC :~~ 0.~~0769231 8~~ 0.~~03125 (<~~ 2 ~~copies)~~

+

</div>

−

~~ACACACCC~~ : ~~0.0769231 8 0.03125 (~~< ~~2 copies)~~

+

</div>
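As an illustration of the tagging idea, the Python sketch below flags variants whose REF spans intersect. It assumes that two variants overlap when they lie on the same chromosome and their intervals [POS, POS + len(REF) - 1] share at least one base; vt may define overlap differently. The function `tag_overlaps` is hypothetical.

```python
def tag_overlaps(variants):
    """variants: list of (chrom, pos, ref) sorted by chrom, then pos.
    Returns a parallel list of booleans, True where the REF span
    intersects the REF span of another variant on the same chromosome."""
    flagged = [False] * len(variants)
    prev_chrom = None
    prev_end = -1     # right-most end seen so far on prev_chrom
    prev_idx = None   # index of the variant that reaches prev_end
    for i, (chrom, pos, ref) in enumerate(variants):
        end = pos + len(ref) - 1
        if chrom == prev_chrom and pos <= prev_end:
            flagged[i] = True
            flagged[prev_idx] = True
        if chrom != prev_chrom or end > prev_end:
            prev_chrom, prev_end, prev_idx = chrom, end, i
    return flagged

# Positions and REF alleles taken from records shown on this page.
variants = [
    ("20", 83250, "CTGTGTGTG"),
    ("20", 83252, "G"),
    ("20", 83260, "G"),
    ("20", 90008, "C"),
]
flags = tag_overlaps(variants)
print(flags)
```

The deletion at 83250 spans 83250 to 83258, so it and the SNP at 83252 are flagged, while the SNPs at 83260 and 90008 are not.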

−

~~AACACACAC : 0~~.~~5 9 0~~.~~00617284 (<~~ 2 ~~copies)~~

+

−

~~ACACACACC : 0~~.~~5 9 0~~.~~00617284 (< 2 copies)~~

+

=== Annotate Indels ===

−

~~AAACACACAC : 0~~.~~0909091 10 0~~.~~02 (< 2 copies)~~

+

−

~~ACACACACCC : 0~~.~~0909091 10 0~~.~~02 (~~< ~~2 copies)~~

+

Annotates indels with VNTR information and adds a VNTR record. This facilitates the simultaneous calling of VNTRs together with indels and SNPs.

−

~~********************************************~~

+

−

~~PICKING NEXT BEST MOTIF~~

+

−

+

#annotates indels from VCFs with VNTR information.

−

~~selected: AC 0.89 0.00~~

+

vt annotate_indels in.vcf -r hs37d5.fa -o annotated.sites.vcf

−

********************************************

+

−

~~DETECTING REPEAT TRACT FUZZILY~~

+

−

~~++++++++++++++++++++++++++++++++++++++++++++~~

+

CHROM POS ID REF ALT QUAL FILTER INFO

−

~~Exact left/right alignment~~

+

20 82079 . G A 1255.98 . NSAMPLES=1;E=43;N=51;ESUM=43;NSUM=51;FLANKSEQ=GGAGCACGCC[G/A]CCATGCCCGG

−

+

20 82217 . G A 1632.77 . NSAMPLES=1;E=56;N=61;ESUM=56;NSUM=61;FLANKSEQ=GAGCCACCGC[G/A]CCCGGCCCAG

−

~~repeat_tract : CACACACACACACACA~~

+

20 83250 . CTGTGTGTG C . . NSAMPLES=1;E=18;N=35;ESUM=18;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT

−

~~position : [131949,131964]~~

+

20 83250 . CTGTGTGTGTG C . . NSAMPLES=1;E=3;N=35;ESUM=3;NSUM=35;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT]TTAGTATTTG;GMOTIF=GT;TR=20:83251:TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG:<VNTR>:GT

−

~~motif_concordance : 1~~

+

20 83251 . TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG <VNTR> . . MOTIF=GT;RU=TG;FZ_CONCORDANCE=1;FZ_RL=52;FZ_LL=0;FLANKS=83250,83304;FZ_FLANKS=83250,83303;FZ_RU_COUNTS=26,26;FLANKSEQ=TCTCTCTCTC[TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTG]TTTAGTATTT

−

~~repeat units : 8~~

+

20 83252 . G C 359.204 . NSAMPLES=1;E=13;N=14;ESUM=13;NSUM=14;FLANKSEQ=CTCTCTCTCT[G/C]TGTGTGTGTG

−

~~exact repeat units : 8~~

+

20 83260 . G C 500.163 . NSAMPLES=1;E=18;N=34;ESUM=18;NSUM=34;FLANKSEQ=CTGTGTGTGT[G/C]TGTGTGTGTG

−

~~total no. of repeat units : 8~~

+

20 83267 . T C 247.043 . NSAMPLES=1;E=11;N=43;ESUM=11;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT

−

+

20 83275 . T C 609.669 . NSAMPLES=1;E=24;N=43;ESUM=24;NSUM=43;FLANKSEQ=TGTGTGTGTG[T/C]GTGTGTGTGT

−

~~++++++++++++++++++++++++++++++++++++++++++++~~

+

20 90008 . C A 1546.88 . NSAMPLES=1;E=52;N=60;ESUM=52;NSUM=60;FLANKSEQ=AACAGAAAAC[C/A]AAATACTGTA

−

~~Fuzzy right alignment~~

+

20 91088 . C T 1766.04 . NSAMPLES=1;E=58;N=66;ESUM=58;NSUM=66;FLANKSEQ=CCCAGCATAC[C/T]ATGGTTGTGC

−

+

20 91508 . G A 1266.93 . NSAMPLES=1;E=44;N=53;ESUM=44;NSUM=53;FLANKSEQ=AATTAGTAAG[G/A]CTTACGTAAG

−

~~repeat motif : CA~~

+

20 91707 . C T 888.134 . NSAMPLES=1;E=30;N=53;ESUM=30;NSUM=53;FLANKSEQ=TGATTTTCTA[C/T]AGCAGGACCT

−

~~rflank : AACTC~~

+

20 92527 . A G 828.593 . NSAMPLES=1;E=34;N=40;ESUM=34;NSUM=40;FLANKSEQ=ATTAATTGCC[A/G]TTCTCTCTTT

−

~~mlen : 2~~

+

20 93440 . A G 688.144 . NSAMPLES=1;E=24;N=58;ESUM=24;NSUM=58;FLANKSEQ=TTGGATGCAT[A/G]GTCTGTAAAT

−

~~rflen : 5~~

+

20 93636 . TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT <VNTR> . . MOTIF=T;RU=T;FZ_CONCORDANCE=0.939394;FZ_RL=35;FZ_LL=0;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FZ_RU_COUNTS=31,33;FLANKSEQ=TCTAGGATTC[TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT

−

~~plen : 111~~

+

20 93646 . C CT . . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKS=93646,93671;FZ_FLANKS=93635,93671;FLANKSEQ=TTTTTCTTTC[TTTTTTTTTTTTTTTTTTTTTTTT]GAGATGGAGT;GMOTIF=T;TR=20:93636:TTTTTTCTTTCTTTTTTTTTTTTTTTTTTTTTTTT:<VNTR>:T

+

20 93717 . A T 31.7622 . NSAMPLES=1;E=2;N=29;ESUM=2;NSUM=29;FLANKSEQ=CAGTGGCGTG[A/T]TCTTAGATCA

+

20 93931 . G A 628.149 . NSAMPLES=1;E=22;N=53;ESUM=22;NSUM=53;FLANKSEQ=GATTACAGGT[G/A]TGAGCCGCTG

+

20 100699 . C T 809.09 . NSAMPLES=1;E=28;N=61;ESUM=28;NSUM=61;FLANKSEQ=GGTGAAAAAT[C/T]ACCTGTCAGT

+

20 101362 . G A 1087.13 . NSAMPLES=1;E=36;N=67;ESUM=36;NSUM=67;FLANKSEQ=TAATACTGAA[G/A]TTTACTTCTC

+

</div>
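In the two VNTR records above, FZ_CONCORDANCE equals the first FZ_RU_COUNTS value divided by the second (26/26 and 31/33). The Python sketch below checks that relationship on abbreviated INFO strings copied from those records. Whether vt defines the concordance this way in general is an assumption, and `parse_info` and `ru_count_ratio` are hypothetical helpers.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

def ru_count_ratio(info):
    """Exact repeat units divided by total repeat units from FZ_RU_COUNTS."""
    exact, total = (int(x) for x in parse_info(info)["FZ_RU_COUNTS"].split(","))
    return exact / total

# INFO fields abbreviated from the two <VNTR> records shown above.
info_gt = "MOTIF=GT;RU=TG;FZ_CONCORDANCE=1;FZ_RL=52;FZ_LL=0;FZ_RU_COUNTS=26,26"
info_t = "MOTIF=T;RU=T;FZ_CONCORDANCE=0.939394;FZ_RL=35;FZ_LL=0;FZ_RU_COUNTS=31,33"
for info in (info_gt, info_t):
    print(parse_info(info)["FZ_CONCORDANCE"], round(ru_count_ratio(info), 6))
```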

+

The following trace shows how the algorithm works:

+

============================================

+

ANNOTATING INDEL FUZZILY

+

********************************************

+

EXTRACTIING REGION BY EXACT LEFT AND RIGHT ALIGNMENT

−

~~read~~ : ~~AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACACCACACACACACACACAAACTC~~

+

20:131948:C/CCA

−

~~rlen~~ : ~~106~~

+

EXACT REGION 131948-131965 (18)

−

+

CCACACACACACACACAA

−

~~optimal score: 50.5073~~

+

FINAL EXACT REGION 131948-131965 (18)

−

~~optimal state: MR~~

+

CCACACACACACACACAA

−

~~optimal track: MR|r|0|5~~

+

********************************************

−

~~optimal probe len: 25~~

+

PICK CANDIDATE MOTIFS

−

~~optimal path length : 107~~

−

~~max j: 106~~

−

~~probe: (1~82) [1~10] (1~5)~~

−

~~read : (1~82) [83~101] (102~106)~~

−

~~motif #~~ : 10 [~~83,101~~]

+

Longest Allele : C[CA]CACACACACACACACAA

−

~~motif concordance~~ : ~~95%~~ (~~9/10~~)

+

detecting motifs for an str

−

~~motif discordance~~ : 0~~|1|~~0|0|0|0|0|0|0|0

+

seq: CCACACACACACACACACAA

−

+

len : 20

−

~~Model~~: ~~----------------------------------------------------------------------------------CACACACACACACACACACAAACTC~~

+

cmax_len : 10

−

~~SYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYMMMDMMMMMMMMMMMMMMMMMMMMME~~

+

candidate motifs: 25

−

~~oo++oo++oo++oo++oo++RRRRR~~

+

AC : 0.894737 2 0

−

~~Read~~: ~~AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACAC-CACACACACACACACAAACTC~~

+

AAC : 0.5 3 0.0555556

+

ACC : 0.5 3 0.0555556

+

AAAC : 0.0588235 4 0.125 (< 2 copies)

+

ACCC : 0.0588235 4 0.125 (< 2 copies)

+

AACAC : 0.5 5 0.02

+

ACACC : 0.5 5 0.02

+

AAACAC : 0.0666667 6 0.0555556 (< 2 copies)

+

ACACCC : 0.0666667 6 0.0555556 (< 2 copies)

+

AACACAC : 0.5 7 0.0102041

+

ACACACC : 0.5 7 0.0102041

+

AAACACAC : 0.0769231 8 0.03125 (< 2 copies)

+

ACACACCC : 0.0769231 8 0.03125 (< 2 copies)

+

AACACACAC : 0.5 9 0.00617284 (< 2 copies)

+

ACACACACC : 0.5 9 0.00617284 (< 2 copies)

+

AAACACACAC : 0.0909091 10 0.02 (< 2 copies)

+

ACACACACCC : 0.0909091 10 0.02 (< 2 copies)

+

********************************************

+

PICKING NEXT BEST MOTIF

+

selected: AC 0.89 0.00

+

********************************************

+

DETECTING REPEAT TRACT FUZZILY

++++++++++++++++++++++++++++++++++++++++++++

−

~~Fuzzy~~ left alignment

+

Exact left/right alignment

−

~~lflank~~ : ~~ATCTTA~~

+

repeat_tract : CACACACACACACACA

−

repeat motif : CA

+

position : [131949,131964]

−

~~lflen~~ : 6

+

motif_concordance : 1

+

repeat units : 8

+

exact repeat units : 8

+

total no. of repeat units : 8

+

++++++++++++++++++++++++++++++++++++++++++++

+

Fuzzy right alignment

+

repeat motif : CA

+

rflank : AACTC

mlen : 2

+

rflen : 5

plen : 111

−

read : ~~ATCTTACACCACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT~~

+

read : AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACACCACACACACACACACAAACTC

−

rlen : ~~105~~

+

rlen : 106

−

optimal score: 50.~~5858~~

+

optimal score: 50.5073

−

optimal state: Z

+

optimal state: MR

−

optimal track: Z|m|10|2

+

optimal track: MR|r|0|5

−

optimal probe len: 26

+

optimal probe len: 25

−

optimal path length : ~~106~~

+

optimal path length : 107

−

max j: ~~105~~

+

max j: 106

−

~~mismatch penalty~~: 3

+

probe: (1~82) [1~10] (1~5)

+

read : (1~82) [83~101] (102~106)

−

~~model: (1~6) [1~10]~~

+

motif # : 10 [83,101]

−

~~read : (1~6) [7~25][26~106]~~

−

motif # : 10 [7,25]

motif concordance : 95% (9/10)

motif discordance : 0|1|0|0|0|0|0|0|0|0

−

Model: ~~ATCTTACACACACACACACACACACA~~--------------------------------------------------------------------------------

+

Model: ----------------------------------------------------------------------------------CACACACACACACACACACAAACTC

−

~~SMMMMMMMMMDMMMMMMMMMMMMMMMMZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZE~~

+

SYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYMMMDMMMMMMMMMMMMMMMMMMMMME

−

~~LLLLLLoo~~++oo++oo++oo++oo++

+

oo++oo++oo++oo++oo++RRRRR

−

Read: ~~ATCTTACAC~~-~~CACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT~~

+

Read: AGAAATGATAGTCACTTCAACAGATGGTGTTGGGAAAACTGGATTTCCACAGGCAGAACAAATGAAATGGATCCTTATCTTACAC-CACACACACACACACAAACTC

−

~~xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx~~

+

++++++++++++++++++++++++++++++++++++++++++++

−

~~VNTR Summary~~

+

Fuzzy left alignment

−

~~rid : 19~~

−

~~motif : AC~~

−

~~ru : CA~~

−

~~Exact~~

+

lflank : ATCTTA

−

~~repeat_tract~~ : ~~CACACACACACACACA~~

+

repeat motif : CA

−

~~position~~ : ~~[131949,131964]~~

+

lflen : 6

−

~~reference repeat unit length~~ : 8

+

mlen : 2

−

~~motif_concordance~~ : 1

+

plen : 111

−

~~repeat units : 8~~

+

−

~~exact repeat units~~ : 8

+

read : ATCTTACACCACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT

−

~~total no. of repeat units~~ : 8

+

rlen : 105

−

Fuzzy

+

optimal score: 50.5858

−

repeat_tract : CACCACACACACACACACA

+

optimal state: Z

−

position : [131946,131964]

+

optimal track: Z|m|10|2

−

reference repeat unit length : 19

+

optimal probe len: 26

−

motif_concordance : 0.95

+

optimal path length : 106

−

repeat units : 19

+

max j: 105

−

exact repeat units : 9

+

mismatch penalty: 3

−

total no. of repeat units : 10

+

−

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

+

model: (1~6) [1~10]

−

+

read : (1~6) [7~25][26~106]

−

+

−

usage : vt annotate_indels [options] <in.vcf>

+

motif # : 10 [7,25]

−

+

motif concordance : 95% (9/10)

−

options : -v add vntr record [false]

+

motif discordance : 0|1|0|0|0|0|0|0|0|0

−

-x override tags [false]

+

−

-f filter expression []

+

Model: ATCTTACACACACACACACACACACA--------------------------------------------------------------------------------

−

-d debug [false]

+

SMMMMMMMMMDMMMMMMMMMMMMMMMMZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZE

−

-m mode [f]

+

LLLLLLoo++oo++oo++oo++oo++

−

e : by exact alignment f : by fuzzy alignment

+

Read: ATCTTACAC-CACACACACACACACAAACTCAAAATGGATTTAAAGACTTAAATGTGAGCCTGGCAAACTTAAAACTCCTAAAATAAAACAGAAGGGAATATCTTT

−

-c classification schemas of tandem repeat [6]

+

−

1 : lai2003

+

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

−

2 : kelkar2008

+

VNTR Summary

−

3 : fondon2012

+

rid : 19

−

4 : ananda2013

+

motif : AC

−

5 : willems2014

+

ru : CA

−

6 : tan_kang2015

+

−

-a annotation type [v]

+

Exact

−

v : a. output VNTR variant (defined by classification).

+

repeat_tract : CACACACACACACACA

−

RU repeat unit on reference sequence (CA)

+

position : [131949,131964]

−

MOTIF canonical representation (AC)

+

reference repeat unit length : 8

−

RL repeat tract length in bases (11)

+

motif_concordance : 1

−

FLANKS flanking positions of repeat tract determined by exact alignment

+

repeat units : 8

−

RU_COUNTS number of exact repeat units and total number of repeat units in

+

exact repeat units : 8

−

repeat tract determined by exact alignment

+

total no. of repeat units : 8

−

FZ_RL fuzzy repeat tract length in bases (11)

+

−

FZ_FLANKS flanking positions of repeat tract determined by fuzzy alignment

+

Fuzzy

−

FZ_RU_COUNTS number of exact repeat units and total number of repeat units in

+

repeat_tract : CACCACACACACACACACA

−

repeat tract determined by fuzzy alignment

+

position : [131946,131964]

−

FLANKSEQ flanking sequence of indel

+

reference repeat unit length : 19

−

LARGE_REPEAT_REGION repeat region exceeding 2000bp

+

motif_concordance : 0.95

−

b. mark indels with overlapping VNTR.

+

repeat units : 19

−

FLANKS flanking positions of repeat tract determined by exact alignment

+

exact repeat units : 9

−

FZ_FLANKS flanking positions of repeat tract determined by fuzzy alignment

+

total no. of repeat units : 10

−

GMOTIF generating motif used in fuzzy alignment

+

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

−

TR position and alleles of VNTR (20:23413:CACACACACAC:<VNTR>)

+

−

a : annotate each indel with RU, RL, MOTIF, REF.

+

−

-r reference sequence fasta file []

+

usage : vt annotate_indels [options] <in.vcf>

−

-o output VCF file [-]

+

−

-I file containing list of intervals []

+

options : -v add vntr record [false]

−

-i intervals

+

-x override tags [false]

−

-? displays help

+

-f filter expression []

−

</div>

+

-d debug [false]

−

</div>

+

-m mode [f]

−

+

e : by exact alignment f : by fuzzy alignment

−

=== Construct Probes ===

+

-c classification schemas of tandem repeat [6]

+

1 : lai2003

+

2 : kelkar2008

+

3 : fondon2012

+

4 : ananda2013

+

5 : willems2014

+

6 : tan_kang2015

+

-a annotation type [v]

+

v : a. output VNTR variant (defined by classification).

+

RU repeat unit on reference sequence (CA)

+

MOTIF canonical representation (AC)

+

RL repeat tract length in bases (11)

+

FLANKS flanking positions of repeat tract determined by exact alignment

+

RU_COUNTS number of exact repeat units and total number of repeat units in

+

repeat tract determined by exact alignment

+

FZ_RL fuzzy repeat tract length in bases (11)

+

FZ_FLANKS flanking positions of repeat tract determined by fuzzy alignment

+

FZ_RU_COUNTS number of exact repeat units and total number of repeat units in

+

repeat tract determined by fuzzy alignment

+

FLANKSEQ flanking sequence of indel

+

LARGE_REPEAT_REGION repeat region exceeding 2000bp

+

b. mark indels with overlapping VNTR.

+

FLANKS flanking positions of repeat tract determined by exact alignment

+

FZ_FLANKS flanking positions of repeat tract determined by fuzzy alignment

+

GMOTIF generating motif used in fuzzy alignment

+

TR position and alleles of VNTR (20:23413:CACACACACAC:<VNTR>)

+

a : annotate each indel with RU, RL, MOTIF, REF.

+

-r reference sequence fasta file []

+

-o output VCF file [-]

+

-I file containing list of intervals []

+

-i intervals

+

-? displays help

+

</div>

+

</div>
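The option text distinguishes RU, the repeat unit as it appears on the reference (CA), from MOTIF, its canonical representation (AC). The examples on this page (RU=CA with MOTIF=AC, RU=TG with MOTIF=GT) are consistent with taking the lexicographically smallest rotation of the repeat unit. The Python sketch below implements that reading; it is an assumption drawn from these examples, not a statement of how vt computes MOTIF, and `canonical_motif` is a hypothetical name.

```python
def canonical_motif(ru):
    """Smallest rotation of a non-empty repeat unit in lexicographic order
    (assumed definition of the canonical motif)."""
    return min(ru[i:] + ru[:i] for i in range(len(ru)))

for ru in ("CA", "TG", "T", "AAAG"):
    print(ru, canonical_motif(ru))
```

Note that the reverse complement is not considered here: if it were, TG would canonicalize to AC rather than GT, which would not match the MOTIF=GT record shown above.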

+

=== Construct Probes ===

Line 1,606: Line 1,696:

#construct probes from candidates.sites.bcf and output to standard out

vt construct_probes candidates.sites.bcf -r ref.fa

−

+

−

usage : vt construct_probes [options] <in.vcf>

+

usage : vt construct_probes [options] <in.vcf>

+

options : -o output VCF file [-]

+

-f minimum flank length [20]

+

-r reference sequence fasta file []

+

-I file containing list of intervals []

+

-i intervals []

+

-- ignores the rest of the labeled arguments following this flag

+

-h displays help

+

</div>

+

</div>

+

=== Genotype ===

+

Genotypes variants for each sample.

+

+

#genotypes variants found in candidates.sites.vcf from sample.bam

+

vt genotype -r seq.fa -b sample.bam -i candidates.sites.vcf -o sample.sites.vcf

+

+

usage : vt genotype [options]

+

options : -r reference sequence fasta file []

+

-s sample ID []

+

-o output VCF file [-]

+

-b input BAM file []

+

-i input candidate VCF file []

+

-- ignores the rest of the labeled arguments following this flag

+

-h displays help

+

</div>

+

</div>

+

= Pedigree File =

+

vt understands an augmented version, introduced by [mailto:hmkang@umich.edu Hyun], of the PED format described by [http://zzz.bwh.harvard.edu/plink/data.shtml#ped plink].

+

The pedigree file has the following mandatory fields:

+

{| class="wikitable"

+

|-

+

! scope="col"| Field

+

! scope="col"| Description

+

! scope="col"| Valid Values

+

! scope="col"| Missing Values

+

|-

+

|Family ID<br>

+

Individual ID<br>

+

Paternal ID<br>

+

Maternal ID<br>

+

Sex<br>

+

Phenotype

+

|ID of this family <br>

+

ID(s) of this individual (comma separated) <br>

+

ID of the father <br>

+

ID of the mother <br>

+

Sex of the individual<br>

+

Phenotype

+

|[A-Za-z0-9_]+<br>

+

[A-Za-z0-9_]+(,[A-Za-z0-9_]+)* <br>

+

[A-Za-z0-9_]+ <br>

+

[A-Za-z0-9_]+<br>

+

1=male, 2=female, other, male, female<br>

+

[A-Za-z0-9_]+

+

| 0 <br>

+

cannot be missing <br>

+

0 <br>

+

0 <br>

+

other<br>

+

-9

+

|}

−

~~options~~ : ~~-o output VCF file [-]~~

+

Examples:

−

~~-f minimum flank length [20]~~

−

~~-r reference sequence fasta file []~~

−

~~-I file containing list of intervals []~~

−

~~-i intervals []~~

−

~~-- ignores the rest of the labeled arguments following this flag~~

−

~~-h displays help~~

−

~~</div>~~

−

~~</div>~~

−

~~=== Genotype ===~~

+

ceu NA12878 NA12891 NA12892 female -9

+

yri NA19240 NA19239 NA19238 female -9

−

~~Genotypes variants for each sample.~~

+

ceu NA12878 NA12891 NA12892 2 -9

+

yri NA19240 NA19239 NA19238 2 -9

−

~~<div class=" mw-collapsible mw-collapsed">~~

+

#allows tools like profile_mendelian to detect duplicates and check for concordance

−

#~~genotypes variants found in candidate.sites.vcf from sample.bam~~

+

ceu NA12878,NA12878A NA12891 NA12892 female case

−

~~vt genotype -r seq.fa -b sample.bam -i candidates.sites.vcf -o sample.sites.vcf~~

+

yri NA19240 NA19239 NA19238 female control

−

~~<div class="mw-collapsible-content">~~

−

~~usage : vt genotype [options]~~

−

~~options : -r reference sequence fasta file []~~

+

#individuals with missing paternal and maternal IDs (0)

−

~~-s sample ID []~~

+

ceu NA12412 0 0 female case

−

~~-o output VCF file [-]~~

+

yri NA19650 0 0 female control

−

-b ~~input BAM file []~~

−

-i ~~input candidate VCF file []~~

−

-- ~~ignores the rest of the labeled arguments following this flag~~

−

-h ~~displays help~~

−

~~</div>~~

−

~~</div>~~
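The pedigree table above can be turned into a small validator. The Python sketch below checks one line against the valid and missing values listed in the table. It assumes the six columns are separated by whitespace, which the page does not state, and `parse_ped_line` is a hypothetical helper rather than vt's own parser.

```python
import re

ID = re.compile(r"[A-Za-z0-9_]+")
ID_LIST = re.compile(r"[A-Za-z0-9_]+(,[A-Za-z0-9_]+)*")
# "other" is both a valid value and the missing value for Sex in the table.
SEX = {"1": "male", "2": "female", "male": "male", "female": "female", "other": None}

def parse_ped_line(line):
    """Validate one pedigree line and return its fields; missing values become None."""
    fields = line.split()  # assumption: whitespace-separated columns
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    family, individual, father, mother, sex, phenotype = fields
    if not ID_LIST.fullmatch(individual):
        raise ValueError("bad individual ID: " + individual)
    for value in (family, father, mother):
        if not ID.fullmatch(value):
            raise ValueError("bad ID: " + value)
    if sex not in SEX:
        raise ValueError("bad sex: " + sex)
    if phenotype != "-9" and not ID.fullmatch(phenotype):
        raise ValueError("bad phenotype: " + phenotype)
    return {
        "family": None if family == "0" else family,
        "ids": individual.split(","),  # comma-separated duplicates of one individual
        "father": None if father == "0" else father,
        "mother": None if mother == "0" else mother,
        "sex": SEX[sex],
        "phenotype": None if phenotype == "-9" else phenotype,
    }

print(parse_ped_line("ceu NA12878,NA12878A NA12891 NA12892 female case"))
print(parse_ped_line("yri NA19650 0 0 2 -9"))
```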

= Resource Bundle =

Atks

1,102

edits
