Revision as of 18:38, 2 February 2012

arf [options] <vcf-file>

Here are examples of how <code>arf</code> works:

#-c option directs the output to STDOUT
arf -a complexity 1000g.vcf -g genome.fa -l 30 -c

#-o option specifies an output file name
arf -a complexity 1000g.vcf -g genome.fa -l 30 -o paltum.vcf

#input VCF file can be gzipped
arf -a complexity 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf
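Accepting both plain and gzipped input is usually done by checking for the two-byte gzip magic number. The Python sketch below only illustrates that idea; it is not arf's code, and `open_vcf` and `count_records` are hypothetical helpers. Because bgzip output is a series of concatenated gzip members, Python's `gzip` module can read bgzipped VCFs as well.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip (and bgzip) stream


def open_vcf(path):
    """Open a VCF for text reading, decompressing gzip/bgzip transparently.

    Hypothetical helper for illustration; not part of arf.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")


def count_records(path):
    """Count data records (non-header, non-empty lines) in a VCF."""
    with open_vcf(path) as fh:
        return sum(1 for line in fh if line.strip() and not line.startswith("#"))
```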

#multiple analyses/annotations can be run at once
arf -a complexity,f,hwe,exons 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf -f refGene.txt.gz

#estimates allele and genotype frequencies from genotype likelihoods
#adds the info tags
#AF - Allele frequency estimates of alternate alleles (EM)
#HWEAF - Allele frequency estimates of alternate alleles under the assumption of Hardy-Weinberg equilibrium (EM)
#GF - Genotype frequency estimates (EM)
arf -a freq 1000g.vcf
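The AF, HWEAF and GF tags are described as EM estimates from genotype likelihoods. Below is a minimal sketch of such estimates for a single biallelic site. It assumes PL values ordered (RR, RA, AA), ignores multiallelic sites, and uses hypothetical function names, so treat it as an illustration of the EM idea rather than arf's implementation. GF comes from an unconstrained EM over the three genotype classes (in this sketch AF is derived from GF), while HWEAF is fitted with genotype priors held at Hardy-Weinberg proportions.

```python
def pl_to_likelihoods(pl):
    """Convert phred-scaled PL values (RR, RA, AA) into linear-scale likelihoods."""
    return [10.0 ** (-x / 10.0) for x in pl]


def em_genotype_freqs(liks, max_iter=200, tol=1e-10):
    """EM estimate of genotype frequencies (RR, RA, AA) with no HWE constraint."""
    gf = [1.0 / 3] * 3
    for _ in range(max_iter):
        totals = [0.0, 0.0, 0.0]
        for lik in liks:
            # E-step: posterior genotype probabilities for this sample
            w = [gf[k] * lik[k] for k in range(3)]
            s = sum(w)
            for k in range(3):
                totals[k] += w[k] / s
        # M-step: the new frequencies are the mean posteriors
        new = [t / len(liks) for t in totals]
        converged = max(abs(a - b) for a, b in zip(new, gf)) < tol
        gf = new
        if converged:
            break
    return gf


def allele_freq_from_gf(gf):
    """Alternate allele frequency implied by genotype frequencies (RR, RA, AA)."""
    return gf[1] / 2 + gf[2]


def em_hwe_allele_freq(liks, max_iter=200, tol=1e-10):
    """EM estimate of the alternate allele frequency with HWE genotype priors."""
    p = 0.5
    for _ in range(max_iter):
        prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
        alt_alleles = 0.0
        for lik in liks:
            w = [prior[k] * lik[k] for k in range(3)]
            s = sum(w)
            # expected number of ALT alleles carried by this sample
            alt_alleles += (w[1] + 2 * w[2]) / s
        new_p = alt_alleles / (2 * len(liks))
        converged = abs(new_p - p) < tol
        p = new_p
        if converged:
            break
    return p
```

For example, with PL triples parsed from each sample's genotype field, `em_genotype_freqs([pl_to_likelihoods(pl) for pl in pls])` gives a GF-style estimate for the site.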

#conducts a likelihood ratio test (LRT) for HWE from genotype likelihoods (multiallelic)
#adds the info tags
#HWP - HWE P-value
#HWCHISQ - HWE Chi-square value
#HWDOF - Degrees of freedom for test
#also generates the frequency tags (AF, HWEAF, GF)
arf -a hwe 1000g.vcf
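The HWE test compares how well the genotype likelihoods are explained by free genotype frequencies versus Hardy-Weinberg proportions. A sketch for one biallelic site follows; it takes the two EM fits (for example GF-style and HWEAF-style estimates) as inputs, and the function names are hypothetical. In the standard count, a site with k alleles has k(k+1)/2 - 1 free genotype-frequency parameters versus k - 1 under HWE, giving k(k-1)/2 degrees of freedom, which is 1 in the biallelic case. With one degree of freedom the chi-square upper tail equals erfc(sqrt(x/2)), so no statistics library is needed.

```python
import math


def log_likelihood(liks, gf):
    """Log-likelihood of per-sample genotype likelihoods given genotype freqs."""
    return sum(math.log(sum(f * l for f, l in zip(gf, lik))) for lik in liks)


def hwe_lrt(liks, gf, p):
    """Likelihood ratio test for HWE at a biallelic site.

    liks: per-sample linear likelihoods (RR, RA, AA)
    gf:   unconstrained EM genotype frequencies (RR, RA, AA)
    p:    EM alternate allele frequency fitted under HWE
    Returns (chi-square statistic, degrees of freedom, p-value).
    """
    hwe_gf = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    chisq = 2.0 * (log_likelihood(liks, gf) - log_likelihood(liks, hwe_gf))
    chisq = max(chisq, 0.0)  # guard against tiny negative values from EM convergence
    dof = 1                  # k(k-1)/2 with k = 2 alleles
    pvalue = math.erfc(math.sqrt(chisq / 2.0))  # chi-square upper tail, 1 dof
    return chisq, dof, pvalue
```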

#estimates the inbreeding coefficient F from genotype likelihoods
#adds the info tag
#F - Inbreeding coefficient
arf -a f 1000g.vcf
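A common definition of the inbreeding coefficient is F = 1 - H_obs / H_exp, where H_obs is the heterozygote frequency and H_exp = 2p(1-p) is its expectation under HWE. The sketch below applies that formula to genotype frequencies such as EM-based GF estimates for a biallelic site. This page does not say whether arf computes F this way or, for example, fits F directly by maximum likelihood; `inbreeding_coefficient` is a hypothetical name, and returning `None` for monomorphic sites is an assumption.

```python
def inbreeding_coefficient(gf):
    """F = 1 - observed / expected heterozygosity for genotype freqs (RR, RA, AA)."""
    _, het, hom_alt = gf
    p = het / 2 + hom_alt            # alternate allele frequency
    expected_het = 2 * p * (1 - p)   # heterozygosity expected under HWE
    if expected_het == 0:
        return None                  # monomorphic site: F is undefined
    return 1 - het / expected_het
```

Positive values indicate a heterozygote deficit and negative values an excess relative to HWE.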

#both analyses can be run at the same time
#performs the HWE test and estimates F
arf -a hwe,f 1000g.vcf

#annotates exonic regions
#adds the info tag
#EXON - flag
arf -a exons 1000g.vcf -f refGene.txt

#the annotation file can be gzipped
arf -a exons 1000g.vcf -f refGene.txt.gz
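UCSC refGene.txt is tab-separated, with the chromosome in column 3 and comma-separated, 0-based half-open exon coordinates in columns 10 and 11 (exonStarts, exonEnds). The sketch below loads those intervals and flags a 1-based VCF position that falls inside any exon. It is an illustration, not arf's code: the helper names are hypothetical, and stripping a leading "chr" is an assumption made so that refGene's "chr1" naming matches the "1" naming of human.g1k.v37. A real tool would use sorted intervals or an interval tree rather than a linear scan.

```python
import gzip


def _norm_chrom(chrom):
    """Drop a leading 'chr' so 'chr1' and '1' compare equal (assumption)."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def load_exons(path):
    """Read refGene.txt(.gz) into {chrom: [(start, end), ...]}, 0-based half-open."""
    opener = gzip.open if path.endswith(".gz") else open
    exons = {}
    with opener(path, "rt") as fh:
        for line in fh:
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 11:
                continue  # skip malformed or truncated lines
            starts = [int(x) for x in cols[9].split(",") if x]
            ends = [int(x) for x in cols[10].split(",") if x]
            exons.setdefault(_norm_chrom(cols[2]), []).extend(zip(starts, ends))
    return exons


def in_exon(exons, chrom, pos):
    """True if 1-based position pos lies in an exon; (s, e) covers s+1..e."""
    return any(s < pos <= e for s, e in exons.get(_norm_chrom(chrom), []))
```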

#extracts the flanking sequences around a variant
#adds the info tag
#FLANKS - 5' sequence, reference allele, 3' sequence; each flank is up to the length set by option -l (default is 25)
arf -a flanks 1000g.vcf -g genome.fa -l 30
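The FLANKS tag holds the 5' sequence, the reference allele and the 3' sequence, with flanks up to the length set by -l. Below is a sketch of that extraction for a chromosome held as a Python string. How arf reads sequence (from the memory mapped .umfa file) and how it formats the tag value are not shown, and `flanks` is a hypothetical name. In this sketch, flanks are truncated at the chromosome ends.

```python
def flanks(seq, pos, ref, l=25):
    """Return (5' flank, reference allele, 3' flank) around a variant.

    seq: chromosome sequence as a string (index 0 is position 1)
    pos: 1-based VCF POS of the first REF base
    ref: REF allele, checked against the sequence
    l:   maximum flank length (25 mirrors the -l default on this page)
    """
    start = pos - 1              # 0-based index of the first REF base
    end = start + len(ref)       # 0-based exclusive end of REF
    if seq[start:end].upper() != ref.upper():  # case-insensitive for soft-masked fasta
        raise ValueError("REF does not match the genome sequence")
    left = seq[max(0, start - l):start]
    right = seq[end:end + l]
    return left, seq[start:end], right
```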

#computes a complexity measure for the flanking sequences around a variant
#adds the info tag
#CPXY - complexity measure for flanks of the length set by option -l (default is 25)
arf -a complexity 1000g.vcf -g genome.fa -l 30
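This page does not define the CPXY measure. As a stand-in only, the sketch below scores flank complexity with the Shannon entropy of the base composition, which is 0 bits for a homopolymer and 2 bits when A, C, G and T are equally frequent. arf's actual measure may differ, and both function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits) of the base composition of seq; 0 for empty input."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def flank_complexity(left, right):
    """Stand-in complexity score: mean entropy of the 5' and 3' flanks."""
    return (shannon_entropy(left) + shannon_entropy(right)) / 2
```

Low scores flag repetitive flanks such as homopolymer runs, where variant calls tend to be less reliable.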

== In development/Pending update ==

#annotates variant types
##INFO=<ID=VTYPE,Number=1,Type=String,Description="Annotates variant by types SNP, MNP, INDEL, SV, CR">
#-l option sets the length threshold that separates INDELs from SVs
arf -a vartype 1000g.vcf -l 30
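The classification rules behind VTYPE are not spelled out here beyond the role of -l. The sketch below is one plausible reading, with its details as assumptions: equal-length REF/ALT pairs are SNPs (length 1) or MNPs, length changes up to `max_indel_len` are INDELs, and larger changes or symbolic/breakend ALT alleles are SVs. The CR category is not described on this page, so the sketch never emits it; `variant_type` is a hypothetical name.

```python
def variant_type(ref, alt, max_indel_len):
    """Classify one REF/ALT pair as SNP, MNP, INDEL or SV (CR is not handled)."""
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"                       # symbolic (<DEL>) or breakend ALT allele
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    if abs(len(ref) - len(alt)) <= max_indel_len:
        return "INDEL"                    # length change within the -l threshold
    return "SV"
```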

#computes [[Genotype Likelihood Based Allele Balance]]
##INFO=<ID=AB,Number=1,Type=Float,Description="Allele Balance computed from genotype likelihoods">
#requires PL/GL and DP in the genotype fields
arf -a ab 1000g.vcf

#-e option
#when used with an analysis that requires allele or genotype frequency estimates,
#arf will look for existing estimates in the AF, GF and HWEAF fields
arf -a ab 1000g.vcf -e
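With -e, arf looks for existing estimates in the AF, GF and HWEAF fields. Finding them amounts to parsing the VCF INFO column, a semicolon-separated list of key=value pairs and bare flags. The Python sketch below shows that lookup with hypothetical helper names and is not arf's code. Values are split on commas because these tags can hold one number per alternate allele or per genotype.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; flag entries map to True."""
    fields = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def existing_frequency_estimates(info):
    """Return the AF, GF and HWEAF values present in INFO as lists of floats."""
    fields = parse_info(info)
    found = {}
    for tag in ("AF", "GF", "HWEAF"):
        value = fields.get(tag)
        if isinstance(value, str):  # skip missing tags and malformed flag-style entries
            found[tag] = [float(x) for x in value.split(",")]
    return found
```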

== Command Line Options ==

vcf-file VCF file (can be gzipped or bgzipped)
h help page
g genome-file (fasta file)
(note that if genome.fa is specified, the actual file looked for is genome-bs.umfa; if the memory mapped file is not found, it is automatically generated from the fasta file)
l length of flanking sequence (default is 25)
a analysis/annotation
o output file name (default is arf.vcf)
f annotation file name (can be gzipped)
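The genome-file note describes a naming rule: for genome.fa, arf looks for genome-bs.umfa and generates it from the fasta file if it is missing. A small Python sketch of that rule follows. It assumes the final extension is the part replaced, and `umfa_path` and `needs_generation` are hypothetical helpers, not part of arf.

```python
import os


def umfa_path(fasta_path):
    """genome.fa -> genome-bs.umfa (final extension replaced; assumption)."""
    root, _ext = os.path.splitext(fasta_path)
    return root + "-bs.umfa"


def needs_generation(fasta_path):
    """True when the memory mapped file for this fasta does not exist yet."""
    return not os.path.exists(umfa_path(fasta_path))
```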

== Output ==

An output file named arf.vcf is generated by default; the file name can be specified with the -o option.

Log messages are written to arf.log.

== Description ==

arf reads a VCF file and writes an output VCF file with additional info tags generated by the requested analyses and annotations.

== Download ==

For arf 0.557215, we provide binaries for Linux machines: [http://www-personal.umich.edu/~atks/arf arf 0.557215].

You will also need a copy of the human genome assembly fasta file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa]. Please gunzip it before use. From the fasta file, arf will generate a memory mapped file named human.g1k.v37-bs.umfa.

You will also need a copy of the UCSC refGene text file: [http://www-personal.umich.edu/~atks/refGene.txt.gz refGene.txt].

This page is maintained by [mailto:atks@umich.edu Adrian].
