Changes

1,373 bytes added , 18:38, 2 February 2012

→In development/Pending update

Line 6: Line 6:

arf [options] <vcf-file>

−

Here ~~is an example~~ of how <code>arf</code> works:

+

Here are examples of how <code>arf</code> works:

+

#-c option directs the output to STDOUT

+

arf -a complexity 1000g.vcf -g genome.fa -l 30 -c

+

#-o option specifies an output file name

+

arf -a complexity 1000g.vcf -g genome.fa -l 30 -o paltum.vcf

+

#input VCF file can be gzipped

+

arf -a complexity 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf

+

#multiple analyses/annotations can be run at once

+

arf -a complexity,f,hwe,exons 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf -f refGene.txt.gz

#estimates allele and genotype frequencies from genotype likelihoods.

Line 17: Line 29:

#adds the info tags

#HWP - HWE P-value

−

#HWCHISQ - HWE ~~Chisquare~~ value

+

#HWCHISQ - HWE Chi-square value

−

#HWDOF - Degrees of ~~Freedom~~ for test

+

#HWDOF - Degrees of freedom for test

#will generate frequency tags.

arf -a hwe 1000g.vcf
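To make the three HWE tags concrete, here is a minimal sketch of a chi-square Hardy-Weinberg test for a single biallelic site. It is an illustration only: it assumes hard genotype counts (arf itself works from genotype likelihoods, and its exact method is not described on this page), and the function name `hwe_chisq` is hypothetical.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square HWE test from hard genotype counts at a biallelic site.

    Returns (chi-square value, degrees of freedom, p-value), loosely
    mirroring the HWCHISQ, HWDOF and HWP tags. Assumes at least one sample.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic sites).
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    dof = 1  # 3 genotype classes - 1 - 1 estimated allele frequency
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    return chisq, dof, p_value
```

For example, `hwe_chisq(25, 50, 25)` describes a site in perfect Hardy-Weinberg proportions, while `hwe_chisq(50, 0, 50)` describes a site with no heterozygotes at all.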

−

~~#conducts HWE LRT test from genotype likelihoods (multiallelic)~~

+

#estimates inbreeding coefficient F from genotype likelihoods

−

~~#will attempt to use existing allele frequency estimates in the info~~

−

~~arf -a hwe 1000g.vcf -e~~

−

#estimates Inbreeding ~~Coefficient~~ F from genotype likelihood

#adds the info tag

−

#F - Inbreeding ~~Coefficient~~

+

#F - Inbreeding coefficient

arf -a f 1000g.vcf
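As a rough picture of what the F tag measures, the sketch below computes the textbook estimate F = 1 - H_obs / H_exp for one biallelic site. It assumes hard genotype counts rather than the genotype likelihoods arf uses, so it should not be read as arf's estimator; `inbreeding_f` is a hypothetical name.

```python
def inbreeding_f(n_aa, n_ab, n_bb):
    """Textbook inbreeding coefficient from hard genotype counts.

    F = 1 - (observed heterozygosity / expected heterozygosity under HWE).
    Returns None for monomorphic sites, where F is undefined.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    h_exp = 2 * p * (1 - p)          # expected heterozygote fraction
    if h_exp == 0:
        return None
    h_obs = n_ab / n                 # observed heterozygote fraction
    return 1.0 - h_obs / h_exp
```

A site in Hardy-Weinberg proportions gives F = 0, and a site with no heterozygotes despite both alleles being present gives F = 1.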

Line 34: Line 42:

#performs both HWE test and estimates F

arf -a hwe,f 1000g.vcf

−

+

#annotates exonic regions

#adds the info tag

−

#EXON - flag

+

#EXON - flag

−

arf -a ~~exon~~ 1000g.vcf

+

arf -a exons 1000g.vcf -f refGene.txt

+

#reference file can be gzipped

+

arf -a exons 1000g.vcf -f refGene.txt.gz

#extracts flanking sequence around a variant

Line 47: Line 58:

#computes a complexity measure for flanking sequences around a variant

#adds the info tag

−

#~~FLANKS~~ - ~~5' sequence, reference allele, 3' sequence up to~~ length n defined by option -l, default is 25

+

#CPXY - complexity measure for flanks of length l defined by option -l, default is 25

−

#~~CPXY~~ - ~~complexity measure~~

+

arf -a complexity 1000g.vcf -g genome.fa -l 30
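The flanks that -l controls can be pictured with a short sketch. It follows the description of the earlier FLANKS tag (5' sequence, reference allele, 3' sequence, each flank up to the given length) and assumes the chromosome sequence is already held in memory as a string. The function name `flanks` is hypothetical, and the complexity measure itself is not defined on this page, so it is not reproduced here.

```python
def flanks(sequence, pos, ref, length=25):
    """Return (5' flank, reference allele, 3' flank) around a variant.

    pos is 1-based, as in the VCF POS column. Flanks are truncated at the
    ends of the sequence, so they can be shorter than `length`.
    """
    start = pos - 1              # 0-based index of the first reference base
    end = start + len(ref)       # index just past the reference allele
    if sequence[start:end].upper() != ref.upper():
        raise ValueError("reference allele does not match the sequence at pos")
    five_prime = sequence[max(0, start - length):start]
    three_prime = sequence[end:end + length]
    return five_prime, ref, three_prime
```

With the toy sequence `"ACGTACGTAC"`, calling `flanks("ACGTACGTAC", 5, "A", 3)` returns the three bases on each side of the fifth base.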

−

arf -a c 1000g.vcf -~~g genome~~.fa

+

== In development/Pending update ==

+

#annotates variants

+

##INFO=<ID=VTYPE,Number=1,Type=String,Description="Annotates variant by types SNP, MNP, INDEL, SV, CR">

+

#-l option defines the length threshold used to differentiate INDELs from SVs

+

arf -a vartype 1000g.vcf -l 30
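One way to read the VTYPE categories is as a length-based classification of the REF and ALT alleles. The sketch below is a guess at such a rule, not arf's actual logic: the function name `classify_variant` is hypothetical, treating a length difference equal to the threshold as an SV is an assumption, and the CR category is left out because the page does not define it.

```python
def classify_variant(ref, alt, sv_length=30):
    """Classify a REF/ALT allele pair as SNP, MNP, INDEL or SV.

    sv_length plays the role of the -l option: length differences below it
    are called INDEL, and differences at or above it are called SV
    (an assumed convention).
    """
    if len(ref) == len(alt):
        # Same length: a single-base or multi-base substitution.
        return "SNP" if len(ref) == 1 else "MNP"
    size = abs(len(ref) - len(alt))
    return "INDEL" if size < sv_length else "SV"
```

Symbolic alleles such as `<DEL>` are not handled in this sketch.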

+

#computes [[Genotype Likelihood Based Allele Balance]]

+

##INFO=<ID=AB,Number=1,Type=Float,Description="Allele Balance computed from genotype likelihoods">

+

#requires PL/GL and DP in the genotype fields

+

arf -a ab 1000g.vcf

+

#-e option

+

#when used in conjunction with an analysis that requires allele or genotype frequency estimates,

+

#will attempt to find estimates in the AF, GF and HWEAF fields

+

arf -a ab 1000g.vcf -e
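The lookup that -e implies can be sketched as a scan of a record's INFO column for the AF, GF and HWEAF keys. This is only an illustration of the idea: `find_frequency_estimates` is a hypothetical helper, and it assumes the values are comma-separated numbers with no missing-value placeholders.

```python
def find_frequency_estimates(info, keys=("AF", "GF", "HWEAF")):
    """Collect existing frequency estimates from a VCF INFO string.

    Returns a dict mapping each key that is present to a list of floats.
    Flags (entries without '=') and unrelated keys are ignored.
    """
    found = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if sep and key in keys:
            found[key] = [float(v) for v in value.split(",")]
    return found
```

An empty result would mean no existing estimates were found in the record.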

== Command Line Options ==

−

vcf-file VCF file (can be gzipped or bgzipped)

+

vcf-file VCF file (can be gzipped or bgzipped)

+

h help page

g genome-file (fasta file)

(note that if genome.fa is specified, the actual file looked

for is genome-bs.umfa, if the memory mapped file is not

found, it will be automatically generated from the fasta file)

+

l length of flanking sequence (default is 25)

a analysis/annotation

−

o output file name

+

o output file name (default is arf.vcf)

+

f annotation file name (can be gzipped)
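The naming rule in the note under the g option (genome.fa is looked up as genome-bs.umfa) can be written out as a small helper. The sketch covers only the `.fa` examples given on this page; how other extensions are handled is not stated, and `memory_mapped_name` is a hypothetical name.

```python
from pathlib import Path


def memory_mapped_name(fasta_path):
    """Derive the memory-mapped file name from a FASTA file name.

    Drops the final extension and appends '-bs.umfa', matching the
    genome.fa -> genome-bs.umfa example.
    """
    path = Path(fasta_path)
    return str(path.with_name(path.stem + "-bs.umfa"))
```

For instance, `memory_mapped_name("human.g1k.v37.fa")` gives `human.g1k.v37-bs.umfa`.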

== Output ==

Line 71: Line 101:

Operates on VCF files and generates additional INFO tags in an output VCF file.

−

~~Deals with hard calls as well as genotype likelihoods.~~

== Download ==

−

For arf 0.557215, we provide binaries for linux machines.

+

For arf 0.557215, we provide binaries for Linux machines: [http://www-personal.umich.edu/~atks/arf arf 0.557215].

+

You will also need a copy of the human genome assembly FASTA file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa]. Please gunzip it before use. arf will generate a memory-mapped file named human.g1k.v37-bs.umfa from the FASTA file.

+

You will also need a copy of the UCSC refGene text file: [http://www-personal.umich.edu/~atks/refGene.txt.gz refGene.txt].

−

You will also need a copy of human genome assembly fasta file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa]. Please gunzip it before usage. arf will generate a memory mapped file from the fasta file named human.g1k.v37-bs.umfa.

This page is maintained by [mailto:atks@umich.edu Adrian].

Atks



Arf

Revision as of 18:38, 2 February 2012
