Changes

1,507 bytes added, 18:38, 2 February 2012

→ In development/Pending update

Line 6: Line 6:

arf [options] <vcf-file>

−

Here ~~is an example~~ of how <code>arf</code> works:

+

Here are examples of how <code>arf</code> works:

+

#-c option directs the output to STDOUT

+

arf -a complexity 1000g.vcf -g genome.fa -l 30 -c

+

#-o option specifies an output file name

+

arf -a complexity 1000g.vcf -g genome.fa -l 30 -o paltum.vcf

+

#input VCF file can be gzipped

+

arf -a complexity 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf

+

#multiple analyses/annotations can be run at once

+

arf -a complexity,f,hwe,exons 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf -f refGene.txt.gz

#estimates allele and genotype frequencies from genotype likelihoods.
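This analysis is described only as estimating allele and genotype frequencies from genotype likelihoods. A standard way to do that is expectation-maximization (EM). The Python sketch below illustrates that general technique for one biallelic site in diploid samples. It is not arf's code: the function names `pl_to_likelihoods` and `estimate_gf_em` are hypothetical, and the sketch assumes PL values are ordered 0/0, 0/1, 1/1 as in the VCF specification.

```python
from typing import List, Sequence, Tuple


def pl_to_likelihoods(pl: Sequence[int]) -> List[float]:
    """Convert phred-scaled PL values to linear likelihoods P(data | g)."""
    return [10 ** (-p / 10) for p in pl]


def estimate_gf_em(
    likelihoods: Sequence[Sequence[float]],
    max_iter: int = 200,
    tol: float = 1e-10,
) -> Tuple[List[float], float]:
    """EM estimate of genotype frequencies (0/0, 0/1, 1/1) and the
    alternate allele frequency at one biallelic site.

    Hypothetical sketch, not arf's implementation: genotype frequencies
    are estimated directly, without a Hardy-Weinberg constraint.
    """
    gf = [1 / 3, 1 / 3, 1 / 3]  # uniform starting point
    n = len(likelihoods)
    for _ in range(max_iter):
        # E-step: posterior genotype probabilities per sample, summed
        acc = [0.0, 0.0, 0.0]
        for lik in likelihoods:
            w = [x * g for x, g in zip(lik, gf)]
            total = sum(w)
            for k in range(3):
                acc[k] += w[k] / total
        # M-step: new genotype frequencies are the mean posteriors
        new_gf = [a / n for a in acc]
        delta = max(abs(a - b) for a, b in zip(new_gf, gf))
        gf = new_gf
        if delta < tol:
            break
    # each het carries one alt allele, each hom-alt carries two
    af = gf[1] / 2 + gf[2]
    return gf, af
```

For example, two confidently hom-ref samples and two confidently het samples give an allele frequency close to 0.25.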

Line 34: Line 46:

#adds the info tag

#EXON - flag

−

arf -a ~~exon~~ 1000g.vcf -f refGene.txt

+

arf -a exons 1000g.vcf -f refGene.txt

+

#reference file can also be gzipped

+

arf -a exons 1000g.vcf -f refGene.txt.gz
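Both the input VCF and the refGene annotation file may be gzipped. One common way for a tool to accept either form is to check the two-byte gzip magic number instead of trusting the file extension. The Python sketch below shows that idea under that assumption; it does not describe how arf itself detects compression, and `open_maybe_gzipped` is a hypothetical helper name.

```python
import gzip
from typing import IO

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def open_maybe_gzipped(path: str) -> IO[str]:
    """Open a text file for reading, decompressing it if it is gzipped.

    Hypothetical helper: detection uses the gzip magic number, not the
    file name. bgzipped (BGZF) files are concatenated gzip members, which
    the gzip module reads transparently, so they are handled as well.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")
```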

#extracts flanking sequence around a variant

Line 45: Line 60:

#CPXY - complexity measure for flanks of length l defined by option -l, default is 25

arf -a complexity 1000g.vcf -g genome.fa -l 30
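CPXY is computed over the l bases flanking each side of a variant. The Python sketch below shows one way to extract those flanks from a chromosome sequence held in memory as a string. It assumes VCF's 1-based POS and that the right flank starts after the last REF base; the `flanks` function is hypothetical, and because the complexity measure itself is not defined on this page, the sketch stops at extraction.

```python
from typing import Tuple


def flanks(chrom_seq: str, pos: int, ref: str, length: int = 25) -> Tuple[str, str]:
    """Return up to `length` bases on each side of a variant.

    Hypothetical sketch: `pos` is the 1-based VCF POS of the first REF
    base; flanks are truncated at the chromosome ends.
    """
    start = pos - 1            # 0-based index of the first REF base
    end = start + len(ref)     # 0-based index just past the last REF base
    if chrom_seq[start:end].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    left = chrom_seq[max(0, start - length):start]
    right = chrom_seq[end:end + length]  # slicing truncates at the end
    return left, right
```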

+

== In development/Pending update ==

+

#annotates variants

+

##INFO=<ID=VTYPE,Number=1,Type=String,Description="Annotates variant by types SNP, MNP, INDEL, SV, CR">

+

#-l option sets the length threshold used to distinguish INDELs from SVs

+

arf -a vartype 1000g.vcf -l 30
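The vartype analysis assigns each variant one of SNP, MNP, INDEL, SV or CR, using -l to separate INDELs from SVs. The exact rules are not spelled out here, so the Python sketch below is a hypothetical classifier under stated assumptions: symbolic or breakend ALT alleles are SVs, equal-length alleles are SNPs (one differing base) or MNPs (several), and a length change greater than l is an SV. CR is not defined on this page and is left out.

```python
def variant_type(ref: str, alt: str, max_indel_len: int) -> str:
    """Classify a REF/ALT pair as SNP, MNP, INDEL or SV.

    Hypothetical rules, not arf's: `max_indel_len` plays the role of -l,
    and length changes above it are treated as SVs.
    """
    # symbolic alleles (<DEL>, <DUP>, ...) and breakend notation
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"
    if len(ref) == len(alt):
        diffs = sum(1 for r, a in zip(ref.upper(), alt.upper()) if r != a)
        if diffs == 0:
            raise ValueError("REF and ALT are identical")
        return "SNP" if diffs == 1 else "MNP"
    if abs(len(ref) - len(alt)) <= max_indel_len:
        return "INDEL"
    return "SV"
```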

+

#compute [[Genotype Likelihood Based Allele Balance]]

+

##INFO=<ID=AB,Number=1,Type=Float,Description="Allele Balance computed from genotype likelihoods">

+

#requires PL/GL and DP in the genotype fields

+

arf -a ab 1000g.vcf
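The allele balance analysis needs PL or GL values per genotype. The AB formula itself is defined on the linked page and is not reproduced here; the Python sketch below shows only the standard preliminary step of turning those fields into genotype probabilities. It follows the VCF 4.1 definitions (GL is log10-scaled, PL is phred-scaled and normalized so the best genotype is 0); the function names and the flat default prior are hypothetical choices.

```python
from typing import List, Optional, Sequence


def gl_to_pl(gl: Sequence[float]) -> List[int]:
    """Convert log10-scaled GL values to PL, normalized so the most
    likely genotype has PL 0."""
    best = max(gl)
    return [round(-10 * (g - best)) for g in gl]


def pl_to_posteriors(
    pl: Sequence[int], priors: Optional[Sequence[float]] = None
) -> List[float]:
    """Turn PL values into normalized genotype probabilities.

    PL_g = -10 * log10 P(data | g), so P(data | g) = 10 ** (-PL_g / 10).
    With no priors given, a flat prior is used (hypothetical default).
    """
    if priors is None:
        priors = [1.0] * len(pl)
    weights = [10 ** (-p / 10) * q for p, q in zip(pl, priors)]
    total = sum(weights)
    return [w / total for w in weights]
```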

+

#-e option

+

#when used in conjunction with an analysis that requires allele or genotype frequency estimates,

+

#will attempt to find estimates in the AF, GF and HWEAF fields

+

arf -a ab 1000g.vcf -e
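With -e, arf looks for existing estimates in the AF, GF and HWEAF fields. The Python sketch below shows how such values can be pulled out of a VCF INFO column. It assumes these are INFO fields holding comma-separated numbers (for example GF listing the 0/0, 0/1, 1/1 frequencies); that layout, and the helper names `parse_info` and `existing_estimates`, are hypothetical, since the page does not define the fields.

```python
from typing import Dict, List, Optional, Sequence


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Parse a VCF INFO column into a dict; flags map to None."""
    if info in ("", "."):
        return {}  # '.' marks an empty INFO column
    fields: Dict[str, Optional[str]] = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else None
    return fields


def existing_estimates(
    info: str, keys: Sequence[str] = ("AF", "GF", "HWEAF")
) -> Dict[str, List[float]]:
    """Collect numeric values for any of `keys` already present in INFO.

    Assumes each field is a comma-separated list of numbers
    (hypothetical layout).
    """
    fields = parse_info(info)
    found: Dict[str, List[float]] = {}
    for key in keys:
        value = fields.get(key)
        if value is not None and value != ".":
            found[key] = [float(v) for v in value.split(",")]
    return found
```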

== Command Line Options ==

−

vcf-file VCF file (can be gzipped or bgzipped)

+

vcf-file VCF file (can be gzipped or bgzipped)

+

h help page

g genome-file (fasta file)

(note that if genome.fa is specified, the actual file looked

for is genome-bs.umfa, if the memory mapped file is not

found, it will be automatically generated from the fasta file)

−

l length of flanking sequence

+

l length of flanking sequence (default is 25)

a analysis/annotation

−

o output file name

+

o output file name (default is arf.vcf)

f annotation file name (can be gzipped)
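The note on the g option implies a naming rule for the memory-mapped genome file: genome.fa becomes genome-bs.umfa, and the download section gives human.g1k.v37.fa becoming human.g1k.v37-bs.umfa. The Python sketch below encodes that pattern. The helper name `umfa_path` is hypothetical, and it assumes the .umfa file sits next to the fasta file and that only the final extension is replaced.

```python
import os


def umfa_path(fasta_path: str) -> str:
    """Map a fasta path to the memory-mapped file name arf looks for.

    Hypothetical helper following the examples on this page; assumes
    the .umfa file lives in the same directory as the fasta file.
    """
    root, _ext = os.path.splitext(fasta_path)  # drops only the last extension
    return root + "-bs.umfa"
```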

Line 71: Line 104:

== Download ==

−

For arf 0.557215, we provide binaries for linux machines.

+

For arf 0.557215, we provide binaries for Linux machines: [http://www-personal.umich.edu/~atks/arf arf 0.557215].

+

You will also need a copy of human genome assembly fasta file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa]. Please gunzip it before use. On first use, arf will generate a memory-mapped file named human.g1k.v37-bs.umfa from the fasta file.

+

You will also need a copy of UCSC refGene text file: [http://www-personal.umich.edu/~atks/refGene.txt.gz refGene.txt].

−

You will also need a copy of human genome assembly fasta file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa]. Please gunzip it before usage. arf will generate a memory mapped file from the fasta file named human.g1k.v37-bs.umfa.

This page is maintained by [mailto:atks@umich.edu Adrian].

Atks (1,102 edits)

Arf

Revision as of 18:38, 2 February 2012
