Difference between revisions of "Arf"
From Genome Analysis Wiki
Revision as of 17:02, 17 January 2012
arf is a genetic analysis program for sequencing data.
Basic Usage Example
arf [options] <vcf-file>
Here is an example of how arf works:
#estimates allele and genotype frequencies from genotype likelihoods.
#AF - Allele frequency estimates of alternate alleles (EM)
#HWEAF - Allele frequency estimates of alternate alleles under the assumption of HWE (EM)
#GF - Genotype frequency estimates (EM)
arf -s freq 1000g.vcf
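The HWEAF estimate described above can be illustrated with a minimal sketch of an EM iteration for a single biallelic site. This is not arf's implementation: the function names `pl_to_likelihoods` and `em_allele_frequency` are hypothetical, the site is assumed to be biallelic, and the genotype likelihoods are assumed to be available per sample as (ref/ref, ref/alt, alt/alt) triples, for example converted from phred-scaled PL values.

```python
def pl_to_likelihoods(pls):
    """Convert phred-scaled likelihoods (VCF PL values) to linear scale."""
    return [10 ** (-pl / 10.0) for pl in pls]


def em_allele_frequency(gls, tol=1e-8, max_iter=200):
    """Estimate the ALT allele frequency under HWE with EM.

    gls: list of (L_RR, L_RA, L_AA) linear-scale genotype likelihoods,
         one triple per sample (hypothetical input layout).
    """
    q = 0.5  # starting value for the ALT allele frequency
    for _ in range(max_iter):
        p = 1.0 - q
        expected_alt = 0.0
        for l_rr, l_ra, l_aa in gls:
            # E-step: posterior genotype weights under HWE priors
            w_rr = l_rr * p * p
            w_ra = l_ra * 2.0 * p * q
            w_aa = l_aa * q * q
            total = w_rr + w_ra + w_aa
            expected_alt += (w_ra + 2.0 * w_aa) / total
        # M-step: expected ALT allele count over total allele count
        new_q = expected_alt / (2.0 * len(gls))
        converged = abs(new_q - q) < tol
        q = new_q
        if converged:
            break
    return q


# Four samples with certain genotypes: RR, RR, RA, AA
samples = [(1, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(em_allele_frequency(samples))
```

With certain genotypes the estimate reduces to simple allele counting; with uncertain likelihoods each sample contributes its expected ALT allele count instead.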
#conducts HWE LRT test from genotype likelihoods (multiallelic)
#adds the info tags
#HWP - HWE P-value
#HWCHISQ - HWE Chisquare value
#HWDOF - Degrees of Freedom for test
#will generate frequency tags.
arf -s hwe 1000g.vcf
#conducts HWE LRT test from genotype likelihoods (multiallelic)
#adds the info tags
#HWP - HWE P-value
#HWCHISQ - HWE Chisquare value
#HWDOF - Degrees of Freedom for test
#will attempt to use existing allele frequency estimates in the info fields if they exist,
#otherwise it will estimate the frequencies from the data.
arf -s hwe 1000g.vcf -e
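To make the HWP, HWCHISQ and HWDOF tags more concrete, here is a minimal sketch of a likelihood ratio test for HWE from genotype likelihoods. It is an illustration under stated assumptions, not arf's code: it handles only a biallelic site (so the test has 1 degree of freedom, whereas arf supports multiallelic sites), and the function names `hwe_af_em`, `gf_em`, `log_lik` and `hwe_lrt` are hypothetical.

```python
import math


def hwe_af_em(gls, max_iter=200, tol=1e-10):
    """EM estimate of the ALT allele frequency under HWE (biallelic)."""
    q = 0.5
    for _ in range(max_iter):
        p = 1.0 - q
        alt = 0.0
        for l in gls:
            w = (l[0] * p * p, l[1] * 2.0 * p * q, l[2] * q * q)
            alt += (w[1] + 2.0 * w[2]) / sum(w)
        new_q = alt / (2.0 * len(gls))
        done = abs(new_q - q) < tol
        q = new_q
        if done:
            break
    return q


def gf_em(gls, max_iter=200, tol=1e-10):
    """EM estimate of unconstrained genotype frequencies (RR, RA, AA)."""
    g = [1.0 / 3.0] * 3
    for _ in range(max_iter):
        counts = [0.0, 0.0, 0.0]
        for l in gls:
            w = [l[k] * g[k] for k in range(3)]
            t = sum(w)
            for k in range(3):
                counts[k] += w[k] / t
        new_g = [c / len(gls) for c in counts]
        done = max(abs(a - b) for a, b in zip(new_g, g)) < tol
        g = new_g
        if done:
            break
    return g


def log_lik(gls, g):
    """Log-likelihood of the data given genotype frequencies g."""
    return sum(math.log(sum(l[k] * g[k] for k in range(3))) for l in gls)


def hwe_lrt(gls):
    """Return (p_value, chisq, dof) for the biallelic HWE LRT."""
    q = hwe_af_em(gls)
    p = 1.0 - q
    ll_null = log_lik(gls, [p * p, 2.0 * p * q, q * q])  # HWE model
    ll_alt = log_lik(gls, gf_em(gls))                    # free genotype frequencies
    chisq = max(0.0, 2.0 * (ll_alt - ll_null))
    dof = 1  # 2 free genotype frequencies minus 1 free allele frequency
    p_value = math.erfc(math.sqrt(chisq / 2.0))  # chi-square survival function, 1 dof
    return p_value, chisq, dof


# 20 samples that are all confidently heterozygous: strong departure from HWE
print(hwe_lrt([(0, 1, 0)] * 20))
```

The null model constrains genotype frequencies to the HWE proportions, the alternative leaves them free, and twice the log-likelihood difference is compared against a chi-square distribution.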
#estimates Inbreeding Coefficient F from genotype likelihood
#adds the info tag
#F - Inbreeding Coefficient
arf -s f 1000g.vcf
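As a rough illustration of what the inbreeding coefficient measures, the sketch below computes F as one minus the ratio of observed to expected heterozygosity, starting from genotype frequency estimates such as those reported in the GF tag. This simple moment-style estimator is an assumption chosen for illustration; arf estimates F directly from genotype likelihoods and its method may differ. The function name `inbreeding_from_genotype_freqs` is hypothetical and the site is assumed to be biallelic.

```python
def inbreeding_from_genotype_freqs(g_rr, g_ra, g_aa):
    """Moment-style estimate of the inbreeding coefficient F.

    g_rr, g_ra, g_aa: genotype frequencies for ref/ref, ref/alt, alt/alt.
    Returns None for a monomorphic site, where F is undefined.
    """
    total = g_rr + g_ra + g_aa
    g_ra, g_aa = g_ra / total, g_aa / total  # normalise defensively
    q = 0.5 * g_ra + g_aa                    # ALT allele frequency
    expected_het = 2.0 * q * (1.0 - q)       # heterozygosity under HWE
    if expected_het == 0.0:
        return None
    return 1.0 - g_ra / expected_het


print(inbreeding_from_genotype_freqs(0.25, 0.5, 0.25))  # HWE proportions
print(inbreeding_from_genotype_freqs(0.3, 0.4, 0.3))    # heterozygote deficit
```

A value near 0 indicates HWE proportions, positive values indicate a deficit of heterozygotes, and negative values indicate an excess.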
#you can also do both analyses at the same time
#performs both HWE test and estimates F
arf -s hwe,f 1000g.vcf
#annotates exonic regions
#adds the info tag
#EXON - flag
arf -a exon 1000g.vcf
#computes a complexity measure for flanking sequences around a variant
#adds the info tag
#C - complexity measure
arf -a c 1000g.vcf -g genome.fa
Command Line Options
vcf-file  VCF file (can be gzipped or bgzipped)
-g        genome-file (fasta file)
          (note that if genome.fa is specified, the actual file looked for is genome-bs.umfa; if the memory mapped file is not found, it will be automatically generated from the fasta file)
-s        statistical analysis
-a        annotation
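The file name mapping described for the genome file option can be sketched as follows. This only illustrates the naming rule stated above (genome.fa maps to genome-bs.umfa); the helper name `umfa_path` is hypothetical, and stripping only the final extension is an assumption.

```python
from pathlib import Path


def umfa_path(fasta_name):
    """Return the memory mapped file name that corresponds to a fasta name.

    Assumption: only the last extension (e.g. ".fa") is replaced by "-bs.umfa".
    """
    p = Path(fasta_name)
    return str(p.with_name(p.stem + "-bs.umfa"))


print(umfa_path("genome.fa"))
print(umfa_path("human.g1k.v37.fa"))
```

This can be handy for checking in advance whether the memory mapped file already exists next to the fasta file.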
Output
user@host:~$ vmatch gatk.vcf samtools.vcf -w 10 -d
Description
Outputs 2 files:
match.txt : gives the matched pairs
  1) id1
  2) id2
  3) match type
  4) extended no of bases
  5) normalized
match.log : Details of the extension and normalization process for all compared pairs
vmatch matches the variants in 2 VCF files by choosing the best match for every possible variant pair. The percentage of matches is given at 3 levels for each variant total of both VCF files.
Download
For arf 0.557215, we provide binaries for Linux machines.
You will also need a copy of the memory mapped file human.g1k.v37-bs.umfa. Please gunzip it before use. When running arf, refer to the file as human.g1k.v37.fa; arf automatically maps this name to human.g1k.v37-bs.umfa.
This page is maintained by Adrian.