Line 6: |
Line 6: |
| arf [options] <vcf-file> | | arf [options] <vcf-file> |
| | | |
− | Here is an example of how <code>arf</code> works: | + | Here are examples of how <code>arf</code> works: |
| + | |
| + | #-c option directs the output to STDOUT |
| + | arf -a complexity 1000g.vcf -g genome.fa -l 30 -c |
| + | |
| + | #-o option specifies an output file name |
| + | arf -a complexity 1000g.vcf -g genome.fa -l 30 -o paltum.vcf |
| + | |
| + | #input VCF file can be gzipped |
| + | arf -a complexity 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf |
| + | |
| + | #multiple analyses/annotations can be run at once |
| + | arf -a complexity,f,hwe,exons 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf -f refGene.txt.gz |
| | | |
| #estimates allele and genotype frequencies from genotype likelihoods. | | #estimates allele and genotype frequencies from genotype likelihoods. |
Line 17: |
Line 29: |
| #adds the info tags | | #adds the info tags |
| #HWP - HWE P-value | | #HWP - HWE P-value |
− | #HWCHISQ - HWE Chisquare value | + | #HWCHISQ - HWE Chi-square value |
− | #HWDOF - Degrees of Freedom for test | + | #HWDOF - Degrees of freedom for test |
| #will generate frequency tags. | | #will generate frequency tags. |
| arf -a hwe 1000g.vcf | | arf -a hwe 1000g.vcf |
| | | |
− | #conducts HWE LRT test from genotype likelihoods (multiallelic)
| + | #estimates inbreeding coefficient F from genotype likelihoods |
− | #will attempt to use existing allele frequency estimates in the info
| |
− | arf -a hwe 1000g.vcf -e
| |
− | | |
− | #estimates Inbreeding Coefficient F from genotype likelihood | |
| #adds the info tag | | #adds the info tag |
− | #F - Inbreeding Coefficient | + | #F - Inbreeding coefficient |
| arf -a f 1000g.vcf | | arf -a f 1000g.vcf |
| | | |
Line 34: |
Line 42: |
| #performs both HWE test and estimates F | | #performs both HWE test and estimates F |
| arf -a hwe,f 1000g.vcf | | arf -a hwe,f 1000g.vcf |
− |
| + | |
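For intuition about the HWP, HWCHISQ, HWDOF and F tags described above, here is a minimal Python sketch of the classical Hardy-Weinberg chi-square test and inbreeding coefficient for a biallelic site. It works from hard genotype counts, which is a simplification: arf estimates these quantities from genotype likelihoods, and its exact method is not shown on this page. The function name `hwe_and_f` and its inputs are hypothetical and not part of arf.

```python
import math


def hwe_and_f(n_ref_ref, n_ref_alt, n_alt_alt):
    """Classical HWE chi-square test and inbreeding coefficient F for one
    biallelic site, from hard genotype counts (hypothetical helper, not arf)."""
    n = n_ref_ref + n_ref_alt + n_alt_alt
    if n == 0:
        raise ValueError("no genotypes observed")

    # Allele frequency of the reference allele from genotype counts.
    p = (2 * n_ref_ref + n_ref_alt) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_ref, n_ref_alt, n_alt_alt)

    if min(expected) == 0:
        # Monomorphic site: no deviation can be measured. Returning zeros
        # here is a convention chosen for this sketch.
        return {"chisq": 0.0, "dof": 1, "p_value": 1.0, "F": 0.0}

    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chisq / 2.0))

    # F = 1 - (observed heterozygotes / expected heterozygotes).
    f = 1.0 - n_ref_alt / expected[1]

    return {"chisq": chisq, "dof": 1, "p_value": p_value, "F": f}


if __name__ == "__main__":
    print(hwe_and_f(25, 50, 25))  # counts in exact HWE proportions
    print(hwe_and_f(50, 0, 50))   # complete absence of heterozygotes
```

A site in exact Hardy-Weinberg proportions gives a chi-square of 0 and F of 0, while a site with no heterozygotes gives F of 1.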
| #annotates exonic regions | | #annotates exonic regions |
| #adds the info tag | | #adds the info tag |
− | #EXON - flag | + | #EXON - flag |
− | arf -a exon 1000g.vcf | + | arf -a exons 1000g.vcf -f refGene.txt |
| + | |
| + | #reference file can be gzipped |
| + | arf -a exons 1000g.vcf -f refGene.txt.gz |
| | | |
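To illustrate what an exon annotation against a refGene file involves, the following Python sketch tests whether a VCF position falls inside any exon listed in a UCSC refGene line. It is only a sketch under stated assumptions: the helper names are hypothetical, the example record is made up, and how arf itself implements the EXON flag is not documented here. The column layout (chrom in column 3, exonStarts and exonEnds in columns 10 and 11, 0-based half-open coordinates) follows the UCSC refGene table format.

```python
def normalize_chrom(chrom):
    # Assumption: the VCF uses "1" while refGene uses "chr1", so the prefix
    # is stripped to make the two comparable.
    return chrom[3:] if chrom.startswith("chr") else chrom


def parse_refgene_exons(line):
    """Return (chrom, [(start, end), ...]) for one tab-separated refGene line.

    Exon coordinates are 0-based and half-open, as in the UCSC table.
    """
    fields = line.rstrip("\n").split("\t")
    chrom = normalize_chrom(fields[2])
    # exonStarts / exonEnds are comma-separated with a trailing comma.
    starts = [int(s) for s in fields[9].split(",") if s]
    ends = [int(e) for e in fields[10].split(",") if e]
    return chrom, list(zip(starts, ends))


def is_exonic(chrom, pos, exons_by_chrom):
    """True if the 1-based VCF position lies inside any stored exon."""
    zero_based = pos - 1
    exons = exons_by_chrom.get(normalize_chrom(chrom), [])
    return any(start <= zero_based < end for start, end in exons)


if __name__ == "__main__":
    # Made-up refGene record with two exons: [100, 200) and [300, 500).
    example = "585\tNM_000001\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,\t0\tGENE1\tcmpl\tcmpl\t0,1,"
    chrom, exons = parse_refgene_exons(example)
    exons_by_chrom = {chrom: exons}
    print(is_exonic("1", 101, exons_by_chrom))
    print(is_exonic("1", 201, exons_by_chrom))
```

A linear scan is enough for a single gene; a real annotation over a whole genome would more likely sort the intervals or use an interval tree.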
| #extracts flanking sequence around a variant | | #extracts flanking sequence around a variant |
Line 47: |
Line 58: |
| #computes a complexity measure for flanking sequences around a variant | | #computes a complexity measure for flanking sequences around a variant |
| #adds the info tag | | #adds the info tag |
− | #FLANKS - 5' sequence, reference allele, 3' sequence up to length n defined by option -l, default is 25 | + | #CPXY - complexity measure for flanks of length l defined by option -l, default is 25 |
− | #CPXY - complexity measure | + | arf -a complexity 1000g.vcf -g genome.fa -l 30 |
− | arf -a c 1000g.vcf -g genome.fa | + | |
| + | == In development/Pending update == |
| + | |
| + | #annotates variants |
| + | ##INFO=<ID=VTYPE,Number=1,Type=String,Description="Annotates variant by types SNP, MNP, INDEL, SV, CR"> |
| + | #-l option defines the length threshold used to differentiate INDELs and SVs |
| + | arf -a vartype 1000g.vcf -l 30 |
| + | |
| + | #computes [[Genotype Likelihood Based Allele Balance]] |
| + | ##INFO=<ID=AB,Number=1,Type=Float,Description="Allele Balance computed from genotype likelihoods"> |
| + | #requires PL/GL and DP in the genotype fields |
| + | arf -a ab 1000g.vcf |
| + | |
| + | #-e option |
| + | #when used in conjunction with an analysis that requires allele or genotype frequency estimates, |
| + | #will attempt to find estimates in the AF, GF and HWEAF fields |
| + | arf -a ab 1000g.vcf -e |
| | | |
| == Command Line Options == | | == Command Line Options == |
| | | |
− | vcf-file VCF file (can be gzipped or bgzipped) | + | vcf-file VCF file (can be gzipped or bgzipped) |
| + | h help page |
| g genome-file (fasta file) | | g genome-file (fasta file) |
| (note that if genome.fa is specified, the actual file looked | | (note that if genome.fa is specified, the actual file looked |
| for is genome-bs.umfa; if the memory-mapped file is not | | for is genome-bs.umfa; if the memory-mapped file is not |
| found, it will be automatically generated from the fasta file) | | found, it will be automatically generated from the fasta file) |
| + | l length of flanking sequence (default is 25) |
| a analysis/annotation | | a analysis/annotation |
− | o output file name | + | o output file name (default is arf.vcf) |
| + | f annotation file name (can be gzipped) |
| | | |
| == Output == | | == Output == |
Line 71: |
Line 101: |
| | | |
| arf works on VCF files and generates additional info tags in an output VCF file. | | arf works on VCF files and generates additional info tags in an output VCF file. |
− | Deals with hard calls as well as genotype likelihoods.
| |
| | | |
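Since the added annotations end up in the INFO column of the output VCF, a short Python sketch of reading them back may be useful. The helper names and the example record are hypothetical; the tag names HWP, F and EXON are the ones listed on this page, and the values shown are made up.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict.

    Key=value entries map to their string value; bare flags map to True.
    """
    tags = {}
    if info_field in ("", "."):
        return tags
    for item in info_field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        tags[key] = value if sep else True
    return tags


def read_info_tags(vcf_line):
    """Return the INFO tags of one tab-separated VCF data line.

    INFO is the 8th column of a VCF record.
    """
    columns = vcf_line.rstrip("\n").split("\t")
    return parse_info(columns[7])


if __name__ == "__main__":
    # Made-up record for illustration only.
    record = "1\t1000\t.\tA\tG\t.\tPASS\tHWP=0.5;F=0.01;EXON"
    tags = read_info_tags(record)
    print(tags.get("HWP"), tags.get("F"), tags.get("EXON", False))
```

Values are kept as strings so that the caller decides how to convert them, for example with `float(tags["HWP"])`.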
| == Download == | | == Download == |
| | | |
− | For arf 0.557215, we provide binaries for linux machines. | + | For arf 0.557215, we provide binaries for Linux machines: [http://www-personal.umich.edu/~atks/arf arf 0.557215]. |
| + | |
| + | You will also need a copy of the human genome assembly FASTA file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa]. Please gunzip it before use. arf will generate a memory-mapped file named human.g1k.v37-bs.umfa from the FASTA file. |
| + | |
| + | You will also need a copy of the UCSC refGene text file: [http://www-personal.umich.edu/~atks/refGene.txt.gz refGene.txt]. |
| | | |
− | You will also need a copy of human genome assembly fasta file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa]. Please gunzip it before usage. arf will generate a memory mapped file from the fasta file named human.g1k.v37-bs.umfa.
| |
| | | |
| This page is maintained by [mailto:atks@umich.edu Adrian]. | | This page is maintained by [mailto:atks@umich.edu Adrian]. |