Line 6: |
Line 6: |
| arf [options] <vcf-file> | | arf [options] <vcf-file> |
| | | |
− | Here is an example of how <code>arf</code> works: | + | Here are examples of how <code>arf</code> works: |
| + | |
| + | #-c option directs the output to STDOUT |
| + | arf -a complexity 1000g.vcf -g genome.fa -l 30 -c |
| + | |
| + | #-o option specifies an output file name |
| + | arf -a complexity 1000g.vcf -g genome.fa -l 30 -o paltum.vcf |
| + | |
| + | #input VCF file can be gzipped |
| + | arf -a complexity 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf |
| + | |
| + | #multiple analyses/annotations can be run at once |
| + | arf -a complexity,f,hwe,exons 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf -f refGene.txt.gz |
| | | |
| #estimates allele and genotype frequencies from genotype likelihoods. | | #estimates allele and genotype frequencies from genotype likelihoods. |
Line 35: |
Line 47: |
| #EXON - flag | | #EXON - flag |
| arf -a exons 1000g.vcf -f refGene.txt | | arf -a exons 1000g.vcf -f refGene.txt |
| + | |
| + | #reference file can be gzipped |
| + | arf -a exons 1000g.vcf -f refGene.txt.gz |
| | | |
| #extracts flanking sequence around a variant | | #extracts flanking sequence around a variant |
Line 45: |
Line 60: |
| #CPXY - complexity measure for flanks of length l defined by option -l, default is 25 | | #CPXY - complexity measure for flanks of length l defined by option -l, default is 25 |
| arf -a complexity 1000g.vcf -g genome.fa -l 30 | | arf -a complexity 1000g.vcf -g genome.fa -l 30 |
| + | |
| + | == In development/Pending update == |
| + | |
| + | #annotates variants |
| + | ##INFO=<ID=VTYPE,Number=1,Type=string,Description="Annotates variant by types SNP, MNP, INDEL, SV, CR"> |
| + | #-l option defines the length threshold used to differentiate INDELs from SVs |
| + | arf -a vartype 1000g.vcf -l 30 |
| + | |
| + | #computes [[Genotype Likelihood Based Allele Balance]] |
| + | ##INFO=<ID=AB,Number=1,Type=float,Description="Allele Balance computed from genotype likelihoods"> |
| + | #requires PL/GL and DP in the genotype fields |
| + | arf -a ab 1000g.vcf |
| + | |
| + | #-e option |
| + | #when used with an analysis that requires allele or genotype frequency estimates, |
| + | #arf will attempt to find those estimates in the AF, GF and HWEAF fields |
| + | arf -a ab 1000g.vcf -e |
| | | |
| == Command Line Options == | | == Command Line Options == |
| | | |
− | vcf-file VCF file (can be gzipped or bgzipped) | + | vcf-file VCF file (can be gzipped or bgzipped) |
| + | h help page |
| g genome-file (fasta file) | | g genome-file (fasta file) |
| (note that if genome.fa is specified, the actual file looked | | (note that if genome.fa is specified, the actual file looked |
Line 71: |
Line 104: |
| == Download == | | == Download == |
| | | |
− | For arf 0.557215, we provide binaries for Linux machines [http://www-personal.umich.edu/~atks/arf arf]. | + | For arf 0.557215, we provide binaries for Linux machines [http://www-personal.umich.edu/~atks/arf arf 0.557215]. |
| | | |
| You will also need a copy of the human genome assembly FASTA file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa]. Please gunzip it before use. arf will generate a memory-mapped file from the FASTA file, named human.g1k.v37-bs.umfa. | | You will also need a copy of the human genome assembly FASTA file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa]. Please gunzip it before use. arf will generate a memory-mapped file from the FASTA file, named human.g1k.v37-bs.umfa. |
| | | |
− | You will also need a copy of the UCSC refGene text file: [http://www-personal.umich.edu/~atks/refGene.txt.gz refGene.txt.gz]. | + | You will also need a copy of the UCSC refGene text file: [http://www-personal.umich.edu/~atks/refGene.txt.gz refGene.txt]. |
| | | |
| | | |
| This page is maintained by [mailto:atks@umich.edu Adrian]. | | This page is maintained by [mailto:atks@umich.edu Adrian]. |
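
The download notes above ask for the genome FASTA file to be gunzipped before arf is run. A minimal sketch of that preparation step follows. The helper name <code>prepare_genome</code> is hypothetical and not part of arf; it assumes the downloaded file ends in <code>.gz</code> and that the uncompressed copy should sit next to it. The arf invocation in the trailing comment reuses options shown in the examples on this page.

```shell
#!/bin/sh
# Sketch only: prepare_genome is an illustrative helper, not an arf feature.

# Decompress a gzipped FASTA file next to the original, unless the
# uncompressed copy already exists. Prints the path of the uncompressed file.
prepare_genome() {
    gz="$1"
    fa="${gz%.gz}"                  # strip the trailing .gz suffix
    if [ ! -f "$fa" ] && [ -f "$gz" ]; then
        gunzip -c "$gz" > "$fa"     # -c keeps the .gz file in place
    fi
    printf '%s\n' "$fa"
}

# Example use (commented out because it needs the arf binary and input files):
#   fa=$(prepare_genome human.g1k.v37.fa.gz)
#   arf -a complexity 1000g.vcf -g "$fa" -l 30 -o paltum.vcf
```

Keeping the compressed copy means the download does not have to be repeated if the uncompressed file is later removed.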