Changes

From Genome Analysis Wiki
1,507 bytes added, 18:38, 2 February 2012
Line 6: Line 6:  
   arf [options] <vcf-file>
   −
Here is an example of how <code>arf</code> works:
+
Here are examples of how <code>arf</code> works:
 +
 
 +
  #-c option directs the output to STDOUT
 +
  arf -a complexity 1000g.vcf -g genome.fa -l 30 -c
 +
 
 +
  #-o option specifies an output file name
 +
  arf -a complexity 1000g.vcf -g genome.fa -l 30 -o paltum.vcf
 +
 
 +
  #input VCF file can be gzipped
 +
  arf -a complexity 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf
 +
 
 +
  #multiple analyses/annotations can be run at once
 +
  arf -a complexity,f,hwe,exons 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf -f refGene.txt.gz
    
   #estimates allele and genotype frequencies from genotype likelihoods.
Line 34: Line 46:  
   #adds the info tag
   #EXON - flag
   arf -a exon 1000g.vcf -f refGene.txt
+
   arf -a exons 1000g.vcf -f refGene.txt
 +
 
 +
  #reference file can be gzipped
 +
  arf -a exons 1000g.vcf -f refGene.txt.gz
    
   #extracts the flanking sequence around a variant
Line 45: Line 60:  
   #CPXY - complexity measure for flanks of length l defined by option -l, default is 25
   arf -a complexity 1000g.vcf -g genome.fa -l 30
 +
 +
== In development/Pending update ==
 +
 +
  #annotates variants
 +
  ##INFO=<ID=VTYPE,Number=1,Type=String,Description="Annotates variant by types SNP, MNP, INDEL, SV, CR">
 +
  #-l option defines the length threshold used to differentiate INDELs from SVs
 +
  arf -a vartype 1000g.vcf -l 30
 +
 +
  #computes [[Genotype Likelihood Based Allele Balance]]
 +
  ##INFO=<ID=AB,Number=1,Type=Float,Description="Allele Balance computed from genotype likelihoods">
 +
  #requires PL/GL and DP in the genotype fields
 +
  arf -a ab 1000g.vcf
 +
 +
  #-e option
 +
  #when used in conjunction with an analysis that requires allele or genotype frequency estimates,
 +
  #will attempt to find estimates in the AF, GF and HWEAF fields
 +
  arf -a ab 1000g.vcf -e
    
== Command Line Options ==
   −
     vcf-file     VCF file (can be gzipped or bgzipped)
+
     vcf-file       VCF file (can be gzipped or bgzipped)
 +
    h              help page
 
     g              genome-file (fasta file)
                    (note that if genome.fa is specified, the actual file looked
                     for is genome-bs.umfa; if the memory-mapped file is not
                     found, it will be automatically generated from the fasta file)
     l              length of flanking sequence      
+
     l              length of flanking sequence (default is 25)
 
     a              analysis/annotation
     o              output file name  
+
     o              output file name (default is arf.vcf)
 
     f              annotation file name (can be gzipped)
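The note on the g option implies a naming rule for the memory-mapped genome file. The sketch below assumes the rule is simply "strip the .fa extension and append -bs.umfa", which matches the two cases shown on this page (genome.fa and human.g1k.v37.fa). The helper name <code>umfa_name</code> is hypothetical and not part of arf. Other extensions such as .fasta are not covered here.

```shell
# Hypothetical helper (not part of arf): predict the memory-mapped file name
# that arf looks for, given the FASTA path passed to -g.
# Assumption: the rule is "strip a trailing .fa, append -bs.umfa".
umfa_name() {
    fasta="$1"
    # ${fasta%.fa} removes a trailing ".fa" if present (POSIX parameter expansion).
    printf '%s-bs.umfa\n' "${fasta%.fa}"
}

umfa_name genome.fa            # prints genome-bs.umfa
umfa_name human.g1k.v37.fa     # prints human.g1k.v37-bs.umfa
```

This can be handy for checking whether the memory-mapped file already exists, or whether arf will need to generate it on the first run.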
   Line 71: Line 104:  
== Download ==
   −
For arf 0.557215, we provide binaries for linux machines.
+
For arf 0.557215, we provide binaries for Linux machines: [http://www-personal.umich.edu/~atks/arf arf 0.557215].
 +
     
 +
You will also need a copy of the human genome assembly fasta file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa].  Please gunzip it before use.  arf will generate a memory-mapped file named human.g1k.v37-bs.umfa from the fasta file.
 +
 
 +
You will also need a copy of the UCSC refGene text file: [http://www-personal.umich.edu/~atks/refGene.txt.gz refGene.txt].
   −
You will also need a copy of human genome assembly fasta file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa].  Please gunzip it before usage.  arf will generate a memory mapped file from the fasta file named human.g1k.v37-bs.umfa.
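Because the genome fasta file must be gunzipped before arf can use it, a small pre-flight check can catch a file that is still compressed. The following is only a sketch: <code>check_genome</code> is a hypothetical helper, not part of arf, and it assumes that a compressed file can be recognised by its .gz suffix.

```shell
# Hypothetical pre-flight check (not part of arf): confirm the genome FASTA
# has been decompressed and is readable before passing it to arf -g.
# Assumption: a still-compressed file is identified by its .gz suffix.
check_genome() {
    genome="$1"
    case "$genome" in
        *.gz)
            echo "error: $genome is still gzipped; run: gunzip $genome" >&2
            return 1
            ;;
    esac
    if [ ! -r "$genome" ]; then
        echo "error: cannot read $genome" >&2
        return 1
    fi
    return 0
}

# Example use (commented out because it needs arf and the data files):
# check_genome human.g1k.v37.fa && arf -a complexity 1000g.vcf -g human.g1k.v37.fa -l 30
```

The function only returns a status and prints a message on failure, so it can be chained in front of any of the arf commands shown on this page.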
      
This page is maintained by [mailto:atks@umich.edu Adrian].