Changes

From Genome Analysis Wiki
1,743 bytes added, 18:38, 2 February 2012
   arf [options] <vcf-file>

Here are examples of how <code>arf</code> works:

  #-c option directs the output to STDOUT
  arf -a complexity 1000g.vcf -g genome.fa -l 30 -c

  #-o option specifies an output file name
  arf -a complexity 1000g.vcf -g genome.fa -l 30 -o paltum.vcf

  #input VCF file can be gzipped
  arf -a complexity 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf
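arf accepts plain, gzipped or bgzipped VCF input. The Python sketch below shows one way a reader can accept both forms transparently. It is an illustration, not arf's own code, and the helper names are hypothetical. It checks the two-byte gzip magic number (0x1f 0x8b). BGZF files are valid multi-member gzip files, so Python's standard gzip module can decompress them as well.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip (and therefore BGZF) file


def open_vcf(path):
    """Open a plain or gzipped/bgzipped VCF file as a text stream."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    return gzip.open(path, "rt") if magic == GZIP_MAGIC else open(path, "r")


def read_vcf_text(raw: bytes) -> str:
    """In-memory variant: decode VCF bytes, decompressing them first if gzipped."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw).decode()
    return raw.decode()


def data_lines(text: str):
    """Return the VCF records, i.e. non-empty lines that are not header lines."""
    return [line for line in text.splitlines() if line and not line.startswith("#")]
```

For large files, stream the records with `with open_vcf("1000g.vcf.gz") as fh:` instead of reading the whole file into memory.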

  #multiple analyses/annotations can be run at once
  arf -a complexity,f,hwe,exons 1000g.vcf.gz -g genome.fa -l 30 -o paltum.vcf -f refGene.txt.gz

  #estimates allele and genotype frequencies from genotype likelihoods
  #AF - Allele frequency estimates of alternate alleles (EM)
  #HWEAF - Allele frequency estimates of alternate alleles under the assumption of HWE (EM)
  #GF - Genotype frequency estimates (EM)
  arf -a freq 1000g.vcf
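The frequency tags are expectation-maximization (EM) estimates computed from genotype likelihoods. Below is a minimal Python sketch for a biallelic site with genotypes 0/0, 0/1 and 1/1. It assumes per-sample likelihoods are already in linear scale. It does not handle multiallelic sites and is not arf's actual implementation. Two further assumptions apply:

* GF is estimated without constraints.
* HWEAF uses HWE genotype priors.

AF is taken here as the allele frequency implied by GF, which is also an assumption. The page only states that AF and HWEAF differ in whether HWE is assumed.

```python
def posteriors(lik, prior):
    """Per-sample genotype posteriors from linear-scale likelihoods and genotype priors."""
    post = []
    for l in lik:
        w = [p * x for p, x in zip(prior, l)]
        s = sum(w) or 1.0  # guard: an all-zero sample yields zero posteriors, not a crash
        post.append([x / s for x in w])
    return post


def em_genotype_freqs(lik, iters=200, tol=1e-10):
    """GF-style estimate: genotype frequencies (0/0, 0/1, 1/1) without assuming HWE."""
    g = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(iters):
        post = posteriors(lik, g)                                       # E-step
        new = [sum(w[k] for w in post) / len(post) for k in range(3)]   # M-step
        done = max(abs(a - b) for a, b in zip(new, g)) < tol
        g = new
        if done:
            break
    return g


def em_hwe_allele_freq(lik, iters=200, tol=1e-10):
    """HWEAF-style estimate: alternate allele frequency assuming HWE genotype priors."""
    p = 0.5
    for _ in range(iters):
        prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
        post = posteriors(lik, prior)                                     # E-step
        new = sum(w[1] + 2 * w[2] for w in post) / (2 * len(post))        # M-step
        done = abs(new - p) < tol
        p = new
        if done:
            break
    return p


def allele_freq_from_gf(g):
    """AF-style estimate derived from genotype frequencies (an assumption, see above)."""
    return g[1] / 2 + g[2]
```

With unambiguous likelihoods, e.g. `[[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]`, both routes give an alternate allele frequency of 3/8.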
    
  #conducts a likelihood ratio test (LRT) for HWE from genotype likelihoods (multiallelic)
  #adds the info tags
  #HWP - HWE P-value
  #HWCHISQ - HWE chi-square value
  #HWDOF - Degrees of freedom for test
  #also generates the frequency tags (AF, HWEAF, GF)
  arf -a hwe 1000g.vcf
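The HWP, HWCHISQ and HWDOF tags come from a likelihood ratio test. Below is a minimal Python sketch for a biallelic site. It assumes the unconstrained genotype frequencies (GF) and the HWE allele frequency (HWEAF) have already been estimated, and that likelihoods are in linear scale.

* The statistic is twice the log-likelihood difference between the unconstrained and HWE-constrained models.
* With k alleles there are k(k-1)/2 degrees of freedom, which is 1 for a biallelic site.
* The 1-df chi-square tail probability is erfc(sqrt(x/2)).

arf's multiallelic test may differ in detail.

```python
import math


def log_lik(lik, prior):
    """Site log-likelihood: sum over samples of log(sum_g prior[g] * L(data | g))."""
    return sum(math.log(sum(p * x for p, x in zip(prior, l))) for l in lik)


def hwe_lrt(lik, gf, p):
    """LRT for HWE at a biallelic site.

    gf: unconstrained genotype frequency estimates (0/0, 0/1, 1/1)
    p:  alternate allele frequency estimated under HWE
    Returns (chi-square statistic, degrees of freedom, p-value).
    """
    hwe_prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    stat = max(0.0, 2.0 * (log_lik(lik, gf) - log_lik(lik, hwe_prior)))
    dof = 1  # k(k-1)/2 with k = 2 alleles
    pval = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, dof, pval
```

Ten samples that are confidently heterozygous give a statistic of 20·ln 2 ≈ 13.9 and a p-value of about 2e-4, i.e. strong evidence against HWE.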

  #estimates the inbreeding coefficient F from genotype likelihoods
  #adds the info tag
  #F - Inbreeding coefficient
  arf -a f 1000g.vcf
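The page does not say exactly how arf estimates F. One common definition, used here as an assumption, compares observed heterozygosity with the heterozygosity expected under HWE:

F = 1 - H_obs / H_exp, where H_exp = 2p(1-p).

Below is a minimal Python sketch for a biallelic site. It takes genotype frequencies such as the EM estimates (GF) as input. Returning None for monomorphic sites is a choice of this sketch only.

```python
def inbreeding_coefficient(gf):
    """Estimate F from genotype frequencies (0/0, 0/1, 1/1) of a biallelic site.

    F = 1 - observed heterozygosity / expected heterozygosity under HWE.
    Returns None when the site is monomorphic (expected heterozygosity is 0).
    """
    p = gf[1] / 2.0 + gf[2]  # alternate allele frequency implied by gf
    expected_het = 2.0 * p * (1.0 - p)
    if expected_het == 0.0:
        return None
    return 1.0 - gf[1] / expected_het
```

F is 0 under HWE, positive when heterozygotes are depleted (e.g. inbreeding or genotyping bias), and negative when they are in excess.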

  #you can also run both analyses at the same time
  #performs both the HWE test and the F estimation
  arf -a hwe,f 1000g.vcf

  #annotates exonic regions
  #adds the info tag
  #EXON - flag
  arf -a exons 1000g.vcf -f refGene.txt
  #the reference gene file can be gzipped
  arf -a exons 1000g.vcf -f refGene.txt.gz
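The EXON flag marks variants that fall inside an exon listed in the refGene file. Below is a minimal Python sketch; it is not arf's implementation, and it rests on these assumptions:

* The file follows the usual UCSC refGene.txt layout: a leading bin column, then name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, and so on.
* Exon coordinates are 0-based and half-open.
* Only the variant's POS base is tested.
* The chr prefix is stripped so that UCSC names match 1000 Genomes names such as 1.

A real tool would use sorted intervals or an interval tree rather than this linear scan.

```python
import gzip


def open_text(path):
    """Open refGene.txt or refGene.txt.gz as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def norm_chrom(chrom):
    """Drop a leading 'chr' so UCSC names (chr1) match b37 names (1)."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def load_exons(lines):
    """Map chromosome -> list of (start, end) exons, 0-based half-open."""
    exons = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        starts = [int(x) for x in f[9].split(",") if x]  # exonStarts, trailing comma
        ends = [int(x) for x in f[10].split(",") if x]   # exonEnds, trailing comma
        exons.setdefault(norm_chrom(f[2]), []).extend(zip(starts, ends))
    return exons


def in_exon(exons, chrom, pos):
    """True if the 1-based VCF position pos lies inside any exon."""
    p0 = pos - 1  # convert VCF POS to 0-based
    return any(s <= p0 < e for s, e in exons.get(norm_chrom(chrom), []))
```

Usage: `with open_text("refGene.txt.gz") as fh: exons = load_exons(fh)`, then `in_exon(exons, chrom, pos)` for each record.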

  #extracts the flanking sequence around a variant
  #adds the info tag
  #FLANKS - 5' sequence, reference allele, 3' sequence, with flanks up to length l defined by option -l (default is 25)
  arf -a flanks 1000g.vcf -g genome.fa -l 30
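Below is a minimal Python sketch of the FLANKS extraction, under two assumptions:

* The chromosome sequence is available as a plain Python string. arf itself reads it from the memory-mapped genome file.
* POS is the 1-based VCF position of the first REF base.

Flanks are truncated at the chromosome ends, which is why the tag holds sequence "up to" length l. The function name is hypothetical.

```python
def flanks(seq, pos, ref, length=25):
    """Return (5' flank, reference allele, 3' flank) for a variant.

    seq:    chromosome sequence as a string (0-based indexing)
    pos:    1-based VCF POS of the first reference base
    length: maximum flank length (the -l value)
    """
    start = pos - 1
    end = start + len(ref)
    if seq[start:end].upper() != ref.upper():
        raise ValueError("REF allele does not match the genome sequence")
    left = seq[max(0, start - length):start]  # truncated at the chromosome start
    right = seq[end:end + length]             # slicing truncates at the chromosome end
    return left, seq[start:end], right
```

Checking REF against the genome catches VCF/reference build mismatches before any flanks are reported.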
    
  #computes a complexity measure for the flanking sequences around a variant
  #adds the info tag
  #CPXY - complexity measure for flanks of length l defined by option -l (default is 25)
  arf -a complexity 1000g.vcf -g genome.fa -l 30
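The page does not define the CPXY measure. As an illustrative stand-in, and not arf's formula, the Python sketch below scores each flank by the Shannon entropy of its k-mer distribution. Low values flag repetitive, low-complexity sequence. The sketch then reports the lower of the two flank scores. Both function names are hypothetical.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=2):
    """Shannon entropy (bits) of the k-mer distribution of seq; 0 for a single repeated k-mer."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    n = len(kmers)
    return -sum(c / n * math.log2(c / n) for c in Counter(kmers).values())


def flank_complexity(left, right, k=2):
    """Hypothetical CPXY-like score: the lower k-mer entropy of the two flanks."""
    return min(kmer_entropy(left, k), kmer_entropy(right, k))
```

Taking the minimum means a variant next to a homopolymer run on either side is flagged, even if the other flank is diverse.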

== In development/Pending update ==
  #annotates variants by type
  ##INFO=<ID=VTYPE,Number=1,Type=string,Description="Annotates variant by types SNP, MNP, INDEL, SV, CR">
  #-l option defines the length threshold used to distinguish INDELs from SVs
  arf -a vartype 1000g.vcf -l 30
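Below is a minimal Python sketch of the VTYPE classification for a single REF/ALT pair. Since the feature is still pending, these rules are assumptions:

* A length change larger than the -l value is called an SV.
* Symbolic alleles such as <DEL> are also SVs.
* Multiallelic ALT fields must be split first.
* The CR type is not modeled, because the page does not define it.

```python
def variant_type(ref, alt, max_indel_len=30):
    """Classify one REF/ALT pair as SNP, MNP, INDEL or SV.

    max_indel_len plays the role of the -l value: length changes longer than
    this are called SV (an assumption). CR is not modeled here.
    """
    if alt.startswith("<"):          # symbolic allele, e.g. <DEL>, <DUP>
        return "SV"
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INDEL" if abs(len(ref) - len(alt)) <= max_indel_len else "SV"
```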

  #computes [[Genotype Likelihood Based Allele Balance]]
  ##INFO=<ID=AB,Number=1,Type=float,Description="Allele Balance computed from genotype likelihoods">
  #requires PL/GL and DP in the genotype fields
  arf -a ab 1000g.vcf
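Genotype-likelihood based statistics such as AB start from per-sample likelihoods. In VCF, PL holds Phred-scaled likelihoods (-10 log10 L, rounded to integers) and GL holds log10-scaled likelihoods. The minimal Python sketch below recovers linear relative likelihoods from either field of one sample column. Preferring PL when both fields are present is an assumption. The AB formula itself is described on the linked page and is not reproduced here. DP can be read from the same field dictionary.

```python
def sample_fields(format_field, sample_field):
    """Map FORMAT keys to values for one sample column (trailing fields may be omitted)."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def pl_to_likelihoods(pl):
    """Phred-scaled PL such as '0,30,300' -> linear relative likelihoods."""
    return [10 ** (-int(x) / 10.0) for x in pl.split(",")]


def gl_to_likelihoods(gl):
    """log10-scaled GL such as '-0.1,-3,-30' -> linear likelihoods."""
    return [10 ** float(x) for x in gl.split(",")]


def sample_likelihoods(format_field, sample_field):
    """Linear likelihoods from PL (preferred) or GL; None if neither is usable."""
    values = sample_fields(format_field, sample_field)
    if values.get("PL", ".") != ".":
        return pl_to_likelihoods(values["PL"])
    if values.get("GL", ".") != ".":
        return gl_to_likelihoods(values["GL"])
    return None
```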

  #-e option
  #when used in conjunction with an analysis that requires allele or genotype frequency estimates,
  #attempts to find existing estimates in the AF, GF and HWEAF fields
  arf -a ab 1000g.vcf -e
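With -e, existing AF, GF and HWEAF values are looked up rather than re-estimated. Below is a minimal Python sketch of pulling those values out of a record's INFO column, which holds semicolon-separated key=value pairs with bare keys as flags. The function names are hypothetical. arf's handling of missing or malformed values may differ; here they are simply skipped.

```python
def parse_info(info):
    """Parse a VCF INFO column into a dict; flags (bare keys) map to True."""
    out = {}
    if info == ".":
        return out
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def existing_estimates(info):
    """Return float lists for whichever of AF, GF, HWEAF are already present."""
    fields = parse_info(info)
    return {
        k: [float(x) for x in fields[k].split(",")]
        for k in ("AF", "GF", "HWEAF")
        if k in fields and fields[k] not in (True, ".")
    }
```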

== Command Line Options ==

   vcf-file       VCF file (can be gzipped or bgzipped)
   h              help page
   g              genome file (FASTA file)
                  (note that if genome.fa is specified, the file actually looked
                   for is genome-bs.umfa; if this memory-mapped file is not
                   found, it is generated automatically from the FASTA file)
   l              length of flanking sequence (default is 25)
   a              analysis/annotation (several can be given, comma-separated)
   o              output file name (default is arf.vcf)
   c              direct output to STDOUT
   f              annotation file name (can be gzipped)
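The naming rule for the memory-mapped genome file can be sketched in Python as follows. Only the .fa case is documented on this page. Applying the same rule to other extensions, and the helper names, are assumptions.

```python
import os


def umfa_path(fasta_path):
    """Memory-mapped file name looked for by arf: genome.fa -> genome-bs.umfa (same directory)."""
    root, _ext = os.path.splitext(fasta_path)
    return root + "-bs.umfa"


def needs_generation(fasta_path):
    """True if the .umfa file is missing and would have to be built from the FASTA file."""
    return not os.path.exists(umfa_path(fasta_path))
```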
    
== Output ==

   An output file named arf.vcf is generated; a different name can be specified with the -o option.
   Log messages are written to arf.log.
    
== Description ==

   arf works on VCF files: it runs the requested analyses/annotations and writes the results as additional INFO tags in an output VCF file.

== Download ==
For arf 0.557215, we provide binaries for Linux machines: [http://www-personal.umich.edu/~atks/arf arf 0.557215].

You will also need a copy of the human genome assembly FASTA file: [http://www-personal.umich.edu/~atks/human.g1k.v37.fa.gz human.g1k.v37.fa]. Please gunzip it before use. From this FASTA file, arf will generate a memory-mapped file named human.g1k.v37-bs.umfa.

You will also need a copy of the UCSC refGene text file: [http://www-personal.umich.edu/~atks/refGene.txt.gz refGene.txt].
      
This page is maintained by  [mailto:atks@umich.edu Adrian].
 