FamLDCaller

From Genome Analysis Wiki
Revision as of 13:43, 4 December 2013 by Weich

Download here. Version.

Major updates.

1. More flexible loading functions for VCF files.

2.

3.

4.


The initial set of genotype calls is generated by examining one individual at a time. These calls are typically quite good for deep sequencing data, but much less accurate for low-pass sequencing data. In either case, they can be greatly improved by models that combine information across sites and individuals and that take into account the constraints imposed by parent-offspring trios.
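To illustrate the kind of constraint a parent-offspring trio imposes, the sketch below computes Mendelian transmission probabilities for a single biallelic SNP and uses them as a prior on the child's genotype. It is a minimal, hypothetical illustration that assumes the parental genotypes are known exactly; it is not TrioCaller's actual model, which works on haplotypes across many sites and individuals. The function names (`child_genotype_probs`, `is_mendelian_consistent`, `child_posterior`) are invented for this example.

```python
from itertools import product
from typing import Dict

# Genotypes are coded as alternate-allele counts: 0 = ref/ref, 1 = ref/alt, 2 = alt/alt.
# Probability that a parent with a given genotype transmits the alternate allele.
ALT_TRANSMIT = {0: 0.0, 1: 0.5, 2: 1.0}


def child_genotype_probs(father: int, mother: int) -> Dict[int, float]:
    """P(child genotype | parental genotypes) under Mendelian inheritance."""
    pf, pm = ALT_TRANSMIT[father], ALT_TRANSMIT[mother]
    probs = {0: 0.0, 1: 0.0, 2: 0.0}
    # Enumerate the allele transmitted by each parent (1 = alternate allele).
    for f_allele, m_allele in product((0, 1), repeat=2):
        p_f = pf if f_allele else 1.0 - pf
        p_m = pm if m_allele else 1.0 - pm
        probs[f_allele + m_allele] += p_f * p_m
    return probs


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child genotype can arise from the two parental genotypes."""
    return child_genotype_probs(father, mother)[child] > 0.0


def child_posterior(father: int, mother: int,
                    child_lik: Dict[int, float]) -> Dict[int, float]:
    """Combine the Mendelian prior with the child's per-genotype likelihoods P(reads | g)."""
    prior = child_genotype_probs(father, mother)
    unnorm = {g: prior[g] * child_lik[g] for g in (0, 1, 2)}
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("child likelihoods are incompatible with the parental genotypes")
    return {g: v / total for g, v in unnorm.items()}


if __name__ == "__main__":
    print(child_genotype_probs(1, 1))
    print(child_posterior(0, 2, {0: 0.6, 1: 0.4, 2: 0.0}))
```

For example, two heterozygous parents give probabilities 0.25, 0.5 and 0.25 for child genotypes 0, 1 and 2, while the child of a ref/ref father and an alt/alt mother is forced to be heterozygous, even if its own reads slightly favour ref/ref, as long as the heterozygous likelihood is non-zero.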

Note: The current version supports only SNP data, so please filter out indels before running TrioCaller. It supports the VCF 4.0 and 4.1 formats, with one exception: missing trailing fields must not be dropped. For a missing genotype, use the complete missing notation (e.g. ./.:.:.:.,.,.) rather than ./. in the genotype field.
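Because only SNPs are supported and missing genotypes must be written out in full, a VCF may need light preprocessing first. The Python sketch below reads VCF text on standard input and writes to standard output; it drops records whose REF or any ALT allele is longer than one base and expands bare `./.` sample fields into a full missing value built from the FORMAT column. It assumes biallelic sites, so `PL` and `GL` get three missing values and every other key gets a single `.`; with a hypothetical FORMAT of `GT:DP:GQ:PL` this reproduces the `./.:.:.:.,.,.` example in the note. The script and its function names are illustrative and not part of TrioCaller.

```python
import sys
from typing import Optional

# Assumption: biallelic sites, so genotype-level PL/GL fields hold three values.
MISSING_BY_KEY = {"GT": "./.", "PL": ".,.,.", "GL": ".,.,."}
BARE_MISSING = {".", "./.", ".|."}


def is_snp(ref: str, alt: str) -> bool:
    """True if REF and every ALT allele are single bases (i.e. no indels)."""
    return len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))


def expand_missing(format_str: str) -> str:
    """Build a complete missing-sample string for the given FORMAT column."""
    return ":".join(MISSING_BY_KEY.get(key, ".") for key in format_str.split(":"))


def clean_vcf_line(line: str) -> Optional[str]:
    """Return the (possibly rewritten) line, or None if the record is not a SNP."""
    if line.startswith("#"):
        return line  # keep meta-information and header lines unchanged
    fields = line.rstrip("\n").split("\t")
    if not is_snp(fields[3], fields[4]):
        return None
    if len(fields) > 9:
        full_missing = expand_missing(fields[8])
        # Sample columns start at index 9; replace bare missing genotypes.
        for i in range(9, len(fields)):
            if fields[i] in BARE_MISSING:
                fields[i] = full_missing
    return "\t".join(fields) + "\n"


def main() -> None:
    for line in sys.stdin:
        cleaned = clean_vcf_line(line)
        if cleaned is not None:
            sys.stdout.write(cleaned)


if __name__ == "__main__":
    main()
```

Saved as a hypothetical `clean_vcf.py`, it could be run as `python clean_vcf.py < result/chr20.mpileup.vcf > result/chr20.snps.vcf` before passing the output to `--vcf`.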

Here is a summary of the TrioCaller command line options (these are also listed whenever you run the program with no parameters):

Available Options
   Shotgun Sequences: --vcf [], --pedfile [] 
      Markov Sampler: --seed [], --burnin [], --rounds [] 
          Haplotyper: --states [], --errorRate [], --compact
             Phasing: --randomPhase, --inputPhased, --refPhased
        Output Files: --prefix [], --phase, --interimInterval []


Explanation of Options
                 --vcf:    Standard VCF file (4.0 and above).
             --pedfile:    Pedigree file in MERLIN format.
                --seed:    Seed for sampling. Default 123456.
              --burnin:    The number of rounds ignored at the beginning of sampling.
              --rounds:    The total number of iterations.
              --states:    The number of haplotypes used in the state space. The default is the maximum number: 2*(number of founders - 1).
           --errorRate:    The pre-defined base error rate. Default 0.01.
         --randomPhase:    The initial haplotypes are inferred from single markers. Default option.
         --inputPhased:    The initial haplotypes are taken directly from the input VCF file (with "|" as the separator in the genotype column).
           --refPhased:    The initial haplotypes are built from the reference alleles in the VCF file.
              --prefix:    The prefix of the output files.
     --interimInterval:    The number of rounds between interim outputs.
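As a worked example of the `--states` default, the sketch below counts the founders in a MERLIN-format pedigree (individuals whose father and mother columns are both `0`) and applies the formula 2*(number of founders - 1) given for `--states`. It reads only the first five columns (family, individual, father, mother, sex); `read_pedigree` and `default_states` are hypothetical helpers, not part of TrioCaller.

```python
from typing import Iterable, List, Tuple

PedRow = Tuple[str, str, str, str, str]  # (family, individual, father, mother, sex)


def read_pedigree(lines: Iterable[str]) -> List[PedRow]:
    """Parse the first five whitespace-separated columns of MERLIN pedigree lines."""
    rows: List[PedRow] = []
    for line in lines:
        cols = line.split()
        if len(cols) < 5:
            continue  # skip blank or short lines
        rows.append((cols[0], cols[1], cols[2], cols[3], cols[4]))
    return rows


def default_states(rows: List[PedRow]) -> int:
    """Default --states value as documented: 2 * (number of founders - 1)."""
    founders = sum(1 for _, _, father, mother, _ in rows
                   if father == "0" and mother == "0")
    return 2 * (founders - 1)


if __name__ == "__main__":
    ped = [
        "fam1 dad 0 0 1",
        "fam1 mom 0 0 2",
        "fam1 kid dad mom 1",
        "fam2 u1 0 0 2",
    ]
    print(default_states(read_pedigree(ped)))  # prints 4
```

With one trio (two founders) plus one unrelated individual (one more founder), the default is 2*(3 - 1) = 4. Because the default grows with the number of founders, large cohorts quickly make the state space expensive, which is why a fixed smaller value can be passed explicitly.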

Note: The pedigree file requires complete trio structures (all three members of each trio must be present in the file). For a parent-offspring pair, either create a "fake" parent in the pedigree file or code the two as unrelated individuals. The order of the samples in the pedigree file does NOT need to match their order in the .vcf file. Computation becomes intensive when the number of samples is large; you can use the option "--states" to reduce the cost (e.g. start with "--states 50").
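The "fake parent" workaround can be scripted. The sketch below takes pedigree rows as (family, individual, father, mother, sex) tuples and, for every non-founder, makes sure both parents appear in the file: a parent coded as `0` gets an invented placeholder ID, and any referenced parent without its own row is added as a founder (sex `1` for a father, `2` for a mother). The `_fakeFather`/`_fakeMother` naming scheme and the `complete_trios` helper are hypothetical, and the sketch assumes such placeholder individuals simply have no sample column in the VCF; check that assumption against your TrioCaller version before relying on it.

```python
from typing import List, Set, Tuple

PedRow = Tuple[str, str, str, str, str]  # (family, individual, father, mother, sex)


def complete_trios(rows: List[PedRow]) -> List[PedRow]:
    """Return rows where every non-founder has both parents listed in the pedigree."""
    present: Set[Tuple[str, str]] = {(fam, pid) for fam, pid, _, _, _ in rows}
    fixed: List[PedRow] = []
    placeholders: List[PedRow] = []
    for fam, pid, father, mother, sex in rows:
        if father == "0" and mother == "0":
            fixed.append((fam, pid, father, mother, sex))  # founder: nothing to do
            continue
        # Hypothetical naming scheme for a parent coded as missing ("0").
        if father == "0":
            father = pid + "_fakeFather"
        if mother == "0":
            mother = pid + "_fakeMother"
        # Add a founder row for any parent that is referenced but not listed.
        for parent, parent_sex in ((father, "1"), (mother, "2")):
            if (fam, parent) not in present:
                present.add((fam, parent))
                placeholders.append((fam, parent, "0", "0", parent_sex))
        fixed.append((fam, pid, father, mother, sex))
    return fixed + placeholders


if __name__ == "__main__":
    pair = [("fam1", "mom", "0", "0", "2"), ("fam1", "kid", "0", "mom", "1")]
    for row in complete_trios(pair):
        print(" ".join(row))
```

For the mother-child pair in the example, this prints the original mother row, the child row with `kid_fakeFather` as its father, and a new founder row `fam1 kid_fakeFather 0 0 1`.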

To complete our example analysis, we could run:

 bin/TrioCaller --vcf result/chr20.mpileup.vcf --pedfile ped/triocaller.ped --states 50 --rounds 10 --prefix result/chr20.triocaller