From Genome Analysis Wiki

A general guideline for genotype calling in families:

Polymutt: small to large pedigrees, modest to high sequencing depth.

FamLDCaller: many small pedigrees, low to modest sequencing depth.

FamLDCaller is an extension of TrioCaller that handles nuclear families and general pedigree structures.

Download: Binary file: FamLDCaller. [Last update: 08/15/2014]

More details will come soon. Please contact Wei Chen at weichen.mich@gmail.com with any questions.

Major updates.

1. Updated the algorithm to handle nuclear and multi-generational pedigrees.

2. Added support for using a reference panel.

3. More flexible loading of VCF files (non-SNP variants no longer need to be removed).

The initial set of genotype calls is generated by examining one individual at a time. These calls are typically quite good for deep sequencing data but much less accurate for low-pass sequencing data. In either case, they can be greatly improved by models that combine information across sites and individuals and that account for the constraints imposed by parent-offspring trios.
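To make the trio constraint concrete, here is a minimal sketch, assuming biallelic sites and a VCF PL field ordered 0/0, 0/1, 1/1 (the usual VCF 4.x order for a biallelic diploid site). It is not FamLDCaller's model, which also pools information across sites and individuals. It only shows how per-individual calls are made and how such calls can violate Mendelian inheritance within a trio. The function names are hypothetical.

```python
from typing import Sequence


def call_from_pl(pl: Sequence[int]) -> int:
    """Single-sample call: the alt-allele count (0, 1, 2) with the lowest phred-scaled PL."""
    return min(range(3), key=lambda g: pl[g])


def transmissible(genotype: int) -> set:
    """Alleles (0 = ref, 1 = alt) a parent with this alt-allele count can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def mendelian_consistent(child: int, father: int, mother: int) -> bool:
    """True if one allele from each parent can add up to the child's genotype."""
    return any(a + b == child
               for a in transmissible(father)
               for b in transmissible(mother))


if __name__ == "__main__":
    # Low-depth-style example: each call looks confident on its own...
    father = call_from_pl([0, 30, 60])    # 0/0
    mother = call_from_pl([40, 0, 50])    # 0/1
    child = call_from_pl([60, 40, 0])     # 1/1
    # ...but a 0/0 father cannot transmit an alt allele, so the trio is inconsistent.
    print(mendelian_consistent(child, father, mother))
```

A family-aware caller can use exactly this kind of inconsistency as evidence that one of the single-sample calls is wrong.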

Here is a summary of the FamLDCaller command line options (these are also listed whenever you run the program with no parameters):

Available Options
   Shotgun Sequences: --vcf [], --pedfile [] 
      Markov Sampler: --seed [], --burnin [], --rounds [] 
          Haplotyper: --states [],  --errorRate []
             Phasing: --randomPhase , --inputPhased, --refPhased
        Output Files: --prefix [], --phase,  --interimInterval []

Explanation of Options
                 --vcf:    Standard VCF file (version 4.0 and above).
             --pedfile:    Pedigree file in MERLIN format.
                --seed:    Seed for sampling, default 123456.
              --burnin:    The number of initial rounds discarded at the beginning of sampling.
              --rounds:    The total number of iterations.
              --states:    The number of haplotypes used in the state space. The default is the maximum number: 2*(number of founders - 1).
           --errorRate:    The pre-defined base error rate. Default 0.01.
         --randomPhase:    The initial haplotypes are inferred from single-marker information. Default option.
         --inputPhased:    The initial haplotypes are taken directly from the input VCF file (with "|" as the separator in the genotype column).
           --refPhased:    The initial haplotypes are built from the reference alleles in the VCF file.
              --prefix:    The prefix of the output files.
     --interimInterval:    The number of rounds between interim outputs.
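The default for --states depends on the pedigree. The following minimal sketch estimates it, assuming a whitespace-delimited MERLIN pedigree file whose first five columns are family ID, individual ID, father ID, mother ID and sex, with `0` for a missing parent. It counts founders (both parent IDs `0`) and applies the default formula exactly as stated above. The cap of 50 follows the suggestion in the note below. The helper names are hypothetical and not part of FamLDCaller.

```python
from typing import Iterable


def count_founders(ped_lines: Iterable[str]) -> int:
    """Count individuals whose father and mother IDs are both '0' (founders)."""
    founders = 0
    for line in ped_lines:
        fields = line.split()
        if len(fields) < 5:  # skip blank or malformed lines
            continue
        father, mother = fields[2], fields[3]
        if father == "0" and mother == "0":
            founders += 1
    return founders


def default_states(n_founders: int) -> int:
    """Documented default/maximum: 2*(number of founders - 1), floored at 0."""
    return max(0, 2 * (n_founders - 1))


def suggested_states(ped_lines: Iterable[str], cap: int = 50) -> int:
    """Hypothetical helper: the documented default, capped to limit computation."""
    return min(default_states(count_founders(ped_lines)), cap)


if __name__ == "__main__":
    ped = [
        "FAM1 dad 0 0 1",
        "FAM1 mom 0 0 2",
        "FAM1 kid dad mom 1",
        "FAM2 pa 0 0 1",
        "FAM2 ma 0 0 2",
        "FAM2 girl pa ma 2",
    ]
    print(count_founders(ped), default_states(count_founders(ped)), suggested_states(ped))
```

Any placeholder ("fake") parents added to complete the pedigree are founders too, so they increase this count.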

Note: The pedigree file requires complete family structures: both parents of every non-founder must be present in the pedigree file. For a parent-offspring pair, for example, either create a "fake" parent in the pedigree file or code the two as unrelated individuals. The order of individuals in the pedigree file does NOT need to match the order in the .vcf file. Computation becomes intensive when the number of samples is large; use the "--states" option to reduce the cost (e.g. start with "--states 50").
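Adding the "fake" parents by hand is error-prone in larger files. The sketch below shows one way to do it automatically. It assumes MERLIN-style records (family ID, individual ID, father ID, mother ID, sex with 1 = male and 2 = female, and `0` for a missing parent). The placeholder naming scheme `<known parent>_fakeMate` is a hypothetical convention chosen so that siblings who share a recorded parent also share the placeholder and remain full siblings. It also assumes these IDs do not clash with real ones. Placeholders are written with only the five standard columns; whether extra columns are needed for FamLDCaller is not stated here.

```python
from typing import List

FATHER, MOTHER = 2, 3      # column indices in a MERLIN pedigree record
MALE, FEMALE = "1", "2"    # MERLIN sex codes


def complete_pedigree(records: List[List[str]]) -> List[List[str]]:
    """Return records plus placeholder parents so every non-founder has both parents."""
    present = {(r[0], r[1]) for r in records}
    completed, placeholders = [], []
    for rec in records:
        rec = list(rec)  # do not modify the caller's data
        fam, father, mother = rec[0], rec[FATHER], rec[MOTHER]
        if father == "0" and mother == "0":
            completed.append(rec)  # founder: nothing to complete
            continue
        # Hypothetical naming: siblings sharing one recorded parent share the placeholder.
        if father == "0":
            rec[FATHER] = f"{mother}_fakeMate"
        if mother == "0":
            rec[MOTHER] = f"{father}_fakeMate"
        # Add a founder record for any parent ID that has no row of its own.
        for col, sex in ((FATHER, MALE), (MOTHER, FEMALE)):
            if (fam, rec[col]) not in present:
                placeholders.append([fam, rec[col], "0", "0", sex])
                present.add((fam, rec[col]))
        completed.append(rec)
    return completed + placeholders


if __name__ == "__main__":
    pair = [["F1", "dad", "0", "0", "1"], ["F1", "kid", "dad", "0", "2"]]
    for row in complete_pedigree(pair):
        print(" ".join(row))
```

The alternative mentioned above, coding a parent-offspring pair as unrelated individuals, amounts to setting both parent IDs of the child to `0` instead.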

To complete our example analysis, we could run:

 FamLDCaller --vcf test.vcf --pedfile test.ped --states 50 --rounds 10 --prefix test.famldcaller
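When FamLDCaller is run from a larger pipeline, the command above can be built programmatically. The sketch below uses only options documented on this page. It assumes the binary is called `FamLDCaller` and is on the PATH. It also assumes that burn-in rounds are counted within --rounds ("the total number of iterations"), so it rejects a --burnin that is not smaller than --rounds. The function names are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional


def famldcaller_args(vcf: str, pedfile: str, prefix: str,
                     states: Optional[int] = None,
                     burnin: Optional[int] = None,
                     rounds: Optional[int] = None,
                     seed: Optional[int] = None,
                     binary: str = "FamLDCaller") -> List[str]:
    """Build an argument list from the options documented on this page."""
    if burnin is not None and rounds is not None and burnin >= rounds:
        # Assumption: burn-in is part of --rounds, so some rounds must remain after it.
        raise ValueError("burnin must be smaller than rounds")
    args = [binary, "--vcf", vcf, "--pedfile", pedfile]
    for flag, value in (("--states", states), ("--burnin", burnin),
                        ("--rounds", rounds), ("--seed", seed)):
        if value is not None:
            args += [flag, str(value)]
    args += ["--prefix", prefix]
    return args


def run_famldcaller(args: List[str]) -> None:
    """Run the binary and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    example = famldcaller_args("test.vcf", "test.ped", "test.famldcaller",
                               states=50, rounds=10)
    print(shlex.join(example))  # same command line as the example above
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names; `shlex.join` (Python 3.8+) is used only to display the equivalent command.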