Changes

From Genome Analysis Wiki
312 bytes added, 10:05, 18 August 2014
no edit summary
Line 1: Line 1:    −
Download here. Version.  
+
A general guideline for genotype calling in families:
 +
 
 +
Polymutt: small to large pedigrees, moderate to high depth.
 +
FamLDCaller: many small pedigrees, low to moderate depth.
 +
 
 +
FamLDCaller is an extension of [http://genome.sph.umich.edu/wiki/TrioCaller TrioCaller] that handles nuclear and general family structures.
 +
 
 +
'''Download''':
 +
[[Binary file:]]  [http://www.pitt.edu/~wec47/Files/FamLDCaller FamLDCaller].  [Last update: 08/15/2014]
 +
 
 +
More details will come soon. Please contact Wei Chen at weichen.mich@gmail.com with any questions.
 +
 
 +
 
    
Major updates.  
 
Major updates.  
   −
1. More flexible loading functions for VCF files.
+
1. Update the algorithm to allow nuclear and multi-generational pedigrees
 
  −
2.
     −
3.
+
2. Add a feature to use a reference panel
   −
4.
+
3. More flexible loading functions for VCF files (no need to remove non-SNP variants)
       
The initial set of genotype calls is generated by examining a single individual at a time. These calls are typically quite good for deep sequencing data, but much less accurate for low-pass sequence data. In either case, they can be greatly improved by models that combine information across sites and individuals and consider the constraints imposed by parent-offspring trios.

The initial set of genotype calls is generated by examining a single individual at a time. These calls are typically quite good for deep sequencing data, but much less accurate for low-pass sequence data. In either case, they can be greatly improved by models that combine information across sites and individuals and consider the constraints imposed by parent-offspring trios.
   −
Note: The current version only supports SNP data, so please '''filter indels''' before running TrioCaller. It supports VCF 4.0 and 4.1 formats with the '''exception of dropped missing trailing fields''' (e.g. use complete missing notation ./.:.:.:.,.,. rather than ./. for the genotype field)
+
Here is a summary of the FamLDCaller command line options (these are also listed whenever you run the program with no parameters):
 
  −
Here is a summary of the TrioCaller command line options (these are also listed whenever you run the program with no parameters):
      
<source lang="text">
 
<source lang="text">
Line 23: Line 31:  
   Shotgun Sequences: --vcf [], --pedfile []  
 
   Shotgun Sequences: --vcf [], --pedfile []  
 
       Markov Sampler: --seed [], --burnin [], --rounds []  
 
       Markov Sampler: --seed [], --burnin [], --rounds []  
           Haplotyper: --states [],  --errorRate [], --compact
+
           Haplotyper: --states [],  --errorRate []
 
             Phasing: --randomPhase , --inputPhased, --refPhased
 
             Phasing: --randomPhase , --inputPhased, --refPhased
 
         Output Files: --prefix [], --phase,  --interimInterval []
 
         Output Files: --prefix [], --phase,  --interimInterval []
Line 43: Line 51:  
</source>
 
</source>
   −
Note: The pedigree files require complete trio structures (all three members of the trio exist in the file). For parent-offspring pair, create a "fake" parent in the pedigree file or code them as unrelated individuals. The order of the names in the pedigree file is NOT necessary to be consistent with that in .vcf file. The computation will be intensive if the number of samples are large.  
+
Note: The pedigree file requires complete family structures: both parents must exist in the pedigree file. For a parent-offspring pair, for example, create a "fake" parent in the pedigree file or code the two as unrelated individuals. The order of the names in the pedigree file does NOT need to match the order in the .vcf file. The computation will be intensive if the number of samples is large.
 
You can use the "--states" option to reduce the computational cost (e.g. start with "--states 50").

You can use the "--states" option to reduce the computational cost (e.g. start with "--states 50").
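
The complete-family requirement in the note above can be checked before running the program. The following Python snippet is a minimal sketch, not part of FamLDCaller: it assumes the common five-column pedigree layout (family ID, individual ID, father ID, mother ID, sex) with "0" marking a missing parent and sex coded 1 for male and 2 for female. The function name `complete_pedigree` and the `_fake_partner` naming scheme are hypothetical and chosen only for illustration.

```python
MISSING = "0"  # assumed code for "no parent recorded"


def complete_pedigree(rows):
    """Return pedigree rows in which every non-founder has both parents listed.

    Each row is a tuple (family, individual, father, mother, sex).
    A child with exactly one recorded parent gets a placeholder ("fake")
    second parent, and any parent that has no row of its own is appended
    as a founder.
    """
    known = {(fam, ind) for fam, ind, *_ in rows}
    fixed, added = [], []
    for fam, ind, father, mother, sex in rows:
        # Name the placeholder after the known parent so that siblings
        # sharing that parent also share the same fake partner.
        if father != MISSING and mother == MISSING:
            mother = f"{father}_fake_partner"
        elif mother != MISSING and father == MISSING:
            father = f"{mother}_fake_partner"
        fixed.append((fam, ind, father, mother, sex))
        # Make sure each referenced parent has its own founder row.
        for parent_id, parent_sex in ((father, "1"), (mother, "2")):
            if parent_id != MISSING and (fam, parent_id) not in known:
                known.add((fam, parent_id))
                added.append((fam, parent_id, MISSING, MISSING, parent_sex))
    return fixed + added


# A parent-offspring pair: the child lists a father but no mother.
pair = [
    ("fam1", "dad", "0", "0", "1"),
    ("fam1", "kid", "dad", "0", "2"),
]
for row in complete_pedigree(pair):
    print("\t".join(row))
```

The alternative mentioned in the note, coding the pair as unrelated individuals, would instead set both parent columns of the child to the missing code.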
    
To complete our example analysis, we could run:
 
To complete our example analysis, we could run:
 
    
 
    
   bin/TrioCaller --vcf result/chr20.mpileup.vcf --pedfile ped/triocaller.ped --states 50 --rounds 10 --prefix result/chr20.triocaller
+
   FamLDCaller --vcf test.vcf --pedfile test.ped --states 50 --rounds 10 --prefix test.famldcaller
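
For scripted runs (for example, one call per chromosome), the same command can be assembled programmatically. The Python snippet below is a sketch under stated assumptions: it uses only the options shown in the example command, the helper name `build_famldcaller_command` is hypothetical, and it assumes the `FamLDCaller` binary is on the search path when the command is eventually executed.

```python
import shlex


def build_famldcaller_command(vcf, pedfile, prefix, states=50, rounds=10,
                              executable="FamLDCaller"):
    """Build the argument list for the example FamLDCaller run.

    Only options that appear in the example command are used here.
    """
    return [
        executable,
        "--vcf", vcf,
        "--pedfile", pedfile,
        "--states", str(states),
        "--rounds", str(rounds),
        "--prefix", prefix,
    ]


command = build_famldcaller_command("test.vcf", "test.ped", "test.famldcaller")
# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
```

To execute the command rather than print it, the list can be passed to `subprocess.run(command, check=True)`.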