Changes

From Genome Analysis Wiki
528 bytes added, 18:01, 19 May 2015
Line 1: Line 1:  
== Introduction  ==
 
   −
We will illustrate how TrioCaller works in sequence data including trios and unrelated samples. We will start from the scratch and walk through all necessary steps from raw sequence data to called genotypes. If you are new to sequence data, please be patient to go through every step. If you are experienced, you may directly jump to the section of [http://genome.sph.umich.edu/wiki/TrioCaller#Genotype_Refinement_Using_Linkage_Disequilibrium_Information_.28TrioCaller.29 TrioCaller].  
+
We will illustrate how TrioCaller works on sequence data that include trios and unrelated samples. We will walk through all the steps needed to move from raw sequence data to called genotypes.  
 +
If you are new to sequence data, please review every step. If you are experienced, you may jump directly to the [http://genome.sph.umich.edu/wiki/TrioCaller#Genotype_Refinement_Using_Linkage_Disequilibrium_Information_.28TrioCaller.29 TrioCaller] section.  
   −
We will start with a set of short sequence reads and associated base quality scores (stored in a fastq file), find the most likely genomic location for each read (producing a BAM file), generate an initial list of polymorphic sites and genotypes (stored in a VCF file) and use haplotype information to refine these genotypes (resulting in an updated VCF file).  
+
We will start with a set of short sequence reads and associated base quality scores (stored in a fastq file), find the most likely genomic location for each read (producing a BAM file), generate an initial list of polymorphic sites and genotypes (stored in a VCF file) and use haplotype information to refine these genotypes (resulting in an updated VCF file).
   −
== '''Note:''' if you are interesting in detecting '''de novo mutations''', or are working on '''a small number of families''' with '''high coverage data''' (e.g. exome sequencing), please first try our sister program [http://genome.sph.umich.edu/wiki/Polymutt Polymutt] . ==
+
=== Note ===
 +
 
 +
If you are interested in ''de novo'' mutations or are working on one or two families with deep sequence data (>30X), you should first consider our sister program, [http://genome.sph.umich.edu/wiki/Polymutt Polymutt], which ignores linkage disequilibrium information but can handle more complex pedigrees.
    
=== Download  ===
 
Line 11: Line 14:  
Before downloading the program, we would appreciate it if you could email [mailto:weichen.mich@gmail.com weichen.mich@gmail.com] (Subject: TrioCaller), optionally with a little descriptive information (e.g. affiliation, depth, number of samples and family structure). We will notify you of any updates. 
   −
<br> Binary file only: [http://www.sph.umich.edu/csg/weich/TrioCaller.06262012.binary.tgz TrioCaller.06262012.binary.tgz].  
+
'''A recent extension of TrioCaller, [http://genome.sph.umich.edu/wiki/FamLDCaller FamLDCaller], is coming soon with major updates (better processing, support for general families and reference panels). Please try the beta version below. Contact weichen.mich@gmail.com with any questions.''' 
 +
 
 +
[[Binary file:]]  [http://www.pitt.edu/~wec47/Files/FamLDCaller FamLDCaller].  [Last update: 08/15/2014]
 +
 
 +
 
 +
'''TrioCaller''' : the version we used in the paper.
 +
<br>  
 +
[[Binary file only:]] [http://csg.sph.umich.edu/weich/TrioCaller.06262012.binary.tgz TrioCaller.06262012.binary.tgz].  
   −
Binary file with example datasets&nbsp;: [http://www.sph.umich.edu/csg/weich/TrioCaller.06262012.tgz TrioCaller.06262012.tgz].  
+
[[Binary file with example datasets&nbsp;:]] [http://csg.sph.umich.edu/weich/TrioCaller.06262012.tgz TrioCaller.06262012.tgz].  
    
[http://genome.sph.umich.edu/wiki/TrioCaller:Archive Archive].  
 
 +
 +
'''
 +
 +
== An example from sequence data to genotypes ==
 +
'''
    
The example dataset used here is included in the download package. It consists of 40 individuals: 10 parent-offspring trios and 10 unrelated individuals, with an average sequence depth of ~3x. README.txt describes the structure of the package. Pipeline.csh (C shell) and pipeline.bash (bash shell) are two scripts that run all the commands listed here in batch.  
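
As a rough sanity check on the family structure described above, the sketch below counts individuals and offspring in a pedigree file. It assumes the common five-column layout (family, individual, father, mother, sex) with founders and unrelated samples coded as 0 for both parents. The function name and the file name in the usage note are hypothetical and are not part of the TrioCaller package.

```shell
# Hypothetical helper: summarize a pedigree file.
# Assumed layout per line: family individual father mother sex
# Assumption: founders and unrelated samples have father and mother set to 0.
summarize_ped() {
  awk '
    # Skip blank or truncated lines; count everyone else.
    NF >= 4 {
      n++
      # An individual with both parents listed is counted as offspring.
      if ($3 != "0" && $4 != "0") kids++
    }
    END { printf "individuals=%d offspring=%d\n", n, kids }
  ' "$1"
}
```

For example, <code>summarize_ped example.ped</code> prints a single summary line. If the pedigree file matches the dataset described above (one offspring per trio), the counts should come to 40 individuals and 10 offspring.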
Line 137: Line 152:     
    
 
    
   bin/samtools mpileup -Iuf ref/human_g1k_v37_chr20.fa bams/SAMPLE*bam | bcftools view -bvcg - > result/chr20.mpileup.bcf
+
   bin/samtools mpileup -Iuf ref/human_g1k_v37_chr20.fa bams/SAMPLE*bam | bin/bcftools view -bvcg - > result/chr20.mpileup.bcf
    
   bin/bcftools view result/chr20.mpileup.bcf  > result/chr20.mpileup.vcf
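
The commands above call the copies of samtools and bcftools under bin/, so both must be present and executable in the package directory. The following is a minimal sketch of a check that could be run first. The function name is a hypothetical helper and is not shipped with the package.

```shell
# Hypothetical helper: report any tools that are missing or not executable.
# Returns 0 if every path given is an executable file, 1 otherwise.
check_tools() {
  rc=0
  for tool in "$@"; do
    if [ ! -x "$tool" ]; then
      echo "missing or not executable: $tool" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Possible use from the top-level directory of the package:
#   check_tools bin/samtools bin/bcftools || exit 1
```

Running the check before the mpileup step helps avoid a pipe that fails partway through and leaves an empty or truncated result/chr20.mpileup.bcf.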