TrioCaller

Revision as of 18:43, 17 February 2013


== Introduction ==


We will illustrate how TrioCaller works on sequence data that include both trios and unrelated samples. We start from scratch and walk through all the necessary steps, from raw sequence data to called genotypes. If you are new to sequence data, please be patient and go through every step. If you are experienced, you can jump directly to the [http://genome.sph.umich.edu/wiki/TrioCaller#Genotype_Refinement_Using_Linkage_Disequilibrium_Information_.28TrioCaller.29 TrioCaller] section.


We will start with a set of short sequence reads and their associated base quality scores (stored in a FASTQ file), find the most likely genomic location for each read (producing a BAM file), generate an initial list of polymorphic sites and genotypes (stored in a VCF file), and then use haplotype information to refine these genotypes (producing an updated VCF file).
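To make the first input concrete: a FASTQ file stores each read as four lines (a header starting with <code>@</code>, the bases, a <code>+</code> separator, and one quality character per base). The Python sketch below reads such records and decodes the quality characters into Phred scores. It assumes unwrapped four-line records and Phred+33 encoding (the Sanger / Illumina 1.8+ convention); the names <code>parse_fastq</code>, <code>FastqRecord</code> and <code>error_probability</code> are hypothetical helpers for illustration, not part of BWA or TrioCaller.

```python
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record.

    Assumes Phred+33 quality encoding; pass offset=64 for old Illumina files.
    Multi-line (wrapped) FASTQ is not supported by this sketch.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        # Each quality character encodes Q = ord(char) - offset.
        yield FastqRecord(header[1:].strip(), seq, [ord(c) - offset for c in qual])


def error_probability(phred: int) -> float:
    """A Phred score Q encodes a base-call error probability of 10^(-Q/10)."""
    return 10 ** (-phred / 10)
```

For example, the quality string <code>II#5</code> decodes to Phred scores 40, 40, 2 and 20, so the last base has an estimated error probability of 0.01. These per-base scores are what the aligner and genotype caller use to weigh the evidence from each read.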


'''Note:''' if you are interested in detecting '''de novo mutations''', or are working with '''a small number of families''' with '''high-coverage data''' (e.g., exome sequencing), please first try [http://genome.sph.umich.edu/wiki/Polymutt Polymutt], another program we developed.


=== Download ===


Before downloading the program, we would appreciate it if you could email [mailto:weichen.mich@gmail.com weichen.mich@gmail.com] with some brief information about your project (e.g., affiliation, sequencing depth, number of samples and family structure).


Binary file only: [http://www.sph.umich.edu/csg/weich/TrioCaller.06262012.binary.tgz TrioCaller.06262012.binary.tgz].


Binary file with example datasets: [http://www.sph.umich.edu/csg/weich/TrioCaller.06262012.tgz TrioCaller.06262012.tgz].


[http://genome.sph.umich.edu/wiki/TrioCaller:Archive Archive].
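Both packages are <code>.tgz</code> files (gzip-compressed tar archives). If you prefer to script the setup, the following is a minimal Python sketch that downloads a package and unpacks it. It assumes the 2012 URLs above still resolve (the host may have moved) and that you can write to the target directory; <code>fetch</code>, <code>extract</code> and <code>PACKAGE_URL</code> are hypothetical names, not part of TrioCaller.

```python
import tarfile
import urllib.request
from pathlib import Path
from typing import List

# URL copied from the download link above; it may no longer be reachable.
PACKAGE_URL = "http://www.sph.umich.edu/csg/weich/TrioCaller.06262012.tgz"


def fetch(url: str, dest_dir: Path) -> Path:
    """Download url into dest_dir and return the local file path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    local = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, local)
    return local


def extract(archive: Path, dest_dir: Path) -> List[str]:
    """Unpack a gzip-compressed tar archive into dest_dir; return member names."""
    root = dest_dir.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        # Refuse members whose paths would land outside dest_dir (e.g. "../x").
        for member in tar.getmembers():
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: also block unsafe links, devices, etc.
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
        return tar.getnames()
```

For example, <code>extract(fetch(PACKAGE_URL, Path("downloads")), Path("TrioCaller"))</code> would download the package with example datasets and unpack it into a local <code>TrioCaller</code> directory. Downloading with a browser and unpacking with <code>tar</code> works just as well; the script only helps if you are automating the whole pipeline.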


The example dataset used in this tutorial is included in the package with example datasets. It consists of 40 individuals: 10 parent-offspring trios (30 individuals) and 10 unrelated individuals. The average sequencing depth is ~3x. README.txt describes the structure of the package, and the two scripts Pipeline.csh (C shell) and pipeline.bash (bash shell) run all the commands listed here in batch.
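To see how the 40 samples break down (10 trios × 3 members + 10 unrelated = 40), the sketch below lays out the family structure as a pedigree in the common five-column LINKAGE/Merlin style: family ID, individual ID, father ID, mother ID and sex, with <code>0</code> marking a founder's unknown parents. The exact pedigree format TrioCaller reads is described in the package README; the column layout, the helper names and all IDs below (<code>FAM</code>, <code>UNR</code>, <code>IND</code>) are illustrative assumptions, not the files shipped with the example data.

```python
from typing import List, Tuple

# One pedigree row: (family ID, individual ID, father ID, mother ID, sex).
# "0" marks an unknown parent (a founder); sex is coded 1 = male, 2 = female.
PedRow = Tuple[str, str, str, str, int]


def build_pedigree(n_trios: int, n_unrelated: int) -> List[PedRow]:
    """Lay out n_trios parent-offspring trios plus n_unrelated singletons.

    All IDs are hypothetical placeholders.
    """
    rows: List[PedRow] = []
    counter = 0

    def next_id() -> str:
        nonlocal counter
        counter += 1
        return f"IND{counter}"

    for fam in range(1, n_trios + 1):
        fam_id = f"FAM{fam}"
        father, mother, child = next_id(), next_id(), next_id()
        rows.append((fam_id, father, "0", "0", 1))
        rows.append((fam_id, mother, "0", "0", 2))
        rows.append((fam_id, child, father, mother, 1))  # child's sex is arbitrary here
    for i in range(1, n_unrelated + 1):
        # Each unrelated sample is its own one-person family; sex is a placeholder.
        rows.append((f"UNR{i}", next_id(), "0", "0", 1 if i % 2 else 2))
    return rows


def to_ped_text(rows: List[PedRow]) -> str:
    """Render rows as whitespace-separated lines, one individual per line."""
    return "\n".join(" ".join(str(field) for field in row) for row in rows) + "\n"
```

With <code>build_pedigree(10, 10)</code> you get 40 rows, of which only the 10 children list both parents; the 20 trio parents and 10 unrelated samples are founders. This is the information a trio-aware caller needs in order to combine Mendelian constraints with population haplotype information.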


To save time and disk space, our analysis focuses on a small region of chromosome 20 around position 2,000,000. We will first map reads for a single individual (labeled SAMPLE1), then combine the results with the mapped reads from all individuals to generate a list of polymorphic sites and estimate accurate genotypes at each of these sites.
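The exact boundaries of the region are not given here, so the Python sketch below assumes a hypothetical window of 50 kb on either side of position 2,000,000 on chromosome 20 and shows how VCF records could be restricted to it. It relies only on the fact that CHROM and POS are the first two tab-separated columns of a VCF data line; the function names and the default flank are illustrative assumptions. For BAM files, samtools can restrict output to a region directly.

```python
from typing import Iterable, Iterator


def _normalize_chrom(name: str) -> str:
    """Treat "chr20" and "20" as the same chromosome name."""
    return name[3:] if name.lower().startswith("chr") else name


def in_window(chrom: str, pos: int, target_chrom: str, center: int, flank: int) -> bool:
    """True if chrom:pos lies in [center - flank, center + flank] on target_chrom."""
    return (_normalize_chrom(chrom) == _normalize_chrom(target_chrom)
            and center - flank <= pos <= center + flank)


def filter_vcf_lines(lines: Iterable[str], target_chrom: str = "20",
                     center: int = 2_000_000, flank: int = 50_000) -> Iterator[str]:
    """Pass header lines through; keep data lines whose CHROM/POS fall in the window.

    The default 50 kb flank is a hypothetical choice for illustration.
    """
    for line in lines:
        if not line.strip():
            continue  # skip stray blank lines
        if line.startswith("#"):
            yield line  # "##" meta lines and the "#CHROM" column header
            continue
        chrom, pos, _rest = line.split("\t", 2)  # CHROM and POS are the first two columns
        if in_window(chrom, int(pos), target_chrom, center, flank):
            yield line
```

Keeping the header lines intact means the filtered output is still a valid VCF that downstream tools can read.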


=== Required Software ===

In addition to TrioCaller, you will need BWA ([http://bio-bwa.sourceforge.net available from SourceForge]) and samtools ([http://samtools.sourceforge.net also from SourceForge]) installed to run this exercise. The examples were tested with bwa 0.6.1, samtools 0.1.18 and TrioCaller 0.1.1; newer versions should also work. We assume all executables are in your PATH.
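Because every command in this tutorial assumes the tools are on your PATH, it can help to check this before starting. The Python sketch below uses the standard-library <code>shutil.which</code> to look each executable up. The executable names <code>bwa</code>, <code>samtools</code> and <code>TrioCaller</code> are assumptions; adjust them if your installation uses different names (for example a versioned binary).

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Assumed executable names; change them to match your installation.
REQUIRED_TOOLS = ("bwa", "samtools", "TrioCaller")


def locate_tools(names: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names that could not be found on PATH."""
    return [name for name, path in locate_tools(names).items() if path is None]
```

Calling <code>missing_tools()</code> returns an empty list when all three programs are installed and reachable; otherwise it lists the ones you still need to install or add to PATH. This only confirms that the programs can be found, not that their versions match the tested ones.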
