Changes

TrioCaller (view source)

Revision as of 17:28, 27 January 2012

66 bytes removed, 17:28, 27 January 2012

no edit summary

Line 6: Line 6:

Test

−

~~In this workshop, we~~ will illustrate ~~some of~~ the ~~essential~~ steps ~~in the analysis of next generation~~ sequence data. ~~As part of the process,~~ you ~~will learn about many of the file formats commonly used~~ to ~~store next generation~~ sequence data.

+

We will illustrate how TrioCaller works on sequence data that includes both trios and unrelated samples. We will start from scratch and walk through all necessary steps

+

from raw sequence data to called genotypes. If you are new to sequence data, please take the time to go through every step. If you are experienced, you may jump to the TrioCaller section.

We will start with a set of short sequence reads and associated base quality scores (stored in a fastq file), find the most likely genomic location for each read (producing a BAM file), generate an initial list of polymorphic sites and genotypes (stored in a VCF file) and use haplotype information to refine these genotypes (resulting in an updated VCF file).
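To make the starting point of this pipeline concrete, the sketch below reads fastq records and decodes their base quality scores. It is a minimal illustration, not part of TrioCaller: the names `FastqRecord` and `read_fastq` are hypothetical, and it assumes the common four-lines-per-read layout with Phred+33 quality encoding, which may not match every fastq file.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a fastq stream with four lines per read.

    Assumption: quality characters use the Phred+33 encoding, so the
    score of a base is ord(character) - 33.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) stops the iteration
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line that starts with '+'
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed fastq record near {header!r}")
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Made-up single read, used only to show the record layout.
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
for record in read_fastq(example):
    print(record.name, record.sequence, record.qualities)
```

Each decoded score Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so a score of 40 means roughly one expected error in 10,000 calls.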

Line 12: Line 13:

== Example Dataset ==

−

Our dataset consists of 31 individuals ~~from Tuscany (in Italy) sequenced by the [http://www.1000genomes.org 1000 Genomes Project]. As with other 1000 Genomes Project samples~~, ~~these individuals~~ have been sequenced to an average depth of about 4x.

+

Our dataset consists of 40 individuals, which have been sequenced at an average depth of about 4x.

−

To conserve time and disk-space, our analysis will focus on a small region ~~surrounding the HNF4A gene~~ on chromosome 20. We will first map reads for a single individual (labeled NA20589), combine the results with mapped reads from the other 30 individuals to generate a list of polymorphic sites and estimate accurate genotypes at each of these sites.

+

To conserve time and disk-space, our analysis will focus on a small region on chromosome 20. We will first map reads for a single individual (labeled NA20589), then combine the results with mapped reads from the other 39 individuals to generate a list of polymorphic sites and estimate accurate genotypes at each of these sites.

−

The example dataset we'll be using is included in this tar-ball [http://www.sph.umich.edu/csg/abecasis/downloads/~~lowPassWorkshop~~-2012-01-23.tar.gz ~~lowPassWorkshop~~-2012-01-23.tar.gz].

+

The example dataset we'll be using is included in this tar-ball [http://www.sph.umich.edu/csg/abecasis/downloads/TrioCaller-2012-01-28.tar.gz TrioCaller-2012-01-28.tar.gz].
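Before unpacking the tar-ball, it can help to see what it contains. The sketch below lists the members of a gzip-compressed tar archive using Python's standard `tarfile` module. It assumes the archive has already been downloaded into the working directory under the file name given in the link above; the helper name `list_archive` is hypothetical, and nothing here describes the archive's actual contents.

```python
import os
import tarfile
from typing import List


def list_archive(path: str) -> List[str]:
    """Return the member names of a .tar.gz archive without extracting it."""
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()


# Assumption: the tar-ball was saved locally under this name.
ARCHIVE = "TrioCaller-2012-01-28.tar.gz"

if os.path.exists(ARCHIVE):
    for member in list_archive(ARCHIVE):
        print(member)
else:
    print(f"{ARCHIVE} not found; download it first")
```

Listing the members first shows which directory the archive will create and lets you check that the download is complete before extracting anything.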

=== Required Software ===

Weich (533 edits)
