Generic Exome Analysis Plan

From Genome Analysis Wiki
Revision as of 22:34, 20 April 2010 by Goncalo

This page outlines a generic plan for analysis of a whole exome sequencing project. The idea is that the points listed here might serve as a starting point for discussion of the analyses needed in a specific project.

Read Mapping and Variant Calling

The first step in any analysis is to map sequence reads, calibrate base qualities, and call variants. Even at this stage, some simple quality metrics can be evaluated to help identify potentially problematic samples.

Prior to Mapping

Evaluate Base Composition Along Reads
Calculate the proportion of A, C, G, and T bases at each position along the reads. Flag runs whose base composition differs unusually from that expected for the target genome.
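Below is a minimal Python sketch of the per-position composition check. It assumes the reads are already available as plain sequence strings, for example the sequence lines of a FASTQ file. The function names, the expected composition, and the 0.1 tolerance are illustrative assumptions, not values taken from this plan. In practice the expected composition would come from the target genome (or target regions) being sequenced.

```python
from collections import Counter
from typing import Dict, Iterable, List

BASES = "ACGT"


def base_composition_by_position(reads: Iterable[str]) -> List[Dict[str, float]]:
    """Fraction of A, C, G and T at each 0-based read position.

    Non-ACGT calls (such as N) are ignored, so the four fractions at a
    position sum to 1 whenever that position has at least one ACGT call.
    """
    counts: List[Counter] = []
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(counts):  # first read long enough to reach position i
                counts.append(Counter())
            counts[i][base] += 1
    profile = []
    for position_counts in counts:
        total = sum(position_counts[b] for b in BASES)
        profile.append({b: (position_counts[b] / total if total else 0.0) for b in BASES})
    return profile


def flag_unusual_positions(
    profile: List[Dict[str, float]],
    expected: Dict[str, float],  # hypothetical expected composition of the target genome
    tolerance: float = 0.1,  # illustrative threshold, not a recommended value
) -> List[int]:
    """Positions where any base fraction differs from `expected` by more than `tolerance`."""
    return [
        i
        for i, fractions in enumerate(profile)
        if any(abs(fractions[b] - expected[b]) > tolerance for b in BASES)
    ]
```

The sketch compares positions rather than whole reads because many run problems, such as a bad sequencing cycle, affect the same position in every read.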
Evaluate Machine Quality Scores Along Reads
Calculate average quality scores per position. Flag runs with evidence of unusual quality score distributions.
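The per-position average can be sketched the same way. This sketch assumes quality strings are encoded as ASCII characters offset by 33 (Phred+33, the Sanger convention). Some older Illumina pipelines used an offset of 64, so the offset is a parameter. The function name is hypothetical.

```python
from typing import Iterable, List


def mean_quality_by_position(quality_strings: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred quality at each 0-based read position.

    Each character encodes Q as chr(Q + offset); offset=33 is Phred+33,
    offset=64 matches some older Illumina pipelines.
    """
    sums: List[int] = []
    counts: List[int] = []
    for qual in quality_strings:
        for i, ch in enumerate(qual.rstrip("\r\n")):
            if i == len(sums):  # first read long enough to reach position i
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [total / n for total, n in zip(sums, counts)]
```

Plotting the returned list against position gives the quality curve for a run. A run whose curve drops off earlier or more steeply than other runs in the same project is a candidate for flagging.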
Calculate Number of Reads
Count the number of input reads and bases for each sequenced sample.
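Below is a sketch of the read and base tally. It assumes plain or gzip-compressed FASTQ files with the usual four-line records (header, sequence, '+', quality) and no line wrapping. Both function names are hypothetical. For a sample sequenced across several runs or lanes, the per-file counts would simply be summed.

```python
import gzip
from typing import Iterable, Tuple


def count_reads_and_bases(lines: Iterable[str]) -> Tuple[int, int]:
    """Count reads and bases from the lines of a four-line-per-record FASTQ."""
    n_reads = 0
    n_bases = 0
    for line_number, line in enumerate(lines):
        if line_number % 4 == 1:  # the second line of each record holds the sequence
            n_reads += 1
            n_bases += len(line.rstrip("\r\n"))
    return n_reads, n_bases


def count_fastq_file(path: str) -> Tuple[int, int]:
    """Open a plain or .gz FASTQ file and count its reads and bases."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_reads_and_bases(handle)
```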

Read Mapping

Map Reads with an Appropriate Read Mapper
Currently, BWA (bio-bwa.sourceforge.net/bwa.shtm) is a convenient, widely used read mapper.
Basic Mapping Statistics
We should tally the overall proportion of mapped reads.
We should also tally the proportion of reads that map:
  • Inside the target regions
  • Near the target regions
  • Elsewhere in the genome
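One way to tally these categories is sketched below. The sketch makes three input assumptions:

  • Each read has already been reduced to its alignment interval (chromosome, 0-based start, exclusive end), or None if it is unmapped.
  • Target regions are supplied as non-overlapping intervals per chromosome, for example loaded from the capture design.
  • Reads count as "near the target regions" if they fall within 200 bp of a target. This flank is an arbitrary assumption.

The class and function names are hypothetical. The three location categories are reported as proportions of mapped reads.

```python
import bisect
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)
Alignment = Optional[Tuple[str, int, int]]  # (chrom, start, end), or None if unmapped


class TargetIndex:
    """Non-overlapping target intervals, sorted for fast overlap queries."""

    def __init__(self, targets: Dict[str, List[Interval]]):
        self._starts: Dict[str, List[int]] = {}
        self._ends: Dict[str, List[int]] = {}
        for chrom, intervals in targets.items():
            ordered = sorted(intervals)
            self._starts[chrom] = [start for start, _ in ordered]
            self._ends[chrom] = [end for _, end in ordered]

    def overlaps(self, chrom: str, start: int, end: int, pad: int = 0) -> bool:
        """True if [start, end) overlaps some target extended by `pad` bp on each side."""
        starts = self._starts.get(chrom)
        if not starts:
            return False
        # Number of targets whose padded start lies before the read's end.
        idx = bisect.bisect_left(starts, end + pad)
        # Ends are sorted too, so target idx - 1 has the largest end among them.
        return idx > 0 and self._ends[chrom][idx - 1] + pad > start


def mapping_summary(
    alignments: Iterable[Alignment], index: TargetIndex, flank: int = 200
) -> Dict[str, float]:
    """Fraction of reads mapped, and the on/near/off-target split of mapped reads."""
    counts: Counter = Counter()
    for aln in alignments:
        if aln is None:
            counts["unmapped"] += 1
        elif index.overlaps(*aln):
            counts["on_target"] += 1
        elif index.overlaps(*aln, pad=flank):  # hypothetical definition of "near"
            counts["near_target"] += 1
        else:
            counts["off_target"] += 1
    total = sum(counts.values())
    mapped = total - counts["unmapped"]
    summary = {"mapped": mapped / total if total else 0.0}
    for category in ("on_target", "near_target", "off_target"):
        summary[category] = counts[category] / mapped if mapped else 0.0
    return summary
```

Because the targets do not overlap, their end coordinates are sorted along with their starts. A single binary search therefore finds the only target that could overlap a given read. Padding every target by the same flank preserves that ordering.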
Recalibrate Base Quality Scores
Base quality scores can be recalibrated by comparing the reported scores with the observed rate of mismatches to the reference at sites that are unlikely to vary (such as those not currently reported as variants in dbSNP or in the most recent 1000 Genomes Project analyses).
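Below is a deliberately simplified sketch of the idea. At positions not listed as known variants, every mismatch to the reference is treated as a sequencing error. The observed error rate within each reported quality score is then converted back to a Phred score. The sketch relies on the following assumptions, and the function names are hypothetical:

  • Base observations have already been extracted from the alignments as (chromosome, position, reported quality, read base, reference base) tuples.
  • Known variant positions from dbSNP and the 1000 Genomes Project have been loaded into a set.

Production recalibration tools, such as the one in GATK, also condition on covariates like machine cycle and sequence context, which this sketch omits.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (chrom, position, reported quality, read base, reference base)
Observation = Tuple[str, int, int, str, str]


def empirical_quality_table(
    observations: Iterable[Observation],
    known_variants: Set[Tuple[str, int]],  # assumed: dbSNP / 1000 Genomes positions
) -> Dict[int, int]:
    """Map each reported quality to a Phred score estimated from observed mismatches."""
    totals: Dict[int, int] = defaultdict(int)
    mismatches: Dict[int, int] = defaultdict(int)
    for chrom, pos, reported_q, read_base, ref_base in observations:
        if (chrom, pos) in known_variants:
            continue  # skip sites that may be genuinely polymorphic
        totals[reported_q] += 1
        if read_base != ref_base:
            mismatches[reported_q] += 1  # assumed to be a sequencing error
    table = {}
    for reported_q, n in totals.items():
        # One pseudo-mismatch and one pseudo-match keep the rate above zero.
        error_rate = (mismatches[reported_q] + 1) / (n + 2)
        table[reported_q] = round(-10 * math.log10(error_rate))
    return table


def recalibrate(qualities: Iterable[int], table: Dict[int, int]) -> List[int]:
    """Replace reported qualities with empirical ones; unseen values are kept as-is."""
    return [table.get(q, q) for q in qualities]
```

Genuine variants in the sample that are absent from the known-variant lists are counted as errors here. As a result, the recalibrated scores are, if anything, slightly conservative.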
Update Base Quality Score Metrics
Regenerate the per-position base quality curves using the recalibrated scores.
Calculate the number of mapped bases that reach at least Q20. Optionally, also calculate Q20-equivalent bases by summing the quality scores of bases with base quality of at least Q20 and dividing the total by 20.
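Both Q20 counts can be computed in one pass over the base qualities of mapped bases. This sketch assumes the qualities have already been decoded to integer Phred scores. It treats "at least Q20" (a quality of 20 or more) as the threshold for both counts. The function name is hypothetical.

```python
from typing import Iterable, Tuple

Q20 = 20


def q20_summary(qualities: Iterable[int]) -> Tuple[int, float]:
    """Return (bases at Q20 or better, Q20-equivalent bases) for mapped-base qualities."""
    n_q20 = 0
    quality_sum = 0
    for q in qualities:
        if q >= Q20:  # "at least Q20"
            n_q20 += 1
            quality_sum += q
    # Q20-equivalent bases: summed quality of passing bases divided by 20.
    return n_q20, quality_sum / Q20
```

For example, bases with qualities 10, 20, 30, and 40 give three bases at Q20 or better. They give (20 + 30 + 40) / 20 = 4.5 Q20-equivalent bases.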