Generic Exome Analysis Plan
This page outlines a generic plan for the analysis of a whole exome sequencing project. The points listed here are intended as a starting point for discussing the analyses needed in a specific project.
Read Mapping and Variant Calling
The first step in any analysis is to map sequence reads, calibrate base qualities, and call variants. Even at this stage, some simple quality metrics can be evaluated to help identify potentially problematic samples.
Prior to Mapping
- Evaluate Base Composition Along Reads
- Calculate the proportion of A, C, G, T bases along each read. Flag runs with evidence of unusual patterns of base composition compared to the target genome.
- Evaluate Machine Quality Scores Along Reads
- Calculate average quality scores per position. Flag runs with evidence of unusual quality score distributions.
- Calculate Number of Reads
- Calculate the number of input reads and the number of input bases for each sequenced sample.
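The three pre-mapping checks above can be sketched in a few lines of Python. This is only an illustration: it assumes uncompressed four-line FASTQ records with Phred+33 quality encoding, and the function names `read_fastq` and `summarize_reads` are hypothetical, not part of any existing tool.

```python
from collections import Counter


def read_fastq(handle):
    """Yield (sequence, quality_string) pairs from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield seq, qual


def summarize_reads(records, phred_offset=33):
    """Tally read/base counts, per-position base composition and mean quality."""
    n_reads = 0
    n_bases = 0
    base_counts = []  # one Counter of bases per read position
    qual_sums = []    # summed quality score per read position
    for seq, qual in records:
        n_reads += 1
        n_bases += len(seq)
        # Grow the per-position tables if this read is longer than any seen so far.
        while len(base_counts) < len(seq):
            base_counts.append(Counter())
            qual_sums.append(0)
        for i, (base, q) in enumerate(zip(seq, qual)):
            base_counts[i][base.upper()] += 1
            qual_sums[i] += ord(q) - phred_offset  # assumes Phred+33 encoding
    composition = []
    mean_quality = []
    for counts, qsum in zip(base_counts, qual_sums):
        depth = sum(counts.values())  # includes N and other ambiguity codes
        composition.append({b: counts[b] / depth for b in "ACGT"})
        mean_quality.append(qsum / depth)
    return {
        "reads": n_reads,
        "bases": n_bases,
        "composition": composition,
        "mean_quality": mean_quality,
    }
```

Calling `summarize_reads(read_fastq(open(path)))` on one run gives the curves to compare against other runs and against the expected composition of the target genome. What counts as "unusual" is left to the project to decide.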
- Map Reads with Appropriate Read Mapper
- Currently, [bio-bwa.sourceforge.net/bwa.shtm BWA] is a convenient, widely used read mapper.
- Basic Mapping Statistics
- We should tally the overall proportion of mapped reads.
- We should also tally the proportion of reads that map:
- Inside the target regions
- Near the target regions
- Elsewhere in the genome
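As a rough sketch of the tallies above, the following Python classifies each read by its position relative to the target intervals. The input representation (a `(chrom, start, end)` tuple with half-open coordinates, or `None` for an unmapped read), the 200 bp flank used to define "near", and all function names are assumptions made for illustration.

```python
from collections import Counter


def overlaps(start, end, intervals, pad=0):
    """True if half-open [start, end) overlaps any interval widened by pad bases."""
    return any(start < t_end + pad and end > t_start - pad
               for t_start, t_end in intervals)


def classify_read(read, targets, flank=200):
    """Label one read as unmapped, on_target, near_target or off_target."""
    if read is None:
        return "unmapped"
    chrom, start, end = read
    intervals = targets.get(chrom, [])
    if overlaps(start, end, intervals):
        return "on_target"
    if overlaps(start, end, intervals, pad=flank):
        return "near_target"
    return "off_target"


def mapping_summary(reads, targets, flank=200):
    """Fraction of reads mapped, and the split of mapped reads by category."""
    counts = Counter(classify_read(r, targets, flank) for r in reads)
    total = sum(counts.values())
    mapped = total - counts["unmapped"]
    summary = {"mapped_fraction": mapped / total if total else 0.0}
    for category in ("on_target", "near_target", "off_target"):
        summary[category] = counts[category] / mapped if mapped else 0.0
    return summary
```

The linear scan over intervals is only adequate for small examples. For a full exome, a sorted interval list or an existing interval-overlap tool would be the more practical choice.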
- Recalibrate Base Quality Scores
- Base quality scores can be updated by comparing reported base qualities with the observed mismatch rate at sites that are unlikely to vary (such as those not currently reported as variants in dbSNP or in the most recent 1000 Genomes Project analyses).
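The idea behind recalibration can be illustrated with a deliberately simplified sketch: group bases by reported quality, skip positions listed as known variants, and convert the observed mismatch rate into an empirical Phred score. Production recalibrators also condition on covariates such as machine cycle and sequence context, which are omitted here. The input tuple layout, the add-one smoothing, and the function name are assumptions.

```python
import math
from collections import defaultdict


def empirical_qualities(observations, known_variant_sites):
    """Map each reported quality to an empirical quality.

    observations: iterable of (chrom, pos, reported_q, is_mismatch) tuples,
                  one per aligned base (hypothetical layout).
    known_variant_sites: set of (chrom, pos) pairs to exclude.
    """
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for chrom, pos, reported_q, is_mismatch in observations:
        if (chrom, pos) in known_variant_sites:
            continue  # a mismatch here may be a true variant, not an error
        totals[reported_q] += 1
        mismatches[reported_q] += int(is_mismatch)
    table = {}
    for q, n in totals.items():
        # Add-one smoothing keeps the error rate strictly between 0 and 1.
        error_rate = (mismatches[q] + 1) / (n + 2)
        table[q] = -10 * math.log10(error_rate)
    return table
```

A reported Q30 bin that empirically behaves like Q20 would then show up as `table[30]` being close to 20.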
- Update Base Quality Score Metrics
- Generate new curves with base quality scores per position.
- Calculate the number of mapped bases that reach at least Q20. Potentially, calculate Q20 equivalent bases by summing the quality scores for bases with base quality >Q20 and dividing the total by 20.
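The two Q20 metrics can be written down directly. This minimal sketch takes a flat list of per-base quality scores for the mapped bases of one sample and treats the cutoff as inclusive (Q20 or higher) for both metrics, which is an assumption since the text uses both "at least Q20" and ">Q20". The function name `q20_metrics` is hypothetical.

```python
def q20_metrics(qualities, threshold=20):
    """Count bases at or above the threshold and compute threshold-equivalent bases."""
    passing = [q for q in qualities if q >= threshold]  # inclusive cutoff (assumption)
    return {
        "q20_bases": len(passing),
        # Sum of passing quality scores, expressed in units of the threshold.
        "q20_equivalent_bases": sum(passing) / threshold,
    }
```

For example, qualities of 10, 20, 30, 40 and 19 give three Q20 bases and (20 + 30 + 40) / 20 = 4.5 Q20-equivalent bases, so higher-quality bases count for more than one.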