Changes

From Genome Analysis Wiki
6,577 bytes added ,  07:42, 6 August 2013
Line 161: Line 161:  
== Non-synonymous Variants ==
 
== Non-synonymous Variants ==
   −
We tallied 1,107,05 nonsynonymous variants seen at least once across ~12,000 sequenced samples. Among the non-synonymous variants, the majority were seen only once (646,888) or twice (163,044), as expected. Of the remaining variants (297,119), a total of 260,054 were seen in at least 2 datasets and are considered as candidates for inclusion in exome SNP arrays. The transition transversion ratio of this class of variants was 2.0 in the full set of variants and 2.49 in the set of variants that were seen at least three times and in two or more studies.
+
We tallied 1,107,051 nonsynonymous variants seen at least once across ~12,000 sequenced samples. Among the non-synonymous variants, the majority were seen only once (646,888) or twice (163,044), as expected. Of the remaining variants (297,119), a total of 260,054 were seen in at least 2 datasets and are considered as candidates for inclusion in exome SNP arrays. The transition transversion ratio of this class of variants was 2.0 in the full set of variants and 2.49 in the set of variants that were seen at least three times and in two or more studies.
    
The set of variants selected for array design is estimated to include 97-98% of the nonsynonymous variants detected in average genome through exome sequencing.
 
The set of variants selected for array design is estimated to include 97-98% of the nonsynonymous variants detected in average genome through exome sequencing.
Line 233: Line 233:  
Paul De Bakker's HLA tag SNPs are listed here:
 
Paul De Bakker's HLA tag SNPs are listed here:
 
http://www.broadinstitute.org/~debakker/hla_tags_exome.txt
 
http://www.broadinstitute.org/~debakker/hla_tags_exome.txt
 +
 +
= Second Generation Arrays =
 +
 +
A second generation of exome arrays will be available in 2013, from both Illumina and Affymetrix. In addition to the original exome array content, each of these includes a grid of SNPs across the genome that facilitates analysis of common variants when a suitably large reference panel is available. These grids were selected both to ensure good coverage of the genome and to ensure that assays could be manufactured inexpensively, taking into account proprietary platform-specific constraints.
 +
 +
To evaluate the accuracy of these grids, we have used data from the GoT2D project (led by David Altshuler, Mike Boehnke and Mark McCarthy). The dataset includes ~2,650 individuals that have been whole genome sequenced (depth ~4x) and whole exome sequenced (depth ~80-100x). We focused on chr20 and 600 samples from the UK and, for each of these in turn, tried to impute missing genotypes using the remaining sequenced individuals as a reference panel. The results show that both the Affymetrix and Illumina arrays are expected to provide excellent coverage of the genome, provided that a suitably large reference panel is available.
 +
 +
The specific numbers are that:
 +
 +
* For variants with MAF >1%, the average r<sup>2</sup> correlation between imputed and true genotypes will be 0.8879 (Affymetrix) and 0.8590 (Illumina).
 +
* For variants with MAF >5%, the average r<sup>2</sup> correlation between imputed and true genotypes will be 0.9458 (Affymetrix) and 0.9256 (Illumina).
 +
 +
The fraction of variants imputed with r<sup>2</sup> > 0.80 will be:
 +
 +
* For variants with MAF >1%, 82.1% (Affymetrix) and 76.5% (Illumina)
 +
* For variants with MAF >5%, 94.6% (Affymetrix) and 89.2% (Illumina)
 +
 +
The evaluation is based on imputation of ~600 UK samples that have been whole genome and whole exome sequenced (comparing imputed genotypes with the sequence-based calls), using a panel of 2,650 sequenced individuals from the GoT2D project (Altshuler, Boehnke, McCarthy) as a reference.
 +
 +
All evaluations used [[Minimac]] and were carried out by [mailto:cfuchsb@umich.edu Christian Fuchsberger].
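
The two summary statistics reported above (average r<sup>2</sup> and the fraction of variants with r<sup>2</sup> > 0.80, each within a MAF bin) can be illustrated with a minimal Python sketch. This is not the evaluation pipeline that was actually used; the function names and the input layout (one list of imputed dosages and one list of true 0/1/2 genotypes per variant) are assumptions made for illustration only.

```python
def squared_correlation(imputed, truth):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    Returns None when either vector has zero variance (r^2 is undefined).
    """
    n = len(truth)
    if n == 0 or n != len(imputed):
        raise ValueError("imputed and truth must be non-empty and equal length")
    mean_i = sum(imputed) / n
    mean_t = sum(truth) / n
    cov = sum((a - mean_i) * (b - mean_t) for a, b in zip(imputed, truth))
    var_i = sum((a - mean_i) ** 2 for a in imputed)
    var_t = sum((b - mean_t) ** 2 for b in truth)
    if var_i == 0 or var_t == 0:
        return None
    return cov * cov / (var_i * var_t)


def minor_allele_frequency(truth):
    """MAF computed from true genotypes coded as 0, 1 or 2 alternate alleles."""
    freq = sum(truth) / (2 * len(truth))
    return min(freq, 1 - freq)


def summarize(variants, maf_threshold, r2_cutoff=0.80):
    """Mean r^2 and fraction of variants with r^2 > r2_cutoff, for MAF > maf_threshold.

    `variants` is an iterable of (imputed_dosages, true_genotypes) pairs
    (a hypothetical layout chosen for this sketch).
    """
    values = []
    for imputed, truth in variants:
        if minor_allele_frequency(truth) > maf_threshold:
            r2 = squared_correlation(imputed, truth)
            if r2 is not None:
                values.append(r2)
    if not values:
        return None, None
    mean_r2 = sum(values) / len(values)
    fraction_above = sum(1 for v in values if v > r2_cutoff) / len(values)
    return mean_r2, fraction_above
```

Calling <code>summarize(variants, 0.01)</code> and <code>summarize(variants, 0.05)</code> would give the two MAF bins listed above, assuming per-variant dosages and sequence-based genotypes are available in that form.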
    
= Illumina Exome Arrays =
 
= Illumina Exome Arrays =
Line 316: Line 336:  
| align="right" | 181  
 
| align="right" | 181  
 
|}
 
|}
 +
 +
&nbsp;
 +
 +
 +
== Sites to be Careful About ==
 +
 +
Peter Chines, working with Francis Collins, provided a list of 333 exome chip variant sites that should be treated with caution. The list includes sites where the SNP probe differs from the expected reference genome sequence, could not be mapped back to the reference, or mapped to multiple places, as well as sites where neither allele matches the reference genome.
 +
 +
A plain text [ftp://share.sph.umich.edu/exomeChip/IlluminaDesigns/cautiousSites/cautiousSite.sorted.sites list] of these sites and the corresponding [ftp://share.sph.umich.edu/exomeChip/IlluminaDesigns/cautiousSites/cautiousSite.sorted.README descriptions] are available.
    
= Affymetrix Exome Arrays =
 
= Affymetrix Exome Arrays =
   −
Information on assay design is not available at this point.
+
== Coding Variants: Design Criteria ==
 +
Probe sequences were excluded a priori if there was an adjacent polymorphism within 5 bp of the target variant or if the cumulative genome-frequency count of each 16-mer in the probe exceeded 300.
 +
The array was wet-lab validated against HapMap 270 and ~1000 Genomes Sample Collections.
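
The two probe exclusion criteria above can be sketched in Python as follows. This is an illustrative reading, not Affymetrix's actual design code: it assumes "within 5 bp" is inclusive, that the cumulative count is the sum of genome occurrence counts over every 16-mer window in the probe, and that a precomputed 16-mer count table is available as a dictionary (<code>genome_kmer_counts</code>, a hypothetical input).

```python
KMER_SIZE = 16                    # 16-mers, as stated in the design criteria
MAX_CUMULATIVE_KMER_COUNT = 300   # probes above this count are excluded
NEIGHBOR_WINDOW_BP = 5            # assumed inclusive: distance <= 5 bp


def has_nearby_polymorphism(target_pos, known_variant_positions,
                            window=NEIGHBOR_WINDOW_BP):
    """True if another known variant lies within `window` bp of the target."""
    return any(
        pos != target_pos and abs(pos - target_pos) <= window
        for pos in known_variant_positions
    )


def cumulative_kmer_count(probe, genome_kmer_counts, k=KMER_SIZE):
    """Sum of genome occurrence counts over every k-mer window in the probe.

    `genome_kmer_counts` maps k-mer strings to genome-wide counts; k-mers
    missing from the table are treated as count 0 (an assumption).
    """
    return sum(
        genome_kmer_counts.get(probe[i:i + k], 0)
        for i in range(len(probe) - k + 1)
    )


def probe_is_excluded(probe, target_pos, known_variant_positions,
                      genome_kmer_counts):
    """Apply both a priori exclusion criteria to one candidate probe."""
    if has_nearby_polymorphism(target_pos, known_variant_positions):
        return True
    return cumulative_kmer_count(probe, genome_kmer_counts) > MAX_CUMULATIVE_KMER_COUNT
```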
 +
 
 +
{| cellpadding="2" cellspacing="1" border="0" summary="Summarizes the Number of SNPs in Each Category that were attempted and those that passed wet lab validation. Note that Categories Overlap."
 +
|+ '''Affymetrix Assay Design Summary'''
 +
|-
 +
! bgcolor="lightblue" scope="col" |  Categories
 +
! bgcolor="lightblue" scope="col"  align="right"| Candidates
 +
! bgcolor="lightblue" scope="col" align="center" | &nbsp; # wet lab validated  <br>&nbsp;&nbsp;& working on Axiom
 +
! bgcolor="lightblue" scope="col"  | Comments
 +
|-
 +
! scope="row" align="left" |  Non-synonymous Coding SNPs<br>&nbsp;/splice & stop
 +
| align="right" |259,976<br>&nbsp;/19,672
 +
| align="right" | 247,546<br>&nbsp;/17,066
 +
| &nbsp;Includes 16K additional non-synonymous coding variants from <br> &nbsp; the Axiom Genomic Database.
 +
|-
 +
! bgcolor="lightgray" scope="row" align="left" |  GWAS
 +
| bgcolor="lightgray" scope="row"  align="right" | 5,542
 +
| bgcolor="lightgray" scope="row" align="right" | 5,053
 +
| bgcolor="lightgray" scope="row" |
 +
|-
 +
! scope="row" align="left"  | Grid
 +
| align="right" | 5,719
 +
| align="right" | 5,478
 +
|
 +
|-
 +
! bgcolor="lightgray" scope="row" align="left" | Synonymous cSNPs
 +
| bgcolor="lightgray" scope="row"  align="right" | 5,000
 +
| bgcolor="lightgray" scope="row"  align="right" | 4,367
 +
| bgcolor="lightgray" scope="row"  align="right" |
 +
|-
 +
!  scope="row" align="left" |  AIMs  (Eur/African Ancestry)
 +
| align="right" | 3,388
 +
| align="right" | 3,283
 +
|
 +
|-
 +
! bgcolor="lightgray" scope="row" align="left" |  AIMs  (Native American Ancestry)
 +
| bgcolor="lightgray" scope="row"  align="right" | 1,000
 +
| bgcolor="lightgray" scope="row"  align="right" | 962
 +
| bgcolor="lightgray" scope="row"  align="right" |
 +
|-
 +
! scope="row" align="left"  |  AIMs  (Other)
 +
| align="right" | 271
 +
| align="right" | 271
 +
|  &nbsp;Includes  supplemental AIMs from the Latin American Cancer <br> &nbsp; Epidemiology  (LACE) Consortium. 
 +
|-
 +
! bgcolor="lightgray" scope="row" align="left" |  HLA
 +
| bgcolor="lightgray" scope="row"  align="right" | 2,536
 +
| bgcolor="lightgray" scope="row"  align="right"| 2,262
 +
| bgcolor="lightgray" scope="row"  align="right" |
 +
|-
 +
! scope="row" align="left"  |  ESP
 +
| align="right" | 1,003
 +
| align="right" | 952
 +
|
 +
|-
 +
! bgcolor="lightgray" scope="row" align="left" | Fingerprint
 +
| bgcolor="lightgray" scope="row"  align="right" | 285
 +
| bgcolor="lightgray" scope="row"  align="right" | 268
 +
| bgcolor="lightgray" scope="row"  align="right" |
 +
|-
 +
! scope="row" align="left"  |  miRNA
 +
| align="right" | 285
 +
| align="right" | 250
 +
|
 +
|-
 +
! bgcolor="lightgray" scope="row"  align="left" |  Mitochondrial DNA
 +
| bgcolor="lightgray" scope="row"  align="right"| 246
 +
| bgcolor="lightgray" scope="row"  align="right"| 207
 +
| bgcolor="lightgray" scope="row"  align="right" |
 +
|-
 +
! scope="row" align="left" |  Chromosome Y
 +
| align="right" | 232
 +
| align="right" |  161
 +
|
 +
|-
 +
! bgcolor="lightgray" scope="row" align="left" |  Indels
 +
| bgcolor="lightgray" scope="row"  align="right" | 56,095
 +
| bgcolor="lightgray" scope="row"  align="right" | 35,137
 +
| bgcolor="lightgray" scope="row"  | &nbsp; Includes biallelic indels from the draft Phase 1 1000 Genomes Project and  previously validated indels in the Axiom Genomic Database;  <br> &nbsp; indel size ranges from 1-138bp. 
 +
|-
 +
! scope="row" |
 +
| align="right" |
 +
| align="right" |
 +
|
 +
|-
 +
! bgcolor="lightblue" scope="row" border = "1"| Total Number Target Variants
 +
| bgcolor="lightblue"  align="right" | 369,656
 +
| bgcolor="lightblue"  align="right" | 318,983
 +
| bgcolor="lightblue" |
 +
|}
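
The wet-lab validation rate for each category follows directly from the two count columns in the table. The short Python sketch below copies the counts from the table and computes validated / candidates per row; the dictionary layout and function name are illustrative choices, and the rates are derived values rather than figures reported by Affymetrix. Because categories overlap, the rows are not expected to sum to the total.

```python
# (candidates, wet-lab validated) per category, copied from the table above.
DESIGN_SUMMARY = {
    "Non-synonymous coding SNPs": (259976, 247546),
    "Splice & stop": (19672, 17066),
    "GWAS": (5542, 5053),
    "Grid": (5719, 5478),
    "Synonymous cSNPs": (5000, 4367),
    "AIMs (Eur/African Ancestry)": (3388, 3283),
    "AIMs (Native American Ancestry)": (1000, 962),
    "AIMs (Other)": (271, 271),
    "HLA": (2536, 2262),
    "ESP": (1003, 952),
    "Fingerprint": (285, 268),
    "miRNA": (285, 250),
    "Mitochondrial DNA": (246, 207),
    "Chromosome Y": (232, 161),
    "Indels": (56095, 35137),
    "Total Number Target Variants": (369656, 318983),
}


def validation_rate(candidates, validated):
    """Fraction of attempted candidates that passed wet-lab validation."""
    if candidates <= 0:
        raise ValueError("candidates must be positive")
    return validated / candidates


def validation_rates(summary=DESIGN_SUMMARY):
    """Map each category name to its validation rate."""
    return {name: validation_rate(c, v) for name, (c, v) in summary.items()}


if __name__ == "__main__":
    for name, rate in validation_rates().items():
        print(f"{name}: {rate:.1%}")
```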