From Genome Analysis Wiki
4,412 bytes added, 10:01, 2 February 2017
[[Category:Software]]
'''ExomePicks''' is a program that suggests which individuals in a large pedigree should be sequenced. '''ExomePicks''' assumes that a genotyping chip or another cost-effective method will be used to determine IBD sharing in the pedigree and that, subsequently, one would like to sequence a minimal number of individuals and use their sequences, together with the IBD information, to deduce the sequences of the other individuals in the pedigree. We are currently using it in whole exome and whole genome sequencing studies to pick individuals to be sequenced from large family collections.
    
== Download ==
 
A source code package can be downloaded from [http://csg.sph.umich.edu//abecasis/downloads/generic-ExomePicks-2010-04-12.tar.gz here]. A Windows version can be found [http://csg.sph.umich.edu//cfuchsb/ExomePicks_Windows.zip here].

An extended version of the package, which produces output for visualization with ExomePicksViewer, is under development; for details, contact [mailto:cfuchsb@umich.edu Christian Fuchsberger].
    
== Input Files ==
 
Line 40: Line 43:  
ExomePicks currently ignores any information on twin status that may be present.
 
== Command Line ==
    
The only essential command line options are those that specify input file names, thus:
 
    
   ExomePicks -d small.dat -p small.ped
 
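
When ExomePicks is run as one step of a larger pipeline, the invocation above can be wrapped in a small script. The following is a minimal Python sketch, not part of the ExomePicks distribution: the function names are hypothetical, it assumes an executable named <code>ExomePicks</code> is on the search path, and it uses only the <code>-d</code> and <code>-p</code> options shown above.

```python
import subprocess
from pathlib import Path


def build_exomepicks_command(dat_file, ped_file, executable="ExomePicks"):
    """Return the argument list for the invocation shown above.

    Only the -d (data file) and -p (pedigree file) options are used;
    the executable name is an assumption about the local installation.
    """
    return [executable, "-d", str(dat_file), "-p", str(ped_file)]


def run_exomepicks(dat_file, ped_file, executable="ExomePicks"):
    """Check that both input files exist, then run ExomePicks."""
    for path in (dat_file, ped_file):
        if not Path(path).is_file():
            raise FileNotFoundError(f"input file not found: {path}")
    command = build_exomepicks_command(dat_file, ped_file, executable)
    # check=True raises CalledProcessError if the program exits with an error.
    return subprocess.run(command, check=True)


# Show the command that would be executed for the example files.
print(" ".join(build_exomepicks_command("small.dat", "small.ped")))
```

Calling <code>run_exomepicks("small.dat", "small.ped")</code> runs the same command as typing it at the prompt, but fails early if an input file is missing.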
== Algorithm ==

The program loops through sibships, starting at the top of the pedigree, and suggests individuals for sequencing as it moves through. In pedigrees where DNA samples are available for everyone, it selects every founder (to identify all segregating chromosomes) plus at least one offspring per founder (to determine phase). When founder DNA is missing, it selects additional offspring for each founder couple (if possible) or in sibships internal to the pedigree (if, for example, a DNA sample is not available for the founder couple's offspring).

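
The fully sampled case described above (every founder, plus at least one offspring per founder) can be illustrated with a short Python sketch. This is a simplified illustration, not the ExomePicks implementation: the function name, the pedigree encoding and the example pedigree are all hypothetical, and the fallback rules for missing founder DNA are not modelled.

```python
def suggest_fully_sampled(parents):
    """Suggest individuals to sequence when everyone has DNA available.

    `parents` maps each individual to a (father, mother) tuple; founders
    map to (None, None). The result lists every founder, followed by one
    offspring for each founder who does not yet have a selected child.
    """
    founders = [person for person, (father, mother) in parents.items()
                if father is None and mother is None]
    picks = list(founders)  # founders identify all segregating chromosomes
    for founder in founders:
        children = [child for child, pair in parents.items() if founder in pair]
        # A child shared with a spouse provides phase for both parents,
        # so only add a child if none has been selected already.
        if children and not any(child in picks for child in children):
            picks.append(children[0])
    return picks


# Hypothetical pedigree: founders I1 and I2 have children II1 and II2;
# II1 and founder I3 have child III1.
example = {
    "I1": (None, None),
    "I2": (None, None),
    "II1": ("I1", "I2"),
    "II2": ("I1", "I2"),
    "I3": (None, None),
    "III1": ("II1", "I3"),
}
print(suggest_fully_sampled(example))
```

In this toy pedigree the sketch selects the three founders, one shared child of I1 and I2, and the child of I3. The real program works one nuclear family at a time and also reports how many genomes each selection recovers, which this sketch does not attempt.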
== Outputs ==

ExomePicks summarizes its suggestions in three files.

=== Per Nuclear Family ===

Because suggestions are made one nuclear family at a time, this file should be considered the primary output of the program. The default name for the file is '''perFamily-sequencing.txt'''. Here is an example:

  FAMID  ID1    ID2    ID3    TYPE    SEQ    RETURN  RATIO
  1      I1     I2     II1    TRIO    3      5.50    1.83
  1      I3     I4     II2    TRIO    3      5.50    1.83
  1      III1   IV1    --     FATHER  2      2.00    1.00

In this particular pedigree, individuals for sequencing were selected from three nuclear families. In the first two nuclear families, a parent-offspring trio was selected (so three individuals are sequenced). In each of these cases, sequencing 3 individuals will provide information on approximately 5.5 genomes. In the final nuclear family, a father-offspring pair was selected for sequencing. This particular pair requires 2 individuals to be sequenced and provides information on only 2 genomes (if other nuclear families in the pedigree are sequenced as suggested).

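
In the example above, the RATIO column matches RETURN divided by SEQ, that is, the number of genomes recovered per individual sequenced. The Python sketch below checks that reading against the example rows. It assumes the file is whitespace-delimited with a single header line, and the RATIO relationship is inferred from the example values rather than taken from a format specification; the function names are hypothetical.

```python
# Example rows copied from the per nuclear family output shown above.
PER_NUCLEAR_FAMILY = """\
FAMID  ID1    ID2    ID3    TYPE    SEQ    RETURN  RATIO
1      I1     I2     II1    TRIO    3      5.50    1.83
1      I3     I4     II2    TRIO    3      5.50    1.83
1      III1   IV1    --     FATHER  2      2.00    1.00
"""


def parse_table(text):
    """Split a whitespace-delimited table into one dict per data row."""
    header, *rows = [line.split() for line in text.strip().splitlines()]
    return [dict(zip(header, row)) for row in rows]


def genomes_per_sequenced(row):
    """Assumed meaning of RATIO: genomes recovered per sequenced individual."""
    return float(row["RETURN"]) / int(row["SEQ"])


rows = parse_table(PER_NUCLEAR_FAMILY)
for row in rows:
    ratio = genomes_per_sequenced(row)
    # Allow for the two-decimal formatting of the printed RATIO column.
    assert abs(ratio - float(row["RATIO"])) < 0.01
    print(row["TYPE"], row["SEQ"], row["RETURN"], f"{ratio:.2f}")
```

Read this way, the two trios are the better value in this pedigree: each recovers about 1.83 genomes per sequenced individual, against 1.00 for the father-offspring pair.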
=== Per Individual ===

The program also summarizes its suggestions on a per individual basis. Although it is attractive to select one individual at a time for sequencing, it is important to note that some individuals (e.g. the offspring of a trio) do not contribute information on new genomes (their genome is contained in their parents' genomes) but do provide essential information about phase. In general, it is probably safer to select individuals for sequencing based on the per nuclear family output.

Here is an example:

  FAMID  ID     WHO     VALUE  AFFVALUE
  1      I1     FATHER  2.75   0.00
  1      I2     MOTHER  2.75   0.00
  1      II1    CHILD   0.00   0.00
  1      I3     FATHER  2.75   0.00
  1      I4     MOTHER  2.75   0.00
  1      II2    CHILD   0.00   0.00
  1      III1   FATHER  2.00   0.00
  1      IV1    CHILD   0.00   0.00

One useful feature of this file is that it also includes information on how many ''disease'' genomes can be deduced from sequencing each individual.

=== Per Family ===

A final output file provides summary statistics for each extended family. Here is an example:

  FAMID  DNA    SEQ    RETURN  RATIO
  1      13     8      13.00   1.62

In this case, sequencing 8 individuals would provide information on 13 genomes.

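
The three example outputs are consistent with one another: the per individual file lists 8 individuals whose VALUE entries sum to 13.00, matching SEQ and RETURN in the per family file. The Python sketch below performs that cross-check. It assumes whitespace-delimited files with a single header line; the agreement between the files is an observation about this example, not a documented guarantee, and the function names are hypothetical.

```python
# Example rows copied from the per individual and per family outputs above.
PER_INDIVIDUAL = """\
FAMID  ID     WHO     VALUE  AFFVALUE
1      I1     FATHER  2.75   0.00
1      I2     MOTHER  2.75   0.00
1      II1    CHILD   0.00   0.00
1      I3     FATHER  2.75   0.00
1      I4     MOTHER  2.75   0.00
1      II2    CHILD   0.00   0.00
1      III1   FATHER  2.00   0.00
1      IV1    CHILD   0.00   0.00
"""

PER_FAMILY = """\
FAMID  DNA    SEQ    RETURN  RATIO
1      13     8      13.00   1.62
"""


def parse_table(text):
    """Split a whitespace-delimited table into one dict per data row."""
    header, *rows = [line.split() for line in text.strip().splitlines()]
    return [dict(zip(header, row)) for row in rows]


def summarize_individuals(rows):
    """Return {FAMID: (individuals listed, summed VALUE)}."""
    totals = {}
    for row in rows:
        count, value = totals.get(row["FAMID"], (0, 0.0))
        totals[row["FAMID"]] = (count + 1, value + float(row["VALUE"]))
    return totals


totals = summarize_individuals(parse_table(PER_INDIVIDUAL))
for family in parse_table(PER_FAMILY):
    count, value = totals[family["FAMID"]]
    same_count = count == int(family["SEQ"])
    same_return = abs(value - float(family["RETURN"])) < 0.005
    print(family["FAMID"], count, f"{value:.2f}", same_count, same_return)
```

A check of this kind can flag a mismatch between files, for example when outputs from different runs have been mixed.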
== Improvements Under Consideration ==

* Change sorting so that the most valuable individuals in each pedigree are picked first. When resources are limited, it might not be affordable to sequence enough individuals to completely impute each pedigree. This ordering would make it easier to focus on the highest-value individuals in each pedigree.
* Implement the ability to evaluate only nuclear families, ignoring the possibility of imputing more distant relatives.
* Add the summed number of sequenced or imputed affected individuals to the per family output.
* Unrelated single individuals (''founders'' without offspring) are currently scored as zero value. These should more accurately be scored as single genomes.

== Acknowledgements ==

Thanks to Weimin Chen and Serena Sanna for discussions that contributed to the initial version of this program, and to John Blangero at the Southwest Foundation for testing the initial version.