Changes

From Genome Analysis Wiki
931 bytes added ,  11:11, 2 February 2017
Line 1: Line 1: −
'''genotypeIdCheck''' is a program that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals).
+
{| style="width:100%; background:#FF8989; margin-top:1.2em; border:1px solid #ccc;" |
 +
| style="width:100%; text-align:center; white-space:nowrap; color:#000;" |
 +
<div style="font-size:162%; border:none; margin:0; padding:.1em; color:#000;">This tool has been DEPRECATED, and replaced by [[VerifyBamID]]</div>
 +
|}
 +
 
 +
'''bamGenotypeCheck''' is a program that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals).
 +
 
 +
 
 +
== Download bamGenotypeCheck  ==
 +
 
 +
To get a copy, go to the [http://csg.sph.umich.edu//pha/karma/download/ Karma Download] page.
 +
 
 +
== Build bamGenotypeCheck  ==
 +
 
 +
Karma (which includes bamGenotypeCheck) is designed to be reasonably portable.
 +
 
 +
However, since development occurs only on Ubuntu 9.10 and later (x86 and x64), portability issues on other platforms are likely.
 +
 
 +
We support Karma only on Ubuntu 9.10 and later on 64-bit processors.
    
== Usage ==
 
Line 5: Line 23:  
A key step in any genetic analysis is to verify whether data being generated matches expectations. This program checks whether reads in a BAM file match previous genotypes for a specific sample.  
 
   −
Using a mathematical model that relates observed sequence reads to a hypothetical true genotype, genotypeIdCheck tries to decide whether sequence reads match a particular individual or are more likely to be contaminated (including a small proportion of foreign DNA), derived from a closely related individual, or derived from a completely different individual.
+
Using a mathematical model that relates observed sequence reads to a hypothetical true genotype, bamGenotypeCheck tries to decide whether sequence reads match a particular individual or are more likely to be contaminated (including a small proportion of foreign DNA), derived from a closely related individual, or derived from a completely different individual.
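The kind of model described above can be illustrated with a small sketch. This is not the program's actual implementation: it assumes biallelic markers, a per-base error rate taken from the Phred quality, and a contamination fraction <code>alpha</code> whose foreign bases follow the population allele frequency. All function and parameter names are hypothetical.

```python
import math


def phred_to_error(qual):
    """Convert a Phred base quality into an error probability."""
    return 10.0 ** (-qual / 10.0)


def base_likelihood(is_ref, n_ref_alleles, ref_freq, qual, alpha=0.0):
    """Probability of one observed base under the assumed mixture model.

    is_ref        -- True if the base matches the reference allele
    n_ref_alleles -- known genotype: 0, 1 or 2 copies of the reference allele
    ref_freq      -- population frequency of the reference allele
    qual          -- Phred base quality
    alpha         -- assumed fraction of reads coming from foreign DNA
    """
    err = phred_to_error(qual)
    frac = n_ref_alleles / 2.0
    # Chance of reading the reference allele from the sample itself.
    p_own = frac * (1.0 - err) + (1.0 - frac) * err
    # Chance of reading it from an unrelated (foreign) genome.
    p_foreign = ref_freq * (1.0 - err) + (1.0 - ref_freq) * err
    p_ref = (1.0 - alpha) * p_own + alpha * p_foreign
    return p_ref if is_ref else 1.0 - p_ref


def log_likelihood(observations, alpha=0.0):
    """Sum of log-likelihoods over (is_ref, n_ref_alleles, ref_freq, qual) tuples."""
    return sum(math.log(base_likelihood(*obs, alpha=alpha)) for obs in observations)


# Nine matching bases and one mismatch at a homozygous-reference marker.
observations = [(True, 2, 0.5, 30)] * 9 + [(False, 2, 0.5, 30)]
clean = log_likelihood(observations, alpha=0.0)
mixed = log_likelihood(observations, alpha=0.2)
print("contaminated model fits better:", mixed > clean)
```

Comparing the log-likelihood at <code>alpha=0</code> with the value at a positive <code>alpha</code> gives a rough sense of whether a small proportion of foreign DNA explains the reads better than a clean match.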
    
== Basic Usage Example ==
 
Line 11: Line 29:  
Here is a typical command line:
 
   −
   genotypeIDcheck -r /data/local/ref/karma.ref/human.g1k.v37.fa \
+
   bamGenotypeCheck -r /data/local/ref/karma.ref/human.g1k.v37.fa \
 
               -k BAMfiles.txt -p test.ped -d test.dat -m test.map
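
When the check is run for many samples, it can help to assemble the command line programmatically. The following is a minimal sketch that only builds and prints the argument list. It uses the flags that appear on this page (<code>-r</code>, <code>-k</code>, <code>-p</code>, <code>-d</code>, <code>-m</code>, <code>-b</code>, <code>-M</code>, <code>-f</code>); the helper name <code>build_command</code> and its parameter names are hypothetical.

```python
import shlex


def build_command(reference, bam_list, ped, dat, map_file,
                  min_base_quality=None, min_map_quality=None, min_maf=None):
    """Return the argument list for one bamGenotypeCheck run (hypothetical helper)."""
    argv = ["bamGenotypeCheck", "-r", reference, "-k", bam_list,
            "-p", ped, "-d", dat, "-m", map_file]
    # Optional filters are appended only when a threshold is given.
    if min_base_quality is not None:
        argv += ["-b", str(min_base_quality)]
    if min_map_quality is not None:
        argv += ["-M", str(min_map_quality)]
    if min_maf is not None:
        argv += ["-f", str(min_maf)]
    return argv


argv = build_command("/data/local/ref/karma.ref/human.g1k.v37.fa",
                     "BAMfiles.txt", "test.ped", "test.dat", "test.map",
                     min_base_quality=20)
# Print a shell-safe version of the command instead of executing it.
print(" ".join(shlex.quote(arg) for arg in argv))
```

The list form can be passed to <code>subprocess.run</code> directly, which avoids shell quoting problems with unusual file names.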
 
   Line 18: Line 36:  
=== Input Files ===
 
   −
  -r  ''FASTA format genome reference''
+
  -r  ''genome reference in [http://en.wikipedia.org/wiki/Fasta_format simplified FASTA format]''
  -a  ''allele Frequency file''
+
  -a  ''allele frequency file in [[MERLIN format]]''
 
  -p  ''pedigree file in [[MERLIN format]]''
 
 
  -d  ''data file in [[MERLIN format]]''
 
Line 34: Line 52:  
=== Filtering ===
 
   −
  -b [int] ''exclude bases with quality less than [int]''
+
  -b [int]   ''exclude bases with quality less than [int]''
  -M [int] ''exclude reads with map quality less than [int]''
+
  -M [int]   ''exclude reads with map quality less than [int]''
  -F [int] ''set custom BAM flags filter (not implemented at the moment)''
+
  -f [float] ''drop markers with minor allele frequency smaller than [float]''
 +
  -F [int]   ''set custom BAM flags filter (not implemented at the moment)''
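
The meaning of the three thresholds can be sketched as simple predicates. This is only an illustration of the wording above ("less than" and "smaller than" exclude, so values equal to the threshold are kept); the function names are hypothetical and the program's internal logic may differ.

```python
def keep_base(base_quality, map_quality, min_base_quality=0, min_map_quality=0):
    """Keep a base only if both quality values reach their thresholds (-b and -M)."""
    return base_quality >= min_base_quality and map_quality >= min_map_quality


def keep_marker(allele_frequency, min_maf=0.0):
    """Keep a marker only if its minor allele frequency reaches the threshold (-f)."""
    # The minor allele is whichever of the two alleles is rarer.
    maf = min(allele_frequency, 1.0 - allele_frequency)
    return maf >= min_maf


print(keep_base(25, 40, min_base_quality=20, min_map_quality=30))
print(keep_marker(0.99, min_maf=0.05))
```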
    
=== Other Options ===
 
Line 48: Line 67:  
For each aligned base that overlaps a known genotype, we calculate the probability that it was derived from a particular known genotype. This comparison considers only bases that overlap previously known genotypes and that meet the base quality and mapping quality thresholds.
   −
Each individual in a pedigree has a different combination of genotypes, and genotypeIdCheck will systematically search for the individual whose genotypes best match the observed read data.
+
Each individual in a pedigree has a different combination of genotypes, and bamGenotypeCheck will systematically search for the individual whose genotypes best match the observed read data.
    
For more about the technical details, see the page [[Verifying Sample Identities - Implementation]]
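
The search over individuals described above can be sketched as follows. This is a simplified illustration, not the program's code: it assumes biallelic markers, genotypes stored as the number of reference alleles (0, 1 or 2), bases that have already passed the quality filters, and that every individual is genotyped at the same markers. All names are hypothetical.

```python
import math


def base_log_likelihood(is_ref, n_ref_alleles, qual):
    """Log-probability of one observed base given a known genotype."""
    # Cap the error rate at 0.5 so a quality of 0 cannot produce log(0).
    err = min(10.0 ** (-qual / 10.0), 0.5)
    frac = n_ref_alleles / 2.0
    p_ref = frac * (1.0 - err) + (1.0 - frac) * err
    return math.log(p_ref if is_ref else 1.0 - p_ref)


def best_match(genotypes, bases):
    """Score every individual and return the best one plus all scores.

    genotypes -- {individual: {marker: n_ref_alleles}}
    bases     -- list of (marker, is_ref, qual) for aligned bases
    """
    scores = {}
    for individual, calls in genotypes.items():
        total = 0.0
        for marker, is_ref, qual in bases:
            # Only bases that overlap a known genotype contribute.
            if marker in calls:
                total += base_log_likelihood(is_ref, calls[marker], qual)
        scores[individual] = total
    best = max(scores, key=scores.get)
    return best, scores


genotypes = {"A": {"rs1": 2, "rs2": 0}, "B": {"rs1": 0, "rs2": 2}}
bases = [("rs1", True, 30), ("rs2", False, 30), ("rs1", True, 20)]
best, scores = best_match(genotypes, bases)
print(best, scores)
```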
 
    
== TODO ==
 