Line 1: |
Line 1: |
− | '''genotypeIdCheck''' is a program that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals). | + | {| style="width:100%; background:#FF8989; margin-top:1.2em; border:1px solid #ccc;" | |
| + | | style="width:100%; text-align:center; white-space:nowrap; color:#000;" | |
| + | <div style="font-size:162%; border:none; margin:0; padding:.1em; color:#000;">This tool has been DEPRECATED, and replaced by [[VerifyBamID]]</div> |
| + | |} |
| + | |
| + | '''bamGenotypeCheck''' is a program that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals). |
| + | |
| + | |
| + | == Download bamGenotypeCheck == |
| + | |
| + | To get a copy, go to the [http://csg.sph.umich.edu//pha/karma/download/ Karma Download] page. |
| + | |
| + | == Build bamGenotypeCheck == |
| + | |
| + | Karma (which includes bamGenotypeCheck) is designed to be reasonably portable. |
| + | |
| + | However, since development occurs only on Ubuntu 9.10 and later (x86 and x64), there are likely to be portability issues on other platforms. |
| + | |
| + | We support Karma only on Ubuntu 9.10 and later on 64-bit processors. |
| | | |
| == Usage == | | == Usage == |
Line 5: |
Line 23: |
| A key step in any genetic analysis is to verify whether data being generated matches expectations. This program checks whether reads in a BAM file match previous genotypes for a specific sample. | | A key step in any genetic analysis is to verify whether data being generated matches expectations. This program checks whether reads in a BAM file match previous genotypes for a specific sample. |
| | | |
− | Using a mathematical model that relates observed sequence reads to a hypothetical true genotype, genotypeIdCheck tries to decide whether sequence reads match a particular individual or are more likely to be contaminated (including a small proportion of foreign DNA), derived from a closely related individual, or derived from a completely different individual. | + | Using a mathematical model that relates observed sequence reads to a hypothetical true genotype, bamGenotypeCheck tries to decide whether sequence reads match a particular individual or are more likely to be contaminated (including a small proportion of foreign DNA), derived from a closely related individual, or derived from a completely different individual. |
| | | |
| == Basic Usage Example == | | == Basic Usage Example == |
Line 11: |
Line 29: |
| Here is a typical command line: | | Here is a typical command line: |
| | | |
− | genotypeIDcheck -r /data/local/ref/karma.ref/human.g1k.v37.fa \ | + | bamGenotypeCheck -r /data/local/ref/karma.ref/human.g1k.v37.fa \ |
| -k BAMfiles.txt -p test.ped -d test.dat -m test.map | | -k BAMfiles.txt -p test.ped -d test.dat -m test.map |
| | | |
Line 18: |
Line 36: |
| === Input Files === | | === Input Files === |
| | | |
− | -r ''FASTA format genome reference'' | + | -r ''genome reference in [http://en.wikipedia.org/wiki/Fasta_format simplified FASTA format]'' |
− | -a ''allele frequency file'' | + | -a ''allele frequency file in [[MERLIN format]]'' |
| -p ''pedigree file in [[MERLIN format]]'' | | -p ''pedigree file in [[MERLIN format]]'' |
| -d ''data file in [[MERLIN format]]'' | | -d ''data file in [[MERLIN format]]'' |
Line 34: |
Line 52: |
| === Filtering === | | === Filtering === |
| | | |
− | -b [int] ''exclude bases with quality less than [int]'' | + | -b [int] ''exclude bases with quality less than [int]'' |
− | -M [int] ''exclude reads with map quality less than [int]'' | + | -M [int] ''exclude reads with map quality less than [int]'' |
− | -F [int] ''set custom BAM flags filter (not implemented at the moment)'' | + | -f [float] ''drop markers with minor allele frequency smaller than [float]'' |
| + | -F [int] ''set custom BAM flags filter (not implemented at the moment)'' |
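The three implemented filters above can be pictured as a single predicate applied to each observed base. The following is only an illustrative Python sketch, not the bamGenotypeCheck source: the `Observation` record and the `passes_filters` function are hypothetical names, and the thresholds stand in for the values passed to -b, -M and -f.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One aligned base overlapping a known marker (hypothetical record)."""
    base: str
    base_quality: int    # Phred-scaled base quality
    map_quality: int     # Phred-scaled mapping quality of the read
    marker_maf: float    # minor allele frequency of the overlapped marker


def passes_filters(obs, min_base_quality, min_map_quality, min_maf):
    """Return True if the observation survives all three filters.

    Mirrors the option descriptions: values *less than* a threshold are
    excluded, so values equal to the threshold are kept.
    """
    return (
        obs.base_quality >= min_base_quality   # analogue of -b
        and obs.map_quality >= min_map_quality  # analogue of -M
        and obs.marker_maf >= min_maf           # analogue of -f
    )
```

For example, with thresholds 20, 10 and 0.01, a base of quality 19 would be dropped while a base of quality exactly 20 would be kept.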
| | | |
| === Other Options === | | === Other Options === |
Line 48: |
Line 67: |
| For each aligned base that overlaps a known genotype, we calculate the probability that it was derived from a particular known genotype. This comparison considers only bases that overlap previously known genotypes and that meet the base quality and mapping quality thresholds. | | For each aligned base that overlaps a known genotype, we calculate the probability that it was derived from a particular known genotype. This comparison considers only bases that overlap previously known genotypes and that meet the base quality and mapping quality thresholds. |
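To make the per-base calculation concrete, here is a minimal Python sketch of one common error model. It is an assumption for illustration, not necessarily the model the program uses: each read base is taken to come from either allele of a diploid genotype with equal probability, and a sequencing error (derived from the Phred base quality) is assumed to turn the true base into any of the three other bases with equal probability. The function names are hypothetical.

```python
def phred_to_error(quality):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-quality / 10)


def base_likelihood(base, genotype, base_quality):
    """Probability of observing `base` given a diploid `genotype`.

    Assumed model (illustrative only): the read samples either allele
    with probability 1/2, and an error replaces the true base with one
    of the three other bases uniformly at random.
    """
    error = phred_to_error(base_quality)

    def given_allele(allele):
        return 1 - error if base == allele else error / 3

    first, second = genotype
    return 0.5 * given_allele(first) + 0.5 * given_allele(second)
```

Under this model a matching base at a homozygous site has probability close to 1, a matching base at a heterozygous site has probability close to 1/2, and a mismatching base at a homozygous site has probability of about one third of the error rate.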
| | | |
− | Each individual in a pedigree has a different combination of genotypes, and genotypeIdCheck will systematically search for the individual whose genotypes best match the observed read data. | + | Each individual in a pedigree has a different combination of genotypes, and bamGenotypeCheck will systematically search for the individual whose genotypes best match the observed read data. |
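The search over individuals can be sketched as picking the individual with the highest total log-likelihood across all observed bases. The Python below is a hedged illustration under assumed data structures, not the program's actual implementation: observations are hypothetical `(marker, base, quality)` tuples, the pedigree is a plain dictionary from individual ID to per-marker genotypes, and the per-base error model (equal sampling of both alleles, uniform miscalls) is an assumption.

```python
import math


def log_likelihood(observations, genotypes):
    """Sum of per-base log-probabilities for one individual's genotypes.

    `observations` is a list of (marker, base, phred_quality) tuples and
    `genotypes` maps marker name -> (allele1, allele2). Markers without a
    known genotype are skipped. The error model is an assumption.
    """
    total = 0.0
    for marker, base, quality in observations:
        genotype = genotypes.get(marker)
        if genotype is None:
            continue
        error = 10 ** (-quality / 10)
        prob = sum(
            0.5 * ((1 - error) if base == allele else error / 3)
            for allele in genotype
        )
        total += math.log(prob)
    return total


def best_match(observations, pedigree):
    """Return the ID of the individual whose genotypes best fit the reads."""
    return max(
        pedigree,
        key=lambda individual: log_likelihood(observations, pedigree[individual]),
    )


# Hypothetical example data: two individuals, two markers.
pedigree = {
    "ind1": {"rs1": ("A", "A"), "rs2": ("C", "C")},
    "ind2": {"rs1": ("G", "G"), "rs2": ("C", "T")},
}
observations = [("rs1", "A", 30), ("rs2", "C", 30)]
best = best_match(observations, pedigree)
```

Working in log space keeps the product of many small probabilities from underflowing when thousands of bases are compared.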
| | | |
| For more about the technical details, see the page [[Verifying Sample Identities - Implementation]] | | For more about the technical details, see the page [[Verifying Sample Identities - Implementation]] |
| | | |
| == TODO == | | == TODO == |