Difference between revisions of "BamGenotypeCheck"
Revision as of 17:39, 22 November 2009
LaneCheck
Basic Usage Example
Here is an example of how lanecheck is run:
 lanecheck -f NA19239.chrom20.SLX.glf -m NA19238.chrom20.SLX.glf -c NA19240.chrom20.SLX.glf \
           --father NA19239 --mother NA19238 --child NA19240 \
           --minMapQuality 30 --minTotalDepth 0 --maxTotalDepth 1000 \
           -b YRI.chrom20.SLX.vcf > YRI.chrom20.SLX.log
Command Line Options
Input Files
 -f genotype likelihood file    Father's GLF-format genotype likelihood file
 -m genotype likelihood file    Mother's GLF-format genotype likelihood file
 -c genotype likelihood file    Child's GLF-format genotype likelihood file
Basic Output Options
 -b baseCallFile    Specifies the name of the output VCF-format base call file
 -p threshold       The threshold for base calling. Base calls will be made when their posterior likelihood exceeds threshold
 --reference        Positions called as homozygous reference will be included in the output.
 --verbose          Print debug information to the screen
Filtering According to Depth and Map Quality
 --minMapQuality threshold    Positions where the root-mean-square mapping quality falls below this threshold will be excluded.
 --strict                     When the map quality is interpreted strictly, all three trio individuals must exceed minMapQuality before a call is made. Without the --strict option, reads for individuals below the threshold are ignored.
 --minTotalDepth threshold    Positions where the read depth falls below this threshold will be excluded.
 --maxTotalDepth threshold    Positions where the read depth exceeds this threshold will be excluded.
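As a rough illustration of the filters described above, the following Python sketch computes a root-mean-square mapping quality and applies the quality and depth thresholds. The function names, the per-individual data layout, and the choice to sum depth over all individuals are assumptions made for illustration; they are not taken from the program itself.

```python
import math

def rms_map_quality(qualities):
    """Root-mean-square of a list of mapping qualities (0.0 if empty)."""
    if not qualities:
        return 0.0
    return math.sqrt(sum(q * q for q in qualities) / len(qualities))

def usable_individuals(site, min_map_quality, strict=False):
    """Return the individuals whose reads may be used at this site.

    `site` maps a label (e.g. "father") to that individual's list of
    per-read mapping qualities. Hypothetical layout, for illustration only.
    """
    passing = {name: quals for name, quals in site.items()
               if rms_map_quality(quals) >= min_map_quality}
    if strict and len(passing) < len(site):
        return {}       # strict: one failing individual excludes the site
    return passing      # non-strict: failing individuals are ignored

def passes_depth(site, min_total_depth, max_total_depth):
    """Assumption: total depth is the read count summed over all individuals."""
    depth = sum(len(quals) for quals in site.values())
    return min_total_depth <= depth <= max_total_depth
```

With --minMapQuality 30, a site where only the mother's reads fall below the threshold would still be evaluated using the father's and child's reads, unless strict mode is requested.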
Sample Labels
 --father fatherLabel    Specifies a label for the male parent, which will be included in the output VCF file
 --mother motherLabel    Specifies a label for the female parent, which will be included in the output VCF file
 --child childLabel      Specifies a label for the child, which will be included in the output VCF file
X Chromosome Variant Calling
 --xChr chromosomeName         Label for the 'X' chromosome in the GLF file
 --xStart sexChromosomeStart   Start of the non-pseudo-autosomal portion of the X (2,709,521 bp in build 36)
 --xStop sexChromosomeEnd      End of the non-pseudo-autosomal portion of the X (154,584,237 bp in build 36)
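The three options above together describe a region test. A minimal Python sketch of such a test is shown below, using the build 36 coordinates quoted in the option descriptions. The function name, the default label "X", and the treatment of both boundaries as inclusive are assumptions for illustration only.

```python
# Build 36 coordinates quoted in the option descriptions above.
X_START = 2_709_521
X_STOP = 154_584_237

def in_non_pseudo_autosomal(chrom, pos, x_chr="X",
                            x_start=X_START, x_stop=X_STOP):
    """True if (chrom, pos) lies in the non-pseudo-autosomal part of X.

    Assumption: both boundaries are treated as inclusive.
    """
    return chrom == x_chr and x_start <= pos <= x_stop
```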
Principle of operation:
The genotype identity checking program compares evidence from the sequence reads themselves with reference genotype information for a panel of candidate individuals. For the 1000 Genomes pilot data, these are HapMap genotypes from the same Coriell cell lines that are being sequenced. For each combination of [sequencing run x candidate individual], the program calculates the observed mismatch rate at both "informative" and "background" locations and reports the difference as the "excess mismatch rate":
excess rate = (informative rate - background rate).
"Informative" locations are those where the candidate individual is homozygous, according to the HapMap genotype information, and base calls are compared to the HapMap homozygous allele, rather than to the genome reference sequence. "Background" locations are all sites not known to be polymorphic and not recorded in dbSNP.
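The excess mismatch rate calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the program's actual code: the function names and the tuple layout of the counts are hypothetical, and the counts in the usage note are invented.

```python
def mismatch_rate(mismatches, bases):
    """Fraction of compared bases that disagree (0.0 if nothing was compared)."""
    return mismatches / bases if bases else 0.0

def excess_mismatch_rate(inf_mismatches, inf_bases, bg_mismatches, bg_bases):
    """excess rate = (informative rate - background rate)."""
    informative = mismatch_rate(inf_mismatches, inf_bases)
    background = mismatch_rate(bg_mismatches, bg_bases)
    return informative - background

def rank_candidates(counts):
    """Sort candidate labels by excess mismatch rate, lowest first.

    `counts` maps a candidate label to a hypothetical tuple
    (inf_mismatches, inf_bases, bg_mismatches, bg_bases) for one run.
    """
    return sorted(counts, key=lambda c: excess_mismatch_rate(*counts[c]))
```

For example, with invented counts of (250, 1000, 5, 1000) for one candidate and (6, 1000, 5, 1000) for another, the second candidate ranks first. A candidate whose genotypes match the sequenced sample would be expected to show an excess rate near zero, since mismatches at its homozygous sites should then arise only from sequencing error; this interpretation is an inference and is not stated in the text above.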