From Genome Analysis Wiki
2 bytes added
, 22:57, 23 November 2009
Line 8: |
Line 8: |
| | | |
| lanecheck --referencegenome NCBI36.fa --dbSNPfile dbSNP.txt | | lanecheck --referencegenome NCBI36.fa --dbSNPfile dbSNP.txt |
− | --lanefile lane.lst --pedfile test.ped --datfile test.dat --mapfile test.map --prefix result
| + | --lanefile lane.lst --pedfile test.ped --datfile test.dat --mapfile test.map --prefix result |
| | | |
| == Command Line Options == | | == Command Line Options == |
Line 35: |
Line 35: |
| === Other Options === | | === Other Options === |
| | | |
− | --memorymap ''use memory map technique for efficient memory sharing of reference genome file | + | --memorymap ''use memory map technique for efficient memory sharing of reference genome file'' |
− | '' | + | |
| | | |
− | == Principle of operation: == | + | == Principle of Operation: == |
| | | |
| The overall procedure is that the genotype identity checking program compares internal evidence from the sequence reads themselves to reference genotype information for a panel of candidate individuals. In the case of 1000 Genomes pilot data, these are HapMap genotypes from the same Coriell cell lines that are being sequenced. For each combination of [sequencing run x candidate individual] the program calculates the observed rate of mismatches at both "informative" and "background" locations and reports the result as the "excess mismatch rate". | | The overall procedure is that the genotype identity checking program compares internal evidence from the sequence reads themselves to reference genotype information for a panel of candidate individuals. In the case of 1000 Genomes pilot data, these are HapMap genotypes from the same Coriell cell lines that are being sequenced. For each combination of [sequencing run x candidate individual] the program calculates the observed rate of mismatches at both "informative" and "background" locations and reports the result as the "excess mismatch rate". |
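The page does not spell out how the "excess mismatch rate" is computed. The following is a minimal Python sketch under the assumption that it is the mismatch rate at "informative" sites minus the rate at "background" sites for one [sequencing run x candidate individual] pair. The names `MismatchCounts` and `excess_mismatch_rate` are hypothetical and are not part of lanecheck.

```python
from dataclasses import dataclass


@dataclass
class MismatchCounts:
    """Mismatch tally for one class of sites (hypothetical helper type)."""
    mismatches: int  # bases that disagree with the expected allele
    bases: int       # total bases examined at these sites

    def rate(self) -> float:
        # Guard against division by zero when no bases were observed.
        return self.mismatches / self.bases if self.bases else 0.0


def excess_mismatch_rate(informative: MismatchCounts,
                         background: MismatchCounts) -> float:
    """ASSUMPTION: excess = informative rate - background rate.

    The background rate stands in for ordinary sequencing error, so a
    value near zero suggests the reads match the candidate individual.
    """
    return informative.rate() - background.rate()


if __name__ == "__main__":
    # Illustrative counts only, not real data.
    informative = MismatchCounts(mismatches=30, bases=1000)
    background = MismatchCounts(mismatches=10, bases=1000)
    print(excess_mismatch_rate(informative, background))
```

Under this assumption, the candidate whose genotypes give the lowest excess for a run would be the most plausible source of that run.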
Line 52: |
Line 52: |
| 1. Separate the results by "Read group classifier". | | 1. Separate the results by "Read group classifier". |
| | | |
− | The mapped .bam file may contain sequence data from different instrument runs. The read identifiers often are dot or colon-separated strings of the form 'run_name<sep>read_number'. The 'run_name' may be either an SRR / ERR identifier or the sequencing center's own alpha-numeric internal run identifier. Allow users to input an extended regular expression such as '\(^[^.:]+\)[.:].*' which matches just the part of each read identifier that is common to all reads from one instrument run and which differs between instrument runs. | + | The mapped .bam file may contain sequence data from different instrument runs. The read identifiers often are dot or colon-separated strings of the form 'run_name<sep>read_number'. The 'run_name' may be either an SRR / ERR identifier or the sequencing center's own alpha-numeric internal run identifier. Allow users to input an extended regular expression such as '\(^[^.:]+\)[.:].*' which matches just the part of each read identifier that is common to all reads from one instrument run and which differs between instrument runs. |
− | | |
| | | |
| + | <br> |
| | | |
| 2. Use a model-based approach to calculate the probability of a lane coming from the claimed individual in the index file, given a pool of individuals. | | 2. Use a model-based approach to calculate the probability of a lane coming from the claimed individual in the index file, given a pool of individuals. |
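The "Read group classifier" in step 1 can be illustrated with a short Python sketch. It assumes read identifiers of the form 'run_name<sep>read_number' as described above. Python's `re` module uses unescaped parentheses for capture groups, so the example pattern is written here as `^([^.:]+)[.:].*`. The function names and the sample read identifiers are hypothetical, and this is not how lanecheck itself is implemented.

```python
import re
from collections import defaultdict

# Capture everything up to the first '.' or ':' as the run name.
RUN_NAME = re.compile(r'^([^.:]+)[.:].*')


def run_name(read_id):
    """Return the run-name prefix of a read identifier, or None if no match."""
    match = RUN_NAME.match(read_id)
    return match.group(1) if match else None


def group_by_run(read_ids):
    """Bucket read identifiers by the run name the classifier extracts."""
    groups = defaultdict(list)
    for read_id in read_ids:
        groups[run_name(read_id)].append(read_id)
    return dict(groups)


if __name__ == "__main__":
    # Made-up identifiers for illustration only.
    reads = ["SRR001234.17", "SRR001234.18", "run7:42"]
    for name, members in group_by_run(reads).items():
        print(name, len(members))
```

Identifiers that contain neither separator do not match and end up under the `None` key, which makes unclassified reads easy to spot.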