Changes

From Genome Analysis Wiki
2 bytes added ,  22:57, 23 November 2009
no edit summary
Line 8: Line 8:     
   lanecheck  --referencegenome NCBI36.fa --dbSNPfile dbSNP.txt  
 
   lanecheck  --referencegenome NCBI36.fa --dbSNPfile dbSNP.txt  
            --lanefile lane.lst --pedfile test.ped --datfile test.dat --mapfile test.map --prefix result
+
            --lanefile lane.lst --pedfile test.ped --datfile test.dat --mapfile test.map --prefix result
    
== Command Line Options ==
 
== Command Line Options ==
Line 35: Line 35:  
=== Other Options ===
 
=== Other Options ===
   −
  --memorymap ''use memory map technique for efficient memory sharing of reference genome file
+
  --memorymap ''use memory map technique for efficient memory sharing of reference genome file''
  ''
+
   
   −
== Principle of operation: ==
+
== Principle of Operation: ==
    
The genotype identity checking program compares internal evidence from the sequence reads themselves with reference genotype information for a panel of candidate individuals. For 1000 Genomes pilot data, these are HapMap genotypes from the same Coriell cell lines that are being sequenced. For each combination of [sequencing run x candidate individual], the program calculates the observed mismatch rate at both "informative" and "background" locations and reports the result as the "excess mismatch rate".
 
The genotype identity checking program compares internal evidence from the sequence reads themselves with reference genotype information for a panel of candidate individuals. For 1000 Genomes pilot data, these are HapMap genotypes from the same Coriell cell lines that are being sequenced. For each combination of [sequencing run x candidate individual], the program calculates the observed mismatch rate at both "informative" and "background" locations and reports the result as the "excess mismatch rate".
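The text does not spell out how the "excess mismatch rate" is derived from the two observed rates. The sketch below assumes the simplest reading, namely the mismatch rate at informative locations minus the rate at background locations. The function names and the counts are hypothetical and are not taken from lanecheck.

```python
def mismatch_rate(mismatches: int, bases_compared: int) -> float:
    """Fraction of compared bases that disagree; 0.0 when nothing was compared."""
    return mismatches / bases_compared if bases_compared else 0.0


def excess_mismatch_rate(informative: tuple, background: tuple) -> float:
    """Assumed definition: informative rate minus background rate.

    Each argument is a (mismatches, bases_compared) pair.
    """
    return mismatch_rate(*informative) - mismatch_rate(*background)


# Hypothetical counts for one sequencing run against two candidate individuals.
counts = {
    "candidate_A": {"informative": (12, 4000), "background": (10, 4000)},
    "candidate_B": {"informative": (900, 4000), "background": (10, 4000)},
}

excess = {
    name: excess_mismatch_rate(c["informative"], c["background"])
    for name, c in counts.items()
}

# Under this assumed definition, the candidate with the smallest excess
# is the most plausible source of the reads.
best_match = min(excess, key=excess.get)
```

With these made-up counts, a run that truly comes from a candidate shows an excess close to zero, while a mismatched candidate shows a clearly elevated excess.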
Line 52: Line 52:  
1. Separate the results by "Read group classifier".
 
1. Separate the results by "Read group classifier".
   −
The mapped .bam file may contain sequence data from different instrument runs. The read identifiers are often dot- or colon-separated strings of the form 'run_name<sep>read_number'. The 'run_name' may be either an SRR / ERR identifier or the sequencing center's own alphanumeric internal run identifier. Allow users to input an extended regular expression such as '\(^[^.:]+\)[.:].*', which matches just the part of each read identifier that is common to all reads from one instrument run and that differs between instrument runs.  
+
The mapped .bam file may contain sequence data from different instrument runs. The read identifiers are often dot- or colon-separated strings of the form 'run_name<sep>read_number'. The 'run_name' may be either an SRR / ERR identifier or the sequencing center's own alphanumeric internal run identifier. Allow users to input an extended regular expression such as '\(^[^.:]+\)[.:].*', which matches just the part of each read identifier that is common to all reads from one instrument run and that differs between instrument runs.
 
      +
<br>
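As an illustration of step 1, the following sketch groups read identifiers by the run name that the regular expression captures. Python's `re` syntax groups with bare parentheses, so the pattern is written here as `^([^.:]+)[.:].*`; whether the tool itself expects the backslash form quoted in the text is not stated. The read names and helper functions are hypothetical.

```python
import re
from collections import defaultdict

# Capture everything before the first '.' or ':' as the read group (run name).
READ_GROUP_PATTERN = re.compile(r"^([^.:]+)[.:].*")


def read_group(read_name):
    """Return the captured run name, or None if the name has no separator."""
    match = READ_GROUP_PATTERN.match(read_name)
    return match.group(1) if match else None


def split_by_read_group(read_names):
    """Map each run name to the list of read identifiers that belong to it."""
    groups = defaultdict(list)
    for name in read_names:
        groups[read_group(name)].append(name)
    return dict(groups)


# Hypothetical read identifiers from two instrument runs.
reads = ["SRR001234.17", "SRR001234.18", "ERR000045:3", "ERR000045:4"]
by_run = split_by_read_group(reads)
```

Reads whose identifiers contain neither separator fall into the `None` group, which makes them easy to inspect separately instead of silently merging them into a run.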
    
2. Use a model-based approach to calculate the probability that a lane comes from the claimed individual in the index file, given a pool of individuals.
 
2. Use a model-based approach to calculate the probability that a lane comes from the claimed individual in the index file, given a pool of individuals.
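The page does not describe the model used in step 2. As a rough sketch only, suppose some likelihood model has already produced a log-likelihood of the lane's reads for each candidate in the pool; with a uniform prior over the pool, the probability for the claimed individual is that candidate's likelihood divided by the sum over all candidates. The function name, the sample identifiers, and the log-likelihood values below are all hypothetical.

```python
import math


def posterior_over_candidates(log_likelihoods):
    """Normalize per-candidate log-likelihoods into probabilities.

    Assumes a uniform prior over the pool. The maximum is subtracted
    before exponentiating so that very negative values do not underflow.
    """
    peak = max(log_likelihoods.values())
    weights = {name: math.exp(ll - peak) for name, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical log-likelihoods of one lane under each candidate individual.
lane_log_likelihoods = {
    "sample_1": -1250.0,
    "sample_2": -1310.5,
    "sample_3": -1302.2,
}

posterior = posterior_over_candidates(lane_log_likelihoods)
claimed = "sample_1"  # the individual named in the index file (made up here)
probability_claimed = posterior[claimed]
```

A probability near 1 for the claimed individual supports the label in the index file; a low value suggests a sample swap, with the highest-probability candidate as the likely true source.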