Changes

From Genome Analysis Wiki
171 bytes added ,  14:54, 4 February 2010
no edit summary
Line 1: Line 1:  
== Validation Criteria ==
 
== Validation Criteria ==
 
=== Sequence Identifier Line ===
 
=== Sequence Identifier Line ===
{| class="wikitable" border="1"
+
{| class="wikitable" style="width:100%" border="1"
 
|-
 
|-
!  Validation Criteria
+
! width="50%"|Validation Criteria
!  Error Message
+
! width="50%"|Error Message
 
|-
 
|-
 
|  Line is at least 2 characters long ('@' and at least 1 for the sequence identifier)
 
|  Line is at least 2 characters long ('@' and at least 1 for the sequence identifier)
Line 20: Line 20:     
=== Raw Sequence Line ===
 
=== Raw Sequence Line ===
{| class="wikitable" border="1"
+
{| class="wikitable" style="width:100%" border="1"
 
|-
 
|-
!  Validation Criteria
+
! width="50%"|Validation Criteria
!  Error Message
+
! width="50%"|Error Message
 
|-
 
|-
 
|  A base sequence should have non-zero length.
 
|  A base sequence should have non-zero length.
Line 43: Line 43:     
=== Plus Line ===
 
=== Plus Line ===
{| class="wikitable" border="1"
+
{| class="wikitable" style="width:100%" border="1"
 
|-
 
|-
!  Validation Criteria
+
! width="50%"|Validation Criteria
!  Error Message
+
! width="50%"|Error Message
 
|-
 
|-
 
|  Must exist for every sequence.
 
|  Must exist for every sequence.
Line 56: Line 56:     
=== Quality String Line ===
 
=== Quality String Line ===
{| class="wikitable" border="1"
+
{| class="wikitable" style="width:100%" border="1"
 
|-
 
|-
!  Validation Criteria
+
! width="50%"|Validation Criteria
!  Error Message
+
! width="50%"|Error Message
 
|-
 
|-
 
|  A quality string should be present for every base sequence.
 
|  A quality string should be present for every base sequence.
Line 74: Line 74:  
*Base composition is reported and tracked by position.
 
*Base composition is reported and tracked by position.
 
*Consumes gzipped and uncompressed text files transparently (see libcsg/InputFile.h).
 
*Consumes gzipped and uncompressed text files transparently (see libcsg/InputFile.h).
  −
== Additional Wishlist - Not Implemented ==
  −
*To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name, rather than the complete name, in memory. Suggested options: -1 for one pass with high memory use (the default) and -2 for two passes with lower memory use.
  −
      
== Assumptions ==
 
== Assumptions ==
Line 86: Line 82:  
*All lines are part of the Raw Sequence Line until a line that starts with a '+' is discovered.
 
*All lines are part of the Raw Sequence Line until a line that starts with a '+' is discovered.
 
*All lines are considered part of the quality string until at least the length of the associated raw sequence is reached (or the end of the file is reached).  This is because '@' is a valid quality character, so a line starting with '@' does not necessarily indicate the start of a Sequence Identifier Line.
 
*All lines are considered part of the quality string until at least the length of the associated raw sequence is reached (or the end of the file is reached).  This is because '@' is a valid quality character, so a line starting with '@' does not necessarily indicate the start of a Sequence Identifier Line.
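
The two assumptions above can be illustrated with a minimal Python sketch of a record reader. This is not the fastQValidator implementation. The function name <code>read_fastq_records</code> and its error handling are hypothetical, and the sketch does not report the validation errors listed in the tables above. It only shows how a multi-line record is delimited: the raw sequence ends at the first line starting with '+', and the quality string ends once it is at least as long as the raw sequence.

```python
def read_fastq_records(lines):
    """Yield (identifier, sequence, quality) tuples from an iterable of lines.

    Hypothetical helper for illustration only; it applies the two
    assumptions from this section and nothing else.
    """
    it = iter(line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # this sketch simply skips blank lines between records
        if not header.startswith("@") or len(header) < 2:
            raise ValueError(f"bad sequence identifier line: {header!r}")

        # Assumption 1: every line belongs to the raw sequence until a
        # line that starts with '+' is found.
        seq_parts = []
        found_plus = False
        for line in it:
            if line.startswith("+"):
                found_plus = True
                break
            seq_parts.append(line)
        if not found_plus:
            raise ValueError(f"no plus line for record {header!r}")
        sequence = "".join(seq_parts)

        # Assumption 2: every line belongs to the quality string until it
        # is at least as long as the raw sequence, or the input ends.
        # A leading '@' is NOT treated as a new record here.
        qual_parts = []
        qual_len = 0
        while qual_len < len(sequence):
            line = next(it, None)
            if line is None:
                break  # end of file reached before enough quality characters
            qual_parts.append(line)
            qual_len += len(line)

        yield header[1:], sequence, "".join(qual_parts)


# The second quality line of r1 starts with '@' but is still quality data.
sample = ["@r1", "ACGT", "ACGT", "+", "IIII", "@III",
          "@r2", "GG", "+r2", "@@"]
for record in read_fastq_records(sample):
    print(record)
```

Counting quality characters rather than looking for the next '@' is what keeps the line <code>@III</code> inside the first record.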
 +
 +
== Additional Wishlist - Not Implemented ==
 +
*To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name, rather than the complete name, in memory. Suggested options: -1 for one pass with high memory use (the default) and -2 for two passes with lower memory use.
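
Since this item is not implemented, the following Python sketch is only one possible reading of it, not the validator's design. It assumes the goal is to detect repeated sequence names, that the "key" is a fixed-size hash of the name (BLAKE2b truncated to 8 bytes is an arbitrary choice), and that the input can be read twice. The names <code>name_key</code> and <code>find_duplicate_names</code> are hypothetical.

```python
import hashlib
from collections import Counter


def name_key(name):
    """Return a fixed-size 8-byte key for a sequence name (assumed hash choice)."""
    return hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()


def find_duplicate_names(open_names):
    """Return the set of sequence names that occur more than once.

    open_names is a zero-argument callable that returns a fresh iterator
    over all sequence names, so the input can be scanned twice.
    """
    # Pass 1: count keys only; no full sequence name is kept in memory.
    key_counts = Counter(name_key(name) for name in open_names())
    suspect_keys = {key for key, count in key_counts.items() if count > 1}
    del key_counts

    # Pass 2: store full names only for keys seen more than once, which
    # separates true duplicates from hash collisions.
    seen = set()
    duplicates = set()
    for name in open_names():
        if name_key(name) in suspect_keys:
            if name in seen:
                duplicates.add(name)
            else:
                seen.add(name)
    return duplicates


names = ["r1", "r2", "r3", "r2", "r4", "r1", "r1"]
print(sorted(find_duplicate_names(lambda: iter(names))))
```

The second pass exists because two different names can share a key. Without it, the first pass alone could only report suspected duplicates.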
    
== How to Use the fastQValidator Executable ==
 
== How to Use the fastQValidator Executable ==
