Changes

From Genome Analysis Wiki
151 bytes removed, 10:49, 2 February 2017
[[Category:C++]]
[[Category:libStatGen]]
[[Category:libStatGen FASTQ]]

== Where to find the fastqFile Library and the FastQValidator ==
The FASTQ library is now a part of [[C++ Library: libStatGen]].

The FastQValidator is documented at [[FastQValidator]].

== Validation Criteria ==
=== Sequence Identifier Line ===
{| class="wikitable" border="1"
|-
! Validation Criteria
! Error Message
|-
| Every entry in the file should have a unique identifier.
| ERROR on Line <current line #>: Repeated Sequence Identifier: <identifier> at Lines <previous line #> <current line #>
|}

=== Raw Sequence Line ===
*A base sequence should have non-zero length.
*Base sequences are validated against the characters allowed by the configured raw sequence type (the -b option):
** Base Only: A C T G N a c t g n
** Color Space Only: 0 1 2 3 .(period)
** Base or Color Space: A C T G N a c t g n 0 1 2 3 .(period)
*Reads should meet a minimum length (the -l option); many mappers run into trouble with very short reads.

=== Plus Line ===

=== Quality String Line ===
*A quality string should be present for every base sequence.
*Paired quality and base sequences should be of the same length.
*Valid quality values should all have ASCII codes &gt; 32.
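The unique-identifier check from the Sequence Identifier Line table can be sketched as follows. This is a minimal, hypothetical illustration, not the libStatGen implementation: the names <code>extractIdentifier</code> and <code>SequenceIdTracker</code> are made up, and it assumes the identifier is the text between the leading @ and the first space or tab, and that "previous line #" means the most recent earlier occurrence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper, not part of libStatGen.
// Assumption: the identifier is the text between the leading '@' and the first
// space or tab; whatever follows is treated as a comment.
// Returns an empty string if the line does not start with '@'.
std::string extractIdentifier(const std::string& idLine) {
    if (idLine.empty() || idLine[0] != '@') {
        return "";
    }
    std::size_t end = idLine.find_first_of(" \t", 1);
    if (end == std::string::npos) {
        return idLine.substr(1);
    }
    return idLine.substr(1, end - 1);
}

// Hypothetical tracker that flags repeated identifiers using the error format
// shown in the Sequence Identifier Line table.
class SequenceIdTracker {
public:
    // Records that `identifier` was seen on `lineNum`. Returns an empty string
    // for a first occurrence, otherwise an error naming the most recent
    // previous line and the current line.
    std::string check(const std::string& identifier, std::size_t lineNum) {
        auto it = lastSeen_.find(identifier);
        if (it == lastSeen_.end()) {
            lastSeen_.emplace(identifier, lineNum);
            return "";
        }
        std::size_t previousLine = it->second;
        it->second = lineNum;  // later repeats report this line as "previous"
        return "ERROR on Line " + std::to_string(lineNum) +
               ": Repeated Sequence Identifier: " + identifier +
               " at Lines " + std::to_string(previousLine) + " " +
               std::to_string(lineNum);
    }

private:
    // Identifier -> line on which it was most recently seen.
    std::unordered_map<std::string, std::size_t> lastSeen_;
};
```

Keeping a line number for every identifier is what lets the message name both lines, but it also means memory grows with the number of reads, which is the cost the two-pass wishlist item is aimed at.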
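The Raw Sequence Line and Quality String Line criteria can be illustrated with a similar hypothetical sketch; it is not the code inside FastQFile. The <code>RawSeqType</code> enum is an assumed stand-in for the B, C and BC values of the -b option, <code>minReadLen</code> plays the role of -l, and the problem strings are invented for the example rather than the messages the library prints.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative stand-in for the -b values B, C and BC.
enum class RawSeqType { Base, ColorSpace, BaseOrColorSpace };

// Returns true if c is allowed in a raw sequence of the given type.
bool isAllowedSeqChar(char c, RawSeqType type) {
    static const std::string kBases = "ACTGNactgn";
    static const std::string kColors = "0123.";
    bool isBase = kBases.find(c) != std::string::npos;
    bool isColor = kColors.find(c) != std::string::npos;
    switch (type) {
        case RawSeqType::Base: return isBase;
        case RawSeqType::ColorSpace: return isColor;
        case RawSeqType::BaseOrColorSpace: return isBase || isColor;
    }
    return false;
}

// Checks one record's raw sequence and quality string against the criteria
// listed above. Returns a list of problems; an empty list means it passed.
// The wording of the problems is hypothetical.
std::vector<std::string> checkRecord(const std::string& seq,
                                     const std::string& qual,
                                     RawSeqType type,
                                     std::size_t minReadLen) {
    std::vector<std::string> problems;
    if (seq.empty()) {
        problems.push_back("raw sequence is empty");
    } else if (seq.size() < minReadLen) {
        problems.push_back("raw sequence is shorter than the minimum read length");
    }
    for (std::size_t i = 0; i < seq.size(); ++i) {
        if (!isAllowedSeqChar(seq[i], type)) {
            problems.push_back("invalid sequence character at position " + std::to_string(i));
        }
    }
    if (qual.empty()) {
        problems.push_back("missing quality string");
    } else if (qual.size() != seq.size()) {
        problems.push_back("quality and sequence lengths differ");
    }
    for (std::size_t i = 0; i < qual.size(); ++i) {
        // Quality characters must have ASCII codes greater than 32.
        if (static_cast<unsigned char>(qual[i]) <= 32) {
            problems.push_back("invalid quality character at position " + std::to_string(i));
        }
    }
    return problems;
}
```

Collecting every problem for a record, rather than stopping at the first, mirrors how a validator can keep reporting errors until a cap such as maxReportedErrors is reached.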

== Additional Features ==
*Base composition is reported and tracked by position.
*Consumes gzipped and uncompressed text files transparently (see libcsg/InputFile.h).

== Additional Wishlist - Not Implemented ==
*To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name in memory, rather than the complete sequence names. Suggested options: -1 for one pass with high memory use, -2 for two passes with lower memory use; the default is -1.

== FASTQ Library Component for Reading and Validating FastQFiles ==
The software reads and validates FASTQ files in both compressed and uncompressed formats.

The FASTQ component of the library is found in libStatGen/fastq/.

See https://github.com/statgen/libStatGen/commits/master/fastq for a list of the most recent updates to the development version of the FASTQ portion of the library.

For the old change log, see: [[C++ Library: FASTQ Change Log]]
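The two-pass idea from the Additional Wishlist, which is listed there as not implemented, might look like the hypothetical sketch below. Pass 1 keeps only a <code>std::hash</code> key per sequence name and flags keys seen more than once; pass 2 keeps full names only for flagged keys, so a hash collision between two different names is not misreported as a duplicate. For brevity both passes iterate over an in-memory vector; a real tool would re-read the file for the second pass, which is the price paid for the lower memory use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical sketch of a two-pass duplicate-name check (not part of
// libStatGen). Returns each name that occurs more than once, in the order in
// which its second occurrence is found.
std::vector<std::string> findDuplicateNamesTwoPass(const std::vector<std::string>& names) {
    std::hash<std::string> hasher;

    // Pass 1: store only a fixed-size key per name. Keys seen twice are
    // candidates; they may be true duplicates or hash collisions.
    std::unordered_set<std::size_t> seenKeys;
    std::unordered_set<std::size_t> candidateKeys;
    for (const auto& name : names) {
        std::size_t key = hasher(name);
        if (!seenKeys.insert(key).second) {
            candidateKeys.insert(key);
        }
    }

    // Pass 2: keep full names only for candidate keys and count them exactly.
    std::unordered_map<std::string, int> counts;
    std::vector<std::string> duplicates;
    for (const auto& name : names) {
        if (candidateKeys.count(hasher(name)) == 0) {
            continue;  // key was unique in pass 1, so the name is unique
        }
        if (++counts[name] == 2) {
            duplicates.push_back(name);  // report each repeated name once
        }
    }
    return duplicates;
}
```

Pass 1 holds one machine word per read instead of the full name, and pass 2 only holds names whose keys collided, which is usually a small set.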

=== Classes in the FASTQ Portion of Library ===

{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
|-style="background: #f2f2f2; text-align: center;"
! Class Name !! Description
|-
| <code>[[C++ Class: FastQFile|FastQFile]]</code>
| Class used for reading/validating a fastq file.
|-
| <code>[http://csg.sph.umich.edu//mktrost/doxygen/current/classBaseCount.html BaseCount]</code>
| Wrapper around an array that has one index per base and an extra index for a total count of all bases. This class is used to keep a count of the number of times each index has occurred. It can print the percentage of each base relative to the total number of bases.
|-
| <code>[http://csg.sph.umich.edu//mktrost/doxygen/current/classBaseComposition.html BaseComposition]</code>
| Class that tracks the composition of bases by read location.
|-
| <code>[http://csg.sph.umich.edu//mktrost/doxygen/current/classFastQStatus.html FastQStatus]</code>
| Status for FastQ operations.
|}

== Assumptions ==

== How to Use the fastQValidator Executable ==
'''Required Parameters:'''
        -f : FastQ filename with path to be processed.

'''Optional Parameters:'''
        -l : Minimum allowed read length (Defaults to 10).
        -e : Maximum number of errors to display before suppressing them (Defaults to 20).
        -b : Raw sequence type:  B - ACTGN only (Default)
                                 C - 0123. only
                                 BC - ACTGN or 0123.

'''Testing only Parameters:'''
        -t : If "ReadOnly" is specified, the fastq will be read but not processed. This may be used for determining read time.

'''Usage:'''
        ./fastQValidator -f <fileName> -l <minReadLen> -e <maxReportedErrors> -b <rawSeqType>

'''Examples:'''
        ../fastQValidator -f testFile.txt
        ../fastQValidator -f testFile.txt -l 10 -b BC -e 100
        ./fastQValidator -f test/testFile.txt -l 10 -b BC -e 100
        time ./fastQValidator -f test/testFile.txt -t ReadOnly

== FastQ Validator Output ==
'''Coming Soon'''

== FASTQ Output ==
When a sequence is read, an error message is output for each failed [[C++ Class: FastQFile#Validation Criteria Used For Reading a Sequence|Validation Criteria]], up to the first maxReportedErrors errors.

For example:
 ERROR on Line 25: The sequence identifier line was too short.
 ERROR on Line 29: First line of a sequence does not begin with @
 ERROR on Line 33: No Sequence Identifier specified before the comment.

== FastQValidator ==
The [[FastQValidator]] was built using the FastQFile class. More details on that program are at the supplied link.
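Per-position base composition, as tracked by the BaseCount and BaseComposition classes in the Classes table, can be illustrated with the hypothetical class below. It is not the libStatGen API; it only mirrors the described layout of one counter per base plus a total, kept separately for each read position, and it assumes characters other than A, C, G, T and N (in either case) are simply skipped.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-position composition tracker (not the libStatGen API).
class PositionComposition {
public:
    // Counter layout per position: A, C, G, T, N, then the total.
    static constexpr std::size_t kNumBases = 5;
    static constexpr std::size_t kTotal = 5;

    // Adds every base of one read, counted by its position within the read.
    void addRead(const std::string& seq) {
        if (seq.size() > counts_.size()) {
            counts_.resize(seq.size(), std::array<unsigned long, kNumBases + 1>{});
        }
        for (std::size_t pos = 0; pos < seq.size(); ++pos) {
            int idx = baseIndex(seq[pos]);
            if (idx < 0) {
                continue;  // skip characters outside ACGTN
            }
            ++counts_[pos][static_cast<std::size_t>(idx)];
            ++counts_[pos][kTotal];
        }
    }

    // Percentage of `base` at read position `pos`; 0 if nothing was counted.
    double percent(std::size_t pos, char base) const {
        int idx = baseIndex(base);
        if (idx < 0 || pos >= counts_.size() || counts_[pos][kTotal] == 0) {
            return 0.0;
        }
        return 100.0 * counts_[pos][static_cast<std::size_t>(idx)] / counts_[pos][kTotal];
    }

    // Number of read positions seen so far (length of the longest read).
    std::size_t numPositions() const { return counts_.size(); }

private:
    // Maps a base (case-insensitive) to its counter index, or -1 if unknown.
    static int baseIndex(char c) {
        switch (std::toupper(static_cast<unsigned char>(c))) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            case 'N': return 4;
            default: return -1;
        }
    }

    std::vector<std::array<unsigned long, kNumBases + 1>> counts_;
};
```

Keeping the total in its own slot makes each percentage a single division, and growing the vector to the longest read handles files with variable read lengths.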