LibStatGen: FASTQ

From Genome Analysis Wiki

Where to find the FastQFile Library and the FastQValidator

The released versions of the FastQFile and FastQValidator code can be downloaded at: http://www.sph.umich.edu/csg/mktrost/fastQFile/

FastQFile Library for Reading and Validating FastQFiles

The software reads and validates fastq files in both compressed and uncompressed formats.

The library is found in pipeline/bam, and is called libfqf.a.

This library depends on two other libraries, libcsg/libcsg.a and samtools/libbam.a, so be sure to link all three in the following order:

<path to base pipeline directory>/fastQFile/libfqf.a <path to base pipeline directory>/libcsg/libcsg.a <path to base pipeline directory>/thirdParty/samtools/libbam.a

See C++ Library: libfqf Change Log for a list of the most recent updates to the development version of the library.

Classes in the FastQFile Library

Class Name       Description
FastQFile        Class used for reading/validating a fastq file.
BaseCount        Wrapper around an array that has one index per base plus an extra index for the total count of all bases. It keeps a count of how many times each base has occurred and can print each base's occurrence as a percentage of the total number of bases.
BaseComposition  Class that tracks the composition of bases by read location.
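
The counting scheme described for BaseCount and BaseComposition can be sketched in C++ as follows. This is a minimal illustration, not the library's code: the class names BaseTally and PositionComposition, their methods, and the A/C/G/T/N slot order are all assumptions made for the example.

```cpp
#include <array>
#include <cstddef>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical stand-in for the idea behind BaseCount (not the library class).
class BaseTally {
public:
    // Slots 0-4 hold A, C, G, T, N (assumed order); the extra slot holds the total.
    static constexpr std::size_t kNumBases = 5;
    static constexpr std::size_t kTotalIndex = kNumBases;

    // Map a base character to its slot; returns false for unknown characters.
    static bool indexOf(char base, std::size_t& index) {
        switch (base) {
            case 'A': case 'a': index = 0; return true;
            case 'C': case 'c': index = 1; return true;
            case 'G': case 'g': index = 2; return true;
            case 'T': case 't': index = 3; return true;
            case 'N': case 'n': index = 4; return true;
            default: return false;
        }
    }

    // Count one occurrence of a base and bump the total.
    bool add(char base) {
        std::size_t i = 0;
        if (!indexOf(base, i)) return false;
        ++counts_[i];
        ++counts_[kTotalIndex];
        return true;
    }

    unsigned long count(std::size_t index) const { return counts_.at(index); }
    unsigned long total() const { return counts_[kTotalIndex]; }

    // Percentage of one slot against the total (0 when nothing was counted).
    double percent(std::size_t index) const {
        if (total() == 0) return 0.0;
        return 100.0 * counts_.at(index) / total();
    }

    // Print the percentage of each base against the total number of bases.
    void printPercent(std::ostream& out) const {
        static const char kLabels[kNumBases] = {'A', 'C', 'G', 'T', 'N'};
        for (std::size_t i = 0; i < kNumBases; ++i) {
            out << kLabels[i] << ": " << percent(i) << "%\n";
        }
    }

private:
    std::array<unsigned long, kNumBases + 1> counts_{};
};

// Hypothetical analogue of BaseComposition: one tally per read position.
class PositionComposition {
public:
    // Add every base of a read to the tally for its position in the read.
    void addSequence(const std::string& seq) {
        if (seq.size() > perPosition_.size()) perPosition_.resize(seq.size());
        for (std::size_t i = 0; i < seq.size(); ++i) perPosition_[i].add(seq[i]);
    }

    const BaseTally& at(std::size_t pos) const { return perPosition_.at(pos); }
    std::size_t length() const { return perPosition_.size(); }

private:
    std::vector<BaseTally> perPosition_;
};
```

Keeping the total in the same array as the per-base counts means a percentage needs only two array reads, and storing one tally per read position lets reads of different lengths be added without knowing the maximum length in advance.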


Library Output

When a sequence is read, an error message is output for each failed Validation Criteria check, but only for the first maxReportedErrors errors. For example:

ERROR on Line 25: The sequence identifier line was too short.
ERROR on Line 29: First line of a sequence does not begin with @
ERROR on Line 33: No Sequence Identifier specified before the comment.


FastQValidator

The FastQValidator was built using the FastQFile class. More details on that program are at the supplied link.