Difference between revisions of "LibStatGen: FASTQ"

From Genome Analysis Wiki

Revision as of 11:47, 25 October 2010

Where to find the FastQFile Library and the FastQValidator

The released versions of the fastQFile and FastQValidator code can be downloaded from: http://www.sph.umich.edu/csg/mktrost/fastQFile/

FastQFile Library for Reading and Validating FastQFiles

The software reads and validates fastq files in both compressed and uncompressed formats.

The library is called libfqf.a and is found in the fastQFile directory of the pipeline.

This library depends on two other libraries, libcsg/libcsg.a and thirdParty/samtools/libbam.a, so be sure to link all three in the proper order:

<path to base pipeline directory>/fastQFile/libfqf.a <path to base pipeline directory>/libcsg/libcsg.a <path to base pipeline directory>/thirdParty/samtools/libbam.a

See C++ Library: libfqf Change Log for a list of the most recent updates to the development version of the library.

Classes in the FastQFile Library

Class Name | Description
FastQFile | Class used for reading and validating a fastq file.
BaseCount | Wrapper around an array that has one index per base and an extra index for the total count of all bases. This class keeps a count of the number of times each base has occurred. It can print the percentage of occurrences of each base against the total number of bases.
BaseComposition | Class that tracks the composition of bases by read location.
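The counting scheme described for BaseCount and BaseComposition can be illustrated with a small standalone sketch. This is not the library's code: the class names BaseCountSketch and BaseCompositionSketch, their member functions, and the choice of A, C, G, T, and N as the tracked bases are all assumptions made for illustration.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical layout: slots 0-4 hold A, C, G, T, N; slot 5 holds the total.
constexpr std::size_t kTotalSlot = 5;

// Illustrative stand-in for BaseCount (not the library's class).
class BaseCountSketch {
public:
    // Count one base; returns false and counts nothing for unknown characters.
    bool add(char base) {
        const int idx = indexOf(base);
        if (idx < 0) return false;
        ++counts_[static_cast<std::size_t>(idx)];
        ++counts_[kTotalSlot];
        return true;
    }

    std::size_t count(char base) const {
        const int idx = indexOf(base);
        return idx < 0 ? 0 : counts_[static_cast<std::size_t>(idx)];
    }

    std::size_t total() const { return counts_[kTotalSlot]; }

    // Percentage of `base` among all counted bases; 0 if nothing was counted.
    double percent(char base) const {
        if (total() == 0) return 0.0;
        return 100.0 * static_cast<double>(count(base)) /
               static_cast<double>(total());
    }

private:
    static int indexOf(char base) {
        switch (std::toupper(static_cast<unsigned char>(base))) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            case 'N': return 4;
            default:  return -1;
        }
    }

    std::array<std::size_t, kTotalSlot + 1> counts_{};
};

// Illustrative stand-in for BaseComposition: one BaseCountSketch per read position.
class BaseCompositionSketch {
public:
    void addRead(const std::string& sequence) {
        if (sequence.size() > byPosition_.size()) {
            byPosition_.resize(sequence.size());
        }
        for (std::size_t i = 0; i < sequence.size(); ++i) {
            byPosition_[i].add(sequence[i]);
        }
    }

    // Throws std::out_of_range if no read has reached `position`.
    const BaseCountSketch& at(std::size_t position) const {
        return byPosition_.at(position);
    }

    std::size_t length() const { return byPosition_.size(); }

private:
    std::vector<BaseCountSketch> byPosition_;
};
```

Keeping the total in an extra slot of the same array means a percentage can be computed from two array reads without summing the per-base counts each time.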


Library Output

When a sequence is read, an error message is output for each failed validation criterion, up to the first maxReportedErrors errors. For example:

ERROR on Line 25: The sequence identifier line was too short.
ERROR on Line 29: First line of a sequence does not begin with @
ERROR on Line 33: No Sequence Identifier specified before the comment.


FastQValidator

The FastQValidator was built using the FastQFile class. More details on that program are available at the supplied link.