Changes

From Genome Analysis Wiki
2,495 bytes added, 11:00, 2 February 2017
Line 1: Line 1: −
[[Category:C++|Software|FastQValidator]]
+
[[Category:C++]]
 +
[[Category:Software]]
 +
[[Category:LibStatGen FASTQ]]
 
= fastQValidator Overview =
 
= fastQValidator Overview =
    
The fastQValidator validates the format of fastq files.
 
The fastQValidator validates the format of fastq files.
   −
The initial version of a FASTQ Validator is complete. It was built using the [[FastQFile]] class which is part of the [[C++ Library: libStatGen|libStatGen]] library.
+
The initial version of a FASTQ Validator is complete. It was built using [[LibStatGen: FASTQ]] which is part of the [[C++ Library: libStatGen|libStatGen]] library.
    
Note: Because the FastQValidator checks for unique sequence names, it may use a large amount of memory. This check can be disabled by specifying the --disableSeqIDCheck option.
 
Note: Because the FastQValidator checks for unique sequence names, it may use a large amount of memory. This check can be disabled by specifying the --disableSeqIDCheck option.
Line 10: Line 12:  
== Where to find it ==
 
== Where to find it ==
 
This command line tool can be obtained via:
 
This command line tool can be obtained via:
* Release Download '''coming soon'''
+
* Release Download
 
* github: https://github.com/statgen/fastQValidator
 
* github: https://github.com/statgen/fastQValidator
 
** Current development version
 
** Current development version
Line 17: Line 19:     
=== Releases ===
 
=== Releases ===
Release downloads are '''Coming Soon'''.
+
If you prefer to run the last official release rather than the latest development version, you can download that here.
    +
There are two versions of the release: one that includes libStatGen and one that does not.  If you already have libStatGen installed and want to use your own copy, use the version that does not include libStatGen.
 +
 +
=== Full Release (includes libStatGen) ===
 +
 +
To install an official release, unpack the downloaded file (<code>tar xvf</code>), <code>cd</code> into the <code>fastQValidator_x.x.x</code> directory, and type <code>make all</code>.
 +
 +
 +
[[Media:fastQValidatorLibStatGen.0.1.1a.tgz|fastQValidatorLibStatGen.0.1.1a.tgz]] - Released 11/13/2012
 +
 +
'''fastQValidatorLibStatGen.0.1.1a Release Notes'''
 +
* Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.5]] (update for this version)
 +
* Contains: [[#Release of just fastQValidator (does not include libStatGen)|fastQValidator version 0.1.1]] (same as full release 0.1.1)
 +
 +
'''Older Releases'''
 +
* [[Media:fastQValidatorLibStatGen.0.1.1.tgz|fastQValidatorLibStatGen.0.1.1.tgz]] - Released 10/19/2012
 +
** Contains: [[LibStatGen Download#Official Releases|libStatGen version 1.0.3]]
 +
** Contains: [[#Release of just fastQValidator (does not include libStatGen)|fastQValidator version 0.1.1]]
 +
*** Validates a fastq file with options to print additional information.
 +
 +
=== Release of just fastQValidator (does not include libStatGen) ===
 +
 +
To install an official release, unpack the downloaded file (<code>tar xvf</code>), <code>cd</code> into the <code>fastQValidator_x.x.x</code> directory, and type <code>make all</code>.
 +
 +
[[Media:fastQValidator.0.1.1.tgz|fastQValidator.0.1.1.tgz]] - Released 10/19/2012
 +
 +
'''fastQValidator.0.1.1 Release Notes'''
 +
* Validates a fastq file with options to print additional information.
 +
* Adds option to output average qualities
 +
 +
 +
'''Older Releases'''
 +
*[[Media:fastQValidator.0.0.1.tgz|fastQValidator.0.0.1.tgz]]
 +
** Validates a fastq file with options to print additional information.
    
=== Using github ===
 
=== Using github ===
Line 53: Line 88:  
After obtaining the fastQValidator repository (either by download or from github), compile the code using <code>make all</code>.  This creates the executable, <code>fastQValidator</code>, in the <code>fastQValidator/bin/</code> directory, the debug executable in the <code>fastQValidator/bin/debug/</code> directory, and the profiling executable in the <code>fastQValidator/bin/profile/</code> directory.
 
After obtaining the fastQValidator repository (either by download or from github), compile the code using <code>make all</code>.  This creates the executable, <code>fastQValidator</code>, in the <code>fastQValidator/bin/</code> directory, the debug executable in the <code>fastQValidator/bin/debug/</code> directory, and the profiling executable in the <code>fastQValidator/bin/profile/</code> directory.
   −
 
+
'''NOTE:''' To compile fastQValidator, you need the [[C++ Library: libStatGen|libStatGen]] package, either installed or checked out from Git.
    
== Valid FastQ File Requirements  ==
 
== Valid FastQ File Requirements  ==
Line 80: Line 115:  
                               overwrites the printableErrors option.
 
                               overwrites the printableErrors option.
 
         --baseComposition    : Print the Base Composition Statistics.
 
         --baseComposition    : Print the Base Composition Statistics.
        --disableSeqIDCheck  : Disable the unique sequence identifier check.
+
         --avgQual            : Print the average phred quality per cycle & overall average quality.
 +
         --disableSeqIDCheck  : Disable the unique sequence identifier check.
 
                               Use this option to save memory since the sequence id
 
                               Use this option to save memory since the sequence id
 
                               check uses a lot of memory.
 
                               check uses a lot of memory.
Line 103: Line 139:  
* 0 - the fastq file is valid.
 
* 0 - the fastq file is valid.
 
* < 0 - invalid options specified.
 
* < 0 - invalid options specified.
* > 0 - fastq file did not validate successfully.  One of the [[C++ Class: FastQFile#Public Class Enums|FastQStatus]] failure values is returned.
+
* > 0 - fastq file did not validate successfully.  One of the [http://csg.sph.umich.edu//mktrost/doxygen/current/classFastQStatus.html FastQStatus] failure values is returned.
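
A calling script can branch on the three cases listed above. The following is a minimal Python sketch, not part of fastQValidator: the names <code>describe_exit_status</code> and <code>run_and_describe</code> are hypothetical, and the command to run is supplied by the caller. One caveat: POSIX systems report exit statuses in the range 0-255, so a negative return value may reach the caller as a large positive number, and Python's <code>subprocess</code> uses negative codes for termination by a signal.

```python
import subprocess


def describe_exit_status(code):
    """Map a return value to the three cases documented on this page."""
    if code == 0:
        return "valid"
    if code < 0:
        return "invalid options"
    # Positive values are FastQStatus failure values.
    return "validation failed (status %d)" % code


def run_and_describe(command):
    """Run a command given as an argument list and describe its exit status.

    The caller supplies the full command line; nothing about the
    validator's options is assumed here.
    """
    completed = subprocess.run(command)
    return describe_exit_status(completed.returncode)
```

For example, <code>describe_exit_status(0)</code> returns <code>"valid"</code>, and any positive value is reported as a validation failure together with its numeric status.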
 
      
== FastQ Validator Output ==
 
== FastQ Validator Output ==
Line 130: Line 165:  
         1    5.00  95.00    0.00    0.00    0.00 20
 
         1    5.00  95.00    0.00    0.00    0.00 20
 
         2    5.00    0.00    5.00  90.00    0.00 20
 
         2    5.00    0.00    5.00  90.00    0.00 20
 +
 +
Average Phred qualities by read index are printed if --avgQual is specified (in versions after May 29, 2012).  Only valid qualities are included in these averages. For example:
 +
<pre>
 +
Average Phred Quality by Read Index (starts at 0):
 +
Read Index Average Quality
 +
0 44.10
 +
1 45.55
 +
2 51.11
 +
3 47.68
 +
4 47.37
 +
 +
Overall Average Phred Quality = 50.40
 +
</pre>
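
To illustrate how averages like those above can be derived, here is a minimal Python sketch. It is not the fastQValidator implementation. It assumes Phred+33 encoded quality strings, treats every character as a valid quality, and computes the overall average over all bases; the function name <code>average_phred_by_index</code> is hypothetical.

```python
def average_phred_by_index(quality_strings, offset=33):
    """Return (per_index_averages, overall_average) for quality strings.

    offset=33 assumes Phred+33 encoding; this is an assumption of the
    sketch, as is treating every character as a valid quality.
    """
    sums = []    # sums[i] = total quality observed at read index i
    counts = []  # counts[i] = number of reads with a base at index i
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(sums):
                # First read long enough to reach this index.
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    per_index = [s / c for s, c in zip(sums, counts)]
    total = sum(counts)
    overall = sum(sums) / total if total else 0.0
    return per_index, overall
```

Reads of different lengths are handled by counting, at each index, only the reads that extend that far.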
      Line 151: Line 199:  
*Prints a summary of the total number of errors.
 
*Prints a summary of the total number of errors.
 
*Prints the total number of lines processed as well as the total number of sequences processed.  
 
*Prints the total number of lines processed as well as the total number of sequences processed.  
 +
* (May 29, 2012) Average Phred Quality can be reported by cycle & overall.
      Line 158: Line 207:     
*To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name (rather than complete sequence names) in memory (suggest a pair of options -1 -> one pass, high memory use, -2 -> two pass lower memory use, default is -1).
 
*To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name (rather than complete sequence names) in memory (suggest a pair of options -1 -> one pass, high memory use, -2 -> two pass lower memory use, default is -1).
*Report average read quality score.
   
*AutoDetect 64/33 illumina/standard quality scores.
 
*AutoDetect 64/33 illumina/standard quality scores.
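
The two-pass idea in the first item above can be sketched in Python as follows. This is one possible reading of the proposal, not an existing fastQValidator feature: the first pass keeps only a fixed-size key per sequence name, and the second pass stores full names only for keys that were seen more than once, which rules out key collisions. The truncated SHA-1 key and the names <code>name_key</code> and <code>find_duplicate_names</code> are assumptions made for illustration.

```python
import hashlib


def name_key(name):
    """Fixed-size 8-byte key for a sequence name (truncated SHA-1)."""
    return hashlib.sha1(name.encode("utf-8")).digest()[:8]


def find_duplicate_names(read_names):
    """Two-pass duplicate check.

    read_names is a callable that returns a fresh iterator over the
    sequence names each time it is called (one call per pass).
    """
    # Pass 1: keep only keys, and record keys that occur more than once.
    seen = set()
    suspect = set()
    for name in read_names():
        key = name_key(name)
        if key in seen:
            suspect.add(key)
        else:
            seen.add(key)

    # Pass 2: store full names only for suspect keys, so that two
    # different names sharing a key are not reported as duplicates.
    names_seen = set()
    duplicates = set()
    for name in read_names():
        if name_key(name) in suspect:
            if name in names_seen:
                duplicates.add(name)
            else:
                names_seen.add(name)
    return duplicates
```

Memory use in the first pass is bounded by the key size rather than the name length, at the cost of reading the input twice.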
      
== Discussion ==
 
== Discussion ==