Changes

From Genome Analysis Wiki

FastQValidator

60 bytes removed, 14:16, 22 February 2010
no edit summary
A valid fastQ file meets the validation criteria specified in [[FastQFile#Validation Criteria Used For Reading a Sequence|FastQ File Validation]].
 
 
== How to Use the fastQValidator Executable ==
'''Required Parameters:'''
--file : FastQ filename with path to be processed.
 
'''Optional Parameters:'''
--minReadLen : Minimum allowed read length (Defaults to 10).
--maxReportedErrors : Maximum number of errors to display before suppressing them (Defaults to 20).
--ignoreAllErrors : Ignore all errors (same as --maxReportedErrors 0); overrides the maxReportedErrors option.
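
The interaction between --maxReportedErrors and --ignoreAllErrors can be pictured with a small Python sketch. It only illustrates the documented behavior and is not the validator's actual code: the `ErrorReporter` class and its method names are hypothetical, and the assumption that suppressed errors still count toward the final total is inferred from the summary line in the output.

```python
class ErrorReporter:
    """Hypothetical helper: display at most max_reported errors, count all of them."""

    def __init__(self, max_reported=20, ignore_all=False):
        # --ignoreAllErrors is documented as equivalent to --maxReportedErrors 0.
        self.max_reported = 0 if ignore_all else max_reported
        self.total = 0   # every error seen, displayed or not (assumption)
        self.shown = []  # only the errors that were actually displayed

    def report(self, line_number, message):
        self.total += 1
        if len(self.shown) < self.max_reported:
            text = f"ERROR on Line {line_number}: {message}"
            self.shown.append(text)
            print(text)


# With a limit of 2, the third error is counted but not displayed.
reporter = ErrorReporter(max_reported=2)
for line_number in (25, 29, 33):
    reporter.report(line_number, "example message")
print(f"There were a total of {reporter.total} errors.")
```

Under these assumptions, passing `ignore_all=True` suppresses every message while the total is still tracked.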
 
'''Optional Space Options for Raw Sequence (Last one specified is used):'''
--autoDetect : Determine baseSpace/colorSpace from the Raw Sequence in the file (Default).
--baseSpace : ACTGN only
--colorSpace : 0123. only
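
As a rough illustration of the two alphabets, the following Python sketch classifies a raw sequence by the characters it contains. The function name `classify_sequence` is hypothetical, and the sketch assumes a case-sensitive check over the whole sequence; the validator's own auto-detection may inspect less of the sequence or apply additional rules.

```python
BASE_CHARS = set("ACTGN")    # characters accepted by --baseSpace
COLOR_CHARS = set("0123.")   # characters accepted by --colorSpace


def classify_sequence(raw_sequence):
    """Return 'base', 'color', 'invalid', or 'empty' for a raw sequence string."""
    chars = set(raw_sequence)
    if not chars:
        return "empty"
    if chars <= BASE_CHARS:
        return "base"
    if chars <= COLOR_CHARS:
        return "color"
    # Mixed alphabets or any other character fit neither space.
    return "invalid"


for example in ("ACGTN", "0123.", "AC01"):
    print(example, classify_sequence(example))
```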
 
'''Usage:'''
./fastQValidator --file <fileName> [--minReadLen <minReadLen>] [--maxReportedErrors <maxReportedErrors>|--ignoreAllErrors] [--baseSpace|--colorSpace|--autoDetect]
 
'''Examples:'''
../fastQValidator --file testFile.txt
../fastQValidator --file testFile.txt --minReadLen 10 --baseSpace --maxReportedErrors 100
./fastQValidator --file test/testFile.txt --minReadLen 10 --colorSpace --ignoreAllErrors
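
When the validator is driven from a script, the command lines above can be assembled programmatically. The Python sketch below uses only the flags documented in this section; the helper name `build_command` and its keyword arguments are hypothetical, and the default executable path is an assumption.

```python
def build_command(file_name, min_read_len=None, max_reported_errors=None,
                  ignore_all_errors=False, space=None,
                  executable="./fastQValidator"):
    """Build an argument list for fastQValidator from the documented options."""
    if space not in (None, "baseSpace", "colorSpace", "autoDetect"):
        raise ValueError(f"unknown space option: {space}")
    cmd = [executable, "--file", file_name]
    if min_read_len is not None:
        cmd += ["--minReadLen", str(min_read_len)]
    # The usage line lists these two options as alternatives.
    if ignore_all_errors:
        cmd.append("--ignoreAllErrors")
    elif max_reported_errors is not None:
        cmd += ["--maxReportedErrors", str(max_reported_errors)]
    if space is not None:
        cmd.append("--" + space)
    return cmd


print(" ".join(build_command("testFile.txt", min_read_len=10,
                             max_reported_errors=100, space="baseSpace")))
```

The resulting list can be handed to `subprocess.run` from the Python standard library to launch the executable.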
 
 
== FastQ Validator Output ==
When running the fastQValidator Executable, the output starts with a summary of the parameters:
 
The following parameters are in effect:
 
Input Parameters
--file [testFile.txt], --minReadLen [10]
Space Type : --baseSpace [ON], --colorSpace, --autoDetect
Errors : --ignoreAllErrors, --maxReportedErrors [100]
 
The Validator Executable outputs error messages for invalid sequences based on [[FastQFile#Validation Criteria Used For Reading a Sequence|Validation Criteria]].
For Example:
ERROR on Line 25: The sequence identifier line was too short.
ERROR on Line 29: First line of a sequence does not begin with @
ERROR on Line 33: No Sequence Identifier specified before the comment.
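
The three example messages all concern the sequence identifier line. The Python sketch below shows one way such checks could be ordered. The exact conditions belong to the linked Validation Criteria and are not reproduced here, so the tests in `check_identifier_line` (a hypothetical name) are guesses based on the wording of the messages and on the usual `@identifier comment` layout of a FastQ identifier line.

```python
def check_identifier_line(line):
    """Return an error message for a sequence identifier line, or None if it passes."""
    if len(line) < 2:
        # Assumed meaning: there is no room for '@' plus at least one character.
        return "The sequence identifier line was too short."
    if not line.startswith("@"):
        return "First line of a sequence does not begin with @"
    if line[1] == " ":
        # Assumed meaning: the comment starts immediately after '@'.
        return "No Sequence Identifier specified before the comment."
    return None


for line_number, line in ((25, "@"), (29, "seq1 comment"), (33, "@ comment")):
    message = check_identifier_line(line)
    if message is not None:
        print(f"ERROR on Line {line_number}: {message}")
```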
 
Base Composition Percentages by Index:
 
Base Composition Statistics:
Read Index      %A      %C      %G      %T      %N  Total Reads At Index
         0  100.00    0.00    0.00    0.00    0.00  20
         1    5.00   95.00    0.00    0.00    0.00  20
         2    5.00    0.00    5.00   90.00    0.00  20
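
The statistics above can be reproduced with a short Python sketch. The function `base_composition` is hypothetical, and the 20 reads are invented so that the percentages match the sample table; they are not the contents of testFile.txt. The sketch also assumes that every character at an index counts toward the total for that index.

```python
from collections import Counter


def base_composition(sequences, bases="ACGTN"):
    """Return one (index, {base: percent}, reads_at_index) row per read index."""
    counts = []  # counts[i] tallies the characters seen at read index i
    for seq in sequences:
        for i, base in enumerate(seq):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    rows = []
    for i, counter in enumerate(counts):
        total = sum(counter.values())
        percents = {b: 100.0 * counter[b] / total for b in bases}
        rows.append((i, percents, total))
    return rows


# Invented reads chosen to match the sample percentages.
reads = ["ACT"] * 18 + ["AAG", "ACA"]
rows = base_composition(reads)
for index, percents, total in rows:
    cells = " ".join(f"{percents[b]:7.2f}" for b in "ACGTN")
    print(f"{index:>10} {cells}  {total}")
```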
 
 
Summary of the number of lines, sequences, and errors:
Finished processing testFile.txt with 92 lines containing 20 sequences.
There were a total of 17 errors.
 
== Additional Features ==
* It may be useful to report 2 types of information to the user: ERROR (critical failure) and WARNING (tolerable errors).
 
 
 
== How to Use the fastQValidator Executable ==
'''Required Parameters:'''
-f : FastQ filename with path to be processed.
 
'''Optional Parameters:'''
-l : Minimum allowed read length (Defaults to 10).
-e : Maximum number of errors to display before suppressing them (Defaults to 20).
-b : Raw sequence type: "A"/"C"/"G"/"T"/"N" - Bases only;
"0"/"1"/"2"/"3"/"." - Color space only;
"" - Base Decision on the first Raw Sequence Character (Default)
All other characters - Bases & Color space
 
'''Testing only Parameters:'''
-t : If "ReadOnly" is specified, the fastq will be read but not processed. This may be used for determining read time.
'''Usage:'''
./fastQValidator -f <fileName> -l <minReadLen> -e <maxReportedErrors> -b <rawSeqType>
 
'''Examples:'''
../fastQValidator -f testFile.txt
../fastQValidator -f testFile.txt -l 10 -b A -e 100
./fastQValidator -f test/testFile.txt -l 10 -b Z -e 100
time ./fastQValidator -f test/testFile.txt -t ReadOnly
 
 
== FastQ Validator Output ==
When running the fastQValidator Executable, the output starts with a summary of the parameters:
The following parameters are in effect:
FastQ File Name : testFile.txt (-fname)
Min Read Length : 10 (-l9999)
Max Reported Errors : 100 (-e9999)
BaseType : A (-bname)
TestMode : (-tname)
 
Both the Executable and the Library output the following:
*Error messages, up to the configured maximum number of errors:
ERROR on Line 25: The sequence identifier line was too short.
ERROR on Line 29: First line of a sequence does not begin with @
ERROR on Line 33: No Sequence Identifier specified before the comment.
*Base Composition Percentages by Index:
 
Base Composition Statistics:
Read Index      %A      %C      %G      %T      %N  Total Reads At Index
         0  100.00    0.00    0.00    0.00    0.00  20
         1    5.00   95.00    0.00    0.00    0.00  20
         2    5.00    0.00    5.00   90.00    0.00  20
*Summary of the number of lines, sequences, and errors:
Finished processing testFile.txt with 92 lines containing 20 sequences.
There were a total of 17 errors.
