Changes

From Genome Analysis Wiki
411 bytes added, 14:33, 17 November 2010
no edit summary
Line 6: Line 6:  
This command line tool can be downloaded as part of the library: http://genome.sph.umich.edu/wiki/Software#Download
   −
Note: Since the FastQValidator checks for unique sequence names, it may use a large amount of memory.
+
Note: Since the FastQValidator checks for unique sequence names, it may use a large amount of memory. This check can be disabled by specifying the --disableSeqIDCheck option.
    
== Valid FastQ File Requirements  ==
Line 33: Line 33:  
                               overwrites the printableErrors option.
 
         --baseComposition    : Print the Base Composition Statistics.
 +
         --disableSeqIDCheck  : Disable the unique sequence identifier check.
 +
                               Use this option to save memory since the sequence id
 +
                               check uses a lot of memory.
 +
                               Does not affect the printing of Base Composition Statistics.
 
         --quiet              : Suppresses the display of errors and summary statistics.
 
                               Does not affect the printing of Base Composition Statistics.
Line 42: Line 46:     
=== Usage ===
        ./fastQValidator --file <fileName> [--minReadLen <minReadLen>] [--maxErrors <numErrors>] [--printableErrors <printableErrors>|--ignoreErrors] [--baseSpace|--colorSpace|--auto] [--baseComposition] [--quiet]
+
        ./fastQValidator --file <fileName> [--minReadLen <minReadLen>] [--maxErrors <numErrors>] [--printableErrors <printableErrors>|--ignoreErrors] [--baseComposition] [--disableSeqIDCheck] [--quiet] [--baseSpace|--colorSpace|--auto] [--params]
    
=== Examples ===
Line 56: Line 60:     
== FastQ Validator Output ==
When running the fastQValidator Executable, the output starts with a summary of the parameters:
+
When the fastQValidator executable is run with the --params option, the output starts with a summary of the parameters:
   −
  The following parameters are in effect:
+
  The following parameters are available.  Ones with "[]" are in effect:
    
  Input Parameters
  --file [testFile.txt], --baseComposition [ON], --quiet, --minReadLen [10],
+
  --file [../fastqValidator/test/testFile.txt], --baseComposition,
 +
                --disableSeqIDCheck, --quiet, --params [ON], --minReadLen [10],
 
                 --maxErrors [-1]
   Space Type : --baseSpace [ON], --colorSpace, --auto
+
   Space Type : --baseSpace, --colorSpace, --auto [ON]
       Errors : --ignoreErrors, --printableErrors [100]
+
       Errors : --ignoreErrors, --printableErrors [20]
    
The Validator Executable outputs error messages for invalid sequences based on [[C++ Class: FastQFile#Validation Criteria Used For Reading a Sequence|Validation Criteria]].  For Example:
Line 105: Line 110:  
There are a series of optional capabilities a FastQ Validator could implement. Among those:  
   −
*Add option to disable the unique sequence name validation so it does not store all the sequence names.
   
*To reduce memory usage, implement a two-pass algorithm that stores only a key for each sequence name (rather than complete sequence names) in memory (suggest a pair of options -1 -> one pass, high memory use, -2 -> two pass lower memory use, default is -1).
 
*Report average read quality score.
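
The two-pass idea in the list above can be illustrated with a short Python sketch. This is not the FastQValidator implementation (the tool is part of a C++ library, and the feature is only proposed here); it is one possible reading of "stores only a key for each sequence name". The function names <code>name_key</code> and <code>find_duplicate_names</code>, the use of a BLAKE2b hash as the key, and the <code>key_bytes</code> parameter are all assumptions made for illustration. Pass one keeps only a fixed-size key per name and records keys that occur more than once; pass two keeps full names only for those suspect keys, so hash collisions cannot produce false duplicates.

```python
import hashlib


def name_key(name, key_bytes=8):
    """Return a fixed-size integer key for a sequence name (hypothetical helper)."""
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=key_bytes).digest()
    return int.from_bytes(digest, "big")


def find_duplicate_names(open_names, key_bytes=8):
    """Two-pass duplicate check over sequence names.

    open_names is a callable that returns a fresh iterator over the
    sequence names each time it is called (the input is read twice).
    """
    # Pass 1: remember only the key of each name, and note which keys repeat.
    seen_keys = set()
    suspect_keys = set()
    for name in open_names():
        key = name_key(name, key_bytes)
        if key in seen_keys:
            suspect_keys.add(key)
        else:
            seen_keys.add(key)
    del seen_keys  # the full key set is no longer needed

    # Pass 2: store complete names only for suspect keys, which separates
    # true duplicates from names that merely share a key.
    seen_names = set()
    duplicates = set()
    for name in open_names():
        if name_key(name, key_bytes) in suspect_keys:
            if name in seen_names:
                duplicates.add(name)
            else:
                seen_names.add(name)
    return duplicates


if __name__ == "__main__":
    names = ["read1", "read2", "read1", "read3"]
    print(find_duplicate_names(lambda: iter(names)))
```

The trade-off is the one the list item describes: the input is read twice, but complete names are held in memory only for the (usually small) set of suspect keys. A Python set of integers is not itself compact, so actual memory savings would depend on using a compact key table in a real implementation.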
