Changes

FastQValidator

Revision as of 14:33, 17 November 2010

411 bytes added, 14:33, 17 November 2010

no edit summary

Line 6 (old) / Line 6 (new):

This command-line tool can be downloaded as part of the library: http://genome.sph.umich.edu/wiki/Software#Download

− Note: Since the FastQValidator checks for unique sequence names, it may use a large amount of memory.

+ Note: Since the FastQValidator checks for unique sequence names, it may use a large amount of memory; this check can be disabled by specifying the --disableSeqIDCheck option.

== Valid FastQ File Requirements ==

Line 33 (old) / Line 33 (new):

overrides the printableErrors option.

--baseComposition : Print the Base Composition Statistics.

+ --disableSeqIDCheck : Disable the unique sequence identifier check.

+ Use this option to save memory, since the sequence ID

+ check uses a lot of memory.

+ Does not affect the printing of Base Composition Statistics.

--quiet : Suppresses the display of errors and summary statistics.

Does not affect the printing of Base Composition Statistics.
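The option descriptions above imply a simple rule for what gets printed. This hypothetical Python model (`OutputOptions` and `report_sections` are illustrative names, not the tool's code) encodes only what is documented here: --quiet hides errors and summary statistics, while the Base Composition Statistics are still printed when --baseComposition is given.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutputOptions:
    quiet: bool = False             # models --quiet
    base_composition: bool = False  # models --baseComposition


def report_sections(opts: OutputOptions) -> List[str]:
    """Names of the report sections that would be printed (hypothetical model)."""
    sections = []
    if not opts.quiet:
        # --quiet suppresses errors and summary statistics...
        sections += ["errors", "summary statistics"]
    if opts.base_composition:
        # ...but does not affect the Base Composition Statistics.
        sections.append("base composition statistics")
    return sections


print(report_sections(OutputOptions(quiet=True, base_composition=True)))
# ['base composition statistics']
```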

Line 42 (old) / Line 46 (new):

=== Usage ===

− ./fastQValidator --file <fileName> [--minReadLen <minReadLen>] [--maxErrors <numErrors>] [--printableErrors <printableErrors>|--ignoreErrors] [--baseSpace|--colorSpace|--auto] [--baseComposition] [--quiet]

+ ./fastQValidator --file <fileName> [--minReadLen <minReadLen>] [--maxErrors <numErrors>] [--printableErrors <printableErrors>|--ignoreErrors] [--baseComposition] [--disableSeqIDCheck] [--quiet] [--baseSpace|--colorSpace|--auto] [--params]

=== Examples ===

Line 56 (old) / Line 60 (new):

== FastQ Validator Output ==

− When running the fastQValidator Executable, the output starts with a summary of the parameters:

+ When running the fastQValidator Executable, if the --params option is specified, the output starts with a summary of the parameters:

− The following parameters are in effect:

+ The following parameters are available. Ones with "[]" are in effect:

Input Parameters

− --file [testFile.txt], --baseComposition [ON], --quiet, --minReadLen [10],

+ --file [../fastqValidator/test/testFile.txt], --baseComposition,

+ --disableSeqIDCheck, --quiet, --params [ON], --minReadLen [10],

--maxErrors [-1]

− Space Type : --baseSpace [ON], --colorSpace, --auto

+ Space Type : --baseSpace, --colorSpace, --auto [ON]

− Errors : --ignoreErrors, --printableErrors [100]

+ Errors : --ignoreErrors, --printableErrors [20]

The Validator Executable outputs error messages for invalid sequences based on [[C++ Class: FastQFile#Validation Criteria Used For Reading a Sequence|Validation Criteria]]. For example:

Line 105 (old) / Line 110 (new):

There are several optional capabilities a FastQ Validator could implement, among them:

− *Add an option to disable the unique sequence name validation so it does not store all the sequence names.

*To reduce memory usage, implement a two-pass algorithm that keeps only a key for each sequence name in memory, rather than the complete names (suggested options: -1 for one pass with high memory use, -2 for two passes with lower memory use; default -1).

*Report average read quality score.
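The two-pass idea in the list above can be sketched in Python as follows. This is one possible reading of the proposal, not an existing FastQValidator feature. Pass 1 stores only an 8-byte hash (the "key") per name. Pass 2 re-reads the names and keeps full strings only for keys that occurred more than once, so a hash collision is never reported as a duplicate. The helper names and the choice of BLAKE2b as the key function are assumptions.

```python
import hashlib
from collections import Counter
from typing import Callable, Iterable, Set


def _key(name: str) -> int:
    # Hypothetical key function: an 8-byte BLAKE2b digest of the name.
    return int.from_bytes(hashlib.blake2b(name.encode(), digest_size=8).digest(), "big")


def two_pass_duplicates(read_names: Callable[[], Iterable[str]]) -> Set[str]:
    """Return names that occur more than once, storing full names only for suspects.

    read_names is called once per pass; in practice it would reopen the FASTQ file.
    """
    # Pass 1: count keys only; full names are never stored here.
    counts = Counter(_key(name) for name in read_names())
    suspect_keys = {k for k, c in counts.items() if c > 1}

    # Pass 2: keep full names only when their key was seen more than once,
    # so true duplicates are confirmed and hash collisions are filtered out.
    seen: Set[str] = set()
    duplicates: Set[str] = set()
    for name in read_names():
        if _key(name) in suspect_keys:
            if name in seen:
                duplicates.add(name)
            seen.add(name)
    return duplicates


names = ["r1", "r2", "r1", "r3", "r2"]
print(sorted(two_pass_duplicates(lambda: names)))  # ['r1', 'r2']
```

In Python the per-key saving is partly offset by object overhead; the design pays off most in a language that can hold the raw 64-bit keys in a compact array.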

Edited by Mktrost (Administrators, 3,045 edits)
