FASTA

From Genome Analysis Wiki
Revision as of 19:39, 23 March 2010 by Goncalo (talk | contribs)

FASTA is a simple text format for storing DNA sequences.

A FASTA file can store one or more DNA sequences. Each record in a FASTA file begins with a one-line header consisting of a > character (which must be the first character on the line), a sequence label, and optional commentary. The header line is followed by the sequence itself, which can wrap over multiple lines as needed. Typically, each line has about 50 characters, and it is recommended that every line in a sequence (except possibly the last) have the same length, because this facilitates indexing. Nearly all programs that support the FASTA format recognize A, C, T, G and N as valid characters in the sequence. Many also recognize IUPAC codes.
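To illustrate why equal line lengths help with indexing, here is a minimal Python sketch. It assumes ASCII text, a single-byte newline, and a known offset for the first base of the record. The function names `wrap_record` and `base_offset` are hypothetical and do not come from any particular tool. With a fixed line width, the position of any base in the file can be computed arithmetically instead of by scanning the file.

```python
def wrap_record(label, sequence, width=50):
    """Format one FASTA record; every sequence line except possibly
    the last holds exactly `width` bases."""
    lines = [">" + label]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def base_offset(seq_start, width, pos, newline_len=1):
    """Offset in the file of the 0-based base `pos`, given the offset
    `seq_start` of the first base of the record (assumption: every full
    line has `width` bases followed by `newline_len` newline bytes)."""
    full_lines, within_line = divmod(pos, width)
    return seq_start + full_lines * (width + newline_len) + within_line


sequence = "ACTGACTGAC" * 12          # 120 bases
record = wrap_record("sequenceName", sequence)
seq_start = record.index("\n") + 1    # first base follows the header line

# Jump straight to base 75 without reading the bases before it.
assert record[base_offset(seq_start, 50, 75)] == sequence[75]
```

If line lengths varied within a record, this arithmetic would no longer hold and a reader would have to scan every line to locate a given base.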

Example

 >sequenceName Comments about the sequence len=120
 ACTGACTGACACTGACTGACACTGACTGACACTGACTGACACTGACTGAC
 ACTGACTGACACTGACTGACACTGACTGACACTGACTGACACTGACTGAC
 ACTGACTGACACTGACTGAC
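
Records like the one above can be read with a few lines of code. The following Python sketch is one possible reader, not a reference implementation: the names `parse_fasta` and `is_valid_sequence` are hypothetical, blank lines are skipped, and accepting lower-case letters is an assumption rather than something the format description above requires.

```python
IUPAC_CODES = set("ACGTRYSWKMBDHVN")  # A, C, G, T, N plus IUPAC ambiguity codes


def parse_fasta(lines):
    """Yield (label, comment, sequence) for each record in an iterable of lines."""
    label, comment, chunks = None, "", []
    for raw in lines:
        line = raw.rstrip()            # keep leading characters: '>' must come first
        if not line:
            continue                   # assumption: blank lines are ignored
        if line.startswith(">"):
            if label is not None:
                yield label, comment, "".join(chunks)
            header = line[1:].split(None, 1)
            label = header[0] if header else ""
            comment = header[1] if len(header) > 1 else ""
            chunks = []
        elif label is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if label is not None:
        yield label, comment, "".join(chunks)


def is_valid_sequence(sequence):
    """True if every character is a recognized code (case-insensitive here)."""
    return set(sequence.upper()) <= IUPAC_CODES


example = """>sequenceName Comments about the sequence len=120
ACTGACTGACACTGACTGACACTGACTGACACTGACTGACACTGACTGAC
ACTGACTGACACTGACTGACACTGACTGACACTGACTGACACTGACTGAC
ACTGACTGACACTGACTGAC
"""
records = list(parse_fasta(example.splitlines()))
label, comment, sequence = records[0]
assert label == "sequenceName" and len(sequence) == 120
```

Because the wrapped lines are simply joined, the reader works regardless of the line width used when the file was written.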