FASTA

From Genome Analysis Wiki

A simple text format for storing DNA sequences.

A FASTA file can store one or more DNA sequences. Each record in a FASTA file begins with a one-line header: a > character (which must be the first character on the line), followed by a sequence label and optional commentary. The header line is followed by the sequence itself, which can wrap over multiple lines as needed. Typically, each line has about 50 characters, and it is recommended that every line in a sequence have the same length (the last line may be shorter) to facilitate indexing. Nearly all programs that support FASTA format recognize A, C, T, G and N as valid characters in the sequence. Many also recognize IUPAC ambiguity codes, such as R for A or G.

Example

 >sequenceName Comments about the sequence len=120
 ACTGACTGACACTGACTGACACTGACTGACACTGACTGACACTGACTGAC
 ACTGACTGACACTGACTGACACTGACTGACACTGACTGACACTGACTGAC
 ACTGACTGACACTGACTGAC
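
The record layout described above can be read and written with a few lines of code. The following is a minimal Python sketch, not a reference implementation: the function names parse_fasta and format_fasta are hypothetical, blank lines are skipped as an assumption, and the first whitespace-separated token of the header is treated as the sequence label with the remainder as commentary. Sequence characters are not validated.

```python
def parse_fasta(lines):
    """Yield (label, comment, sequence) tuples from an iterable of text lines.

    Hypothetical helper for illustration; it does not validate sequence characters.
    """
    label = None
    comment = ""
    chunks = []
    for raw in lines:
        if raw.startswith(">"):
            # A new header closes the previous record, if there was one.
            if label is not None:
                yield label, comment, "".join(chunks)
            header = raw[1:].strip().split(None, 1)
            label = header[0] if header else ""
            comment = header[1] if len(header) > 1 else ""
            chunks = []
            continue
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if label is None:
            raise ValueError("sequence data found before any '>' header line")
        chunks.append(line)  # wrapped sequence lines are joined back together
    if label is not None:
        yield label, comment, "".join(chunks)


def format_fasta(label, sequence, comment="", width=50):
    """Return one FASTA record as text, wrapping the sequence at a fixed width."""
    header = ">" + label + (" " + comment if comment else "")
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body) + "\n"


# The example record from this page, built from its repeating 10-base unit.
example_lines = [
    ">sequenceName Comments about the sequence len=120",
    "ACTGACTGAC" * 5,
    "ACTGACTGAC" * 5,
    "ACTGACTGAC" * 2,
]
records = list(parse_fasta(example_lines))
label, comment, sequence = records[0]
round_trip = format_fasta(label, sequence, comment)
```

Writing every line at the same width, as format_fasta does, keeps the file suitable for the kind of indexing mentioned above, because the byte offset of any base can be computed from the line length.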