Difference between revisions of "FASTA"

From Genome Analysis Wiki
 

Latest revision as of 19:39, 23 March 2010

A simple text format for storing DNA sequences.

A FASTA file can store one or more DNA sequences. Each record in a FASTA file begins with a one-line header: a > character (which must be the first character in the line), followed by a sequence label and optional commentary. This header line is followed by a sequence that can wrap over multiple lines, as needed. Typically, each line has about 50 characters, and it is recommended that every line in a sequence have the same length, because this facilitates indexing. Nearly all programs that support the FASTA format recognize A, C, G, T and N as valid characters in the sequence. Many also recognize IUPAC codes.
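To illustrate why equal line lengths help with indexing: when every sequence line in a record holds the same number of bases, the file offset of any base can be computed directly instead of scanning the record. The sketch below is a minimal illustration, not the method of any particular tool. The function name and its parameters are hypothetical, and it assumes one byte per base and a fixed newline length.

```python
def sequence_byte_offset(seq_start, position, bases_per_line, newline_len=1):
    """Return the file offset of the base at 0-based `position`.

    Assumptions (hypothetical helper, for illustration only):
    - seq_start is the offset of the first base, i.e. just after the header line
    - every sequence line except possibly the last holds bases_per_line bases
    - each line ends with newline_len bytes (1 for "\n", 2 for "\r\n")
    """
    # Number of complete lines before the base, and its column in its own line.
    full_lines, column = divmod(position, bases_per_line)
    return seq_start + full_lines * (bases_per_line + newline_len) + column
```

With lines of unequal length this arithmetic no longer holds, and a program would have to read the record line by line to locate a base.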

Example

 >sequenceName Comments about the sequence len=120
 ACTGACTGACACTGACTGACACTGACTGACACTGACTGACACTGACTGAC
 ACTGACTGACACTGACTGACACTGACTGACACTGACTGACACTGACTGAC
 ACTGACTGACACTGACTGAC
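
A record like the one above could be read with a few lines of Python. This is a minimal sketch under the rules described on this page (the > must be the first character of the header line, the label is the first word after it, and the sequence may wrap over several lines). The function name parse_fasta is hypothetical, and the sketch does not check which characters appear in the sequence.

```python
def parse_fasta(lines):
    """Yield (label, comment, sequence) tuples from an iterable of text lines.

    Hypothetical helper for illustration; no validation of sequence characters.
    """
    label = comment = None
    chunks = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if label is not None:
                yield label, comment, "".join(chunks)
            header = line[1:].split(None, 1)
            label = header[0] if header else ""
            comment = header[1] if len(header) > 1 else ""
            chunks = []
        elif label is not None and line.strip():
            # Sequence lines are joined back into one string.
            chunks.append(line.strip())
    if label is not None:
        yield label, comment, "".join(chunks)
```

Because it accepts any iterable of lines, the function can be given an open file object directly, for example `for label, comment, seq in parse_fasta(open(path)):` where path is the name of a FASTA file.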