GotCloud: Alignment Pipeline

From Genome Analysis Wiki
Revision as of 16:11, 5 November 2012 by Mktrost

The Mapping Pipeline takes FASTQ files and generates recalibrated BAM files from them.

Input Data:

  • Raw Sequence (FASTQ) files
  • Sequence Index file
  • Reference files
  • (Optional) Configuration file to override default options

Raw Sequence (FASTQ) files

These are the FASTQ files that will be aligned to the reference and written out as BAM files.


Sequence Index File

This file specifies the FASTQ files that need to be processed and the Read Group information for them.

The Sequence Index is a tab-delimited file that starts with a header line. The columns may be in any order.

Following the header line, there is one line per single-end FASTQ file and one line per paired-end FASTQ pair (only 1 line per pair).

Required Column Names:

  • MERGE_NAME - base name for the resulting BAM file for the sample (used to group multiple fastqs or fastq pairs into a single BAM)
  • FASTQ1 - name of the FASTQ file, or of the first file in the pair if paired-end (only 1 line per pair)
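
The layout described above can be sketched in Python with the standard csv module. This is only an illustration, not GotCloud code: the helper name group_fastqs_by_merge_name and the file names in the example are made up. Columns are looked up by header name, so their order does not matter, and FASTQ1 entries that share a MERGE_NAME are collected together.

```python
import csv
import io

REQUIRED_COLUMNS = ("MERGE_NAME", "FASTQ1")


def group_fastqs_by_merge_name(index_text):
    """Group FASTQ1 entries by MERGE_NAME (hypothetical helper, not GotCloud code)."""
    # DictReader keys each row by header name, so column order is irrelevant.
    reader = csv.DictReader(io.StringIO(index_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing required column(s): " + ", ".join(missing))

    groups = {}
    for row in reader:
        # All lines with the same MERGE_NAME belong to the same output BAM.
        groups.setdefault(row["MERGE_NAME"], []).append(row["FASTQ1"])
    return groups


# Made-up example; the columns are deliberately not in the order listed above.
example_index = (
    "FASTQ1\tMERGE_NAME\n"
    "sampleA_run1.fastq\tsampleA\n"
    "sampleA_run2.fastq\tsampleA\n"
    "sampleB_run1.fastq\tsampleB\n"
)
groups = group_fastqs_by_merge_name(example_index)
```

Here groups maps each MERGE_NAME to the list of FASTQ1 files that would be combined into that sample's BAM.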

Optional Column Names:

  • FASTQ2 - name of the second FASTQ file for paired-end reads. Specify '.' if the column exists but this line is single-end.
  • RGID - Read Group ID for this entry
  • SAMPLE - Sample Name for this entry
  • LIBRARY - Library for this entry
  • CENTER - Center Name for this entry
  • PLATFORM - Platform for this entry

The RGID, SAMPLE, LIBRARY, CENTER, and PLATFORM columns are used to populate the Read Group information for this entry. These columns are optional: either leave the column header out of the file, or specify '.' if the column header exists but the value is N/A. As long as RGID is specified, the non-N/A fields are added to the BAM file.
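
The N/A rule can be illustrated with a short Python sketch. The tags ID, SM, LB, CN, and PL are standard SAM @RG tags, but the assumption that the pipeline maps the index columns to them in exactly this way is ours, and the helper names is_na and read_group_line are hypothetical.

```python
# Assumed mapping from index columns to SAM @RG tags. The tags are standard
# SAM read-group tags; that the pipeline uses this mapping is an assumption.
COLUMN_TO_RG_TAG = {
    "RGID": "ID",
    "SAMPLE": "SM",
    "LIBRARY": "LB",
    "CENTER": "CN",
    "PLATFORM": "PL",
}


def is_na(value):
    """A value is N/A when its column is absent (None) or holds '.'."""
    return value is None or value == "."


def read_group_line(row):
    """Return an @RG header line for one index row, or None without an RGID."""
    if is_na(row.get("RGID")):
        return None
    parts = ["@RG"]
    for column, tag in COLUMN_TO_RG_TAG.items():
        value = row.get(column)
        if not is_na(value):
            # Only non-N/A fields are written out.
            parts.append(f"{tag}:{value}")
    return "\t".join(parts)


# Made-up row: LIBRARY is '.', CENTER is absent, so both are skipped.
row = {"MERGE_NAME": "sampleA", "FASTQ1": "a_1.fastq", "FASTQ2": ".",
       "RGID": "rg1", "SAMPLE": "sampleA", "LIBRARY": ".", "PLATFORM": "ILLUMINA"}
rg_line = read_group_line(row)
```

For this row, rg_line contains the ID, SM, and PL tags only; a row whose RGID is '.' or missing yields None.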

Reference Files

The following Reference Files are required:

  • BWA Indexed Reference Files
    • Configuration Name: BWA_REF
    • Specify the basename for the BWA Indexed Reference File
    • Need .bwt, .ann, .amb,
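
A basename such as BWA_REF can be checked before a run with a few lines of Python. This is a sketch under stated assumptions: it tests only the extensions named in the list above (the set a given BWA version needs may be longer), and the helper name missing_index_files and the basename ref.fa are made up for the demonstration.

```python
import os
import tempfile

# Only the extensions named in the text above; the full set a given BWA
# version needs may be longer, so treat this tuple as an assumption.
LISTED_EXTENSIONS = (".bwt", ".ann", ".amb")


def missing_index_files(bwa_ref, extensions=LISTED_EXTENSIONS):
    """Return the basename+extension paths that do not exist on disk."""
    return [bwa_ref + ext for ext in extensions
            if not os.path.isfile(bwa_ref + ext)]


# Demonstration in a throwaway directory with a made-up basename.
with tempfile.TemporaryDirectory() as tmp:
    bwa_ref = os.path.join(tmp, "ref.fa")
    for ext in (".bwt", ".ann"):
        open(bwa_ref + ext, "w").close()  # create empty placeholder files
    missing = [os.path.basename(p) for p in missing_index_files(bwa_ref)]
```

In the demonstration only ref.fa.amb is reported as missing, because the other two placeholder files were created.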

Configuration File

Running the Mapping Pipeline

cd ~/myseq
/usr/local/biopipe/bin/gen_biopipeline.pl -out aligner -index ???