Sequence Analysis Practice 2011/03/09

From Genome Analysis Wiki
Revision as of 17:15, 9 March 2011 by Hmkang

Overview

The steps below form a practice pipeline: mapping FASTQ files to BAM files, calling variants, and running various quality checks.

Steps

0. Setting up environment variables

setenv BIN /home/hyun/wed/bin
setenv IN /home/hyun/wed/input
setenv REF /home/hyun/wed/ref

setenv OUT ~/seq/wednesday/output
mkdir -p ${OUT}
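The setenv lines above are tcsh/csh syntax. If you work in bash or another POSIX shell instead (an assumption about your setup, not part of the original exercise), the same variables can be set with export. The paths below are copied from the lines above.

```shell
# bash/sh equivalent of the tcsh setenv lines above (same paths)
export BIN=/home/hyun/wed/bin
export IN=/home/hyun/wed/input
export REF=/home/hyun/wed/ref

export OUT="$HOME/seq/wednesday/output"
mkdir -p "$OUT"   # -p: create missing parent directories; no error if it exists
```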

1. Understanding FASTQ format

zcat ${IN}/NA12878.exon.sample.read1.fastq.gz | head
zcat ${IN}/NA12878.exon.sample.read2.fastq.gz | head
zcat ${IN}/NA12878.exon.sample.unpaired.fastq.gz | head

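Each command prints the first 10 lines of a file. A FASTQ record spans four lines: the read name (starting with @), the bases, a + separator line, and one quality character per base. If you page through a file with less instead of head, press q to quit.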
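To make the record layout concrete, here is a small sketch written as a POSIX sh/bash function around awk (tcsh has no shell functions, so run it under bash or sh). It prints each read's name, length, and mean base quality. The function name fastq_summary is made up for illustration, and it assumes Phred+33 quality encoding, which may not hold for older Illumina files.

```shell
# Summarize FASTQ records on stdin, one output line per read:
#   read name <TAB> read length <TAB> mean base quality
# Assumes the standard 4-line record layout and Phred+33 quality encoding.
fastq_summary() {
  awk '
    BEGIN { for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c }
    NR % 4 == 1 { name = substr($1, 2) }   # line 1: "@" followed by the read name
    NR % 4 == 2 { seq = $0 }               # line 2: the bases
    NR % 4 == 0 {                          # line 4: one quality character per base
      total = 0
      for (i = 1; i <= length($0); i++) total += ord[substr($0, i, 1)] - 33
      printf "%s\t%d\t%.1f\n", name, length(seq), total / length($0)
    }'
}

# On the practice data (uncomment to run):
# zcat ${IN}/NA12878.exon.sample.read1.fastq.gz | head -8 | fastq_summary
printf '@read1/1\nACGT\n+\nIIII\n@read2/1\nACGTA\n+\n#####\n' | fastq_summary
# prints "read1/1 4 40.0" and "read2/1 5 2.0" (tab-separated)
```

Quality character I is ASCII 73, i.e. Q40 under Phred+33, and # is ASCII 35, i.e. Q2.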

2. Align using BWA

${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa ${IN}/NA12878.exon.sample.read1.fastq.gz > ${OUT}/NA12878.exon.sample.read1.fastq.gz.sai

${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa ${IN}/NA12878.exon.sample.read2.fastq.gz > ${OUT}/NA12878.exon.sample.read2.fastq.gz.sai

${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa ${IN}/NA12878.exon.sample.unpaired.fastq.gz > ${OUT}/NA12878.exon.sample.unpaired.fastq.gz.sai
${BIN}/bwa samse ${REF}/human_g1k_v37_chr20.fa ${OUT}/NA12878.exon.sample.unpaired.fastq.gz.sai ${IN}/NA12878.exon.sample.unpaired.fastq.gz | ${BIN}/samtools-hybrid view -uhS - | ${BIN}/samtools-hybrid sort -m 10000000 - ${OUT}/NA12878.exon.sample.unpaired.bwa.sorted

${BIN}/bwa sampe ${REF}/human_g1k_v37_chr20.fa ${OUT}/NA12878.exon.sample.read1.fastq.gz.sai ${OUT}/NA12878.exon.sample.read2.fastq.gz.sai ${IN}/NA12878.exon.sample.read1.fastq.gz ${IN}/NA12878.exon.sample.read2.fastq.gz | ${BIN}/samtools-hybrid view -uhS - | ${BIN}/samtools-hybrid sort -m 10000000 - ${OUT}/NA12878.exon.sample.paired.bwa.sorted
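The -q 15 option passed to bwa aln above controls quality trimming. The bwa manual describes the rule as: if the quality q_l of the last base is below INT, trim the read down to the length x that maximizes the sum of (INT - q_i) over positions i = x+1..l. The hypothetical trimmed_length function below implements only that documented formula in sh/awk, assuming Phred+33 qualities; the actual bwa code may apply further conditions, so treat it as an illustration rather than a reimplementation.

```shell
# Sketch of the read-trimming rule described in the bwa manual for "aln -q INT":
# if the last base's quality q_l is below INT, the read is cut to x bases,
# where x maximizes sum_{i=x+1}^{l} (INT - q_i). Phred+33 qualities assumed.
# Real bwa may apply extra constraints (e.g. a minimum read length).
trimmed_length() {   # usage: trimmed_length INT QUALITY_STRING
  T="$1" Q="$2" awk '
    BEGIN {
      for (c = 33; c < 127; c++) ord[sprintf("%c", c)] = c
      t = ENVIRON["T"] + 0; q = ENVIRON["Q"]; l = length(q)
      if (ord[substr(q, l, 1)] - 33 >= t) { print l; exit }   # q_l >= INT: keep all
      s = 0; best = 0; x = l
      for (i = l; i >= 1; i--) {            # s = sum of (INT - q_j) for j = i..l
        s += t - (ord[substr(q, i, 1)] - 33)
        if (s > best) { best = s; x = i - 1 }
      }
      print x
    }'
}

trimmed_length 15 'IIIIII####'   # six Q40 bases, then four Q2 bases: prints 6
```

The quality string is passed through the environment rather than awk -v, so backslashes in it are not treated as escape sequences.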

3. Merge multiple BAMs into one

${BIN}/samtools-hybrid merge ${OUT}/NA12878.exon.sample.merged.bam ${OUT}/NA12878.exon.sample.paired.bwa.sorted.bam  ${OUT}/NA12878.exon.sample.unpaired.bwa.sorted.bam
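samtools merge expects inputs that are already coordinate-sorted, as the two sorted BAMs from step 2 are, and interleaves their records rather than sorting from scratch. As a loose text-only analogy (not how samtools works internally), coreutils sort -m performs the same kind of merge on already-sorted lines. The sketch below uses made-up SAM lines truncated to four columns and a hypothetical helper merge_sorted_sam; samtools merge also combines the headers and orders references by their @SQ header order, whereas this sketch compares reference names as plain text.

```shell
# Text-only analogy for merging coordinate-sorted alignments: sort -m merges
# inputs that are each already sorted, without re-sorting them. Records are
# SAM body lines cut to the first four columns (QNAME FLAG RNAME POS),
# keyed on RNAME (column 3), then numeric POS (column 4).
merge_sorted_sam() {   # usage: merge_sorted_sam A.sam B.sam
  LC_ALL=C sort -m -t "$(printf '\t')" -k3,3 -k4,4n "$1" "$2"
}

tmp=$(mktemp -d)
printf 'p1\t99\t20\t100\np2\t163\t20\t300\n' > "$tmp/paired.sam"
printf 'u1\t0\t20\t200\nu2\t16\t20\t400\n' > "$tmp/unpaired.sam"
merge_sorted_sam "$tmp/paired.sam" "$tmp/unpaired.sam"   # order: p1 u1 p2 u2
rm -r "$tmp"
```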


4. View SAM/BAM format

${BIN}/samtools-hybrid view -h ${OUT}/NA12878.exon.sample.merged.bam | head -5
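In the output, header lines start with @ and each alignment is one tab-separated line with 11 mandatory fields: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, and QUAL. FLAG is a bit field; for example, bit 0x4 means the read is unmapped. The hypothetical count_mapped function below is a sh/awk sketch that tallies mapped and unmapped records from samtools view text output by testing that bit. It counts alignment lines, not distinct reads, so a read with secondary alignments would be counted more than once.

```shell
# Count mapped and unmapped records in SAM text on stdin.
# Header lines (starting with "@") are skipped. Bit 0x4 of FLAG (column 2)
# marks an unmapped read; int(flag / 4) % 2 extracts that bit without
# relying on gawk-only bitwise functions.
count_mapped() {
  awk 'BEGIN { FS = "\t" }
    /^@/ { next }
    { if (int($2 / 4) % 2) unmapped++; else mapped++ }
    END { printf "mapped\t%d\nunmapped\t%d\n", mapped, unmapped }'
}

# On the practice data (uncomment; requires the merged BAM from step 3):
# ${BIN}/samtools-hybrid view ${OUT}/NA12878.exon.sample.merged.bam | count_mapped
printf '@HD\tVN:1.0\nr1\t99\t20\t100\nr2\t4\t*\t0\nr3\t16\t20\t500\n' | count_mapped
# prints "mapped 2" and "unmapped 1" (tab-separated)
```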


5. Mark duplicate reads

${BIN}/superDeDuper -i ${OUT}/NA12878.exon.sample.merged.bam \
 -o ${OUT}/NA12878.exon.sample.deduped.bam -v
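The exact criteria superDeDuper uses are not described on this page. A common idea behind duplicate marking is that reads aligned to the same reference position and strand are likely PCR copies of one DNA fragment, so all but one get FLAG bit 0x400. The hypothetical mark_dups_simple function below is a deliberately simplified single-end sketch of that idea in sh/awk: it keys on RNAME, POS, and strand, keeps the first record it sees, and ignores mates, clipping, and base qualities, which real duplicate markers typically take into account.

```shell
# Simplified duplicate marking on SAM text from stdin (illustration only).
# Key = RNAME, POS, strand (FLAG bit 0x10). The first mapped record with a
# given key is kept; later ones get the duplicate bit 0x400 (1024) added.
# Unmapped reads (bit 0x4) and header lines pass through unchanged.
mark_dups_simple() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }                                # header lines unchanged
    {
      flag = $2
      if (int(flag / 4) % 2 == 0) {                     # mapped (0x4 not set)
        key = $3 SUBSEP $4 SUBSEP (int(flag / 16) % 2)  # RNAME, POS, strand (0x10)
        if (key in seen) {
          if (int(flag / 1024) % 2 == 0) $2 = flag + 1024   # set 0x400 once
        } else {
          seen[key] = 1
        }
      }
      print
    }'
}

printf 'r1\t0\t20\t100\nr2\t0\t20\t100\nr3\t16\t20\t100\nr4\t4\t*\t0\n' | mark_dups_simple
# FLAGs become 0, 1024, 16, 4: only r2 repeats the position and strand of r1
```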

6. Visualize alignment to reference genome

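${BIN}/samtools-hybrid index ${OUT}/NA12878.exon.sample.deduped.bam
${BIN}/samtools-hybrid tview ${OUT}/NA12878.exon.sample.deduped.bam \
   ${REF}/human_g1k_v37_chr20.fa

tview reads the BAM index (.bai), so index the deduplicated BAM first. Inside tview, press ? for help and q to quit.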