Sequence Analysis Practice 2011/03/09
From Genome Analysis Wiki
Overview
The steps below walk through a practice session: mapping FASTQ files to BAM files, calling variants, and running various quality checks.
Steps
0. Setting up environment variables
setenv BIN /home/hyun/wed/bin
setenv IN /home/hyun/wed/input
setenv REF /home/hyun/wed/ref
setenv OUT ~/seq/wednesday/output
mkdir -p ${OUT}
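The setenv lines above are csh/tcsh syntax. For readers working in sh or bash, a roughly equivalent setup might look like the sketch below. The paths are the ones given on this page; the directory check at the end is an added convenience and not part of the original exercise.

```shell
# sh/bash equivalent of the csh setenv lines (same paths as in the text).
export BIN=/home/hyun/wed/bin
export IN=/home/hyun/wed/input
export REF=/home/hyun/wed/ref
export OUT="$HOME/seq/wednesday/output"
mkdir -p "$OUT"

# Added convenience (not in the original): warn about any input directory
# that is not visible from this machine.
for d in "$BIN" "$IN" "$REF"; do
    [ -d "$d" ] || echo "warning: $d not found" >&2
done
```

The later commands refer to these variables as ${BIN}, ${IN}, ${REF}, and ${OUT}, which works the same way in both shell families.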
1. Understanding FASTQ format
zcat ${IN}/NA12878.exon.sample.read1.fastq.gz | head
zcat ${IN}/NA12878.exon.sample.read2.fastq.gz | head
zcat ${IN}/NA12878.exon.sample.unpaired.fastq.gz | head
head prints only the first 10 lines and then exits. If you page through a file with less instead, press q to quit.
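A FASTQ file stores each read as a four-line record: a name line starting with @, the bases, a + separator line, and one quality character per base. As a minimal sketch of that layout, the sh snippet below builds a tiny gzipped FASTQ from two made-up reads (not taken from NA12878) and summarizes each record with awk. The file name toy.fastq.gz and the read names are hypothetical.

```shell
# Build a tiny gzipped FASTQ with two made-up reads in a scratch directory.
tmp=$(mktemp -d)
printf '@read1\nACGTACGT\n+\nIIIIHHHH\n@read2\nGGCCTTAA\n+\nFFFFEEEE\n' \
    | gzip -c > "$tmp/toy.fastq.gz"

# Each record is 4 lines: @name, bases, '+' separator, per-base qualities.
summary=$(gzip -dc "$tmp/toy.fastq.gz" | awk '
    NR % 4 == 1 { name = substr($0, 2) }
    NR % 4 == 2 { bases = $0 }
    NR % 4 == 0 { printf "%s length=%d quals=%d\n", name, length(bases), length($0) }')
echo "$summary"

# Count reads: total number of lines divided by 4.
n_reads=$(gzip -dc "$tmp/toy.fastq.gz" | awk 'END { print NR / 4 }')
echo "reads: $n_reads"

rm -rf "$tmp"
```

The same awk commands can be pointed at the real files by replacing the toy path with one of the ${IN} files, assuming they follow the plain four-line layout.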
2. Align using BWA
${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa \
    ${IN}/NA12878.exon.sample.read1.fastq.gz \
    > ${OUT}/NA12878.exon.sample.read1.fastq.gz.sai
${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa \
    ${IN}/NA12878.exon.sample.read2.fastq.gz \
    > ${OUT}/NA12878.exon.sample.read2.fastq.gz.sai
${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa \
    ${IN}/NA12878.exon.sample.unpaired.fastq.gz \
    > ${OUT}/NA12878.exon.sample.unpaired.fastq.gz.sai
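The -q 15 option is the read-trimming quality threshold of bwa aln. As a rough illustration of what a threshold of 15 means, the sketch below decodes a made-up quality string into Phred scores and counts the bases that fall below 15. It assumes Sanger-style Phred+33 encoding, which this page does not state, and it does not reproduce BWA's actual trimming rule, which works inward from the 3' end of the read.

```shell
# Hypothetical quality string; assumes Phred+33 (Sanger) encoding.
quals='IIIIII5+#'
threshold=15

# Decode each character to a Phred score (ASCII code minus 33).
scores=$(printf '%s\n' "$quals" | awk '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
    {
        out = ""; sep = ""
        for (j = 1; j <= length($0); j++) {
            out = out sep (ord[substr($0, j, 1)] - 33)
            sep = " "
        }
        print out
    }')
echo "scores: $scores"

# Count bases whose score falls below the threshold
# ($scores is left unquoted on purpose so each score lands on its own line).
low=$(printf '%s\n' $scores | awk -v t="$threshold" '$1 < t { n++ } END { print n + 0 }')
echo "bases below Q$threshold: $low"
```

In this toy string the low-quality characters sit at the end of the read, which is the typical pattern that quality trimming is meant to handle.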
${BIN}/bwa samse ${REF}/human_g1k_v37_chr20.fa \
    ${OUT}/NA12878.exon.sample.unpaired.fastq.gz.sai \
    ${IN}/NA12878.exon.sample.unpaired.fastq.gz \
    | ${BIN}/samtools-hybrid view -uhS - \
    | ${BIN}/samtools-hybrid sort -m 10000000 \
    - ${OUT}/NA12878.exon.sample.unpaired.bwa.sorted

${BIN}/bwa sampe ${REF}/human_g1k_v37_chr20.fa \
    ${OUT}/NA12878.exon.sample.read1.fastq.gz.sai \
    ${OUT}/NA12878.exon.sample.read2.fastq.gz.sai \
    ${IN}/NA12878.exon.sample.read1.fastq.gz \
    ${IN}/NA12878.exon.sample.read2.fastq.gz \
    | ${BIN}/samtools-hybrid view -uhS - \
    | ${BIN}/samtools-hybrid sort -m 10000000 \
    - ${OUT}/NA12878.exon.sample.paired.bwa.sorted
3. Merge multiple BAMs into one
${BIN}/samtools-hybrid merge ${OUT}/NA12878.exon.sample.merged.bam \
    ${OUT}/NA12878.exon.sample.paired.bwa.sorted.bam \
    ${OUT}/NA12878.exon.sample.unpaired.bwa.sorted.bam
4. View SAM/BAM format
${BIN}/samtools-hybrid view -h ${OUT}/NA12878.exon.sample.merged.bam | head -5
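The output consists of header lines starting with @, followed by one tab-separated line per read. As a rough guide to reading the first few columns, the sketch below parses one made-up alignment line (not taken from NA12878) and decodes a few FLAG bits with plain awk arithmetic. The column order and bit values follow the SAM specification; the record itself is hypothetical.

```shell
# One made-up SAM record with the 11 mandatory columns:
# QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
record=$(printf 'read1\t83\t20\t100000\t60\t8M\t=\t99900\t-108\tACGTACGT\tIIIIHHHH')

decoded=$(printf '%s\n' "$record" | awk -F '\t' '
    # Return 1 if the given power-of-two bit is set in flag, else 0.
    function bit(flag, value) { return int(flag / value) % 2 }
    {
        printf "name=%s chrom=%s pos=%s mapq=%s cigar=%s", $1, $3, $4, $5, $6
        printf " paired=%d reverse=%d duplicate=%d\n", bit($2, 1), bit($2, 16), bit($2, 1024)
    }')
echo "$decoded"
```

FLAG 83 is 1 + 2 + 16 + 64, that is: paired, properly paired, reverse strand, and first read of the pair.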
5. Mark Duplicate Reads
${BIN}/superDeDuper -i ${OUT}/NA12878.exon.sample.merged.bam \
    -o ${OUT}/NA12878.exon.sample.deduped.bam -v
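This page does not describe how superDeDuper decides which reads are duplicates. As a simplified illustration of the general idea only, the sketch below treats made-up reads as duplicates when they share chromosome, start position, and strand. It keeps the first one seen and sets the 0x400 (1024) duplicate bit in the FLAG of the rest. Real tools typically also consider mate position, clipping, and base quality when choosing which read to keep.

```shell
# Made-up reads: name, FLAG, chromosome, position (tab-separated, coordinate-sorted).
reads=$(printf 'r1\t0\t20\t1000\nr2\t0\t20\t1000\nr3\t16\t20\t1000\nr4\t0\t20\t2000\n')

marked=$(printf '%s\n' "$reads" | awk -F '\t' -v OFS='\t' '
    {
        strand = int($2 / 16) % 2          # 1 if the reverse-strand bit (16) is set
        key = $3 ":" $4 ":" strand
        if (key in seen) $2 = $2 + 1024    # set the duplicate bit on later copies
        seen[key] = 1
        print
    }')
printf '%s\n' "$marked"

# Count the reads that now carry the duplicate bit.
n_dups=$(printf '%s\n' "$marked" \
    | awk -F '\t' 'int($2 / 1024) % 2 == 1 { n++ } END { print n + 0 }')
echo "duplicates marked: $n_dups"
```

Here r2 is marked because it matches r1, while r3 is kept because it maps to the opposite strand.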
6. Visualize alignment to reference genome
${BIN}/samtools-hybrid tview ${OUT}/NA12878.exon.sample.deduped.bam \
    ${REF}/human_g1k_v37_chr20.fa