Sequence Analysis Practice 2011/03/09
From Genome Analysis Wiki
Overview
The steps below walk through a practice session: mapping FASTQ files to BAM files, performing variant calling, and running various quality checks.
Steps
0. Set up environment variables
setenv BIN /home/hyun/wed/bin
setenv IN /home/hyun/wed/input
setenv REF /home/hyun/wed/ref
setenv OUT ~/seq/wednesday/output
mkdir -p ${OUT}
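The setenv lines above are csh/tcsh syntax. For readers working in a Bourne-style shell (sh, bash), a rough equivalent might look like the sketch below. It assumes the same directory layout as the page; the warning on a failed mkdir is an addition for illustration, not part of the original setup.

```shell
# Bourne-shell (sh/bash) sketch of the csh setenv lines; paths are taken from the page.
export BIN=/home/hyun/wed/bin
export IN=/home/hyun/wed/input
export REF=/home/hyun/wed/ref
export OUT="$HOME/seq/wednesday/output"

# Create the output directory; warn instead of aborting if that is not possible.
mkdir -p "${OUT}" || echo "warning: could not create ${OUT}" >&2
```

Every later command on this page refers to these four variables, so they need to be set in the same shell session that runs the remaining steps.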
1. Understanding FASTQ format
zcat ${IN}/NA12878.exon.sample.read1.fastq.gz | head
zcat ${IN}/NA12878.exon.sample.read2.fastq.gz | head
zcat ${IN}/NA12878.exon.sample.unpaired.fastq.gz | head
press q to quit
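A FASTQ file stores each read as four lines: an identifier starting with '@', the base sequence, a separator line starting with '+', and a quality string with one character per base. The sketch below illustrates that layout on a made-up two-read file (the read names and bases are invented for illustration, not taken from the NA12878 sample) and uses awk to count reads and report read lengths. It is written for a Bourne-style shell rather than csh.

```shell
# Build a toy FASTQ file with two invented reads (four lines per read).
tmp_fastq=$(mktemp)
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGCA\n+\n#####\n' > "$tmp_fastq"

# Each record is exactly four lines, so the read count is the line count / 4.
n_reads=$(awk 'END { print NR / 4 }' "$tmp_fastq")

# The sequence is the 2nd line of every 4-line record.
lengths=$(awk 'NR % 4 == 2 { print length($0) }' "$tmp_fastq" | tr '\n' ' ')

echo "reads: $n_reads"
echo "lengths: $lengths"
rm -f "$tmp_fastq"
```

The same awk expressions could be applied to the real files by replacing the toy file with the output of zcat, for example `zcat ${IN}/NA12878.exon.sample.read1.fastq.gz | awk 'END { print NR / 4 }'`.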
2. Align using BWA
${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa ${IN}/NA12878.exon.sample.read1.fastq.gz > ${OUT}/NA12878.exon.sample.read1.fastq.gz.sai
${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa ${IN}/NA12878.exon.sample.read2.fastq.gz > ${OUT}/NA12878.exon.sample.read2.fastq.gz.sai
${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa ${IN}/NA12878.exon.sample.unpaired.fastq.gz > ${OUT}/NA12878.exon.sample.unpaired.fastq.gz.sai
${BIN}/bwa samse ${REF}/human_g1k_v37_chr20.fa ${OUT}/NA12878.exon.sample.unpaired.fastq.gz.sai ${IN}/NA12878.exon.sample.unpaired.fastq.gz | ${BIN}/samtools-hybrid view -uhS - | ${BIN}/samtools-hybrid sort -m 10000000 - ${OUT}/NA12878.exon.sample.unpaired.bwa.sorted
${BIN}/bwa sampe ${REF}/human_g1k_v37_chr20.fa ${OUT}/NA12878.exon.sample.read1.fastq.gz.sai ${OUT}/NA12878.exon.sample.read2.fastq.gz.sai ${IN}/NA12878.exon.sample.read1.fastq.gz ${IN}/NA12878.exon.sample.read2.fastq.gz | ${BIN}/samtools-hybrid view -uhS - | ${BIN}/samtools-hybrid sort -m 10000000 - ${OUT}/NA12878.exon.sample.paired.bwa.sorted
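The -q 15 option passed to bwa aln above is a base-quality threshold used for read trimming. To give a feel for what a quality of 15 means, the sketch below decodes FASTQ quality characters into Phred scores and converts a score into an error probability. It assumes the Phred+33 (Sanger) encoding, which this page does not state explicitly, and the helper name phred is hypothetical.

```shell
# Decode one quality character into a Phred score.
# Assumption: Phred+33 (Sanger) encoding, i.e. score = ASCII code - 33.
phred() {
    ascii=$(printf '%d' "'$1")   # a leading quote makes printf print the character code
    echo $(( ascii - 33 ))
}

phred I    # 'I' is ASCII 73, so this prints 40
phred 0    # '0' is ASCII 48, so this prints 15

# Error probability for a Phred score Q is 10^(-Q/10).
err_q15=$(awk -v q=15 'BEGIN { printf "%.4f", 10 ^ (-q / 10) }')
echo "P(error) at Q15: $err_q15"
```

Under this encoding a Phred score of 15 corresponds to roughly a 3% chance that the base call is wrong.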
3. Merge multiple BAMs into one
${BIN}/samtools-hybrid merge ${OUT}/NA12878.exon.sample.merged.bam ${OUT}/NA12878.exon.sample.paired.bwa.sorted.bam ${OUT}/NA12878.exon.sample.unpaired.bwa.sorted.bam
4. View SAM/BAM format
${BIN}/samtools-hybrid view -h ${OUT}/NA12878.exon.sample.merged.bam | less
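In the SAM text printed by the view command, the second column of each alignment line is the FLAG field, a bitwise combination of properties such as "paired", "reverse strand", or "duplicate". The sketch below decodes a FLAG value into names; the bit values follow the SAM specification, while the function name decode_flag and the short labels are made up for illustration. It is written for a Bourne-style shell rather than csh.

```shell
# Decode a SAM FLAG integer into space-separated property names.
# Bit values follow the SAM specification; the labels are informal.
decode_flag() {
    flag=$1
    out=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair \
                256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${pair%%:*}    # number before the colon
        name=${pair#*:}    # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

decode_flag 99
decode_flag 1123
```

A FLAG of 99 is 64 + 32 + 2 + 1, so the first call reports a properly paired first-in-pair read whose mate is on the reverse strand; 1123 is the same read with the duplicate bit (1024) added.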
5. Mark Duplicate Reads
${BIN}/superDeDuper -i ${OUT}/NA12878.exon.sample.merged.bam -o ${OUT}/NA12878.exon.sample.deduped.bam -v
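One way to check the result of this step is to count how many reads carry the duplicate bit (0x400, decimal 1024) in the SAM FLAG column. The sketch below does this with awk on a toy input; it assumes the deduplication tool marks duplicates by setting that bit rather than removing the reads, which the page does not confirm. The four toy lines contain only the first four SAM columns and invented read names.

```shell
# Toy SAM-like input: read name, FLAG, chromosome, position (invented values).
tmp_sam=$(mktemp)
printf 'r1\t99\t20\t100\nr2\t1123\t20\t100\nr3\t147\t20\t250\nr4\t1040\t20\t300\n' > "$tmp_sam"

# POSIX awk has no bitwise AND, so test the 1024 bit arithmetically.
n_dup=$(awk -F '\t' 'int($2 / 1024) % 2 == 1 { n++ } END { print n + 0 }' "$tmp_sam")

echo "duplicate-flagged reads: $n_dup"
rm -f "$tmp_sam"
```

On real data the same awk filter could be fed from `${BIN}/samtools-hybrid view ${OUT}/NA12878.exon.sample.deduped.bam` instead of the toy file.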
6. Visualize alignment to reference genome
${BIN}/samtools-hybrid index ${OUT}/NA12878.exon.sample.deduped.bam
${BIN}/samtools-hybrid tview ${OUT}/NA12878.exon.sample.deduped.bam ${REF}/human_g1k_v37_chr20.fa
- Type 'g', then enter 20:19989392
- Type 'g', then enter 20:20032998
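The positions above use the chromosome:position form. As a small sketch, the helper below turns such a position into a chromosome:start-end window around it, which is the region syntax samtools accepts for queries on an indexed BAM file. The function name region_around and the default flank of 100 bases are assumptions made for illustration.

```shell
# Build a "chrom:start-end" window around a "chrom:pos" coordinate.
# region_around is a hypothetical helper; the flank defaults to 100 bases.
region_around() {
    chrom=${1%%:*}
    pos=${1#*:}
    flank=${2:-100}
    start=$(( pos - flank ))
    if [ "$start" -lt 1 ]; then
        start=1    # coordinates are 1-based, so clamp at 1
    fi
    echo "${chrom}:${start}-$(( pos + flank ))"
}

region_around 20:19989392 100
region_around 20:20032998
```

The resulting string could then be appended to a view command on the indexed BAM, for example `${BIN}/samtools-hybrid view ${OUT}/NA12878.exon.sample.deduped.bam 20:19989292-19989492`, to list the reads overlapping that window as text.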