Difference between revisions of "Sequence Analysis Practice 2011/03/09"
From Genome Analysis Wiki
Revision as of 17:56, 9 March 2011
Overview
The steps below walk through a practice session: mapping FASTQ files to BAM files, performing variant calling, and running various quality checks.
Steps
Low-level Processing - Practical Slides (PDF)
0. Set up environment variables
setenv BIN /home/hyun/wed/bin
setenv IN /home/hyun/wed/input
setenv REF /home/hyun/wed/ref
setenv OUT ~/seq/wednesday/output
mkdir -p ${OUT}
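The setenv lines above are csh/tcsh syntax. For readers working in sh or bash, a minimal equivalent sketch follows. The paths are the ones from this tutorial; writing ~ as $HOME is an assumption made here so that the value can be safely quoted.

```shell
# sh/bash equivalent of the csh `setenv` lines (same paths as the tutorial).
export BIN=/home/hyun/wed/bin
export IN=/home/hyun/wed/input
export REF=/home/hyun/wed/ref
# Assumption: $HOME stands in for ~ so the path can be quoted.
export OUT="$HOME/seq/wednesday/output"
# Create the output directory (and any missing parents) if it does not exist.
mkdir -p "$OUT"
```

The later steps refer to these variables as ${BIN}, ${IN}, ${REF} and ${OUT}, which works unchanged in either shell.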
1. Understanding FASTQ format
zcat ${IN}/NA12878.exon.sample.read1.fastq.gz | head
zcat ${IN}/NA12878.exon.sample.read2.fastq.gz | head
zcat ${IN}/NA12878.exon.sample.unpaired.fastq.gz | head
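head prints only the first 10 lines and returns to the prompt; if you page through a file with less instead, press q to quit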
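To make the four-line FASTQ record layout concrete (identifier line starting with @, sequence, a + separator line, and one quality character per base), here is a small self-contained sketch in sh syntax. The two reads are made up for illustration and are not taken from the NA12878 files; the variable names are likewise hypothetical.

```shell
# Build a tiny two-record FASTQ file (made-up reads, for illustration only).
tmp_fastq=$(mktemp)
printf '%s\n' \
  '@read1/1' 'ACGTACGT' '+' 'IIIIHHHH' \
  '@read2/1' 'TTGACCAA' '+' 'IIII####' > "$tmp_fastq"

# Line 1 of each 4-line record is the identifier, line 2 the sequence.
# Print each read name (without the leading @) and its sequence length.
awk 'NR % 4 == 1 { id = substr($0, 2) }
     NR % 4 == 2 { printf "%s\t%d\n", id, length($0) }' "$tmp_fastq"

# A well-formed FASTQ file has a line count divisible by 4.
n_lines=$(wc -l < "$tmp_fastq")
n_records=$(( n_lines / 4 ))
echo "records: $n_records"

rm -f "$tmp_fastq"
```

The same awk command can be fed from zcat on one of the tutorial inputs in place of the temporary file.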
2. Align using BWA
${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa ${IN}/NA12878.exon.sample.read1.fastq.gz > ${OUT}/NA12878.exon.sample.read1.fastq.gz.sai
${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa ${IN}/NA12878.exon.sample.read2.fastq.gz > ${OUT}/NA12878.exon.sample.read2.fastq.gz.sai
${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa ${IN}/NA12878.exon.sample.unpaired.fastq.gz > ${OUT}/NA12878.exon.sample.unpaired.fastq.gz.sai
${BIN}/bwa samse ${REF}/human_g1k_v37_chr20.fa ${OUT}/NA12878.exon.sample.unpaired.fastq.gz.sai ${IN}/NA12878.exon.sample.unpaired.fastq.gz | ${BIN}/samtools-hybrid view -uhS - | ${BIN}/samtools-hybrid sort -m 10000000 - ${OUT}/NA12878.exon.sample.unpaired.bwa.sorted
${BIN}/bwa sampe ${REF}/human_g1k_v37_chr20.fa ${OUT}/NA12878.exon.sample.read1.fastq.gz.sai ${OUT}/NA12878.exon.sample.read2.fastq.gz.sai ${IN}/NA12878.exon.sample.read1.fastq.gz ${IN}/NA12878.exon.sample.read2.fastq.gz | ${BIN}/samtools-hybrid view -uhS - | ${BIN}/samtools-hybrid sort -m 10000000 - ${OUT}/NA12878.exon.sample.paired.bwa.sorted
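The three bwa aln commands in this step differ only in the input name (read1, read2, unpaired). As a sketch in sh syntax (not the csh used in step 0), the loop below prints the three commands rather than running them, so the pattern can be checked without bwa installed. The loop variable names and the fallback defaults are illustrative assumptions.

```shell
# Fall back to the tutorial's paths if the variables are not already set.
: "${BIN:=/home/hyun/wed/bin}"
: "${IN:=/home/hyun/wed/input}"
: "${REF:=/home/hyun/wed/ref}"
: "${OUT:=$HOME/seq/wednesday/output}"

# Dry run: build (but do not execute) one `bwa aln` command per input file.
aln_cmds=$(
  for name in read1 read2 unpaired; do
    fq="${IN}/NA12878.exon.sample.${name}.fastq.gz"
    sai="${OUT}/NA12878.exon.sample.${name}.fastq.gz.sai"
    echo "${BIN}/bwa aln -q 15 ${REF}/human_g1k_v37_chr20.fa ${fq} > ${sai}"
  done
)
printf '%s\n' "$aln_cmds"
```

Each printed line should match one of the three bwa aln commands in this step; the samse and sampe commands are not covered because they take different argument lists.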
3. Merge multiple BAMs into one
${BIN}/samtools-hybrid merge ${OUT}/NA12878.exon.sample.merged.bam ${OUT}/NA12878.exon.sample.paired.bwa.sorted.bam ${OUT}/NA12878.exon.sample.unpaired.bwa.sorted.bam
4. View SAM/BAM format
${BIN}/samtools-hybrid view -h ${OUT}/NA12878.exon.sample.merged.bam | less
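When reading the output, note that each alignment line is tab-separated and that the second column (FLAG) is a bit field. The sketch below, in sh syntax, decodes a few of the FLAG bits defined by the SAM specification. The alignment line is made up for illustration, and decode_flag is a hypothetical helper, not part of samtools.

```shell
# A made-up SAM alignment line (tab-separated), for illustration only.
# Columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
sam_line=$(printf 'read1\t1123\t20\t19989392\t60\t8M\t=\t19989500\t116\tACGTACGT\tIIIIHHHH')

# Extract the FLAG column (field 2).
flag=$(printf '%s\n' "$sam_line" | cut -f2)

# Hypothetical helper: print the name of every known bit set in a FLAG value.
decode_flag() {
  value=$1
  for pair in 1:paired 2:proper_pair 4:unmapped 16:reverse_strand \
              32:mate_reverse_strand 64:first_in_pair 128:second_in_pair \
              1024:duplicate; do
    bit=${pair%%:*}    # number before the colon
    name=${pair#*:}    # label after the colon
    if [ $(( value & bit )) -ne 0 ]; then
      printf '%s\n' "$name"
    fi
  done
}

decode_flag "$flag"
```

The 1024 bit is the one a duplicate-marking tool is expected to set, so it is the bit to look for after step 5.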
5. Mark Duplicate Reads
${BIN}/superDeDuper -i ${OUT}/NA12878.exon.sample.merged.bam -o ${OUT}/NA12878.exon.sample.deduped.bam -v
6. Visualize alignment to reference genome
${BIN}/samtools-hybrid index ${OUT}/NA12878.exon.sample.deduped.bam
${BIN}/samtools-hybrid tview ${OUT}/NA12878.exon.sample.deduped.bam ${REF}/human_g1k_v37_chr20.fa
- Type 'g', then enter 20:19989392
- Type 'g', then enter 20:20032998
7. Use QPLOT to assess the quality
${BIN}/qplot --plot ${OUT}/NA12878.exon.sample.deduped.bam.qplot.pdf --stats ${OUT}/NA12878.exon.sample.deduped.bam.qplot.stats --reference ${REF}/human_g1k_v37_chr20.fa --dbsnp ${REF}/dbsnp.b130.ncbi37.chr20.tbl --gccontent ${REF}/ncbi37.chr20.gc ${OUT}/NA12878.exon.sample.deduped.bam