Changes

TrioCaller

Revision as of 12:13, 3 February 2012

172 bytes removed, 12:13, 3 February 2012

Line 37: Line 37:

Here, we will simply use BWA to find the most likely sequence location for each read using the <code>bwa aln</code> command. This command requires two parameters, one corresponding to the reference genome, the other corresponding to a fastq file containing reads to be mapped.
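The <code>bwa aln</code> commands in this step pass <code>-q 15</code>. The BWA manual describes this as read trimming: if the last base's quality is below the threshold, the read is cut down to length x = argmax<sub>x</sub> Σ<sub>i=x+1..l</sub>(threshold − q<sub>i</sub>), where l is the original read length. The following is a minimal Python sketch of that documented formula, not BWA's own code. The function name is hypothetical, and the real implementation may add details such as a minimum read length that are not reproduced here.

```python
from typing import List


def bwa_q_trim_length(quals: List[int], threshold: int = 15) -> int:
    """Hypothetical helper: trimmed read length under the -q rule quoted in the
    BWA manual, x = argmax_x sum_{i=x+1}^{l} (threshold - q_i), applied only
    when q_l < threshold. `quals` are decoded base qualities (ASCII code - 33)."""
    l = len(quals)
    if l == 0 or quals[-1] >= threshold:
        return l  # the last base is good enough, so the read is left untouched
    best_x, best_sum, running = l, 0, 0
    # Walk from the 3' end towards the 5' end; after adding quals[x],
    # `running` equals the sum over 1-based positions x+1..l.
    for x in range(l - 1, -1, -1):
        running += threshold - quals[x]
        if running > best_sum:  # strict '>' keeps the longest read among ties
            best_sum, best_x = running, x
    return best_x


# Made-up qualities: good bases followed by a low-quality 3' tail.
print(bwa_q_trim_length([30, 30, 30, 30, 10, 5, 2]))  # → 4
```

The trimmed length is the cut point where the accumulated "quality deficit" of the discarded tail is largest. A short run of poor bases at the end is therefore removed, but the scan does not cut into a stretch of good bases.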

−

bwa aln -q 15 ref/human_g1k_v37_chr20.fa fastq/NA20589.fastq.gz > bwa.sai/NA20589.sai

+

bin/bwa aln -q 15 ref/human_g1k_v37_chr20.fa fastq/SAMPLE1.fastq > bwa.sai/SAMPLE1.sai

−

The file NA20589.fastq.gz contains DNA sequence reads for sample NA20589. To conserve disk space, the file has been compressed with gzip but, since fastq is a simple text format, you can easily view the contents of the file using a command like:

+

The file SAMPLE1.fastq contains DNA sequence reads for sample SAMPLE1.

−

zcat NA20589.fastq.gz | more

A fastq file consists of a series of multi-line records. Each record starts with a read name, followed by a DNA sequence, a separator line, and a set of per-base quality scores. Base quality scores estimate the probability of error at each sequenced base: a base quality Q corresponds to an error probability of 10<sup>-Q/10</sup>, so a base quality of 10 denotes an error probability of 10%, base quality 20 denotes 1%, and base quality 30 denotes 0.1%. For compactness, each quality is encoded as a single character and can be decoded using an [http://www.google.com/search?q=ascii+table ASCII table]: look up the ASCII code for each character and subtract 33 to get the base quality. By inspecting the fastq file you should be able to learn the length of the reads being mapped and their base qualities (is base quality typically higher at the start or at the end of each read?).
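The decoding rule above can be tried out directly. The following is a minimal Python sketch, assuming uncompressed fastq with exactly four lines per record (no wrapped sequence lines). All function names are hypothetical helpers, not part of BWA or samtools. It also averages quality by read position, which is one way to answer the question about the start versus the end of each read.

```python
import io
from typing import Iterator, List, Tuple


def read_fastq(handle) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical helper: yield (name, sequence, quality string) from
    4-line fastq records. Assumes no line wrapping."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed fastq record: {header!r}")
        yield header[1:], seq, qual


def decode_qualities(qual: str, offset: int = 33) -> List[int]:
    """Base quality = ASCII code of the character minus 33."""
    return [ord(c) - offset for c in qual]


def error_probability(q: int) -> float:
    """Base quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_quality_by_position(records) -> List[float]:
    """Average decoded quality at each read position (reads may differ in length)."""
    totals: List[int] = []
    counts: List[int] = []
    for _, _, qual in records:
        for i, q in enumerate(decode_qualities(qual)):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up records: 'I' is ASCII 73 (quality 40), '+' is ASCII 43 (quality 10).
example = io.StringIO("@read1\nACGT\n+\nIII+\n@read2\nACGT\n+\nII++\n")
records = list(read_fastq(example))
print(decode_qualities(records[0][2]))    # → [40, 40, 40, 10]
print(mean_quality_by_position(records))  # → [40.0, 40.0, 25.0, 10.0]
```

For a gzipped file such as the older NA20589.fastq.gz, the same reader should work on a handle opened with <code>gzip.open(path, "rt")</code>.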

Line 49: Line 47:

The .sai alignment format is specific to BWA, so the first step is to convert the alignment to a more standard format that is compatible with downstream analysis tools. We can do this by combining the <code>bwa samse</code> command with the <code>samtools view</code> and <code>samtools sort</code> commands.
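The <code>samtools sort</code> step orders alignments by genome position. The following is a minimal Python sketch of what that ordering means for SAM text lines: first by reference sequence, in the order the references are listed in the header, then by the 1-based POS column. Records with RNAME <code>*</code> (unmapped, no position) go last. It is an illustration under those assumptions. samtools works on binary BAM, within a memory limit, and has its own tie-breaking rules, none of which is reproduced here. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def coordinate_sort(sam_lines: List[str], ref_order: List[str]) -> List[str]:
    """Hypothetical helper: order SAM alignment lines by (reference rank, POS).
    Lines whose RNAME is '*' (no reference) are placed after all others."""
    rank: Dict[str, int] = {name: i for i, name in enumerate(ref_order)}

    def key(line: str) -> Tuple[int, int]:
        fields = line.split("\t")
        rname, pos = fields[2], int(fields[3])  # columns 3 and 4: RNAME, POS
        if rname == "*":
            return (len(ref_order), 0)
        return (rank[rname], pos)

    return sorted(sam_lines, key=key)  # sorted() is stable: ties keep input order


lines = [
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t0\t20\t500\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\t20\t100\t37\t4M\t*\t0\t0\tACGT\tIIII",
]
print([line.split("\t")[0] for line in coordinate_sort(lines, ["20"])])  # → ['r2', 'r1', 'r3']
```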

−

bwa samse ref/human_g1k_v37_chr20.fa bwa.sai/NA20589.sai fastq/NA20589.fastq.gz | \

+

bin/bwa samse -r "@RG\tID:ILLUMINA\tSM:SAMPLE1" ref/human_g1k_v37_chr20.fa bwa.sai/SAMPLE1.sai fastq/SAMPLE1.fastq | \

−

samtools view -uhS - | samtools sort -m 2000000000 - bams/NA20589

+

samtools view -uhS - | samtools sort -m 2000000000 - bams/SAMPLE1
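The updated <code>bwa samse</code> command passes <code>-r "@RG\tID:ILLUMINA\tSM:SAMPLE1"</code>, which is a SAM read-group header line. In the SAM format, header lines are tab-separated <code>TAG:VALUE</code> fields (the tabs are written as <code>\t</code> on the command line). For an @RG line, <code>ID</code> is the read-group identifier and <code>SM</code> is the sample name. The following is a minimal Python sketch of how such a line breaks down; the parser is a hypothetical illustration, not a samtools API.

```python
from typing import Dict, Tuple


def parse_header_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: split a SAM header line such as '@RG<TAB>ID:x<TAB>SM:y'
    into its record type and a dict of TAG:VALUE fields."""
    record_type, *fields = line.rstrip("\n").split("\t")
    tags: Dict[str, str] = {}
    for field in fields:
        tag, _, value = field.partition(":")  # split on the first ':' only
        tags[tag] = value
    return record_type, tags


# The read group used in this step, written here with real tab characters.
rg_type, rg = parse_header_line("@RG\tID:ILLUMINA\tSM:SAMPLE1")
print(rg_type, rg)  # → @RG {'ID': 'ILLUMINA', 'SM': 'SAMPLE1'}
```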

The resulting BAM file uses a compact binary format to represent the

Line 56: Line 54:

of the file using the <code>samtools view</code> command, like so:

−

samtools view bams/NA20589.bam | more

+

samtools view bams/SAMPLE1.bam | more
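Each alignment line printed by <code>samtools view</code> has 11 mandatory tab-separated columns defined by the SAM specification, optionally followed by <code>TAG:TYPE:VALUE</code> fields. The following is a minimal Python sketch that labels those columns; the alignment line and the function name are made up for illustration.

```python
from typing import Dict

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line: str) -> Dict[str, object]:
    """Hypothetical helper: map the 11 mandatory SAM columns to their names.
    Any extra columns are optional tags and are kept as raw strings."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    record: Dict[str, object] = {}
    for name, value in zip(SAM_FIELDS, columns):
        record[name] = int(value) if name in INT_FIELDS else value
    record["TAGS"] = columns[11:]
    record["UNMAPPED"] = bool(record["FLAG"] & 0x4)  # FLAG bit 0x4: segment unmapped
    return record


line = "read1\t0\t20\t60001\t37\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:ILLUMINA"
rec = parse_sam_line(line)
print(rec["RNAME"], rec["POS"], rec["CIGAR"], rec["UNMAPPED"])  # → 20 60001 4M False
```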

The text representation of the alignment produced by <code>samtools view</code> describes

Line 80: Line 78:

genome location. We do this with the <code>samtools index</code> command, like so:

−

samtools index bams/NA20589.bam

+

samtools index bams/SAMPLE1.bam
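The index written by <code>samtools index</code> lets tools retrieve the reads overlapping a region without scanning the whole BAM. The SAM/BAM specification describes the hierarchical binning scheme behind it and gives a small <code>reg2bin</code> function. Below is a Python port of that function. It shows only the binning part: the .bai file also stores a linear index, which is omitted here.

```python
def reg2bin(beg: int, end: int) -> int:
    """Port of reg2bin from the SAM/BAM specification: the smallest bin in the
    hierarchy that fully contains the 0-based, half-open region [beg, end)."""
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)  # 16 kb bins
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)  # 128 kb bins
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)   # 1 Mb bins
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)   # 8 Mb bins
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)   # 64 Mb bins
    return 0                                        # the single top-level bin


print(reg2bin(0, 100))        # → 4681
print(reg2bin(16000, 17000))  # → 585
```

The second region straddles a 16 kb boundary, so it falls into the enclosing 128 kb bin. This is why a query only needs to inspect a handful of bins on each level instead of the whole file.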

=== Browsing Alignment Results ===

Weich (533 edits)
