Tutorial: Low Pass Sequence Analysis Answers

From Genome Analysis Wiki

  • Q1: What is the base quality of the fifth nucleotide of the third read in the file HG00111.lowcoverage.chr20.smallregion_1.fastq.gz?

The third read in the file is:

@ERR020230.76497044/1
CTGTACTACTAAAGTAAAACTAGTTTTCCAATAGTTTGTTGCAGGATAAGCAGTTTTACTTTTGTTGACAATATGTGTATGAATTTACTTC
+
DFEEGFKIFKIKLKIJLMMIMKMJKKKIKLMKKLKLLLKKLKLMMJLLJMKMMJLKLLJNLLLIKLJMILKLJKLKKKKKMMMJJJIFJFA

The quality string is the fourth line of each read record, so the base quality of the fifth nucleotide is encoded by the fifth character of that string, "G". Its decimal ASCII code is 71, so the base quality of this nucleotide is 38 (71 - 33).
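The same arithmetic can be checked with a short Python sketch. It assumes the file uses the Phred+33 encoding (the offset of 33 used in the answer above); the function name `phred33_quality` is made up for this illustration and is not part of any tool used in the tutorial.

```python
# Minimal sketch: decode one base quality from a FASTQ quality string.
# Assumes Phred+33 encoding, i.e. quality = ASCII code - 33.

def phred33_quality(quality_string: str, position: int) -> int:
    """Return the Phred quality of the base at a 1-based position."""
    return ord(quality_string[position - 1]) - 33

# Quality string (4th line) of the third read, copied from the record above.
qual = (
    "DFEEGFKIFKIKLKIJLMMIMKMJKKKIKLMKKLKLLLKKLKLMMJLLJMKMMJLKLLJNLLLIKLJMILKLJKLKKKKKMMMJJJIFJFA"
)

fifth_char = qual[4]                      # 5th character, "G"
fifth_quality = phred33_quality(qual, 5)  # ord("G") - 33
print(fifth_char, ord(fifth_char), fifth_quality)  # G 71 38
```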

  • Q2: What is the mean depth of the sample HG00108? And the mapping rate?

The mean depth is 4.60X and the mapping rate is 99.19%. However, keep in mind that these statistics are computed only over the 100 kb region included in our example dataset.

  • Q3: What is the depth of position 33538999 for the sample HG00111? What would be the most likely genotype looking at the reads? (You can answer this question by using tview or mpileup.)

The depth of the sample HG00111 at position 33538999 is 9: 5 G's and 4 C's pile up at this position. Judging only from the observed nucleotides, the most likely genotype is C/G.
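To see why a 5/4 split points to a heterozygote, the following sketch compares the three candidate genotypes with a simple binomial model. This is only an illustration under stated assumptions: it considers just the two observed alleles, ignores base and mapping qualities, and uses a flat per-base error rate of 0.01 chosen for the example. It is not the model used by samtools or any other caller, and `genotype_likelihoods` is a hypothetical helper name.

```python
# Illustrative sketch only: binomial genotype likelihoods from allele counts.
# Assumptions (not from the tutorial): two alleles, a flat error rate of 0.01,
# and no use of base or mapping qualities.
from math import comb

def genotype_likelihoods(n_g: int, n_c: int, error: float = 0.01) -> dict:
    """Likelihood of the observed G/C counts under each genotype."""
    n = n_g + n_c
    # Probability that a single read shows G under each genotype.
    prob_g = {"G/G": 1 - error, "C/G": 0.5, "C/C": error}
    return {
        genotype: comb(n, n_g) * p ** n_g * (1 - p) ** n_c
        for genotype, p in prob_g.items()
    }

likelihoods = genotype_likelihoods(n_g=5, n_c=4)
best = max(likelihoods, key=likelihoods.get)
print(best)  # C/G
```

Under these assumptions both homozygous genotypes would require four or five sequencing errors among nine reads, so their likelihoods are orders of magnitude smaller than that of C/G.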