Changes

From Genome Analysis Wiki
Line 77: Line 77:  
</div>
 
</div>
 
</div>
 
</div>
 +
 +
=== Did I find interesting variants? ===
 +
 +
The region we selected contains the ''APOL1'' gene, which is known to play an important role in kidney diseases such as nephrotic syndrome. There are two risk alleles in the protein-coding region of this gene.
 +
* The '''g1''' allele is a non-synonymous risk allele, <code>rs73885319</code>, located at position <code>22:36661906</code>; it increases the risk of nephrotic syndrome by more than 2-fold. We looked at this variant in the morning.
 +
* The '''g2''' allele is a 6-base deletion (in-frame indel) located near the end of the ''APOL1'' gene. Its effect size is smaller than that of '''g1''' but still large (>1.5-fold). Let's see if we have found the variant.
 +
 +
First, look up the indels by browsing the Exome Variant Server.
 +
 +
Focusing on the ''APOL1'' gene:
 +
[[File:EvsAPOL1a.png|500px]]
 +
 +
You will be able to find the in-frame indel near the end of the gene:
 +
[[File:EvsAPOL1a.png|700px]]
 +
 +
Let's see if we found the indel:
 +
 +
$GC/bin/tabix $OUT/final/all.genotypes.vcf.gz 22:36662041 | head -1
 +
 +
Did you see a variant at that position?
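The same question can be answered in a script instead of by eye. The following is only a minimal sketch: <code>has_variant_at</code> is a hypothetical helper name (not part of tabix or Vt) that reads VCF records from standard input and succeeds if any record has the requested value in its POS column.

```shell
# Hypothetical helper (an assumption for illustration, not a tabix/vt feature):
# succeed if any tab-separated VCF record on stdin has POS (column 2) equal to $1.
has_variant_at() {
    awk -v pos="$1" -F'\t' '$2 == pos { found = 1 } END { exit !found }'
}

# Intended use with the files from this tutorial (assumes $GC and $OUT are set):
#   $GC/bin/tabix $OUT/final/all.genotypes.vcf.gz 22:36662041 \
#       | has_variant_at 36662041 && echo "variant found"
```

Reading from standard input keeps the helper independent of how the records were produced, so it should work equally well behind <code>tabix</code> or <code>vt view</code>.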
    
=== Looking at the INDEL Variant Call File (VCF) ===
 
=== Looking at the INDEL Variant Call File (VCF) ===
We will use [[Vt]] to look at the INDEL VCF file.
+
We will use [[Vt]] or <code>tabix</code> to look at the INDEL VCF file.
    
==== Header ====
 
==== Header ====
 
First, let's look at the header:
 
First, let's look at the header:
  ${GC}/bin/vt view -H ${OUT}/final/all.genotypes.vcf.gz
+
$GC/bin/tabix -H $OUT/final/all.genotypes.vcf.gz
    
The header is as follows:
 
The header is as follows:
Line 112: Line 132:  
   ##FILTER=<ID=PASS,Description="Temporary pass">
 
   ##FILTER=<ID=PASS,Description="Temporary pass">
 
   ##FILTER=<ID=overlap,Description="Overlapping variant">
 
   ##FILTER=<ID=overlap,Description="Overlapping variant">
 +
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00551 HG00553 HG00554 HG00637 HG00638 HG00640 HG00641 HG00734 HG00736 HG00737 HG00739 HG00740 HG01047 HG01049 HG01051 HG01052 HG01054 HG01055 HG01060 HG01061 HG01066 HG01067 HG01069 HG01070 HG01072 HG01073 HG01075 HG01079 HG01080 HG01082 HG01083 HG01094 HG01097 HG01098 HG01101 HG01102 HG01107 HG01108 HG01110 HG01111 HG01167 HG01168 HG01170 HG01171 HG01173 HG01174 HG01176 HG01177 HG01182 HG01183 HG01187 HG01188 HG01190 HG01191 HG01197 HG01198 HG01204 HG01205 HG01241 HG01242 HG01247 HG01248
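The last header line lists nine fixed columns followed by one column per sample, so the number of samples can be read off mechanically. The sketch below assumes a tab-separated header as stored in the VCF file; <code>count_samples</code> is a hypothetical helper name, and the three-sample header is a shortened example rather than the full line above.

```shell
# Hypothetical helper: print the number of sample columns in a VCF header.
# The #CHROM line has 9 fixed columns (CHROM..FORMAT) before the samples.
count_samples() {
    awk -F'\t' '/^#CHROM/ { print NF - 9 }'
}

# Shortened example header with three samples:
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00551\tHG00553\tHG00554\n' \
    | count_samples

# Intended use with the tutorial's file (assumes $GC and $OUT are set):
#   $GC/bin/tabix -H $OUT/final/all.genotypes.vcf.gz | count_samples
```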
 +
 +
Using [[Vt]], we can see the same output:
 +
 +
${GC}/bin/vt view -H ${OUT}/final/all.genotypes.vcf.gz
    
====Records====
 
====Records====
   −
To view a specific region of records:
+
To view a specific region of records (such as the ''APOL1'' '''g2''' allele):
   −
  ${GC}/bin/vt view -i 22:36990878-36990879 ${OUT}/final/all.genotypes.vcf.gz
+
$GC/bin/tabix $OUT/final/all.genotypes.vcf.gz 22:36662041-36662041
* -i specifies the region
  −
* You can leave it out and look at all the records
      
The columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, Genotype fields denoted by the sample name.
 
The columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, Genotype fields denoted by the sample name.
   −
  22 36990878 . GGT G 455 PASS AC=32;AN=116;AF=0.275862;GC=32,20,6;GN=58;
+
22 36662041 . AATAATT A 756 PASS AC=2;AN=114;AF=0.0175439;GC=55,2,0;GN=57;
                                                                 GF=0.551724,0.344828,0.103448;NS=58;  
+
                                                                 GF=0.964912,0.0350877,0;NS=57;
                                                                 HWEAF=0.275797;HWEGF=0.52447,0.399466,0.0760642;
+
                                                                 HWEAF=0.019571;HWEGF=0.961242,0.038376,0.000383024;
                                                                 MLEAF=0.27366; MLEGF=0.494275,0.464129,0.0415952;
+
                                                                 MLEAF=0.0196187;MLEGF=0.960762,0.0392374,2.11537e-15;
                                                                 HWE_LLR=-0.453098;HWE_LPVAL=-1.0755;HWE_DF=1;
+
                                                                 HWE_LLR=-0.0222464;HWE_LPVAL=-0.182794;HWE_DF=1;
                                                                 FIC=-0.0718807;AB=0.6129
+
                                                                 FIC=-0.00372601;AB=0.384578
                                                        GT:PL:DP:AD:GQ 0/0:0,9,108:9:3,0,6:10
+
                                                          GT:PL:DP:AD:GQ 0/0:0,9,158:3:3,0,0:10 0/0:0,18,281:6:6,0,0:18
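As a quick sanity check on the INFO column, the allele frequency AF should equal AC divided by AN. The sketch below recomputes it from an INFO string; <code>af_from_info</code> is a hypothetical helper name, and the input is a shortened copy of the INFO values shown in the record above.

```shell
# Hypothetical helper: read a VCF INFO string on stdin and print AC/AN.
af_from_info() {
    tr ';' '\n' | awk -F'=' '
        $1 == "AC" { ac = $2 }
        $1 == "AN" { an = $2 }
        END { if (an > 0) printf "%.6g\n", ac / an }'
}

# Shortened INFO string taken from the record above:
printf 'AC=2;AN=114;AF=0.0175439\n' | af_from_info
# prints 0.0175439, matching the AF value in the record
```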
    
Here is a description of the record's fields.
 
Here is a description of the record's fields.
    
   22            : chromosome
 
   22            : chromosome
   36990878       : genome position
+
   36662041       : genome position
 
   .              : this is the ID field that is left blank.
 
   .              : this is the ID field that is left blank.
   GGT           : the reference sequence that is replaced by the alternative sequence below.
+
   AATAATT           : the reference sequence that is replaced by the alternative sequence below.
   G             : so this is basically a deletion of GT
+
   A             : so this is basically a deletion of ATAATT (6 bases)
   455           : QUAL field denoting validity of this variant, higher the better.
+
   756           : QUAL field denoting validity of this variant; the higher the better.
 
   PASS          : a passed variant.
 
   PASS          : a passed variant.
 
   INFO          : fields containing information about the variant.
 
   INFO          : fields containing information about the variant.
 
   FORMAT        : format field labels for the genotype columns.
 
   FORMAT        : format field labels for the genotype columns.
   0/0:0,9,108:9:3,0,6:10 :  genotype information.
+
   0/0:0,9,158:3:3,0,0:10 :  genotype information.
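The REF and ALT columns can also be compared mechanically to see what kind of indel a record describes. This is only a sketch under stated assumptions: <code>indel_summary</code> is a hypothetical helper name, and it assumes a single ALT allele that shares its leading base(s) with REF, as in the record above.

```shell
# Hypothetical helper: summarize each tab-separated VCF record on stdin.
# Assumes one ALT allele that is a prefix-sharing, left-anchored indel.
indel_summary() {
    awk -F'\t' '{
        ref = $4; alt = $5
        if (length(ref) > length(alt))
            printf "%s:%s deletion of %d bp (%s)\n", $1, $2, \
                length(ref) - length(alt), substr(ref, length(alt) + 1)
        else if (length(alt) > length(ref))
            printf "%s:%s insertion of %d bp (%s)\n", $1, $2, \
                length(alt) - length(ref), substr(alt, length(ref) + 1)
        else
            printf "%s:%s no length change\n", $1, $2
    }'
}

# The first eight columns of the record above:
printf '22\t36662041\t.\tAATAATT\tA\t756\tPASS\t.\n' | indel_summary
# prints: 22:36662041 deletion of 6 bp (ATAATT)
```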
 +
 
 +
You can obtain the same output by using the following command:
 +
  ${GC}/bin/vt view -i 22:36662041-36662041 ${OUT}/final/all.genotypes.vcf.gz
 +
* -i specifies the region
 +
* You can leave it out and look at all the records
    
=====INFO field=====
 
=====INFO field=====
