Changes

From Genome Analysis Wiki
876 bytes added, 15:38, 20 February 2014
Line 34: Line 34:     
You may also work with vcf.gz files; just name the output *.vcf.gz.  However, vt will be slower with this format.
 
You may also work with vcf.gz files; just name the output *.vcf.gz.  However, vt will be slower with this format.
 +
 +
==Peek==
 +
 +
You can see what you have in the file with:
 +
 
 +
  vt peek mills.genotypes.bcf
 +
 +
You can also focus on a chromosome:
 +
 +
  vt peek mills.genotypes.bcf -i 20
 +
 +
Or with just passed variants:
 +
 +
  vt peek mills.genotypes.bcf -i 20 -f PASS
 +
 +
Or with failed variants:
 +
 +
  vt peek mills.genotypes.bcf -i 20 -f ~PASS
 +
 +
Or with just 1bp indels:
 +
 +
  vt peek mills.genotypes.bcf -i 20 -f "PASS&&DLEN==1"
 +
 +
Or with just 1bp deletions:
 +
 +
  vt peek mills.genotypes.bcf -i 20 -f "PASS&&LEN==-1"
 +
 +
Or with just biallelic 1bp indels:
 +
 +
  vt peek mills.genotypes.bcf -i 20 -f "PASS&&N_ALLELE==2&&LEN==1"
 +
 +
Or with just biallelic 1bp indels that are somewhat rare:
 +
 +
  vt peek mills.sites.bcf -f "PASS&&N_ALLELE==2&&LEN==1&&INFO.AF<0.03"
 +
 +
Or with just biallelic 1bp indels that are somewhat rare with sanity checking:
 +
 +
  vt peek mills.sites.bcf -f "PASS&&N_ALLELE==2&&LEN==1&&INFO.AC/INFO.AN<0.03"
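
The last two filters differ only in where the frequency comes from: INFO.AF is taken as recorded, whereas INFO.AC/INFO.AN recomputes it from the allele counts. As a rough illustration of that sanity check outside vt, the Python sketch below recomputes the frequency and compares it with a recorded value. The function names, the tolerance, and the example numbers are assumptions made for illustration; none of them come from vt.

```python
def allele_frequency(ac: int, an: int) -> float:
    """Recompute the allele frequency from allele count (AC) and allele number (AN)."""
    if an <= 0:
        raise ValueError("AN must be positive")
    return ac / an


def is_rare(ac: int, an: int, threshold: float = 0.03) -> bool:
    """Mirror the AC/AN < 0.03 condition used in the filter expression."""
    return allele_frequency(ac, an) < threshold


def af_is_consistent(af: float, ac: int, an: int, tol: float = 1e-3) -> bool:
    """Check that a recorded AF agrees with AC/AN (tolerance is an assumed value)."""
    return abs(af - allele_frequency(ac, an)) <= tol


# Hypothetical record: AC=5, AN=200, recorded AF=0.025
print(is_rare(5, 200))                   # True, since 5/200 = 0.025
print(af_is_consistent(0.025, 5, 200))   # True
print(af_is_consistent(0.25, 5, 200))    # False: recorded AF disagrees with the counts
```

A record whose recorded AF disagrees with its counts would pass or fail the INFO.AF filter for the wrong reason, which is what filtering on INFO.AC/INFO.AN avoids.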
    
==Normalization==
 
==Normalization==
Line 46: Line 84:  
number of duplicate variants found for some of the 1000 Genomes Trio High Coverage call sets.
 
number of duplicate variants found for some of the 1000 Genomes Trio High Coverage call sets.
 
Although left alignment seems to be a trivial concept, it is easily overlooked and remains a common  
 
Although left alignment seems to be a trivial concept, it is easily overlooked and remains a common  
mistake.  Another example is the Mills et al. data set, in which 10004 indels were followed up for validation.
+
mistake.   
Out of 9996 passed variants, only 8904 distinct indels remain after normalization
  −
- a loss of about 11% of the variants thought to be distinct.
      
{| class="wikitable"
 
{| class="wikitable"
Line 121: Line 157:  
| 7541
 
| 7541
 
|}
 
|}
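
To see why a call set can report the same indel as several apparently distinct records, here is a minimal Python sketch of one common formulation of left alignment and trimming. It is not necessarily how vt implements normalization; the function name, the toy reference sequence, and the example records are assumptions made for illustration.

```python
def normalize(pos, ref, alts, chrom_seq):
    """Left-align and trim a variant.

    pos is 1-based, ref/alts are non-empty allele strings, and chrom_seq is
    the reference sequence of the chromosome (a hypothetical in-memory string).
    """
    alleles = [ref, *alts]
    if not alts or len(set(alleles)) != len(alleles):
        raise ValueError("need at least one ALT and all alleles must be distinct")

    changed = True
    while changed:
        changed = False
        # Drop the last base while every allele ends in the same base.
        if len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend all alleles one base to the left.
        if any(a == "" for a in alleles):
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            alleles = [chrom_seq[pos - 1] + a for a in alleles]
            changed = True

    # Drop shared leading bases while every allele keeps at least one base.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1

    return pos, alleles[0], alleles[1:]


reference = "GGGCACACACAGGG"  # toy sequence containing a CA repeat

# Three different ways of writing the same 2bp deletion in the repeat
print(normalize(8, "CAC", ["C"], reference))      # (3, 'GCA', ['G'])
print(normalize(5, "ACA", ["A"], reference))      # (3, 'GCA', ['G'])
print(normalize(3, "GCACA", ["GCA"], reference))  # (3, 'GCA', ['G'])
```

All three records collapse to the same position and alleles, so counting them before normalization would overstate the number of distinct variants.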
 +
 +
Another example is the Mills et al. data set, in which 10004 indels were followed up for validation.
 +
Out of 9996 passed variants, only 8904 distinct indels remain after normalization
 +
- a loss of about 11% of the variants thought to be distinct.
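
The 11% figure follows directly from the two counts quoted above; a short Python check of the arithmetic (variable names are illustrative only):

```python
passed = 9996     # passed variants before normalization
distinct = 8904   # distinct indels remaining after normalization

lost = passed - distinct    # records that turned out to be duplicates
fraction = lost / passed    # share of passed variants that were not distinct

print(f"{lost} of {passed} passed variants ({fraction:.1%}) were not distinct")
# prints: 1092 of 9996 passed variants (10.9%) were not distinct
```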
 +
 +
    
==Coding regions==
 
==Coding regions==