From Genome Analysis Wiki
876 bytes added, 15:38, 20 February 2014
Line 34 (old) / Line 34 (new):

  You may also work with vcf.gz; just name the output *.vcf.gz. vt processes vcf.gz more slowly, however.
+
+ ==Peek==
+
+ You can see what the file contains with:
+
+ vt peek mills.genotypes.bcf
+
+ You can also restrict the summary to a single chromosome:
+
+ vt peek mills.genotypes.bcf -i 20
+
+ Or only to variants that passed filters:
+
+ vt peek mills.genotypes.bcf -i 20 -f PASS
+
+ Or only to variants that failed filters:
+
+ vt peek mills.genotypes.bcf -i 20 -f ~PASS
+
+ Or only to passed 1bp indels:
+
+ vt peek mills.genotypes.bcf -i 20 -f "PASS&&DLEN==1"
+
+ Or only to passed 1bp deletions:
+
+ vt peek mills.genotypes.bcf -i 20 -f "PASS&&LEN==-1"
+
+ Or only to passed biallelic 1bp indels:
+
+ vt peek mills.genotypes.bcf -i 20 -f "PASS&&N_ALLELE==2&&LEN==1"
+
+ Or only to passed biallelic 1bp indels that are rare (INFO.AF below 0.03):
+
+ vt peek mills.sites.bcf -f "PASS&&N_ALLELE==2&&LEN==1&&INFO.AF<0.03"
+
+ Or to the same selection with the allele frequency recomputed from INFO.AC/INFO.AN, as a sanity check on INFO.AF:
+
+ vt peek mills.sites.bcf -f "PASS&&N_ALLELE==2&&LEN==1&&INFO.AC/INFO.AN<0.03"

  ==Normalization==
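To make the last two filter expressions concrete, here is a minimal Python sketch of the predicate that "PASS&&N_ALLELE==2&&LEN==1&&INFO.AF<0.03" appears to express, applied to hypothetical in-memory records. It is not vt's implementation. The `Record` type and the `passes_rare_filter` and `af_mismatch` helpers are hypothetical, and the sketch assumes that N_ALLELE counts REF plus all ALT alleles and that LEN is len(ALT) - len(REF) (positive for insertions, negative for deletions, consistent with LEN==-1 selecting 1bp deletions above).

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical in-memory stand-in for one VCF/BCF record."""
    ref: str
    alts: list
    filters: list
    info: dict = field(default_factory=dict)


def n_allele(rec):
    # Assumed meaning of N_ALLELE: the REF allele plus every ALT allele.
    return 1 + len(rec.alts)


def indel_len(rec):
    # Assumed meaning of LEN for a biallelic record: len(ALT) - len(REF),
    # positive for insertions and negative for deletions.
    return len(rec.alts[0]) - len(rec.ref)


def passes_rare_filter(rec, use_counts=False, max_af=0.03):
    """Sketch of PASS&&N_ALLELE==2&&LEN==1&&INFO.AF<0.03.

    With use_counts=True the frequency is INFO.AC/INFO.AN instead of INFO.AF.
    """
    if rec.filters != ["PASS"]:
        return False
    if n_allele(rec) != 2 or indel_len(rec) != 1:
        return False
    af = rec.info["AC"] / rec.info["AN"] if use_counts else rec.info["AF"]
    return af < max_af


def af_mismatch(rec, tol=1e-3):
    """Flag records whose INFO.AF disagrees with INFO.AC/INFO.AN."""
    return abs(rec.info["AF"] - rec.info["AC"] / rec.info["AN"]) > tol


records = [
    Record("A", ["AT"], ["PASS"], {"AF": 0.01, "AC": 2, "AN": 200}),
    Record("A", ["AT"], ["PASS"], {"AF": 0.01, "AC": 20, "AN": 200}),  # stale AF
    Record("AT", ["A"], ["PASS"], {"AF": 0.01, "AC": 2, "AN": 200}),   # 1bp deletion
    Record("A", ["AT", "ATT"], ["PASS"], {"AF": 0.01, "AC": 2, "AN": 200}),  # 3 alleles
]

print([passes_rare_filter(r) for r in records])                   # [True, True, False, False]
print([passes_rare_filter(r, use_counts=True) for r in records])  # [True, False, False, False]
print([af_mismatch(r) for r in records])                          # [False, True, False, False]
```

The second record shows why the AC/AN form works as a cross-check: if INFO.AF is stale or wrong, the two filters select different records.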
Line 46 (old) / Line 84 (new):
  number of duplicate variants found for some of the 1000 Genomes Trio High Coverage call sets.
  Although left alignment seems trivial, it is easily overlooked and remains a common
− mistake. Another example is the Mills et al. data set, which selected 10,004 indels for follow-up validation.
− Of the 9,996 passed variants, only 8,904 distinct indels remained after normalization,
− a loss of about 11% of the variants thought to be distinct.
+ mistake.

  {| class="wikitable"
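The left-alignment point above can be illustrated with a minimal Python sketch of trimming and left-aligning one biallelic variant against a reference string. It follows the usual right-trim and left-extend procedure used for variant normalization, in the spirit of vt normalize but not its actual code. The `normalize` function, the 0-based `pos`, and the toy reference sequences are assumptions made for illustration.

```python
def normalize(ref_seq, pos, ref, alt):
    """Left-align and trim one biallelic variant.

    ref_seq: reference sequence as a string (hypothetical toy input).
    pos: 0-based offset of ref[0] within ref_seq.
    Returns the normalized (pos, ref, alt).
    """
    if ref_seq[pos:pos + len(ref)] != ref:
        raise ValueError("REF does not match the reference sequence")
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    while True:
        if ref and alt and ref[-1] == alt[-1]:
            # Both alleles end with the same base: drop it.
            ref, alt = ref[:-1], alt[:-1]
        elif not ref or not alt:
            # An allele is empty: prepend the preceding reference base to both.
            if pos == 0:
                raise ValueError("cannot extend past the start of the sequence")
            pos -= 1
            ref, alt = ref_seq[pos] + ref, ref_seq[pos] + alt
        else:
            break  # alleles end with different bases and neither is empty
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a run of four A's (0-based offsets 2..5).
ref_seq = "GTAAAAC"
# Three ways a caller might write the same 1bp deletion of an A.
calls = [(1, "TA", "T"), (4, "AA", "A"), (2, "AAAA", "AAA")]
distinct = {normalize(ref_seq, *c) for c in calls}
print(len(calls), len(distinct))  # 3 1
print(distinct)                   # {(1, 'TA', 'T')}
```

All three input representations collapse to the same (pos, REF, ALT) triple. Counting distinct triples only after normalization is what exposes such duplicates; the unnormalized triples would be counted as three different variants.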
Line 121 (old) / Line 157 (new):
  | 7541
  |}
+
+ Another example is the Mills et al. data set, which selected 10,004 indels for follow-up validation.
+ Of the 9,996 passed variants, only 8,904 distinct indels remained after normalization,
+ a loss of about 11% of the variants thought to be distinct.
+
+

  ==Coding regions==
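The 11% figure in the Mills et al. paragraph is the fraction of the 9,996 passed variants that were no longer distinct after normalization:

```latex
\[
\frac{9996 - 8904}{9996} = \frac{1092}{9996} \approx 0.109 \approx 11\%
\]
```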