M3VCF Files
Introduction
Minimac3 is a lower-memory, more computationally efficient implementation of minimac2. It is a genotype imputation algorithm that works on phased genotypes (for example, from MaCH) and is designed to handle very large reference panels efficiently with no loss of accuracy.
This wiki page gives users a detailed explanation of the structure of M3VCF files.
M3VCF Files
M3VCF stands for "Minimac3 VCF". These files store large reference panels compactly, which reduces the memory required. They are built on the same idea as the imputation method itself: within a small genomic segment, the number of unique haplotypes is much smaller than the total number of haplotypes, so it is enough to store the unique representatives instead of every haplotype. Compared with VCF files, M3VCF files are a very convenient way to save large reference panels because:
- They require less space than VCF files. The compression ratio for a panel of 50K samples and 337K markers is ~1200x (unzipped) and ~4x (zipped).
- They are faster to read when importing data. The reference panel mentioned above was imported 20x faster as an M3VCF file than as a VCF file.
- They are already stored in a form that attains optimal computational complexity during imputation.
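The idea of storing only unique representatives can be illustrated with a minimal Python sketch. It is not Minimac3's own code: the function names `compress_block` and `expand_block` and the toy haplotypes are hypothetical, and the sketch assumes each haplotype in a segment is given as a string of 0's and 1's.

```python
def compress_block(haplotypes):
    """Collapse the haplotypes of one genomic segment to unique representatives.

    haplotypes: list of equal-length strings of '0'/'1', one per haplotype.
    Returns (reps, labels): the unique haplotypes in first-seen order, and for
    each input haplotype the index of the representative it matches.
    """
    index_of = {}  # haplotype string -> index of its representative
    reps = []
    labels = []
    for hap in haplotypes:
        if hap not in index_of:
            index_of[hap] = len(reps)
            reps.append(hap)
        labels.append(index_of[hap])
    return reps, labels


def expand_block(reps, labels):
    """Recover the full list of haplotypes from representatives and labels."""
    return [reps[label] for label in labels]


# Toy segment (hypothetical data): 6 haplotypes over 4 markers.
toy = ["0000", "0100", "0000", "0001", "0100", "0000"]
reps, labels = compress_block(toy)
print(reps)    # unique representatives
print(labels)  # one representative index per haplotype
```

In this toy segment only three distinct haplotypes occur, so three allele strings plus six small integer labels replace six full allele strings. The saving grows as the number of haplotypes increases relative to the number of unique representatives.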
M3VCF files loosely follow the structure of VCF files. An example is shown below. The first few lines are header lines that record the number of haplotypes, the number of markers and the number of genomic segments. After the header, each genomic segment is defined (usually denoted by <BLOCK:*-*>), followed by the markers contained in that segment (denoted by their original marker IDs). In the example below, a reference panel of 6 samples (12 haplotypes) and 8 markers is reduced to two genomic segments (<BLOCK:0-5> and <BLOCK:5-7>). The first block runs from marker 0 to 5 (6 variants) and the next one from marker 5 to 7 (3 variants). Note that two consecutive blocks must overlap at their common marker. On a block line, the INFO column stores the number of markers in the segment (VARIANTS) and the number of unique haplotypes in that segment (REPS). The columns that follow, one per haplotype (two per sample), give the index of the unique haplotype representative that the haplotype matches in that genomic segment. The unique haplotypes themselves are stored in the rows that follow, with one row per marker and one character per unique haplotype.
In the rows that follow each block line, the details of the variants are stored (as in a usual VCF file) along with the unique haplotypes (under the FORMAT column). For <BLOCK:0-5>, there are 4 unique haplotypes (given by the variable REPS), which are the four sub-columns of 0's and 1's under the FORMAT column. Similarly, the 2 unique haplotypes for <BLOCK:5-7> are shown in the FORMAT column for its three markers.
##fileformat=M3VCF
##version=1.1
##compression=block
##n_blocks=2
##n_haps=12
##n_markers=8
##<Note=This is NOT a VCF File and cannot be read by vcftools>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A1 A2 B1 B2 C1 C2 D1 D2 E1 E2 F1 F2
6 73924 <BLOCK:0-5> . . . . B1;VARIANTS=6;REPS=4 . 0 1 3 0 0 0 1 0 3 1 0 3
6 73924 chr6:73924:D AAGAG A . . B1.M1;R=7;A=5 0000
6 89919 chr6:89919 T G . . B1.M;R=4;A=3 0100
6 89921 chr6:89921 C T . . B1.M3;R=2;A=4 0000
6 89932 chr6:89932 A G . . B1.M4;R=1;A=3 0000
6 89949 chr6:89949 G A . . B1.M5;R=3;A=1 0010
6 100116 chr6:100116 C A . . B1.M6;R=2;A=1 0001
6 100116 <BLOCK:5-7> . . . . B2;VARIANTS=3;REPS=2 . 0 1 0 0 0 0 1 0 1 1 0 1
6 100116 chr6:100116 T A . . B1.M8;R=4;A=1 00
6 132285 chr6:132285 T A . . B1.M9;R=4;A=1 01
6 148689 chr6:148689 TAA T . . B1.M9;R=4;A=1 01
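To make the layout concrete, the following Python sketch reads the example above and expands it back into one allele string per haplotype. It is only an illustration and not the reader used by Minimac3: the names `parse_m3vcf` and `expand` are hypothetical, fields are assumed to be separated by whitespace, and the marker shared by two consecutive blocks is assumed to be taken from the earlier block.

```python
EXAMPLE = """\
##fileformat=M3VCF
##version=1.1
##compression=block
##n_blocks=2
##n_haps=12
##n_markers=8
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A1 A2 B1 B2 C1 C2 D1 D2 E1 E2 F1 F2
6 73924 <BLOCK:0-5> . . . . B1;VARIANTS=6;REPS=4 . 0 1 3 0 0 0 1 0 3 1 0 3
6 73924 chr6:73924:D AAGAG A . . B1.M1;R=7;A=5 0000
6 89919 chr6:89919 T G . . B1.M;R=4;A=3 0100
6 89921 chr6:89921 C T . . B1.M3;R=2;A=4 0000
6 89932 chr6:89932 A G . . B1.M4;R=1;A=3 0000
6 89949 chr6:89949 G A . . B1.M5;R=3;A=1 0010
6 100116 chr6:100116 C A . . B1.M6;R=2;A=1 0001
6 100116 <BLOCK:5-7> . . . . B2;VARIANTS=3;REPS=2 . 0 1 0 0 0 0 1 0 1 1 0 1
6 100116 chr6:100116 T A . . B1.M8;R=4;A=1 00
6 132285 chr6:132285 T A . . B1.M9;R=4;A=1 01
6 148689 chr6:148689 TAA T . . B1.M9;R=4;A=1 01
"""


def parse_m3vcf(text):
    """Return (haplotype column names, list of blocks) from M3VCF text."""
    names, blocks = [], []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split()  # assumption: whitespace-separated fields
        if line.startswith("#CHROM"):
            names = fields[9:]  # haplotype columns follow FORMAT
        elif fields[2].startswith("<BLOCK:"):
            # Block line: INFO holds VARIANTS and REPS, then one label per haplotype.
            info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
            blocks.append({
                "n_variants": int(info["VARIANTS"]),
                "n_reps": int(info["REPS"]),
                "labels": [int(x) for x in fields[9:]],
                "variants": [],
            })
        else:
            # Variant line: FORMAT holds one allele character per representative.
            blocks[-1]["variants"].append((fields[2], fields[8]))
    return names, blocks


def expand(blocks):
    """Rebuild one allele string per haplotype across all blocks."""
    n_haps = len(blocks[0]["labels"])
    marker_ids = []
    alleles = [[] for _ in range(n_haps)]
    for b, block in enumerate(blocks):
        # Assumption: the overlapping first marker of later blocks is skipped.
        variants = block["variants"] if b == 0 else block["variants"][1:]
        for marker_id, rep_alleles in variants:
            marker_ids.append(marker_id)
            for h, label in enumerate(block["labels"]):
                alleles[h].append(rep_alleles[label])
    return marker_ids, ["".join(a) for a in alleles]


names, blocks = parse_m3vcf(EXAMPLE)
marker_ids, haplotypes = expand(blocks)
for name, hap in zip(names, haplotypes):
    print(name, hap)
```

Each haplotype is rebuilt by looking up, marker by marker, the allele of the representative named by its label in the current block, so the full 12 x 8 panel is recovered from 4 + 2 stored representatives.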
Download
Minimac3 is available as an undocumented release version. The source files (and binary executable) are available for download from Source Files, and commonly used reference panels in VCF and M3VCF formats are available for download from Reference Panels.
Useful Wiki Pages
A few pages in this Wiki may be useful to Minimac3 users. Here are links to a few:
- Minimac3 Imputation Cookbook (Recommended for New Users!!)
Contact
For questions or bug reports, please contact Sayantan Das.