M3vcftools Usage

From Genome Analysis Wiki

Introduction

m3vcftools is a tool modeled closely on vcftools, but much faster. It can be used for simple data queries and basic summaries such as allele frequency and linkage r2 calculations. m3vcftools is well suited to mega reference panels such as the Haplotype Reference Consortium (HRC) panel, which has over 32,000 samples; on such panels m3vcftools is 70-90 times faster than vcftools (18 mins vs 28 hours) when calculating AF or LD summaries.

This wiki page gives users a full list of the options available in m3vcftools.

Full List of Options

The following tables give a brief description of all the parameters of m3vcftools. The software is under active development, and new options will be added regularly. Please join our NEW mailing list to get updates about newly added options or to request options that you would like to see early. Users can also email their queries or feedback to Sayantan Das.

BASIC OPTIONS

INPUT FILE OPTIONS

Parameter Description
--vcf or --in <input_filename> This option specifies the input VCF or M3VCF file. (Missing or multi-allelic variants are NOT currently supported; an update is planned soon.)
--check This option does some extra verification while reading M3VCF files. If your program crashed for some reason, try re-running it with this option ON and see if it gives an error message.


OUTPUT FILE OPTIONS

Parameter Description
--out <output_prefix> This option defines the output prefix for all files that are generated by m3vcftools. Default Value: m3vcftools.Output.
--recode This option converts input VCF files to output M3VCF files. It is also useful when you read an M3VCF file restricted to a subset of variants or samples and wish to re-compress that subset for efficient analysis in the future.
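
As a rough illustration of how the input and output options above combine, the following Python sketch assembles (but does not run) a command line. It assumes the executable is named m3vcftools and is on the PATH; the file names and the helper name build_recode_command are hypothetical. Only flags listed on this page are used.

```python
import shlex


def build_recode_command(input_path, output_prefix, check=False):
    """Assemble an argument list that re-compresses the input to M3VCF.

    The executable name "m3vcftools" is an assumption; adjust it to
    wherever the binary lives on your system.
    """
    cmd = ["m3vcftools", "--in", input_path, "--recode", "--out", output_prefix]
    if check:
        # Extra verification while reading M3VCF files.
        cmd.append("--check")
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_recode_command("chr20.vcf.gz", "chr20.panel")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run if you want to launch the tool from a script.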


SITE FILTERING OPTIONS

POSITION FILTERING

Parameter Description
--chr <chromosome> This option includes all sites with identifiers matching the value of <chromosome>. (m3vcftools only handles single chromosome files)
--from-bp <integer>, --to-bp <integer> These options specify a lower / upper bound (inclusive) for the range of sites to be processed. They must be used in conjunction with --chr.
--positions <filename>, --exclude-positions <filename> These options include / exclude a set of sites on the basis of a list of positions in the given file. Each line of the input file should contain a (tab-separated) chromosome and position. Comment lines starting with "#" are ignored, so M3VCF or VCF files can be used as <filename>.
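
The positions file format described above can be sketched in Python as follows. This is only an illustration of the format (tab-separated chromosome and position, "#" comment lines skipped), not the tool's own parser; the function name read_positions and the example records are hypothetical.

```python
import io


def read_positions(handle):
    """Return a set of (chromosome, position) pairs from a tab-separated file.

    Lines starting with "#" are skipped and any columns after the first
    two are ignored, which is why a VCF-like file can serve as input.
    """
    sites = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sites.add((fields[0], int(fields[1])))
    return sites


# A made-up, VCF-like example with a header line and an extra ID column.
example = io.StringIO("#CHROM\tPOS\tID\n20\t61098\trs1\n20\t61270\trs2\n")
keep = read_positions(example)
print(sorted(keep))  # [('20', 61098), ('20', 61270)]
```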


OUTPUT OPTIONS

OUTPUT ALLELE STATISTICS

Parameter Description
--freq This option outputs the allele frequency for each site in a file with suffix ".frq".
--counts This option outputs the raw allele count for each site in a file with suffix ".frq.count".
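
To make the two quantities concrete, here is a minimal Python sketch of allele counts and allele frequencies at a single bi-allelic site. It assumes haplotypes are coded 0 (reference) and 1 (alternate); the layout of the ".frq" and ".frq.count" files is not described on this page and is not reproduced here.

```python
def allele_counts(haplotypes):
    """Count reference (0) and alternate (1) alleles at one bi-allelic site."""
    alt = sum(haplotypes)
    return len(haplotypes) - alt, alt


def allele_frequencies(haplotypes):
    """Return (reference frequency, alternate frequency) for one site."""
    ref, alt = allele_counts(haplotypes)
    total = ref + alt
    return ref / total, alt / total


# Eight made-up haplotypes at one site.
site = [0, 1, 1, 0, 0, 0, 1, 0]
print(allele_counts(site))       # (5, 3)
print(allele_frequencies(site))  # (0.625, 0.375)
```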


OUTPUT LINKAGE (LD) STATISTICS

Parameter Description
--hap-r2 This option outputs a summary of linkage disequilibrium (reported as r2, D and D' statistics computed from phased haplotypes) in a file with suffix ".hap.ld". These statistics are only calculated for phased, bi-allelic sites.
--ld-window <integer>, --ld-window-min <integer> These optional parameters define the maximum / minimum number of variants between the variants being analyzed.
--ld-window-bp <integer>, --ld-window-bp-min <integer> These optional parameters define the maximum / minimum number of physical bases between the variants being analyzed.
--min-r2 <float> This optional parameter sets a minimum value of r2, below which LD statistics are NOT reported. It can be used in conjunction with the above parameters.
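
The statistics and filters above can be illustrated with a short Python sketch. It uses the textbook definitions of D, r2 and D' for two phased bi-allelic sites coded 0/1, with D' taken as |D| / Dmax. How m3vcftools computes them internally, whether the window bounds are inclusive, and whether the variant distance is measured as a difference in site index are all assumptions here; the function names are hypothetical.

```python
def ld_stats(hap_a, hap_b):
    """Return (r2, D, D') for two phased bi-allelic sites, or None if undefined."""
    n = len(hap_a)
    p_a = sum(hap_a) / n                                  # alt frequency, site A
    p_b = sum(hap_b) / n                                  # alt frequency, site B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n   # both alt on one haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # at least one site is monomorphic
    r2 = d * d / denom
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, d, abs(d) / d_max


def keep_pair(index_gap, bp_gap, r2, ld_window=None, ld_window_min=None,
              ld_window_bp=None, ld_window_bp_min=None, min_r2=None):
    """Apply the window and r2 filters to one pair (bounds assumed inclusive)."""
    if ld_window is not None and index_gap > ld_window:
        return False
    if ld_window_min is not None and index_gap < ld_window_min:
        return False
    if ld_window_bp is not None and bp_gap > ld_window_bp:
        return False
    if ld_window_bp_min is not None and bp_gap < ld_window_bp_min:
        return False
    if min_r2 is not None and r2 < min_r2:
        return False
    return True


# Two perfectly correlated made-up sites.
print(ld_stats([0, 0, 1, 1], [0, 0, 1, 1]))  # (1.0, 0.25, 1.0)
```

For example, keep_pair(3, 5000, 0.5, ld_window=2) returns False because the two variants are more than two sites apart under the index-gap assumption.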

Download

Minimac3 is available as an undocumented release version. The source files (and binary executable) are available for download in Source Files, and commonly used reference panels in VCF and M3VCF formats are available for download in Reference Panels.

Useful Wiki Pages

There are a few pages in this Wiki that may be useful to Minimac3 users. Here are links to a few:

Contact

For any queries or bug reports, please contact Sayantan Das.