Minimac4 Documentation

From Genome Analysis Wiki
Revision as of 20:50, 16 July 2019 by Yukt (talk | contribs) (Add output options)

A typical Minimac4 command line accepts the following parameter options (default values are shown in square brackets):

      Reference Haplotypes : --refHaps [], --passOnly, --rsid,
                             --referenceEstimates [ON],
                             --mapFile [docs/geneticMapFile.b38.map.txt.gz]
         Target Haplotypes : --haps []
         Output Parameters : --prefix [Minimac4.Output], --estimate,
                             --nobgzip, --vcfBuffer [200], --format [GT,DS],
                             --allTypedSites, --meta, --memUsage
       Chunking Parameters : --ChunkLengthMb [20.00], --ChunkOverlapMb [3.00]
         Subset Parameters : --chr [], --start, --end, --window
  Approximation Parameters : --minimac3, --probThreshold [0.01],
                             --diffThreshold [0.01], --topThreshold [0.01]
          Other Parameters : --log, --help, --cpus [1], --params
                 PhoneHome : --noPhoneHome, --phoneHomeThinning [50]

Of these, only --refHaps and --haps are required.
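As an illustration of the minimal invocation, the following Python sketch assembles an argument list from options listed above. It only builds and prints the command; it does not run Minimac4. The executable name `minimac4`, the helper `build_minimac4_command` and all file names are assumptions made for this example, not part of the original page.

```python
import shlex


def build_minimac4_command(ref_haps, haps, prefix=None, output_format=None, cpus=None):
    """Assemble an argument list from options documented on this page.

    Hypothetical helper: it performs no validation beyond checking that
    the two required options are present.
    """
    if not ref_haps or not haps:
        raise ValueError("--refHaps and --haps are both required")
    # The executable name "minimac4" is an assumption; adjust to your install.
    argv = ["minimac4", "--refHaps", ref_haps, "--haps", haps]
    if prefix is not None:
        argv += ["--prefix", prefix]
    if output_format is not None:
        # --format takes a comma-separated list such as GT,DS
        argv += ["--format", ",".join(output_format)]
    if cpus is not None:
        argv += ["--cpus", str(cpus)]
    return argv


# Placeholder file names, for illustration only.
cmd = build_minimac4_command("refPanel.m3vcf.gz", "targetStudy.vcf.gz", prefix="testRun")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point to real files.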


Reference Haplotypes

--refHaps <input_m3vcf_filename>
This option defines the reference panel in M3VCF format to impute against.
If your reference panel is in VCF format, please use Minimac3 to convert the VCF file to M3VCF (along with parameter estimation) and then use that M3VCF for imputation using Minimac4.
--passOnly
DEACTIVATED for now. OFF by default. If ON, only variants with FILTER=PASS will be imported from the reference VCF file (does NOT work on M3VCF files yet).
--rsid
OFF by default. If ON, Minimac4 will import only the RS IDs of variants from the ID column of the reference file (if available).
--referenceEstimates
ON by default. If ON, Minimac4 expects the input M3VCF file to contain parameter estimates; if OFF, a genetic map file must be supplied with the option --mapFile.
--mapFile <input_genetic_map_file>
This option is automatically ignored except when --referenceEstimates is OFF.
It defines the genetic map file used for recombination rate estimation during imputation.
The input genetic map file should be tab-separated, with the first row as its header, and the columns representing chromosome id, base pair position, cumulative recombination rate in cM/Mb, and genetic map coordinates in cM, respectively.
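To make the expected layout concrete, here is a minimal Python sketch that reads a file in the four-column format described above. The function name `read_genetic_map`, the header labels and the numeric values in the sample are invented for illustration; only the column order and the tab separator come from the description.

```python
import csv
import io


def read_genetic_map(handle):
    """Read a tab-separated genetic map with one header row.

    Columns, in order: chromosome id, base pair position,
    recombination rate, genetic map coordinate in cM.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    rows = []
    for fields in reader:
        if not fields:
            continue  # tolerate blank lines
        chrom, pos, rate, cm = fields[:4]
        rows.append((chrom, int(pos), float(rate), float(cm)))
    return rows


# Made-up sample content; header labels and values are hypothetical.
example = io.StringIO(
    "chr\tposition\trate\tcM\n"
    "chr20\t61098\t0.5\t0.0\n"
    "chr20\t61795\t0.25\t0.00035\n"
)
genetic_map = read_genetic_map(example)
print(genetic_map[0])
```

For a gzip-compressed map file, the same function can be given a handle opened with `gzip.open(path, "rt")`.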

Target Haplotypes

--haps <input_vcf_filename>
This option defines the pre-phased target genotype data in VCF format to impute.

Output Parameters

--prefix <output_prefix>
This option defines the output filename prefix for all files generated by Minimac4.
If this option is omitted, all output files will have the prefix "Minimac4.Output" in the current working directory.
--estimate
DEACTIVATED for now. This option is equivalent to the option --processReference in Minimac3.
--nobgzip
OFF by default. If ON, output files will NOT be bgzipped.
--vcfBuffer
This option defines the maximum number of samples in the target genotype data to be imputed at a time. By default, it is set to 200 or the total number of samples, whichever is smaller.
Note that the larger the value is, the more memory Minimac4 will consume.
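The default rule can be written out as a short Python sketch. The effective buffer follows directly from the description above; the batch count is an inference from "samples imputed at a time" and is an assumption about how the work is split, not a documented behaviour. Both function names are hypothetical.

```python
import math

DEFAULT_VCF_BUFFER = 200  # default value of --vcfBuffer listed on this page


def effective_buffer(n_samples, vcf_buffer=DEFAULT_VCF_BUFFER):
    """Samples handled at a time: the buffer or the sample count, whichever is smaller."""
    if n_samples <= 0 or vcf_buffer <= 0:
        raise ValueError("n_samples and vcf_buffer must be positive")
    return min(vcf_buffer, n_samples)


def batch_count(n_samples, vcf_buffer=DEFAULT_VCF_BUFFER):
    """Assumed number of passes needed to cover all target samples."""
    return math.ceil(n_samples / effective_buffer(n_samples, vcf_buffer))


print(effective_buffer(150), batch_count(150))    # 150 1
print(effective_buffer(1000), batch_count(1000))  # 200 5
```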
--format
This option specifies which fields to write to the FORMAT column of the imputed output VCF file. Available values are GT, DS, HDS, GP and SD. The default setting is GT,DS.
  • GT - Estimated most likely genotype.
  • DS - Estimated alternate allele dosage [P(0/1)+2*P(1/1)].
  • HDS - Estimated phased haploid alternate allele dosage.
  • GP - Estimated Posterior Genotype Probabilities P(0/0), P(0/1) and P(1/1).
  • SD - Estimated Variance of Posterior Genotype Probabilities.
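The dosage formula given for DS can be checked against a GP triple with a few lines of Python. This is a sketch of the arithmetic only, not of how Minimac4 computes its output; the function name `dosage_from_gp` and the tolerance used for the sanity check are assumptions.

```python
def dosage_from_gp(gp, tolerance=1e-3):
    """Alternate allele dosage from genotype probabilities.

    gp is (P(0/0), P(0/1), P(1/1)); the dosage is P(0/1) + 2*P(1/1).
    """
    p00, p01, p11 = gp
    # Probabilities read from a VCF are rounded, so allow a small tolerance.
    if abs(p00 + p01 + p11 - 1.0) > tolerance:
        raise ValueError("genotype probabilities should sum to 1")
    return p01 + 2.0 * p11


print(dosage_from_gp((0.25, 0.5, 0.25)))  # 1.0
print(dosage_from_gp((1.0, 0.0, 0.0)))    # 0.0
```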
--allTypedSites
OFF by default. If ON, Minimac4 will also include in the output files variants that were genotyped but are NOT in the reference panel (any missing data at such variants is imputed to the major allele frequency).
--meta
OFF by default. If ON, Minimac4 will generate a separate file that can be used by MetaMinimac2 for meta-imputation.
--memUsage
OFF by default. If ON, Minimac4 will not perform imputation. Instead, it will estimate the memory that imputation would consume, based on a single chunk.

Chunking Parameters

Subset Parameters

Approximation Parameters

Other Parameters

PhoneHome