Minimac4 Documentation

From Genome Analysis Wiki
Revision as of 23:17, 19 July 2019 by Yukt

A typical Minimac4 command line would have the following parameter options:

      Reference Haplotypes : --refHaps [], --passOnly, --rsid,
                             --referenceEstimates [ON],
                             --mapFile [docs/geneticMapFile.b38.map.txt.gz]
         Target Haplotypes : --haps []
         Output Parameters : --prefix [Minimac4.Output], --estimate,
                             --nobgzip, --vcfBuffer [200], --format [GT,DS],
                             --allTypedSites, --meta, --memUsage
       Chunking Parameters : --ChunkLengthMb [20.00], --ChunkOverlapMb [3.00]
         Subset Parameters : --chr [], --start, --end, --window
  Approximation Parameters : --minimac3, --probThreshold [0.01],
                             --diffThreshold [0.01], --topThreshold [0.01]
          Other Parameters : --log, --help, --cpus [1], --params
                 PhoneHome : --noPhoneHome, --phoneHomeThinning [50]

Of these options, only --refHaps and --haps are required.
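As a minimal sketch, the two required options can be assembled into an argument list suitable for Python's subprocess module. The executable name "minimac4", the helper build_minimac4_command and the file names are hypothetical placeholders; only the option names come from the list above. Optional settings left as None are omitted so that Minimac4's own defaults apply.

```python
import shlex


def build_minimac4_command(ref_haps, haps, prefix=None, cpus=None, executable="minimac4"):
    """Assemble a Minimac4 argument list (hypothetical helper, not part of Minimac4).

    Only --refHaps and --haps are required; --prefix and --cpus are added
    only when given, so Minimac4's defaults apply otherwise.
    """
    if not ref_haps or not haps:
        raise ValueError("--refHaps and --haps are both required")
    argv = [executable, "--refHaps", ref_haps, "--haps", haps]
    if prefix is not None:
        argv += ["--prefix", prefix]
    if cpus is not None:
        argv += ["--cpus", str(cpus)]
    return argv


if __name__ == "__main__":
    # Hypothetical file names; subprocess.run(argv, check=True) would execute the command.
    argv = build_minimac4_command("ref.chr20.m3vcf.gz", "target.chr20.vcf.gz",
                                  prefix="chr20.imputed", cpus=4)
    print(shlex.join(argv))
```

Building a list rather than a single string avoids shell quoting problems with unusual file names; shlex.join is used only to display the command.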


Reference Haplotypes

--refHaps <input_m3vcf_filename>
This option defines the reference panel in M3VCF format to impute against.
If your reference panel is in VCF format, first use Minimac3 to convert it to M3VCF (with parameter estimation), then use that M3VCF file for imputation with Minimac4.
--passOnly
DEACTIVATED for now. OFF by default. If ON, only variants with FILTER=PASS will be recorded from the reference VCF file (does NOT work on M3VCF files yet).
--rsid
OFF by default. If ON, Minimac4 will import only the RS ID of each variant from the ID column of the reference file (if available).
--referenceEstimates
ON by default. If ON, Minimac4 expects the input M3VCF file to include parameter estimates; otherwise, a genetic map file must be provided with --mapFile.
--mapFile <input_genetic_map_file>
This option is automatically ignored except when --referenceEstimates is OFF.
It defines the genetic map file used for recombination rate estimation during imputation.
The input genetic map file should be tab-separated, with a header in the first row and four columns giving, in order, the chromosome ID, base-pair position, recombination rate in cM/Mb, and genetic map position in cM.
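The four-column layout described above can be read with Python's standard csv module. This is a minimal sketch: the function name read_genetic_map and the header names in the example are hypothetical, the sample values are illustrative rather than taken from a real map, and the header row is skipped without checking its column names.

```python
import csv
import io


def read_genetic_map(handle):
    """Parse a tab-separated genetic map: one header row, then rows of
    chromosome ID, base-pair position, recombination rate (cM/Mb) and
    genetic map position (cM).

    Returns a list of (chrom, pos_bp, rate_cm_per_mb, map_cm) tuples.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row without validating it
    records = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        chrom, pos, rate, cm = row[:4]
        records.append((chrom, int(pos), float(rate), float(cm)))
    return records


if __name__ == "__main__":
    # Illustrative values only; header names are hypothetical.
    sample = "chr\tposition\trate\tmap\n20\t61098\t0.52\t0.0\n20\t61270\t0.48\t0.0001\n"
    for record in read_genetic_map(io.StringIO(sample)):
        print(record)
```

For a gzip-compressed map such as the default docs/geneticMapFile.b38.map.txt.gz, the handle could come from gzip.open(path, "rt").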

Target Haplotypes

--haps <input_vcf_filename>
This option defines the pre-phased target genotype data in VCF format to impute.

Output Parameters

--prefix <output_prefix>
This option defines the output filename prefix for all files generated by Minimac4.
If this option is omitted, all output files will have the prefix "Minimac4.Output" in the current working directory.
--estimate
DEACTIVATED for now. This option is equivalent to the option --processReference in Minimac3.
--nobgzip
OFF by default. If ON, output files will NOT be bgzipped.
--vcfBuffer
This option defines the maximum number of target samples to impute at a time. By default, it is set to 200 or the total number of samples, whichever is smaller.
Note that the larger the value, the more memory Minimac4 will consume.
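The documented default can be illustrated by splitting target sample indices into batches whose size is min(--vcfBuffer, number of samples). This is a sketch only: the helper sample_batches is hypothetical, and the assumption that Minimac4 processes samples in consecutive blocks in index order is made for illustration.

```python
def sample_batches(n_samples, vcf_buffer=200):
    """Split target sample indices into consecutive batches.

    The effective batch size is min(vcf_buffer, n_samples), matching the
    documented --vcfBuffer default. Processing batches in index order is
    an assumption of this sketch.
    """
    if vcf_buffer <= 0:
        raise ValueError("vcf_buffer must be positive")
    if n_samples <= 0:
        return []
    size = min(vcf_buffer, n_samples)
    return [range(first, min(first + size, n_samples))
            for first in range(0, n_samples, size)]


if __name__ == "__main__":
    # 450 target samples with the default buffer of 200: batch sizes 200, 200, 50
    print([len(batch) for batch in sample_batches(450)])
```

Raising the buffer reduces the number of passes but keeps more samples in memory at once, which is the trade-off noted above.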
--format
This option specifies which fields to output in the FORMAT column of the imputed VCF file. Available fields are GT, DS, HDS, GP and SD. The default setting is GT,DS.
  • GT - Estimated most likely genotype.
  • DS - Estimated alternate allele dosage [P(0/1)+2*P(1/1)].
  • HDS - Estimated phased haploid alternate allele dosage.
  • GP - Estimated Posterior Genotype Probabilities P(0/0), P(0/1) and P(1/1).
  • SD - Estimated Variance of Posterior Genotype Probabilities.
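The DS definition above can be checked directly from the three GP values, and a most likely genotype can be picked as the one with the highest posterior probability. This is a minimal sketch with hypothetical function names and illustrative probabilities; it writes genotypes in unphased notation, which is an assumption and not necessarily how Minimac4 writes GT.

```python
def dosage_from_gp(p00, p01, p11):
    """Alternate-allele dosage DS = P(0/1) + 2 * P(1/1)."""
    return p01 + 2.0 * p11


def most_likely_genotype(p00, p01, p11):
    """Genotype with the highest posterior probability (unphased notation)."""
    probs = {"0/0": p00, "0/1": p01, "1/1": p11}
    return max(probs, key=probs.get)


if __name__ == "__main__":
    gp = (0.1, 0.6, 0.3)  # illustrative GP values for one sample at one variant
    print(dosage_from_gp(*gp), most_likely_genotype(*gp))
```

Because DS weights each genotype by its alternate-allele count, it always lies between 0 and 2 when the GP values sum to 1.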
--allTypedSites
OFF by default. If ON, Minimac4 will also include variants that were genotyped but are NOT in the reference panel in the output files (and will impute any missing data in such variants to the major allele frequency).
--meta
OFF by default. If ON, Minimac4 will generate a separate file that can be used by MetaMinimac2 for meta-imputation.
--memUsage
OFF by default. If ON, Minimac4 will not perform imputation; instead, it will estimate the memory that imputation would consume, based on a single chunk.

Chunking Parameters

Minimac4 automatically splits the whole chromosome into overlapping chunks, analyzes each chunk sequentially, and then concatenates the imputed chunks back together.

--ChunkLengthMb <float_number>
This option defines the average length of chunks in units of million base pairs (Mb). The input value should be within (0.001, 300]. The default setting is 20.
--ChunkOverlapMb <float_number>
This option defines the length of overlap between chunks in units of Mb, 3 Mb by default. The input value must be within (0.001, 300].
The overlap length should be at most 1/3 of the chunk length; if it is larger, Minimac4 automatically reduces it to 1/3 of the chunk length.
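The chunking rules above (valid range (0.001, 300] for both values, overlap capped at one third of the chunk length) can be sketched as follows. The helpers validate_chunk_params and make_chunks are hypothetical, and the fixed-step layout, where consecutive chunks share exactly the overlap length, is a simplification: Minimac4 describes its chunk length as an average, so its actual boundaries may differ.

```python
def validate_chunk_params(chunk_mb=20.0, overlap_mb=3.0):
    """Check both values lie in (0.001, 300] and cap the overlap at 1/3 of the chunk length."""
    for name, value in (("ChunkLengthMb", chunk_mb), ("ChunkOverlapMb", overlap_mb)):
        if not 0.001 < value <= 300:
            raise ValueError(f"--{name} must be within (0.001, 300], got {value}")
    return chunk_mb, min(overlap_mb, chunk_mb / 3.0)


def make_chunks(start_bp, end_bp, chunk_mb=20.0, overlap_mb=3.0):
    """Cover [start_bp, end_bp] with chunks of chunk_mb that overlap by overlap_mb.

    Simplified fixed-step scheme for illustration only; not Minimac4's own algorithm.
    """
    if end_bp < start_bp:
        raise ValueError("end_bp must not be smaller than start_bp")
    chunk_mb, overlap_mb = validate_chunk_params(chunk_mb, overlap_mb)
    chunk_bp = round(chunk_mb * 1_000_000)
    overlap_bp = round(overlap_mb * 1_000_000)
    step = chunk_bp - overlap_bp  # positive because overlap <= chunk / 3
    chunks = []
    chunk_start = start_bp
    while True:
        chunk_end = min(chunk_start + chunk_bp - 1, end_bp)
        chunks.append((chunk_start, chunk_end))
        if chunk_end >= end_bp:
            return chunks
        chunk_start += step


if __name__ == "__main__":
    # A 50 Mb region with the default 20 Mb chunks and 3 Mb overlap
    print(make_chunks(1, 50_000_000))
    # An overlap of 5 Mb on 9 Mb chunks is capped at 9 / 3 = 3 Mb
    print(validate_chunk_params(9.0, 5.0))
```

Capping the overlap at one third keeps the step between chunk starts at no less than two thirds of the chunk length, so the number of chunks stays bounded.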

Subset Parameters

The subset parameters are required if the user wishes to impute a particular region of the chromosome rather than the whole chromosome (typically when running imputation in chunks). When the subset parameters are used, a buffer region of 500 kbp is added on either side of the region by default, unless the user specifies otherwise with --window. Variants in the buffer region are used only for imputation and are not reported in the final output. For example, to analyze chromosome 6 from position 1000000 to position 2000000 with a 300000 bp buffer on either side, use --chr 6 --start 1000000 --end 2000000 --window 300000.

--chr <chromosome>
This option specifies the chromosome on which to carry out imputation.
Note that non-zero values for --start and --end are required when the --chr option is used.
--start <integer>
This option specifies the start position of the region to be analyzed. Requires the --chr option.
--end <integer>
This option specifies the end position of the region to be analyzed. Requires the --chr option.
--window <integer>
This option specifies the length, in base pairs, of the buffer region on either side of the region to be analyzed; 500000 by default. Requires the --chr option.
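The subset options and the buffer arithmetic in the chromosome 6 example can be sketched as below. The helpers subset_args, buffered_region and is_reported are hypothetical; clipping the buffered region at position 1, treating region boundaries as inclusive, and requiring start <= end are assumptions of this sketch rather than documented Minimac4 behavior.

```python
def subset_args(chrom, start, end, window=None):
    """Build --chr/--start/--end[/--window] arguments (hypothetical helper).

    --start and --end must be non-zero when --chr is used; positions are
    1-based, so this sketch requires them to be positive, with start <= end
    (an assumption). window=None leaves Minimac4's default 500000 bp buffer.
    """
    if not chrom:
        raise ValueError("--chr is required for --start/--end/--window")
    if start <= 0 or end <= 0 or end < start:
        raise ValueError("--start and --end must be positive, with start <= end")
    args = ["--chr", str(chrom), "--start", str(start), "--end", str(end)]
    if window is not None:
        args += ["--window", str(window)]
    return args


def buffered_region(start, end, window=500_000):
    """Region used for imputation: the target region plus `window` bp on each side.

    Clipping the left edge at position 1 is an assumption of this sketch.
    """
    return max(1, start - window), end + window


def is_reported(pos, start, end):
    """Variants in the buffer are used for imputation but not reported."""
    return start <= pos <= end


if __name__ == "__main__":
    print(" ".join(subset_args(6, 1_000_000, 2_000_000, window=300_000)))
    print(buffered_region(1_000_000, 2_000_000, window=300_000))
```

In the chromosome 6 example, variants from 700000 to 2300000 would inform imputation, while only those from 1000000 to 2000000 would appear in the output.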

Approximation Parameters

--minimac3
OFF by default. If ON, Minimac3 algorithm will be used for imputation.
--probThreshold <float_number>
--diffThreshold <float_number>
--topThreshold <float_number>

Other Parameters

--log
OFF by default. If ON, information including warnings and errors will be saved to <output_prefix>.logfile instead of being printed on the screen.
--help
If ON, Minimac4 will show the list of all available options.
--cpus <integer>
This option defines the number of CPUs for parallel computing, 1 by default.

PhoneHome

By default, Minimac4 sends a message to a University of Michigan database reporting the success or failure of the analysis run (and, on failure, what kind of failure occurred). No information about the data, files or file names is sent. Use --noPhoneHome to opt out, or --phoneHomeThinning to send the message only with a given percentage probability; for example, --phoneHomeThinning 50 sends it with a 50% chance (typically used when running many command lines).

--noPhoneHome
OFF by default. If ON, Minimac4 will NOT send the SUCCESS/FAILURE status of the execution to the home server.
--phoneHomeThinning <integer>
Percentage probability of sending the SUCCESS/FAILURE status of the execution to the home server; 50 by default.
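The meaning of a percentage probability can be illustrated with a small simulation. This is a hypothetical sketch of the decision rule only (the function should_phone_home is not part of Minimac4, and Minimac4's actual implementation is not documented here): with --noPhoneHome nothing is sent, otherwise each run sends its status with probability thinning_percent / 100.

```python
import random


def should_phone_home(no_phone_home=False, thinning_percent=50, rng=random):
    """Return True if a SUCCESS/FAILURE status message would be sent.

    thinning_percent is the percentage probability of sending;
    no_phone_home disables sending entirely. Hypothetical sketch only.
    """
    if no_phone_home:
        return False
    return rng.random() < thinning_percent / 100.0


if __name__ == "__main__":
    rng = random.Random(2019)  # seeded so repeated runs give the same result
    sent = sum(should_phone_home(thinning_percent=50, rng=rng) for _ in range(10_000))
    print(f"messages sent in 10000 simulated runs: {sent}")
```

Over many command lines, a thinning value of 50 means roughly half of the runs report their status.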

