Minimac4 Documentation
From Genome Analysis Wiki
A typical Minimac4 command line would have the following parameter options:
- Reference Haplotypes: --refHaps [], --passOnly, --rsid, --referenceEstimates [ON], --mapFile [docs/geneticMapFile.b38.map.txt.gz]
- Target Haplotypes: --haps []
- Output Parameters: --prefix [Minimac4.Output], --estimate, --nobgzip, --vcfBuffer [200], --format [GT,DS], --allTypedSites, --meta, --memUsage
- Chunking Parameters: --ChunkLengthMb [20.00], --ChunkOverlapMb [3.00]
- Subset Parameters: --chr [], --start, --end, --window
- Approximation Parameters: --minimac3, --probThreshold [0.01], --diffThreshold [0.01], --topThreshold [0.01]
- Other Parameters: --log, --help, --cpus [1], --params
- PhoneHome: --noPhoneHome, --phoneHomeThinning [50]
Of these, only --refHaps and --haps are required.
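When Minimac4 is driven from a script, it can help to assemble the argument list in one place and check the required options before launching. The sketch below is a hypothetical helper, not part of Minimac4: it uses only flags documented on this page, assumes the executable is called `minimac4` and that each option takes its value as the next argument, and assumes `--format` takes a comma-separated list of handles.

```python
from typing import List, Optional, Sequence

# Handles accepted by --format, as listed in the --format description.
VALID_FORMAT_HANDLES = {"GT", "DS", "HDS", "GP", "SD"}


def build_minimac4_command(
    ref_haps: str,
    haps: str,
    prefix: Optional[str] = None,
    formats: Sequence[str] = ("GT", "DS"),
    cpus: int = 1,
    executable: str = "minimac4",  # assumed binary name
) -> List[str]:
    """Assemble a Minimac4 argument list (hypothetical helper)."""
    # --refHaps and --haps are the only required options.
    if not ref_haps or not haps:
        raise ValueError("--refHaps and --haps are both required")
    # Reject FORMAT handles that the documentation does not list.
    unknown = set(formats) - VALID_FORMAT_HANDLES
    if unknown:
        raise ValueError(f"unsupported --format handles: {sorted(unknown)}")
    args = [executable, "--refHaps", ref_haps, "--haps", haps]
    if prefix is not None:
        # Without --prefix, outputs default to "Minimac4.Output".
        args += ["--prefix", prefix]
    args += ["--format", ",".join(formats)]
    args += ["--cpus", str(cpus)]
    return args
```

For example, `build_minimac4_command("ref.m3vcf.gz", "target.vcf.gz", prefix="out")` yields a list that could be passed to `subprocess.run`; the file names here are placeholders.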
Reference Haplotypes
- --refHaps <input_m3vcf_filename>
- This option defines the reference panel in M3VCF format to impute against.
- If your reference panel is in VCF format, first use Minimac3 to convert it to M3VCF (estimating the model parameters at the same time), then impute against the resulting M3VCF file with Minimac4.
- --passOnly
- DEACTIVATED for now. OFF by default. If ON, only variants with FILTER=PASS will be read from the reference VCF file (this does NOT work on M3VCF files yet).
- --rsid
- OFF by default. If ON, Minimac4 will import only the RS IDs of variants from the ID column of the reference file (if available).
- --referenceEstimates
- ON by default. If ON, Minimac4 expects the input M3VCF file to include parameter estimates; otherwise, a genetic map file must be supplied with --mapFile.
- --mapFile <input_genetic_map_file>
- This option is automatically ignored except when --referenceEstimates is OFF.
- It defines the genetic map file used for recombination rate estimation during imputation.
- The input genetic map file should be tab-separated with a header row, and its columns should be, in order: chromosome ID, base-pair position, recombination rate in cM/Mb, and cumulative genetic map position in cM.
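To make the expected layout concrete, the following sketch reads a map file with the four columns described above and linearly interpolates the genetic position (cM) of an arbitrary base-pair position. It is an illustration only: the helper names, the example header text, and the choice to clamp positions outside the map to the first or last entry are assumptions, not Minimac4's internal behavior.

```python
import bisect
import csv
import io
from typing import Dict, List, Tuple

MapEntries = List[Tuple[int, float]]  # (bp position, cM position)


def read_genetic_map(handle) -> Dict[str, MapEntries]:
    """Parse a tab-separated genetic map whose first row is a header.

    Columns, in order: chromosome, bp position, rate (cM/Mb), map position (cM).
    Returns (bp, cM) pairs per chromosome, sorted by bp position.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    maps: Dict[str, MapEntries] = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        chrom, bp, _rate, cm = row[:4]
        maps.setdefault(chrom, []).append((int(bp), float(cm)))
    for entries in maps.values():
        entries.sort()
    return maps


def genetic_position(entries: MapEntries, bp: int) -> float:
    """Interpolate the cM position of `bp` between flanking map entries."""
    positions = [p for p, _ in entries]
    i = bisect.bisect_left(positions, bp)
    if i == 0:
        return entries[0][1]  # before the first entry: clamp (assumption)
    if i == len(entries):
        return entries[-1][1]  # after the last entry: clamp (assumption)
    (p0, c0), (p1, c1) = entries[i - 1], entries[i]
    return c0 + (c1 - c0) * (bp - p0) / (p1 - p0)


if __name__ == "__main__":
    example = (
        "chr\tposition\trate\tcM\n"
        "20\t60000\t0.5\t0.0\n"
        "20\t1060000\t1.5\t0.5\n"
        "20\t2060000\t1.0\t2.0\n"
    )
    chrom20 = read_genetic_map(io.StringIO(example))["20"]
    print(genetic_position(chrom20, 560000))
```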
Target Haplotypes
- --haps <input_vcf_filename>
- This option defines the pre-phased target genotype data in VCF format to impute.
Output Parameters
- --prefix <output_prefix>
- This option defines the output filename prefix for all files generated by Minimac4.
- If this option is omitted, all output files will have the prefix "Minimac4.Output" in the current working directory.
- --estimate
- DEACTIVATED for now. This option is equivalent to the option --processReference in Minimac3.
- --nobgzip
- OFF by default. If ON, output files will NOT be bgzipped.
- --vcfBuffer
- This option defines the maximum number of samples in the target genotype data imputed at a time. By default, it is set to 200 or the total number of samples, whichever is smaller.
- Note that larger values make Minimac4 consume more memory.
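The sizing rule above can be illustrated by splitting a sample list into consecutive batches. This is a sketch under the assumption that samples are taken in file order; the function name is hypothetical and it does not claim to reproduce how Minimac4 schedules batches internally.

```python
from typing import List, Sequence


def sample_batches(samples: Sequence[str], vcf_buffer: int = 200) -> List[List[str]]:
    """Split target samples into consecutive batches of at most `vcf_buffer`."""
    if vcf_buffer < 1:
        raise ValueError("vcf_buffer must be a positive integer")
    # Effective batch size: the buffer or the sample count, whichever is smaller.
    # `or 1` keeps the range step valid when there are no samples at all.
    size = min(vcf_buffer, len(samples)) or 1
    return [list(samples[i:i + size]) for i in range(0, len(samples), size)]
```

With 450 samples and the default buffer of 200, this produces batches of 200, 200 and 50 samples; with 120 samples, the whole set forms a single batch.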
- --format
- This option specifies which fields to output in the FORMAT field of the output imputed VCF file. Available handles are GT, DS, HDS, GP and SD. The default setting is GT,DS.
- GT - Estimated most likely genotype.
- DS - Estimated alternate allele dosage [P(0/1)+2*P(1/1)].
- HDS - Estimated phased haploid alternate allele dosage.
- GP - Estimated Posterior Genotype Probabilities P(0/0), P(0/1) and P(1/1).
- SD - Estimated Variance of Posterior Genotype Probabilities.
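The FORMAT fields are related to one another. The sketch below derives GP and DS from the two phased haploid dosages (HDS) and picks the genotype with the highest posterior probability. It assumes the two haplotype dosages act as independent probabilities of carrying the alternate allele. Under that assumption DS, computed as P(0/1)+2*P(1/1), equals the sum of the two haploid dosages. The function is hypothetical, and its unphased argmax call is not necessarily how Minimac4 writes GT.

```python
from typing import Tuple


def genotype_fields(hds1: float, hds2: float) -> Tuple[Tuple[float, float, float], float, str]:
    """Derive (GP, DS, most likely genotype) from two haploid dosages.

    Assumes hds1 and hds2 are independent probabilities that haplotype 1
    and haplotype 2 carry the alternate allele.
    """
    p00 = (1 - hds1) * (1 - hds2)  # both haplotypes carry the reference allele
    p01 = hds1 * (1 - hds2) + (1 - hds1) * hds2  # exactly one alternate allele
    p11 = hds1 * hds2  # both haplotypes carry the alternate allele
    gp = (p00, p01, p11)
    ds = p01 + 2 * p11  # DS = P(0/1) + 2*P(1/1), algebraically hds1 + hds2
    labels = ("0/0", "0/1", "1/1")
    gt = labels[max(range(3), key=lambda k: gp[k])]  # highest-probability genotype
    return gp, ds, gt
```

For haploid dosages 0.9 and 0.2, this gives GP of about (0.08, 0.74, 0.18), DS of 1.1 and a most likely genotype of 0/1.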
- --allTypedSites
- OFF by default. If ON, Minimac4 will also include variants that were genotyped but NOT in the reference panel in the output files (and imputes any missing data in such variants to the major allele frequency).
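The effect of --allTypedSites on which variants appear in the output can be pictured as a set union. This sketch assumes variants are matched on (chromosome, position, REF, ALT). That matching key, the names used, and the sort order are illustrative assumptions, and the sketch does not model how missing genotypes at typed-only sites are filled in.

```python
from typing import Iterable, List, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt): assumed matching key


def output_sites(
    reference: Iterable[Variant],
    typed: Iterable[Variant],
    all_typed_sites: bool = False,
) -> List[Variant]:
    """Return the variant sites expected in the output (illustrative only)."""
    sites = set(reference)  # imputed output covers the reference panel sites
    if all_typed_sites:
        # Also keep genotyped variants that are absent from the reference panel.
        sites |= set(typed)
    return sorted(sites)
```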
- --meta
- OFF by default. If ON, Minimac4 will generate a separate file that can be used by MetaMinimac2 for meta-imputation.
- --memUsage
- OFF by default. If ON, Minimac4 will not perform imputation. Instead, it will estimate the memory that imputation would consume, based on a single chunk.