Difference between revisions of "Minimac4 Documentation"

From Genome Analysis Wiki
(half done)
 
(Add output options)
Line 24: Line 24:
 
: If your reference panel is in VCF format, please use [[Minimac3]] to convert the VCF file to M3VCF (along with parameter estimation) and then use that M3VCF for imputation using Minimac4.
  
;<s>--passOnly</s>
+
; --passOnly  
: <s>DEACTIVATED! If ON, only variants will FILTER=PASS will be recorded from reference VCF file (does NOT work on M3VCF files yet).</s>
+
: DEACTIVATED for now. OFF by default. If ON, only variants with FILTER=PASS will be recorded from the reference VCF file (does NOT work on M3VCF files yet).
  
 
;--rsid
: If ON, Minimac4 will only import RS ID of variants from ID column of reference file (if available).
+
: OFF by default. If ON, Minimac4 will only import RS ID of variants from ID column of reference file (if available).
  
 
;--referenceEstimates
Line 36: Line 36:
 
: This option is automatically ignored except when <code>--referenceEstimates</code> is OFF.
 
: It defines the genetic map file used for recombination rate estimation during imputation.
: The input genetic map file should be tab-separated, with 1st column as chromosome id, 3rd column as cumulative recombination rate in cM/Mb, and 4th as genetic map coordinates in cM.
+
: The input genetic map file should be tab-separated, with the first row as its header, and the columns representing chromosome id, base pair position, cumulative recombination rate in cM/Mb, and genetic map coordinates in cM, respectively.
  
 
== Target Haplotypes ==
Line 43: Line 43:
  
 
== Output Parameters ==
 +
; --prefix <output_prefix>
 +
: This option defines the output filename prefix for all files generated by Minimac4.
 +
: If this option is omitted, all output files will have the prefix "Minimac4.Output" in the current working directory.
 +
 +
; --estimate
 +
: DEACTIVATED for now. This option is equivalent to the option <code>--processReference</code> in [[Minimac3 Usage|Minimac3]].
 +
 +
; --nobgzip
 +
: OFF by default. If ON, output files will NOT be bgzipped.
 +
 +
; --vcfBuffer
 +
: This option defines the maximum number of samples in the target genotype data to be imputed at a time. By default, it is set as 200, or the total number of samples, whichever is smaller.
 +
: Note that the larger the value is, the more memory Minimac4 will consume.
 +
 +
; --format
 +
: This option specifies which fields to output for the FORMAT field in output imputed VCF file. Available handles are <code>GT</code>,<code>DS</code>,<code>HDS</code>,<code>GP</code>,<code>SD</code>. Default setting is <code>GT,DS</code>.
 +
:* '''GT''' - Estimated most likely genotype.
 +
:* '''DS''' - Estimated alternate allele dosage [P(0/1)+2*P(1/1)].
 +
:* '''HDS''' - Estimated phased haploid alternate allele dosage.
 +
:* '''GP''' - Estimated Posterior Genotype Probabilities P(0/0), P(0/1) and P(1/1).
 +
:* '''SD''' - Estimated Variance of Posterior Genotype Probabilities.
 +
 +
;--allTypedSites
 +
: OFF by default. If ON, Minimac4 will also include variants that were genotyped but NOT in the reference panel in the output files (and imputes any missing data in such variants to the major allele frequency).
 +
 +
;--meta
 +
: OFF by default. If ON, Minimac4 will generate a separate file that can be used by [[MetaMinimac2|MetaMinimac2]] for meta-imputation.
 +
 +
;--memUsage
 +
: OFF by default. If ON, Minimac4 will not perform imputation. Instead, it will estimate memory that imputation would consume based on a single chunk.
  
 
== Chunking Parameters ==

Revision as of 20:50, 16 July 2019

A typical Minimac4 command line would have the following parameter options:

      Reference Haplotypes : --refHaps [], --passOnly, --rsid,
                             --referenceEstimates [ON],
                             --mapFile [docs/geneticMapFile.b38.map.txt.gz]
         Target Haplotypes : --haps []
         Output Parameters : --prefix [Minimac4.Output], --estimate,
                             --nobgzip, --vcfBuffer [200], --format [GT,DS],
                             --allTypedSites, --meta, --memUsage
       Chunking Parameters : --ChunkLengthMb [20.00], --ChunkOverlapMb [3.00]
         Subset Parameters : --chr [], --start, --end, --window
  Approximation Parameters : --minimac3, --probThreshold [0.01],
                             --diffThreshold [0.01], --topThreshold [0.01]
          Other Parameters : --log, --help, --cpus [1], --params
                 PhoneHome : --noPhoneHome, --phoneHomeThinning [50]

Of these options, only --refHaps and --haps are required.
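As an illustration of the usage summary above, here is a minimal Python sketch that assembles a command line from the two required options plus a few optional ones. It uses only flags listed in the summary. The executable name `minimac4`, the helper name `build_minimac4_command`, and all file names are assumptions made for the example, not part of this documentation.

```python
import shlex


def build_minimac4_command(ref_haps, haps, prefix=None, fmt=None, cpus=None,
                           executable="minimac4"):
    """Return an argv list for a Minimac4 run.

    Hypothetical helper: the executable name is an assumption, and only
    flags that appear in the usage summary are emitted.
    """
    if not ref_haps or not haps:
        raise ValueError("--refHaps and --haps are both required")
    argv = [executable, "--refHaps", ref_haps, "--haps", haps]
    if prefix is not None:
        argv += ["--prefix", prefix]
    if fmt is not None:
        # --format takes a comma-separated list of handles, e.g. GT,DS
        argv += ["--format", ",".join(fmt)]
    if cpus is not None:
        argv += ["--cpus", str(cpus)]
    return argv


# Placeholder file names, chosen only for illustration.
command = build_minimac4_command(
    "reference.m3vcf.gz", "target.vcf.gz",
    prefix="study", fmt=["GT", "DS"], cpus=4,
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where the program is installed. Options left as `None` are omitted so that the program's own defaults apply.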


Reference Haplotypes

--refHaps <input_m3vcf_filename>
This option defines the reference panel in M3VCF format to impute against.
If your reference panel is in VCF format, please use Minimac3 to convert the VCF file to M3VCF (along with parameter estimation) and then use that M3VCF for imputation using Minimac4.
--passOnly
DEACTIVATED for now. OFF by default. If ON, only variants with FILTER=PASS will be recorded from the reference VCF file (does NOT work on M3VCF files yet).
--rsid
OFF by default. If ON, Minimac4 will import only the RS IDs of variants from the ID column of the reference file (if available).
--referenceEstimates
ON by default. If ON, Minimac4 expects the input M3VCF file to come with parameter estimates; otherwise, a genetic map file must be supplied with the option --mapFile.
--mapFile <input_genetic_map_file>
This option is ignored unless --referenceEstimates is OFF.
It defines the genetic map file used for recombination rate estimation during imputation.
The input genetic map file should be tab-separated, with the first row as its header, and the columns representing chromosome id, base pair position, cumulative recombination rate in cM/Mb, and genetic map coordinates in cM, respectively.
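The genetic map layout described above (tab-separated, one header row, then four columns) can be checked with a short reader. The following is a minimal Python sketch under that description only. The names `MapRow` and `read_genetic_map`, the header labels, and the numeric values in the sample text are invented for illustration and do not come from a real map file.

```python
import csv
import io
from typing import NamedTuple


class MapRow(NamedTuple):
    chrom: str             # column 1: chromosome id
    position: int          # column 2: base pair position
    rate_cm_per_mb: float  # column 3: recombination rate in cM/Mb
    map_cm: float          # column 4: genetic map coordinate in cM


def read_genetic_map(handle):
    """Parse a tab-separated genetic map whose first row is a header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    rows = []
    for line_number, fields in enumerate(reader, start=2):
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 4:
            raise ValueError(
                f"line {line_number}: expected 4 columns, got {len(fields)}"
            )
        chrom, position, rate, cm = fields
        rows.append(MapRow(chrom, int(position), float(rate), float(cm)))
    return rows


# Made-up sample data; header labels are placeholders.
sample_text = (
    "chr\tposition\trate\tcM\n"
    "chr20\t100000\t0.5\t0.0\n"
    "chr20\t200000\t1.25\t0.05\n"
)
rows = read_genetic_map(io.StringIO(sample_text))
print(rows[1])
```

For a compressed map file, a handle opened with `gzip.open(path, "rt")` can be passed in place of the `io.StringIO` object.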

Target Haplotypes

--haps <input_vcf_filename>
This option defines the pre-phased target genotype data in VCF format to impute.

Output Parameters

--prefix <output_prefix>
This option defines the output filename prefix for all files generated by Minimac4.
If this option is omitted, all output files will have the prefix "Minimac4.Output" in the current working directory.
--estimate
DEACTIVATED for now. This option is equivalent to the option --processReference in Minimac3.
--nobgzip
OFF by default. If ON, output files will NOT be bgzipped.
--vcfBuffer
This option defines the maximum number of samples in the target genotype data to be imputed at a time. By default, it is set as 200, or the total number of samples, whichever is smaller.
Note that the larger the value is, the more memory Minimac4 will consume.
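To make the default rule concrete, the sketch below splits a sample count into batches of at most the buffer size, where the effective size is the smaller of the buffer and the total number of samples. It assumes that samples are processed in contiguous batches, which this page does not state. The function name `sample_batches` is hypothetical.

```python
def sample_batches(n_samples, vcf_buffer=200):
    """Return (start, end) index pairs, end exclusive, one per batch.

    Assumption: batches are contiguous. The batch size follows the rule in
    the text: the buffer value or the total sample count, whichever is smaller.
    """
    if n_samples < 0 or vcf_buffer < 1:
        raise ValueError("n_samples must be >= 0 and vcf_buffer >= 1")
    if n_samples == 0:
        return []
    size = min(vcf_buffer, n_samples)
    return [
        (start, min(start + size, n_samples))
        for start in range(0, n_samples, size)
    ]


print(sample_batches(450))  # three batches with the default buffer of 200
print(sample_batches(50))   # a single batch holding all 50 samples
```

A larger buffer means fewer batches but more samples held in memory at once, which is the trade-off the note above describes.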
--format
This option specifies which fields to write to the FORMAT field of the output imputed VCF file. Available handles are GT, DS, HDS, GP and SD. The default setting is GT,DS.
  • GT - Estimated most likely genotype.
  • DS - Estimated alternate allele dosage [P(0/1)+2*P(1/1)].
  • HDS - Estimated phased haploid alternate allele dosage.
  • GP - Estimated Posterior Genotype Probabilities P(0/0), P(0/1) and P(1/1).
  • SD - Estimated Variance of Posterior Genotype Probabilities.
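The DS formula above, P(0/1) + 2*P(1/1), can be worked through from a GP triple. The sketch below is illustrative only. Taking GT as the most probable entry of GP and using unphased genotype labels are simplifying assumptions, not a statement of how Minimac4 derives GT, and the probabilities are made up.

```python
def dosage_from_gp(gp):
    """Alternate allele dosage from (P(0/0), P(0/1), P(1/1))."""
    p_hom_ref, p_het, p_hom_alt = gp
    return p_het + 2.0 * p_hom_alt


def most_likely_genotype(gp):
    """Label of the highest-probability genotype (assumed rule, unphased)."""
    labels = ("0/0", "0/1", "1/1")
    best_index = max(range(3), key=lambda i: gp[i])
    return labels[best_index]


# Made-up posterior probabilities for one sample at one variant.
gp = (0.25, 0.5, 0.25)
ds = dosage_from_gp(gp)
gt = most_likely_genotype(gp)
print(gt, ds)
```

With these values the dosage is 0.5 + 2 * 0.25 = 1.0 and the most probable genotype is the heterozygote.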
--allTypedSites
OFF by default. If ON, Minimac4 will also include variants that were genotyped but NOT in the reference panel in the output files (and imputes any missing data in such variants to the major allele frequency).
--meta
OFF by default. If ON, Minimac4 will generate a separate file that can be used by MetaMinimac2 for meta-imputation.
--memUsage
OFF by default. If ON, Minimac4 will not perform imputation. Instead, it will estimate the memory that imputation would consume, based on a single chunk.

Chunking Parameters

Subset Parameters

Approximation Parameters

Other Parameters

PhoneHome