Polymutt2
From Genome Analysis Wiki
Updates
The latest version, v0.2, is available under Download.
Note
Polymutt2 can only handle one chromosome at a time, so please run it chromosome by chromosome.
Compilation
- After downloading the source code, unzip and untar it, cd into polymutt2, and type make
- Two executables will be generated in the bin/ directory: polymutt2 and vcf2map
- vcf2map prunes LD and generates a map file of high-quality SNPs
- polymutt2 generates genotype calls, taking a VCF file and a map file as input
Usage
- First, generate an LD-pruned map file with vcf2map. Running vcf2map without any arguments prints the following message
Input : --vcf [], --ped [], --map [/scratch/cgg/lib13/db/hapmap/genetic_map_GRCh37_chr1.txt], --include_list [/scratch/cgg/Public/hg19/1000G.SNV.clean.MAF0.05.tbl.gz]
Output : --out_map []
Variant filter : --min_maf [0.10], --min_avg_dp [0.00], --max_avg_dp [-1.0e+00], --max_missing_rate [0.05]
LD pruning : --win_size [100], --max_r2 [0.10], --ignore_missing
- Running polymutt2 without any arguments displays the following message
pedfile : (-pname)
datfile : (-dname)
mapfile : (-mname)
Additional Options
Input : --in_vcf [], --in_range [], --mixed_vcf_records
Mutation paramters : --theta_snv [1.0e-03], --theta_indel [1.0e-04], --tstv_ratio [2.00], --submap [1.00]
Multi-threading : --nthreads [1]
Output : --out_vcf [], --fam_idx, --fam_id [], --out_all, --out_range [], --best_marginal, --best_path
Approximation : --cum_prob [1.00], --single_iv
- NOTE: the current version can only process one chromosome at a time
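Because each run handles a single chromosome, a whole-genome analysis needs one vcf2map run and one polymutt2 run per chromosome. The following is a minimal sketch that only prints the commands (a dry run) for chromosomes 1 to 22. The per-chromosome file names (input.chrN.vcf, genetic_map_GRCh37_chrN.txt, chrN.map, out.chrN.vcf) and the helper name print_chr_jobs are assumptions for illustration, not names defined by Polymutt2; the options are the ones listed in the messages above.

```shell
# Dry run: print the vcf2map and polymutt2 commands for one chromosome.
# Assumption: the input VCF has been split per chromosome as input.chrN.vcf
# and the genetic map files follow the genetic_map_GRCh37_chrN.txt pattern.
print_chr_jobs() {
    c=$1
    echo "vcf2map --vcf input.chr${c}.vcf --ped input.ped --map genetic_map_GRCh37_chr${c}.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --out_map chr${c}.map"
    echo "polymutt2 -p input.ped -m chr${c}.map --in_vcf input.chr${c}.vcf --out_vcf out.chr${c}.vcf"
}

# Print the commands for chromosomes 1 to 22, one chromosome at a time.
chr=1
while [ "$chr" -le 22 ]; do
    print_chr_jobs "$chr"
    chr=$((chr + 1))
done
```

Once the printed commands look right, they can be piped to sh or submitted to a job scheduler.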
Examples of generating the map file
- vcf2map: generate a sparse map file (see Download for files genetic_map_GRCh37_chr1.txt and 1000G.SNV.clean.MAF0.05.tbl.gz)
vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --out_map chr1.map
- User-defined r2 cutoff for LD pruning and minimum average depth for filtering
vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --max_r2 0.2 --min_avg_dp 2 --out_map chr1.r0.2.map
Examples of running polymutt2
- polymutt2: takes a VCF and the map file generated by vcf2map (the VCF file can be a complete VCF with all variants and samples)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf
- If parents are available, genotypes can be phased by transmission (accuracy is not as good as with the command above)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --best_path
- To output a single family, specify it with --fam_id (the ped file can contain all families; the others will be ignored)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --fam_id
- To output only a range (for example, the whole chromosome can be divided into multiple parallel jobs, each working on one range)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --out_range 1:1000000-2000000
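The range-based splitting mentioned above can be sketched as follows. This is a minimal dry-run example that tiles a chromosome into fixed-size chunks and prints one polymutt2 command per chunk. The helper names make_ranges and print_range_jobs, the per-range output names (out.START-END.vcf), and the chromosome length passed in are hypothetical. It also assumes that --out_range treats both ends as inclusive, which this page does not state, so check for duplicated or missing boundary positions before merging results.

```shell
# Print chrom:start-end ranges that tile positions 1..length in fixed-size chunks.
make_ranges() {
    chrom=$1
    length=$2
    chunk=$3
    start=1
    while [ "$start" -le "$length" ]; do
        end=$((start + chunk - 1))
        # Clip the last chunk to the chromosome length.
        if [ "$end" -gt "$length" ]; then
            end=$length
        fi
        echo "${chrom}:${start}-${end}"
        start=$((end + 1))
    done
}

# Dry run: print one polymutt2 command per range instead of executing it.
# The output file name out.START-END.vcf is an assumption for illustration.
print_range_jobs() {
    for range in $(make_ranges "$1" "$2" "$3"); do
        echo "polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.${range#*:}.vcf --out_range ${range}"
    done
}

# Example: chromosome 1 with a hypothetical length of 3,000,000 in 1 Mb chunks.
print_range_jobs 1 3000000 1000000
```

Each printed command can then be run as an independent job, since every job writes to its own output file.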
File format
See the PLINK documentation at http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml for the file formats
Download
- The latest version of the source code, v0.2, can be downloaded here.
- The genetic map files (genetic_map_GRCh37_chr1.txt) used above can be downloaded here.
- The clean and common variants in the 1000 Genomes Project (1000G.SNV.clean.MAF0.05.tbl.gz) used above can be downloaded here.
Contact
For questions please contact the authors (Bingshan Li: bingshan.li@vanderbilt.edu)
Citation
Li B, Wei Q, Zhan X, Zhong X, Chen W, Li C, et al. (2015) Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. PLoS Genet 11(6): e1005271. doi:10.1371/journal.pgen.1005271