Difference between revisions of "Polymutt2"
From Genome Analysis Wiki
Revision as of 11:32, 3 September 2015
Updates
The latest version, v0.1, is available under Download.
Compilation
- After downloading the source code, unzip and untar it, cd into polymutt2, and type make
- Two executables will be generated in the bin/ directory: polymutt2 and vcf2map
- vcf2map prunes SNPs in LD and generates a map file of high-quality SNPs
- polymutt2 generates genotype calls, taking a VCF file and a map file as input
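The compilation steps above can be sketched as a small shell function. This is only an illustration: the archive name polymutt2.tar.gz is an assumption (use the name of the file actually downloaded), and build_polymutt2 is a hypothetical helper, not part of the distribution.

```shell
# Sketch of the build steps; the default archive name is hypothetical.
# The body runs in a subshell ( ... ) so the caller's directory is unchanged.
build_polymutt2() (
    archive="${1:-polymutt2.tar.gz}"
    tar -xzf "$archive" &&             # unzip and untar in one step
    cd polymutt2 &&                    # source directory named in the text
    make &&                            # build both programs
    ls bin/polymutt2 bin/vcf2map       # fails if either executable is missing
)
```

Calling build_polymutt2 with the path to the downloaded archive returns a non-zero status if any step fails.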
Usage
- First, an LD-pruned map file is generated by vcf2map. Running vcf2map without any arguments displays the following message
Input : --vcf [], --ped [], --map [/scratch/cgg/lib13/db/hapmap/genetic_map_GRCh37_chr1.txt], --include_list [/scratch/cgg/Public/hg19/1000G.SNV.clean.MAF0.05.tbl.gz]
Output : --out_map []
Variant filter : --min_maf [0.10], --min_avg_dp [0.00], --max_avg_dp [-1.0e+00], --max_missing_rate [0.05]
LD pruning : --win_size [100], --max_r2 [0.10], --ignore_missing
- Running polymutt2 without any arguments displays the following message
pedfile : (-pname)
datfile : (-dname)
mapfile : (-mname)
Additional Options
Input : --in_vcf [], --in_range [], --mixed_vcf_records
Mutation paramters : --theta_snv [1.0e-03], --theta_indel [1.0e-04], --tstv_ratio [2.00], --submap [1.00]
Multi-threading : --nthreads [1]
Output : --out_vcf [], --fam_idx, --fam_id [], --out_all, --out_range [], --best_marginal, --best_path
Approximation : --cum_prob [1.00], --single_iv
- NOTE: the current version can only process one chromosome at a time
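Because only one chromosome is processed at a time, a VCF that holds several chromosomes may need to be split first. The following is a minimal sketch of one way to do that with awk, assuming an uncompressed, tab-delimited VCF; extract_chrom and the file names in the example are hypothetical and not part of polymutt2.

```shell
# Keep every header line (starting with '#') plus the records whose first
# column (CHROM) equals the requested chromosome. Appending "" to both
# sides forces a string comparison, so "1" does not match "01".
extract_chrom() {
    awk -F '\t' -v chr="$1" '/^#/ || ($1 "") == (chr "")' "$2"
}

# Example (all.vcf and chr1.vcf are hypothetical file names):
# extract_chrom 1 all.vcf > chr1.vcf
```

The chromosome label must match the CHROM column exactly, for example 1 versus chr1.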
Examples of generating the map file
- vcf2map: generate a sparse map file (see Download for files genetic_map_GRCh37_chr1.txt and 1000G.SNV.clean.MAF0.05.tbl.gz)
vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --out_map chr1.map
- Set a user-defined r2 cutoff for LD pruning and a minimum average depth for filtering
vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --max_r2 0.2 --min_avg_dp 2 --out_map chr1.r0.2.map
Examples of running polymutt2
- polymutt2: call genotypes from a VCF and the map file generated by vcf2map (the VCF can be a complete file with all variants and samples)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf
- If parents are available, genotypes can be phased by transmission (accuracy is not as good as in the command above)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --best_path
- To output a single family, specify it with --fam_id (the ped file can contain all families; the other families will be ignored)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --fam_id
- To output only a range (for example, the whole chromosome can be divided into multiple parallel jobs, each working on one range)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --out_range 1:1000000-2000000
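Dividing a chromosome into ranges for parallel jobs, as in the last example, can be sketched as follows. This is a dry run that only prints the commands. The helper make_ranges, the chromosome length of 3000000, the 1000000 bp step and the per-range output file names are all assumptions for illustration, as is the treatment of range ends as inclusive.

```shell
# Print consecutive, non-overlapping ranges CHR:START-END of width STEP
# covering START..END (ends assumed inclusive).
make_ranges() {
    chr="$1"; start="$2"; end="$3"; step="$4"
    while [ "$start" -le "$end" ]; do
        stop=$((start + step - 1))
        [ "$stop" -gt "$end" ] && stop="$end"   # clip the last range
        echo "${chr}:${start}-${stop}"
        start=$((stop + 1))
    done
}

# Dry run: print one polymutt2 command per range, each writing its own
# (hypothetically named) output file.
for r in $(make_ranges 1 1 3000000 1000000); do
    echo "polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.${r#*:}.vcf --out_range $r"
done
```

The printed commands can then be submitted as separate jobs, with the chromosome length and step adjusted to the data.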
Download
- The latest version of the source code (v0.1) can be downloaded here.
- The genetic map files (genetic_map_GRCh37_chr1.txt) used above can be downloaded here.
- The clean and common variants from the 1000 Genomes Project (1000G.SNV.clean.MAF0.05.tbl.gz) used above can be downloaded here.
Contact
For questions please contact the authors (Bingshan Li: bingshan.li@vanderbilt.edu)
Citation
Li B, Wei Q, Zhan X, Zhong X, Chen W, Li C, et al. (2015) Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. PLoS Genet 11(6): e1005271. doi:10.1371/journal.pgen.1005271