Difference between revisions of "Polymutt2"
From Genome Analysis Wiki
Latest revision as of 23:54, 10 November 2016
Updates
The latest version, v0.2, is available for download (see Download).
Note
Polymutt2 can only handle one chromosome, so please run it chromosome by chromosome.
Compilation
- After downloading the source code, unzip and untar it, cd into polymutt2, and then type make
- Two executables will be generated in the bin/ directory: polymutt2 and vcf2map
- vcf2map prunes LD and generates a map file with high-quality SNPs
- polymutt2 generates genotype calls, taking a VCF file and a map file as input
Usage
- First, an LD-pruned map file is generated by vcf2map. Running vcf2map without any arguments prints the following message:
 Input : --vcf [], --ped [],
         --map [/scratch/cgg/lib13/db/hapmap/genetic_map_GRCh37_chr1.txt],
         --include_list [/scratch/cgg/Public/hg19/1000G.SNV.clean.MAF0.05.tbl.gz]
 Output : --out_map []
 Variant filter : --min_maf [0.10], --min_avg_dp [0.00],
                  --max_avg_dp [-1.0e+00], --max_missing_rate [0.05]
 LD pruning : --win_size [100], --max_r2 [0.10], --ignore_missing
- Running polymutt2 without any arguments displays the following message:
 pedfile : (-pname)
 datfile : (-dname)
 mapfile : (-mname)

 Additional Options
 Input : --in_vcf [], --in_range [], --mixed_vcf_records
 Mutation paramters : --theta_snv [1.0e-03], --theta_indel [1.0e-04],
                      --tstv_ratio [2.00], --submap [1.00]
 Multi-threading : --nthreads [1]
 Output : --out_vcf [], --fam_idx, --fam_id [], --out_all,
          --out_range [], --best_marginal, --best_path
 Approximation : --cum_prob [1.00], --single_iv
- NOTE: current version can only process one chromosome at a time
Examples of generating the map file
- vcf2map: generate a sparse map file (see Download for files genetic_map_GRCh37_chr1.txt and 1000G.SNV.clean.MAF0.05.tbl.gz)
vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --out_map chr1.map
- User-defined r2 cutoff for LD pruning and minimum average depth for filtering
vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --max_r2 0.2 --min_avg_dp 2 --out_map chr1.r0.2.map
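Because each run handles a single chromosome, map generation has to be repeated per chromosome. The following is a minimal dry-run sketch that only prints the vcf2map commands rather than executing them. The per-chromosome VCF names (input.chrN.vcf), the output names (chrN.map), and the assumption that the other genetic map files follow the genetic_map_GRCh37_chrN.txt pattern are illustrative and not confirmed by this page; the helper name vcf2map_cmd is hypothetical.

```shell
# Print the vcf2map command for one chromosome (dry run: nothing is executed).
# Assumed, not documented here: per-chromosome VCFs named input.chrN.vcf and
# map files named genetic_map_GRCh37_chrN.txt for every chromosome N.
vcf2map_cmd() {
    n=$1
    echo "vcf2map --vcf input.chr${n}.vcf --ped input.ped" \
         "--map genetic_map_GRCh37_chr${n}.txt" \
         "--include_list 1000G.SNV.clean.MAF0.05.tbl.gz" \
         "--out_map chr${n}.map"
}

# Example: print the commands for chromosomes 1 to 3.
for c in 1 2 3; do
    vcf2map_cmd "$c"
done
```

Once the printed commands look right, they can be piped to sh or submitted to a job scheduler.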
Examples of running polymutt2
- polymutt2: takes a VCF and the map file generated by vcf2map (the VCF file can be a complete VCF with all variants and samples)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf
- If parents are available, genotypes can be phased by transmission (accuracy is not as good as above)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --best_path
- To output a single family (the ped file can contain all families; the other families will be ignored)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --fam_id
- To output only a range (for example, the whole chromosome can be divided into multiple parallel jobs, each working on a range)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --out_range 1:1000000-2000000
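The range-based splitting mentioned above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper make_ranges is hypothetical, the chromosome length (5000000) and chunk size (1000000) are placeholders, the per-range output file names are invented for illustration, and the chunks are made non-overlapping because this page does not say whether --out_range boundaries are inclusive. The loop prints the polymutt2 commands instead of running them.

```shell
# Print chrom:start-end ranges that cover positions 1..length in fixed-size chunks.
# make_ranges is a hypothetical helper, not part of polymutt2.
make_ranges() {
    chrom=$1
    length=$2
    chunk=$3
    start=1
    while [ "$start" -le "$length" ]; do
        end=$((start + chunk - 1))
        if [ "$end" -gt "$length" ]; then
            end=$length
        fi
        echo "${chrom}:${start}-${end}"
        start=$((end + 1))
    done
}

# Dry run: print one polymutt2 command per range (length and chunk are placeholders).
make_ranges 1 5000000 1000000 | while read -r range; do
    tag=$(echo "$range" | tr ':' '_')
    echo "polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf" \
         "--out_vcf out.${tag}.vcf --out_range ${range}"
done
```

Each printed line can then be submitted as a separate job, and the per-range output VCFs combined afterwards.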
File format
See the PLINK documentation (http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml) for the file formats.
Download
- The latest version of the source code, v0.2 (Polymutt2_v0.2.tar.gz), can be downloaded here.
- The genetic map files (genetic_map_GRCh37_chr1.txt) used above can be downloaded here.
- The clean and common variants in the 1000 Genomes Project (1000G.SNV.clean.MAF0.05.tbl.gz) used above can be downloaded here.
Contact
For questions please contact the authors (Bingshan Li: bingshan.li@vanderbilt.edu)
Citation
Li B, Wei Q, Zhan X, Zhong X, Chen W, Li C, et al. (2015) Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. PLoS Genet 11(6): e1005271. doi:10.1371/journal.pgen.1005271