== Updates ==
The latest version, v0.2, is available for [[#Download | Download]].

== Note ==
'''Polymutt2 can only handle one chromosome at a time, so please run it chromosome by chromosome.'''

== Compilation ==

== Usage ==
* First, generate an LD-pruned map file with vcf2map. Running vcf2map without any arguments prints the following message:

 Input          : --vcf [], --ped [],
                  --map [/scratch/cgg/lib13/db/hapmap/genetic_map_GRCh37_chr1.txt],
                  --include_list [/scratch/cgg/Public/hg19/1000G.SNV.clean.MAF0.05.tbl.gz]
 Output         : --out_map []
 Variant filter : --min_maf [0.10], --min_avg_dp [0.00],
                  --max_avg_dp [-1.0e+00], --max_missing_rate [0.05]
 LD pruning     : --win_size [100], --max_r2 [0.10], --ignore_missing

* Running polymutt2 without any arguments displays the following message:

 pedfile : (-pname)
 datfile : (-dname)
 mapfile : (-mname)
 
 Additional Options
   Input              : --in_vcf [], --in_range [], --mixed_vcf_records
   Mutation paramters : --theta_snv [1.0e-03], --theta_indel [1.0e-04],
                        --tstv_ratio [2.00], --submap [1.00]
   Multi-threading    : --nthreads [1]
   Output             : --out_vcf [], --fam_idx, --fam_id [], --out_all,
                        --out_range [], --best_marginal, --best_path
   Approximation      : --cum_prob [1.00], --single_iv

* NOTE: the current version can only process one chromosome at a time.
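Because each run handles a single chromosome, a whole-genome analysis means repeating the map-generation and polymutt2 steps once per chromosome. The sketch below only prints the commands (a dry run) so they can be inspected or handed to a job scheduler. It assumes hypothetical per-chromosome inputs named input.chr<N>.vcf, per-chromosome genetic maps named like genetic_map_GRCh37_chr1.txt, and that 1000G.SNV.clean.MAF0.05.tbl.gz can be reused for every chromosome; none of these names are confirmed by this page, and print_chrom_jobs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Dry run: print the vcf2map and polymutt2 commands for each chromosome,
# since Polymutt2 processes one chromosome per run.
# ASSUMPTIONS (hypothetical file names, adjust to your data):
#   input.chr<N>.vcf              - variants of chromosome <N> only
#   genetic_map_GRCh37_chr<N>.txt - genetic map of chromosome <N>
#   1000G.SNV.clean.MAF0.05.tbl.gz reused for every chromosome
print_chrom_jobs() {
  local ped="$1"
  shift
  local chr
  for chr in "$@"; do
    # Step 1: LD-pruned sparse map for this chromosome
    echo "vcf2map --vcf input.chr${chr}.vcf --ped ${ped} --map genetic_map_GRCh37_chr${chr}.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --out_map chr${chr}.map"
    # Step 2: genotype inference using that map
    echo "polymutt2 -p ${ped} -m chr${chr}.map --in_vcf input.chr${chr}.vcf --out_vcf out.chr${chr}.vcf"
  done
}

# Print the commands for the 22 autosomes.
print_chrom_jobs input.ped $(seq 1 22)
```

After sourcing the script, piping its output to bash (for example `print_chrom_jobs input.ped 22 | bash`) runs the jobs one after another; each vcf2map line comes before the polymutt2 line that reads its map, so the order is safe.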

== Examples of generating the map file ==
* vcf2map: generate a sparse map file (see [[#Download|Download]] for the files genetic_map_GRCh37_chr1.txt and 1000G.SNV.clean.MAF0.05.tbl.gz):
 vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --out_map chr1.map

* User-defined r2 cutoff for LD pruning (--max_r2) and minimum average depth for variant filtering (--min_avg_dp):
 vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --max_r2 0.2 --min_avg_dp 2 --out_map chr1.r0.2.map
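--max_r2 is the r2 cutoff used for LD pruning (default 0.10; the example above relaxes it to 0.2). How vcf2map computes r2 is not documented on this page, so the following is only a generic illustration: r2 between two variants is commonly taken as the squared Pearson correlation of per-sample genotype dosages (0, 1 or 2 alternate alleles), and in a typical pruning scheme a pair above the cutoff is too strongly linked, so one of the two variants is dropped. r2_from_dosages is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Generic illustration (NOT vcf2map's code): r^2 between two variants as the
# squared Pearson correlation of per-sample genotype dosages (0/1/2).
# r2_from_dosages is a hypothetical helper; prints NA if r^2 is undefined.
r2_from_dosages() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); m = split(b, y, " ")
    if (n != m || n < 2) { print "NA"; exit }
    for (i = 1; i <= n; i++) { sx += x[i]; sy += y[i] }
    mx = sx / n; my = sy / n
    # Accumulate covariance and variance terms around the means
    for (i = 1; i <= n; i++) {
      dx = x[i] - mx; dy = y[i] - my
      sxy += dx * dy; sxx += dx * dx; syy += dy * dy
    }
    if (sxx == 0 || syy == 0) { print "NA"; exit }   # monomorphic variant
    printf "%.4f\n", (sxy * sxy) / (sxx * syy)
  }'
}

cutoff=0.2   # same value as --max_r2 0.2 in the example above
r2=$(r2_from_dosages "0 1 2 1 0" "0 1 2 2 0")
# Compare numerically against the cutoff
awk -v r="$r2" -v c="$cutoff" 'BEGIN {
  if (r == "NA") print "r2 undefined"
  else if (r > c) print "r2 = " r ": above cutoff, prune one of the pair"
  else print "r2 = " r ": keep both"
}'
```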

== Examples of running polymutt2 ==

* polymutt2: take a VCF file and the map file generated by vcf2map (the VCF file can be a complete VCF with all variants and samples):
 polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf

* If parents are available, genotypes can be phased by transmission with --best_path (the genotype accuracy is lower than with the command above):
 polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --best_path

* To output a single family (the PED file can contain all families; the other families will be ignored):
 polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --fam_id fam1
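To write every family to its own output file, the family IDs can be read from the first column of the PED file (the family ID column in the PLINK PED format) and passed to --fam_id one at a time. The dry-run sketch below prints one command per family; the helper print_family_jobs and the output names out.<family>.vcf are hypothetical, and the map and VCF names are taken from the examples above.

```shell
#!/usr/bin/env bash
# Dry run: print one polymutt2 command per family listed in a PED file.
# Column 1 of a PED file is the family ID. The helper name and the
# output names out.<family>.vcf are hypothetical.
print_family_jobs() {
  local ped="$1"
  local fam
  # Unique family IDs from column 1, skipping empty lines
  awk 'NF > 0 { print $1 }' "$ped" | sort -u | while read -r fam; do
    echo "polymutt2 -p ${ped} -m chr1.map --in_vcf input.vcf --out_vcf out.${fam}.vcf --fam_id ${fam}"
  done
}
```

For example, `print_family_jobs input.ped` prints one line per distinct family ID in input.ped; the lines can be run in parallel because each job writes a different output file.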

* To output only a region (for example, the whole chromosome can be divided into multiple parallel jobs, each working on one range):
 polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --out_range 1:1000000-2000000
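Splitting a chromosome into parallel jobs amounts to generating consecutive, non-overlapping --out_range values. The sketch below prints one polymutt2 command per chunk, giving each job the full input VCF as in the example above. It assumes both ends of a range are inclusive (this page does not say, so check before merging the outputs); the helper print_range_jobs and the output names are hypothetical. Chromosome 1 in GRCh37 is 249,250,621 bp long.

```shell
#!/usr/bin/env bash
# Dry run: split one chromosome into consecutive chunks and print one
# polymutt2 command per chunk using --out_range.
# ASSUMPTIONS: both ends of --out_range are inclusive; the helper name and
# output names out.<chr>_<start>_<end>.vcf are hypothetical.
print_range_jobs() {
  local chr="$1" chr_len="$2" chunk="$3"
  local start=1 end
  while [ "$start" -le "$chr_len" ]; do
    end=$(( start + chunk - 1 ))
    # The last chunk stops at the end of the chromosome
    if [ "$end" -gt "$chr_len" ]; then
      end="$chr_len"
    fi
    echo "polymutt2 -p input.ped -m chr${chr}.map --in_vcf input.vcf --out_vcf out.${chr}_${start}_${end}.vcf --out_range ${chr}:${start}-${end}"
    start=$(( end + 1 ))
  done
}

# Chromosome 1 of GRCh37 (249,250,621 bp) in 10 Mb chunks: 25 jobs
print_range_jobs 1 249250621 10000000
```

Smaller chunks give more, shorter jobs; the chunk size is purely a scheduling choice and does not change which positions are covered.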

== File format ==
See the [http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml PLINK documentation] for the file formats.

== Download ==
* The latest version of the source code (v0.2) can be [[Media:Polymutt2_v0.2.tar.gz | downloaded]] here.
* The genetic map files (e.g. genetic_map_GRCh37_chr1.txt) used above can be [[Media:genetic_map_HapMapII_GRCh37.tar.gz | downloaded]] here.
* The clean and common variants from the 1000 Genomes Project (1000G.SNV.clean.MAF0.05.tbl.gz) used above can be [[Media:1000G.SNV.clean.MAF0.05.tbl.gz | downloaded]] here.

== Contact ==
For questions, please contact the authors (Bingshan Li: [mailto:bingshan.li@vanderbilt.edu bingshan.li@vanderbilt.edu]).

== Citation ==
Li B, Wei Q, Zhan X, Zhong X, Chen W, Li C, et al. (2015) Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. PLoS Genet 11(6): e1005271. doi:10.1371/journal.pgen.1005271