Polymutt2

Revision as of 11:32, 3 September 2015 by Bingshan

Updates

The latest version, 0.1, is available in the Download section.

Compilation

  • After downloading the source code, unzip and untar it, cd into polymutt2, and run make
  • Two executables will be generated in the bin/ directory: polymutt2 and vcf2map
  • vcf2map prunes LD and generates a map file of high-quality SNPs
  • polymutt2 generates genotype calls, taking a VCF file and a map file as input
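
After running make, a quick check that both executables were produced can save confusion later. The following is a minimal sketch; the function name check_polymutt2_build is hypothetical and not part of the distribution, and it assumes the source directory is named polymutt2 as in the steps above.

```shell
# Report whether both executables named in the build notes exist under <src_dir>/bin.
# src_dir defaults to ./polymutt2, the directory name used in the compilation steps.
check_polymutt2_build() {
    src_dir=${1:-polymutt2}
    missing=0
    for exe in polymutt2 vcf2map; do
        if [ -x "$src_dir/bin/$exe" ]; then
            echo "found: $src_dir/bin/$exe"
        else
            echo "missing: $src_dir/bin/$exe" >&2
            missing=1
        fi
    done
    # Return 0 only when both executables are present.
    return "$missing"
}
```

Calling check_polymutt2_build from the directory that contains polymutt2/ lists each executable it finds and returns a non-zero status if either one is absent.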

Usage

  • First, an LD-pruned map file is generated by vcf2map. Running vcf2map without any arguments prints the following message
           Input : --vcf [], --ped [],
                   --map [/scratch/cgg/lib13/db/hapmap/genetic_map_GRCh37_chr1.txt],
                   --include_list [/scratch/cgg/Public/hg19/1000G.SNV.clean.MAF0.05.tbl.gz]
          Output : --out_map []
  Variant filter : --min_maf [0.10], --min_avg_dp [0.00],
                   --max_avg_dp [-1.0e+00], --max_missing_rate [0.05]
      LD pruning : --win_size [100], --max_r2 [0.10], --ignore_missing
  • Running polymutt2 without any arguments displays the following message
                      pedfile :                 (-pname)
                      datfile :                 (-dname)
                      mapfile :                 (-mname)
Additional Options
               Input : --in_vcf [], --in_range [], --mixed_vcf_records
  Mutation paramters : --theta_snv [1.0e-03], --theta_indel [1.0e-04],
                       --tstv_ratio [2.00], --submap [1.00]
     Multi-threading : --nthreads [1]
              Output : --out_vcf [], --fam_idx, --fam_id [], --out_all,
                       --out_range [], --best_marginal, --best_path
       Approximation : --cum_prob [1.00], --single_iv
  • NOTE: the current version can only process one chromosome at a time
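
Because each run handles a single chromosome, a VCF that spans several chromosomes has to be split before it is passed to vcf2map and polymutt2. The sketch below shows one way to do this with awk. The function name split_vcf_by_chrom and the output naming scheme (PREFIX.CHROM.vcf) are illustrative choices, not part of polymutt2, and the sketch assumes an uncompressed, tab-delimited VCF.

```shell
# Split a multi-chromosome VCF into one file per chromosome.
# Usage: split_vcf_by_chrom <input.vcf> <output_prefix>
# Writes <output_prefix>.<CHROM>.vcf for every chromosome seen in column 1.
# Header lines (starting with '#') are copied into every output file.
split_vcf_by_chrom() {
    in_vcf=$1
    out_prefix=$2
    awk -F'\t' -v prefix="$out_prefix" '
        /^#/ { header = header $0 "\n"; next }   # collect header lines
        {
            out = prefix "." $1 ".vcf"
            if (!($1 in seen)) {                  # first record for this chromosome
                printf "%s", header > out
                seen[$1] = 1
            }
            print > out
        }
    ' "$in_vcf"
}
```

Each resulting file can then be processed on its own, paired with the map file for the matching chromosome. For inputs with a very large number of contigs, awk may run into the limit on simultaneously open files, in which case a dedicated VCF tool is the safer choice.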

Examples of generating the map file

  • vcf2map: generate a sparse map file (see Download for files genetic_map_GRCh37_chr1.txt and 1000G.SNV.clean.MAF0.05.tbl.gz)
 vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --out_map chr1.map
  • With a user-defined r2 cutoff for LD pruning and a minimum average depth for variant filtering
vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --max_r2 0.2 --min_avg_dp 2 --out_map chr1.r0.2.map
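
To compare map files built with several r2 cutoffs, the command above can be generated in a loop. The following sketch only prints the commands and does not run them; the function name print_vcf2map_sweep is hypothetical, the input file names are the ones used in the examples above, and the output naming follows the chr1.r0.2.map pattern.

```shell
# Print one vcf2map command per --max_r2 cutoff given on the command line.
# Usage: print_vcf2map_sweep 0.1 0.2 0.3
# File names are the example names from this page; adjust them to your data.
print_vcf2map_sweep() {
    for r2 in "$@"; do
        echo "vcf2map --vcf input.vcf --ped input.ped --map genetic_map_GRCh37_chr1.txt --include_list 1000G.SNV.clean.MAF0.05.tbl.gz --max_r2 $r2 --out_map chr1.r$r2.map"
    done
}
```

The printed lines can be inspected first and then piped to sh, or submitted as separate jobs.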

Examples of running polymutt2

  • polymutt2: call genotypes from a VCF and the map file generated by vcf2map (the VCF can be the complete file with all variants and samples)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf
  • If parents are available, genotypes can be phased by transmission using --best_path (accuracy is not as good as with the command above)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --best_path
  • To output a single family, pass its ID to --fam_id, shown here as the placeholder FAMILY_ID (the ped file can contain all families; the others are ignored)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --fam_id FAMILY_ID
  • To output only a range (for example, the whole chromosome can be divided into multiple parallel jobs, each working on one range)
polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.vcf --out_range 1:1000000-2000000
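
Dividing a chromosome into ranges for parallel jobs can be scripted. The sketch below prints one polymutt2 command per fixed-size window, using the CHROM:START-END range format from the example above. The function name print_range_jobs, the per-window output file names, and the choice of non-overlapping 1-based windows are assumptions made for illustration; how polymutt2 treats range boundaries is not documented here, so check the first and last records of each output before merging.

```shell
# Print one polymutt2 command per window of <window> bases on <chrom>.
# Usage: print_range_jobs <chrom> <chrom_length> <window>
# Input file names are the example names from this page.
print_range_jobs() {
    chrom=$1
    chrom_len=$2
    window=$3
    start=1
    while [ "$start" -le "$chrom_len" ]; do
        end=$((start + window - 1))
        # Clip the last window to the chromosome length.
        if [ "$end" -gt "$chrom_len" ]; then
            end=$chrom_len
        fi
        echo "polymutt2 -p input.ped -m chr1.map --in_vcf input.vcf --out_vcf out.$chrom.$start-$end.vcf --out_range $chrom:$start-$end"
        start=$((end + 1))
    done
}
```

For example, print_range_jobs 1 2500000 1000000 prints three commands, covering 1:1-1000000, 1:1000001-2000000 and 1:2000001-2500000. Each line can be submitted as a separate job.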

Download

  • The latest version of the source code (v0.1) can be downloaded here.
  • The genetic map file (genetic_map_GRCh37_chr1.txt) used above can be downloaded here.
  • The clean, common variants from the 1000 Genomes Project (1000G.SNV.clean.MAF0.05.tbl.gz) used above can be downloaded here.

Contact

For questions please contact the authors (Bingshan Li: bingshan.li@vanderbilt.edu)

Citation

Li B, Wei Q, Zhan X, Zhong X, Chen W, Li C, et al. (2015) Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. PLoS Genet 11(6): e1005271. doi:10.1371/journal.pgen.1005271