Changes

1,561 bytes added , 12:03, 2 February 2017

Line 1: Line 1: −

CalcMatch is a C/C++ software developed by Yun Li ~~that~~ compares two sets of pedigree files. It was initially written to compare imputed genotypes with their true/experimental counterpart but can be used to compare the concordance between any two sets of pedigree files. The input data are in standard Merlin/QTDT format (http://~~www~~.sph.umich.edu/~~csg~~/abecasis/Merlin/tour/input_files.html).

+

CalcMatch is a C/C++ program developed by [https://csg.sph.umich.edu//yli/ Yun Li]. It compares two sets of pedigree files. It was initially written to compare imputed genotypes with their true/experimental counterparts, but it can be used to measure the concordance between any two sets of pedigree files. The input data are in standard Merlin/QTDT format (http://csg.sph.umich.edu//abecasis/Merlin/tour/input_files.html).

= Options =

−

== --impped --impdat specify one input pedigree set. ~~ ==~~

+

== --impped --impdat ==

+

specify one input pedigree set.

−

== --trueped --truedat specify the other input pedigree set. ==

+

== --trueped --truedat ==

+

specify the other input pedigree set.

== --match ==

generates a matrix taking values 0,1,2 indicating # of matched alleles. The dimension of the matrix is # of overlapping individuals times # of overlapping markers of the two input pedigree sets.
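The matrix described for --match can be pictured with a minimal Python sketch. This is an illustration only, not CalcMatch's own code: the dictionary layout keyed by (individual, marker), the function names, and the choice to ignore allele order are all assumptions, and handling of missing genotypes is left out.

```python
from collections import Counter


def matched_alleles(geno_a, geno_b):
    """Return 0, 1, or 2: the number of alleles shared by two diploid genotypes.

    Genotypes are 2-tuples of allele strings; allele order is ignored
    (an assumption of this sketch).
    """
    # Multiset intersection: each shared allele counts at most as often
    # as it occurs in both genotypes.
    shared = Counter(geno_a) & Counter(geno_b)
    return sum(shared.values())


def match_matrix(set_a, set_b):
    """Build the matrix of matched-allele counts.

    set_a and set_b are hypothetical dicts mapping (individual, marker)
    to a genotype tuple. Rows are overlapping individuals, columns are
    overlapping markers. A cell is None if either set lacks that pair.
    """
    individuals = sorted({i for i, _ in set_a} & {i for i, _ in set_b})
    markers = sorted({m for _, m in set_a} & {m for _, m in set_b})
    matrix = []
    for i in individuals:
        row = []
        for m in markers:
            a, b = set_a.get((i, m)), set_b.get((i, m))
            row.append(None if a is None or b is None else matched_alleles(a, b))
        matrix.append(row)
    return individuals, markers, matrix
```

With one overlapping individual and two overlapping markers, the result is a 1 x 2 matrix, matching the dimension rule stated above.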

−

--bySNP is turned on by default to generate SNP specific measures. The output .bySNP will contain the following 6 fields for each SNP:

+

== --bySNP ==

+

is turned on by default to generate SNP-specific measures (which means: if you put --bySNP on the command line, it will be turned OFF!). The output .bySNP will contain the following 6 fields for each SNP:

(1) SNP : SNP name

Line 18: Line 21:

(6) maskedG: total number of genotypes evaluated/masked (<=n of course) (I should change the naming to comparedG or evaluatedG)

−

--byGeno can be added on top of --bySNP. It will generates the following fields after the 6 fields above:

+

+

== --byGeno ==

+

NOTE: this option is turned on by default. If you put --byGeno on the command line, it will be turned OFF!

+

can be added on top of --bySNP. It will generate the following fields after the 6 fields above:

(7) hetAerr : allelic discordance rate among heterozygotes

Line 39: Line 46:

−

--accuracyByGeno ~~is an option I added most recently to represent the above (7-20) information in a different way.~~ Similar to --byGeno, it is used on top of --bySNP. It ~~can~~ be used together with --byGeno. It will generate the following fields, after (7-20) is --byGeno is turned on or after the 6th field otherwise.

+

== --accuracyByGeno ==

+

Similar to --byGeno, it is used on top of --bySNP. It may be used together with --byGeno. It will generate the following fields, after (7-20) if --byGeno is turned on or after the 6th field otherwise.

(A) almajor: major allele

Line 48: Line 56:

(F) accuracy22: allelic concordance rate for homozygotes minor allele

−

--byPerson generates a separate output file .byPerson and contains the following information for each person:

+

+

== --byPerson ==

+

generates a separate output file .byPerson that contains the following information for each person:

(1) famid

Line 58: Line 68:

(7) maskedG

−

This --~~bySNP~~ option is useful if there is potential sample swap or inter-individual difference, e.g., sequencing depth, number of markers genotyped.

+

This --byPerson option is useful if there are potential sample swaps or inter-individual differences, e.g., in sequencing depth or number of markers genotyped.

+

+

== --maskflag --maskped --maskdat ==

+

CalcMatch compares all genotypes overlapping the two input sets. However, when --maskflag is turned on AND --maskped and --maskdat are specified (I know ...) it compares only the following subset of the overlapping genotypes: genotypes either not found (i.e., individual or marker not included) or missing (included but with value 0/0, N/N, ./. etc) in --maskped / --maskdat. These options are useful when some individuals were masked for some SNPs while others were masked for a different set of SNPs.
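The masking rule above can be sketched as a small Python predicate. This is a hedged illustration rather than the tool's implementation: the mask dictionary keyed by (individual, marker), the function names, and the decision to call a genotype missing only when every allele is a missing code are assumptions of this sketch.

```python
# Missing-allele codes taken from the examples in the text (0/0, N/N, ./.).
MISSING_CODES = {"0", "N", "."}


def is_missing(genotype):
    """True if the genotype is absent or all of its alleles are missing codes."""
    return genotype is None or all(a in MISSING_CODES for a in genotype)


def should_compare(key, mask, maskflag=True):
    """Decide whether the genotype at key = (individual, marker) is compared.

    mask is a hypothetical dict built from the mask pedigree set, or None
    if no mask files were given. Without a mask, everything is compared.
    """
    if not maskflag or mask is None:
        return True
    # Compared only if the mask set does not contain this genotype,
    # or contains it with a missing value.
    return is_missing(mask.get(key))
```

A genotype that is present and non-missing in the mask set is skipped; one that is absent from it, or coded as missing in it, is evaluated.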

+

= Output files =

+

== .bySNP ==

+

See option --bySNP

+

== .byPerson ==

+

See option --byPerson

+

== .minusstrand ==

+

Reports the list of SNPs that appear to be on the minus strand (that is, SNPs for which more than two alleles are seen when combining imputed and true pedigree files). This file will only be generated if --byGeno or --accuracyByGeno is turned on. The former option --byGeno is turned on by default.
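The "more than two alleles" criterion can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the logic CalcMatch actually runs: the dictionary layout, the function name, and the set of missing codes are hypothetical.

```python
# Allele codes treated as missing and ignored (assumed from 0/0, N/N, ./.).
MISSING_CODES = {"0", "N", "."}


def minus_strand_candidates(imputed, true):
    """Return sorted marker names showing more than two distinct alleles
    once the imputed and true genotypes are pooled.

    imputed and true are hypothetical dicts mapping (individual, marker)
    to a genotype tuple of allele strings.
    """
    alleles = {}
    for dataset in (imputed, true):
        for (_, marker), genotype in dataset.items():
            observed = alleles.setdefault(marker, set())
            observed.update(a for a in genotype if a not in MISSING_CODES)
    return sorted(m for m, seen in alleles.items() if len(seen) > 2)
```

For example, a SNP typed as A/G in one set and T/C in the other pools to four alleles and is flagged. A strand flip at an A/T or C/G SNP yields no extra alleles, so a check of this kind cannot detect it.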

+

= Example command lines =

+

CalcMatch --trueped true.ped --truedat true.dat --impped imp.ped --impdat imp.dat -o CalcMatch.Output --byPerson

+

Will generate CalcMatch.Output.bySNP (6 fields only) and CalcMatch.Output.byPerson.

+

CalcMatch --trueped true.ped --truedat true.dat --impped imp.ped --impdat imp.dat -o CalcMatch.Output --byGeno --byPerson

+

Will generate CalcMatch.Output.bySNP (6+20 fields) and CalcMatch.Output.byPerson.

+

CalcMatch --trueped true.ped --truedat true.dat --impped imp.ped --impdat imp.dat -o CalcMatch.Output --accuracyByGeno --byPerson

+

Will generate CalcMatch.Output.bySNP (6+6 fields only) and CalcMatch.Output.byPerson.

+

CalcMatch --trueped true.ped --truedat true.dat --impped imp.ped --impdat imp.dat -o CalcMatch.Output --accuracyByGeno --byGeno --byPerson

+

Will generate CalcMatch.Output.bySNP (6+20+6 fields only) and CalcMatch.Output.byPerson.

−

~~ CalcMatch compares all genotypes overlapping the two input sets. However, when --maskflag is turned on AND --maskped and --maskdat are specified (I know~~ ...~~) it compares only the following subset of the overlapping genotypes: genotypes either not found (i.e., individual or marker not included) or missing (included but with value 0~~/~~0, N~~/~~N, .~~/. ~~etc) in --maskped / --maskdat. These options are useful when some individuals were masked for some SNPs while others masked for a different set of SNPs.~~

+

= Download =

+

Please go to http://csg.sph.umich.edu//yli/software.html

Ppwhite


CalcMatch (view source)

Revision as of 12:03, 2 February 2017
