Changes

CalcMatch (view source)

Revision as of 22:51, 22 June 2010

116 bytes added, 22:51, 22 June 2010

no edit summary

Line 1: Line 1:

CalcMatch is a C/C++ program developed by Yun Li that compares two sets of pedigree files. It was initially written to compare imputed genotypes with their true/experimental counterparts, but it can be used to measure concordance between any two sets of pedigree files. The input data are in standard Merlin/QTDT format (http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html).

−

--impped --impdat specify one input pedigree set.

+

--impped --impdat specify one input pedigree set.

−

~~--trueped --truedat specify the other input pedigree set.~~

−

~~--match generates a matrix taking values 0,1,2 indicating # of matched alleles. The dimension of the matrix is # of overlapping individuals times # of overlapping markers of the two input pedigree sets.~~

+

--trueped --truedat specify the other input pedigree set.

−

~~--bySNP is turned on by default to generate SNP specific measures. The output .bySNP will contain the following 6 fields for each SNP:~~

+

--match generates a matrix taking values 0,1,2 indicating # of matched alleles. The dimension of the matrix is # of overlapping individuals times # of overlapping markers of the two input pedigree sets.
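To make the 0/1/2 values concrete, here is a minimal Python sketch of counting matched alleles and building the individuals-by-markers matrix. It is not CalcMatch's actual code: the function names, the tuple representation of a genotype, and the nested-dict layout are all assumptions made for illustration, and missing genotypes are not handled.

```python
from collections import Counter

def matched_alleles(g1, g2):
    """Return how many alleles (0, 1 or 2) two unordered genotypes share."""
    # Multiset intersection: ("A", "G") vs ("G", "A") shares 2 alleles,
    # ("A", "A") vs ("A", "G") shares 1, ("A", "A") vs ("G", "G") shares 0.
    return sum((Counter(g1) & Counter(g2)).values())

def match_matrix(imputed, truth, markers_imp, markers_true):
    """Build the match matrix over overlapping individuals and markers.

    imputed, truth: hypothetical layout, dict person -> dict marker -> (allele, allele).
    markers_imp, markers_true: marker names listed in each input set.
    """
    people = sorted(set(imputed) & set(truth))
    true_set = set(markers_true)
    markers = [m for m in markers_imp if m in true_set]
    matrix = [
        [matched_alleles(imputed[p][m], truth[p][m]) for m in markers]
        for p in people
    ]
    return people, markers, matrix
```

Each row corresponds to one overlapping individual and each column to one overlapping marker, which matches the dimensions described above.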

−

(1) SNP : SNP name

+

--bySNP is turned on by default to generate SNP specific measures. The output .bySNP will contain the following 6 fields for each SNP:

−

(2) gErr : genotypic discordance rate

+

−

(3) aErr : allelic discordance rate

+

(1) SNP : SNP name

−

(4) matchedG : number of genotypes matched

+

(2) gErr : genotypic discordance rate

+

(3) aErr : allelic discordance rate

+

(4) matchedG : number of genotypes matched

(5) matchedA: number of alleles matched

−

(6) maskedG: total number of genotypes evaluated/masked (<=n of course) (I should change the naming to comparedG or evaluatedG)

+

(6) maskedG: total number of genotypes evaluated/masked (<=n of course) (I should change the naming to comparedG or evaluatedG)
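The six per-SNP fields can be related to one another with a short Python sketch. The page names the fields but does not give formulas, so the definitions used here (gErr = 1 - matchedG / maskedG and aErr = 1 - matchedA / (2 × maskedG)) are assumptions, as is the function name `snp_summary`.

```python
def snp_summary(snp, match_counts):
    """Summarise one SNP from its per-genotype match counts.

    match_counts: one value in {0, 1, 2} per compared genotype,
    i.e. the number of matched alleles for that genotype.
    Returns None when no genotype was compared.
    """
    counts = list(match_counts)
    compared = len(counts)                            # maskedG
    if compared == 0:
        return None
    matched_g = sum(1 for c in counts if c == 2)      # matchedG: both alleles agree
    matched_a = sum(counts)                           # matchedA: alleles that agree
    return {
        "SNP": snp,
        "gErr": 1 - matched_g / compared,             # assumed definition
        "aErr": 1 - matched_a / (2 * compared),       # assumed definition
        "matchedG": matched_g,
        "matchedA": matched_a,
        "maskedG": compared,
    }
```

Under these assumed definitions, four compared genotypes with match counts 2, 2, 1, 0 give matchedG = 2, matchedA = 5, gErr = 0.5 and aErr = 0.375.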

+

--byGeno can be added on top of --bySNP. It generates the following fields after the 6 fields above:

+

(7) hetAerr : allelic discordance rate among heterozygotes

+

(8) AL1: allele 1 (an arbitrary allele)

+

(9) AL2: allele 2

+

(10) freq1: frequency of AL1

+

(11) MAF

+

(12) #true 1/1: # individuals with experimental genotype AL1/AL1

+

(13) mm1/2: # of true AL1/AL1 being imputed as AL1/AL2

+

(14) mm2/2: # of true AL1/AL1 being imputed as AL2/AL2

+

(15) #true 1/2

+

(16) mm1/1

+

(17) mm2/2

+

(18) #true 2/2

+

(19) mm1/1

+

(20) mm1/2
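Fields (12) to (20) amount to a cross-tabulation of true genotype class against imputed genotype class. The Python sketch below illustrates that tabulation for a single biallelic SNP. It is an illustration only: the function names and input layout are hypothetical, and it assumes no missing genotypes.

```python
from collections import Counter

def genotype_class(genotype, al1):
    """Classify a genotype as "1/1", "1/2" or "2/2" by its number of AL1 copies."""
    copies = sum(1 for allele in genotype if allele == al1)
    return {2: "1/1", 1: "1/2", 0: "2/2"}[copies]

def confusion_counts(pairs, al1):
    """Tabulate true versus imputed genotype classes for one SNP.

    pairs: iterable of (true_genotype, imputed_genotype) tuples.
    Returns (true_totals, mismatches), where true_totals[cls] is the
    "#true cls" count and mismatches[(true_cls, imputed_cls)] is the
    corresponding "mm" count.
    """
    true_totals = Counter()
    mismatches = Counter()
    for true_g, imputed_g in pairs:
        t = genotype_class(true_g, al1)
        i = genotype_class(imputed_g, al1)
        true_totals[t] += 1
        if t != i:
            mismatches[(t, i)] += 1
    return true_totals, mismatches
```

For example, `mismatches[("1/1", "1/2")]` corresponds to field (13), the number of true AL1/AL1 genotypes imputed as AL1/AL2.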

−

~~--byGeno can be added on top of --bySNP. It generates the following fields after the 6 fields above:~~

−

~~(7) hetAerr : allelic discordance rate among heterozygotes~~

+

−

~~(8) AL1: allele 1 (an arbitrary allele)~~

−

~~(9) AL2: allele 2~~

−

~~(10) freq1: frequency of AL1~~

−

~~(11) MAF~~

−

~~(12) #true 1/1: # individuals with experimental genotype AL1/AL1~~

−

~~(13) mm1/2: # of true AL1/AL1 being imputed as AL1/AL2~~

−

~~(14) mm2/2: # of true AL1/AL1 being imputed as AL2/AL2~~

−

~~(15) #true 1/2~~

−

~~(16) mm1/1~~

−

~~(17) mm2/2~~

−

~~(18) #true 2/2~~

−

~~(19) mm1/1~~

−

~~(20) mm1/2~~

−

--accuracyByGeno is an option I added most recently to represent the above (7-20) information in a different way. Similar to --byGeno, it is used on top of --bySNP. It can be used together with --byGeno. It will generate the following fields, after fields (7-20) if --byGeno is turned on, or after the 6th field otherwise.

+

--accuracyByGeno is an option I added most recently to represent the above (7-20) information in a different way. Similar to --byGeno, it is used on top of --bySNP. It can be used together with --byGeno. It will generate the following fields, after fields (7-20) if --byGeno is turned on, or after the 6th field otherwise.

(A) almajor: major allele

Line 42: Line 46:

(F) accuracy22: allelic concordance rate for minor-allele homozygotes

−

+

--byPerson generates a separate output file, .byPerson, which contains the following information for each person:

−

--byPerson generates a separate output file, .byPerson, which contains the following information for each person:

(1) famid

Line 53: Line 56:

(7) maskedG

+

This --byPerson option is useful when there is a potential sample swap or there are inter-individual differences, e.g., in sequencing depth or number of markers genotyped.

−

~~This --byPerson option is useful when there is a potential sample swap or there are inter-individual differences, e.g., in sequencing depth or number of markers genotyped.~~

+

CalcMatch compares all genotypes overlapping the two input sets. However, when --maskflag is turned on AND --maskped and --maskdat are specified (I know ...) it compares only the following subset of the overlapping genotypes: genotypes either not found (i.e., individual or marker not included) or missing (included but with value 0/0, N/N, ./. etc) in --maskped / --maskdat. These options are useful when some individuals were masked for some SNPs while others masked for a different set of SNPs.
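The masking rule can be sketched in Python as a per-genotype predicate. This is an illustration rather than CalcMatch's implementation: the names `MISSING_CODES`, `is_missing` and `should_compare`, the nested-dict layout of the mask set, and the choice to call a genotype missing only when both alleles carry a missing code are assumptions.

```python
# Allele codes treated as missing (from the examples 0/0, N/N, ./. in the text).
MISSING_CODES = {"0", "N", "."}

def is_missing(genotype):
    """Assumption: a genotype is missing when both alleles are missing codes."""
    return all(allele in MISSING_CODES for allele in genotype)

def should_compare(person, marker, mask):
    """Decide whether an overlapping genotype is included in the comparison.

    mask: hypothetical layout, dict person -> dict marker -> (allele, allele),
    read from the --maskped / --maskdat set, or None when masking is off.
    """
    if mask is None:
        return True                      # no masking: compare every overlapping genotype
    genotype = mask.get(person, {}).get(marker)
    if genotype is None:
        return True                      # individual or marker not found in the mask set
    return is_missing(genotype)          # present in the mask set but missing there
```

A genotype that is present and non-missing in the mask set is therefore skipped, while everything absent from or missing in it is compared.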

−

CalcMatch compares all genotypes overlapping the two input sets. However, when --maskflag is turned on AND --maskped and --maskdat are specified (I know ...) it compares only the following subset of the overlapping genotypes: genotypes either not found (i.e., individual or marker not included) or missing (included but with value 0/0, N/N, ./. etc) in --maskped / --maskdat. These options are useful when some individuals were masked for some SNPs while others masked for a different set of SNPs.

Ylwtx

212

edits
