Changes

From Genome Analysis Wiki
1,433 bytes removed, 18:22, 21 October 2010
Line 32: Line 32:     
== Optional Input ==
 
== Optional Input ==
===  
+
=== Relevant SNP File ===
 +
Fed to the option --relevant or -r, the relevant SNP file is a list of SNPs among which LD values are desired.
    +
= Options =
 +
== --allelefrequency ==
 +
Option to calculate allele frequency and output to prefix.freq. <br>
   −
= Options  =
+
== --allelecounts ==
 +
Option to calculate allele counts and output to prefix.ac. <br>
   −
== --impped --impdat <br> ==
+
== --ld ==
 +
Option to calculate LD. Note that this option has to be turned on for LD to be calculated. <br>
   −
specify one input pedigree set.  
+
== --windowSize or -w ==
 +
Option to specify the number of flanking SNPs with which LD values are calculated for each SNP. Default is 1,000, meaning that LD with 1,000 SNPs on each side (2,000 total) will be calculated for each SNP. <br>
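The window rule can be illustrated with a minimal sketch. It is not haploxt's own code: the function name <code>window_pairs</code> is hypothetical, and it assumes SNPs are indexed in file order. Enumerating each pair once with i &lt; j covers the flanking SNPs on both sides of every SNP.

```python
def window_pairs(n_snps, window_size=1000):
    """Yield index pairs (i, j) with i < j and j at most window_size positions after i.

    Hypothetical helper: every SNP ends up paired with up to window_size
    neighbours on each side, because the pair (i, j) also serves SNP j.
    """
    for i in range(n_snps):
        # Stop at the end of the SNP list or at the edge of the window.
        for j in range(i + 1, min(n_snps, i + window_size + 1)):
            yield i, j


# With a window of 1, each SNP is paired only with its immediate neighbours.
adjacent = list(window_pairs(4, window_size=1))
```

Near the ends of a chromosome fewer than <code>window_size</code> neighbours exist on one side, so those SNPs contribute fewer pairs.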
   −
== --trueped --truedat <br> ==
+
== --r2Threshold or -t ==
 +
Minimum r<sup>2</sup> value for a pair of SNPs to be in output. Default is 0.00. <br>
   −
specify the other input pedigree set.  
+
== --DprimeThreshold or -d ==
 +
Minimum D' value for a pair of SNPs to be in output. Default is 0.00. <br>
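For readers unfamiliar with the two measures being thresholded, the sketch below computes D' and r<sup>2</sup> for a pair of biallelic SNPs using the standard textbook definitions. It is an illustration under assumptions, not haploxt's implementation: the function names are hypothetical, the input is assumed to be phased haplotypes given as parallel lists of allele codes, and requiring both thresholds to be met at once is an assumption about how the filters combine.

```python
def ld_stats(hap_a, hap_b, ref_a, ref_b):
    """Return (D', r^2) for two biallelic SNPs, or None if either is monomorphic.

    hap_a and hap_b hold one allele code per haplotype, in the same order;
    ref_a and ref_b name the allele tracked at each SNP.
    """
    n = len(hap_a)
    p_a = sum(x == ref_a for x in hap_a) / n
    p_b = sum(y == ref_b for y in hap_b) / n
    p_ab = sum(x == ref_a and y == ref_b for x, y in zip(hap_a, hap_b)) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # LD is undefined when a SNP has only one allele

    r2 = d * d / denom
    # D' rescales |D| by its largest possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, r2


def passes_thresholds(d_prime, r2, min_d_prime=0.0, min_r2=0.0):
    """Assumed filter: keep a pair only if both minimums are met."""
    return d_prime >= min_d_prime and r2 >= min_r2
```

With the default thresholds of 0.00, every pair for which LD is defined passes.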
   −
== --match  ==
+
== --pairWithSNP ==
 +
Option to calculate LD only with a particular SNP. <br>
   −
generates a matrix taking values 0,1,2 indicating # of matched alleles. The dimension of the matrix is # of overlapping individuals times # of overlapping markers of the two input pedigree sets.  
+
== --pairWithList ==
 +
A list of SNPs with which LD values will be calculated. <br>
   −
== --bySNP  ==
+
== --coupling ==
 +
Option to output for each pair the alleles that are positively correlated. <br>
   −
is turned on by default to generate SNP specific measures. The output .bySNP will contain the following 6 fields for each SNP:
+
== --prefix or -o ==
 +
Option to specify output prefix. <br>
   −
    (1) SNP&nbsp;: SNP name
+
= Output =
  (2) gErr&nbsp;: genotypic discordance rate
+
== .freq ==
  (3) aErr&nbsp;: allelic discordance rate
+
Generated when option --allelefrequency is turned on. <br> <br>
  (4) matchedG&nbsp;: number of genotypes matched
  −
  (5) matchedA: number of alleles matched
  −
  (6) maskedG: total number of genotypes evaluated/masked (&lt;=n of course) (I should change the naming to comparedG or evaluatedG)
     −
<br>  
+
sample.freq <br>
 +
SNP AL1 AL2 Freq1 MAF<br>
 +
chr12:16099 1 3 0.4000 0.4000<br>
 +
chr12:16163 4 2 0.9000 0.1000<br>
 +
rs7358779 2 3 0.1000 0.1000<br>
 +
chr12:17063 1 3 0.8000 0.2000<br>
 +
...<br>
 +
<br>
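A .freq file laid out like the sample above can be read with a few lines of Python. This is a minimal sketch, assuming the file is whitespace-delimited with the header line shown in the sample; the function name <code>read_freq</code> is hypothetical and not part of haploxt.

```python
import io


def read_freq(handle):
    """Parse a .freq-style table into a list of dicts keyed by the header names."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or truncated lines
        row = dict(zip(header, fields))
        row["Freq1"] = float(row["Freq1"])
        row["MAF"] = float(row["MAF"])
        rows.append(row)
    return rows


# The rows from the sample above, held in memory instead of on disk.
sample_freq = io.StringIO(
    "SNP AL1 AL2 Freq1 MAF\n"
    "chr12:16099 1 3 0.4000 0.4000\n"
    "chr12:16163 4 2 0.9000 0.1000\n"
    "rs7358779 2 3 0.1000 0.1000\n"
    "chr12:17063 1 3 0.8000 0.2000\n"
)
freq_rows = read_freq(sample_freq)
```

In the sample rows, MAF equals the smaller of Freq1 and 1 - Freq1, which gives a simple sanity check on a parsed file.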
   −
== --byGeno  ==
+
== .ac ==
 +
Generated when option --allelecounts is turned on. <br> <br>
   −
can be added on top of --bySNP. It generates the following fields after the 6 fields above:  
+
sample.ac<br>
 +
SNP AL1 AL2 AC1 MAC<br>
 +
chr12:16099 1 3 4 4<br>
 +
chr12:16163 4 2 9 1<br>
 +
rs7358779 2 3 1 1<br>
 +
chr12:17063 1 3 8 2<br>
 +
<br>
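The counts in the .ac sample line up with the frequencies in the .freq sample if both come from the same 10 haplotypes (for example, AC1 = 9 and Freq1 = 0.9000). The sketch below shows that relationship; the haplotype total of 10 is inferred from the two samples rather than stated anywhere, and the function name <code>counts_to_freq</code> is hypothetical.

```python
def counts_to_freq(ac1, n_haplotypes):
    """Convert an AL1 count into (Freq1, MAF, MAC) for a biallelic SNP."""
    ac2 = n_haplotypes - ac1          # count of the other allele
    freq1 = ac1 / n_haplotypes
    maf = min(ac1, ac2) / n_haplotypes
    mac = min(ac1, ac2)               # minor allele count
    return freq1, maf, mac


# chr12:16163 in the sample: AC1 = 9, assuming 10 haplotypes in total.
freq1, maf, mac = counts_to_freq(9, 10)
```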
   −
    (7) hetAerr&nbsp;: allelic discordance rate among heterozygotes
+
== .xt ==
  (8) AL1: allele 1 (an arbitrary allele)
+
Generated when option --ld is turned on. <br> <br>
  (9) AL2: allele 2
  −
  (10) freq1: frequency of AL1
  −
  (11) MAF
  −
  (12) #true 1/1: # individuals with experimental genotype AL1/AL1
  −
  (13) mm1/2: # of true AL1/AL1 being imputed as AL1/AL2
  −
  (14) mm2/2: # of true AL1/AL1 being imputed as AL2/AL2
  −
  (15) #true 1/2
  −
  (16) mm1/1
  −
  (17) mm2/2
  −
  (18) #true 2/2
  −
  (19) mm1/1
  −
  (20) mm1/2
     −
<br>  
+
sample.xt<br>
 +
M1 M2 DPRIME DELTASQ COUPLING<br>
 +
chr12:16252 chr12:16585 1.0000 0.6667 2,1<br>
 +
chr12:16252 chr12:16665 1.0000 1.0000 2,3<br>
 +
chr12:16252 chr12:16693 1.0000 1.0000 2,4<br>
 +
...<br>
 +
<br>
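The .xt output can be post-processed in the same way. The sketch below is hypothetical helper code, not part of haploxt. It assumes a whitespace-delimited file with the header shown in the sample, that DELTASQ holds the r<sup>2</sup> value, and that the COUPLING column (comma-separated allele codes) is present only when --coupling was given, so it is parsed only if the header contains it.

```python
import io


def read_xt(handle, min_r2=0.0):
    """Parse a .xt-style table, keeping pairs whose DELTASQ is at least min_r2."""
    header = handle.readline().split()
    pairs = []
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank, truncated, or placeholder lines
        row = dict(zip(header, fields))
        row["DPRIME"] = float(row["DPRIME"])
        row["DELTASQ"] = float(row["DELTASQ"])
        if "COUPLING" in row:
            # "2,1" becomes ("2", "1"): one allele code per SNP in the pair
            row["COUPLING"] = tuple(row["COUPLING"].split(","))
        if row["DELTASQ"] >= min_r2:
            pairs.append(row)
    return pairs


# The rows from the sample above, held in memory instead of on disk.
sample_xt = (
    "M1 M2 DPRIME DELTASQ COUPLING\n"
    "chr12:16252 chr12:16585 1.0000 0.6667 2,1\n"
    "chr12:16252 chr12:16665 1.0000 1.0000 2,3\n"
    "chr12:16252 chr12:16693 1.0000 1.0000 2,4\n"
)
strong_pairs = read_xt(io.StringIO(sample_xt), min_r2=0.9)
```

Filtering after the fact like this is an alternative to rerunning with a higher --r2Threshold, at the cost of a larger intermediate file.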
   −
<br>
+
= download =
 +
You can download the source code and example files [https://www.sph.umich.edu/csg/yli/haploxt_V108.tgz haploxt].
   −
== --accuracyByGeno  ==
+
To install, simply type the following command:
   −
Similar to --byGeno, it is used on top of --bySNP. It may be used together with --byGeno. It will generate the following fields, after fields (7-20) if --byGeno is turned on, or after the 6th field otherwise.  
+
  ./build.csh
   −
    (A) almajor: major allele
+
= sample command line =
  (B) alminor: minor allele
  −
  (C) freq1: major allele frequency
  −
  (D) accuracy11: allelic concordance rate for homozygotes major allele
  −
  (E) accuracy12: allelic concordance rate for heterozygotes
  −
  (F) accuracy22: allelic concordance rate for homozygotes minor allele
     −
<br>
+
  ./haploxt_names -s sample.snps -h sample.hap --allelefreq --ld -w 500 -t 0.5 --coupling -o sample.out
   −
== --byPerson  ==
+
= Additional Questions =
 
+
Please email [mailto:yunli@med.unc.edu Yun Li].
generates a separate output file .byPerson and contains the following information for each person:
  −
 
  −
    (1) famid
  −
  (2) subjID
  −
  (3) gErr
  −
  (4) aErr
  −
  (5) matchedG
  −
  (6) matchedA
  −
  (7) maskedG
  −
 
  −
<br> This --byPerson option is useful if there is potential sample swap or inter-individual difference, e.g., sequencing depth, number of markers genotyped.
  −
 
  −
<br>
  −
 
  −
== --maskflag --maskped --maskdat  ==
  −
 
  −
CalcMatch compares all genotypes overlapping the two input sets. However, when --maskflag is turned on AND --maskped and --maskdat are specified (I know ...) it compares only the following subset of the overlapping genotypes: genotypes either not found (i.e., individual or marker not included) or missing (included but with value 0/0, N/N, ./. etc) in --maskped / --maskdat. These options are useful when some individuals were masked for some SNPs while others masked for a different set of SNPs.
  −
 
  −
= example command lines  =
  −
 
  −
  CalcMatch --trueped true.ped --truedat true.dat --impped imp.ped --impdat imp.dat -o CalcMatch.Output --byPerson
  −
 
  −
Will generate CalcMatch.Output.bySNP (6 fields only) and CalcMatch.Output.byPerson.
  −
 
  −
  CalcMatch --trueped true.ped --truedat true.dat --impped imp.ped --impdat imp.dat -o CalcMatch.Output --byGeno --byPerson
  −
 
  −
Will generate CalcMatch.Output.bySNP (6+20 fields) and CalcMatch.Output.byPerson.
  −
 
  −
  CalcMatch --trueped true.ped --truedat true.dat --impped imp.ped --impdat imp.dat -o CalcMatch.Output --accuracyByGeno --byPerson
  −
 
  −
Will generate CalcMatch.Output.bySNP (6+6 fields only) and CalcMatch.Output.byPerson.
  −
 
  −
  CalcMatch --trueped true.ped --truedat true.dat --impped imp.ped --impdat imp.dat -o CalcMatch.Output --accuracyByGeno --byGeno --byPerson
  −
 
  −
Will generate CalcMatch.Output.bySNP (6+20+6 fields only) and CalcMatch.Output.byPerson.
 