Changes

From Genome Analysis Wiki
1,922 bytes added, 14:11, 10 September 2013
Line 24: Line 24:  
==Approach==
 
'''famRvTest''' uses a linear mixed model, fitted with an efficient optimization algorithm, to account for familial relationships. Kinship is either derived from pedigree structure or estimated from genome-wide marker genotypes. Single-marker association tests (score, likelihood ratio and Wald tests) and gene-level association methods (weighted and unweighted burden, SKAT and variable threshold tests) have been implemented. A manuscript is in preparation.
 +
 +
== Input Files ==
 +
Rare-Metal-Worker takes the following files as input: PED and DAT files in Merlin format, and optionally a VCF file. When genotypes are stored in the PED and DAT files, the VCF file is not needed. When genotypes are stored in a VCF file, the PED and DAT files are still required, because they carry the covariate and trait information.
 +
 +
=== PED and DAT Files ===
 +
* When the PED file contains genotypes, no VCF file is needed as input.
 +
* Rare-Metal-Worker takes PED/DAT files in Merlin format. Please refer to the [http://www.sph.umich.edu/csg/abecasis/merlin/tour/input_files.html PED/DAT format description] for details.
 +
* An example PED file is shown below:
 +
    1 1 0 0 1 1.5 1 23 A A A A A A A A A A
 +
    2 1 0 0 1 1.0 1 34 A C A C A C A C A C
 +
    3 1 0 0 2 0.4 1 43 A A A A A A A A A A
 +
    4 1 0 0 2 0.9 1 13 A C A C A C A C A C
 +
* The matching DAT file is shown below:
 +
  T YourTraitName
 +
  C SEX
 +
  C AGE
 +
  M 1:123456
 +
  M 1:234567
 +
  M 2:111111
 +
  M 2:222222
 +
  M X:12345
 +
* The DAT file must list variant names in the format "M chr:pos".
 +
* The order of labels in the DAT file must match the order of fields in the PED file.
 +
* '''Markers in the PED and DAT files must be sorted by chromosome and position.'''
 +
 +
* Covariate and trait values are stored in the PED file. Covariate and trait descriptions are stored in the DAT file.
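
The rules above (marker names in "chr:pos" form, markers sorted by chromosome and position, and PED fields matching the DAT labels) can be checked before running an analysis. The following Python sketch is not part of Rare-Metal-Worker; the function names <code>check_dat</code> and <code>ped_row_matches</code> are hypothetical. It assumes only the item types used in the example (T, C and M), and assumes that numeric chromosomes sort numerically ahead of labels such as X.

```python
import re

# Assumed marker-name pattern: chromosome label, colon, integer position.
MARKER_RE = re.compile(r"^([^:\s]+):(\d+)$")


def chrom_key(chrom):
    """Sort numeric chromosomes numerically, then other labels (X, Y, ...) alphabetically."""
    return (0, int(chrom), "") if chrom.isdigit() else (1, 0, chrom)


def check_dat(dat_lines):
    """Return (number of PED data columns implied by the DAT file, list of problems)."""
    problems = []
    markers = []
    n_columns = 0
    for lineno, line in enumerate(dat_lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            problems.append(f"line {lineno}: expected 'TYPE name'")
            continue
        kind, name = parts
        if kind == "M":
            match = MARKER_RE.match(name)
            if match is None:
                problems.append(f"line {lineno}: marker '{name}' is not chr:pos")
            else:
                markers.append((chrom_key(match.group(1)), int(match.group(2))))
            n_columns += 2  # a marker occupies two allele columns in the PED file
        elif kind in ("T", "C"):
            n_columns += 1  # a trait or covariate occupies one column
        else:
            problems.append(f"line {lineno}: item type '{kind}' not handled by this sketch")
    if markers != sorted(markers):
        problems.append("markers are not sorted by chromosome and position")
    return n_columns, problems


def ped_row_matches(ped_line, n_columns):
    """A PED row has 5 leading columns (family, individual, father, mother, sex)
    followed by the data columns described in the DAT file."""
    return len(ped_line.split()) == 5 + n_columns


if __name__ == "__main__":
    dat = ["T YourTraitName", "C SEX", "C AGE",
           "M 1:123456", "M 1:234567", "M 2:111111", "M 2:222222", "M X:12345"]
    n_columns, problems = check_dat(dat)
    print(n_columns, problems)
    print(ped_row_matches("1 1 0 0 1 1.5 1 23 A A A A A A A A A A", n_columns))
```

The check counts columns only. It does not verify that trait and covariate values are numeric or that allele codes are valid.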
 +
 +
=== VCF File ===
 +
* Another option is to supply genotypes in a VCF file. Please refer to the following link for the VCF file specification: [http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 1000 Genomes wiki VCF specs]
 +
* The VCF file should be compressed with bgzip and indexed with tabix, using the following commands:
 +
  bgzip input.vcf    ## this command will produce input.vcf.gz
 +
  tabix -pvcf -f input.vcf.gz  ## this command will produce input.vcf.gz.tbi
 +
* Even when a VCF file is provided, PED/DAT files are still needed for covariates and phenotypes.
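
When many VCF files need preparing, the two commands above can be wrapped in a small script. The Python sketch below is illustrative only: the helper names <code>index_commands</code> and <code>compress_and_index</code> are hypothetical, and it assumes that <code>bgzip</code> and <code>tabix</code> are installed and on the PATH. It runs the same two steps, using the equivalent spaced form <code>-p vcf</code> of the preset option.

```python
import shutil
import subprocess
from pathlib import Path


def index_commands(vcf_path):
    """Build the bgzip and tabix command lines for one uncompressed VCF file."""
    vcf = Path(vcf_path)
    gz = vcf.with_name(vcf.name + ".gz")  # bgzip writes <name>.gz next to the input
    return [
        ["bgzip", str(vcf)],
        ["tabix", "-p", "vcf", "-f", str(gz)],  # -f overwrites an existing index
    ]


def compress_and_index(vcf_path):
    """Run bgzip then tabix; return the paths of the .gz and .tbi files."""
    for tool in ("bgzip", "tabix"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} was not found on PATH")
    for command in index_commands(vcf_path):
        subprocess.run(command, check=True)  # raises if a step fails
    gz = Path(str(vcf_path) + ".gz")
    return gz, Path(str(gz) + ".tbi")


if __name__ == "__main__":
    for command in index_commands("input.vcf"):
        print(" ".join(command))
```

Building the command lists separately from running them makes it possible to inspect what would be executed before touching any files.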
    
== Command References ==