Changes

From Genome Analysis Wiki
2,360 bytes added ,  17:39, 15 November 2016
Line 14: Line 14:  
To build GREGOR, copy the GREGOR package to the directory you want, and then run the following command:
 
To build GREGOR, copy the GREGOR package to the directory you want, and then run the following command:
   −
   tar xzvf GREGOR.tar.gz
+
   tar xzvf GREGOR.v1.4.0.tar.gz
   −
After you unzip, you can find 3 directories in "GREGOR" (./example ./lib  ./script).
+
After you unzip, the folder "GREGOR" contains 4 directories (./Copyrights, ./example, ./lib, ./script) and 2 files (README, release_version.txt).
    
== Download reference files ==
 
== Download reference files ==
 +
Download the reference files from this link [http://csg.sph.umich.edu/GREGOR/  GREGOR Download].
   −
Download the reference files from this link [http://csg.sph.umich.edu/GREGOR/  GREGOR Download], then un-package the file
+
Reference files are created for the different population groups (AFR, AMR, ASN, EUR, SAN) from 1000G data (release date: May 21, 2011).
   −
  tar xzvf GREGOR.ref.tar.gz
+
If your LD r2 threshold is greater than or equal to 0.7, please download the reference files from the category: LD window size = 1MB; LD r2 ≥ 0.7.
 +
If your LD r2 threshold is greater than or equal to 0.2, please download the reference files from the category: LD window size = 1MB; LD r2 ≥ 0.2.
   −
After unzip, you will get 47 reference files in the directory "~/ref"
+
After downloading the reference files, you need to merge the part files into one gz file. Use a command like:
 +
 
 +
  cat \
 +
    GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.00 \
 +
    GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.01 \
 +
    GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.02 \
 +
    GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.03 \
 +
    GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.04 \
 +
    > GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz
 +
 +
Then extract this file:
 +
  tar zxvf GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz
 +
 
 +
You will get one directory which has the name "AFR".
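The merge step above can also be written with a glob instead of listing every part by hand. This is a minimal sketch, not part of GREGOR: `merge_parts` is a hypothetical helper name, and it assumes the parts follow the `<archive>.part.NN` naming shown above with zero-padded numbers, so that the shell's sorted glob expansion puts them in the right order.

```shell
# Hypothetical helper (not shipped with GREGOR): merge split archive parts.
# Assumes parts are named <archive>.part.00, <archive>.part.01, ...
merge_parts() {
    archive="$1"
    # Glob expansion is sorted, so zero-padded part numbers stay in order.
    set -- "$archive".part.*
    # If nothing matched, the pattern is left unexpanded and does not exist.
    if [ ! -e "$1" ]; then
        echo "no part files found for $archive" >&2
        return 1
    fi
    cat "$@" > "$archive"
}

# Example call, using the file name from the text above:
# merge_parts GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz
```

After merging, listing the archive with `tar tzf` before extracting is a quick way to see whether any part was missing or truncated.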
    
== Basic Usage Example ==
 
== Basic Usage Example ==
Line 68: Line 83:  
   ## KEY ELEMENTS TO CONFIGURE : NEED TO MODIFY
 
   ## KEY ELEMENTS TO CONFIGURE : NEED TO MODIFY
 
   ###############################################################################
 
   ###############################################################################
   INDEX_SNP_FILE = /workingdirectory/example/example.index.snps.rsid.list.txt    ## e.g. /workingdirectory/example/example.index.snps.rsid.list.txt
+
   INDEX_SNP_FILE = /workingdirectory/example/example.index.snps.rsid.list.txt
   BED_FILE_INDEX = /workingdirectory/example/example.bed.file.index ## e.g. /workingdirectory/example/example.bed.file.index
+
   BED_FILE_INDEX = /workingdirectory/example/example.bed.file.index  
   REF_DIR = /workingdirectory/ref/ ## reference directory
+
   REF_DIR = /workingdirectory/ref/
 
   R2THRESHOLD = 0.7
 
   R2THRESHOLD = 0.7
 
   LDWINDOWSIZE = 1000000
 
   LDWINDOWSIZE = 1000000
   OUT_DIR = /workingdirectory/example/example.rsid.20130808/ ## e.g. /workingdirectory/example/example.rsid.20130808/
+
   OUT_DIR = /workingdirectory/example/example.rsid.20130808/
 
   MIN_NEIGHBOR_NUM = 500
 
   MIN_NEIGHBOR_NUM = 500
 
   BEDFILE_IS_SORTED = True
 
   BEDFILE_IS_SORTED = True
   BATCHTYPE = slurm
+
   POPULATION = AFR  ## define the population; you can specify AFR, AMR, ASN, EUR or SAN
   BATCHOPTS = --partition=main --time=0:30:0
+
  TOPNBEDFILES = 2
 +
  JOBNUMBER = 10
 +
  ###############################################################################
 +
  #BATCHTYPE = mosix ##  submit jobs on MOSIX
 +
  #BATCHOPTS = -E/tmp -i -m2000 -j10,11,12,13,14,15,16,17,18,19,120,122,123,124,125 sh -c
 +
  ###############################################################################
 +
  #BATCHTYPE = slurm   ##  submit jobs on SLURM
 +
   #BATCHOPTS = --partition=main --time=0:30:0
 +
  ###############################################################################
 +
  BATCHTYPE = local ##  run jobs on local machine
 +
 
    
In the config file, there are several parameters to adjust:
 
In the config file, there are several parameters to adjust:
Line 85: Line 110:  
BED_FILE_INDEX: This file lists the datasets (e.g. BED files) to be used for enrichment analysis. Use complete paths to file locations and make sure positions are in hg19 format.
 
BED_FILE_INDEX: This file lists the datasets (e.g. BED files) to be used for enrichment analysis. Use complete paths to file locations and make sure positions are in hg19 format.
   −
REF_DIR: Define reference file directory which you download at here.
+
REF_DIR: The directory that contains the reference files you downloaded. If your "AFR" folder is at "/home/myid/GRGORE/ref/AFR/", then set this parameter to "/home/myid/GRGORE/ref/".
   −
R2THRESHOLD and LDWINDOWSIZE: These two parameters define the index SNP (and control SNP) LD proxies by r2 threshold and LD window size.
+
R2THRESHOLD and LDWINDOWSIZE: These two parameters define the index SNP (and control SNP) LD proxies by r2 threshold and LD window size. If you downloaded the r2 ≥ 0.7 reference files, you can set R2THRESHOLD to any value between 0.7 and 1.
    
OUT_DIR: All result files are saved to this folder, where the script will create multiple sub-directories. Index SNPs are in the folder "index_SNP"; Random SNPs are in the folder "random_SNP".  
 
OUT_DIR: All result files are saved to this folder, where the script will create multiple sub-directories. Index SNPs are in the folder "index_SNP"; Random SNPs are in the folder "random_SNP".  
Line 95: Line 120:  
BEDFILE_IS_SORTED: True or false, depending on whether the BED files listed in the index file are sorted.
 
BEDFILE_IS_SORTED: True or false, depending on whether the BED files listed in the index file are sorted.
   −
BATCHTYPE: We have three options for this parameter. When you run GREGOR on local machine, specify "local"; when run on MOSIX system, specify "mosix"; when run on SLURM system, specify "slurm".
+
POPULATION: If you use the reference file "AFR", set this to AFR. You have 5 options: AFR, AMR, ASN, EUR and SAN.
 +
 
 +
GREGOR can run on a local machine or on a cluster with MOSIX or SLURM.
 +
BATCHTYPE: When you run GREGOR on a local machine, specify "local"; when running on a MOSIX system, specify "mosix"; when running on a SLURM system, specify "slurm".
    
BATCHOPTS: This parameter works with BATCHTYPE when you specify "mosix" or "slurm". For example, when you define mosix, this parameter can be "-E/tmp -i -m2000 -j10,11,12,13,14,15,16 sh -c"; when you define "slurm", it can be "--partition=1000g --time=0:30:0"
 
BATCHOPTS: This parameter works with BATCHTYPE when you specify "mosix" or "slurm". For example, when you define mosix, this parameter can be "-E/tmp -i -m2000 -j10,11,12,13,14,15,16 sh -c"; when you define "slurm", it can be "--partition=1000g --time=0:30:0"
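Since REF_DIR must point at the parent of the population folder rather than at the folder itself, a quick check before running can catch a wrong path. The following is a minimal sketch under that assumption; `check_ref_dir` is a hypothetical helper, not a GREGOR command, and it only tests that the directory exists.

```shell
# Hypothetical pre-flight check (not part of GREGOR):
# verify that <REF_DIR>/<POPULATION> exists as a directory.
check_ref_dir() {
    ref_dir="$1"
    population="$2"
    # Strip one trailing slash so "/path/ref/" and "/path/ref" both work.
    target="${ref_dir%/}/$population"
    if [ -d "$target" ]; then
        echo "ok: $target"
    else
        echo "missing: $target (check REF_DIR and POPULATION)" >&2
        return 1
    fi
}

# Example call with the values from the sample config:
# check_ref_dir /workingdirectory/ref/ AFR
```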
 +
 +
== Reference Files  ==
 +
We provide two kinds of reference files. They differ in how LD buddies are defined.
 +
*LD window size = 1MB; LD r2 ≥ 0.7:
 +
**All LD buddies are within a 1MB window and have r2 greater than or equal to 0.7. If you want to calculate LD buddies within 1MB at r2 ≥ 0.7 (such as 0.9, 0.8, 0.7), please use these reference data.
 +
*LD window size = 1MB; LD r2 ≥ 0.2:
 +
**All LD buddies are within a 1MB window and have r2 greater than or equal to 0.2. If you want to calculate LD buddies within 1MB at r2 ≥ 0.2 (such as 0.6, 0.5, 0.4, 0.3, 0.2), please use these reference data.
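The choice between the two categories can be expressed as a small lookup on the R2THRESHOLD value. This is only a sketch of the rule as described above: `pick_ref_category` is a hypothetical helper, and the split at 0.7 follows the example values listed in the two bullets rather than any documented GREGOR behavior.

```shell
# Hypothetical helper (not part of GREGOR): map an r2 threshold to the
# reference-file category described above. Fails for values outside [0.2, 1].
pick_ref_category() {
    awk -v t="$1" 'BEGIN {
        thr = t + 0   # force numeric comparison
        if (thr >= 0.7 && thr <= 1)
            print "LD window size = 1MB; LD r2 >= 0.7"
        else if (thr >= 0.2 && thr < 0.7)
            print "LD window size = 1MB; LD r2 >= 0.2"
        else
            exit 1
    }'
}

# Example call with the threshold from the sample config:
# pick_ref_category 0.7
```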
    
== Results Output ==
 
== Results Output ==
Line 112: Line 147:  
*Note:   
 
*Note:   
 
**SNPs that cannot be converted from rsID to chr:pos format are listed in the output file rsid.index.snp.txt.  SNPs for which there are no LD proxies or no MAF data available are listed in the output file nonannoted.index.snp.txt.
 
**SNPs that cannot be converted from rsID to chr:pos format are listed in the output file rsid.index.snp.txt.  SNPs for which there are no LD proxies or no MAF data available are listed in the output file nonannoted.index.snp.txt.
 +
** If an index SNP and its LD buddies do not fall in any BED region, the p-value may be reported as "NA".
    
== Testing GREGOR ==
 
== Testing GREGOR ==