=== Download from webpage ===
You can download a copy of GREGOR from the [http://csg.sph.umich.edu/GREGOR/ GREGOR] webpage.

== Build GREGOR ==
To build GREGOR, copy the GREGOR package to a directory of your choice and unpack it:

 tar xzvf GREGOR.v1.4.0.tar.gz

After unpacking, the folder "GREGOR" contains 4 directories (./Copyrights, ./example, ./lib, ./script) and 2 files (README, release_version.txt).

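To confirm that the package unpacked completely, a quick check like the following can help. It is a minimal sketch: `check_gregor_layout` is a hypothetical helper (not part of GREGOR), and the expected names are taken from the list above.

```shell
# Hypothetical helper: report any expected entry missing from the unpacked folder.
check_gregor_layout() {
  root="${1:-GREGOR}"                 # unpacked folder, default ./GREGOR
  missing=0
  for d in Copyrights example lib script; do
    [ -d "$root/$d" ] || { echo "missing directory: $root/$d" >&2; missing=1; }
  done
  for f in README release_version.txt; do
    [ -f "$root/$f" ] || { echo "missing file: $root/$f" >&2; missing=1; }
  done
  return "$missing"                   # 0 when every listed entry is present
}
```

For example, run `check_gregor_layout GREGOR && echo "layout OK"` from the directory that holds the unpacked folder.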
== Download reference files ==
Download the reference files from [http://csg.sph.umich.edu/GREGOR/ GREGOR Download].

Reference files were created for each population group (AFR, AMR, ASN, EUR, SAN) from 1000 Genomes data (release date: May 21, 2011).

*If your LD r2 threshold is 0.7 or higher, download the reference files from the category "LD window size = 1MB; LD r2 ≥ 0.7".
*If your LD r2 threshold is at least 0.2 but below 0.7, download the reference files from the category "LD window size = 1MB; LD r2 ≥ 0.2".

The reference archives are distributed as several part files. After downloading, merge the parts into a single .tar.gz file, for example:

 cat \
   GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.00 \
   GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.01 \
   GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.02 \
   GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.03 \
   GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.04 \
   > GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz

Then extract the merged archive:

 tar zxvf GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz

This creates one directory named "AFR".

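The merge and extract steps can be wrapped in a small function so the same commands work for any population and r2 category. This is a sketch under assumptions: `merge_and_extract` is a hypothetical helper, the part files are assumed to follow the naming pattern of the AFR example above and to sit in the current directory, and the `gzip -t` integrity check is an addition not described on this page. Relying on the shell's sorted glob expansion avoids listing each part by hand.

```shell
# Hypothetical helper: reassemble and unpack one split reference archive.
# Assumes parts named GREGOR.<POP>.ref.r2.greater.than.<R2>.tar.gz.part.NN
# in the current directory, as in the AFR example above.
merge_and_extract() {
  pop="$1"   # population: AFR, AMR, ASN, EUR or SAN
  r2="$2"    # reference category: 0.2 or 0.7
  archive="GREGOR.${pop}.ref.r2.greater.than.${r2}.tar.gz"

  # Pathname expansion is sorted, so part.00, part.01, ... come out in order.
  set -- "${archive}".part.*
  if [ ! -e "$1" ]; then
    echo "no part files found for ${archive}" >&2
    return 1
  fi

  cat "$@" > "${archive}" || return 1
  # Stop early if the merged file is not a valid gzip stream (missing or truncated part).
  gzip -t "${archive}" || return 1
  tar zxvf "${archive}"
}
```

For example, `merge_and_extract EUR 0.7` would merge the EUR r2 ≥ 0.7 parts and create the directory "EUR".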
== Basic Usage Example ==
 ## KEY ELEMENTS TO CONFIGURE : NEED TO MODIFY
 ###############################################################################
 INDEX_SNP_FILE = /workingdirectory/example/example.index.snps.rsid.list.txt
 BED_FILE_INDEX = /workingdirectory/example/example.bed.file.index
 REF_DIR = /workingdirectory/ref/
 R2THRESHOLD = 0.7
 LDWINDOWSIZE = 1000000
 OUT_DIR = /workingdirectory/example/example.rsid.20130808/
 MIN_NEIGHBOR_NUM = 500
 BEDFILE_IS_SORTED = True
 POPULATION = AFR  ## define the population: AFR, AMR, ASN, EUR or SAN
 TOPNBEDFILES = 2
 JOBNUMBER = 10
 ###############################################################################
 #BATCHTYPE = mosix ## submit jobs on MOSIX
 #BATCHOPTS = -E/tmp -i -m2000 -j10,11,12,13,14,15,16,17,18,19,120,122,123,124,125 sh -c
 ###############################################################################
 #BATCHTYPE = slurm ## submit jobs on SLURM
 #BATCHOPTS = --partition=main --time=0:30:0
 ###############################################################################
 BATCHTYPE = local ## run jobs on local machine

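The config file uses simple `KEY = value` lines with optional trailing `## comment`s, and keeps alternatives (such as the MOSIX and SLURM options above) disabled with a leading `#`. A minimal sketch for reading one active value back out of such a file, e.g. to reuse REF_DIR or OUT_DIR in your own scripts, might look like this. `conf_get` is hypothetical, and GREGOR itself may parse the file differently.

```shell
# Hypothetical helper: print the active value of KEY from a GREGOR-style config.
# Usage: conf_get KEY FILE   (exit status 1 if KEY has no active line)
conf_get() {
  awk -v key="$1" '
    /^[ \t]*#/ { next }                       # skip comments and disabled options
    {
      line = $0
      sub(/##.*/, "", line)                    # drop a trailing ## comment
      eq = index(line, "=")
      if (eq == 0) next                        # not a KEY = value line
      k = substr(line, 1, eq - 1)
      v = substr(line, eq + 1)                 # keeps any later "=" (e.g. SLURM options)
      gsub(/^[ \t]+|[ \t]+$/, "", k)           # trim spaces around the key
      gsub(/^[ \t]+|[ \t]+$/, "", v)           # and around the value
      if (k == key) { print v; found = 1; exit }
    }
    END { if (!found) exit 1 }                 # non-zero status if the key is absent
  ' "$2"
}
```

If the block above is saved as a hypothetical `my.conf`, `conf_get OUT_DIR my.conf` prints `/workingdirectory/example/example.rsid.20130808/`, and `conf_get BATCHTYPE my.conf` prints `local` because the MOSIX and SLURM lines are commented out.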
In the config file, there are several parameters to adjust:
BED_FILE_INDEX: This file lists the datasets (e.g. BED files) to be used for enrichment analysis. Use complete paths to file locations and make sure positions are in hg19 format.

REF_DIR: The directory containing the reference files you downloaded. Set it to the parent of the population folder: if your "AFR" folder is at "/home/myid/GREGOR/ref/AFR/", set this parameter to "/home/myid/GREGOR/ref/".

R2THRESHOLD and LDWINDOWSIZE: These two parameters define the LD proxies of the index SNPs (and control SNPs) by r2 threshold and LD window size. If you downloaded the r2 ≥ 0.7 reference files, set R2THRESHOLD to a value between 0.7 and 1.

OUT_DIR: All result files are saved to this folder, where the script will create multiple sub-directories. Index SNPs are in the folder "index_SNP"; random SNPs are in the folder "random_SNP".

BEDFILE_IS_SORTED: True or False, depending on whether the BED files listed in the index file are sorted.

POPULATION: The population of the reference files you use; for example, if you use the "AFR" reference files, set this to AFR. There are 5 options: AFR, AMR, ASN, EUR and SAN.

GREGOR can run on a local machine or on a cluster managed by MOSIX or SLURM.

BATCHTYPE: Specify "local" to run GREGOR on the local machine, "mosix" to run it on a MOSIX system, or "slurm" to run it on a SLURM system.

BATCHOPTS: Scheduler options, used together with BATCHTYPE when it is "mosix" or "slurm". For example, with "mosix" this parameter can be "-E/tmp -i -m2000 -j10,11,12,13,14,15,16 sh -c"; with "slurm" it can be "--partition=1000g --time=0:30:0".

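Several of the rules above can be checked before a long run is submitted. The sketch below is hypothetical (`check_gregor_settings` is not part of GREGOR) and checks only what this page states: the allowed POPULATION values, that REF_DIR contains the population folder, and the three BATCHTYPE values. It does not validate BATCHOPTS, which depends on your MOSIX or SLURM setup.

```shell
# Hypothetical helper: check REF_DIR, POPULATION and BATCHTYPE against the rules above.
# Usage: check_gregor_settings REF_DIR POPULATION BATCHTYPE
check_gregor_settings() {
  ref_dir="$1"
  population="$2"
  batchtype="$3"
  status=0

  case "$population" in
    AFR|AMR|ASN|EUR|SAN) ;;
    *) echo "POPULATION must be AFR, AMR, ASN, EUR or SAN: $population" >&2; status=1 ;;
  esac

  # REF_DIR is the parent of the population folder; strip a trailing slash first.
  if [ ! -d "${ref_dir%/}/${population}" ]; then
    echo "reference folder not found: ${ref_dir%/}/${population}" >&2
    status=1
  fi

  case "$batchtype" in
    local|mosix|slurm) ;;
    *) echo "BATCHTYPE must be local, mosix or slurm: $batchtype" >&2; status=1 ;;
  esac

  return "$status"
}
```

For example, `check_gregor_settings /home/myid/GREGOR/ref/ AFR local` succeeds only if "/home/myid/GREGOR/ref/AFR" exists.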
== Reference Files ==
We provide two kinds of reference files, which differ in how LD buddies are defined:
*LD window size = 1MB; LD r2 ≥ 0.7:
**All LD buddies lie within a 1MB window and have r2 ≥ 0.7. Use these reference data if your r2 threshold is 0.7 or higher (such as 0.9, 0.8, 0.7).
*LD window size = 1MB; LD r2 ≥ 0.2:
**All LD buddies lie within a 1MB window and have r2 ≥ 0.2. Use these reference data if your r2 threshold is at least 0.2 but below 0.7 (such as 0.6, 0.5, 0.4, 0.3, 0.2).

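The rule for choosing between the two reference sets can be written as a small function. This is a sketch: `ref_category_for_r2` is a hypothetical helper whose output is meant to be substituted into archive names such as `GREGOR.<POP>.ref.r2.greater.than.<category>.tar.gz`. awk does the comparison because POSIX shell arithmetic is integer-only.

```shell
# Hypothetical helper: print the reference category ("0.7" or "0.2") for an r2 threshold.
ref_category_for_r2() {
  awk -v r2="$1" 'BEGIN {
    r = r2 + 0                                 # non-numeric input becomes 0
    if (r > 1)    { print "r2 must be at most 1" > "/dev/stderr"; exit 1 }
    if (r >= 0.7) { print "0.7"; exit 0 }      # threshold 0.7 to 1
    if (r >= 0.2) { print "0.2"; exit 0 }      # threshold 0.2 to below 0.7
    print "no reference files cover r2 below 0.2" > "/dev/stderr"
    exit 1
  }'
}
```

For example, `ref_category_for_r2 0.8` prints `0.7` and `ref_category_for_r2 0.5` prints `0.2`.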
== Results Output ==
The file StatisticSummaryFile.txt in the output directory contains the enrichment results, with the following information:

[[File:GREGOR Summary 20160201.png]]

Bed_File: The individual datasets used in the enrichment analysis
Pvalue: P-value calculated assuming a sum of binomial distributions to represent the number of index SNPs (or their LD proxies) that overlap a dataset, compared with the expectation observed in the matched control sets

*Note:
**SNPs that cannot be converted from rsID to chr:pos format are listed in the output file rsid.index.snp.txt. SNPs with no LD proxies or no MAF data available are listed in the output file nonannoted.index.snp.txt.
**If an index SNP and its LD buddies do not fall in any BED region, the Pvalue may be reported as "NA".

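To scan the results, it can help to list BED files by increasing p-value and skip the "NA" rows. This sketch rests on assumptions not confirmed on this page: StatisticSummaryFile.txt is tab-delimited, its first row is a header, and the relevant header fields are named Bed_File and Pvalue (matched case-insensitively). `rank_by_pvalue` is hypothetical. `sort -g` is used because p-values may be written in scientific notation; GNU and BSD sort support it, but POSIX does not require it.

```shell
# Hypothetical helper: print "Bed_File<TAB>Pvalue" rows sorted by increasing Pvalue.
# Assumes a tab-delimited file whose first row is a header.
rank_by_pvalue() {
  tab=$(printf '\t')
  rows=$(awk -F '\t' '
    NR == 1 {
      bed = 1                                  # assume Bed_File is column 1 unless found
      for (i = 1; i <= NF; i++) {
        h = tolower($i)
        if (h == "pvalue") p = i
        if (h == "bed_file") bed = i
      }
      if (!p) { print "no Pvalue column in header" > "/dev/stderr"; exit 1 }
      next
    }
    $p != "" && $p != "NA" { print $bed "\t" $p }   # skip empty and NA p-values
  ' "$1") || return 1
  [ -n "$rows" ] || return 0                   # header only, or every row was NA
  # -g (general numeric) understands values such as 1.5e-05.
  printf '%s\n' "$rows" | sort -t "$tab" -k2,2g
}
```

For example, `rank_by_pvalue /workingdirectory/example/example.rsid.20130808/StatisticSummaryFile.txt | head` shows the datasets with the smallest p-values first.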
== Testing GREGOR ==