Line 14: |
Line 14: |
| To build GREGOR, copy the GREGOR package to the directory you want, and then run the following command: | | To build GREGOR, copy the GREGOR package to the directory you want, and then run the following command: |
| | | |
− | tar xzvf GREGOR.tar.gz | + | tar xzvf GREGOR.v1.4.0.tar.gz |
| | | |
− | After you unzip, you can find 3 directories in "GREGOR" (./example ./lib ./script). | + | After unpacking, the folder "GREGOR" contains 4 directories (./Copyrights, ./example, ./lib, ./script) and 2 files (README, release_version.txt). |
| | | |
| == Download reference files == | | == Download reference files == |
| + | Download the reference files from this link [http://csg.sph.umich.edu/GREGOR/ GREGOR Download]. |
| | | |
− | Download the reference files from this link [http://csg.sph.umich.edu/GREGOR/ GREGOR Download], then un-package the file
| + | Reference files were created for the different population groups (AFR, AMR, ASN, EUR, SAN) from 1000G data (release date: May 21, 2011). |
| | | |
− | tar xzvf GREGOR.ref.tar.gz
| + | If your LD r2 threshold is greater than or equal to 0.7, download the reference files from the category: LD window size = 1MB; LD r2 ≥ 0.7. |
| + | If your LD r2 threshold is greater than or equal to 0.2, download the reference files from the category: LD window size = 1MB; LD r2 ≥ 0.2. |
| | | |
− | After unzip, you will get 47 reference files in the directory "~/ref" | + | After downloading the reference files, merge the part files into one gz file with a command like: |
| + | |
| + | cat \ |
| + | GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.00 \ |
| + | GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.01 \ |
| + | GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.02 \ |
| + | GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.03 \ |
| + | GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.04 \ |
| + | > GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz |
| + | |
| + | Then extract this file: |
| + | tar zxvf GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz |
| + | |
| + | You will get one directory which has the name "AFR". |
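The merge step above lists every part file by hand. As a minimal sketch, assuming the parts always carry zero-padded suffixes (.part.00, .part.01, ...) as in the example, a small helper can concatenate them in order. The function name `merge_parts` is hypothetical and not part of the GREGOR package.

```shell
# Hypothetical helper (not shipped with GREGOR): rebuild an archive
# from its ".part.NN" pieces.
merge_parts() {
    archive="$1"
    # The glob expands in lexical order, which matches numeric order
    # only because the suffixes are zero-padded (.part.00, .part.01, ...).
    set -- "$archive".part.*
    if [ ! -e "$1" ]; then
        echo "no part files found for $archive" >&2
        return 1
    fi
    cat "$@" > "$archive"
}

# Usage (run in the directory holding the downloaded parts):
#   merge_parts GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz
#   tar zxvf GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz
```

The same call should work for the other population archives if they follow the same naming pattern, which is an assumption here.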
| | | |
| == Basic Usage Example == | | == Basic Usage Example == |
Line 68: |
Line 83: |
| ## KEY ELEMENTS TO CONFIGURE : NEED TO MODIFY | | ## KEY ELEMENTS TO CONFIGURE : NEED TO MODIFY |
| ############################################################################### | | ############################################################################### |
− | INDEX_SNP_FILE = /workingdirectory/example/example.index.snps.rsid.list.txt ## e.g. /workingdirectory/example/example.index.snps.rsid.list.txt | + | INDEX_SNP_FILE = /workingdirectory/example/example.index.snps.rsid.list.txt |
− | BED_FILE_INDEX = /workingdirectory/example/example.bed.file.index ## e.g. /workingdirectory/example/example.bed.file.index | + | BED_FILE_INDEX = /workingdirectory/example/example.bed.file.index |
− | REF_DIR = /workingdirectory/ref/ ## reference directory | + | REF_DIR = /workingdirectory/ref/ |
| R2THRESHOLD = 0.7 | | R2THRESHOLD = 0.7 |
| LDWINDOWSIZE = 1000000 | | LDWINDOWSIZE = 1000000 |
− | OUT_DIR = /workingdirectory/example/example.rsid.20130808/ ## e.g. /workingdirectory/example/example.rsid.20130808/ | + | OUT_DIR = /workingdirectory/example/example.rsid.20130808/ |
| MIN_NEIGHBOR_NUM = 500 | | MIN_NEIGHBOR_NUM = 500 |
| BEDFILE_IS_SORTED = True | | BEDFILE_IS_SORTED = True |
− | BATCHTYPE = slurm | + | POPULATION = AFR ## define the population, you can specify EUR, AFR, AMR or ASN |
− | BATCHOPTS = --partition=main --time=0:30:0 | + | TOPNBEDFILES = 2 |
| + | JOBNUMBER = 10 |
| + | ############################################################################### |
| + | #BATCHTYPE = mosix ## submit jobs on MOSIX |
| + | #BATCHOPTS = -E/tmp -i -m2000 -j10,11,12,13,14,15,16,17,18,19,120,122,123,124,125 sh -c |
| + | ############################################################################### |
| + | #BATCHTYPE = slurm ## submit jobs on SLURM |
| + | #BATCHOPTS = --partition=main --time=0:30:0 |
| + | ############################################################################### |
| + | BATCHTYPE = local ## run jobs on local machine |
| + | |
| | | |
| In the config file, there are several parameters to adjust: | | In the config file, there are several parameters to adjust: |
Line 85: |
Line 110: |
| BED_FILE_INDEX: This file lists the datasets (e.g. BED files) to be used for enrichment analysis. Use complete paths to file locations and make sure positions are in hg19 format. | | BED_FILE_INDEX: This file lists the datasets (e.g. BED files) to be used for enrichment analysis. Use complete paths to file locations and make sure positions are in hg19 format. |
| | | |
− | REF_DIR: Define reference file directory which you download at here. | + | REF_DIR: The directory that contains the downloaded reference files. If your "AFR" folder is at "/home/myid/GREGOR/ref/AFR/", set this parameter to "/home/myid/GREGOR/ref/". |
| | | |
− | R2THRESHOLD and LDWINDOWSIZE: These two parameters define the index SNP (and control SNP) LD proxies by r2 threshold and LD window size. | + | R2THRESHOLD and LDWINDOWSIZE: These two parameters define the index SNP (and control SNP) LD proxies by r2 threshold and LD window size. If you downloaded the r2 ≥ 0.7 reference files, set R2THRESHOLD to a value between 0.7 and 1. |
| | | |
| OUT_DIR: All result files are saved to this folder, where the script will create multiple sub-directories. Index SNPs are in the folder "index_SNP"; Random SNPs are in the folder "random_SNP". | | OUT_DIR: All result files are saved to this folder, where the script will create multiple sub-directories. Index SNPs are in the folder "index_SNP"; Random SNPs are in the folder "random_SNP". |
Line 95: |
Line 120: |
| BEDFILE_IS_SORTED: True or false, depending on whether the BED files listed in the index file are sorted. | | BEDFILE_IS_SORTED: True or false, depending on whether the BED files listed in the index file are sorted. |
| | | |
− | BATCHTYPE: We have three options for this parameter. When you run GREGOR on local machine, specify "local"; when run on MOSIX system, specify "mosix"; when run on SLURM system, specify "slurm".
| + | POPULATION: If you use the reference files for "AFR", set this to AFR. There are 5 options: AFR, AMR, ASN, EUR and SAN. |
| + | |
| + | GREGOR can run on a local machine or on a cluster with MOSIX or SLURM. |
| + | BATCHTYPE: To run GREGOR on a local machine, specify "local"; to run on a MOSIX system, specify "mosix"; to run on a SLURM system, specify "slurm". |
| | | |
| BATCHOPTS: This parameter works with BATCHTYPE when you specify "mosix" or "slurm". For example, when you define mosix, this parameter can be "-E/tmp -i -m2000 -j10,11,12,13,14,15,16 sh -c"; when you define "slurm", it can be "--partition=1000g --time=0:30:0" | | BATCHOPTS: This parameter works with BATCHTYPE when you specify "mosix" or "slurm". For example, when you define mosix, this parameter can be "-E/tmp -i -m2000 -j10,11,12,13,14,15,16 sh -c"; when you define "slurm", it can be "--partition=1000g --time=0:30:0" |
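Since REF_DIR must point at the parent of the population folder, a quick pre-flight check can catch a mismatch between REF_DIR and POPULATION before a run. The following is a minimal sketch; `check_ref_dir` is a hypothetical helper, not a GREGOR command, and it assumes only the directory layout described for REF_DIR.

```shell
# Hypothetical pre-flight check (not part of GREGOR): confirm that
# REF_DIR contains a sub-directory named after POPULATION.
check_ref_dir() {
    ref_dir="$1"      # value of REF_DIR from the config file
    population="$2"   # value of POPULATION, e.g. AFR
    # ${ref_dir%/} drops one trailing slash so both "ref" and "ref/" work.
    if [ -d "${ref_dir%/}/$population" ]; then
        return 0
    fi
    echo "expected directory ${ref_dir%/}/$population not found" >&2
    return 1
}

# Usage:
#   check_ref_dir /workingdirectory/ref/ AFR
```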
| + | |
| + | == Reference Files == |
| + | We provide two kinds of reference files. They differ in how LD buddies are defined. |
| + | *LD window size = 1MB; LD r2 ≥ 0.7: |
| + | **All LD buddies are within a 1MB window and have r2 greater than or equal to 0.7. If you want to calculate LD buddies within 1MB at an r2 threshold ≥ 0.7 (such as 0.9, 0.8, 0.7), please use these reference data. |
| + | *LD window size = 1MB; LD r2 ≥ 0.2: |
| + | **All LD buddies are within a 1MB window and have r2 greater than or equal to 0.2. If you want to calculate LD buddies within 1MB at an r2 threshold ≥ 0.2 (such as 0.6, 0.5, 0.4, 0.3, 0.2), please use these reference data. |
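The two categories above imply a simple rule: the configured R2THRESHOLD must be no lower than the minimum r2 of the downloaded reference set and no higher than 1. A minimal sketch of that check follows; `r2_threshold_ok` is a hypothetical helper, not part of GREGOR, and it uses awk because plain POSIX shell cannot compare decimal numbers.

```shell
# Hypothetical check (not part of GREGOR): is R2THRESHOLD usable with
# the downloaded reference category?
r2_threshold_ok() {
    threshold="$1"   # value of R2THRESHOLD in the config file
    ref_min="$2"     # minimum r2 of the downloaded category: 0.7 or 0.2
    # awk compares the two values numerically; exit status 0 means valid.
    awk -v t="$threshold" -v m="$ref_min" 'BEGIN { exit !(t >= m && t <= 1) }'
}

# Usage:
#   r2_threshold_ok 0.8 0.7 && echo "threshold matches the r2 >= 0.7 files"
```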
| | | |
| == Results Output == | | == Results Output == |
Line 112: |
Line 147: |
| *Note: | | *Note: |
| **SNPs that cannot be converted from rsID to chr:pos format are listed in the output file rsid.index.snp.txt. SNPs for which there are no LD proxies or no MAF data available are listed in the output file nonannoted.index.snp.txt. | | **SNPs that cannot be converted from rsID to chr:pos format are listed in the output file rsid.index.snp.txt. SNPs for which there are no LD proxies or no MAF data available are listed in the output file nonannoted.index.snp.txt. |
| + | ** If an index SNP and its LD buddies do not fall in any BED region, the p-value may be reported as "NA". |
| | | |
| == Testing GREGOR == | | == Testing GREGOR == |