Line 8: |
Line 8: |
| | | |
| === Download from webpage === | | === Download from webpage === |
− | Through this link [http://www.sph.umich.edu/csg/jich/FallInBed/ FallInBed Download], you can download a copy of GREGOR. | + | Through this link [http://csg.sph.umich.edu/GREGOR/ GREGOR], you can download a copy of GREGOR. |
− | | |
− | === Download from GitHub with Git ===
| |
− | You can create your own git clone(copy) using:
| |
− | | |
− | git clone https://github.com/jinchen-umich/GREGOR.git
| |
− | or
| |
− | git clone git://github.com/jinchen-umich/GREGOR.git
| |
− | | |
− | Either of these two commands creates a directory called GREGOR in the current directory.
| |
− | | |
− | === Update your copy ===
| |
− | If you have already downloaded your copy, use the following commands to update:
| |
− | 1. cd pathToYourCopy/GREGOR
| |
− | 2. git pull
| |
− | | |
− | === Download From GitHub without Git ===
| |
− | If there is no git in your system, you can still download from GitHub:
| |
− | # Latest Code (master branch)
| |
− | #: via Website
| |
− | #:# Go to : https://github.com/jinchen-umich/GREGOR
| |
− | #:# Click on the <code>Download ZIP</code> button on the right side panel.
| |
− | #: via Command Line
| |
− | #:: <code>wget https://github.com/jinchen-umich/GREGOR/archive/master.zip</code>
| |
− | | |
− | After downloading the file, uncompress (unzip/untar) it. The directory created will be named <code>FallInBed</code>.
| |
| | | |
| == Build GREGOR == | | == Build GREGOR == |
Line 39: |
Line 14: |
| To build GREGOR, copy the GREGOR package to the directory you want, and then run the following command: | | To build GREGOR, copy the GREGOR package to the directory you want, and then run the following command: |
| | | |
− | tar xzvf GREGOR.tar.gz | + | tar xzvf GREGOR.v1.4.0.tar.gz |
| | | |
− | After you unzip, you can find 3 directories in "GREGOR" (./example ./lib ./script). | + | After extracting, the folder "GREGOR" contains 4 directories (./Copyrights, ./example, ./lib, ./script) and 2 files (README, release_version.txt). |
| | | |
| == Download reference files == | | == Download reference files == |
| + | Download the reference files from this link [http://csg.sph.umich.edu/GREGOR/ GREGOR Download]. |
| + | |
| + | Reference files are created for the different population groups (AFR, AMR, ASN, EUR, SAN) from 1000G data (release date: May 21, 2011). |
| + | |
| + | If your LD r2 threshold is equal to or greater than 0.7, please download reference files from category: LD window size = 1MB; LD r2 ≥ 0.7. |
| + | If your LD r2 threshold is at least 0.2 but below 0.7, please download reference files from category: LD window size = 1MB; LD r2 ≥ 0.2. |
| | | |
− | Download the reference files from this link [http://www.sph.umich.edu/csg/jich/FallInBed/ FallInBed Download], then un-package the file
| + | After downloading the reference files, merge the part files into one gz file with a command like: |
| | | |
− | tar xzvf GREGOR.ref.tar.gz | + | cat \ |
| + | GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.00 \ |
| + | GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.01 \ |
| + | GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.02 \ |
| + | GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.03 \ |
| + | GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz.part.04 \ |
| + | > GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz |
| + | |
| + | Then extract this file: |
| + | tar zxvf GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz |
| | | |
− | After unzip, you will get 47 reference files in the directory "~/ref"
| + | This creates one directory named "AFR". |
| | | |
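The merge-and-extract steps can be wrapped in a small helper. This is a minimal sketch, not part of GREGOR: the function name <code>merge_and_extract</code> is hypothetical, and it assumes all part files for one archive sit in the current directory with two-digit suffixes (.part.00, .part.01, ...), as in the example above.

```shell
# Hypothetical helper (not shipped with GREGOR): merge the part files of one
# reference archive, verify the result, and extract it.
merge_and_extract() {
    archive="$1"    # e.g. GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz

    # The shell expands the glob in sorted order, so two-digit suffixes
    # (.part.00, .part.01, ...) are concatenated in the right sequence.
    cat "$archive".part.* > "$archive" || return 1

    # Test the gzip stream before extracting; a missing or truncated part
    # normally shows up here.
    gzip -t "$archive" || return 1

    # Extract into the current directory (creates e.g. ./AFR).
    tar zxf "$archive"
}

# Usage (run in the directory that holds the part files):
#   merge_and_extract GREGOR.AFR.ref.r2.greater.than.0.2.tar.gz
```

The same call should work for the other population archives if they follow the same naming pattern, which is an assumption here.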
| == Basic Usage Example == | | == Basic Usage Example == |
Line 93: |
Line 83: |
| ## KEY ELEMENTS TO CONFIGURE : NEED TO MODIFY | | ## KEY ELEMENTS TO CONFIGURE : NEED TO MODIFY |
| ############################################################################### | | ############################################################################### |
− | INDEX_SNP_FILE = /net/dumbo/home/jchen/prj/chiseq/codes/FallInBed_Binomial/example/example.index.snps.rsid.list.txt ## e.g. /home/myid/data/FallInBed/example.snp.txt | + | INDEX_SNP_FILE = /workingdirectory/example/example.index.snps.rsid.list.txt |
− | BED_FILE_INDEX = /net/dumbo/home/jchen/prj/chiseq/codes/FallInBed_Binomial/example/example.bed.file.index ## e.g. /home/myid/data/FallInBed/bedfiles.index | + | BED_FILE_INDEX = /workingdirectory/example/example.bed.file.index |
− | REF_DIR = /net/dumbo/home/jchen/prj/chiseq/codes/FallInBed_Binomial/ref/ ## reference directory | + | REF_DIR = /workingdirectory/ref/ |
| R2THRESHOLD = 0.7 | | R2THRESHOLD = 0.7 |
| LDWINDOWSIZE = 1000000 | | LDWINDOWSIZE = 1000000 |
− | OUT_DIR = /net/dumbo/home/jchen/prj/chiseq/codes/FallInBed_Binomial/example/example.rsid.20130808/ ## e.g. /home/myid/data/FallInBed/result/ | + | OUT_DIR = /workingdirectory/example/example.rsid.20130808/ |
| MIN_NEIGHBOR_NUM = 500 | | MIN_NEIGHBOR_NUM = 500 |
| BEDFILE_IS_SORTED = True | | BEDFILE_IS_SORTED = True |
− | MOSRUN = mosbatch -E/tmp -i -m2000 -j20,43,122,135,137,138,149,151,153,154,155,156,162,163 sh -c | + | POPULATION = AFR ## define the population, you can specify EUR, AFR, AMR or ASN |
| + | TOPNBEDFILES = 2 |
| + | JOBNUMBER = 10 |
| + | ############################################################################### |
| + | #BATCHTYPE = mosix ## submit jobs on MOSIX |
| + | #BATCHOPTS = -E/tmp -i -m2000 -j10,11,12,13,14,15,16,17,18,19,120,122,123,124,125 sh -c |
| + | ############################################################################### |
| + | #BATCHTYPE = slurm ## submit jobs on SLURM |
| + | #BATCHOPTS = --partition=main --time=0:30:0 |
| + | ############################################################################### |
| + | BATCHTYPE = local ## run jobs on local machine |
| + | |
| | | |
| In the config file, there are several parameters to adjust: | | In the config file, there are several parameters to adjust: |
Line 109: |
Line 110: |
| BED_FILE_INDEX: This file lists the datasets (e.g. BED files) to be used for enrichment analysis. Use complete paths to file locations and make sure positions are in hg19 format. | | BED_FILE_INDEX: This file lists the datasets (e.g. BED files) to be used for enrichment analysis. Use complete paths to file locations and make sure positions are in hg19 format. |
| | | |
− | REF_DIR: Define reference file directory which you download at here. | + | REF_DIR: The directory that contains the reference files you downloaded. If your "AFR" folder is at "/home/myid/GREGOR/ref/AFR/", set this parameter to "/home/myid/GREGOR/ref/". |
| | | |
− | R2THRESHOLD and LDWINDOWSIZE: These two parameters define the index SNP LD proxies by R2 threshold and LD window size. | + | R2THRESHOLD and LDWINDOWSIZE: These two parameters define the index SNP (and control SNP) LD proxies by r2 threshold and LD window size. If you downloaded the r2 ≥ 0.7 reference files, set R2THRESHOLD to a value between 0.7 and 1. |
| | | |
| OUT_DIR: All result files are saved to this folder, where the script will create multiple sub-directories. Index SNPs are in the folder "index_SNP"; Random SNPs are in the folder "random_SNP". | | OUT_DIR: All result files are saved to this folder, where the script will create multiple sub-directories. Index SNPs are in the folder "index_SNP"; Random SNPs are in the folder "random_SNP". |
| | | |
− | MIN_NEIGHBOR_NUM: Define neighbor number around index SNP. Script will find no less than this number around every index SNP. | + | MIN_NEIGHBOR_NUM: Define the minimum number of control SNPs for each index SNP. The script finds at least this many control SNPs for every index SNP. If you make this number very large, the control SNPs will be less closely matched on the three matching properties (distance to nearest gene, frequency and number of SNPs in LD). |
| | | |
| BEDFILE_IS_SORTED: True or false, depending on whether the BED files listed in the index file are sorted. | | BEDFILE_IS_SORTED: True or false, depending on whether the BED files listed in the index file are sorted. |
| + | |
| + | POPULATION: If you use reference file "AFR", set this to AFR. You have 5 options: AFR, AMR, ASN, EUR and SAN. |
| + | |
| + | GREGOR can run on a local machine or on a cluster with MOSIX or SLURM. |
| + | BATCHTYPE: Specify "local" to run GREGOR on a local machine, "mosix" to run on a MOSIX system, or "slurm" to run on a SLURM system. |
| + | |
| + | BATCHOPTS: This parameter is used together with BATCHTYPE when you specify "mosix" or "slurm". For example, with "mosix" this parameter can be "-E/tmp -i -m2000 -j10,11,12,13,14,15,16 sh -c"; with "slurm" it can be "--partition=1000g --time=0:30:0". |
| + | |
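Before starting a run, it can help to confirm that REF_DIR and POPULATION in the config file point at an extracted reference directory. The following is a minimal sketch, not part of GREGOR: the helper names <code>conf_get</code> and <code>check_ref_dir</code> are hypothetical, and the parsing assumes the "KEY = value ## comment" layout shown in the example config above.

```shell
# Hypothetical helpers (not shipped with GREGOR).

# Print the value of one key from a "KEY = value ## comment" config file.
# Lines that start with "#" (commented-out settings) are not matched.
conf_get() {
    key="$1"
    conf="$2"
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$conf" |
        sed 's/[[:space:]]*##.*$//; s/[[:space:]]*$//' |
        head -n 1
}

# Check that REF_DIR/POPULATION exists, e.g. /home/myid/GREGOR/ref/AFR.
check_ref_dir() {
    conf="$1"
    ref_dir=$(conf_get REF_DIR "$conf")
    population=$(conf_get POPULATION "$conf")
    target="${ref_dir%/}/$population"    # drop a trailing slash before joining
    if [ -d "$target" ]; then
        echo "OK: $target"
    else
        echo "Missing reference directory: $target" >&2
        return 1
    fi
}

# Usage (the config file name is an example):
#   check_ref_dir example.conf
```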
| + | == Reference Files == |
| + | We provide two kinds of reference files. They differ in how LD buddies are defined. |
| + | *LD window size = 1MB; LD r2 ≥ 0.7: |
| + | **All LD buddies are within a 1MB window and have r2 greater than or equal to 0.7. If you want to calculate LD buddies in 1MB and r2 ≥ 0.7 (such as 0.9, 0.8, 0.7), please use these reference data. |
| + | *LD window size = 1MB; LD r2 ≥ 0.2: |
| + | **All LD buddies are within a 1MB window and have r2 greater than or equal to 0.2. If you want to calculate LD buddies in 1MB and r2 ≥ 0.2 (such as 0.6, 0.5, 0.4, 0.3, 0.2), please use these reference data. |
| | | |
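The choice between the two reference categories can be expressed as a simple rule on R2THRESHOLD. This is a minimal sketch, not part of GREGOR: the function name <code>ref_category</code> is hypothetical, and it assumes thresholds from 0.7 to 1 use the r2 ≥ 0.7 files while thresholds from 0.2 up to (but not including) 0.7 use the r2 ≥ 0.2 files.

```shell
# Hypothetical helper (not shipped with GREGOR): print the reference file
# category that matches a given R2THRESHOLD value.
ref_category() {
    # awk does the floating-point comparison, which plain sh cannot.
    awk -v r2="$1" 'BEGIN {
        if (r2 >= 0.7 && r2 <= 1)
            print "LD window size = 1MB; LD r2 >= 0.7"
        else if (r2 >= 0.2 && r2 < 0.7)
            print "LD window size = 1MB; LD r2 >= 0.2"
        else
            exit 1    # outside the range covered by the provided files
    }' || {
        echo "No reference category covers r2 = $1" >&2
        return 1
    }
}

# Usage:
#   ref_category 0.8
#   ref_category 0.5
```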
| == Results Output == | | == Results Output == |
| The file StatisticSummaryFile.txt in the output directory contains enrichment results with the following information: | | The file StatisticSummaryFile.txt in the output directory contains enrichment results with the following information: |
| + | |
| + | [[File:GREGOR Summary 20160201.png]] |
| | | |
| Bed_File: The individual datasets used in the enrichment analysis | | Bed_File: The individual datasets used in the enrichment analysis |
Line 128: |
Line 145: |
| Pvalue: P-value calculated assuming a sum of binomial distributions to represent the number of index SNPs (or LD proxies) that overlap a dataset compared to the expectation observed in the matched control sets | | Pvalue: P-value calculated assuming a sum of binomial distributions to represent the number of index SNPs (or LD proxies) that overlap a dataset compared to the expectation observed in the matched control sets |
| | | |
− | *Note: SNPs that cannot be converted from rsID to chr:pos format are listed in the output file rsid.index.snp.txt. SNPs for which there are no LD proxies or no MAF data available are listed in the output file nonannoted.index.snp.txt. | + | *Note: |
| + | **SNPs that cannot be converted from rsID to chr:pos format are listed in the output file rsid.index.snp.txt. SNPs for which there are no LD proxies or no MAF data available are listed in the output file nonannoted.index.snp.txt. |
| + | ** If an index SNP and its LD buddies do not fall in any BED region, the Pvalue may be reported as "NA". |
| | | |
| == Testing GREGOR == | | == Testing GREGOR == |