Line 19: |
Line 19: |
| ==Aligning a Sample== | | ==Aligning a Sample== |
| | | |
− | As an example, we can analyze the sample files used in the automatic test. | + | As an example, we can align the sample files used in the automatic test. |
| | | |
| To make this easier, change to the {ROOT_DIR}/test/align directory. (We will call the directory in which GotCloud is installed "{ROOT_DIR}".) It contains an index file and a configuration file that can be used directly. | | To make this easier, change to the {ROOT_DIR}/test/align directory. (We will call the directory in which GotCloud is installed "{ROOT_DIR}".) It contains an index file and a configuration file that can be used directly. |
Line 58: |
Line 58: |
| PLINK = $(REF_DIR)/hapmap_3.3.b37.chr20 | | PLINK = $(REF_DIR)/hapmap_3.3.b37.chr20 |
| | | |
− | If you are in the test/align directory, you can use this file as-is. If you are using a different index file, make sure your index file is named correctly in the first line. | + | If you are in the {ROOT_DIR}/test/align directory, you can use this file as-is. If you are using a different index file, make sure your index file is named correctly in the first line. If you are not running this from {ROOT_DIR}/test/align, make sure your configuration and index files are in the same directory. |
| | | |
| ===Running the alignment pipeline=== | | ===Running the alignment pipeline=== |
Line 70: |
Line 70: |
| {ROOT_DIR}/bin/gen_biopipeline.pl --conf test.conf --out_dir {OUT_DIR} | | {ROOT_DIR}/bin/gen_biopipeline.pl --conf test.conf --out_dir {OUT_DIR} |
| | | |
− | where {OUT_DIR} is the directory in which you wish to store the resulting BAM files. | + | where {OUT_DIR} is the directory in which you wish to store the resulting BAM files (for example, ~/out). |
| | | |
| If everything went well, you will see the following messages: | | If everything went well, you will see the following messages: |
| | | |
− | Finished creating makefile {OUTDIR}/Makefiles/biopipe_Sample2.Makefile | + | Finished creating makefile {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile |
− | Finished creating makefile {OUTDIR}/Makefiles/biopipe_Sample1.Makefile | + | Finished creating makefile {OUT_DIR}/Makefiles/biopipe_Sample1.Makefile |
| -------------------------------------------------------------------- | | -------------------------------------------------------------------- |
| Run the following commands: | | Run the following commands: |
| | | |
− | make -f {OUTDIR}/Makefiles/biopipe_Sample2.Makefile > {OUTDIR}/Makefiles/biopipe_Sample2.Makefile.log | + | make -f {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile > {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile.log |
− | make -f {OUTDIR}/Makefiles/biopipe_Sample1.Makefile > {OUTDIR}/Makefiles/biopipe_Sample1.Makefile.log | + | make -f {OUT_DIR}/Makefiles/biopipe_Sample1.Makefile > {OUT_DIR}/Makefiles/biopipe_Sample1.Makefile.log |
| | | |
− | where {OUTDIR} will be replaced with the directory you entered above. | + | where {OUT_DIR} will be replaced with the directory you entered above. |
| | | |
| ====Running the Makefiles==== | | ====Running the Makefiles==== |
| | | |
− | To run a Makefile, simply enter one-by-one the commands generated in the previous step. The log files for the runs will be found in the Makefiles directory, while the BAM files will be found in the {OUT_DIR}/alignment.recal directory. | + | To run a Makefile, enter the commands generated in the previous step one at a time. If you wish to run the alignment in the background, add "&" after the make command, as follows: |
| + | |
| + | make -f {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile > {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile.log & |
| + | |
| + | The log files for the runs will be found in the Makefiles directory, while the BAM files will be found in the {OUT_DIR}/alignment.recal directory. |
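The background form of the make command can also be wrapped in a small loop. The following is only a sketch: the function name run_biopipe_makefiles and the MAKE override are assumptions made for illustration and are not part of GotCloud. It starts every generated Makefile at once, so it assumes the machine has enough resources to align all samples concurrently.

```shell
#!/bin/sh
# Hypothetical helper (not part of GotCloud): run every generated
# biopipe Makefile in the background, then wait for all of them.
# $1 is the directory that was passed to gen_biopipeline.pl via --out_dir.
# MAKE may be overridden, for example MAKE=echo for a dry run.
run_biopipe_makefiles() {
    out_dir=$1
    make_cmd=${MAKE:-make}
    for mk in "$out_dir"/Makefiles/biopipe_*.Makefile; do
        # Skip the unexpanded pattern when no Makefiles exist.
        [ -e "$mk" ] || continue
        # Same command the generator prints, with "&" appended.
        "$make_cmd" -f "$mk" > "$mk.log" &
    done
    # Block until every background job has finished.
    wait
}

# Usage: run_biopipe_makefiles ~/out
```

Each log is written next to its Makefile, matching the log location described above.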
| | | |
| | | |
| ==Analyzing a Sample== | | ==Analyzing a Sample== |
| | | |
− | Using umake, you can analyze the BAM files generated in the previous step and generate a VCF file. | + | Using umake, you can analyze the BAM files generated in the previous step and generate a VCF file. Once again, we can analyze BAM files used in the automatic test. You will need three files for this: an index file, a configuration file, and a BED file. |
| | | |
| ===Index file=== | | ===Index file=== |
| + | |
| + | First, you need a list of all the BAM files to be analyzed. Conveniently, a test index file (umake_test.index) already exists in {ROOT_DIR}/test/umake/. It contains the following information: |
| + | |
| + | NA12272 ALL bams/NA12272.mapped.ILLUMINA.bwa.CEU.low_coverage.20101123.chrom20.20000001.20300000.bam |
| + | NA12004 ALL bams/NA12004.mapped.ILLUMINA.bwa.CEU.low_coverage.20101123.chrom20.20000001.20300000.bam |
| + | ... |
| + | NA12874 ALL bams/NA12874.mapped.LS454.ssaha2.CEU.low_coverage.20101123.chrom20.20000001.20300000.bam |
| + | |
| + | You can use this file directly if you change your current directory to {ROOT_DIR}/test/umake/. |
| + | |
| + | Alternatively, if you want to copy this index file to a different directory and use it there, create a symbolic link to the bams folder in that directory as follows: |
| + | |
| + | ln -s {ROOT_DIR}/test/umake/bams bams |
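The copy and the symbolic link can be done in one step. This is a minimal sketch: the function name stage_umake_index and the working-directory argument are assumptions for illustration, and it assumes umake will later be run from that working directory so that the relative bams/ paths in the index resolve.

```shell
#!/bin/sh
# Hypothetical helper (not part of GotCloud): copy the test index into a
# working directory and link the bams folder next to it.
# $1 is {ROOT_DIR}; $2 is the working directory to prepare.
stage_umake_index() {
    # Resolve {ROOT_DIR} to an absolute path so the link stays valid.
    root_dir=$(cd "$1" && pwd) || return 1
    work_dir=$2
    mkdir -p "$work_dir" || return 1
    cp "$root_dir/test/umake/umake_test.index" "$work_dir/" || return 1
    # Create the link only if nothing named "bams" is already there.
    [ -e "$work_dir/bams" ] || ln -s "$root_dir/test/umake/bams" "$work_dir/bams"
}

# Usage: stage_umake_index {ROOT_DIR} ~/umake_work
```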
| + | |
| + | ===BED file=== |
| + | |
| + | The BED file (umake_test.bed) contains a single line: |
| + | |
| + | chr20 20000050 20300000 |
| + | |
| + | You can copy this to the current directory and use it as-is. |
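If you edit or replace the BED file, a quick sanity check can catch malformed lines before a run. The sketch below is an assumption-laden illustration, not a GotCloud tool: the function name check_bed is made up, and it assumes a plain BED file with no header lines, where each line holds a chromosome name, a start position, and an end position, with start less than end.

```shell
#!/bin/sh
# Hypothetical check (not part of GotCloud): succeed only if every line
# has at least three fields, integer start and end, and start < end.
check_bed() {
    awk 'NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 { bad = 1 }
         END { exit bad + 0 }' "$1"
}

# Usage: check_bed umake_test.bed && echo "BED file looks well-formed"
```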
| + | |
| + | ===Configuration file=== |
| + | |
| + | A configuration file (umake_test.conf) already exists in {ROOT_DIR}/test/umake/. It contains the following information: |
| + | |
| + | CHRS = 20 |
| + | TEST_ROOT = $(UMAKE_ROOT)/test/umake |
| + | BAM_INDEX = $(TEST_ROOT)/umake_test.index |
| + | OUT_PREFIX = umake_test |
| + | REF_ROOT = $(TEST_ROOT)/ref |
| + | # |
| + | REF = $(REF_ROOT)/karma.ref/human.g1k.v37.chr20.fa |
| + | INDEL_PREFIX = $(REF_ROOT)/indels/1kg.pilot_release.merged.indels.sites.hg19 |
| + | DBSNP_PREFIX = $(REF_ROOT)/dbSNP/dbsnp_135_b37.rod |
| + | HM3_PREFIX = $(REF_ROOT)/HapMap3/hapmap3_r3_b37_fwd.consensus.qc.poly |
| + | # |
| + | RUN_INDEX = TRUE # create BAM index file |
| + | RUN_PILEUP = TRUE # create GLF file from BAM |
| + | RUN_GLFMULTIPLES = TRUE # create unfiltered SNP calls |
| + | RUN_VCFPILEUP = TRUE # create PVCF files using vcfPileup and run infoCollector |
| + | RUN_FILTER = TRUE # filter SNPs using vcfCooker |
| + | RUN_SPLIT = TRUE # split SNPs into chunks for genotype refinement |
| + | RUN_BEAGLE = FALSE # BEAGLE - MUST SET AFTER FINISHING PREVIOUS STEPS |
| + | RUN_SUBSET = FALSE # SUBSET FOR THUNDER - MAY BE SET WITH BEAGLE STEP TOGETHER |
| + | RUN_THUNDER = FALSE # THUNDER - MUST SET AFTER FINISHING PREVIOUS STEPS |
| + | ############################################################################### |
| + | WRITE_TARGET_LOCI = TRUE # FOR TARGETED SEQUENCING ONLY -- Write loci file when performing pileup |
| + | UNIFORM_TARGET_BED = $(TEST_ROOT)/umake_test.bed # Targeted sequencing : When all individuals has the same target. Otherwise, comment it out |
| + | OFFSET_OFF_TARGET = 50 # Extend target by given # of bases |
| + | MULTIPLE_TARGET_MAP = # Target per individual : Each line contains [SM_ID] [TARGET_BED] |
| + | TARGET_DIR = target # Directory to store target information |
| + | SAMTOOLS_VIEW_TARGET_ONLY = TRUE # When performing samtools view, exclude off-target regions (may make command line too long) |
| + | |
| + | If you are running this from a different directory, you will want to change some of the lines as follows: |
| + | |
| + | BAM_INDEX = {CURRENT_DIR}/umake_test.index |
| + | UNIFORM_TARGET_BED = {CURRENT_DIR}/umake_test.bed |
| + | |
| + | where {CURRENT_DIR} is the absolute path to the directory that contains the index and BED files. You may also want to change the name of the output folder: |
| + | |
| + | OUT_PREFIX = {output_prefix} |
| + | |
| + | where {output_prefix} is the name of the folder in which you want the output to be stored. |
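The three configuration changes described above can be applied to a copy of umake_test.conf with sed rather than by hand. This is only a sketch under stated assumptions: the function name localize_umake_conf is hypothetical, the paths are assumed to contain no "|", "&", or "\" characters (sed would treat them specially), and the destination file must differ from the source file.

```shell
#!/bin/sh
# Hypothetical helper (not part of GotCloud): write an edited copy of
# umake_test.conf for use from another directory.
# $1 original configuration file
# $2 {CURRENT_DIR}: absolute path holding umake_test.index and umake_test.bed
# $3 {output_prefix}: value for OUT_PREFIX
# $4 destination for the edited copy (must not be the same file as $1)
localize_umake_conf() {
    src_conf=$1
    current_dir=$2
    out_prefix=$3
    dest_conf=$4
    # Each expression replaces the whole line, so any trailing comment
    # on the original line is dropped.
    sed -e "s|^BAM_INDEX *=.*|BAM_INDEX = $current_dir/umake_test.index|" \
        -e "s|^UNIFORM_TARGET_BED *=.*|UNIFORM_TARGET_BED = $current_dir/umake_test.bed|" \
        -e "s|^OUT_PREFIX *=.*|OUT_PREFIX = $out_prefix|" \
        "$src_conf" > "$dest_conf"
}

# Usage: localize_umake_conf {ROOT_DIR}/test/umake/umake_test.conf "$PWD" my_output my_test.conf
```

All other settings, such as the RUN_* switches, are copied through unchanged.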