
From Genome Analysis Wiki
3,754 bytes added ,  01:26, 8 January 2013
==Aligning a Sample==
As an example, we can align the sample files used in the automatic test.
To make this easier, change to the {ROOT_DIR}/test/align directory. (We will call the directory in which GotCloud is installed "{ROOT_DIR}".) It contains an index file and a configuration file that can be used directly.
  PLINK = $(REF_DIR)/hapmap_3.3.b37.chr20
If you are in the {ROOT_DIR}/test/align directory, you can use this file as-is. If you are using a different index file, make sure it is named correctly in the first line of this file. If you are not running from {ROOT_DIR}/test/align, make sure your configuration and index files are in the same directory.
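Both conditions above can be checked before starting the pipeline. The following is a minimal sketch. It assumes the first line of the configuration file has the form "KEY = index_file" and that a relative index path is resolved against the configuration file's directory. check_align_inputs is a hypothetical helper and is not part of GotCloud.

```shell
# check_align_inputs: hypothetical helper, not part of GotCloud.
# Assumptions: the first line of the configuration file has the form
# "KEY = index_file", and a relative index path is resolved against the
# directory that holds the configuration file.
check_align_inputs() {
    conf="$1"
    conf_dir=$(dirname "$conf")
    # Keep everything after the first "=" on line 1, trimming outer whitespace.
    index=$(head -n 1 "$conf" | sed -e 's/^[^=]*=[[:space:]]*//' -e 's/[[:space:]]*$//')
    if [ -z "$index" ]; then
        echo "no index file named on the first line of $conf" >&2
        return 1
    fi
    case "$index" in
        /*) index_path="$index" ;;            # absolute path: use as-is
        *)  index_path="$conf_dir/$index" ;;  # relative path: next to the conf
    esac
    if [ ! -f "$index_path" ]; then
        echo "index file $index_path not found" >&2
        return 1
    fi
    # Print the resolved path so it can be inspected or reused.
    echo "$index_path"
}
```

For example, running "check_align_inputs test.conf" in {ROOT_DIR}/test/align should print the path of the index file if it is found. If the file is missing, the function prints an error and returns a non-zero status.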
===Running the alignment pipeline===
  {ROOT_DIR}/bin/ --conf test.conf --out_dir {OUT_DIR}
where {OUT_DIR} is the directory in which you wish to store the resulting BAM files (for example, ~/out).
If everything went well, you will see the following messages:
  Finished creating makefile {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile
  Finished creating makefile {OUT_DIR}/Makefiles/biopipe_Sample1.Makefile
  Run the following commands:
  make -f {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile > {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile.log
  make -f {OUT_DIR}/Makefiles/biopipe_Sample1.Makefile > {OUT_DIR}/Makefiles/biopipe_Sample1.Makefile.log
where {OUT_DIR} will be replaced with the directory you entered above.
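When many samples are aligned, it can help to collect the printed make commands instead of copying them by hand. The sketch below assumes you saved the pipeline's messages to a file, for example by piping the pipeline command through "tee setup.log". list_make_commands is a hypothetical helper, not part of GotCloud. It simply keeps the lines that start with "make -f".

```shell
# list_make_commands: hypothetical helper, not part of GotCloud.
# Reads the messages printed by the alignment pipeline (from a file, or from
# standard input if no file is given) and prints only the "make -f ..." lines,
# with leading whitespace removed.
list_make_commands() {
    if [ "$#" -gt 0 ]; then
        sed -n 's/^[[:space:]]*\(make -f .*\)$/\1/p' "$1"
    else
        sed -n 's/^[[:space:]]*\(make -f .*\)$/\1/p'
    fi
}
```

The extracted commands can then be reviewed and run one after another as a script:
  list_make_commands setup.log > run_alignment.sh
  sh run_alignment.sh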
====Running the Makefiles====
To run the Makefiles, enter the commands generated in the previous step one at a time. To run an alignment in the background, append "&" to the make command, as follows:
  make -f {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile > {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile.log &
The log files for the runs are written to the {OUT_DIR}/Makefiles directory, and the BAM files to the {OUT_DIR}/alignment.recal directory.
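The background form above can be extended to every generated Makefile. The sketch below starts one background make per *.Makefile in {OUT_DIR}/Makefiles and uses the same stdout redirection as the printed commands. It then waits for all runs so that you know when the BAM files are complete. run_all_makefiles and its MAKE override are hypothetical, not part of GotCloud.

```shell
# run_all_makefiles: hypothetical helper, not part of GotCloud.
# Starts "make -f" for every *.Makefile in OUT_DIR/Makefiles in the background,
# sending each run's standard output to <Makefile>.log as in the commands
# printed by the pipeline, then waits for all runs to finish.
# Set MAKE to use a different make program; it defaults to "make".
run_all_makefiles() {
    mk_dir="$1/Makefiles"
    pids=""
    for mf in "$mk_dir"/*.Makefile; do
        [ -f "$mf" ] || continue    # skip the literal pattern if nothing matched
        "${MAKE:-make}" -f "$mf" > "$mf.log" &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that background run.
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed Makefile run(s) failed"
    [ "$failed" -eq 0 ]
}
```

Usage, with {OUT_DIR} replaced by your output directory:
  run_all_makefiles {OUT_DIR}
This starts all samples at once. With many samples, you may prefer to run them in smaller batches so that the machine is not overloaded.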
==Analyzing a Sample==
Using umake, you can analyze the BAM files generated in the previous step and generate a VCF file. Once again, we can analyze the BAM files used in the automatic test. You will need three files for this: an index file, a configuration file, and a BED file.
===Index file===
First, you need a list of all the BAM files to be analyzed. Conveniently, a test index file (umake_test.index) already exists in {ROOT_DIR}/test/umake/. It contains the following information:
  NA12272 ALL    bams/NA12272.mapped.ILLUMINA.bwa.CEU.low_coverage.20101123.chrom20.20000001.20300000.bam
  NA12004 ALL    bams/NA12004.mapped.ILLUMINA.bwa.CEU.low_coverage.20101123.chrom20.20000001.20300000.bam
  NA12874 ALL    bams/NA12874.mapped.LS454.ssaha2.CEU.low_coverage.20101123.chrom20.20000001.20300000.bam
You can use this file directly if you change your current directory to {ROOT_DIR}/test/umake/.
The index refers to the BAM files by relative paths (bams/...). For this reason, if you want to copy the index file to a different directory and use it there, create a symbolic link to the bams folder in that directory:
  ln -s {ROOT_DIR}/test/umake/bams bams
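For your own data, an index in the same three-column layout can be generated from a directory of BAM files. This is only a sketch under stated assumptions. The sample ID is taken as the file name up to the first "." (as in the test file names), every sample is labelled "ALL" as in umake_test.index, and tab-separated columns are assumed to be acceptable to umake. make_umake_index is a hypothetical helper.

```shell
# make_umake_index: hypothetical helper, not part of umake.
# Prints one index line per BAM file in the given directory:
#   <sample ID> <population label> <BAM path>
# Assumptions: sample ID = file name up to the first "."; every sample is
# labelled "ALL"; columns are tab-separated.
make_umake_index() {
    bam_dir="$1"
    for bam in "$bam_dir"/*.bam; do
        [ -f "$bam" ] || continue    # no BAM files: skip the literal pattern
        sample=$(basename "$bam")
        sample=${sample%%.*}         # strip everything from the first "."
        printf '%s\t%s\t%s\n' "$sample" "ALL" "$bam"
    done
}
```

Passing a relative directory keeps relative paths in the index, matching the test file:
  make_umake_index bams > my.index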
===BED file===
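The BED file (umake_test.bed, referenced by UNIFORM_TARGET_BED in the configuration file) contains a single line giving the target region:
  chr20  20000050        20300000
You can copy this file to the current directory and use it as-is.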
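If you write your own BED file, a quick sanity check can catch malformed lines before umake runs. The sketch below checks three things on every data line: there are at least three columns (chromosome, start, end), start and end are integers, and start is less than end. Blank lines and "#", "track" or "browser" header lines are skipped. check_bed is a hypothetical helper, not part of umake.

```shell
# check_bed: hypothetical helper, not part of umake.
# Prints each malformed BED line and returns a non-zero status if any exist.
check_bed() {
    awk '
        # Skip blank lines and BED header or comment lines.
        NF == 0 || /^(#|track|browser)/ { next }
        # Flag lines without integer start/end columns or with start >= end.
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            print "bad BED line " NR ": " $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

BED columns are tab-delimited, so the single-line file above can be recreated and checked with:
  printf 'chr20\t20000050\t20300000\n' > umake_test.bed
  check_bed umake_test.bed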
===Configuration file===
A configuration file (umake_test.conf) already exists in {ROOT_DIR}/test/umake/.  It contains the following information:
  CHRS = 20
  TEST_ROOT = $(UMAKE_ROOT)/test/umake
  BAM_INDEX = $(TEST_ROOT)/umake_test.index
  OUT_PREFIX = umake_test
  REF = $(REF_ROOT)/karma.ref/human.g1k.v37.chr20.fa
  INDEL_PREFIX = $(REF_ROOT)/indels/1kg.pilot_release.merged.indels.sites.hg19
  DBSNP_PREFIX =  $(REF_ROOT)/dbSNP/dbsnp_135_b37.rod
  HM3_PREFIX =  $(REF_ROOT)/HapMap3/hapmap3_r3_b37_fwd.consensus.qc.poly
  RUN_INDEX = TRUE        # create BAM index file
  RUN_PILEUP = TRUE      # create GLF file from BAM
  RUN_GLFMULTIPLES = TRUE # create unfiltered SNP calls
  RUN_VCFPILEUP = TRUE    # create PVCF files using vcfPileup and run infoCollector
  RUN_FILTER = TRUE      # filter SNPs using vcfCooker
  RUN_SPLIT = TRUE        # split SNPs into chunks for genotype refinement
  WRITE_TARGET_LOCI = TRUE  # FOR TARGETED SEQUENCING ONLY -- Write loci file when performing pileup
  UNIFORM_TARGET_BED = $(TEST_ROOT)/umake_test.bed # Targeted sequencing : When all individuals has the same target. Otherwise, comment it out
  OFFSET_OFF_TARGET = 50 # Extend target by given # of bases
  MULTIPLE_TARGET_MAP =  # Target per individual : Each line contains [SM_ID] [TARGET_BED]
  TARGET_DIR = target    # Directory to store target information
  SAMTOOLS_VIEW_TARGET_ONLY = TRUE # When performing samtools view, exclude off-target regions (may make command line too long)
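As the trailing comments describe, each RUN_* switch in this file turns one step of the umake pipeline on or off. To see at a glance which steps a given configuration will run, a small sketch like the following can list the RUN_* switches set to TRUE. enabled_steps is a hypothetical helper. It assumes the "KEY = value  # comment" layout shown above and ignores other TRUE settings such as WRITE_TARGET_LOCI.

```shell
# enabled_steps: hypothetical helper, not part of umake.
# Prints the RUN_* switches that are set to TRUE in a configuration file
# written in the "KEY = value  # comment" style shown above.
enabled_steps() {
    # Drop comments first, then split each remaining line at "=".
    sed -e 's/#.*//' "$1" |
        awk -F '=' '$1 ~ /^[ \t]*RUN_[A-Z_]+[ \t]*$/ {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)
            gsub(/[ \t]/, "", val)
            if (val == "TRUE") print key
        }'
}
```

For the configuration above, "enabled_steps umake_test.conf" prints RUN_INDEX, RUN_PILEUP, RUN_GLFMULTIPLES, RUN_VCFPILEUP, RUN_FILTER and RUN_SPLIT, one per line.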
If you are running from a different directory, change the BAM_INDEX line as follows:
  BAM_INDEX = {CURRENT_DIR}/umake_test.index
where {CURRENT_DIR} is the absolute path to the directory that contains the index and BED files. You may also want to change the name of the output folder:
  OUT_PREFIX = {output_prefix}
where {output_prefix} is the name of the folder in which the output will be stored.
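The two changes above can also be scripted instead of made in an editor. The sketch below assumes the one "KEY = value" per line layout shown above and that the new value contains no "|", "&" or "\" characters. Any trailing comment on an edited line is dropped. set_conf_value is a hypothetical helper, not part of umake.

```shell
# set_conf_value: hypothetical helper, not part of umake.
# Replaces the value on the "KEY = value" line of a configuration file and
# leaves every other line unchanged. Any trailing "# comment" on the edited
# line is dropped. Assumes VALUE contains no "|", "&" or "\" characters,
# which are special in the sed replacement text below.
set_conf_value() {
    conf_key="$1"
    conf_value="$2"
    conf_file="$3"
    # Write to a temporary file and move it into place; this avoids "sed -i",
    # whose option syntax differs between GNU and BSD sed.
    conf_tmp="$conf_file.tmp.$$"
    sed "s|^\([[:space:]]*$conf_key[[:space:]]*=\).*|\1 $conf_value|" "$conf_file" > "$conf_tmp" &&
        mv "$conf_tmp" "$conf_file"
}
```

For example, after copying umake_test.conf into the current directory:
  set_conf_value BAM_INDEX "$PWD/umake_test.index" umake_test.conf
  set_conf_value OUT_PREFIX my_run umake_test.conf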

