Changes

3,754 bytes added ,  01:26, 8 January 2013
no edit summary
Line 19: Line 19:  
==Aligning a Sample==
 
==Aligning a Sample==
   −
As an example, we can analyze the sample files used in the automatic test.
+
As an example, we can align the sample files used in the automatic test.
    
To make this easier, change to the {ROOT_DIR}/test/align directory. (We will call the directory in which GotCloud is installed "{ROOT_DIR}".) It contains an index file and a configuration file that can be used directly.
 
To make this easier, change to the {ROOT_DIR}/test/align directory. (We will call the directory in which GotCloud is installed "{ROOT_DIR}".) It contains an index file and a configuration file that can be used directly.
Line 58: Line 58:  
  PLINK = $(REF_DIR)/hapmap_3.3.b37.chr20
 
  PLINK = $(REF_DIR)/hapmap_3.3.b37.chr20
   −
If you are in the test/align directory, you can use this file as-is. If you are using a different index file, make sure your index file is named correctly in the first line.
+
If you are in the {ROOT_DIR}/test/align directory, you can use this file as-is. If you are using a different index file, make sure your index file is named correctly in the first line. If you are not running this from {ROOT_DIR}/test/align, make sure your configuration and index files are in the same directory.
    
===Running the alignment pipeline===
 
===Running the alignment pipeline===
Line 70: Line 70:  
  {ROOT_DIR}/bin/gen_biopipeline.pl --conf test.conf --out_dir {OUT_DIR}
 
  {ROOT_DIR}/bin/gen_biopipeline.pl --conf test.conf --out_dir {OUT_DIR}
   −
where {OUT_DIR} is the directory in which you wish to store the resulting BAM files.
+
where {OUT_DIR} is the directory in which you wish to store the resulting BAM files (for example, ~/out).
    
If everything went well, you will see the following messages:
 
If everything went well, you will see the following messages:
   −
  Finished creating makefile {OUTDIR}/Makefiles/biopipe_Sample2.Makefile
+
  Finished creating makefile {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile
  Finished creating makefile {OUTDIR}/Makefiles/biopipe_Sample1.Makefile
+
  Finished creating makefile {OUT_DIR}/Makefiles/biopipe_Sample1.Makefile
 
  --------------------------------------------------------------------
 
  --------------------------------------------------------------------
 
  Run the following commands:
 
  Run the following commands:
 
   
 
   
  make -f {OUTDIR}/Makefiles/biopipe_Sample2.Makefile > {OUTDIR}/Makefiles/biopipe_Sample2.Makefile.log
+
  make -f {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile > {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile.log
  make -f {OUTDIR}/Makefiles/biopipe_Sample1.Makefile > {OUTDIR}/Makefiles/biopipe_Sample1.Makefile.log
+
  make -f {OUT_DIR}/Makefiles/biopipe_Sample1.Makefile > {OUT_DIR}/Makefiles/biopipe_Sample1.Makefile.log
   −
where {OUTDIR} will be replaced with the directory you entered above.
+
where {OUT_DIR} will be replaced with the directory you entered above.
    
====Running the Makefiles====
 
====Running the Makefiles====
   −
To run a Makefile, simply enter one-by-one the commands generated in the previous step. The log files for the runs will be found in the Makefiles directory, while the BAM files will be found in the {OUT_DIR}/alignment.recal directory.
+
To run a Makefile, simply enter one-by-one the commands generated in the previous step. If you wish to run the alignment in the background, add "&" after the make command, as follows:
 +
 
 +
make -f {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile > {OUT_DIR}/Makefiles/biopipe_Sample2.Makefile.log &
 +
 
 +
The log files for the runs will be found in the Makefiles directory, while the BAM files will be found in the {OUT_DIR}/alignment.recal directory.
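
When there are several samples, the background launch above can be wrapped in a small helper. This is only a sketch: the function name `run_all_makefiles` is hypothetical, the file pattern `biopipe_*.Makefile` is inferred from the two Makefile names printed by gen_biopipeline.pl, and redirecting standard error into the log (`2>&1`) is an addition that the generated commands do not include.

```shell
# Hypothetical helper: start every per-sample Makefile found in
# <out_dir>/Makefiles in the background, then wait for all of them.
# Returns non-zero if any of the make runs failed.
run_all_makefiles() {
    out_dir=$1
    pids=""
    for mk in "$out_dir"/Makefiles/biopipe_*.Makefile; do
        [ -e "$mk" ] || continue              # no Makefiles matched the pattern
        make -f "$mk" > "$mk.log" 2>&1 &      # same log name as the generated command
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1               # remember any failed run
    done
    return "$status"
}
```

After defining the function, `run_all_makefiles ~/out` would launch the runs for the example output directory and return once they have all finished.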
       
==Analyzing a Sample==
 
==Analyzing a Sample==
   −
Using umake, you can analyze the BAM files generated in the previous step and generate a VCF file.
+
Using umake, you can analyze the BAM files generated in the previous step and generate a VCF file.  Once again, we can analyze BAM files used in the automatic test.  You will need three files for this: index, configuration, and bed.
    
===Index file===
 
===Index file===
 +
 +
First, you need a list of all the BAM files to be analyzed. Conveniently, a test index file (umake_test.index) already exists in {ROOT_DIR}/test/umake/.  It contains the following information:
 +
 +
NA12272 ALL    bams/NA12272.mapped.ILLUMINA.bwa.CEU.low_coverage.20101123.chrom20.20000001.20300000.bam
 +
NA12004 ALL    bams/NA12004.mapped.ILLUMINA.bwa.CEU.low_coverage.20101123.chrom20.20000001.20300000.bam
 +
...
 +
NA12874 ALL    bams/NA12874.mapped.LS454.ssaha2.CEU.low_coverage.20101123.chrom20.20000001.20300000.bam
 +
 +
You can use this file directly if you change your current directory to {ROOT_DIR}/test/umake/.
 +
 +
Alternatively, if you want to copy this index file to a different directory and use it there, you can create a symbolic link to the bams folder as follows:
 +
 +
ln -s {ROOT_DIR}/test/umake/bams bams
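
Because the index lists the BAM files by relative path, it can be worth confirming that every path resolves from the directory you intend to run in. The helper below is a sketch, not part of GotCloud: the name `check_index_bams` is hypothetical, and it assumes the third whitespace-separated column of each index line is the BAM path, as the listing above suggests.

```shell
# Hypothetical helper: report index entries whose BAM path (assumed to be
# the third column) does not exist relative to the current directory.
check_index_bams() {
    index_file=$1
    missing=0
    while read -r sample group bam rest; do
        [ -n "$bam" ] || continue             # skip blank or short lines
        if [ ! -e "$bam" ]; then
            echo "missing: $bam (sample $sample)" >&2
            missing=$((missing + 1))
        fi
    done < "$index_file"
    [ "$missing" -eq 0 ]                      # succeed only if nothing is missing
}
```

Running `check_index_bams umake_test.index` after creating the symbolic link should then succeed without printing anything.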
 +
 +
===BED file===
 +
 +
This file contains a single line:
 +
 +
chr20  20000050        20300000
 +
 +
You can copy this to the current directory and use it as-is.
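
If you would rather recreate the file than copy it, the following sketch writes the same single line and performs a basic sanity check. The file name umake_test.bed is taken from the UNIFORM_TARGET_BED setting in the test configuration. Using tab separators and the awk check are assumptions added here for illustration.

```shell
# Write the one-line BED file shown above into the current directory
# (tab-separated fields are an assumption).
bed_file=umake_test.bed
printf 'chr20\t20000050\t20300000\n' > "$bed_file"

# Sanity check: every line has three fields and start < end.
awk 'NF != 3 || $2 >= $3 { bad = 1 } END { exit bad }' "$bed_file" \
    && echo "BED looks well-formed"
```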
 +
 +
===Configuration file===
 +
 +
A configuration file (umake_test.conf) already exists in {ROOT_DIR}/test/umake/.  It contains the following information:
 +
 +
CHRS = 20
 +
TEST_ROOT = $(UMAKE_ROOT)/test/umake
 +
BAM_INDEX = $(TEST_ROOT)/umake_test.index
 +
OUT_PREFIX = umake_test
 +
REF_ROOT = $(TEST_ROOT)/ref
 +
#
 +
REF = $(REF_ROOT)/karma.ref/human.g1k.v37.chr20.fa
 +
INDEL_PREFIX = $(REF_ROOT)/indels/1kg.pilot_release.merged.indels.sites.hg19
 +
DBSNP_PREFIX =  $(REF_ROOT)/dbSNP/dbsnp_135_b37.rod
 +
HM3_PREFIX =  $(REF_ROOT)/HapMap3/hapmap3_r3_b37_fwd.consensus.qc.poly
 +
#
 +
RUN_INDEX = TRUE        # create BAM index file
 +
RUN_PILEUP = TRUE      # create GLF file from BAM
 +
RUN_GLFMULTIPLES = TRUE # create unfiltered SNP calls
 +
RUN_VCFPILEUP = TRUE    # create PVCF files using vcfPileup and run infoCollector
 +
RUN_FILTER = TRUE      # filter SNPs using vcfCooker
 +
RUN_SPLIT = TRUE        # split SNPs into chunks for genotype refinement
 +
RUN_BEAGLE = FALSE  # BEAGLE - MUST SET AFTER FINISHING PREVIOUS STEPS
 +
RUN_SUBSET = FALSE  # SUBSET FOR THUNDER - MAY BE SET WITH BEAGLE STEP TOGETHER
 +
RUN_THUNDER = FALSE # THUNDER - MUST SET AFTER FINISHING PREVIOUS STEPS
 +
###############################################################################
 +
WRITE_TARGET_LOCI = TRUE  # FOR TARGETED SEQUENCING ONLY -- Write loci file when performing pileup
 +
UNIFORM_TARGET_BED = $(TEST_ROOT)/umake_test.bed # Targeted sequencing : When all individuals has the same target. Otherwise, comment it out
 +
OFFSET_OFF_TARGET = 50 # Extend target by given # of bases
 +
MULTIPLE_TARGET_MAP =  # Target per individual : Each line contains [SM_ID] [TARGET_BED]
 +
TARGET_DIR = target    # Directory to store target information
 +
SAMTOOLS_VIEW_TARGET_ONLY = TRUE # When performing samtools view, exclude off-target regions (may make command line too long)
 +
 +
If you are running this from a different directory, you will want to change some of the lines as follows:
 +
 +
BAM_INDEX = {CURRENT_DIR}/umake_test.index
 +
UNIFORM_TARGET_BED = {CURRENT_DIR}/umake_test.bed
 +
 +
where {CURRENT_DIR} is the absolute path to the directory that contains the index and bed files.  You may also want to change the name of the output folder:
 +
 +
OUT_PREFIX = {output_prefix}
 +
 +
where {output_prefix} is the name of the folder in which you want the output to be stored.
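
The three edits described above can also be applied to a copy of the configuration file with sed. This is a minimal sketch under stated assumptions: the function name `localize_umake_conf` is hypothetical, each setting is assumed to start at the beginning of its line, and any trailing comment on a rewritten line is dropped. Paths containing `|` or `&` would need escaping first.

```shell
# Hypothetical helper: copy a umake configuration file, pointing BAM_INDEX
# and UNIFORM_TARGET_BED at <cur_dir> and setting OUT_PREFIX to <prefix>.
# usage: localize_umake_conf <src.conf> <dest.conf> <cur_dir> <prefix>
localize_umake_conf() {
    src=$1
    dest=$2
    cur_dir=$3
    prefix=$4
    sed -e "s|^BAM_INDEX *=.*|BAM_INDEX = $cur_dir/umake_test.index|" \
        -e "s|^UNIFORM_TARGET_BED *=.*|UNIFORM_TARGET_BED = $cur_dir/umake_test.bed|" \
        -e "s|^OUT_PREFIX *=.*|OUT_PREFIX = $prefix|" \
        "$src" > "$dest"
}
```

For example, `localize_umake_conf umake_test.conf my.conf "$PWD" my_output` would write my.conf with the index and bed paths pointing at the current directory. All other lines are copied unchanged.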