Changes

From Genome Analysis Wiki
Line 25: Line 25:  
In order to run this tutorial, you need to make sure you have GotCloud installed on your system.
   −
Try to run the following commands sequentially to follow the instruction
+
Run the following commands sequentially to follow the instructions.
 +
 
 +
First, let's set an environment variable for your convenience.
    
  % export GC=/home/presenter02/day2/session1/uwcmg_2013_08/
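Before going further, it can help to confirm that the variable really points at the tutorial files. The following is a minimal sketch, not part of GotCloud: `check_gc` is a hypothetical helper name, and it only assumes that the tutorial root contains an `examples` directory, as the commands in this tutorial suggest.

```shell
# Hypothetical helper: confirm that a tutorial root (default: $GC)
# is set and contains an examples/ directory.
check_gc() {
    gc_root="${1:-${GC:-}}"
    if [ -z "$gc_root" ]; then
        echo "GC is not set" >&2
        return 1
    fi
    if [ ! -d "$gc_root/examples" ]; then
        echo "no examples directory under $gc_root" >&2
        return 1
    fi
    echo "ok: $gc_root/examples"
}

# Usage (after the export above):
#   check_gc
```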
 +
 +
The raw sequence reads can be listed with
    
  % ls ${GC}/examples/fastq
Line 33: Line 37:  
  SRR035022_2.fastq.gz  SRR035023_2.fastq.gz  SRR035024_2.fastq.gz  SRR035025_2.fastq.gz  SRR035026_2.fastq.gz  SRR035027_2.fastq.gz  SRR035669_2.fastq.gz  SRR622461_2.fastq.gz
  SRR035022.fastq.gz    SRR035023.fastq.gz    SRR035024.fastq.gz    SRR035025.fastq.gz    SRR035026.fastq.gz    SRR035027.fastq.gz    SRR035669.fastq.gz    SRR622461.fastq.gz
 +
 +
These FASTQ files are extracted from two samples from the 1000 Genomes Project, focusing on a 500 kb region around CFTR.
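If you want a quick sense of how much data each file holds, you can count its reads. This is a small sketch rather than a GotCloud feature: `count_fastq_reads` is a hypothetical helper, and it assumes the usual FASTQ layout of exactly four lines per read.

```shell
# Count reads in a gzip-compressed FASTQ file, assuming the standard
# four-lines-per-record layout (header, sequence, '+', qualities).
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Usage:
#   count_fastq_reads "${GC}/examples/fastq/SRR622461_1.fastq.gz"
```

A result that is not a whole number would suggest a truncated file or wrapped sequence lines.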
 +
 +
The examples directory contains other sets of files, such as the reference sequence, index files, and BAM files (for variant calling).
    
  % ls ${GC}/examples
  bams  chr7Ref  fastq  index
 +
 +
To run the alignment pipeline, we need an index file for the FASTQ files.
    
  % cat ${GC}/examples/index/chr7.CFTR.fastq.index
Line 55: Line 65:  
  NA12878 fastq/SRR622461.fastq.gz . SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA
  NA12878 fastq/SRR622461_1.fastq.gz fastq/SRR622461_2.fastq.gz SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA
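The rows above suggest that each index line names a sample, one or two FASTQ files, and read-group details, with `.` standing in for a missing mate file. Under that reading (an assumption drawn from the rows shown here, not a full description of the index format), a short sketch to summarize which rows are single-end and which are paired-end might look like this; `summarize_fastq_index` is a hypothetical helper name.

```shell
# Summarize a GotCloud-style FASTQ index. Assumes whitespace-separated
# columns: sample name in column 1, FASTQ files in columns 2 and 3,
# and "." in column 3 when there is no mate file.
summarize_fastq_index() {
    awk '
        # Skip any row (such as a header) whose second column is not a FASTQ path.
        $2 !~ /\.(fastq|fq)(\.gz)?$/ { next }
        $3 == "." { print $1, "single", $2; next }
        { print $1, "paired", $2, $3 }
    ' "$1"
}

# Usage:
#   summarize_fastq_index "${GC}/examples/index/chr7.CFTR.fastq.index"
```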
 +
 +
We also need a configuration file for GotCloud (the default configuration usually works if you install GotCloud on your own, but here we are using reduced examples).
    
  % cat ${GC}/examples/index/chr7.CFTR.align.conf
Line 65: Line 77:  
  DBSNP_VCF =  $(REF_DIR)/dbsnp_135.b37.chr7.CFTR.vcf.gz
  HM3_VCF = $(REF_DIR)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz
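The configuration lines shown are simple `KEY = VALUE` pairs. To look up a single setting without reading the whole file, something like the sketch below may be handy. `conf_value` is a hypothetical helper; it assumes one setting per line and that lines starting with `#` are comments, and it does not expand references such as `$(REF_DIR)`.

```shell
# Print the value assigned to a key in a KEY = VALUE style config file.
# Assumptions: one setting per line, '#' starts a comment line, and the
# last definition of a key wins. References like $(REF_DIR) are printed as-is.
conf_value() {
    # $1 = key, $2 = config file
    awk -v key="$1" '
        /^[ \t]*#/ { next }
        {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim whitespace around the value
            if (k == key) val = v
        }
        END { if (val != "") print val }
    ' "$2"
}

# Usage:
#   conf_value HM3_VCF "${GC}/examples/index/chr7.CFTR.align.conf"
```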
 +
 +
Create a test directory and enter it to run the alignment pipeline.
    
  % mkdir test
Line 86: Line 100:  
  Processing finished in 42 secs with no errors reported
   −
Examine the QC metrics by
+
Let's see where the BAM files were generated
    
  % ls align/bams
  NA06984.recal.bam      NA06984.recal.bam.bai.done  NA06984.recal.bam.metrics  NA12878.recal.bam      NA12878.recal.bam.bai.done  NA12878.recal.bam.metrics
  NA06984.recal.bam.bai  NA06984.recal.bam.done      NA06984.recal.bam.qemp    NA12878.recal.bam.bai  NA12878.recal.bam.done      NA12878.recal.bam.qemp
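In the listing, each `.recal.bam` file sits next to a `.bai` index and a `.done` file. Assuming the `.done` files mark finished steps (an inference from the file names, not something this tutorial states), a rough completeness check could be sketched as follows; `check_bam_outputs` is a hypothetical helper.

```shell
# For every *.recal.bam in a directory, report a missing .bai index or
# .done marker. Treating .done as a "step finished" marker is an assumption.
check_bam_outputs() {
    # $1 = directory holding the BAM files, e.g. align/bams
    bam_status=0
    for bam in "$1"/*.recal.bam; do
        # If the glob matched nothing, the pattern itself is returned.
        [ -e "$bam" ] || { echo "no BAM files in $1" >&2; return 1; }
        for suffix in .bai .done; do
            if [ ! -e "$bam$suffix" ]; then
                echo "missing: $bam$suffix"
                bam_status=1
            fi
        done
    done
    return $bam_status
}

# Usage:
#   check_bam_outputs align/bams && echo "all BAM outputs present"
```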
+
 
 +
Examine the QC metrics by listing the QC files
 +
 
 
  % ls align/QCFiles
  NA06984.genoCheck.depthRG  NA06984.genoCheck.selfRG  NA06984.qplot.R            NA12878.genoCheck.depthSM  NA12878.genoCheck.selfSM  NA12878.qplot.stats
  NA06984.genoCheck.depthSM  NA06984.genoCheck.selfSM  NA06984.qplot.stats        NA12878.genoCheck.done    NA12878.qplot.done
  NA06984.genoCheck.done    NA06984.qplot.done        NA12878.genoCheck.depthRG  NA12878.genoCheck.selfRG  NA12878.qplot.R
 +
 +
The text version of the QC metrics looks as follows
    
  % cat align/QCFiles/NA06984.qplot.stats
Line 124: Line 142:  
  BaseComp_T(%) 31.6
  BaseComp_O(%) 0.0
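To pull one metric out of the stats file, for example to compare samples, a one-line `awk` filter is usually enough. The sketch below uses a hypothetical helper, `qplot_metric`, and assumes each metric line has the name in the first whitespace-separated column and the value in the second, as in the lines shown above.

```shell
# Print the value of one metric from a qplot .stats file.
# Assumes "NAME VALUE" lines where NAME contains no whitespace.
qplot_metric() {
    # $1 = metric name, $2 = stats file
    awk -v name="$1" '$1 == name { print $2 }' "$2"
}

# Usage:
#   qplot_metric 'BaseComp_T(%)' align/QCFiles/NA06984.qplot.stats
```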
 +
 +
You can also run the R command to create PDF files (see lecture slides for examples).
 +
 +
To check whether the samples are contaminated, look at the verifyBamID results.
    
  % cat align/QCFiles/NA06984.genoCheck.selfSM
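When there are many samples, reading each `.selfSM` file by eye gets tedious. The sketch below prints the sample ID and contamination estimate from one file and flags values above a cutoff. It assumes the file starts with a header row that includes a column named `FREEMIX`, as verifyBamID output normally does. `freemix_report` is a hypothetical helper, and the default cutoff of 0.02 is only an illustrative choice, not a recommendation from this tutorial.

```shell
# Report the FREEMIX estimate for each row of a verifyBamID .selfSM file.
# Assumes a header row naming a FREEMIX column; the 0.02 default
# threshold is illustrative only.
freemix_report() {
    # $1 = selfSM file, $2 = optional threshold
    awk -v thr="${2:-0.02}" '
        NR == 1 {
            # Locate the FREEMIX column by name rather than by position.
            for (i = 1; i <= NF; i++) if ($i == "FREEMIX") col = i
            if (!col) { print "no FREEMIX column found" > "/dev/stderr"; exit 1 }
            next
        }
        { print $1, $col, (($col + 0) > (thr + 0) ? "CHECK" : "ok") }
    ' "$1"
}

# Usage:
#   freemix_report align/QCFiles/NA06984.genoCheck.selfSM
```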
