Line 25: |
Line 25: |
| In order to run this tutorial, you need to make sure you have GotCloud installed on your system. | | In order to run this tutorial, you need to make sure you have GotCloud installed on your system. |
| | | |
− | Try to run the following commands sequentially to follow the instruction | + | Try to run the following commands sequentially to follow the instruction. |
| + | |
| + | First, let's set an environment variable for your convenience. |
| | | |
| % export GC=/home/presenter02/day2/session1/uwcmg_2013_08/ | | % export GC=/home/presenter02/day2/session1/uwcmg_2013_08/ |
| + | |
| + | The raw sequence reads can be listed with: |
| | | |
| % ls ${GC}/examples/fastq | | % ls ${GC}/examples/fastq |
Line 33: |
Line 37: |
| SRR035022_2.fastq.gz SRR035023_2.fastq.gz SRR035024_2.fastq.gz SRR035025_2.fastq.gz SRR035026_2.fastq.gz SRR035027_2.fastq.gz SRR035669_2.fastq.gz SRR622461_2.fastq.gz | | SRR035022_2.fastq.gz SRR035023_2.fastq.gz SRR035024_2.fastq.gz SRR035025_2.fastq.gz SRR035026_2.fastq.gz SRR035027_2.fastq.gz SRR035669_2.fastq.gz SRR622461_2.fastq.gz |
| SRR035022.fastq.gz SRR035023.fastq.gz SRR035024.fastq.gz SRR035025.fastq.gz SRR035026.fastq.gz SRR035027.fastq.gz SRR035669.fastq.gz SRR622461.fastq.gz | | SRR035022.fastq.gz SRR035023.fastq.gz SRR035024.fastq.gz SRR035025.fastq.gz SRR035026.fastq.gz SRR035027.fastq.gz SRR035669.fastq.gz SRR622461.fastq.gz |
| + | |
| + | These FASTQ files were extracted from two samples from the 1000 Genomes Project, focusing on a 500 kb region around CFTR. |
| + | |
| + | The examples directory contains other sets of files, such as the reference sequence, index files, and BAM files (for variant calling). |
| | | |
| % ls ${GC}/examples | | % ls ${GC}/examples |
| bams chr7Ref fastq index | | bams chr7Ref fastq index |
| + | |
| + | To run the alignment pipeline, we need an index file that lists the FASTQ files. |
| | | |
| % cat ${GC}/examples/index/chr7.CFTR.fastq.index | | % cat ${GC}/examples/index/chr7.CFTR.fastq.index |
Line 55: |
Line 65: |
| NA12878 fastq/SRR622461.fastq.gz . SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA | | NA12878 fastq/SRR622461.fastq.gz . SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA |
| NA12878 fastq/SRR622461_1.fastq.gz fastq/SRR622461_2.fastq.gz SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA | | NA12878 fastq/SRR622461_1.fastq.gz fastq/SRR622461_2.fastq.gz SRR622461 NA12878 Illumina_NA12878 ILLUMINA ILLUMINA |
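Before starting the alignment pipeline, it can be worth confirming that every FASTQ path named in the index actually exists. The following is a minimal sketch, not part of GotCloud: the function name `check_fastq_index` is hypothetical, and it assumes whitespace-separated columns with the FASTQ paths in columns 2 and 3 (as in the rows shown above), `.` marking an absent mate file, paths relative to a base directory you supply, and an optional header row whose second column is literally `FASTQ1`.

```shell
#!/bin/sh
# check_fastq_index INDEX_FILE BASE_DIR
# Hypothetical helper (not part of GotCloud): prints every FASTQ path listed
# in INDEX_FILE that does not exist under BASE_DIR and returns non-zero if
# at least one path is missing.
check_fastq_index() {
    index_file=$1
    base_dir=$2
    missing=0
    # Assumption: column 2 and column 3 hold FASTQ paths; "." means no mate.
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while read -r merge fq1 fq2 rest || [ -n "$merge" ]; do
        # Skip blank lines and the assumed header row.
        [ -z "$fq1" ] && continue
        [ "$fq1" = "FASTQ1" ] && continue
        for fq in "$fq1" "$fq2"; do
            [ -z "$fq" ] && continue
            [ "$fq" = "." ] && continue
            if [ ! -e "$base_dir/$fq" ]; then
                echo "missing: $base_dir/$fq"
                missing=1
            fi
        done
    done < "$index_file"
    return "$missing"
}
```

In this tutorial it could be called as `check_fastq_index ${GC}/examples/index/chr7.CFTR.fastq.index ${GC}/examples`, assuming the paths in the index are relative to the examples directory.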
| + | |
| + | We also need a configuration file for GotCloud (the default configuration usually works if you install GotCloud yourself, but here we are using reduced examples). |
| | | |
| % cat ${GC}/examples/index/chr7.CFTR.align.conf | | % cat ${GC}/examples/index/chr7.CFTR.align.conf |
Line 65: |
Line 77: |
| DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.chr7.CFTR.vcf.gz | | DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.chr7.CFTR.vcf.gz |
| HM3_VCF = $(REF_DIR)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz | | HM3_VCF = $(REF_DIR)/hapmap_3.3.b37.sites.chr7.CFTR.vcf.gz |
| + | |
| + | Create a test directory and enter it to run the alignment pipeline. |
| | | |
| % mkdir test | | % mkdir test |
Line 86: |
Line 100: |
| Processing finished in 42 secs with no errors reported | | Processing finished in 42 secs with no errors reported |
| | | |
− | Examine the QC metrics by
| + | Let's see where the BAM files were generated. |
| | | |
| % ls align/bams | | % ls align/bams |
| NA06984.recal.bam NA06984.recal.bam.bai.done NA06984.recal.bam.metrics NA12878.recal.bam NA12878.recal.bam.bai.done NA12878.recal.bam.metrics | | NA06984.recal.bam NA06984.recal.bam.bai.done NA06984.recal.bam.metrics NA12878.recal.bam NA12878.recal.bam.bai.done NA12878.recal.bam.metrics |
| NA06984.recal.bam.bai NA06984.recal.bam.done NA06984.recal.bam.qemp NA12878.recal.bam.bai NA12878.recal.bam.done NA12878.recal.bam.qemp | | NA06984.recal.bam.bai NA06984.recal.bam.done NA06984.recal.bam.qemp NA12878.recal.bam.bai NA12878.recal.bam.done NA12878.recal.bam.qemp |
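The listing above suggests that each finished sample has a `.recal.bam` file accompanied by `.bai` and `.done` files. A minimal sketch of a completeness check follows; the function name `check_bam_outputs` is hypothetical, and treating the `.bai` and `.done` files as signs of a finished sample is an assumption based only on the file names shown here.

```shell
#!/bin/sh
# check_bam_outputs BAM_DIR
# Hypothetical helper (not part of GotCloud): for every *.recal.bam in
# BAM_DIR, report a missing index (.bai) or completion marker (.done).
# Returns non-zero if anything is missing or no BAM file is found.
check_bam_outputs() {
    bam_dir=$1
    incomplete=0
    for bam in "$bam_dir"/*.recal.bam; do
        # When the glob matches nothing, the pattern stays unexpanded.
        if [ ! -e "$bam" ]; then
            echo "no .recal.bam files in $bam_dir"
            return 1
        fi
        # Assumption: these two suffixes indicate a finished sample.
        for suffix in "bai" "done"; do
            if [ ! -e "$bam.$suffix" ]; then
                echo "incomplete: $bam.$suffix not found"
                incomplete=1
            fi
        done
    done
    return "$incomplete"
}
```

From the test directory this could be run as `check_bam_outputs align/bams`.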
− |
| + | |
| + | Examine the QC metrics by listing the QC files: |
| + | |
| % ls align/QCFiles | | % ls align/QCFiles |
| NA06984.genoCheck.depthRG NA06984.genoCheck.selfRG NA06984.qplot.R NA12878.genoCheck.depthSM NA12878.genoCheck.selfSM NA12878.qplot.stats | | NA06984.genoCheck.depthRG NA06984.genoCheck.selfRG NA06984.qplot.R NA12878.genoCheck.depthSM NA12878.genoCheck.selfSM NA12878.qplot.stats |
| NA06984.genoCheck.depthSM NA06984.genoCheck.selfSM NA06984.qplot.stats NA12878.genoCheck.done NA12878.qplot.done | | NA06984.genoCheck.depthSM NA06984.genoCheck.selfSM NA06984.qplot.stats NA12878.genoCheck.done NA12878.qplot.done |
| NA06984.genoCheck.done NA06984.qplot.done NA12878.genoCheck.depthRG NA12878.genoCheck.selfRG NA12878.qplot.R | | NA06984.genoCheck.done NA06984.qplot.done NA12878.genoCheck.depthRG NA12878.genoCheck.selfRG NA12878.qplot.R |
| + | |
| + | The text version of the QC metrics looks as follows: |
| | | |
| % cat align/QCFiles/NA06984.qplot.stats | | % cat align/QCFiles/NA06984.qplot.stats |
Line 124: |
Line 142: |
| BaseComp_T(%) 31.6 | | BaseComp_T(%) 31.6 |
| BaseComp_O(%) 0.0 | | BaseComp_O(%) 0.0 |
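To pull a single value out of the stats file rather than reading the whole thing, a small `awk` wrapper may be enough. This is a sketch under assumptions: the function name `qplot_metric` is hypothetical, and it assumes each row has the metric name in the first whitespace-separated field and its value in the second, as in the `BaseComp_T(%)` row above.

```shell
#!/bin/sh
# qplot_metric STATS_FILE METRIC_NAME
# Hypothetical helper (not part of GotCloud or qplot): prints the value in
# the second field of the first row whose first field equals METRIC_NAME.
# Returns non-zero if no such row exists.
qplot_metric() {
    stats_file=$1
    metric=$2
    awk -v key="$metric" '
        $1 == key { print $2; found = 1; exit }
        END       { exit !found }
    ' "$stats_file"
}
```

For example, `qplot_metric align/QCFiles/NA06984.qplot.stats 'BaseComp_T(%)'` would print the value from the row shown above.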
| + | |
| + | You can also run the R command to create PDF files (see the lecture slides for examples). |
| + | |
| + | To check whether the samples are contaminated, look at the verifyBamID results. |
| | | |
| % cat align/QCFiles/NA06984.genoCheck.selfSM | | % cat align/QCFiles/NA06984.genoCheck.selfSM |