Line 4: |
Line 4: |
| | | |
| In the following sections, we will guide you through: [[#Where to Find It |how to obtain qplot]], [[#Usage |how to use qplot]], [[#Built-in example |example outputs]], [[#anchorOfInteractiveQplot |interactive diagnostic plots]], and [[#Diagnose sequencing quality |real applications]] in which qplot has helped identify sequencing problems. | | In the following sections, we will guide you through: [[#Where to Find It |how to obtain qplot]], [[#Usage |how to use qplot]], [[#Built-in example |example outputs]], [[#anchorOfInteractiveQplot |interactive diagnostic plots]], and [[#Diagnose sequencing quality |real applications]] in which qplot has helped identify sequencing problems. |
| + | |
| + | = Citing QPLOT = |
| + | |
| + | If you find QPLOT useful and want to cite it in your paper, please use the reference below. |
| + | |
| + | * Bingshan Li, Xiaowei Zhan, Mary-Kate Wing, Paul Anderson, Hyun Min Kang, and Goncalo R. Abecasis, “QPLOT: A Quality Assessment Tool for Next Generation Sequencing Data,” BioMed Research International, vol. 2013, Article ID 865181, 4 pages, 2013. doi:10.1155/2013/865181 http://www.hindawi.com/journals/bmri/2013/865181/ |
| | | |
| = Where to Find It = | | = Where to Find It = |
Line 15: |
Line 21: |
| == Binary Download == | | == Binary Download == |
| | | |
− | We have prepared a qplot binary, pre-compiled under Ubuntu, along with the source code. You can download it from: [http://www.sph.umich.edu/csg/zhanxw/software/qplot/qplot.20120602.tar.gz qplot.20120602.tar.gz (File Size: 1.7G)] | + | We have prepared a qplot binary, pre-compiled under Ubuntu, along with the source code. You can download it from: [http://csg.sph.umich.edu//zhanxw/software/qplot/qplot.20130627.tar.gz qplot.20130627.tar.gz (File Size: 1.7G)] |
| | | |
| The executable file is under qplot/bin/qplot. | | The executable file is under qplot/bin/qplot. |
Line 25: |
Line 31: |
| == Source Code Distribution == | | == Source Code Distribution == |
| | | |
− | A source-code-only download is available at [http://www.sph.umich.edu/csg/zhanxw/software/qplot/qplot-source.20120602.tar.gz qplot-source.20120602.tar.gz]. Optionally, you can also download the example files and/or the data files: | + | A source-code-only download is available at [http://csg.sph.umich.edu//zhanxw/software/qplot/qplot-source.20130627.tar.gz qplot-source.20130627.tar.gz]. Optionally, you can also download the example files and/or the data files: |
| | | |
− | [http://www.sph.umich.edu/csg/zhanxw/software/qplot/qplot-example.tar.gz example]: example input file, and the expected outputs if you follow the [[#Built-in example | directions]]. | + | [http://csg.sph.umich.edu//zhanxw/software/qplot/qplot-example.tar.gz example]: example input file, and the expected outputs if you follow the [[#Built-in example | directions]]. |
| | | |
− | [http://www.sph.umich.edu/csg/zhanxw/software/qplot/qplot-data.tar.gz resources data]: necessary input files for qplot, including NCBI human genome build v37, dbSNP 130, and a pre-computed GC file with window size 100. | + | [http://csg.sph.umich.edu//zhanxw/software/qplot/qplot-data.tar.gz resources data]: necessary input files for qplot, including NCBI human genome build v37, dbSNP 130, and a pre-computed GC file with window size 100. |
| | | |
| Put the above file(s) in the same folder and follow these steps: | | Put the above file(s) in the same folder and follow these steps: |
| | | |
| * 1. Unarchive downloaded file | | * 1. Unarchive downloaded file |
− | tar zvxf qplot-source.20120602.tar.gz | + | tar zvxf qplot-source.20130627.tar.gz |
| | | |
| A new folder ''qplot'' will be created. | | A new folder ''qplot'' will be created. |
Line 40: |
Line 46: |
| * 2. Build libStatGen | | * 2. Build libStatGen |
| cd qplot | | cd qplot |
− | make libStatGen | + | (cd ../libStatGen; make cloneLib) |
| | | |
| This step will download a necessary software library [http://genome.sph.umich.edu/wiki/C%2B%2B_Library:_libStatGen libStatGen] and compile source code into a binary code library. | | This step will download a necessary software library [http://genome.sph.umich.edu/wiki/C%2B%2B_Library:_libStatGen libStatGen] and compile source code into a binary code library. |
| | | |
| * 3. Build qplot | | * 3. Build qplot |
− | make all | + | make |
| | | |
| This step will then build qplot. Upon success, the executable qplot can be found under qplot/bin/. | | This step will then build qplot. Upon success, the executable qplot can be found under qplot/bin/. |
Line 71: |
Line 77: |
| | | |
| some_linux_host > qplot/bin/qplot | | some_linux_host > qplot/bin/qplot |
− | | + | The following parameters are available. Ones with "[]" are in effect: |
− | References : --reference [/net/fantasia/home/zhanxw/software/qplot/data/human.g1k.v37.fa],
| + | |
− | --dbsnp [/net/fantasia/home/zhanxw/software/qplot/data/dbSNP130.UCSC.coordinates.tbl],
| + | |
− | --gccontent [/net/fantasia/home/zhanxw/software/qplot/data/human.g1k.w100.gc]
| + | |
− | Create gcContent file : --create_gc [], --winsize [100]
| + | References : --reference [/net/fantasia/home/zhanxw/software/qplot/data/human.g1k.v37.fa], |
− | Region list : --regions [], --invertRegion
| + | --dbsnp [/net/fantasia/home/zhanxw/software/qplot/data/dbSNP130.UCSC.coordinates.tbl] |
− | Flag filters : --read1_skip, --read2_skip, --paired_skip,
| + | GC content file options : --winsize [100] |
− | --unpaired_skip
| + | Region list : --regions [], --invertRegion |
− | Dup and QCFail : --dup_keep, --qcfail_keep
| + | Flag filters : --read1_skip, --read2_skip, --paired_skip, |
− | Mapping filters : --minMapQuality [0.00]
| + | --unpaired_skip |
− | Records to process : --first_n_record [-1]
| + | Dup and QCFail : --dup_keep, --qcfail_keep |
− | Lanes to process : --lanes []
| + | Mapping filters : --minMapQuality [0.00] |
− | Read group to process : --readGroup []
| + | Records to process : --first_n_record [-1] |
− | Input file options : --noeof
| + | Lanes to process : --lanes [] |
− | Output files : --plot [], --stats [], --Rcode [], --xml []
| + | Read group to process : --readGroup [] |
− | Plot labels : --label [], --bamLabel []
| + | Input file options : --noeof |
| + | Output files : --plot [], --stats [], --Rcode [], --xml [] |
| + | Plot labels : --label [], --bamLabel [] |
| + | Obsoleted (DO NOT USE) : --gccontent [], --create_gc |
| | | |
| == Input files == | | == Input files == |
Line 102: |
Line 111: |
| This file has two columns. The first column is the chromosome name, which must be consistent with the reference created above. The second column is the 1-based SNP position. If you want to create your own dbSNP data from a downloaded UCSC dbSNP file, one way to do it is: <code>cat dbsnp_129_b36.rod|grep "single" | awk '$4-$3==1' |cut -f2,4 > dbSNP_129_b36.tbl</code> | | This file has two columns. The first column is the chromosome name, which must be consistent with the reference created above. The second column is the 1-based SNP position. If you want to create your own dbSNP data from a downloaded UCSC dbSNP file, one way to do it is: <code>cat dbsnp_129_b36.rod|grep "single" | awk '$4-$3==1' |cut -f2,4 > dbSNP_129_b36.tbl</code> |
| | | |
− | * <code>--gccontent</code> | + | * <code> **OBSOLETED** --gccontent, --create_gc </code> |
| | | |
− | Although GC content can be calculated on the fly each time, it is much more efficient to load a precomputed GC content from a file. To generate the file, use the following command: | + | Although GC content can be calculated on the fly each time, it is much more efficient to load a precomputed GC content from a file. |
− | qplot --rerefence reference.fa --windowsize winsize --create_gc reference.gc
| + | GC content file name is automatically determined in this format: <reference_genome_base_file_name>.winsize<gc_content_window_size>.gc. |
| + | For example, if your reference genome is human.g1k.v37.fa and the window size is 100, then the GC content file name is: human.g1k.v37.winsize100.gc . |
| + | |
| + | As stated above, there is no need to use --gccontent to specify the GC content file in each run. |
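The naming rule above can be sketched in shell. This is only an illustration inferred from the single example given (human.g1k.v37.fa with window size 100): the helper name <code>gc_file_name</code> is hypothetical, and it assumes the "base file name" is the reference file name with its last extension removed. How qplot itself treats directories or other extensions is not stated here.

```shell
# Hypothetical helper: derive the GC content file name from the reference
# file name and the window size, following
#   <reference_genome_base_file_name>.winsize<gc_content_window_size>.gc
# Assumption: the base name is the file name minus its final extension.
gc_file_name() {
    ref="$1"
    winsize="$2"
    base="${ref%.*}"    # strip the last extension, e.g. ".fa"
    printf '%s.winsize%s.gc\n' "$base" "$winsize"
}

gc_file_name human.g1k.v37.fa 100   # prints human.g1k.v37.winsize100.gc
```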
| + | |
| + | * <code> input files </code> |
| + | |
| + | QPLOT takes SAM/BAM files as input. |
| | | |
| ''Note'': Before running qplot, it is critical to check how the chromosome names are coded. Some BAM/SAM files use just numbers, others use chr + numbers. '''You need to make sure that the chromosome names from the reference and dbSNP are consistent with the BAM/SAM files.''' | | ''Note'': Before running qplot, it is critical to check how the chromosome names are coded. Some BAM/SAM files use just numbers, others use chr + numbers. '''You need to make sure that the chromosome names from the reference and dbSNP are consistent with the BAM/SAM files.''' |
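One way to check the naming consistency described in the note is to compare the chromosome names in the dbSNP table with the <code>@SQ</code> lines of the SAM/BAM header. The sketch below is not part of qplot. The function name <code>chroms_missing_from_sam</code> is hypothetical, and it assumes the header has been saved to a text file (for a BAM file, for example, with <code>samtools view -H</code>, if samtools is installed) and that the dbSNP table is tab-delimited.

```shell
# Hypothetical helper: print dbSNP chromosome names that are absent
# from the SAM header.
#   $1: SAM header as a text file
#   $2: two-column dbSNP table (chromosome <TAB> position)
chroms_missing_from_sam() {
    sam_names=$(mktemp)
    snp_names=$(mktemp)
    # @SQ header lines carry the sequence name in the SN: field
    grep '^@SQ' "$1" | tr '\t' '\n' | sed -n 's/^SN://p' | sort -u > "$sam_names"
    # the first column of the dbSNP table is the chromosome name
    cut -f1 "$2" | sort -u > "$snp_names"
    # comm -13 prints lines that occur only in the second file
    comm -13 "$sam_names" "$snp_names"
    rm -f "$sam_names" "$snp_names"
}

# Example usage (file names are placeholders):
#   samtools view -H input.bam > header.sam
#   chroms_missing_from_sam header.sam dbSNP130.UCSC.coordinates.tbl
```

Empty output means every chromosome name in the dbSNP table also appears in the header. Names such as <code>chr1</code> in the output, when the header uses <code>1</code>, point to the mismatch the note warns about.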
Line 123: |
Line 139: |
| or | | or |
| --qcfail_keep | | --qcfail_keep |
| + | |
| | | |
| *Records to process | | *Records to process |
Line 133: |
Line 150: |
| | | |
| '''NOTE''' In order for this to work, the lane info has to be encoded in the read name such that the lane number is the second field with the delimiter ":". | | '''NOTE''' In order for this to work, the lane info has to be encoded in the read name such that the lane number is the second field with the delimiter ":". |
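To see which lane a read name would be assigned to under this convention, the second ":"-delimited field can be extracted with <code>cut</code>. This is a minimal sketch. The function name <code>lane_of_read</code> and the read name used below are made up for illustration.

```shell
# Hypothetical helper: print the lane number encoded in a read name,
# taken as the second field when the name is split on ":".
lane_of_read() {
    printf '%s\n' "$1" | cut -d: -f2
}

lane_of_read 'RUN42:6:73:941:1973'   # prints 6
```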
| + | |
| | | |
| * Read group to process : | | * Read group to process : |
| | | |
− | Read group option can restrict qplot to process a subset of reads. For example, if BAM contain the following @RG tags:
| + | The read group option can restrict qplot to process a subset of reads. For example, if the BAM contains the following @RG tags: |
| | | |
| @RG ID:UM0348_1:1 PL:ILLUMINA LB:M5390 SM:M5390 CN:UM | | @RG ID:UM0348_1:1 PL:ILLUMINA LB:M5390 SM:M5390 CN:UM |
Line 147: |
Line 165: |
| @RG ID:UM0360_4:1 PL:ILLUMINA LB:M5390 SM:M5390 CN:UM | | @RG ID:UM0360_4:1 PL:ILLUMINA LB:M5390 SM:M5390 CN:UM |
| | | |
− | If specify nothing or not using "--readGroup", QPLOT by default will process all reads;
| + | By default (when --readGroup is not specified), QPLOT processes all reads. |
− | If specify "--readGroup UM0348", then only read group UM0348_1, UM_0348_2, UM_0348_3, UM_0348_4 will be processed; | + | |
− | If specify "--readGroup UM0348_1", then only one read group UM0348_1 will be processed. | + | If you specify "--readGroup UM0348", then only read groups UM0348_1, UM0348_2, UM0348_3, UM0348_4 will be processed. |
| + | |
| + | If you specify "--readGroup UM0348_1", then only one read group, UM0348_1, will be processed. |
| + | |
| | | |
| * Input file options : | | * Input file options : |
| | | |
− | BAM files are compress by BGZF algorithm and it should contain EOF by default. QPLOT will by default stop working when it does not found a valid EOF tag inside BAM files. | + | BAM files are compressed using BGZF and should contain the EOF indicator by default. QPLOT will, by default, stop working if it does not find a valid EOF indicator inside the BAM files. |
− | However, you can force QPLOT to continue process using --noeof. But you should be award the input files may be corrupted. | + | However, you can force QPLOT to continue processing BAM files without an EOF indicator using --noeof. Be aware, however, that such input files may be corrupted. |
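Before resorting to --noeof, you can check whether a BAM file ends with the EOF indicator. The sketch below compares the last 28 bytes of the file with the empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. The function name <code>has_bgzf_eof</code> is hypothetical, and this is not necessarily how QPLOT performs its own check.

```shell
# Hypothetical helper: succeed if the file ends with the 28-byte
# BGZF end-of-file marker defined in the SAM/BAM specification.
has_bgzf_eof() {
    expected='1f8b08040000000000ff0600424302001b0003000000000000000000'
    # dump the last 28 bytes as hex and strip all whitespace
    actual=$(tail -c 28 "$1" | od -An -tx1 | tr -d ' \t\n')
    [ "$actual" = "$expected" ]
}

# Example usage (input.bam is a placeholder):
#   if has_bgzf_eof input.bam; then echo "EOF marker present"; else echo "EOF marker missing"; fi
```

A missing marker usually means the file was truncated, for example by an interrupted copy, so regenerating or re-transferring the file is generally safer than running with --noeof.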
| | | |
| | | |
Line 160: |
Line 181: |
| | | |
| Qplot excludes reads whose mapping quality is lower than the user-specified threshold, <code>--minMapQuality</code>. By default, mapped reads of any mapping quality are included in the analysis. | | Qplot excludes reads whose mapping quality is lower than the user-specified threshold, <code>--minMapQuality</code>. By default, mapped reads of any mapping quality are included in the analysis. |
| + | |
| | | |
| *Region list | | *Region list |
Line 279: |
Line 301: |
| = Contact = | | = Contact = |
| | | |
− | Questions and requests should be sent to Bingshan Li ([mailto:bingshan@umich.edu bingshan@umich.edu]) or Goncalo Abecasis ([mailto:goncalo@umich.edu goncalo@umich.edu]) | + | Questions and requests should be sent to Bingshan Li ([mailto:bingshan@umich.edu bingshan@umich.edu]), Xiaowei Zhan ([mailto:zhanxw@umich.edu zhanxw@umich.edu]), or Goncalo Abecasis ([mailto:goncalo@umich.edu goncalo@umich.edu]). |