Changes

From Genome Analysis Wiki
962 bytes added ,  11:42, 2 February 2017
Line 4: Line 4:     
In the following sections, we will guide you through: [[#Where to Find It |how to obtain qplot]], [[#Usage |how to use qplot]], [[#Built-in example |example outputs]], [[#anchorOfInteractiveQplot |interactive diagnostic plots]], and [[#Diagnose sequencing quality |real applications]] in which qplot has helped identify sequencing problems.
 +
 +
= Citing QPLOT =
 +
 +
If you found QPLOT useful and want to cite it in your paper, please copy and paste the information below.
 +
 +
* Bingshan Li, Xiaowei Zhan, Mary-Kate Wing, Paul Anderson, Hyun Min Kang, and Goncalo R. Abecasis, “QPLOT: A Quality Assessment Tool for Next Generation Sequencing Data,” BioMed Research International, vol. 2013, Article ID 865181, 4 pages, 2013. doi:10.1155/2013/865181  http://www.hindawi.com/journals/bmri/2013/865181/
    
= Where to Find It =
Line 15: Line 21:  
== Binary Download ==
   −
We have prepared a pre-compiled (under Ubuntu) qplot along with source code . You can download it from: [http://www.sph.umich.edu/csg/zhanxw/software/qplot/qplot.20120602.tar.gz qplot.20120602.tar.gz (File Size: 1.7G)]  
+
We have prepared a pre-compiled (under Ubuntu) qplot along with its source code. You can download it from: [http://csg.sph.umich.edu//zhanxw/software/qplot/qplot.20130627.tar.gz qplot.20130627.tar.gz (File Size: 1.7G)]
    
The executable file is under qplot/bin/qplot.  
Line 25: Line 31:  
== Source Code Distribution ==
   −
We provide a source code only download in [http://www.sph.umich.edu/csg/zhanxw/software/qplot/qplot-source.20120602.tar.gz qplot-source.20120602.tar.gz]. Optionally, you can download example file and/or data file:
+
We provide a source-code-only download in [http://csg.sph.umich.edu//zhanxw/software/qplot/qplot-source.20130627.tar.gz qplot-source.20130627.tar.gz]. Optionally, you can also download the example files and/or data files:
   −
[http://www.sph.umich.edu/csg/zhanxw/software/qplot/qplot-example.tar.gz  example]: example input file, and expected outputs if you following the [[#Built-in example | direction]].  
+
[http://csg.sph.umich.edu//zhanxw/software/qplot/qplot-example.tar.gz  example]: example input file and the expected outputs if you follow the [[#Built-in example | directions]].
   −
[http://www.sph.umich.edu/csg/zhanxw/software/qplot/qplot-data.tar.gz resources data]: necessary input files for qplot, including NCBI human genome build v37, dbSNP 130, and pre-computed GC file with windows size 100.
+
[http://csg.sph.umich.edu//zhanxw/software/qplot/qplot-data.tar.gz resources data]: necessary input files for qplot, including NCBI human genome build v37, dbSNP 130, and a pre-computed GC file with window size 100.
    
You can put the above file(s) in the same folder and follow these steps:
    
* 1. Unarchive downloaded file
  tar zvxf qplot-source.20120602.tar.gz
+
  tar zvxf qplot-source.20130627.tar.gz
    
A new folder ''qplot'' will be created.
Line 40: Line 46:  
* 2. Build libStatGen
 
  cd qplot
  make libStatGen
+
  (cd ../libStatGen; make cloneLib)
    
This step will download a necessary software library [http://genome.sph.umich.edu/wiki/C%2B%2B_Library:_libStatGen libStatGen] and compile source code into a binary code library.
    
* 3. Build qplot
  make all
+
  make  
    
This step will then build qplot. Upon success, the executable qplot can be found under qplot/bin/.
Line 71: Line 77:     
   some_linux_host > qplot/bin/qplot
   
   −
             References : --reference [/net/fantasia/home/zhanxw/software/qplot/data/human.g1k.v37.fa],
                          --dbsnp [/net/fantasia/home/zhanxw/software/qplot/data/dbSNP130.UCSC.coordinates.tbl],
                          --gccontent [/net/fantasia/home/zhanxw/software/qplot/data/human.g1k.w100.gc]
  Create gcContent file : --create_gc [], --winsize [100]
            Region list : --regions [], --invertRegion
           Flag filters : --read1_skip, --read2_skip, --paired_skip,
                          --unpaired_skip
         Dup and QCFail : --dup_keep, --qcfail_keep
        Mapping filters : --minMapQuality [0.00]
     Records to process : --first_n_record [-1]
       Lanes to process : --lanes []
  Read group to process : --readGroup []
     Input file options : --noeof
           Output files : --plot [], --stats [], --Rcode [], --xml []
            Plot labels : --label [], --bamLabel []
+
    The following parameters are available. Ones with "[]" are in effect:
 
                    References : --reference [/net/fantasia/home/zhanxw/software/qplot/data/human.g1k.v37.fa],
                                 --dbsnp [/net/fantasia/home/zhanxw/software/qplot/data/dbSNP130.UCSC.coordinates.tbl]
       GC content file options : --winsize [100]
                   Region list : --regions [], --invertRegion
                  Flag filters : --read1_skip, --read2_skip, --paired_skip,
                                 --unpaired_skip
                Dup and QCFail : --dup_keep, --qcfail_keep
               Mapping filters : --minMapQuality [0.00]
            Records to process : --first_n_record [-1]
              Lanes to process : --lanes []
         Read group to process : --readGroup []
            Input file options : --noeof
                  Output files : --plot [], --stats [], --Rcode [], --xml []
                   Plot labels : --label [], --bamLabel []
        Obsoleted (DO NOT USE) : --gccontent [], --create_gc
    
== Input files ==
Line 102: Line 111:  
This file has two columns. The first column is the chromosome name, which must be consistent with the reference created above. The second column is the 1-based SNP position. If you want to create your own dbSNP data from a downloaded UCSC dbSNP file, one way to do it is: <code>cat dbsnp_129_b36.rod|grep "single" | awk '$4-$3==1' |cut -f2,4 > dbSNP_129_b36.tbl</code>
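As a minimal sketch of the pipeline above, the snippet below wraps the same filter in a shell function and runs it on three made-up rows. The function names <code>make_dbsnp_tbl</code> and <code>sample_rows</code> are hypothetical, and the sample rows are invented for illustration; they only mimic the tab-separated layout that the command implies (chromosome in column 2, start in column 3, end in column 4, and the word <code>single</code> somewhere on the line).

```shell
# Hypothetical helper: the same filter as the one-liner above, reading from stdin.
make_dbsnp_tbl() {
    grep "single" | awk '$4 - $3 == 1' | cut -f2,4
}

# Made-up sample rows (tab-separated), for illustration only.
sample_rows() {
    printf '585\tchr1\t10518\t10519\trs0000001\tsingle\n'     # kept: single-base SNP
    printf '585\tchr1\t10600\t10600\trs0000002\tinsertion\n'  # dropped: not "single"
    printf '585\tchr1\t10700\t10702\trs0000003\tsingle\n'     # dropped: spans 2 bases
}

# Prints one line: chr1<TAB>10519
sample_rows | make_dbsnp_tbl
```

Only the first row survives both filters, so the resulting table has the chromosome name in column 1 and the 1-based position in column 2.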
   −
* <code>--gccontent</code>
+
* <code> **OBSOLETED** --gccontent, --create_gc </code>
 +
 
 +
Although GC content can be calculated on the fly each time, it is much more efficient to load a precomputed GC content from a file.
 +
The GC content file name is automatically determined in this format: <reference_genome_base_file_name>.winsize<gc_content_window_size>.gc.
 +
For example, if your reference genome is human.g1k.v37.fa and the window size is 100, then the GC content file name is: human.g1k.v37.winsize100.gc.
 +
 
 +
As stated above, there is no need to use --gccontent to specify the GC content file in each run.
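The naming rule can be sketched in shell as follows. The function name <code>gc_file_name</code> is hypothetical, and the sketch assumes that the "base file name" is the reference file name with its last extension removed, which matches the human.g1k.v37.fa example but is not taken from the QPLOT source.

```shell
# Hypothetical helper: derive the GC content file name from the reference
# file name ($1) and the window size ($2).
# Assumption: "base file name" means the name without its last extension.
gc_file_name() {
    reference="$1"
    winsize="$2"
    printf '%s.winsize%s.gc\n' "${reference%.*}" "$winsize"
}

gc_file_name human.g1k.v37.fa 100   # prints: human.g1k.v37.winsize100.gc
```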
 +
 
 +
* <code> input files </code>
   −
Although GC content can be calculated on the fly each time, it is much more efficient to load a precomputed GC content from a file. To generate the file, use the following command:
+
QPLOT takes SAM/BAM files.
qplot --rerefence reference.fa --windowsize winsize --create_gc reference.gc
      
''Note'': Before running qplot, it is critical to check how the chromosome names are coded. Some BAM/SAM files use just numbers, others use chr + numbers. '''You need to make sure that the chromosome names from the reference and dbSNP are consistent with the BAM/SAM files.'''
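One way to check part of this before a run is sketched below. It compares only the reference FASTA against the two-column dbSNP table; the BAM/SAM side is not covered. The function name <code>missing_chroms</code> and the demo files are hypothetical.

```shell
# Hypothetical helper: list chromosome names used in the dbSNP table ($2)
# that have no matching sequence name in the reference FASTA ($1).
missing_chroms() {
    ref_names="$(mktemp)"
    snp_names="$(mktemp)"
    # FASTA sequence name = text after '>' up to the first whitespace.
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u > "$ref_names"
    cut -f1 "$2" | sort -u > "$snp_names"
    # comm -13 prints lines that appear only in the second file.
    comm -13 "$ref_names" "$snp_names"
    rm -f "$ref_names" "$snp_names"
}

# Made-up demo files: the reference uses "1" and "2", the table uses "chr1" and "2".
demo_dir="$(mktemp -d)"
printf '>1 example\nACGT\n>2 example\nACGT\n' > "$demo_dir/ref.fa"
printf 'chr1\t100\n2\t200\n' > "$demo_dir/dbsnp.tbl"

missing_chroms "$demo_dir/ref.fa" "$demo_dir/dbsnp.tbl"   # prints: chr1
```

An empty result means every chromosome name in the dbSNP table also appears in the reference.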
Line 123: Line 139:  
or  
 
  --qcfail_keep
 +
    
*Records to process  
Line 133: Line 150:     
'''NOTE''' In order for this to work, the lane info has to be encoded in the read name such that the lane number is the second field with the delimiter ":".
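To see which lane a given read name would map to under this rule, the second ":"-delimited field can be pulled out with <code>cut</code>. The function name <code>lane_of_read</code> and the read name below are made up for illustration.

```shell
# Hypothetical helper: print the second ':'-delimited field of a read name.
lane_of_read() {
    printf '%s\n' "$1" | cut -d: -f2
}

lane_of_read "RUN42:3:1101:1234:5678"   # prints: 3
```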
 +
    
* Read group to process :  
   −
Read group option can restrict qplot to process a subset of reads. For example, if BAM contain the following @RG tags:
+
The read group option can restrict qplot to process a subset of reads. For example, if the BAM contains the following @RG tags:
    
  @RG ID:UM0348_1:1 PL:ILLUMINA LB:M5390 SM:M5390 CN:UM
Line 147: Line 165:  
  @RG ID:UM0360_4:1 PL:ILLUMINA LB:M5390 SM:M5390 CN:UM
   −
If specify nothing or not using "--readGroup", QPLOT by default will process all reads;
If specify "--readGroup UM0348", then only read group UM0348_1, UM_0348_2, UM_0348_3, UM_0348_4 will be processed;
If specify "--readGroup UM0348_1", then only one read group UM0348_1 will be processed.
+
QPLOT will by default (without specifying --readGroup) process all reads.
 
If you specify "--readGroup UM0348", then only read groups UM0348_1, UM0348_2, UM0348_3, UM0348_4 will be processed.
 
If you specify "--readGroup UM0348_1", then only one read group, UM0348_1, will be processed.
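To preview which read groups a given value might select, the header lines can be filtered outside of QPLOT. The sketch below assumes that a read group is selected when its ID starts with the given string; that rule is inferred from the two examples above and is not taken from the QPLOT source. The function names <code>matching_read_groups</code> and <code>sample_header</code> are hypothetical.

```shell
# Hypothetical helper: read SAM header lines on stdin and print the IDs of
# @RG lines whose ID starts with the given string ($1).
# Assumption: prefix matching, inferred from the examples in the text.
matching_read_groups() {
    awk -v p="$1" '
        $1 == "@RG" {
            for (i = 2; i <= NF; i++)
                if (index($i, "ID:" p) == 1) print substr($i, 4)
        }'
}

# The two @RG lines quoted in the text above.
sample_header() {
    printf '@RG ID:UM0348_1:1 PL:ILLUMINA LB:M5390 SM:M5390 CN:UM\n'
    printf '@RG ID:UM0360_4:1 PL:ILLUMINA LB:M5390 SM:M5390 CN:UM\n'
}

sample_header | matching_read_groups UM0348   # prints: UM0348_1:1
```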
 +
 
    
* Input file options :
   −
BAM files are compress by BGZF algorithm and it should contain EOF by default. QPLOT will by default stop working when it does not found a valid EOF tag inside BAM files.
However, you can force QPLOT to continue process using --noeof. But you should be award the input files may be corrupted.
+
BAM files are compressed using BGZF and should contain the EOF indicator by default. QPLOT will, by default, stop working if it does not find a valid EOF indicator inside the BAM files.
However, you can force QPLOT to continue processing BAM files without an EOF indicator using --noeof, but be aware that the input files may be corrupted.
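Before resorting to --noeof, it can help to check whether a file really lacks the indicator. The SAM/BAM specification defines the BGZF end-of-file marker as a fixed 28-byte empty block at the end of the file; the sketch below compares the last 28 bytes against it. This is an independent check, not QPLOT's own code, and the function name <code>has_bgzf_eof</code> and the demo file are hypothetical.

```shell
# The 28-byte empty BGZF block that marks end-of-file, as lowercase hex.
BGZF_EOF_HEX="1f8b08040000000000ff0600424302001b0003000000000000000000"

# Hypothetical helper: succeed if the last 28 bytes of file $1 are the EOF block.
has_bgzf_eof() {
    tail_hex="$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')"
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Demo on a made-up file: arbitrary bytes followed by the EOF block.
demo_file="$(mktemp)"
printf 'payload' > "$demo_file"
printf '\037\213\010\004\000\000\000\000\000\377\006\000\102\103\002\000\033\000\003\000\000\000\000\000\000\000\000\000' >> "$demo_file"

if has_bgzf_eof "$demo_file"; then
    echo "EOF block present"
else
    echo "EOF block missing"
fi
```

A missing block often means the file was truncated during writing or transfer, which is why continuing with --noeof carries the risk described above.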
      Line 160: Line 181:     
Qplot will exclude reads whose mapping quality is lower than the user-specified parameter, <code>--minMapQuality</code>. By default, mapped reads of any mapping quality are included in the analysis.
 +
    
*Region list