Changes

From Genome Analysis Wiki
417 bytes added ,  16:01, 14 February 2012
Line 31: Line 31:     
== Command line ==
 
== Command line ==
After you obtain qplot executable (either from compiling source codes or download pre-compiled binary file), you will find executable file under qplot/bin/qplot. Here is the qplot help page
+
After you obtain the qplot executable (either by compiling the source code or by downloading the pre-compiled binary file), you will find the executable file under qplot/bin/qplot.  
 +
 
 +
Here is the qplot help page:
    
   some_linux_host > qplot/bin/qplot
 
   some_linux_host > qplot/bin/qplot
Line 52: Line 54:  
== Input files ==
 
== Input files ==
   −
Three (3) precomputed files are required. Multiple bam/sam files should be appended after all other parameters.
+
qplot runs on the input BAM/SAM file(s) specified on the command-line after all other parameters.
 +
 
 +
Additionally, three (3) precomputed files are required.
   −
* --reference
+
* <code>--reference</code>
    
The reference genome is the same as karma reference genome. If the index files do not exist, qplot will create the index files using the input reference fasta file.
 
The reference genome is the same as karma reference genome. If the index files do not exist, qplot will create the index files using the input reference fasta file.
   −
* --dbsnp
+
* <code>--dbsnp</code>
   −
This file has two columns. First column is the chromosome name wich have to be consistent with the reference created above.
+
This file has two columns. The first column is the chromosome name, which must be consistent with the reference created above.
   −
* --gccontent
+
* <code>--gccontent</code>
   −
Although GC content can be calculated on the fly each time, it is much more efficient to load a precomputed GC content from a file. To generate the file, use the following command
+
Although GC content can be calculated on the fly each time, it is much more efficient to load a precomputed GC content from a file. To generate the file, use the following command:
 
  qplot --reference reference.fa --windowsize winsize --create_gc reference.gc
 
  qplot --reference reference.fa --windowsize winsize --create_gc reference.gc
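
To make the idea of windowed GC content concrete, here is a minimal Python sketch. It is not qplot's implementation: the function name <code>gc_content_by_window</code> is hypothetical, and the use of consecutive, non-overlapping windows and the skipping of non-ACGT bases are assumptions, since the page does not describe how qplot defines a window or what the generated file contains.

```python
def gc_content_by_window(sequence, window_size):
    """Return the GC fraction of each consecutive, non-overlapping window.

    Assumption: windows do not overlap, and bases other than A, C, G, T
    (for example N) are left out of the denominator.
    """
    fractions = []
    for start in range(0, len(sequence), window_size):
        window = sequence[start:start + window_size].upper()
        called = [base for base in window if base in "ACGT"]
        if not called:
            # No usable bases in this window, so no fraction can be given.
            fractions.append(None)
            continue
        gc_count = sum(1 for base in called if base in "GC")
        fractions.append(gc_count / len(called))
    return fractions


if __name__ == "__main__":
    print(gc_content_by_window("GGCCAATTACGTNNNN", 4))
```

Doing this once and saving the result is the point of the precomputed file: the per-window values depend only on the reference and the window size, not on the reads.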
   −
''Note'': Before running the qplot, it is critical to check how the chromosome numbers are coded. Some bam files use just numbers, others use chr + numbers. '''You need to make sure that the chromosome numbers from reference and dbSNP are consistent with the bam file.'''
+
''Note'': Before running qplot, it is critical to check how the chromosome numbers are coded. Some bam files use just numbers, others use chr + numbers. '''You need to make sure that the chromosome numbers from the reference and dbSNP are consistent with the bam file.'''
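
The consistency check described in the note can be sketched in Python as follows. This is an illustration under stated assumptions, not part of qplot: all function names are hypothetical, the reference is assumed to be a plain FASTA file whose sequence name is the first word after <code>></code>, the dbSNP file is assumed to be whitespace-delimited, and the BAM names are assumed to come from the text header (for example the output of <code>samtools view -H</code>), where each <code>@SQ</code> line carries an <code>SN:</code> field.

```python
def fasta_chromosomes(lines):
    """Collect sequence names from FASTA header lines ('>name description')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def dbsnp_chromosomes(lines):
    """Collect the first column (chromosome name) of a two-column file."""
    names = set()
    for line in lines:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def sam_header_chromosomes(lines):
    """Collect the SN: value of every @SQ line in a SAM/BAM text header."""
    names = set()
    for line in lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def naming_mismatches(reference, dbsnp, bam):
    """Report names used by dbSNP or the BAM file but absent from the reference."""
    return {
        "dbsnp_not_in_reference": sorted(dbsnp - reference),
        "bam_not_in_reference": sorted(bam - reference),
    }


if __name__ == "__main__":
    ref = fasta_chromosomes([">1 dna", "ACGT", ">2", "GGCC"])
    snp = dbsnp_chromosomes(["chr1\t100", "chr2\t200"])
    bam = sam_header_chromosomes(["@HD\tVN:1.0", "@SQ\tSN:1\tLN:1000"])
    print(naming_mismatches(ref, snp, bam))
```

In the example, the dbSNP names use the "chr" prefix while the reference and BAM header use bare numbers, which is exactly the mismatch the note warns about.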
    
== Parameters ==
 
== Parameters ==
   −
Most of command line parameters are self explanatory and some of them are described here
+
Some of the command line parameters are described here, but most are self-explanatory.
    
*Flag filter
 
*Flag filter
   −
By default all reads are processed. If it is desired to check only the first read of a pair, use --read2_skip to ignore the second read. And so on.
+
By default, all reads are processed. To check only the first read of a pair, use <code>--read2_skip</code> to ignore the second read, and so on.
    
*Duplication and QCFail
 
*Duplication and QCFail
Line 85: Line 89:     
*Records to process  
 
*Records to process  
This option will enable qplot to read the first '''n''' reads to test the bam files and check whether it works.
+
The <code>--first_n_record</code> option followed by a number, '''n''', makes qplot read only the first '''n''' reads, which is useful for testing the bam files and verifying that they work.
    
* Lanes to process
 
* Lanes to process
   −
If the input bam files have more than one lane and only some of them need to be checked, they can be specified by --lanes 1,3,5 whatever the number of lanes needed.
+
If the input bam files have more than one lane and only some of them need to be checked, use something like <code>--lanes 1,3,5</code> to specify that only lanes 1, 3, and 5 need to be checked.
   −
'''NOTE''' In order for this to work, the lane info has to be encoded in the read name such that lane number is the second field with the delimit of ":".
+
'''NOTE''' In order for this to work, the lane info has to be encoded in the read name such that lane number is the second field with the delimiter ":".
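
The read-name convention in the note can be illustrated with a short Python sketch. The helper names and the sample read name <code>RUN100:3:12:345:678</code> are hypothetical, and treating a missing or non-numeric second field as "no lane" is an assumption; this is not how qplot itself is known to handle such reads.

```python
def lane_from_read_name(read_name):
    """Return the lane number stored as the second ':'-delimited field.

    Returns None when the read name has no second field or when that
    field is not an integer (an assumption made for this sketch).
    """
    fields = read_name.split(":")
    if len(fields) < 2:
        return None
    try:
        return int(fields[1])
    except ValueError:
        return None


def keep_read(read_name, lanes):
    """True when the read's lane is one of the requested lanes."""
    return lane_from_read_name(read_name) in lanes


if __name__ == "__main__":
    wanted = {1, 3, 5}  # mirrors --lanes 1,3,5
    for name in ["RUN100:3:12:345:678", "RUN100:2:12:345:678", "no_lane_here"]:
        print(name, keep_read(name, wanted))
```

Running a check like this on a handful of read names is a quick way to confirm that the lane really is the second field before relying on <code>--lanes</code>.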
    
*Mapping filters
 
*Mapping filters
   −
Qplot will exclude reads with lower mapping qualities than user specified parameter. By default, all reads will be included in analysis.
+
Qplot will exclude reads with lower mapping qualities than the user specified parameter, <code>--minMapQuality</code>. By default, all reads will be included in analysis.
    
*Region list
 
*Region list
   −
If the interest of qplot is a list of regions, e.g. exons, this can be achieved by providing a list of region. The regions should be in the form of "chr start end label" each line in the file. In order for this option to work, within each chromosome (contig) the regions have to be sorted by starting position, and also the input bam files have to be sorted.  
+
If only a list of regions, e.g. exons, is of interest, provide a region list file. Each line of the file should have the form "chr start end label". For this option to work, the regions have to be sorted by starting position within each chromosome (contig), and the input bam files also have to be sorted.
 
  1 100 500 region_A
 
  1 100 500 region_A
 
  1 600 800 region_B
 
  1 600 800 region_B
Line 105: Line 109:  
  ...
 
  ...
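
Because the region list must be sorted by starting position within each chromosome, it can help to validate the file before running qplot. The following Python sketch does that under stated assumptions: the function name <code>check_region_list</code> is hypothetical, fields are assumed to be whitespace-delimited as in the example above, and start and end are assumed to be integers.

```python
def check_region_list(lines):
    """Return a list of problems found in 'chr start end label' lines.

    Checks only what the text above requires: four fields per line and
    start positions in non-decreasing order within each chromosome.
    """
    problems = []
    last_start = {}  # chromosome name -> most recent start position seen
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            problems.append(f"line {number}: expected 4 fields, found {len(fields)}")
            continue
        chrom, start_text, end_text, _label = fields
        try:
            start = int(start_text)
            int(end_text)
        except ValueError:
            problems.append(f"line {number}: start and end must be integers")
            continue
        if chrom in last_start and start < last_start[chrom]:
            problems.append(f"line {number}: start {start} is out of order on {chrom}")
        last_start[chrom] = start
    return problems


if __name__ == "__main__":
    print(check_region_list(["1 100 500 region_A", "1 600 800 region_B"]))
    print(check_region_list(["1 600 800 region_B", "1 100 500 region_A"]))
```

An empty result means the file passed these two checks; it does not verify that the chromosome names match the reference or that the bam files are sorted.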
   −
Qplot also provide --invertRegion option. Enabling this option will let qplot calculate those sequence bases that are off the given region.
+
Qplot also provides the <code>--invertRegion</code> option. Enabling this option tells qplot to operate on those sequence bases that are outside the given region.
       
* Plot labels
 
* Plot labels
   −
Two kinds of labels are enabled. First one is the label for the plot (default is empty), e.g. label on the title of each subplot. Second one is a set of labels for each input bam files, e.g. sample ID (default is numbers 1, 2, ... until the number of input bam files. For example:
+
Two kinds of labels are enabled. <code>--label</code> is the label for the plot (default is empty), which is prepended to the title of each subplot. <code>--bamLabels</code> followed by a comma-separated list of labels provides the labels for each input SAM/BAM file, e.g. sample ID (default is numbers 1, 2, ... until the number of input bam files). For example:
 
  --label Run100 --bamLabels s1,s2,s3,s4,s5,s6,s7,s8
 
  --label Run100 --bamLabels s1,s2,s3,s4,s5,s6,s7,s8
    
* Multiple threading (not officially supported)
 
* Multiple threading (not officially supported)
   −
Number of concurrent threads running for the input bam files. One bam file will be processed by one thread. Therefore using a number which is dividable by the number of input bam files will make it efficient. One extra thread requires memory about 375Mb on top of around 4Gb memory used to hold reference and GC content file.
+
Number of concurrent threads running for the input bam files. One bam file at a time will be processed by one thread, so a thread count that evenly divides the number of input bam files is the most efficient. Each extra thread requires about 375 MB more memory on top of the roughly 4 GB of memory used to hold the reference and GC content files.
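
The memory figures above can be turned into a rough estimate with a few lines of Python. This is a back-of-the-envelope sketch, not a measurement: the function names are hypothetical, the roughly 4 GB base is assumed to already cover the first thread, 1 GB is taken as 1024 MB, and the thread-count advice is read as "choose a thread count that divides the number of bam files evenly", which is an interpretation of the text.

```python
BASE_MEMORY_MB = 4 * 1024   # roughly 4 GB for the reference and GC content
PER_EXTRA_THREAD_MB = 375   # roughly 375 MB for each additional thread


def estimated_memory_mb(threads):
    """Rough memory estimate in MB for a given number of threads.

    Assumption: the base figure includes the first thread, so only
    threads beyond the first add PER_EXTRA_THREAD_MB each.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return BASE_MEMORY_MB + PER_EXTRA_THREAD_MB * (threads - 1)


def balanced_thread_counts(bam_files):
    """Thread counts that split the bam files into equal-sized rounds."""
    if bam_files < 1:
        raise ValueError("bam_files must be at least 1")
    return [n for n in range(1, bam_files + 1) if bam_files % n == 0]


if __name__ == "__main__":
    for threads in balanced_thread_counts(8):
        print(threads, "threads:", estimated_memory_mb(threads), "MB")
```

For eight input bam files, the candidates are 1, 2, 4, and 8 threads; the estimate then shows how much extra memory each choice would cost.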
    
== Output files ==
 
== Output files ==
    
There are three (optional) output files.
 
There are three (optional) output files.
* --plot ''qa.pdf''
+
* <code>--plot ''qa.pdf''</code>
   −
Qplot will generate a PDF file named ''qa.pdf'' containing 2 pages each with 4 figures. If --pages 1 is specified, only page 1 is output. The plot is generated using Rscript.
+
Qplot will generate a PDF file named ''qa.pdf'' containing 2 pages each with 4 figures. If <code>--pages 1</code> is specified, only page 1 is output. The plot is generated using Rscript.
   −
* --stats ''qa.stats''
+
* <code>--stats ''qa.stats''</code>
   −
Qplot will generate a text file names ''qa.stats'' containing various summary statistics for each input bam/sam file.
+
Qplot will generate a text file named ''qa.stats'' containing various summary statistics for each input BAM/SAM file.
   −
* --Rcode ''qa.R''
+
* <code>--Rcode ''qa.R''</code>
   −
Qplot will generate ''qa.R'' which is R code used for plotting the figures in ''qa.pdf'' file. If Rscript is not installed in the system, you can use the qa.R to generate the figures in other machines, or extract plotting data from each run and combine multiple runs together to generate more comprehensive plots (See [[Example]]).
+
Qplot will generate ''qa.R'', which is the R code used for plotting the figures in the ''qa.pdf'' file. If Rscript is not installed on the system, you can use qa.R to generate the figures on other machines, or extract plotting data from each run and combine multiple runs together to generate more comprehensive plots (See [[Example]]).
    
= Example =
 
= Example =
