Pileup

From Genome Analysis Wiki


This program will be re-released soon (April, 2012).

Features:

  1. Option to include a VCF file to select chromosome locations of interest
    • Input VCF file can be gzipped or plain text.
    • When an input VCF file is specified, the output VCF file will contain a subset of the locations found in the input VCF file.
  2. Reads BAM index file to have direct access to chromosome locations of interest
    • The BAM index file is assumed to be in the same directory as the BAM file, with the additional extension .bai
  3. The user is responsible for ensuring that the chromosome names in the input VCF file and the BAM file are consistent with one another.
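
The two input conventions above (index file named after the BAM file with .bai appended, and matching chromosome names) can be checked before running pileup. The following is a minimal Python sketch, not part of pileup itself. The function names are hypothetical, and the list of BAM chromosome names is assumed to be supplied by the caller (for example, copied from the BAM header), since reading the BAM header is outside the scope of this sketch.

```python
import gzip
import os


def bam_index_path(bam_path):
    """Return the index path described above: the BAM file name plus '.bai'."""
    return bam_path + ".bai"


def vcf_chromosomes(vcf_path):
    """Collect the distinct CHROM values from a plain-text or gzipped VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    chroms = set()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            chroms.add(line.split("\t", 1)[0])
    return chroms


def check_inputs(bam_path, vcf_path, bam_chromosomes):
    """Return a list of problems found; an empty list means none were found.

    bam_chromosomes is an assumption of this sketch: the caller provides
    the chromosome names used in the BAM file.
    """
    problems = []
    index = bam_index_path(bam_path)
    if not os.path.exists(index):
        problems.append("missing BAM index: " + index)
    unknown = vcf_chromosomes(vcf_path) - set(bam_chromosomes)
    for chrom in sorted(unknown):
        problems.append("VCF chromosome not in BAM: " + chrom)
    return problems
```

A typical mismatch this would catch is a VCF file that names a chromosome "chr20" while the BAM file names it "20".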

HELP SCREEN

./mgpileup -h


USAGE: 


   ./mgpileup  [-v <string>] [-i <string>] -r <string> -b <string> [-d] [--]
              [--version] [-h]




Where: 


   -v <string>,  --ouputvcf <string>
     VCF file - if the extension is .gz, the written file will be a gzip
     file, (default is STDOUT)


   -i <string>,  --inputvcf <string>
     VCF file listing the loci of interest (can be gzipped), bam index file
     is automatically assumed to be in the same location as the bam file.


   -r <string>,  --reference <string>
     (required)  Reference Sequence file


   -b <string>,  --bam <string>
     (required)  BAM file


   -d,  --adddelasbase
     Adds deletions as base


   --,  --ignore_rest
     Ignores the rest of the labeled arguments following this flag.


   --version
     Displays version information and exits.


   -h,  --help
     Displays usage information and exits.




   Example:


   ./mgpileup -r /data/local/ref/karma.ref/human.g1k.v37 -b
   ../data/HG00160.chrom20.ILLUMINA.bwa.GBR.low_coverage.20100517.bam -v
   HG00160.chrom20.ILLUMINA.bwa.GBR.low_coverage.20100517.vcf -i
   ../data/LDL_b37.modified.genos.EUR.vcf -d 


   Pileup takes in a BAM file and a genome reference file and returns the
   alignment statistics listed below.


   The alignment statistics are written to the specified VCF file; if the
   file name ends with .gz, the output will be in gzip format.


   1. CHROM   : chromosome.


   2. POS     : position on chromosome.


   3. ID      : id.


   4. REF     : Reference base in the reference genome. A,C,T,G,N.


   5. ALT     : alt.


   6. QUAL    : quality score. 


   7. FILTER  : filter.


   8. INFO    : info.


   9. FORMAT  : headers of custom data.


   a. N         : number of contigs mapped at this locus.


   b. BASE      : bases of respective contigs, '-' for deletions.


   c. MAPQ      : quality score of the mapping for the contig; this is also
   shown for deletions.


   d. BASEQ     : phred quality score, -1 for deletions.


   e. STRAND    : F - forward, R - reverse.


   f. CYCLE     : sequencing cycle, -1 for deletions.


   g. GL        : genotype likelihood scores -
   AA,AC,AG,AT,CC,CG,CT,GG,GT,TT.


   10. <vcf output file name> : contains data described in FORMAT.
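
The column layout above can be read back with a short script. The following is a minimal Python sketch under stated assumptions, not the tool's own parser: it assumes the output lines are tab-separated and that the FORMAT keys and the sample column are colon-separated, as in standard VCF. The encoding of the per-read values inside each key is not specified here, so they are kept as raw strings. The name parse_pileup_record is hypothetical. The sketch also shows that the GL genotype order listed above is the set of unordered pairs of A, C, G, T.

```python
from itertools import combinations_with_replacement

# The ten unordered genotypes over A, C, G, T, in the order listed for GL:
# AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GL_GENOTYPES = ["".join(pair) for pair in combinations_with_replacement("ACGT", 2)]

FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_pileup_record(line):
    """Split one data line into the nine fixed columns plus the sample data.

    Assumption: FORMAT keys and the sample column are colon-separated,
    as in standard VCF. Per-key values are returned as raw strings.
    """
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_COLUMNS, columns[:9]))
    record["POS"] = int(record["POS"])
    keys = record["FORMAT"].split(":")
    # Column 10 holds the data described by FORMAT; tolerate its absence.
    values = columns[9].split(":") if len(columns) > 9 else []
    record["SAMPLE"] = dict(zip(keys, values))
    return record
```

With a record parsed this way, record["SAMPLE"]["GL"] would hold the raw GL string, and GL_GENOTYPES gives the genotype that each of its ten scores refers to, in order.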