Pileup
From Genome Analysis Wiki
Features:
- Option to include a VCF file to select chromosome locations of interest
- Input VCF file can be gzipped or plain text.
- When an input VCF file is specified, the output VCF file will contain a subset of the locations found in the input VCF file
- Reads BAM index file to have direct access to chromosome locations of interest
- BAM index file is assumed to be in the same directory as the BAM file, with the additional extension .bai
- The user is responsible for ensuring that the chromosome name in the input VCF file and the BAM file are consistent with one another.
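The last three points can be checked before running the tool. The following is a minimal Python sketch, not part of mgpileup: the function names (expected_index_path, open_vcf, vcf_chromosomes, preflight) are hypothetical, and the list of BAM chromosome names is assumed to be supplied by the caller (for example, read from the @SQ lines of the BAM header).

```python
import gzip
import os


def expected_index_path(bam_path):
    """Return the index path described above: the BAM path plus '.bai'."""
    return bam_path + ".bai"


def open_vcf(path):
    """Open a VCF as text, gzipped or plain (decided by the gzip magic bytes)."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def vcf_chromosomes(path):
    """Collect the distinct CHROM values (first column) of the VCF data lines."""
    names = set()
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header lines and blank lines
            names.add(line.split("\t", 1)[0])
    return names


def preflight(bam_path, vcf_path, bam_chromosomes):
    """Return a list of problems found; an empty list means none were found.

    ASSUMPTION: bam_chromosomes is provided by the caller; this sketch does
    not read the BAM file itself.
    """
    problems = []
    index_path = expected_index_path(bam_path)
    if not os.path.exists(index_path):
        problems.append("missing BAM index: " + index_path)
    unknown = sorted(vcf_chromosomes(vcf_path) - set(bam_chromosomes))
    if unknown:
        problems.append("VCF chromosomes not in BAM: " + ", ".join(unknown))
    return problems
```

A mismatch such as "20" in the VCF against "chr20" in the BAM shows up as an entry in the returned list.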
HELP SCREEN
    ./mgpileup -h

    USAGE:

       ./mgpileup  [-v <string>] [-i <string>] -r <string> -b <string> [-d] [--] [--version] [-h]

    Where:

       -v <string>,  --ouputvcf <string>
         VCF file - if the extension is .gz, the written file will be a gzip file, (default is STDOUT)

       -i <string>,  --inputvcf <string>
         VCF file listing the loci of interest (can be gzipped), bam index file is automatically assumed to be in the same location as the bam file.

       -r <string>,  --reference <string>
         (required)  Reference Sequence file

       -b <string>,  --bam <string>
         (required)  BAM file

       -d,  --adddelasbase
         Adds deletions as base

       --,  --ignore_rest
         Ignores the rest of the labeled arguments following this flag.

       --version
         Displays version information and exits.

       -h,  --help
         Displays usage information and exits.

Example:

    ./mgpileup -r /data/local/ref/karma.ref/human.g1k.v37 -b ../data/HG00160.chrom20.ILLUMINA.bwa.GBR.low_coverage.20100517.bam -v HG00160.chrom20.ILLUMINA.bwa.GBR.low_coverage.20100517.vcf -i ../data/LDL_b37.modified.genos.EUR.vcf -d

Pileup takes in a BAM file and a genome reference file and returns the alignment statistics listed below. The statistics are written to the specified VCF file; if the file name ends with .gz, the output will be in gzip format.

1. CHROM : chromosome.
2. POS : position on chromosome.
3. ID : id.
4. REF : reference base in the reference genome. A,C,T,G,N.
5. ALT : alt.
6. QUAL : quality score.
7. FILTER : filter.
8. INFO : info.
9. FORMAT : headers of custom data.
   a. N : number of contigs mapped at this locus.
   b. BASE : bases of respective contigs, '-' for deletions.
   c. MAPQ : quality score of the mapping for the contig; this is also shown for deletions.
   d. BASEQ : phred quality score, -1 for deletions.
   e. STRAND : F - forward, R - reverse.
   f. CYCLE : sequencing cycle, -1 for deletions.
   g. GL : genotype likelihood scores - AA,AC,AG,AT,CC,CG,CT,GG,GT,TT.
10. <vcf output file name> : contains data described in FORMAT.
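
The columns described above could be unpacked along the following lines. This is a minimal Python sketch under stated assumptions, not the tool's own parser: it assumes the FORMAT keys and the per-sample values are colon-separated as in standard VCF, and that the GL values are comma-separated. Neither detail is confirmed on this page, and the function names (parse_record, label_likelihoods) are hypothetical. The genotype order itself is the one listed for GL.

```python
from itertools import combinations_with_replacement

# The ten unordered genotypes in the order listed for GL:
# AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = ["".join(pair) for pair in combinations_with_replacement("ACGT", 2)]

FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_record(line):
    """Split one tab-delimited data line into fixed columns and sample fields.

    ASSUMPTION: FORMAT (column 9) and the sample column (column 10) are both
    colon-separated, as in standard VCF.
    """
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 10:
        raise ValueError("expected at least 10 columns, got %d" % len(columns))
    record = dict(zip(FIXED_COLUMNS, columns[:8]))
    record["POS"] = int(record["POS"])
    # Pair each FORMAT key (N, BASE, MAPQ, ...) with its sample value.
    record["fields"] = dict(zip(columns[8].split(":"), columns[9].split(":")))
    return record


def label_likelihoods(gl_text, sep=","):
    """Pair GL values with genotype names.

    ASSUMPTION: the ten values are separated by `sep` (comma by default).
    """
    values = [float(value) for value in gl_text.split(sep)]
    if len(values) != len(GENOTYPES):
        raise ValueError("expected 10 genotype likelihoods, got %d" % len(values))
    return dict(zip(GENOTYPES, values))
```

If the actual output uses different separators, only the two split calls need to change; the genotype ordering stays the same.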