Line 24:
```
capt pileup [options]

Required Options (Run capt-pileup -man or see wiki for more info):
 -loci STR     Input genomic position to perform pileup
 -index STR    Index file containing sample IDs and BAM file path
```
Line 60:
* Overlapping read pairs are not handled specially by default. **It is recommended** to explicitly turn on the --clip-overlap option, which clips the overlapping portion from one of the two reads in a pair, to improve filtering performance.
* By default, capt-pileup assumes that it runs on a single machine. If you are running on a MOSIX-enabled cluster, --mosix-nodes node1,node2,node3,...,noden spreads the jobs across multiple nodes in parallel.
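
Why clipping matters: when both mates of a short fragment cover the same reference position, that single fragment is counted twice in the pileup. The Python sketch below shows one possible clipping policy (keep the leftmost mate whole, trim the other). The half-open interval representation and the policy itself are assumptions for illustration, not capt-pileup's actual implementation.

```python
from typing import Iterable, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference, 0-based


def clip_overlap(left: Interval, right: Interval) -> Optional[Interval]:
    """Trim `right` so it no longer covers positions already covered by `left`.

    Illustrative policy only (an assumption): the leftmost mate is kept whole.
    Returns the trimmed interval, or None if `right` lies inside `left`.
    """
    if left[0] > right[0]:
        raise ValueError("left mate must start at or before the right mate")
    new_start = max(right[0], left[1])
    if new_start >= right[1]:
        return None
    return (new_start, right[1])


def depth_at(pos: int, reads: Iterable[Optional[Interval]]) -> int:
    """Count reads covering `pos`, skipping reads that were clipped away."""
    return sum(1 for r in reads if r is not None and r[0] <= pos < r[1])


if __name__ == "__main__":
    mate1, mate2 = (100, 200), (150, 250)  # one fragment, mates overlap at 150-199
    print(depth_at(175, [mate1, mate2]))                       # fragment counted twice
    print(depth_at(175, [mate1, clip_overlap(mate1, mate2)]))  # counted once
```

Trimming only the rightmost mate keeps the bookkeeping to one interval per read; a real tool might instead choose which side to clip by base quality.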

Internally, the current implementation runs samtools to collect the pileup. We have a separate software package that handles indels and SNPs together, and it will soon replace samtools.
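
The 7th and 8th columns of the sample output on this page (per-read mapping qualities and each base's position within its read) match what `samtools mpileup` prints with its `-s` and `-O` options. The exact samtools options capt-pileup passes are not documented here. The sketch below is therefore a hypothetical approximation of a comparable per-sample call, and it only builds the argument list. The file names in the usage part are placeholders.

```python
import shlex
from typing import List


def mpileup_command(bam: str, ref_fasta: str, loci: str) -> List[str]:
    """Return an argument list for a samtools mpileup call.

    Hypothetical approximation only; capt-pileup's real options may differ.
      -f  reference FASTA (needed to print the reference base column)
      -l  positions file; restricts output to the listed sites
      -s  add a column with per-read mapping qualities
      -O  add a column with each base's position within its read
    """
    return ["samtools", "mpileup", "-f", ref_fasta, "-l", loci, "-s", "-O", bam]


if __name__ == "__main__":
    cmd = mpileup_command("HG00096.bam", "hs37d5.fa", "union.snps.sites.loci")
    # Print a shell-quoted version; pass `cmd` to subprocess.run(...) to execute it.
    print(shlex.join(cmd))
```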

Examples from the 1000 Genomes Project are available in the following script:
```
cat /net/1000g/hmkang/1KG/phase3/scripts/m02-create-pileups.sh
```

For example, you can adapt the following command:
```
/net/fantasia/home/hmkang/bin/captTest/bin/capt-pileup --index /net/1000g/hmkang/1KG/phase3/index/20130502.gotcloud.low_coverage.2col.index \
  --out /net/1000g/hmkang/1KG/phase3/wg.consensus/lcmpus/phase3.low_coverage.wgs \
  --loci /net/1000g/hmkang/1KG/phase3/wg.consensus/union/union.snps.sites.loci \
  --mosix-nodes 10,11,12,13 \
  --ref /net/1000g/hmkang/1KG/phase3/gotcloud/gotcloud.ref/hs37d5.fa \
  --clip-overlap
```
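
The `--mosix-nodes 10,11,12,13` value above is a comma-separated list of node names. This page does not describe how capt-pileup distributes work across those nodes. The sketch below assumes, purely for illustration, one job per sample, with jobs assigned to nodes round-robin. `assign_jobs` is a hypothetical helper, not part of capt.

```python
from typing import Dict, List


def assign_jobs(jobs: List[str], mosix_nodes: str) -> Dict[str, List[str]]:
    """Spread jobs over a comma-separated node list, round-robin.

    Hypothetical scheduling policy for illustration only.
    """
    nodes = [n.strip() for n in mosix_nodes.split(",") if n.strip()]
    if not nodes:
        raise ValueError("no MOSIX nodes given")
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, job in enumerate(jobs):
        plan[nodes[i % len(nodes)]].append(job)
    return plan


if __name__ == "__main__":
    samples = ["HG00096", "HG00097", "HG00099", "HG00100", "HG00101"]
    for node, assigned in assign_jobs(samples, "10,11,12,13").items():
        print(node, assigned)
```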

Take a peek at each input file to see what you need to prepare. The index file lists one sample per line: the sample ID followed by the path to that sample's BAM file.
```
$ head /net/1000g/hmkang/1KG/phase3/index/20130502.gotcloud.low_coverage.2col.index
HG00096 /net/1000g/1000g/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
HG00097 /net/1000g/1000g/data/HG00097/alignment/HG00097.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
HG00099 /net/1000g/1000g/data/HG00099/alignment/HG00099.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
HG00100 /net/1000g/1000g/data/HG00100/alignment/HG00100.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
HG00101 /net/1000g/1000g/data/HG00101/alignment/HG00101.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
HG00102 /net/1000g/1000g/data/HG00102/alignment/HG00102.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
HG00103 /net/1000g/1000g/data/HG00103/alignment/HG00103.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
HG00105 /net/1000g/1000g/data/HG00105/alignment/HG00105.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
HG00106 /net/1000g/1000g/data/HG00106/alignment/HG00106.mapped.ILLUMINA.bwa.GBR.low_coverage.20121211.bam
HG00107 /net/1000g/1000g/data/HG00107/alignment/HG00107.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
```
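
In the 1000 Genomes index above, each BAM file name starts with the sample ID. A minimal sketch that builds a two-column index from a list of BAM paths is below. It assumes that naming convention and a tab between the two columns; both are assumptions, so if your file names differ, take the sample IDs from another source. `make_index.py` in the usage comment is a hypothetical script name.

```python
import os
import sys
from typing import Iterable, List, Tuple


def index_rows(bam_paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each BAM path with a sample ID taken from its file name.

    Assumes names like HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam,
    where the text before the first '.' is the sample ID.
    """
    rows = []
    seen = set()
    for path in bam_paths:
        sample_id = os.path.basename(path).split(".", 1)[0]
        if sample_id in seen:
            raise ValueError(f"duplicate sample ID {sample_id!r} from {path}")
        seen.add(sample_id)
        rows.append((sample_id, path))
    return rows


def format_index(rows: Iterable[Tuple[str, str]]) -> str:
    """Render rows as 'SAMPLE_ID<TAB>BAM_PATH' lines."""
    return "".join(f"{sample_id}\t{path}\n" for sample_id, path in rows)


if __name__ == "__main__":
    # Usage (hypothetical): python make_index.py /path/to/*.bam > my.index
    sys.stdout.write(format_index(index_rows(sys.argv[1:])))
```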

The loci file lists one site per line as a chromosome and a position:
```
$ head /net/1000g/hmkang/1KG/phase3/wg.consensus/union/union.snps.sites.loci
1 10002
1 10004
1 10005
1 10327
1 10469
1 10470
1 10471
1 10472
1 10473
1 10478
```
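
If your candidate sites are in a VCF, one way to produce a file in this two-column layout is sketched below. The sketch keeps only records whose REF and every ALT allele are a single A, C, G or T base. That SNP definition and the tab separator are assumptions, not necessarily how the 1000 Genomes loci file was made.

```python
import gzip
from typing import Iterable, Iterator


def _is_single_base(allele: str) -> bool:
    """True for a single A/C/G/T allele (excludes indels, '.', '*')."""
    return len(allele) == 1 and allele.upper() in "ACGT"


def snp_loci(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield 'CHROM<TAB>POS' for VCF records that look like SNPs."""
    for line in vcf_lines:
        if line.startswith("#"):  # meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], fields[1], fields[3], fields[4]
        if _is_single_base(ref) and all(_is_single_base(a) for a in alts.split(",")):
            yield f"{chrom}\t{pos}"


def write_loci(vcf_path: str, out_path: str) -> None:
    """Write a loci file from a plain or gzip-compressed VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as vcf, open(out_path, "w") as out:
        for locus in snp_loci(vcf):
            out.write(locus + "\n")
```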
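
Each sample's pileup is written to a separate gzip-compressed file named after the --out prefix and the sample ID (e.g. phase3.low_coverage.wgs.HG00096.txt.gz). The output format is somewhat cryptic, but it is designed to be read by the downstream software. See the Samtools Pileup Format page (http://samtools.sourceforge.net/pileup.shtml) for details of the output format.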

```
$ zcat /net/1000g/hmkang/1KG/phase3/wg.consensus/lcmpus/phase3.low_coverage.wgs.HG00096.txt.gz | head
1 10327 T 4 ,.,. E@,H >>>F 76,30,36,18
1 10469 C 5 .g,$,. A?/;P >>>FF 78,66,100,38,12
1 10470 G 4 .,,. 8?CH >>FF 79,67,39,13
1 10471 C 4 .,,. D=:Q >>FF 80,68,40,14
1 10472 G 4 .,,. <DLH >>FF 81,69,41,15
1 10473 G 4 .,,. C=DR >>FF 82,70,42,16
1 10478 C 5 .,,., DJQR> >>FF> 87,75,47,21,2
1 10492 C 5 ,,T,, QKAA@ >FF>> 89,61,35,16,7
1 10494 G 5 ,,.,, QQ<G> >FF>> 91,63,37,18,9
1 10503 T 5 ,$,.,, /QDAG >FF>> 100,72,46,27,18
```
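
The columns above appear to be: chromosome, position, reference base, read depth, read bases, base qualities, mapping qualities, and the position of each base within its read. In the read-bases column, `.` and `,` match the reference on the forward and reverse strand. A letter is a mismatch (upper case forward, lower case reverse). `^` marks a read start and is followed by one mapping-quality character, `$` marks a read end, and `+N...`/`-N...` is an indel. A minimal Python sketch for reading such lines is below. It assumes tab-separated fields and Phred+33 quality characters, as in the Samtools pileup format, and it is an illustration rather than part of capt.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class PileupRecord:
    chrom: str
    pos: int
    ref: str
    depth: int
    bases: str
    base_quals: List[int]  # Phred base qualities, one per read
    map_quals: List[int]   # mapping qualities, one per read
    read_pos: List[int]    # position of the base within each read


def phred(quals: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string (Phred+33 by default)."""
    return [ord(c) - offset for c in quals]


def parse_line(line: str) -> PileupRecord:
    """Parse one 8-column, tab-separated pileup line (assumed layout)."""
    chrom, pos, ref, depth, bases, bq, mq, rp = line.rstrip("\n").split("\t")
    n = int(depth)
    if n == 0:
        return PileupRecord(chrom, int(pos), ref, 0, "", [], [], [])
    return PileupRecord(chrom, int(pos), ref, n, bases,
                        phred(bq), phred(mq), [int(x) for x in rp.split(",")])


def count_bases(bases: str, ref: str) -> Counter:
    """Tally alleles in a read-bases string, following the samtools pileup markup."""
    counts: Counter = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":            # read start; the next character is its mapping quality
            i += 2
        elif c == "$":          # read end marker
            i += 1
        elif c in "+-":         # indel after the previous base: +2AG, -1T, ...
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])  # skip the length digits and the indel sequence
        else:
            if c in ".,":       # matches the reference (forward / reverse strand)
                counts[ref.upper()] += 1
            elif c in "*#":     # deleted base
                counts["*"] += 1
            elif c not in "<>": # '<' and '>' are reference skips; ignore them
                counts[c.upper()] += 1
            i += 1
    return counts


if __name__ == "__main__":
    rec = parse_line("1\t10469\tC\t5\t.g,$,.\tA?/;P\t>>>FF\t78,66,100,38,12")
    print(rec.base_quals, rec.map_quals, count_bases(rec.bases, rec.ref))
```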

(TO BE CONTINUED)