Changes

From Genome Analysis Wiki
Line 24: Line 24:  
     capt pileup [options]
 
   
 
   
     Required Options (Run epacts single -man or see wiki for more info):
+
     Required Options (Run capt-pileup -man or see wiki for more info):
 
       -loci STR        Input genomic position to perform pileup
 
       -index STR        Index file containing sample IDs and BAM file path
 
Line 60: Line 60:  
* Overlapping pairs of read fragments are not specially handled by default. **It is recommended** to explicitly turn on the --clip-overlap option, which clips either side of an overlapping read fragment, to improve the filtering performance.
 
* By default, it assumes that it runs on a single machine. If you are running on a MOSIX-enabled cluster, --mosix-nodes [node1,node2,node3,..,noden] spreads the jobs across multiple nodes in parallel.
 +
 +
Internally, the current implementation runs samtools to collect this pileup. We have a separate software package that handles indels and SNPs together and will replace samtools soon.
 +
 +
Examples from the 1000 Genomes project are available at
 +
  cat /net/1000g/hmkang/1KG/phase3/scripts/m02-create-pileups.sh
 +
 +
For example, you can start from the following command and modify it to fit your data
 +
/net/fantasia/home/hmkang/bin/captTest/bin/capt-pileup --index /net/1000g/hmkang/1KG/phase3/index/20130502.gotcloud.low_coverage.2col.index \
 +
    --out /net/1000g/hmkang/1KG/phase3/wg.consensus/lcmpus/phase3.low_coverage.wgs \
 +
    --loci /net/1000g/hmkang/1KG/phase3/wg.consensus/union/union.snps.sites.loci \
 +
    --mosix-nodes 10,11,12,13 \
 +
    --ref /net/1000g/hmkang/1KG/phase3/gotcloud/gotcloud.ref/hs37d5.fa \
 +
    --clip-overlap
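
When the same run has to be repeated with different inputs, the command line above can be assembled programmatically. The following is a minimal Python sketch, not part of capt-pileup itself. The function name `build_capt_pileup_cmd` and its parameter names are hypothetical, and only the flags that appear in the example command are used.

```python
def build_capt_pileup_cmd(index, out, loci, ref, mosix_nodes=None,
                          clip_overlap=True, exe="capt-pileup"):
    """Return the argument list for a capt-pileup run (hypothetical helper).

    Only the flags shown in the example command are emitted:
    --index, --out, --loci, --ref, --mosix-nodes and --clip-overlap.
    """
    cmd = [exe, "--index", index, "--out", out, "--loci", loci, "--ref", ref]
    if mosix_nodes:
        # The example passes the nodes as one comma-separated value.
        cmd += ["--mosix-nodes", ",".join(str(node) for node in mosix_nodes)]
    if clip_overlap:
        # Recommended in the notes above, so it defaults to on in this sketch.
        cmd.append("--clip-overlap")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with long paths.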
 +
 +
Take a peek at each input file to better understand what you actually need to prepare
 +
$ head /net/1000g/hmkang/1KG/phase3/index/20130502.gotcloud.low_coverage.2col.index
 +
HG00096 /net/1000g/1000g/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
 +
HG00097 /net/1000g/1000g/data/HG00097/alignment/HG00097.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
 +
HG00099 /net/1000g/1000g/data/HG00099/alignment/HG00099.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
 +
HG00100 /net/1000g/1000g/data/HG00100/alignment/HG00100.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
 +
HG00101 /net/1000g/1000g/data/HG00101/alignment/HG00101.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
 +
HG00102 /net/1000g/1000g/data/HG00102/alignment/HG00102.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
 +
HG00103 /net/1000g/1000g/data/HG00103/alignment/HG00103.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
 +
HG00105 /net/1000g/1000g/data/HG00105/alignment/HG00105.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
 +
HG00106 /net/1000g/1000g/data/HG00106/alignment/HG00106.mapped.ILLUMINA.bwa.GBR.low_coverage.20121211.bam
 +
HG00107 /net/1000g/1000g/data/HG00107/alignment/HG00107.mapped.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
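
The index listing above has two whitespace-separated columns: a sample ID and the path to that sample's BAM file. Below is a minimal Python sketch for checking such a file before a long run. The function names are hypothetical, and the sketch assumes that BAM paths contain no spaces. The output naming in `expected_output` is inferred from the example output path on this page (`<out prefix>.<sample ID>.txt.gz`) and is not a documented rule.

```python
def read_index(lines):
    """Parse a two-column index (sample ID, BAM path) into a dict.

    Blank lines are skipped. Malformed lines and duplicate sample IDs
    raise ValueError so that problems surface before the pileup starts.
    """
    samples = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 columns, got {len(fields)}")
        sample_id, bam_path = fields
        if sample_id in samples:
            raise ValueError(f"line {lineno}: duplicate sample ID {sample_id}")
        samples[sample_id] = bam_path
    return samples


def expected_output(out_prefix, sample_id):
    """Guess the per-sample output path (assumption based on the example)."""
    return f"{out_prefix}.{sample_id}.txt.gz"
```

For example, `read_index(open(index_path))` returns the mapping for a whole file, and comparing `expected_output(...)` against the files on disk gives a rough view of which samples have finished.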
 +
 +
$ head /net/1000g/hmkang/1KG/phase3/wg.consensus/union/union.snps.sites.loci
 +
1 10002
 +
1 10004
 +
1 10005
 +
1 10327
 +
1 10469
 +
1 10470
 +
1 10471
 +
1 10472
 +
1 10473
 +
1 10478
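
The loci listing above has one site per line: a chromosome name and a position. A minimal Python sketch for reading and sanity-checking such a file follows. The function names are hypothetical. Whether capt-pileup requires sorted input is not stated on this page, so the ordering check is only a convenience and not a documented requirement.

```python
def read_loci(lines):
    """Parse 'chromosome position' lines into a list of (chrom, pos) tuples."""
    loci = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 columns, got {len(fields)}")
        chrom, pos_text = fields
        if not pos_text.isdigit() or int(pos_text) < 1:
            raise ValueError(f"line {lineno}: bad position {pos_text!r}")
        loci.append((chrom, int(pos_text)))
    return loci


def out_of_order(loci):
    """Return loci that do not increase relative to the previous locus
    on the same chromosome (duplicates are reported as well)."""
    problems = []
    previous = None
    for chrom, pos in loci:
        if previous is not None and previous[0] == chrom and pos <= previous[1]:
            problems.append((chrom, pos))
        previous = (chrom, pos)
    return problems
```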
 +
 +
The output file format is somewhat cryptic, but the downstream software can read it. See the [http://samtools.sourceforge.net/pileup.shtml Samtools Pileup Format] web page for the details of the output format.
 +
 +
$ zcat /net/1000g/hmkang/1KG/phase3/wg.consensus/lcmpus/phase3.low_coverage.wgs.HG00096.txt.gz | head
 +
1 10327 T 4 ,.,. E@,H >>>F 76,30,36,18
 +
1 10469 C 5 .g,$,. A?/;P >>>FF 78,66,100,38,12
 +
1 10470 G 4 .,,. 8?CH >>FF 79,67,39,13
 +
1 10471 C 4 .,,. D=:Q >>FF 80,68,40,14
 +
1 10472 G 4 .,,. <DLH >>FF 81,69,41,15
 +
1 10473 G 4 .,,. C=DR >>FF 82,70,42,16
 +
1 10478 C 5 .,,., DJQR> >>FF> 87,75,47,21,2
 +
1 10492 C 5 ,,T,, QKAA@ >FF>> 89,61,35,16,7
 +
1 10494 G 5 ,,.,, QQ<G> >FF>> 91,63,37,18,9
 +
1 10503 T 5 ,$,.,, /QDAG >FF>> 100,72,46,27,18
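
To make the sample output above easier to read, here is a minimal Python sketch that decodes one line. It follows the Samtools pileup convention for the first six columns (chromosome, position, reference base, depth, read bases, base qualities). The function names are hypothetical, base qualities are assumed to be Phred+33 encoded, and the two trailing columns in the sample output are not described on this page, so they are returned uninterpreted.

```python
import re
from collections import Counter


def count_bases(ref, bases):
    """Count observed alleles in a pileup read-bases string.

    '.' and ',' are matches to the reference, '^' is followed by a
    mapping-quality character, '$' marks a read end, and '+N'/'-N'
    introduce indels of N bases, which are skipped in this sketch.
    """
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2                      # skip '^' and the quality character
        elif c == "$":
            i += 1
        elif c in "+-":
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                raise ValueError(f"malformed indel in {bases!r}")
            i += 1 + len(m.group()) + int(m.group())
        else:
            counts[ref.upper() if c in ".," else c.upper()] += 1
            i += 1
    return counts


def parse_pileup_line(line):
    """Split one pileup line into named fields."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least 6 columns")
    chrom, pos, ref, depth, bases, quals = fields[:6]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "depth": int(depth),
        "counts": count_bases(ref, bases),
        "base_quals": [ord(q) - 33 for q in quals],  # assumes Phred+33
        "extra": fields[6:],                         # left uninterpreted
    }
```

Applied to the second line of the sample output (position 10469), this sketch reports four reads matching the reference C and one read carrying G.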
 +
 +
(TO BE CONTINUED)..
