From Genome Analysis Wiki
Jump to: navigation, search


394 bytes added, 14:46, 4 February 2013
Process sequencing file (BAM)
We illustrate how to obtain .seq file from BAM files in this section.
In this example, we use HGDP data set as a reference, which contains 938 individuals and 632,958 markers.
[[File:LASER-DataProcessing.png|thumb|center|alt=LASER workflow|400px|LASER Data Processing Procedure]]
1. Obtain pileup files from BAM files
We The first step is to generate a BED file:  cat ../resource/HGDP/ |awk '{if (NR > 1) {print $1, $2-1, $2;}}' > HGDP_938.bed This BED file contains the positions of all the reference markers.  Then we use ''samtools'' to extract the sequence bases overlapping the these 632,958 reference markers.Assuming your BAM file name is ''NA12878.chrom22.recal.bam'' (our example BAM file), you can use this
samtools mpileup -q 30 -Q 20 -f ../../LASER-resource/reference/hs37d5.fa -l HGDP_938.bed exampleBAM/NA12878.chrom22.recal.bam > NA12878.chrom22.pileup
to obtain a pileup file named ''NA12878.chrom22.pileup''. It is required to keep the ''.pileup' suffix.
2. Obtain a seq file from pileup files.
To convert After obtaining pileup files from each BAM file, you can convert them into a single seq file before running LASER, we first . Use the same site file and all generated pileup files from step 1 to generate a site seq file:  cat ../resource/HGDP/ |awk '{if (NR > 1) {print $1, $2-1, $2;}}' > HGDP_938.bed
Then use this python -m ../resource/HGDP/ file and all generated -o test NA12878.chrom22.pileup files from step 1 to generate a seq file:
python -m ../resource/HGDP/ -o You should obtain test NA12878.chrom22seq file after this step.pileup
== Estimate ancestries of sequence samples ==

Navigation menu