Changes

From Genome Analysis Wiki
Jump to: navigation, search

LASER

394 bytes added, 14:46, 4 February 2013
Process sequencing file (BAM)
We illustrate how to obtain .seq file from BAM files in this section.
In this example, we use HGDP data set as a reference, which contains 938 individuals and 632,958 markers.
[[File:LASER-DataProcessing.png|thumb|center|alt=LASER workflow|400px|LASER Data Processing Procedure]]
1. Obtain pileup files from BAM files
We The first step is to generate a BED file:  cat ../resource/HGDP/HGDP_938.site |awk '{if (NR > 1) {print $1, $2-1, $2;}}' > HGDP_938.bed This BED file contains the positions of all the reference markers.  Then we use ''samtools'' to extract the sequence bases overlapping the these 632,958 reference markers.Assuming your BAM file name is ''NA12878.chrom22.recal.bam'' (our example BAM file), you can use this
samtools mpileup -q 30 -Q 20 -f ../../LASER-resource/reference/hs37d5.fa -l HGDP_938.bed exampleBAM/NA12878.chrom22.recal.bam > NA12878.chrom22.pileup
 
to obtain a pileup file named ''NA12878.chrom22.pileup''. It is required to keep the ''.pileup' suffix.
2. Obtain a seq file from pileup files.
To convert After obtaining pileup files from each BAM file, you can convert them into a single seq file before running LASER, we first . Use the same site file and all generated pileup files from step 1 to generate a site seq file:  cat ../resource/HGDP/HGDP_938.site |awk '{if (NR > 1) {print $1, $2-1, $2;}}' > HGDP_938.bed
Then use this python pileup2seq.py -m ../resource/HGDP/HGDP_938.site file and all generated -o test NA12878.chrom22.pileup files from step 1 to generate a seq file:
python pileup2seq.py -m ../resource/HGDP/HGDP_938.site -o You should obtain test NA12878.chrom22seq file after this step.pileup
== Estimate ancestries of sequence samples ==
255
edits

Navigation menu