Changes

From Genome Analysis Wiki
Jump to navigationJump to search
394 bytes added ,  14:46, 4 February 2013
Line 39: Line 39:     
We illustrate how to obtain .seq file from BAM files in this section.  
 
We illustrate how to obtain .seq file from BAM files in this section.  
In this example, we use HGDP data set as reference, which contains 938 individuals and 632,958 markers.
+
In this example, we use HGDP data set as a reference, which contains 938 individuals and 632,958 markers.
 
[[File:LASER-DataProcessing.png|thumb|center|alt=LASER workflow|400px|LASER Data Processing Procedure]]  
 
[[File:LASER-DataProcessing.png|thumb|center|alt=LASER workflow|400px|LASER Data Processing Procedure]]  
    
1. Obtain pileup files from BAM files   
 
1. Obtain pileup files from BAM files   
   −
We use ''samtools'' to extract the sequence bases overlapping the 632,958 reference markers:
+
The first step is to generate a BED file:
 +
 
 +
cat ../resource/HGDP/HGDP_938.site |awk '{if (NR > 1) {print $1, $2-1, $2;}}' > HGDP_938.bed
 +
 
 +
This BED file contains the positions of all the reference markers. 
 +
 
 +
Then we use ''samtools'' to extract the sequence bases overlapping these 632,958 reference markers.
 +
Assuming your BAM file name is ''NA12878.chrom22.recal.bam'' (our example BAM file), you can use this:
 +
 
 
  samtools mpileup -q 30 -Q 20 -f ../../LASER-resource/reference/hs37d5.fa -l HGDP_938.bed exampleBAM/NA12878.chrom22.recal.bam > NA12878.chrom22.pileup
 
  samtools mpileup -q 30 -Q 20 -f ../../LASER-resource/reference/hs37d5.fa -l HGDP_938.bed exampleBAM/NA12878.chrom22.recal.bam > NA12878.chrom22.pileup
 +
 +
to obtain a pileup file named ''NA12878.chrom22.pileup''. It is required to keep the ''.pileup' suffix.
    
2. Obtain a seq file from pileup files.  
 
2. Obtain a seq file from pileup files.  
   −
To convert pileup files into a single seq file before running LASER, we first generate a site file:
+
After obtaining pileup files from each BAM file, you can convert them into a single seq file before running LASER.
 
+
Use the same site file and all generated pileup files from step 1 to generate a seq file:
cat ../resource/HGDP/HGDP_938.site |awk '{if (NR > 1) {print $1, $2-1, $2;}}' > HGDP_938.bed
     −
Then use this site file and all generated pileup files from step 1 to generate a seq file:
+
python pileup2seq.py  -m ../resource/HGDP/HGDP_938.site -o test NA12878.chrom22.pileup
   −
python pileup2seq.py  -m ../resource/HGDP/HGDP_938.site -o test NA12878.chrom22.pileup
+
You should obtain test.seq file after this step.
    
== Estimate ancestries of sequence samples  ==
 
== Estimate ancestries of sequence samples  ==
255

edits

Navigation menu