From Genome Analysis Wiki
Jump to navigationJump to search
394 bytes added
, 14:46, 4 February 2013
Line 39: |
Line 39: |
| | | |
| We illustrate how to obtain .seq file from BAM files in this section. | | We illustrate how to obtain .seq file from BAM files in this section. |
− | In this example, we use HGDP data set as reference, which contains 938 individuals and 632,958 markers. | + | In this example, we use HGDP data set as a reference, which contains 938 individuals and 632,958 markers. |
| [[File:LASER-DataProcessing.png|thumb|center|alt=LASER workflow|400px|LASER Data Processing Procedure]] | | [[File:LASER-DataProcessing.png|thumb|center|alt=LASER workflow|400px|LASER Data Processing Procedure]] |
| | | |
| 1. Obtain pileup files from BAM files | | 1. Obtain pileup files from BAM files |
| | | |
− | We use ''samtools'' to extract the sequence bases overlapping the 632,958 reference markers:
| + | The first step is to generate a BED file: |
| + | |
| + | cat ../resource/HGDP/HGDP_938.site |awk '{if (NR > 1) {print $1, $2-1, $2;}}' > HGDP_938.bed |
| + | |
| + | This BED file contains the positions of all the reference markers. |
| + | |
| + | Then we use ''samtools'' to extract the sequence bases overlapping these 632,958 reference markers. |
| + | Assuming your BAM file name is ''NA12878.chrom22.recal.bam'' (our example BAM file), you can use this: |
| + | |
| samtools mpileup -q 30 -Q 20 -f ../../LASER-resource/reference/hs37d5.fa -l HGDP_938.bed exampleBAM/NA12878.chrom22.recal.bam > NA12878.chrom22.pileup | | samtools mpileup -q 30 -Q 20 -f ../../LASER-resource/reference/hs37d5.fa -l HGDP_938.bed exampleBAM/NA12878.chrom22.recal.bam > NA12878.chrom22.pileup |
| + | |
| + | to obtain a pileup file named ''NA12878.chrom22.pileup''. It is required to keep the ''.pileup' suffix. |
| | | |
| 2. Obtain a seq file from pileup files. | | 2. Obtain a seq file from pileup files. |
| | | |
− | To convert pileup files into a single seq file before running LASER, we first generate a site file:
| + | After obtaining pileup files from each BAM file, you can convert them into a single seq file before running LASER. |
− | | + | Use the same site file and all generated pileup files from step 1 to generate a seq file: |
− | cat ../resource/HGDP/HGDP_938.site |awk '{if (NR > 1) {print $1, $2-1, $2;}}' > HGDP_938.bed
| |
| | | |
− | Then use this site file and all generated pileup files from step 1 to generate a seq file:
| + | python pileup2seq.py -m ../resource/HGDP/HGDP_938.site -o test NA12878.chrom22.pileup |
| | | |
− | python pileup2seq.py -m ../resource/HGDP/HGDP_938.site -o test NA12878.chrom22.pileup
| + | You should obtain test.seq file after this step. |
| | | |
| == Estimate ancestries of sequence samples == | | == Estimate ancestries of sequence samples == |