The reference coord file contains the principal component analysis (PCA) coordinates that LASER computes for the reference samples. In the coord file of sequencing samples, LASER infers each sample's ancestry by placing its coordinates in the reference PCA space.

An example result is shown below:

 popID indivID L1 Ci t PC1 PC2
[[File:LASER-DataProcessing.png|thumb|center|alt=LASER workflow|400px|LASER Data Processing Procedure]]

1. Obtain pileup files from BAM files

We use samtools to extract the bases at the 632,958 reference markers (-q 30 sets the minimum mapping quality and -Q 20 the minimum base quality):

 samtools mpileup -q 30 -Q 20 -f ../../LASER-resource/reference/hs37d5.fa -l HGDP_938.bed exampleBAM/NA12878.chrom22.recal.bam > NA12878.chrom22.pileup

2. Obtain seq files from pileup files

The BED file HGDP_938.bed used in step 1 is generated from the site file by skipping its header line and converting each 1-based position into a 0-based BED interval:

 awk 'NR > 1 {print $1, $2-1, $2}' ../resource/HGDP/HGDP_938.site > HGDP_938.bed

Then use the site file and all pileup files generated in step 1 to generate a seq file:

 python pileup2seq.py -m ../resource/HGDP/HGDP_938.site -o test NA12878.chrom22.pileup
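The conversion in step 2 is done by pileup2seq.py. As an illustration only, the sketch below shows one way to turn a single-sample samtools pileup line into a seq-file entry (read depth and reference base count). It is not the code of pileup2seq.py: the helper names are hypothetical, and we assume that depth counts only reads showing a base call ('.', ',' or A/C/G/T/N in either case), ignoring deleted-base placeholders and read start/end markers.

```python
import re
from typing import Tuple


def count_ref_bases(bases: str) -> Tuple[int, int]:
    """Return (depth, ref_count) for the read-bases column of a pileup line.

    Hypothetical helper: depth counts base calls ('.', ',' and A/C/G/T/N in
    either case); ref_count counts '.' (forward) and ',' (reverse) matches.
    """
    depth = ref = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start: '^' is followed by one mapping-quality character.
            i += 2
        elif c == "$":
            # Read end marker, not a base call.
            i += 1
        elif c in "+-":
            # Indel: '+' or '-', then the indel length, then that many bases.
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
        else:
            if c in ".,":
                ref += 1
                depth += 1
            elif c.upper() in "ACGTN":
                depth += 1
            # Other symbols (e.g. '*' for a deleted base) are not counted.
            i += 1
    return depth, ref


def pileup_to_seq_entry(pileup_line: str) -> str:
    """Format one single-sample pileup line as a 'depth ref_count' entry."""
    # Columns: chrom, position, reference base, read count, read bases, qualities.
    fields = pileup_line.rstrip("\n").split("\t")
    depth, ref = count_ref_bases(fields[4])
    return f"{depth} {ref}"
```

Under these assumptions, a marker with no line in the pileup file (no reads) would correspond to the seq entry "0 0".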
== Geno file ==
In our resource folder, we provide an example geno file for the HGDP data set (resource/HGDP/HGDP_938.geno):

 Brahui HGDP00001 1 2 1 1 0 2 0 2 1 2 2 2 1 1 2 1 0
 Brahui HGDP00019 0 2 2 0 0 1 0 1 0 2 1 2 2 1 2 2 0

The first and second columns are the population ID and individual ID.
From the third column on, each number is the genotype at one marker (coded 0, 1 or 2 in this example).
This geno file has 632,960 columns: the two ID columns followed by 632,958 markers (column 3 to the last column).
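A minimal sketch for reading one line of a geno file, assuming the columns are whitespace-delimited as in the example above and that every genotype is an integer. The function name and the optional marker-count check are illustrative and not part of LASER.

```python
from typing import List, Optional, Tuple


def parse_geno_line(line: str, n_markers: Optional[int] = None) -> Tuple[str, str, List[int]]:
    """Split a geno line into population ID, individual ID and genotypes."""
    fields = line.split()
    pop_id, indiv_id = fields[0], fields[1]
    # Columns 3..end hold one genotype per marker.
    genotypes = [int(g) for g in fields[2:]]
    if n_markers is not None and len(genotypes) != n_markers:
        raise ValueError(
            f"{indiv_id}: expected {n_markers} markers, found {len(genotypes)}"
        )
    return pop_id, indiv_id, genotypes
```

For the full HGDP file, passing n_markers=632958 would check every row against the expected marker count.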
== Seq file ==
A seq file organizes the sequencing information into a format that LASER can read.
The first two columns are the population ID and individual ID.
Each subsequent pair of columns gives, for one marker, the total read depth and the number of reads carrying the reference base.
For example, columns 3 and 4 are 0 and 0 in the following example, meaning that at the first marker the read depth is 0 and no read carries the reference base.
Markers must be separated by tabs, and the read depth and reference base count of a marker must be separated by a space.
An example seq file is shown below:
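The sketch below reads one seq-file line back into (read depth, reference base count) pairs. It assumes that the population and individual IDs are also tab-separated from the marker fields; parse_seq_line is a hypothetical name, not a LASER function.

```python
from typing import List, Tuple


def parse_seq_line(line: str) -> Tuple[str, str, List[Tuple[int, int]]]:
    """Return population ID, individual ID and per-marker (depth, ref_count)."""
    fields = line.rstrip("\n").split("\t")
    pop_id, indiv_id = fields[0], fields[1]
    markers = []
    for field in fields[2:]:
        # Within a marker, depth and reference base count are space-separated.
        depth_str, ref_str = field.split(" ")
        depth, ref = int(depth_str), int(ref_str)
        # A marker cannot have more reference-base reads than total reads.
        if not 0 <= ref <= depth:
            raise ValueError(f"invalid marker entry: {field!r}")
        markers.append((depth, ref))
    return pop_id, indiv_id, markers
```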
== Coord file ==
Coord files represent the ancestries of both reference samples and sequencing samples.
An example coord file looks like this:

 popID indivID L1 Ci t PC1 PC2
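To plot or compare ancestry coordinates, a coord file can be loaded by its header row. This is a minimal sketch that assumes whitespace-delimited columns and a first line holding the header shown above; only the PC columns are converted to numbers, and read_coord is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def read_coord(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse coord-file lines into one dict per sample, keyed by header names."""
    rows = iter(lines)
    header = next(rows).split()
    records = []
    for line in rows:
        if not line.strip():
            continue  # skip blank lines
        record: Dict[str, object] = dict(zip(header, line.split()))
        # Principal component columns (PC1, PC2, ...) become floats.
        for name in header:
            if name.startswith("PC"):
                record[name] = float(record[name])
        records.append(record)
    return records
```

A file object can be passed directly, e.g. read_coord(open(path)) for a coord file at a path of your choice.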
== Site file ==
A site file plays the role of a BED file here: it lists the marker positions. Unlike a BED file, it starts with a header line and uses 1-based positions, followed by the marker ID and the reference and alternative alleles. An example site file looks like this:
 CHR POS ID REF ALT
 1 752566 rs3094315 G A
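The awk command in the data processing steps converts a site file to BED; the same conversion can be written in Python. This sketch assumes whitespace-delimited columns and a single header line, as in the example above, and emits tab-separated BED lines; site_to_bed is a hypothetical helper.

```python
from typing import Iterable, List


def site_to_bed(site_lines: Iterable[str]) -> List[str]:
    """Convert site-file lines (CHR POS ID REF ALT) into BED intervals."""
    bed = []
    for i, line in enumerate(site_lines):
        if i == 0 or not line.strip():
            continue  # skip the header line and blank lines
        chrom, pos = line.split()[:2]
        # BED uses 0-based, half-open intervals: 1-based POS -> [POS-1, POS).
        start = int(pos) - 1
        bed.append(f"{chrom}\t{start}\t{pos}")
    return bed
```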
= Contact =
Please contact [mailto:aaron.wcl@gmail.com Chaolong Wang] if you have questions regarding the main program of LASER,
and [mailto:zhanxw@umich.edu Xiaowei Zhan] for questions related to preparing input files for LASER.
This project was directed by Gonçalo Abecasis and Sebastian Zöllner at the University of Michigan.