Difference between revisions of "LASER"
Line 94: | Line 94: | ||
You will need to download software from Chaonglong... | You will need to download software from Chaonglong... | ||
− | = How to interpret your | + | = How to interpret your results = |
You results will be output as xxx. | You results will be output as xxx. |
Revision as of 23:16, 24 December 2012
LASER Manual
LASER is xxxx. To use LASER, please read part1 and part2. Also example will guide you through a complete procedure of how to use laser to infer ancestry. If you have feedbacks, please see contact.
Part 1: Prepare data
Generate pileup data from BAM file
To prepare data for LASER, you will need BAM files, reference SNP list and region list (in BED format). We first use samtools to generate pileup file. Those files will summarize sequenced bases at reference SNP sites. We use samtools 0.1.18 (r982:295) as an example. A typical samtools command is listed below:
samtools mpileup -q 30 -Q 20 -f hs37d5.fa -l referenceSNP.bed A.bam > A.pileup
We recommend using '-q 30' to remove all mapped read with mapping quality less than 30, and using '-Q 20' to remove bases with base qualities less than 20. Here we use reference 'hs37d5.fa' which is NCBI build 37. Make sure your reference SNP sites also in NCBI build 37. samtools also requires referenceSNP.bed, which is in BED format and defines locations of reference SNPs. Please note BED format used 0-based index as starting position. If you have a SNP, rs3094315, located in chromosome 1 position 752566 (1-based), make sure your reference SNP file has this line:
1 752565 752566 rs3094315
After this step, you will have A.pileup file for sample A. Repeat the same pileup procedure for sample B, sample C, ....
Generate seq, cov files from pileup data
In this step, you need to use pileup2laser.py script. For CSG users, this file is located in /net/t2dgenes/zhanxw/amd/analyze/chaolong/pileup2laser.py
You will need target definition file, MAPA file and pileup data from previous step.
Target definition should be in BED format.
MAPA file is similar to MAP file used in PLINK, and use reference allele and alternative allele as two additional columns. Example:
1 rs3094315 0 752566 G A
Example pileup2laser.py command:
python pileup2laser.py -b target.bed -i amd.idFile -m HGDP_938.mapa -o pca A.pileup B.pileup C.pileup
You will obtain pca.seq and pca.cov file. These two files are needed in following LASER analysis. pca.seq looks like below:
A A 0 0 9 9 2 1 0 0 0 0 B B 0 0 9 9 4 3 0 0 0 0 C C 0 0 9 9 2 2 0 0 0 0
The first 6 columns follows normal PED file convention. The 7, 8 column listed the total coverage and number of reference alleles at the first marker; the 9, 10 column listed the total coverage and number of reference alleles at the second marker; and so on.
pca.map looks like below:
1 rs3094315 0 752566 G
It's content is the typical MAP in the first 4 columns and use reference allele as the 5th column.
It is sometimes more convenient to use alternative set of sample names in pca.seq files. So instead using A, B, C... from pileup file names,
we can conveniently define a conversion table to map pileup file name to specified, meaningful, sample names. For example, you can create an idFile:
A EUR1 B EUR2 C AFR1
Then use
python pileup2laser.py -b target.bed -i idFile -m HGDP_938.mapa -o pca A.pileup B.pileup C.pileup
In the generated pca.seq file, you will have these outputs:
EUR1 EUR1 0 0 9 9 2 1 0 0 0 0 EUR2 EUR2 0 0 9 9 4 3 0 0 0 0 AFR1 AFR1 0 0 9 9 2 2 0 0 0 0
NOTE: you genotype should be forward strand, mapping quality filter and base quality filter should be appropriately set up.
(Optionally) Summarize Coverage
To evaluate the coverage on off-target reference markers, you can generate per-sample and per-marker coverage file using script /net/t2dgenes/zhanxw/amd/analyze/chaolong/calculateCoverage.py
python calculateCoverage.py -o pca.pca.seq pca.map
The output files are pca.cov.sample and pca.cov.marker:
pca.cov.sample file:
PersonId length min q1 median mean q3 max sd ERU1 632958.000 0.000 0.000 0.000 0.027 0.000 47.000 0.257 EUR2 632958.000 0.000 0.000 0.000 0.031 0.000 69.000 0.264 AFR1 632958.000 0.000 0.000 0.000 0.026 0.000 35.000 0.244
pca.cov.marker file:
chrom markerName cm pos ref length min q1 median mean q3 max sd 1 rs3094315 0 752566 G 3159.000 0.000 0.000 0.000 0.797 1.000 8.000 1.202 1 rs12562034 0 768448 G 3159.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1 rs3934834 0 1005806 C 3159.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
Those files help you evaluate off-target coverage.
Part 2: LASER
We plan to address (maybe three) tasks here: simulation, PCA, and Procrustes You will need to download software from Chaonglong...
How to interpret your results
You results will be output as xxx. An example of outputs are like below
XXX
Example
We provide a toy example (downloadble from: xxx) The workflow of LASER analysis are explained step by step.
1. generate pileup
2. prepare laser input
3. run laser
Contact
Xiaowei Zhan zhanxw_AT_gmail.com (for data preparation)
Chaolong Wang chaolong_AT_umich.edu (for using Laser and interpreting results)