Changes

LASER (view source)

Revision as of 23:06, 30 January 2013

19 bytes added , 23:06, 30 January 2013

no edit summary

In the coord file of the reference samples, LASER reports the principal component analysis (PCA) coordinates calculated from the reference samples; in the coord file of the sequencing samples, LASER reports each sample's inferred ancestry as coordinates placed in that same reference PCA space.

An example result is shown below:

popID indivID L1 Ci t PC1 PC2

1. Obtain pileup files from BAM files

We use samtools to extract the bases at the 632,958 reference markers listed in HGDP_938.bed (generated from the site file in step 2):

samtools mpileup -q 30 -Q 20 -f ../../LASER-resource/reference/hs37d5.fa -l HGDP_938.bed exampleBAM/NA12878.chrom22.recal.bam > NA12878.chrom22.pileup
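The seq file needs, for each marker, the total read depth and the number of reads carrying the reference base. As a minimal sketch (not the logic of pileup2seq.py, which is not reproduced here), both numbers can be read from a single-sample pileup line using the standard samtools pileup conventions: column 4 is the depth, and in the read-base column `.` and `,` mark reference matches on the forward and reverse strands, `^` is followed by a mapping-quality character, `$` marks a read end, and `+N`/`-N` introduce an indel of N bases. The names `count_ref_bases` and `parse_pileup_line` are hypothetical.

```python
import re

# An indel annotation such as "+2AG" or "-1T": sign, length, then that many bases.
_INDEL = re.compile(r"[+-](\d+)")


def count_ref_bases(read_bases: str) -> int:
    """Count reads matching the reference base ('.' forward, ',' reverse strand)."""
    count = 0
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            # Read start: the following character is a mapping quality, not a base.
            i += 2
            continue
        if c in "+-":
            m = _INDEL.match(read_bases, i)
            if m:
                # Skip the sign, the length digits and the inserted/deleted bases.
                i = m.end() + int(m.group(1))
                continue
        if c in ".,":
            count += 1
        i += 1
    return count


def parse_pileup_line(line: str):
    """Return (chrom, pos, depth, ref_count) for one single-sample pileup line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, depth = fields[0], int(fields[1]), int(fields[3])
    read_bases = fields[4] if len(fields) > 4 else ""
    return chrom, pos, depth, count_ref_bases(read_bases)
```

For example, a line with depth 4 whose read-base column is `.,^!.$a` yields a reference count of 3: the `!` after `^` is a mapping quality and the `a` is a mismatch.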

2. Obtain seq files from pileup files

To convert pileup files into seq file format, we first generate a BED file (HGDP_938.bed) from the site file:

cat ../resource/HGDP/HGDP_938.site |awk '{if (NR > 1) {print $1, $2-1, $2;}}' > HGDP_938.bed
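For readers who prefer Python, the awk one-liner above can be sketched as follows: skip the header row, then turn each 1-based site position `POS` into the 0-based, half-open BED interval `[POS-1, POS)`. This is an illustrative equivalent, not part of LASER; the helper name `site_to_bed` is hypothetical, and it writes tab-separated columns (the usual BED convention) whereas the awk command writes space-separated ones.

```python
def site_to_bed(site_lines):
    """Convert site-file lines (header, then CHR POS ID REF ALT) to BED lines."""
    bed = []
    for n, line in enumerate(site_lines):
        if n == 0 or not line.strip():
            continue  # skip the header row (like NR > 1 in awk) and blank lines
        fields = line.split()
        chrom, pos = fields[0], int(fields[1])
        # BED is 0-based and half-open: 1-based position p becomes [p - 1, p).
        bed.append(f"{chrom}\t{pos - 1}\t{pos}")
    return bed
```

Passing an open file handle for HGDP_938.site works the same way as a list of strings, since the function only iterates over lines.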

Then use this site file and all generated pileup files from step 1 to generate a seq file:

python pileup2seq.py -m ../resource/HGDP/HGDP_938.site -o test NA12878.chrom22.pileup
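Conceptually, this step lines up the per-marker counts from each pileup with the marker order of the site file, recording markers with no reads as depth 0 and reference count 0. A minimal sketch, assuming the counts have already been extracted into a dictionary keyed by `(chrom, pos)` and that tabs also separate the two id columns (the seq file description only states the tab rule between markers); `make_seq_line` is a hypothetical helper, not a function from pileup2seq.py:

```python
def make_seq_line(pop_id, indiv_id, sites, counts):
    """Build one seq-file line for a single sample (hypothetical helper).

    sites  -- list of (chrom, pos) in site-file order
    counts -- dict mapping (chrom, pos) -> (depth, ref_count) from the pileup
    """
    fields = [pop_id, indiv_id]
    for site in sites:
        # Markers never seen in the pileup get depth 0 and reference count 0.
        depth, ref_count = counts.get(site, (0, 0))
        # A space separates depth and reference count within one marker.
        fields.append(f"{depth} {ref_count}")
    # Tabs separate markers (and, by assumption, the two id columns).
    return "\t".join(fields)
```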

== Geno file ==

In our resource folder, we provide an example geno file for the HGDP data set (resource/HGDP/HGDP_938.geno):

Brahui HGDP00001 1 2 1 1 0 2 0 2 1 2 2 2 1 1 2 1 0

Brahui HGDP00019 0 2 2 0 0 1 0 1 0 2 1 2 2 1 2 2 0

The first and second columns represent the population id and individual id.

From the third column onward, each number is the genotype at one marker.

This geno file has 632,960 columns: the two id columns followed by 632,958 markers in columns 3 through the last column.
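A quick way to check a geno file against this layout is to split each line on whitespace and count the genotype columns. The sketch below assumes whitespace-separated columns as in the example rows; `parse_geno_line` is a hypothetical helper, and it does not check genotype values because the coding of missing genotypes is not described here. Pass the expected number of markers as `n_markers` to catch truncated or misaligned rows.

```python
def parse_geno_line(line, n_markers=None):
    """Split a geno row into (popID, indivID, genotypes) (hypothetical helper)."""
    fields = line.split()
    pop_id, indiv_id = fields[0], fields[1]
    # Columns 3..end hold one integer genotype per marker.
    genotypes = [int(g) for g in fields[2:]]
    if n_markers is not None and len(genotypes) != n_markers:
        raise ValueError(
            f"{indiv_id}: expected {n_markers} markers, found {len(genotypes)}"
        )
    return pop_id, indiv_id, genotypes
```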

== Seq file ==

A seq file organizes sequencing information into a LASER-readable format.

The first two columns hold the population id and individual id.

Each subsequent pair of columns gives, for one marker, the total read depth followed by the number of reads carrying the reference base.

For example, in the seq file below, columns 3 and 4 are 0 and 0, meaning that at the first marker the read depth is 0 and no read carries the reference base.

Markers must be separated by tabs, and within each marker the read depth and reference base count must be separated by a space.
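Because each marker contributes one read depth and one reference base count, a seq line can be checked by splitting on any whitespace and reading the values in pairs starting at column 3. This minimal sketch does exactly that and also rejects pairs in which the reference count exceeds the depth; `parse_seq_line` is a hypothetical helper, and it does not verify the tab/space delimiter rule itself.

```python
def parse_seq_line(line):
    """Split a seq row into (popID, indivID, [(depth, ref_count), ...])."""
    tokens = line.split()  # any whitespace: tabs between markers, spaces within
    if len(tokens) < 2 or len(tokens) % 2 != 0:
        raise ValueError("expected popID, indivID, then (depth, ref_count) pairs")
    pop_id, indiv_id = tokens[0], tokens[1]
    markers = []
    # Columns 3/4, 5/6, ... pair up as (depth, reference base count).
    for depth_str, ref_str in zip(tokens[2::2], tokens[3::2]):
        depth, ref_count = int(depth_str), int(ref_str)
        if not 0 <= ref_count <= depth:
            raise ValueError(f"reference count {ref_count} exceeds depth {depth}")
        markers.append((depth, ref_count))
    return pop_id, indiv_id, markers
```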

An example seq file is shown below:

== Coord file ==

Coord files are used to represent the ancestries of both reference samples and sequencing samples.
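Since a coord file starts with a header row (`popID indivID L1 Ci t PC1 PC2` in the example below), a convenient way to load it is to map each row onto the header names. The sketch assumes whitespace-separated columns and converts only columns whose names start with `PC` to numbers; `read_coord` is a hypothetical helper, and the other columns (such as L1, Ci and t) are left as strings because their meaning is not described in this section.

```python
def read_coord(lines):
    """Load coord-file rows as dicts keyed by the header names (hypothetical helper)."""
    rows = iter(lines)
    header = next(rows).split()
    records = []
    for line in rows:
        values = line.split()
        if not values:
            continue  # ignore blank lines
        if len(values) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(values)}")
        record = dict(zip(header, values))
        # Principal-component columns (PC1, PC2, ...) become floats; others stay strings.
        for name in header:
            if name.startswith("PC"):
                record[name] = float(record[name])
        records.append(record)
    return records
```

Because it only iterates over lines, the function accepts an open file handle as well as a list of strings.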

An example coord file looks like this:

popID indivID L1 Ci t PC1 PC2

== Site file ==

A site file serves the same purpose as a BED file here: it records marker positions. An example site file looks like this:

CHR POS ID REF ALT

1 752566 rs3094315 G A

= Contact =

Please contact [mailto:aaron.wcl@gmail.com Chaolong Wang] if you have questions regarding the main program of LASER,

and [mailto:zhanxw@umich.edu Xiaowei Zhan] for questions related to preparing input files for LASER.

This project was directed by Gonçalo Abecasis and Sebastian Zöllner at the University of Michigan.

Zhanxw (255 edits)
