Changes

From Genome Analysis Wiki
19 bytes added, 23:06, 30 January 2013
no edit summary
Line 14: Line 14:  
Coord files for the reference samples contain the Principal Component Analysis (PCA) results that LASER computed for those samples; coord files for sequencing samples contain their inferred ancestries, obtained by placing each sample's ancestry coordinates in the reference coordinate space.

Coord files for the reference samples contain the Principal Component Analysis (PCA) results that LASER computed for those samples; coord files for sequencing samples contain their inferred ancestries, obtained by placing each sample's ancestry coordinates in the reference coordinate space.
   −
An example results is shown below:
+
An example result is shown below:
    
  popID  indivID  L1    Ci        t        PC1      PC2
 
  popID  indivID  L1    Ci        t        PC1      PC2
Line 37: Line 37:  
[[File:LASER-DataProcessing.png|thumb|center|alt=LASER workflow|400px|LASER Data Processing Procedure]]  
 
[[File:LASER-DataProcessing.png|thumb|center|alt=LASER workflow|400px|LASER Data Processing Procedure]]  
   −
1. BAM file => pileup file
+
1. Obtain pileup files from BAM files 
    
We use samtools to extract the bases at the 632,958 reference markers:

We use samtools to extract the bases at the 632,958 reference markers:
 
  samtools mpileup -q 30 -Q 20 -f ../../LASER-resource/reference/hs37d5.fa -l HGDP_938.bed exampleBAM/NA12878.chrom22.recal.bam > NA12878.chrom22.pileup
 
  samtools mpileup -q 30 -Q 20 -f ../../LASER-resource/reference/hs37d5.fa -l HGDP_938.bed exampleBAM/NA12878.chrom22.recal.bam > NA12878.chrom22.pileup
   −
2. pileup file => seq file
+
2. Obtain seq files from pileup files.
   −
To convert pile up file to seq file format, we first generate site file:
+
To convert pileup files into seq file format, we first generate a BED file from the site file:
    
  cat ../resource/HGDP/HGDP_938.site |awk '{if (NR > 1) {print $1, $2-1, $2;}}' > HGDP_938.bed
 
  cat ../resource/HGDP/HGDP_938.site |awk '{if (NR > 1) {print $1, $2-1, $2;}}' > HGDP_938.bed
   −
Then use this site file and all generated pileup files from step 1 to generate seq file:
+
Then use this site file and all generated pileup files from step 1 to generate a seq file:
    
  python pileup2seq.py  -m ../resource/HGDP/HGDP_938.site -o test NA12878.chrom22.pileup  
 
  python pileup2seq.py  -m ../resource/HGDP/HGDP_938.site -o test NA12878.chrom22.pileup  
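The awk one-liner above that writes HGDP_938.bed can be sketched in Python as follows. This is only an illustration of the same transformation (skip the header row, then print the chromosome, position minus one, and position); the function name `site_to_bed_lines` is hypothetical and is not part of LASER.

```python
def site_to_bed_lines(site_lines):
    """Turn site-file lines (first line is a header) into BED-style lines.

    Mirrors the awk command: for every line after the header, emit
    "<chrom> <pos-1> <pos>", separated by single spaces as awk's
    default output separator would do.
    """
    bed_lines = []
    for line_number, line in enumerate(site_lines, start=1):
        if line_number == 1:
            continue  # skip the header row, like "NR > 1" in awk
        fields = line.split()  # split on any whitespace, like awk
        if len(fields) < 2:
            continue  # assumption: silently ignore blank or short lines
        chrom, pos = fields[0], int(fields[1])
        bed_lines.append(f"{chrom} {pos - 1} {pos}")
    return bed_lines


if __name__ == "__main__":
    example = ["CHR POS ID REF ALT", "1 752566 rs3094315 G A"]
    for bed_line in site_to_bed_lines(example):
        print(bed_line)
```

The subtraction reflects that the site file stores a single position per marker, while the BED interval written here runs from the position minus one to the position itself.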
Line 65: Line 65:     
== Geno file  ==
 
== Geno file  ==
In our resource folder, we provided an example geno file for the HGDP data set (resource/HGDP/HGDP_938.geno):
+
In our resource folder, we provide an example geno file for the HGDP data set (resource/HGDP/HGDP_938.geno):
    
  Brahui HGDP00001 1 2 1 1 0 2 0 2 1 2 2 2 1 1 2 1 0
 
  Brahui HGDP00001 1 2 1 1 0 2 0 2 1 2 2 2 1 1 2 1 0
Line 78: Line 78:  
  Brahui HGDP00019 0 2 2 0 0 1 0 1 0 2 1 2 2 1 2 2 0
 
  Brahui HGDP00019 0 2 2 0 0 1 0 1 0 2 1 2 2 1 2 2 0
   −
The first and second columns represents the population id and individual id.  
+
The first and second columns represent the population id and individual id.  
 
From the third column onward, each number represents the genotype at one marker.

From the third column onward, each number represents the genotype at one marker.
 
This geno file has 632,960 columns: the two ID columns followed by 632,958 markers, from column 3 to the last column.

This geno file has 632,960 columns: the two ID columns followed by 632,958 markers, from column 3 to the last column.
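The column layout described above can be read with a few lines of Python. This is a minimal sketch, assuming the columns are separated by whitespace as they appear in the example; `parse_geno_line` is a hypothetical helper, not a LASER function.

```python
def parse_geno_line(line):
    """Split one geno-file line into (population id, individual id, genotypes).

    Assumption: columns are whitespace-separated and every genotype
    is written as an integer, as in the example rows.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected two ID columns followed by genotypes")
    pop_id, indiv_id = fields[0], fields[1]
    genotypes = [int(value) for value in fields[2:]]
    return pop_id, indiv_id, genotypes


if __name__ == "__main__":
    row = "Brahui HGDP00001 1 2 1 1 0 2 0 2 1 2 2 2 1 1 2 1 0"
    pop_id, indiv_id, genotypes = parse_geno_line(row)
    # The number of markers is the number of columns minus the two ID columns.
    print(pop_id, indiv_id, len(genotypes))
```

Counting the parsed genotypes is a quick way to check that a geno file has as many markers as the matching site file.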
    
== Seq file  ==
 
== Seq file  ==
Seq file organize the sequencing information into LASER readable format.
+
A seq file organizes the sequencing information into a LASER-readable format.
 
The first two columns are intended for population id and individual id.
 
The first two columns are intended for population id and individual id.
 
Subsequent columns are total read depth and reference base count.
 
Subsequent columns are total read depth and reference base count.
 
For example, columns 3 and 4 are 0, 0 in the following example, meaning that at the first marker the read depth is 0 and no read carries the reference base.

For example, columns 3 and 4 are 0, 0 in the following example, meaning that at the first marker the read depth is 0 and no read carries the reference base.
We enforce tab delimiters between markers and space delimiters between read depth and reference base counts.
+
We enforce tab delimiters between markers and space delimiters between read depth and reference base counts.
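The delimiter rule above (tabs between markers, a space between read depth and reference base count) can be illustrated with a small parser. This is a sketch under one assumption not stated here: that the two ID columns are also tab-separated from each other and from the markers. The name `parse_seq_line` and the sample IDs are hypothetical.

```python
def parse_seq_line(line):
    """Split one seq-file line into (population id, individual id, markers).

    Each marker is returned as a (total_read_depth, reference_base_count)
    tuple. Assumption: all top-level fields, including the two ID
    columns, are tab-separated.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected at least the two ID columns")
    pop_id, indiv_id = fields[0], fields[1]
    markers = []
    for field in fields[2:]:
        # Within one marker, depth and reference count are space-separated.
        depth_text, ref_text = field.split(" ")
        markers.append((int(depth_text), int(ref_text)))
    return pop_id, indiv_id, markers


if __name__ == "__main__":
    # Hypothetical line: depth 0 at the first marker, depth 3 with
    # 2 reference bases at the second.
    print(parse_seq_line("POP1\tIND1\t0 0\t3 2"))
```

Splitting on the tab first and the space second keeps each marker's two numbers together, which is why the two delimiters must not be interchanged.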
 
An example seq file is shown below:
 
An example seq file is shown below:
   Line 120: Line 120:     
== Coord file  ==
 
== Coord file  ==
Coord file are used to represents the ancestries of both reference samples and sequencing samples.
+
Coord files are used to represent the ancestries of both reference samples and sequencing samples.
An example coord file look like below:
+
An example coord file looks like below:
    
  popID  indivID  L1    Ci        t        PC1      PC2
 
  popID  indivID  L1    Ci        t        PC1      PC2
Line 135: Line 135:     
== Site file ==
 
== Site file ==
Site file are equivalent to BED file and it is used here to represent marker positions. An example site file looks like below:
+
A site file is similar to a BED file and is used here to represent marker positions. An example site file looks like this:
 
  CHR  POS      ID          REF  ALT
 
  CHR  POS      ID          REF  ALT
 
  1    752566  rs3094315  G    A
 
  1    752566  rs3094315  G    A
Line 152: Line 152:  
= Contact  =
 
= Contact  =
 
Please contact [mailto:aaron.wcl@gmail.com Chaolong Wang] if you have questions regarding the main program of LASER,  
 
Please contact [mailto:aaron.wcl@gmail.com Chaolong Wang] if you have questions regarding the main program of LASER,  
and [mailto:bingshan@umich.edu Xiaowei Zhan] for questions related to preparing input files for LASER.
+
and [mailto:zhanxw@umich.edu Xiaowei Zhan] for questions related to preparing input files for LASER.
 
This project was directed by Gonçalo Abecasis and Sebastian Zöllner at the University of Michigan.
 
This project was directed by Gonçalo Abecasis and Sebastian Zöllner at the University of Michigan.