Changes

From Genome Analysis Wiki
Line 30: Line 30:     
== Preparing input files for LASER ==
− = Step 0: vcf --> geno =
+ === Step 0: vcf --> geno ===
    
This step prepares the reference panel by converting a VCF genotype file to a GENO file. We will skip this step and use a ready-to-use HGDP reference panel. A typical command to run the vcf2geno tool is given in the file "./LASER-2.01/vcf2geno/cmd.sh":
Line 37: Line 37:
− Step 1: bam --> pileup
+ === Step 1: bam --> pileup ===
    
This step uses samtools to generate pileup files from bam files.  
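As a rough sketch of how this step could be repeated for several BAM files, the helper below prints one mpileup command per sample, using the same flags as the tutorial command (`-q 30` minimum mapping quality, `-Q 20` minimum base quality, `-f` reference FASTA, `-l` list of sites). The function name `pileup_cmd` and the loop are illustrative assumptions, not part of LASER or samtools; `GC`, `REF`, `HGDP` and `BAM` are the tutorial's environment variables.

```shell
# Dry-run sketch: print (do not execute) one samtools mpileup command
# per sample. pileup_cmd is a hypothetical helper, not a LASER tool.
pileup_cmd() {
    sample=$1   # BAM file name without the ".recal.bam" suffix
    echo "$GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/hs37d5.fa.rz -l $HGDP/HGDP_938.bed $BAM/$sample.recal.bam > $sample.recal.pileup"
}

# The sample ID from the tutorial; add further IDs to the list as needed.
for sample in 121101861; do
    pileup_cmd "$sample"
done
```

Piping the printed lines to `sh` would run them; checking the printed commands first is a cheap way to catch a wrong path.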
Line 50: Line 50:  
  # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/hs37d5.fa.rz -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup &
− Step 2: pileup --> seq
+ === Step 2: pileup --> seq ===
    
In this step, we will generate a file called "hapmap_trios.seq", containing the information of 6 samples. It takes about 30 seconds to run.
Line 72: Line 72:  
== Estimating ancestry coordinates ==
− Step 0: Generate the reference ancestry space (using the PCA mode of the LASER program)
+ === Step 0: Generate the reference ancestry space ===
+
+ LASER can perform principal components analysis (PCA) on genotype data of the reference panel to generate a reference ancestry space.
    
  # ./LASER-2.01/laser -g $HGDP/HGDP_938.geno -pca 1 -k 30 -o HGDP_938
Line 82: Line 84:  
  less -S $HGDP/HGDP_938.RefPC.coord
− Step 1: Place sequenced samples into the reference ancestry space:
+ === Step 1: Estimate ancestry for sequenced samples ===
+
+ Submit two jobs to place sequenced samples into the reference ancestry space:
    
  ./LASER-2.01/laser -g $HGDP/HGDP_938.geno -c $HGDP/HGDP_938.RefPC.coord -s hapmap_trios.seq -K 20 -k 4 -x 1 -y 3 -o hapmap_trios.1-3 &
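Only the first of the two jobs is visible in this diff. The sketch below prints both commands, assuming the second job differs only in its sample range (`-x 4 -y 6`) and output prefix; that assumption is inferred from the output file name "hapmap_trios.4-6.SeqPC.coord" mentioned in Step 2, and `-x`/`-y` are taken here to be the first and last sample index to process. The helper name `laser_cmd` is hypothetical.

```shell
# Dry-run sketch: print the LASER command for one block of samples.
# laser_cmd is a hypothetical helper; all flags are copied from the
# tutorial command, only the sample range and output prefix vary.
laser_cmd() {
    first=$1   # index of the first sample to process (-x)
    last=$2    # index of the last sample to process (-y)
    echo "./LASER-2.01/laser -g $HGDP/HGDP_938.geno -c $HGDP/HGDP_938.RefPC.coord -s hapmap_trios.seq -K 20 -k 4 -x $first -y $last -o hapmap_trios.$first-$last"
}

laser_cmd 1 3   # job 1: samples 1 to 3 (as in the tutorial)
laser_cmd 4 6   # job 2: samples 4 to 6 (assumed range)
```

To run the jobs in parallel, each printed command can be started with a trailing `&`, followed by `wait` before moving on to Step 2.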
Line 91: Line 95:  
The running time is ~10 minutes for processing 3 samples in each job.
− Step 2: Combine results
+ === Step 2: Combine results ===
    
Results from the previous step are written to two files: "hapmap_trios.1-3.SeqPC.coord" and "hapmap_trios.4-6.SeqPC.coord".
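One possible way to combine the two files is sketched below. It assumes each coordinate file begins with exactly one header line, which this diff does not confirm; the function name `combine_coord` and the merged file name used in the note afterwards are hypothetical.

```shell
# Sketch: merge coordinate files, keeping the header of the first file
# and appending the data rows of every file.
# Assumption: each input file starts with exactly one header line.
combine_coord() {
    out=$1
    shift
    head -n 1 "$1" > "$out"          # header from the first input file
    for f in "$@"; do
        tail -n +2 "$f" >> "$out"    # data rows, header line skipped
    done
}
```

For the files above, the call would be `combine_coord hapmap_trios.SeqPC.coord hapmap_trios.1-3.SeqPC.coord hapmap_trios.4-6.SeqPC.coord`, where the first argument is the output file.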