From Genome Analysis Wiki
196 bytes added
, 21:46, 18 June 2014
Line 30: |
Line 30: |
| | | |
| == Preparing input files for LASER == | | == Preparing input files for LASER == |
− | = Step 0: vcf --> geno = | + | === Step 0: vcf --> geno === |
| | | |
| This step prepares the reference panel by converting a VCF genotype file to a GENO file. We will skip this step and use a ready-to-use HGDP reference panel. A typical command to run the vcf2geno tool is given in the file "./LASER-2.01/vcf2geno/cmd.sh": | | This step prepares the reference panel by converting a VCF genotype file to a GENO file. We will skip this step and use a ready-to-use HGDP reference panel. A typical command to run the vcf2geno tool is given in the file "./LASER-2.01/vcf2geno/cmd.sh": |
Line 37: |
Line 37: |
| | | |
| | | |
− | Step 1: bam --> pileup | + | === Step 1: bam --> pileup === |
| | | |
| This step uses samtools to generate pileup files from bam files. | | This step uses samtools to generate pileup files from bam files. |
Line 50: |
Line 50: |
| # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/hs37d5.fa.rz -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup & | | # $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/hs37d5.fa.rz -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup & |
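With more than one BAM file, the same mpileup command has to be repeated once per sample. The sketch below is one hedged way to do that: `pileup_cmd` is a hypothetical helper (not part of LASER or samtools) that only prints the command for a given BAM file, reusing the flags from the command above and assuming `$GC`, `$REF` and `$HGDP` are already set as in this tutorial.

```shell
# Hypothetical helper: print the mpileup command for one BAM file (dry run).
# Flags are copied from the tutorial command; $GC, $REF and $HGDP must be set.
pileup_cmd() {
    bam=$1
    # Name the output after the BAM file: sample.bam -> sample.pileup
    out="$(basename "$bam" .bam).pileup"
    printf '%s\n' "$GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/hs37d5.fa.rz -l $HGDP/HGDP_938.bed $bam > $out"
}

# Example (commented out): print one command per BAM file in $BAM.
# for bam in "$BAM"/*.bam; do pileup_cmd "$bam"; done
```

Printing the commands first makes it easy to inspect them before piping the output to `sh` or to a job scheduler.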
| | | |
− | Step 2: pileup --> seq | + | === Step 2: pileup --> seq === |
| | | |
| In this step, we will generate a file called "hapmap_trios.seq", containing information for 6 samples. It takes about 30 seconds to run. | | In this step, we will generate a file called "hapmap_trios.seq", containing information for 6 samples. It takes about 30 seconds to run. |
Line 72: |
Line 72: |
| == Estimating ancestry coordinates == | | == Estimating ancestry coordinates == |
| | | |
− | Step 0: Generate the reference ancestry space (using the PCA mode of the LASER program) | + | === Step 0: Generate the reference ancestry space === |
| + | |
| + | LASER can perform principal components analysis (PCA) on genotype data of the reference panel to generate a reference ancestry space. |
| | | |
| # ./LASER-2.01/laser -g $HGDP/HGDP_938.geno -pca 1 -k 30 -o HGDP_938 | | # ./LASER-2.01/laser -g $HGDP/HGDP_938.geno -pca 1 -k 30 -o HGDP_938 |
Line 82: |
Line 84: |
| less -S $HGDP/HGDP_938.RefPC.coord | | less -S $HGDP/HGDP_938.RefPC.coord |
| | | |
− | Step 1: Place sequenced samples into the reference ancestry space: | + | === Step 1: Estimate ancestry for sequenced samples === |
| + | |
| + | Submit two jobs to place sequenced samples into the reference ancestry space: |
| | | |
| ./LASER-2.01/laser -g $HGDP/HGDP_938.geno -c $HGDP/HGDP_938.RefPC.coord -s hapmap_trios.seq -K 20 -k 4 -x 1 -y 3 -o hapmap_trios.1-3 & | | ./LASER-2.01/laser -g $HGDP/HGDP_938.geno -c $HGDP/HGDP_938.RefPC.coord -s hapmap_trios.seq -K 20 -k 4 -x 1 -y 3 -o hapmap_trios.1-3 & |
Line 91: |
Line 95: |
| The running time is ~10 minutes for processing 3 samples in each job. | | The running time is ~10 minutes for processing 3 samples in each job. |
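Judging from the command above and its output name "hapmap_trios.1-3", the `-x` and `-y` options appear to give the first and last sample index handled by a job. Under that assumption, the following sketch shows how the index ranges for any number of samples could be worked out; `job_ranges` is a hypothetical helper, not part of LASER.

```shell
# Hypothetical helper: print "first last" sample index ranges, one per job.
job_ranges() {
    total=$1   # number of samples in the .seq file
    size=$2    # samples per job
    first=1
    while [ "$first" -le "$total" ]; do
        last=$((first + size - 1))
        # The final job may cover fewer samples than the others
        if [ "$last" -gt "$total" ]; then last=$total; fi
        echo "$first $last"
        first=$((last + 1))
    done
}
```

For the 6 samples in this tutorial, `job_ranges 6 3` prints the two ranges `1 3` and `4 6`, which would be passed as `-x` and `-y` to the two jobs.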
| | | |
− | Step 2: Combine results | + | === Step 2: Combine results === |
| | | |
| Results from the previous step are written to two files, "hapmap_trios.1-3.SeqPC.coord" and "hapmap_trios.4-6.SeqPC.coord". | | Results from the previous step are written to two files, "hapmap_trios.1-3.SeqPC.coord" and "hapmap_trios.4-6.SeqPC.coord". |
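As a minimal sketch of how the two files might be combined, the helper below keeps the header of the first file and appends the data rows of every file. It assumes each coordinate file begins with exactly one identical header line; `combine_coord` and the output name used in the example are hypothetical, not part of LASER.

```shell
# Hypothetical helper: merge coordinate files, keeping a single header line.
# Assumes every input file starts with the same one-line header.
combine_coord() {
    out=$1
    shift
    # Take the header from the first input file only
    head -n 1 "$1" > "$out"
    # Append the data rows (line 2 onward) of every file, in the order given
    for f in "$@"; do
        tail -n +2 "$f" >> "$out"
    done
}

# Example (commented out; the output file name is an assumption):
# combine_coord hapmap_trios.SeqPC.coord hapmap_trios.1-3.SeqPC.coord hapmap_trios.4-6.SeqPC.coord
```

Passing the files in sample order keeps the rows of the combined file in the same order as the samples in "hapmap_trios.seq".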