Changes

SeqShop: Estimates of Genetic Ancestry Practical, June 2014

Revision as of 21:46, 18 June 2014

196 bytes added, 21:46, 18 June 2014

→ Step 0: vcf --> geno

Line 30: Line 30:

== Preparing input files for LASER ==

−

= Step 0: vcf --> geno =

+

=== Step 0: vcf --> geno ===

This step prepares the reference panel by converting a VCF genotype file to a GENO file. We will skip this step and use a ready-to-use HGDP reference panel. A typical command to run the vcf2geno tool is given in the file "./LASER-2.01/vcf2geno/cmd.sh":

Line 37: Line 37:

−

Step 1: bam --> pileup

+

=== Step 1: bam --> pileup ===

This step uses samtools to generate pileup files from bam files.

Line 50: Line 50:

# $GC/bin/samtools mpileup -q 30 -Q 20 -f $REF/hs37d5.fa.rz -l $HGDP/HGDP_938.bed $BAM/121101861.recal.bam > 121101861.recal.pileup &
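In the samtools command above, `-q 30` and `-Q 20` set the minimum mapping quality and minimum base quality, `-f` supplies the reference FASTA, and `-l` restricts output to the sites listed in the HGDP BED file. To make the pileup output concrete, here is a minimal Python sketch (not part of LASER; the function names are hypothetical) that reduces the read-bases column of one pileup line to a total read depth and a count of reads matching the reference, presumably the kind of per-site summary the next step works from.

```python
def count_pileup_bases(read_bases):
    """Return (depth, ref_count) for a samtools pileup read-bases string.

    '.' and ',' are reference matches on the forward and reverse strand;
    A/C/G/T/N in either case are mismatches. Read starts ('^' plus one
    mapping-quality character), read ends ('$'), deletion placeholders
    ('*', '#') and reference skips ('>', '<') are not counted, and the
    sequence of an insertion or deletion ('+2AG', '-1T') is skipped.
    """
    depth = ref_count = 0
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":  # read start: skip '^' and the mapping-quality character
            i += 2
            continue
        if c in "+-":  # indel: parse the length, then skip that many bases
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
            continue
        if c in ".,":
            depth += 1
            ref_count += 1
        elif c in "ACGTNacgtn":
            depth += 1
        i += 1
    return depth, ref_count


def parse_pileup_line(line):
    """Split one pileup line into (chrom, pos, ref_base, depth, ref_count).

    Only the first five tab-separated columns are used; the depth is
    recomputed from the read-bases column instead of taken from column 4.
    """
    chrom, pos, ref_base, _reported_depth, read_bases = line.rstrip("\n").split("\t")[:5]
    depth, ref_count = count_pileup_bases(read_bases)
    return chrom, int(pos), ref_base, depth, ref_count
```

For example, `parse_pileup_line("1\t100\tA\t5\t..,,G\tIIIII")` returns `("1", 100, "A", 5, 4)`: five reads cover the site and four of them carry the reference base.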

−

Step 2: pileup --> seq

+

=== Step 2: pileup --> seq ===

In this step, we will generate a file called "hapmap_trios.seq", containing data for 6 samples. It takes about 30 seconds to run.
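A SEQ file gathers per-site read summaries for every sample. The sketch below assumes a row layout of population ID, individual ID, then a (read depth, reference-allele read count) pair for each reference-panel site, with `0 0` where a sample has no reads; please confirm the exact format in the LASER manual. It also handles the off-by-one detail when sites come from a BED file, whose starts are 0-based, while pileup positions are 1-based. The names `sites_from_bed` and `seq_row` are hypothetical.

```python
def sites_from_bed(bed_lines):
    """Return 1-based (chrom, pos) sites covered by BED intervals, in file order.

    BED starts are 0-based and ends are exclusive, so the interval
    '1  99  100' covers the single base that pileup reports at position 100.
    """
    sites = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        for pos in range(int(start) + 1, int(end) + 1):
            sites.append((chrom, pos))
    return sites


def seq_row(pop_id, indiv_id, sites, counts):
    """Build one SEQ-style row under the assumed layout:
    popID, indivID, then a depth/ref_count pair per site.

    counts maps (chrom, pos) -> (depth, ref_count) for this sample;
    sites without reads are written as depth 0 and ref_count 0.
    """
    row = [pop_id, indiv_id]
    for site in sites:
        depth, ref_count = counts.get(site, (0, 0))
        row.extend([depth, ref_count])
    return row
```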

Line 72: Line 72:

== Estimating ancestry coordinates ==

−

Step 0: Generate the reference ancestry space (using the PCA mode of the LASER program)

+

=== Step 0: Generate the reference ancestry space ===

+

LASER can perform principal components analysis (PCA) on genotype data of the reference panel to generate a reference ancestry space.

# ./LASER-2.01/laser -g $HGDP/HGDP_938.geno -pca 1 -k 30 -o HGDP_938

Line 82: Line 84:

less -S $HGDP/HGDP_938.RefPC.coord
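The reference ancestry space comes from a PCA of the reference genotypes. As a rough, dependency-free illustration of standard genotype PCA (standardize each SNP, form the individual-by-individual covariance matrix, extract its leading eigenvectors by power iteration with deflation), consider the sketch below. It assumes complete 0/1/2 genotypes with no missing data, and scaling each coordinate as eigenvector times the square root of its eigenvalue is an assumed convention; LASER's own normalization may differ.

```python
import math
import random


def standardize(geno):
    """Center each SNP at 2p and divide by sqrt(2p(1-p)), p = allele frequency.

    geno: list of individuals, each a list of 0/1/2 genotypes with no missing
    values. Monomorphic columns (p = 0 or 1) are left as zeros.
    """
    n, m = len(geno), len(geno[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        mean = sum(row[j] for row in geno) / n  # equals 2p
        p = mean / 2
        sd = math.sqrt(2 * p * (1 - p))
        if sd > 0:
            for i in range(n):
                z[i][j] = (geno[i][j] - mean) / sd
    return z


def top_pcs(geno, k, iterations=500, seed=0):
    """Return top-k PC coordinates, one list of k values per individual."""
    z = standardize(geno)
    n, m = len(z), len(z[0])
    # Individual-by-individual covariance matrix M = Z Z^T / m.
    cov = [[sum(z[a][s] * z[b][s] for s in range(m)) / m for b in range(n)]
           for a in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iterations):  # power iteration
            w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the converged vector.
        mv = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        lam = sum(v[a] * mv[a] for a in range(n))
        for a in range(n):
            # Assumed scaling convention: eigenvector * sqrt(eigenvalue).
            coords[a][c] = v[a] * math.sqrt(max(lam, 0.0))
        # Deflate: remove this component before finding the next one.
        for a in range(n):
            for b in range(n):
                cov[a][b] -= lam * v[a] * v[b]
    return coords
```

With two clearly separated groups of individuals, the first coordinate returned by `top_pcs(geno, k=2)` has one sign for the first group and the opposite sign for the second. A real panel such as HGDP_938 would call for a proper linear-algebra library rather than this quadratic-time loop.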

−

Step 1: Place sequenced samples into the reference ancestry space:

+

=== Step 1: Estimate ancestry for sequenced samples ===

+

Submit two jobs to place sequenced samples into the reference ancestry space:

./LASER-2.01/laser -g $HGDP/HGDP_938.geno -c $HGDP/HGDP_938.RefPC.coord -s hapmap_trios.seq -K 20 -k 4 -x 1 -y 3 -o hapmap_trios.1-3 &
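The two jobs split the 6 samples into halves. Judging from the output names (`hapmap_trios.1-3`, `hapmap_trios.4-6`) and the statement in this step that each job processes 3 samples, `-x` and `-y` appear to give the first and last sample index (1-based, inclusive) handled by a job. This reading is an assumption, so confirm it in the LASER manual. Under that assumption, a small hypothetical helper can generate one command per batch for any sample count:

```python
def job_ranges(n_samples, per_job):
    """Split sample indices 1..n_samples into consecutive inclusive batches."""
    return [(start, min(start + per_job - 1, n_samples))
            for start in range(1, n_samples + 1, per_job)]


def laser_commands(n_samples, per_job, prefix="hapmap_trios"):
    """Build one background LASER command per batch, mirroring this practical.

    Assumption: -x/-y are the first/last (1-based, inclusive) sample indices.
    """
    commands = []
    for x, y in job_ranges(n_samples, per_job):
        commands.append(
            "./LASER-2.01/laser -g $HGDP/HGDP_938.geno -c $HGDP/HGDP_938.RefPC.coord "
            f"-s {prefix}.seq -K 20 -k 4 -x {x} -y {y} -o {prefix}.{x}-{y} &"
        )
    return commands
```

For example, `print("\n".join(laser_commands(12, 3)))` prints four command lines ready to paste into a shell; `laser_commands(6, 3)` reproduces the two jobs of this practical.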

Line 91: Line 95:

Each job processes 3 samples and takes about 10 minutes to run.
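LASER places each sequenced sample into the reference space by Procrustes analysis. In outline, a PCA is computed that includes the sequenced sample, and the similarity transformation (translation, rotation or reflection, and uniform scaling) that best maps the reference individuals' coordinates from that PCA onto their reference-space coordinates is then applied to the sample. The sketch below shows only that final matching step, restricted to two dimensions for brevity by treating points as complex numbers. It illustrates ordinary Procrustes fitting, not LASER's code, and `fit_procrustes_2d` is a hypothetical name.

```python
def fit_procrustes_2d(source, target):
    """Least-squares similarity transform mapping 2-D source points onto target.

    source, target: equal-length lists of (x, y) pairs for the same individuals.
    Returns a function that applies the fitted transform to any (x, y) point.
    Rotation, reflection, uniform scaling and translation are allowed.
    """
    s = [complex(x, y) for x, y in source]
    t = [complex(x, y) for x, y in target]
    n = len(s)
    s_mean, t_mean = sum(s) / n, sum(t) / n
    sc = [p - s_mean for p in s]  # centered source
    tc = [p - t_mean for p in t]  # centered target
    ss = sum(abs(p) ** 2 for p in sc)
    best = None
    for reflect in (False, True):
        src = [p.conjugate() if reflect else p for p in sc]
        # Complex least squares: a = scale * exp(i * angle).
        a = sum(q * p.conjugate() for p, q in zip(src, tc)) / ss
        resid = sum(abs(q - a * p) ** 2 for p, q in zip(src, tc))
        if best is None or resid < best[0]:
            best = (resid, reflect, a)
    _, reflect, a = best

    def apply(point):
        z = complex(*point) - s_mean
        if reflect:
            z = z.conjugate()
        w = a * z + t_mean
        return (w.real, w.imag)

    return apply
```

In use, `source` would hold the reference individuals' coordinates from the PCA that includes the study sample, `target` their coordinates in the reference space, and the returned function would then be applied to the study sample's own coordinates.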

−

Step 2: Combine results

+

=== Step 2: Combine results ===

Results from the previous step are written to two files, "hapmap_trios.1-3.SeqPC.coord" and "hapmap_trios.4-6.SeqPC.coord".
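Combining the two outputs amounts to concatenating them while keeping a single header. A minimal sketch, assuming each `.SeqPC.coord` file begins with one header line followed by one row per sample (worth checking with `less -S` first); the helper names are hypothetical.

```python
def combine_coord_texts(texts):
    """Merge coordinate-file contents, keeping the header of the first file only.

    Assumes every file starts with one header line; blank lines are dropped.
    """
    merged = []
    for text in texts:
        lines = text.splitlines()
        if not lines:
            continue
        if not merged:
            merged.append(lines[0])  # header taken from the first non-empty file
        merged.extend(line for line in lines[1:] if line.strip())
    return "\n".join(merged) + "\n" if merged else ""


def combine_coord_files(paths, out_path):
    """Read each coordinate file in order and write the merged result to out_path."""
    texts = []
    for path in paths:
        with open(path) as handle:
            texts.append(handle.read())
    with open(out_path, "w") as out:
        out.write(combine_coord_texts(texts))
```

For example, `combine_coord_files(["hapmap_trios.1-3.SeqPC.coord", "hapmap_trios.4-6.SeqPC.coord"], "hapmap_trios.SeqPC.coord")` writes all 6 samples to one file (the output file name here is hypothetical).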

Chaolong Wang (111 edits)
