Changes

3,506 bytes added, 17:09, 17 December 2014

→‎What does a real SV look like?

Line 2: Line 2:

Main Workshop wiki page: [[SeqShop: December 2014]]

−

See ~~the~~ [[Media:Seqshop cnv partb 2014 06.pdf|~~introductory~~ slides]] for ~~an intro to~~ this tutorial.

+

See the [[Media:Seqshop cnv partb 2014 06.pdf|lecture slides]] associated with this tutorial.

== Goals of This Session ==

Line 24: Line 24:

== Setup in person at the SeqShop Workshop ==

''This section is specifically for the SeqShop Workshop computers.''

−

+

''If you are not running during the SeqShop Workshop, please skip this section.''

Line 38: Line 38:

* Set up an output directory

** It will leave your output directory from the previous tutorial intact.

−

source /home/mktrost/seqshop/setup.txt

+

source /net/seqshop-server/home/mktrost/seqshop/setup.txt

* You won't see any output after running <code>source</code>

** It silently sets up your environment

** If you want to view the details of the setup, type

−

less /home/mktrost/seqshop/setup.txt

+

less /net/seqshop-server/home/mktrost/seqshop/setup.txt

and press 'q' to finish.
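A minimal sketch of a guarded version of the step above. It assumes the setup file is what defines the <code>GC</code>, <code>SS</code> and <code>OUT</code> variables used by later commands in this tutorial; the file's contents are not shown here, so treat that as an assumption. The helper name <code>load_setup</code> is hypothetical.

```shell
# load_setup is a hypothetical helper: source a setup file only if it is readable.
load_setup() {
    setup_file="$1"
    if [ -r "$setup_file" ]; then
        # '.' (same as 'source') runs the file in the current shell,
        # so any variables it sets remain available afterwards.
        . "$setup_file"
        echo "loaded $setup_file"
    else
        echo "cannot read $setup_file" >&2
        return 1
    fi
}

load_setup /net/seqshop-server/home/mktrost/seqshop/setup.txt || true

# Assumption: these are the variables the setup file is expected to define.
echo "GC=${GC:-<unset>}"
echo "SS=${SS:-<unset>}"
echo "OUT=${OUT:-<unset>}"
```

If any of the three variables prints as <code>&lt;unset&gt;</code>, the later <code>genomestrip.pl</code> commands would expand to incomplete paths.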

Line 53: Line 53:

</div>

−

== Setup when running on your own outside of the SeqShop Workshop ==

Line 168: Line 167:

Parameter files required just for Structural Variation:

−

${GC}/src/svtoolkit/conf

+

ls ${GC}/src/svtoolkit/conf

<ul>

Line 229: Line 228:

In addition, if one wants to genotype structural variants from another structural variant caller, there is a step available.

* Third-party Genotyping and Filtering step : Perform genotyping on the variant sites specified by an input VCF, and also perform variant filtering.

+

== Command Line Usage of GenomeSTRiP pipeline ==

+

To see how to use the GenomeSTRiP pipeline, type

+

perl $GC/bin/genomestrip.pl

+

+

''View Results''

+

+

ERROR: One of command options among --run-metadata, --run-discovery, --run-genotype, --run-thirdparty must be specified

+

ERROR: Missing required option, outdir

+

Usage:

+

/net/seqshop-server/home/mktrost/seqshop/gotcloud/bin/genomestrip.pl

+

[options]

+

Help Options:

+

-help Print out brief help message [OFF]

+

-man Print the full documentation in man page style [OFF]

+

Command options:

+

-run-metadata Create metadata [OFF]

+

-run-discovery Run variant discovery and filtering. Can run with --run-metadata together [OFF]

+

-run-genotype Run genotyping - requires to finish run-metadata and run-discovery [OFF]

+

-run-thirdparty Run genotyping and filtering of third-party sites [OFF]

+

Options for input/output data:

+

-gotcloudroot|gcroot STR GotCloud Root Directory []

+

-conf STR GotCloud configuration files []

+

-outdir STR Override's conf file's OUT_DIR. Used as the genomestrip output directory unless --out or GENOMESTRIP_OUT is set []

+

-list STR BAM list file containing ID and BAM path []

+

-out STR Output directory which stores subdirectories such as metadata/, discovery/, genotypes/, thirdparty/ unless overriden individually []

+

-metadata STR Output directory to store --run-metadata results. Default is [OUT]/metadata/ []

+

-discovery STR Output directory to store --run-discovery results. Default is [OUT]/discovery/ []

+

-genotype STR Output directory to store --run-genotype results. Default is [OUT]/genotype/ []

+

-thirdparty STR Output directory to store --run-thirdparty results. Default is [OUT]/thirdparty/ []

+

Advanced Options:

+

-tmp-dir STR temporary directory to store temporary files. Default is [OUT]/tmp []

+

-gs-dir STR GenomeSTRiP svtoolkit directory []

+

-param STR GenomeSTRIP parameter file []

+

-ref STR Reference FASTA file []

+

-mask STR Reference mask FASTA file []

+

-ploidy-map STR Ploidy map file []

+

-mosix-opt STR MOSIX options []

+

-region STR Region to focus on the variants []

+

-unit INT Number of variants to be genotyped per parallel run [100]

+

Additional Inputs:

+

-in-vcf STR Input site VCF files used for --run-genotype or --run-thirdparty. For --run-thirdparty, this argument is required. For --run-genotype, default is [OUT]/discovery/discovery.vcf []

+

-pass-only Genotype only PASS-filtered variants, default is OFF [OFF]

+

-skip-rc Skip precomputing read count [OFF]

+

-base-prefix STR Prefix of all files []

+

-bam-prefix STR Prefix of BAM files []

+

-ref-prefix STR Prefix of Reference FASTA files []

+

-no-phonehome Skip phone home functionality [OFF]

+

-make-base-name STR Specifies the basename for the makefile []

+

-verbose Specifies that additional details are to be printed out [OFF]

+

-dry-run Perform a dry-run that only produces Makefile but not run it [OFF]

+

-numjobs INT Number of jobs to concurrently run [1]

+

-autosomes Perform analysis only on autosomes [OFF]

+

</div></div>
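The two error messages in the usage output suggest two rules: one run mode must be chosen, and an output directory must be given. The sketch below restates those rules as a pre-flight check. <code>check_gs_args</code> is a hypothetical helper, not part of <code>genomestrip.pl</code>, and it only recognizes options passed as separate words (not <code>--outdir=DIR</code>).

```shell
# Hypothetical pre-flight check mirroring the two usage errors shown above.
check_gs_args() {
    have_mode=0
    have_outdir=0
    for arg in "$@"; do
        case "$arg" in
            -run-metadata|--run-metadata|-run-discovery|--run-discovery|\
            -run-genotype|--run-genotype|-run-thirdparty|--run-thirdparty)
                have_mode=1 ;;
            -outdir|--outdir)
                have_outdir=1 ;;
        esac
    done
    status=0
    if [ "$have_mode" -eq 0 ]; then
        echo "no run mode given (--run-metadata, --run-discovery, --run-genotype, --run-thirdparty)" >&2
        status=1
    fi
    if [ "$have_outdir" -eq 0 ]; then
        echo "no --outdir given" >&2
        status=1
    fi
    return "$status"
}

# Example: a complete argument list passes the check.
check_gs_args --run-discovery --outdir "${OUT:-out}" && echo "arguments look complete"
```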

== Running GotCloud/GenomeSTRiP Metadata Pipeline ==

Line 235: Line 295:

In principle, the metadata can be created from the input BAM files by running the following command

−

perl ${SS}~~/svtoolkit~~/bin/genomestrip.pl -run-metadata --conf ${SS}/gotcloud.conf --numjobs 2 --base-prefix ${SS} --outdir ${OUT}

+

perl ${GC}/bin/genomestrip.pl --run-metadata --conf ${SS}/gotcloud.conf --numjobs 12 --base-prefix ${SS} --outdir ${OUT}

−

'''WAIT!!!!! DO NOT RUN THIS COMMAND, because it will take ~~~50 minutes~~ to finish'''.

+

'''WAIT!!!!! DO NOT RUN THIS COMMAND, because it will take >1 hour to finish'''.

Instead, let's look at what the output would have looked like.

−

ls ${SS}~~/svtoolkit~~/metadata

+

ls ${SS}/metadata

−

cpt depth depth.dat gcprofile gcprofiles.zip genome_sizes.txt isd isd.dist.bin spans spans.dat

+

computerc.args.list

+

cpt

+

depth

+

depth.args.list

+

depth.dat

+

gcprofile

+

gcprofiles.list

+

gcprofiles.zip

+

genome_sizes.txt

+

isd

+

isd.dist.args.list

+

isd.dist.bin

+

rccache

+

rccache.bin

+

rccache.bin.idx

+

rccache.list

+

rccache.merge

+

spans

+

spans.args.list

+

spans.dat

The directory contains metadata output and other intermediate files produced by "GenomeSTRiP SVProcess" step.
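Before pointing later steps at a pre-made metadata directory, it can help to confirm that it is populated. The sketch below checks for a handful of the files from the listing above. Which files are strictly required by GenomeSTRiP is not stated here, so the chosen subset is an assumption, and <code>check_metadata</code> is a hypothetical helper.

```shell
# Hypothetical helper: report which of a few expected metadata files are absent.
check_metadata() {
    dir="$1"
    missing=0
    # Assumption: this subset of the listing is a reasonable completeness check.
    for name in depth.dat spans.dat isd.dist.bin gcprofiles.zip genome_sizes.txt; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $name"
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) missing in $dir"
    # Succeed only when nothing is missing.
    [ "$missing" -eq 0 ]
}

# Falls back to ./metadata when SS is not set.
check_metadata "${SS:-.}/metadata" || true
```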

Line 254: Line 333:

To discover large deletions from the 62 BAMs we are using for this workshop, you can run the following command

−

~~time~~ perl ${SS}~~/svtoolkit~~/bin/genomestrip.pl -run-discovery --metadata ${SS}/~~svtoolkit~~/~~metadata~~ --conf ${SS}/gotcloud.conf --numjobs 2 --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT}

+

perl ${GC}/bin/genomestrip.pl --run-discovery --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 2 --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT}

−

* <code>${SS}~~/svtoolkit~~/bin/genomestrip.pl -run-discovery</code> runs the GenomeSTRiP Discovery Pipeline

+

* <code>${GC}/bin/genomestrip.pl -run-discovery</code> runs the GenomeSTRiP Discovery Pipeline

−

* <code>--metadata ${SS}~~/svtoolkit~~/metadata</code> points to the pre-made metadata file as explained in the previous section, [[#Running GotCloud/GenomeSTRiP Metadata Pipeline|Running GotCloud/GenomeSTRiP Metadata Pipeline]].

+

* <code>--metadata ${SS}/metadata</code> points to the pre-made metadata file as explained in the previous section, [[#Running GotCloud/GenomeSTRiP Metadata Pipeline|Running GotCloud/GenomeSTRiP Metadata Pipeline]].

* <code>--conf ${SS}/gotcloud.conf</code> points to the configuration file to use.

** The configuration for this test was downloaded with the seqshop input files (same as other tutorials).
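Because the discovery command has many options, one way to review it before running is to assemble and print it first. The sketch below only prints the command line. <code>discovery_cmd</code> is a hypothetical helper, and the placeholder defaults exist only so the sketch runs outside the workshop environment.

```shell
# Placeholder defaults, used only when the workshop setup has not been sourced.
GC="${GC:-/path/to/gotcloud}"
SS="${SS:-/path/to/seqshop/example}"
OUT="${OUT:-$HOME/out}"

# Hypothetical helper: print (do not run) the discovery command for a region.
discovery_cmd() {
    region="$1"
    jobs="$2"
    # The region is expected in CHR:START-END form, as in 22:36000000-37000000.
    case "$region" in
        *:*-*) ;;
        *) echo "region must look like CHR:START-END" >&2; return 1 ;;
    esac
    printf 'perl %s/bin/genomestrip.pl --run-discovery --metadata %s/metadata --conf %s/gotcloud.conf --numjobs %s --region %s --base-prefix %s --outdir %s\n' \
        "$GC" "$SS" "$SS" "$jobs" "$region" "$SS" "$OUT"
}

discovery_cmd 22:36000000-37000000 2
```

The usage output also lists a <code>-dry-run</code> option that produces the Makefile without running it, which serves a similar purpose inside the pipeline itself.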

Line 294: Line 373:

7 COHERENCE;COVERAGE;DEPTH;DEPTHPVAL

−

17 COHERENCE;COVERAGE;DEPTH;DEPTHPVAL;PAIRSPERSAMPLE

+

18 COHERENCE;COVERAGE;DEPTH;DEPTHPVAL;PAIRSPERSAMPLE

3 COHERENCE;COVERAGE;DEPTH;PAIRSPERSAMPLE

2 COHERENCE;COVERAGE;DEPTHPVAL;PAIRSPERSAMPLE

Line 304: Line 383:

2 COVERAGE;DEPTH;PAIRSPERSAMPLE

4 COVERAGE;DEPTHPVAL

−

5 COVERAGE;DEPTHPVAL;PAIRSPERSAMPLE

+

4 COVERAGE;DEPTHPVAL;PAIRSPERSAMPLE

5 COVERAGE;PAIRSPERSAMPLE

</div>
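A tally like the one above can be produced by counting the values in the FILTER column (column 7) of a VCF. The tutorial's own command is not shown in this excerpt, so the following is only one plausible way to do it; <code>count_filters</code> is a hypothetical helper and the sample records are made up for illustration.

```shell
# Hypothetical helper: count how often each FILTER value occurs in a VCF.
count_filters() {
    # Drop header lines, keep column 7 (FILTER), then tally identical values.
    grep -v '^#' "$1" | cut -f 7 | sort | uniq -c
}

# Tiny made-up VCF so the sketch is self-contained.
sample=$(mktemp)
printf '##fileformat=VCFv4.1\n' > "$sample"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$sample"
printf '22\t36000100\tDEL_1\tA\t<DEL>\t.\tPASS\tEND=36000900\n' >> "$sample"
printf '22\t36010000\tDEL_2\tC\t<DEL>\t.\tCOVERAGE;DEPTHPVAL\tEND=36012000\n' >> "$sample"
printf '22\t36020000\tDEL_3\tG\t<DEL>\t.\tPASS\tEND=36021000\n' >> "$sample"

count_filters "$sample"
rm -f "$sample"
```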

Line 323: Line 402:

The discovery pipeline only performs discovery of variant sites with filtering. You will need to iterate over the BAMs again to perform genotyping.

* If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1.

−

~~time~~ perl ${SS}~~/svtoolkit~~/bin/genomestrip.pl -run-genotype --metadata ${SS}~~/svtoolkit~~/metadata --conf ${SS}/gotcloud.conf --numjobs 4 ~~--region 22:36000000-37000000~~ --base-prefix ${SS} --outdir ${OUT} ~~--gcroot ${GC}~~

+

perl ${GC}/bin/genomestrip.pl --run-genotype --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --numjobs 4 --base-prefix ${SS} --outdir ${OUT}

−

* The added <code>--gcroot ${GC}</code> option directs the pipeline to tabix/bgzip programs found within gotcloud.

This will take ~3 minutes to finish.
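One possible way to choose <code>--numjobs</code> on a small machine is to cap the requested value at the number of available CPU cores. This is only one reading of "small machine" (memory may matter as well), and <code>pick_numjobs</code> is a hypothetical helper.

```shell
# Hypothetical helper: cap the requested job count at the number of CPU cores.
# The second argument overrides core detection (useful for testing).
pick_numjobs() {
    requested="$1"
    cores="${2:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}"
    if [ "$requested" -gt "$cores" ]; then
        echo "$cores"
    else
        echo "$requested"
    fi
}

# Print the option as it would be passed to genomestrip.pl.
echo "--numjobs $(pick_numjobs 4)"
```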

Line 344: Line 422:

You can take third-party sites and genotype them with GenomeSTRiP. Here we take 1000 Genomes phase 1 sites and genotype them.

* If running on a small machine, you may want to reduce <code>--numjobs</code> from 4 to 1.

−

~~time~~ perl ${SS}~~/svtoolkit~~/bin/genomestrip.pl -run-thirdparty --in-vcf ${SS}/ext/1kg.phase1.chr22.36Mb.sites.vcf --metadata ${SS}~~/svtoolkit~~/metadata --conf ${SS}/gotcloud.conf --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT~~} --gcroot ${GC~~} --numjobs 4

+

perl ${GC}/bin/genomestrip.pl --run-thirdparty --in-vcf ${SS}/ext/1kg.phase1.chr22.36Mb.sites.vcf --metadata ${SS}/metadata --conf ${SS}/gotcloud.conf --region 22:36000000-37000000 --base-prefix ${SS} --outdir ${OUT} --numjobs 2

This will take ~1 minute to finish.
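Before genotyping third-party sites, it may be worth checking how many sites in the input VCF actually fall inside the requested region. The sketch below counts them with <code>awk</code>. <code>count_sites_in_region</code> is a hypothetical helper, and the sample records are made up for illustration.

```shell
# Hypothetical helper: count VCF records on a chromosome within [start, end].
count_sites_in_region() {
    vcf="$1"; chrom="$2"; start="$3"; end="$4"
    awk -F '\t' -v c="$chrom" -v s="$start" -v e="$end" \
        '!/^#/ && $1 == c && $2 >= s && $2 <= e { n++ } END { print n + 0 }' "$vcf"
}

# Tiny made-up sites file so the sketch is self-contained.
sites=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$sites"
printf '22\t35999000\tsite_1\tA\t<DEL>\t.\t.\t.\n' >> "$sites"
printf '22\t36500000\tsite_2\tC\t<DEL>\t.\t.\t.\n' >> "$sites"
printf '22\t36900000\tsite_3\tG\t<DEL>\t.\t.\t.\n' >> "$sites"

count_sites_in_region "$sites" 22 36000000 37000000
rm -f "$sites"
```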

Line 372: Line 450:

</div>

−

== ~~Starting SNP Call on your own Genome~~ ==

+

−

Go to [[SeqShop: ~~Calling Your Own Genome,~~ December 2014]] ~~so we can run SNP calling overnight.~~

+

== Return to Workshop Wiki Page ==

+

Return to main workshop wiki page: [[SeqShop: December 2014]]

Mktrost


SeqShop: Analysis of Structural Variation Practical, December 2014 (view source)

Revision as of 17:09, 17 December 2014
