From Genome Analysis Wiki

GotCloud: GenomeSTRiP Pipeline

1,592 bytes added, 17:36, 9 February 2015
In addition, if you want to genotype structural variants discovered by another structural variant caller, a separate step is available:
* Third-party Genotyping and Filtering step: Genotypes the variant sites specified in an input VCF and then filters the variants.
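Before you pass a third-party call set to this step, it can help to check what the input VCF contains. The bash function below is a hypothetical pre-check and is not part of GotCloud or GenomeSTRiP. It counts the non-header records in a VCF (plain or gzip-compressed) by the SVTYPE value in the INFO column and reports NA when SVTYPE is absent. It assumes a standard tab-delimited VCF.

```shell
# Hypothetical pre-check (not part of GotCloud): count the sites in a VCF by
# the SVTYPE value found in the INFO column (column 8).
# Usage: summarize_sv_sites input.vcf[.gz]
summarize_sv_sites() {
  local vcf="$1"
  # Decompress on the fly when the file name ends in .gz
  if [[ "$vcf" == *.gz ]]; then gzip -dc "$vcf"; else cat "$vcf"; fi |
    awk -F'\t' '
      /^#/ { next }                      # skip header lines
      {
        type = "NA"                      # default when SVTYPE is missing
        n = split($8, info, ";")
        for (i = 1; i <= n; i++)
          if (info[i] ~ /^SVTYPE=/) type = substr(info[i], 8)
        count[type]++
      }
      END { for (t in count) print t "\t" count[t] }
    ' | sort
}
```

For example, summarize_sv_sites calls.vcf.gz prints one line per SV type along with its site count (calls.vcf.gz is a placeholder file name).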
== Input Data ==
=== Configuration File ===
{{:GotCloud: Configuration}}
==== GenomeSTRiP specific configuration settings ====
When using GenomeSTRiP, you need to specify the following configuration settings:
GENOMESTRIP_MASK_FASTA = /net/bipolar/hmkang/ref/hs37d5/genomeSTRiP/human_g1k_v37.mask.100.fasta
GENOMESTRIP_PLOIDY_MAP = /net/bipolar/hmkang/2013_09/seqshop/reference/svtoolkit/conf/
'''Replace these example paths with the paths to your own copies of these files.'''
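Checking both settings before a long run catches mistyped paths early. The bash sketch below is hypothetical: the function names conf_value and check_genomestrip_conf are not part of GotCloud. It assumes that the configuration file has one KEY = VALUE setting per line and that comment lines start with #. Because the example GENOMESTRIP_PLOIDY_MAP value above points to a directory, the check accepts either a file or a directory for that setting.

```shell
# Hypothetical helpers (not part of GotCloud).
# conf_value CONF KEY: print the value of KEY from a "KEY = VALUE" file,
# trimming whitespace around the key and value; lines starting with # are skipped.
conf_value() {
  local conf="$1" key="$2"
  awk -v k="$key" '
    /^[ \t]*#/ { next }                  # skip comment lines
    {
      eq = index($0, "=")
      if (eq == 0) next                  # not a KEY = VALUE line
      name = substr($0, 1, eq - 1); val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (name == k) { print val; exit }
    }
  ' "$conf"
}

# check_genomestrip_conf CONF: return 1 (with messages on stderr) if either
# GenomeSTRiP path is missing, 0 otherwise.
check_genomestrip_conf() {
  local conf="$1" status=0 mask ploidy
  mask=$(conf_value "$conf" GENOMESTRIP_MASK_FASTA)
  ploidy=$(conf_value "$conf" GENOMESTRIP_PLOIDY_MAP)
  if [ ! -f "$mask" ]; then
    echo "GENOMESTRIP_MASK_FASTA not found: '$mask'" >&2
    status=1
  fi
  # Accept a file or a directory, since the example value is a directory
  if [ ! -e "$ploidy" ]; then
    echo "GENOMESTRIP_PLOIDY_MAP not found: '$ploidy'" >&2
    status=1
  fi
  return "$status"
}
```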
== Running GotCloud/GenomeSTRiP ==
=== Metadata Pipeline ===
The metadata pipeline computes genome-wide summary statistics, such as GC profiles, read-depth distributions, and insert-size distributions.
It runs the GenomeSTRiP "SVPreprocess" step, which generates the metadata along with other intermediate files. See for the details of the SVPreprocess step.
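To make "insert-size distribution" concrete, the bash sketch below reduces one to a single number: the median template length (TLEN, SAM column 9) of properly paired reads (FLAG bit 0x2). It counts each pair once by keeping only the mate with a positive TLEN. This is illustrative only and assumes SAM text on standard input, for example from samtools view. GenomeSTRiP's SVPreprocess step computes its own, more detailed statistics and does not use this function.

```shell
# Illustrative only (not part of GotCloud or GenomeSTRiP): median TLEN of
# properly paired reads, read as SAM text from standard input.
# Usage: samtools view sample.bam | median_insert_size
median_insert_size() {
  awk -F'\t' '
    /^@/ { next }                                   # skip SAM header lines
    int($2 / 2) % 2 == 1 && $9 > 0 { print $9 }     # flag 0x2 set, count each pair once
  ' | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1                           # no usable pairs
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }
  '
}
```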
NOTE: You do not always have to create the metadata yourself. In principle, you can use the public metadata generated for the 1000 Genomes (1000G) samples, assuming those samples have characteristics similar to yours. If you have enough computing resources, however, the best practice is to create metadata specifically for your own sequence data.
Command line to run the metadata step:
gotcloud genomestrip --run-metadata --conf gotcloud.conf --outdir outputDirectory --numjobs 10
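If you launch the metadata step from your own scripts, a small wrapper can validate the arguments first. The bash function below is a hypothetical convenience: run_genomestrip_metadata and the DRY_RUN variable are not GotCloud features. It uses only the options shown in the command above. When DRY_RUN=1 is set, it prints the command instead of running it.

```shell
# Hypothetical wrapper (not part of GotCloud) around the metadata command.
# Usage: run_genomestrip_metadata CONF OUTDIR [NUMJOBS]
# Set DRY_RUN=1 to print the command instead of running it.
run_genomestrip_metadata() {
  local conf="$1" outdir="$2" numjobs="${3:-10}"
  if [ ! -f "$conf" ]; then
    echo "configuration file not found: $conf" >&2
    return 1
  fi
  if ! [[ "$numjobs" =~ ^[1-9][0-9]*$ ]]; then
    echo "numjobs must be a positive integer: $numjobs" >&2
    return 1
  fi
  # Build the command as an array so paths with spaces stay intact
  local cmd=(gotcloud genomestrip --run-metadata --conf "$conf"
             --outdir "$outdir" --numjobs "$numjobs")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 run_genomestrip_metadata gotcloud.conf outputDirectory 10 prints the same command as above, provided that gotcloud.conf exists.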
The metadata pipeline may take a long time to run:
* Chromosomes 21 and 22 for 10 BAMs took 1 hour with 10 jobs.
* Running the whole genome on many BAMs may take a few weeks.
