Changes

From Genome Analysis Wiki
Line 37:
In addition, to genotype structural variants called by another structural variant caller, a dedicated step is available.
 
* Third-party Genotyping and Filtering step: Performs genotyping on the variant sites specified by an input VCF and also performs variant filtering.
 +
 +
 +
== Input Data ==
 +
=== Configuration File ===
 +
{{:GotCloud: Configuration}}
 +
 +
==== GenomeSTRiP specific configuration settings ====
 +
When using GenomeSTRiP, you need to specify the following configuration settings:
 +
GENOMESTRIP_MASK_FASTA = /net/bipolar/hmkang/ref/hs37d5/genomeSTRiP/human_g1k_v37.mask.100.fasta
 +
GENOMESTRIP_PLOIDY_MAP = /net/bipolar/hmkang/2013_09/seqshop/reference/svtoolkit/conf/humgen_g1k_v37_ploidy.map
 +
 +
'''Replace the paths shown above with the locations of these files on your system.'''
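
Before launching a long run, it can help to confirm that both settings are present and point to real files. The following is a minimal sketch, not part of GotCloud: it assumes the configuration file uses plain <code>KEY = VALUE</code> lines with <code>#</code> comments, and the helper names <code>parse_conf</code> and <code>check_genomestrip_settings</code> are hypothetical.

```python
from pathlib import Path

# The two settings named in this section.
REQUIRED_KEYS = ("GENOMESTRIP_MASK_FASTA", "GENOMESTRIP_PLOIDY_MAP")


def parse_conf(text):
    """Parse KEY = VALUE lines, skipping blank lines and '#' comments.

    Assumption: values never contain a literal '#'.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_genomestrip_settings(settings):
    """Return a list of problems; the list is empty when both files exist."""
    problems = []
    for key in REQUIRED_KEYS:
        value = settings.get(key)
        if not value:
            problems.append(f"{key} is not set")
        elif not Path(value).is_file():
            problems.append(f"{key} points to a missing file: {value}")
    return problems
```

A typical call would be <code>check_genomestrip_settings(parse_conf(Path("gotcloud.conf").read_text()))</code>, printing any returned problems before starting the pipeline.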
 +
 +
 +
== Running GotCloud/GenomeSTRiP ==
 +
 +
=== Metadata Pipeline ===
 +
The metadata pipeline creates metadata summarizing genome-wide statistics such as GC profiles, depth distributions, and insert size distributions.
 +
 +
This metadata pipeline runs the GenomeSTRiP "SVPreprocess" step, generating metadata output and other intermediate files. See http://gatkforums.broadinstitute.org/discussion/1514/svpreprocess-queue-script for details of the SVPreprocess step.
 +
 +
NOTE: You do not always have to create the metadata yourself. In principle, you can use the public metadata generated for the 1000G samples, assuming those samples share similar characteristics with yours. If you have enough computing resources, however, the best practice is to create metadata specifically for your own sequence data.
 +
 +
Command line to run the metadata step:
 +
gotcloud genomestrip --run-metadata --conf gotcloud.conf --outdir outputDirectory --numjobs 10
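
When the metadata step is launched from a script, the command above can be assembled as an argument list. This is a sketch under stated assumptions: it uses only the options shown in the command above, the function names <code>build_metadata_command</code> and <code>run_metadata</code> are hypothetical, and <code>gotcloud</code> is assumed to be on the <code>PATH</code>.

```python
import subprocess


def build_metadata_command(conf, outdir, numjobs):
    """Assemble the metadata-step command using only the options shown above."""
    return [
        "gotcloud", "genomestrip", "--run-metadata",
        "--conf", str(conf),
        "--outdir", str(outdir),
        "--numjobs", str(int(numjobs)),
    ]


def run_metadata(conf, outdir, numjobs=10, dry_run=True):
    """Print the command when dry_run is set; otherwise execute it."""
    cmd = build_metadata_command(conf, outdir, numjobs)
    if dry_run:
        print(" ".join(cmd))
        return 0
    # check=True raises CalledProcessError if gotcloud exits with an error.
    return subprocess.run(cmd, check=True).returncode
```

The dry-run default is a deliberate choice here: because the step can run for a long time, printing the command first makes it easy to review the paths and job count before committing to a run.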
 +
 +
The metadata pipeline may take a long time to run:
 +
* Chromosomes 21 & 22 for 10 BAMs took 1 hour to run with 10 jobs
 +
* A whole-genome run on many BAMs may take a few weeks
