
From Genome Analysis Wiki
== Running GotCloud/GenomeSTRiP ==
 
The general command line for running GenomeSTRiP via GotCloud is:

 gotcloud genomestrip --run-<step> --conf <gotcloud.conf> --outdir <outputDirectory> --numjobs <#>

Where:
* <code>--run-<step></code> - indicates which pipeline to run. Options are:
** <code>--run-metadata</code> - [[#Metadata Pipeline|Metadata Pipeline]]
** <code>--run-discovery</code> - [[#Discovery Pipeline|Discovery Pipeline]]
** <code>--run-genotype</code> - [[#Genotyping Pipeline|Genotyping Pipeline]]
** <code>--run-thirdparty</code> - [[#3rd-party Site Genotyping/Filtering Pipeline|3rd-party Site Genotyping/Filtering Pipeline]]
* <code>--conf <gotcloud.conf></code> - points to the configuration file to use
* <code>--outdir <outputDirectory></code> - tells GotCloud where to write the output
* <code>--numjobs <#></code> - sets the number of jobs to run in parallel

Optional parameters:
* <code>--metadata <metadataDirectory></code> - points to a directory containing pre-made metadata files
** Required only when skipping the <code>--run-metadata</code> step.
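The options above can be combined into a small driver script. This is a hedged sketch, not part of GotCloud: it assumes <code>gotcloud</code> is on your <code>PATH</code>, treats <code>gotcloud.conf</code>, <code>outputDirectory</code> and 10 jobs as placeholders to replace, and assumes each step finds the previous step's results in the shared output directory. It runs the metadata, discovery and genotyping steps in that order and leaves out <code>--run-thirdparty</code>. The function names <code>run_step</code> and <code>run_pipeline</code> and the <code>DRY_RUN</code> variable are hypothetical helpers for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run the metadata, discovery and genotyping steps in order.
# Assumptions: gotcloud is on PATH; CONF, OUTDIR and NUMJOBS are placeholders;
# later steps find earlier results in the same OUTDIR.
# run_step, run_pipeline and DRY_RUN are hypothetical helper names.

CONF="${CONF:-gotcloud.conf}"
OUTDIR="${OUTDIR:-outputDirectory}"
NUMJOBS="${NUMJOBS:-10}"

# Build and run one step; with DRY_RUN=1 the command is printed, not executed.
run_step() {
  local step="$1"
  local -a cmd
  cmd=(gotcloud genomestrip "--run-${step}" --conf "$CONF"
       --outdir "$OUTDIR" --numjobs "$NUMJOBS")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Run the steps in pipeline order and stop at the first failure.
run_pipeline() {
  local step
  for step in metadata discovery genotype; do
    run_step "$step" || return 1
  done
}
```

For example, after <code>source genomestrip_steps.sh</code> (a hypothetical file name), <code>DRY_RUN=1 run_pipeline</code> prints the three commands without running them; calling <code>run_pipeline</code> without <code>DRY_RUN</code> executes them and stops at the first step that fails.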
    
=== Metadata Pipeline ===
 
NOTE: You do not always have to generate the metadata yourself. In principle, you can use the public metadata generated for the 1000 Genomes (1000G) samples, provided those samples have characteristics similar to yours. If you have enough computing resources, however, the best practice is to generate metadata specifically for your own sequence data.
 
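If you reuse pre-made metadata as the note describes, the metadata step is skipped and the later steps are pointed at that directory with <code>--metadata</code>. The following is a minimal sketch, assuming <code>--metadata</code> is passed to both the discovery and genotyping steps and that <code>METADATA_DIR</code> (a hypothetical variable whose default is only a placeholder path) holds the pre-made metadata files. The function names are likewise hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: skip --run-metadata and reuse an existing metadata directory
# (for example, public 1000G metadata) for the later GenomeSTRiP steps.
# Assumptions: gotcloud is on PATH; METADATA_DIR, CONF and OUTDIR are
# placeholder paths; --metadata is passed to every step that follows.

METADATA_DIR="${METADATA_DIR:-/path/to/premade/metadata}"
CONF="${CONF:-gotcloud.conf}"
OUTDIR="${OUTDIR:-outputDirectory}"
NUMJOBS="${NUMJOBS:-10}"

# Run one step with the pre-made metadata; DRY_RUN=1 only prints the command.
run_with_metadata() {
  local step="$1"
  local -a cmd
  cmd=(gotcloud genomestrip "--run-${step}" --conf "$CONF" --outdir "$OUTDIR"
       --numjobs "$NUMJOBS" --metadata "$METADATA_DIR")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Discovery, then genotyping; stop at the first failing step.
reuse_metadata_pipeline() {
  # Check the directory only when actually running the commands.
  if [[ "${DRY_RUN:-0}" != "1" && ! -d "$METADATA_DIR" ]]; then
    echo "metadata directory not found: $METADATA_DIR" >&2
    return 1
  fi
  local step
  for step in discovery genotype; do
    run_with_metadata "$step" || return 1
  done
}
```

With <code>DRY_RUN=1</code> the functions only print the commands (and skip the directory check), which is a cheap way to review them before launching any jobs.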

Command line to run the metadata step:

 gotcloud genomestrip --run-metadata --conf gotcloud.conf --outdir outputDirectory --numjobs 10
      
Timing:
 
=== Discovery Pipeline ===
 
 
The discovery pipeline performs variant discovery across all samples as well as variant filtering based on expert knowledge.
 
Command line to run the discovery step:

 gotcloud genomestrip --run-discovery --conf gotcloud.conf --outdir outputDirectory --numjobs 10
      
Timing:
 
=== Genotyping Pipeline ===

The genotyping pipeline iterates over the discovered variants and, for each sample, calculates the likelihood of each possible genotype.
 
Command line to run the genotyping step:

 gotcloud genomestrip --run-genotype --conf gotcloud.conf --outdir outputDirectory --numjobs 10
   
Timing:
 
 
* 10 BAMs, chromosomes 21 and 22: 4 minutes with 10 jobs
 
