Changes

From Genome Analysis Wiki
Line 18: Line 18:  
** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run snpcall on your own samples.
 
** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run snpcall on your own samples.
 
*Run <code>ldrefine</code> pipeline test:
 
*Run <code>ldrefine</code> pipeline test:
  gotcloud snpcall --test OUTPUT_DIR
+
  gotcloud ldrefine --test OUTPUT_DIR
 
** Where <code>OUTPUT_DIR</code> is the directory where you want to store the test results
 
** Where <code>OUTPUT_DIR</code> is the directory where you want to store the test results
 
** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run ldrefine on your own samples.
 
** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run ldrefine on your own samples.
Line 61: Line 61:  
*** only used by Thunder (part of ldrefine pipeline)
 
*** only used by Thunder (part of ldrefine pipeline)
 
*** if all samples are from the same population, population label can be skipped or you can just specify <code>ALL</code> for the population label for each sample.
 
*** if all samples are from the same population, population label can be skipped or you can just specify <code>ALL</code> for the population label for each sample.
 +
 +
The path to the BAM List file defaults to <code>outputDirectory/bam.list</code>.  It can be overridden by setting <code>--bamlist</code>, <code>--bam_list</code>, or <code>--list</code> on the command line, or by setting <code>BAM_LIST</code> in your configuration file to the path to the BAM List file.  See [[#Required_Options|Required Options]] for more information.
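The lookup described above can be sketched in a few lines of Python. This is only an illustration, not GotCloud's own code: the function name <code>resolve_bam_list</code> and its parameters are hypothetical, and the precedence between the command line and the configuration file is an assumption, since the text does not say which wins when both are set.

```python
import os


def resolve_bam_list(out_dir, cli_value=None, conf=None):
    """Return the path to the BAM List file.

    Assumed precedence (not stated in the text): a command-line value
    first, then BAM_LIST from the configuration, then the default
    <out_dir>/bam.list.
    """
    if cli_value:
        return cli_value
    if conf and conf.get("BAM_LIST"):
        return conf["BAM_LIST"]
    return os.path.join(out_dir, "bam.list")
```

Called with only an output directory, the function falls back to <code>bam.list</code> inside that directory.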
    
=== Reference Files ===
 
=== Reference Files ===
Line 79: Line 81:  
{{:GotCloud: Configuration}}
 
{{:GotCloud: Configuration}}
   −
====Additional Required User Config Files Settings====
+
See [[#Variant Calling Command-line Options/Configuration Settings|Variant Calling Command-line Options/Configuration Settings]] for more information on Configuration options.
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
  −
! Configuration Key !! Command-line Flag !! Value Description !! Default Value
  −
|-
  −
|CHRS||--chrs || space separated list of chromosomes to process || 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X
  −
|-
  −
| BAM_LIST|| --list || path to the [[#BAM List File|BAM List File]] || $(OUT_DIR)/bam.list
  −
|}
  −
 
  −
====Targeted/Exome Sequencing Settings====
  −
If you are running Targeted/Exome Sequencing, you should specify:
  −
 
  −
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
  −
! Configuration Key !! Value Description
  −
|-
  −
|UNIFORM_TARGET_BED|| Bed file of targeted regions (same bed for all samples)
  −
|-
  −
|MULTIPLE_TARGET_MAP|| Filename of file mapping: sample id -> bed file of targeted regions
  −
Each line of the file contains: [SM_ID] [TARGET_BED]
  −
|-
  −
|OFFSET_OFF_TARGET|| Number of bases by which to extend the target region
  −
(default is 0, do not extend the target region)
  −
|-
  −
|SAMTOOLS_VIEW_TARGET_ONLY || '''true''': speeds up processing by excluding off-target regions initially when performing samtools view
  −
 
  −
'''false''' (default): off-target regions are not excluded when performing samtools view, but are excluded at a later step
  −
 
  −
'''Warning:''' You may not want to set this to true because it may:
  −
*''make command line too long''
  −
*''produce an error if reads overlap multiple targeted regions''
  −
** see: [[GotCloud: FAQs#Targetted/Exome|GotCloud: FAQs->Targetted/Exome]]
  −
|}
  −
 
  −
==== Chromosome X Calling ====
  −
For proper Chromosome X calling, it is recommended to specify a PED file with sex information:
  −
{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
  −
! Configuration Key !! Value Description
  −
|-
  −
|PED_INDEX|| ped file containing sampleID (2nd column) and sex (5th column)
  −
|}
  −
 
  −
Format of PED file:
  −
:<code>familyID  sampleID  fatherID  motherID  sex</code>
  −
* Only <code>sampleID</code> and <code>sex</code> are used
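As a rough illustration of the PED format above, the following Python sketch reads only the two columns that are used. The function name <code>read_ped_sex</code> is hypothetical. Whitespace-delimited columns and the skipping of short or <code>#</code>-prefixed lines are assumptions. The sex value is returned as the raw string because the text does not define its coding.

```python
def read_ped_sex(lines):
    """Map sampleID (2nd column) to sex (5th column).

    All other columns are ignored. Lines with fewer than five
    whitespace-separated fields, and lines starting with '#',
    are skipped (an assumption, not documented behavior).
    """
    sex_by_sample = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or line.startswith("#"):
            continue
        sex_by_sample[fields[1]] = fields[4]
    return sex_by_sample
```

The function accepts any iterable of lines, so an open file handle can be passed directly.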
     −
=== Example Configuration File ===
+
==== Example Configuration File ====
 
Example configuration file where reference files happen to be stored in /path/reference, and bam index file in path/freeze5
 
Example configuration file where reference files happen to be stored in /path/reference, and bam index file in path/freeze5
 
  CHRS = 20 22
 
  CHRS = 20 22
Line 134: Line 93:  
  HM3_VCF = $(REF_DIR)/hapmap3_r3_b37.sites.vcf.gz
 
  HM3_VCF = $(REF_DIR)/hapmap3_r3_b37.sites.vcf.gz
 
  DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz
 
  DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz
 +
 +
 +
== Variant Calling Command-line Options/Configuration Settings ==
 +
{{:GotCloud: Variant Calling Options}}
 +
 +
 +
== Use Cases & Recommended Settings ==
 +
=== Single Sample Processing ===
 +
To run single sample processing we recommend adding the following settings to your configuration file:
 +
UNIT_CHUNK = 20000000
 +
MODEL_GLFSINGLE = TRUE
 +
MODEL_SKIP_DISCOVER = FALSE
 +
MODEL_AF_PRIOR = TRUE
 +
VCF_EXTRACT = $(REF_DIR)/snpOnly.vcf.gz
 +
EXT = $(REF_DIR)/ALL.chrCHR.phase3.combined.sites.unfiltered.vcf.gz $(REF_DIR)/chrCHR.filtered.sites.vcf.gz
 +
 +
Explanation of these settings:
 +
* <code>UNIT_CHUNK</code> - since this is only 1 sample, process larger regions at a time than default
 +
* <code>MODEL_GLFSINGLE</code> - single sample, so model glfsingle
 +
* <code>MODEL_SKIP_DISCOVER</code> - do not skip the variant discovery step
 +
* <code>MODEL_AF_PRIOR</code> - use AF prior for genotyping
 +
* <code>VCF_EXTRACT</code> - VCF file to use for extracting the site information to genotype
 +
**  This file is included in the latest reference release: [[GotCloud:_Genetic_Reference_and_Resource_Files#hs37d5-db142|hs37d5-db142]]
 +
* <code>EXT</code> - VCF reference files to use for the external filtering
 +
** These files are included in the latest reference release: [[GotCloud:_Genetic_Reference_and_Resource_Files#hs37d5-db142|hs37d5-db142]]
 +
 +
    
== Running ==
 
== Running ==
Line 148: Line 134:  
* Replace <code>2</code> following <code>--numjobs</code> with the number of jobs to be run in parallel
 
* Replace <code>2</code> following <code>--numjobs</code> with the number of jobs to be run in parallel
 
* If <code>OUT_DIR</code> is not defined in the configuration file, add <code>--outdir</code> followed by the path to the user's desired output directory.
 
* If <code>OUT_DIR</code> is not defined in the configuration file, add <code>--outdir</code> followed by the path to the user's desired output directory.
      
=== Running on a Cluster ===
 
=== Running on a Cluster ===
Line 162: Line 147:  
* glfs with a bams & samples subdirectory
 
* glfs with a bams & samples subdirectory
 
* pvcfs with a subdirectory per chromosome and then per region
 
* pvcfs with a subdirectory per chromosome and then per region
* split with a subdirectory per chromosome
+
* '''split''' with a subdirectory per chromosome
* vcfs with a subdirectory per chromosome
+
* '''vcfs''' with a subdirectory per chromosome
 
* (optionally your target directory)
 
* (optionally your target directory)
   −
Under the vcf/chrXX directory, there should be:
+
Under the '''vcf/chrXX''' directory, there should be:
 
* chrXX.filtered.sites.vcf
 
* chrXX.filtered.sites.vcf
 
* chrXX.filtered.sites.vcf.norm.log
 
* chrXX.filtered.sites.vcf.norm.log
 
* chrXX.filtered.sites.vcf.summary
 
* chrXX.filtered.sites.vcf.summary
* chrXX.filtered.vcf.gz
+
* '''chrXX.filtered.vcf.gz''' - final filtered variant call file
 
* chrXX.filtered.vcf.gz.OK
 
* chrXX.filtered.vcf.gz.OK
 
* chrXX.filtered.vcf.gz.tbi
 
* chrXX.filtered.vcf.gz.tbi
Line 188: Line 173:  
The filtered VCF is the merged.vcf after it has been run through the filters, with each variant marked PASS/FAIL.
 
The filtered VCF is the merged.vcf after it has been run through the filters, with each variant marked PASS/FAIL.
   −
Under the split/chrXX directory, there should be:
+
Under the '''split/chrXX''' directory, there should be:
 
* chrXX.filtered.PASS.split.[N].vcf.gz
 
* chrXX.filtered.PASS.split.[N].vcf.gz
 
* chrXX.filtered.PASS.split.err
 
* chrXX.filtered.PASS.split.err
 
* chrXX.filtered.PASS.split.vcflist
 
* chrXX.filtered.PASS.split.vcflist
* chrXX.filtered.PASS.gz
+
* '''chrXX.filtered.PASS.gz''' - final variant call file with only PASS variants
 
* subset.OK
 
* subset.OK
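One way to check a finished run against the file lists above is a small Python sketch like the one below. It is only a convenience check, not part of GotCloud: the names <code>EXPECTED</code> and <code>missing_outputs</code> are hypothetical, and only the files named in this section are checked. The <code>chrXX.filtered.PASS.split.[N].vcf.gz</code> files are left out because the number of splits varies.

```python
import os

# File name templates taken from the lists above; {c} is the chromosome name.
EXPECTED = {
    "vcf": [
        "chr{c}.filtered.sites.vcf",
        "chr{c}.filtered.sites.vcf.norm.log",
        "chr{c}.filtered.sites.vcf.summary",
        "chr{c}.filtered.vcf.gz",
        "chr{c}.filtered.vcf.gz.OK",
        "chr{c}.filtered.vcf.gz.tbi",
    ],
    "split": [
        "chr{c}.filtered.PASS.split.err",
        "chr{c}.filtered.PASS.split.vcflist",
        "chr{c}.filtered.PASS.gz",
        "subset.OK",
    ],
}


def missing_outputs(chrom_dir, chrom, kind):
    """Return the expected file names that are absent from chrom_dir.

    chrom_dir is the per-chromosome directory to inspect,
    chrom is the chromosome name (for example "20"),
    kind is "vcf" or "split".
    """
    names = [template.format(c=chrom) for template in EXPECTED[kind]]
    return [n for n in names if not os.path.exists(os.path.join(chrom_dir, n))]
```

An empty list means every file named above is present in the directory that was passed in.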