Changes

206 bytes removed , 18:54, 30 March 2015

Line 18: Line 18:

** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run snpcall on your own samples.

*Run <code>ldrefine</code> pipeline test:

−

gotcloud ~~snpcall~~ --test OUTPUT_DIR

+

gotcloud ldrefine --test OUTPUT_DIR

** Where <code>OUTPUT_DIR</code> is the directory where you want to store the test results

** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run ldrefine on your own samples.
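The success check described above can also be scripted. The following is a minimal sketch, assuming the test output has been saved to a log file. The helper name <code>test_passed</code> and the log file name are hypothetical; only the success message and the <code>gotcloud ldrefine --test OUTPUT_DIR</code> command come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the log file given as $1 contains the
# success message quoted in the documentation, non-zero otherwise.
test_passed() {
    grep -qF 'Successfully ran the test case, congratulations!' "$1"
}

# Example use (assumes gotcloud is on PATH and OUTPUT_DIR is writable;
# stderr is captured too, since the output stream is not documented here):
#   gotcloud ldrefine --test OUTPUT_DIR 2>&1 | tee ldrefine_test.log
#   test_passed ldrefine_test.log && echo "ready to run ldrefine"
```

The same helper should work for the <code>snpcall</code> test, since both tests report the same success message.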

Line 61: Line 61:

*** only used by Thunder (part of ldrefine pipeline)

*** if all samples are from the same population, population label can be skipped or you can just specify <code>ALL</code> for the population label for each sample.

+

The path to the BAM List File defaults to <code>outputDirectory/bam.list</code>. It can be overridden by setting <code>--bamlist</code>, <code>--bam_list</code>, or <code>--list</code> on the command line, or by setting <code>BAM_LIST</code> in your configuration file to the path to the BAM List File. See [[#Required_Options|Required Options]] for more information.
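The lookup order described above can be sketched as a small shell function. This is an illustration, not GotCloud's own code: the function name <code>resolve_bam_list</code> is hypothetical, and the assumption that a command-line value takes precedence over <code>BAM_LIST</code> in the configuration file is not stated in the text above.

```shell
#!/bin/sh
# Hypothetical sketch of how the BAM list path could be resolved.
#   $1 = output directory
#   $2 = BAM_LIST value from the configuration file (may be empty)
#   $3 = value passed via --bamlist / --bam_list / --list (may be empty)
resolve_bam_list() {
    out_dir=$1
    conf_value=$2
    cli_value=$3
    if [ -n "$cli_value" ]; then
        # Assumption: a command-line value overrides the configuration file.
        printf '%s\n' "$cli_value"
    elif [ -n "$conf_value" ]; then
        printf '%s\n' "$conf_value"
    else
        # Documented default: bam.list inside the output directory.
        printf '%s\n' "$out_dir/bam.list"
    fi
}
```

For example, <code>resolve_bam_list /path/out "" ""</code> prints the default location, while passing a non-empty second or third argument prints that value instead.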

=== Reference Files ===

Line 79: Line 81:

−

~~====Additional Required User Config Files Settings====~~

+

See [[#Variant Calling Command-line Options/Configuration Settings|Variant Calling Command-line Options/Configuration Settings]] for more information on Configuration options.

−

~~{| class="wikitable" style="margin: 1em 1em 1em 0; background-color:~~ #~~f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"~~

−

~~! Configuration Key !!~~ Command-line ~~Flag !! Value Description !! Default Value~~

−

|-

−

~~|CHRS||--chrs || space separated list of chromosomes to process || 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X~~

−

|-

−

~~| BAM_LIST|| --list || path to the [[#BAM List File|BAM List File]] || $(OUT_DIR)~~/~~bam.list~~

−

|}

−

~~====Targeted/Exome Sequencing~~ Settings~~====~~

−

~~If you are running Targeted/Exome Sequencing, the user should specify:~~

−

{| ~~class="wikitable" style="margin: 1em 1em 1em 0; background~~-~~color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"~~

−

~~! Configuration Key !! Value Description~~

−

|-

−

~~|UNIFORM_TARGET_BED|| Bed file of targeted regions (same bed for all samples)~~

−

|-

−

~~|MULTIPLE_TARGET_MAP|| Filename of file mapping: sample id -> bed file of targeted regions~~

−

~~Each line of the file contains: [SM_ID] [TARGET_BED]~~

−

|-

−

~~|OFFSET_OFF_TARGET|| Number of bases by which to extend the target region~~

−

~~(default is 0, do not extend the target region)~~

−

|-

−

~~|SAMTOOLS_VIEW_TARGET_ONLY || '''true''': speeds up processing by excluding off-target regions initially when performing samtools view~~

−

~~'''false''' (default): off-target regions are not excluded when performing samtools view, but are excluded at a later step~~

−

~~'''Warning:''' You may not want to set this to true because it may:~~

−

*''make command line ~~too long''~~

−

*''produce an error if reads overlap multiple targeted regions''

−

** see: [[GotCloud: FAQs#Targetted/~~Exome|GotCloud: FAQs->Targetted/Exome~~]]

−

|}

−

~~==== Chromosome X Calling ====~~

−

~~For proper Chromosome X calling, it is recommended to specify a PED file with sex~~ information:

−

~~{| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"~~

−

! Configuration ~~Key !! Value Description~~

−

|-

−

~~|PED_INDEX|| ped file containing sampleID (2nd column) and sex (5th column)~~

−

|}

−

~~Format of PED file:~~

−

~~:<code>familyID sampleID fatherID motherID sex</code>~~

−

* Only <code>sampleID</code> and <code>sex</code> are used

−

=== Example Configuration File ===

+

==== Example Configuration File ====

Example configuration file, assuming the reference files are stored in /path/reference and the bam index file in path/freeze5:

CHRS = 20 22

Line 134: Line 93:

HM3_VCF = $(REF_DIR)/hapmap3_r3_b37.sites.vcf.gz

DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz

+

== Variant Calling Command-line Options/Configuration Settings ==

+

== Use Cases & Recommended Settings ==

+

=== Single Sample Processing ===

+

To run single sample processing we recommend adding the following settings to your configuration file:

+

UNIT_CHUNK = 20000000

+

MODEL_GLFSINGLE = TRUE

+

MODEL_SKIP_DISCOVER = FALSE

+

MODEL_AF_PRIOR = TRUE

+

VCF_EXTRACT = $(REF_DIR)/snpOnly.vcf.gz

+

EXT = $(REF_DIR)/ALL.chrCHR.phase3.combined.sites.unfiltered.vcf.gz $(REF_DIR)/chrCHR.filtered.sites.vcf.gz

+

Explanation of these settings:

+

* <code>UNIT_CHUNK</code> - because there is only one sample, process larger regions at a time than the default

+

* <code>MODEL_GLFSINGLE</code> - there is only a single sample, so use the glfsingle model

+

* <code>MODEL_SKIP_DISCOVER</code> - do not skip the variant discovery step

+

* <code>MODEL_AF_PRIOR</code> - use AF prior for genotyping

+

* <code>VCF_EXTRACT</code> - VCF file to use for extracting the site information to genotype

+

** This file is included in the latest reference release: [[GotCloud:_Genetic_Reference_and_Resource_Files#hs37d5-db142|hs37d5-db142]]

+

* <code>EXT</code> - VCF reference files to use for the external filtering

+

** These files are included in the latest reference release: [[GotCloud:_Genetic_Reference_and_Resource_Files#hs37d5-db142|hs37d5-db142]]
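When adding these settings from a shell, note that <code>$(REF_DIR)</code> must reach the configuration file literally and not be expanded by the shell. The following is a minimal sketch of one way to do that; the function name <code>append_single_sample_settings</code> is hypothetical, and the setting values are copied from the list above.

```shell
#!/bin/sh
# Hypothetical helper: append the recommended single-sample settings to the
# GotCloud configuration file given as $1.
append_single_sample_settings() {
    # The quoted 'EOF' delimiter keeps $(REF_DIR) literal, so the shell does
    # not treat it as a command substitution.
    cat >> "$1" <<'EOF'
UNIT_CHUNK = 20000000
MODEL_GLFSINGLE = TRUE
MODEL_SKIP_DISCOVER = FALSE
MODEL_AF_PRIOR = TRUE
VCF_EXTRACT = $(REF_DIR)/snpOnly.vcf.gz
EXT = $(REF_DIR)/ALL.chrCHR.phase3.combined.sites.unfiltered.vcf.gz $(REF_DIR)/chrCHR.filtered.sites.vcf.gz
EOF
}
```

Because the function appends, it leaves any settings already in the file untouched; check afterwards that none of these keys is defined twice.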

+

== Running ==

Line 148: Line 134:

* Replace <code>2</code> following <code>--numjobs</code> with the number of jobs to be run in parallel

* If <code>OUT_DIR</code> is not defined in the configuration file, add <code>--outdir</code> followed by the path to the user's desired output directory.

−

=== Running on a Cluster ===

Line 162: Line 147:

* glfs with a bams & samples subdirectory

* pvcfs with a subdirectory per chromosome and then per region

−

* split with a subdirectory per chromosome

+

* '''split''' with a subdirectory per chromosome

−

* vcfs with a subdirectory per chromosome

+

* '''vcfs''' with a subdirectory per chromosome

* (optionally your target directory)

−

Under the vcfs/chrXX directory, there should be:

+

Under the '''vcfs/chrXX''' directory, there should be:

* chrXX.filtered.sites.vcf

* chrXX.filtered.sites.vcf.norm.log

* chrXX.filtered.sites.vcf.summary

−

* chrXX.filtered.vcf.gz

+

* '''chrXX.filtered.vcf.gz''' - final filtered variant call file

* chrXX.filtered.vcf.gz.OK

* chrXX.filtered.vcf.gz.tbi
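A quick way to confirm that a run produced the final call file is to check for it and its companion files. The following is a minimal sketch; the function name <code>check_filtered_vcf</code> is hypothetical, and it only checks the three <code>chrXX.filtered.vcf.gz*</code> files listed above.

```shell
#!/bin/sh
# Hypothetical helper: verify that the final filtered VCF and its companion
# files exist.
#   $1 = per-chromosome VCF directory
#   $2 = chromosome label (for example 20)
check_filtered_vcf() {
    dir=$1
    prefix="chr$2.filtered.vcf.gz"
    status=0
    for f in "$prefix" "$prefix.OK" "$prefix.tbi"; do
        if [ ! -e "$dir/$f" ]; then
            # Report every missing file instead of stopping at the first one.
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

The function returns 0 only when all three files are present, so it can be used in a condition such as <code>check_filtered_vcf DIR 20 && echo "chr20 complete"</code>, where <code>DIR</code> is the chromosome 20 directory described above.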

Line 188: Line 173:

The filtered file is the merged.vcf after it has been run through the filters, with each variant marked PASS/FAIL.

−

Under the split/chrXX directory, there should be:

+

Under the '''split/chrXX''' directory, there should be:

* chrXX.filtered.PASS.split.[N].vcf.gz

* chrXX.filtered.PASS.split.err

* chrXX.filtered.PASS.split.vcflist

−

* chrXX.filtered.PASS.gz

+

* '''chrXX.filtered.PASS.gz''' - final variant call file with only PASS variants

* subset.OK

Kleckner

87

edits


GotCloud: Variant Calling Pipeline (view source)

Revision as of 18:54, 30 March 2015
