Difference between revisions of "GotCloud: Variant Calling Pipeline"

Revision as of 14:32, 6 November 2012

The Variant Calling Pipeline (UMAKE) takes recalibrated BAM files, detects SNPs, and calls their genotypes, producing VCF files.

Input Data:

Aligned/Processed/Recalibrated BAM files
Index file containing Sample IDs & BAM file names
Reference files
(Optional) Configuration file to override default options

BAM files

The BAM files must be duplicate-marked and base-quality recalibrated to obtain high-quality SNP calls.

FASTQs can be converted to this type of BAM using the Mapping Pipeline.

Index File

Each line of the index file describes one individual in the following format. Multiple BAMs per individual may be provided.

[SAMPLE_ID]    [COMMA SEPARATED POPULATION LABELS] [BAM_FILE1] [BAM_FILE2] ...

Columns:

sample id
comma separated population labels
BAM File 1
BAM File 2 (if applicable)
...
BAM File N
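
The index line format above can be sketched in Python as follows. This is an illustrative sketch, not part of UMAKE: the function name `parse_index_line` is hypothetical, and it assumes fields are separated by whitespace (the page does not state whether tabs or spaces are required).

```python
def parse_index_line(line):
    """Split one index line into (sample_id, population_labels, bam_files).

    Assumption: fields are separated by whitespace (tabs or spaces).
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(
            "expected a sample ID, population labels, and at least one BAM: %r" % line
        )
    sample_id = fields[0]
    population_labels = fields[1].split(",")  # comma separated labels
    bam_files = fields[2:]                    # one or more BAMs per individual
    return sample_id, population_labels, bam_files
```

Running each line of an index file through such a check before starting the pipeline would surface lines that lack a BAM file column.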

Reference Files

Reference files are required for variant calling.

Reference sequence in FASTA format
- Configuration File Setting: REF = path/file.fa
Indel VCF file prefix
- Configuration File Setting: INDEL_PREFIX = path/indels.sites.hg19
- path/ contains one file per chromosome being processed, e.g. indels.sites.hg19.chr20.vcf for chromosome 20
DBSNP file prefix
- Configuration File Setting: DBSNP_PREFIX = path/dbsnp_135_b37.rod
- path/ contains one file per chromosome being processed, e.g. dbsnp_135_b37.rod.chr20.map for chromosome 20
HapMap3 polymorphic site prefix
- Configuration File Setting: HM3_PREFIX = path/hapmap3.qc.poly
- path/ contains two files per chromosome being processed, e.g. hapmap3.qc.poly.chr20.bim & hapmap3.qc.poly.chr20.frq for chromosome 20
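
The per-chromosome naming in the examples above can be checked with a small Python sketch. It is hypothetical and not part of UMAKE. It assumes that the chr20 examples generalize to `<prefix>.chr<chromosome>.<extension>` for every chromosome, and that chromosomes are named as in the examples (e.g. 20, not chr20).

```python
import os


def expected_reference_files(chrom, indel_prefix, dbsnp_prefix, hm3_prefix):
    """List the per-chromosome files implied by the three prefixes.

    Assumption: names follow the chr20 examples, <prefix>.chr<chrom>.<ext>.
    """
    tag = ".chr" + str(chrom)
    return [
        indel_prefix + tag + ".vcf",
        dbsnp_prefix + tag + ".map",
        hm3_prefix + tag + ".bim",
        hm3_prefix + tag + ".frq",
    ]


def missing_reference_files(chroms, indel_prefix, dbsnp_prefix, hm3_prefix):
    """Return the expected files that do not exist on disk."""
    missing = []
    for chrom in chroms:
        for path in expected_reference_files(
            chrom, indel_prefix, dbsnp_prefix, hm3_prefix
        ):
            if not os.path.exists(path):
                missing.append(path)
    return missing
```

An empty list from `missing_reference_files` would suggest that the prefixes point at a complete set of files for the chromosomes passed in.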

A set of reference files can be downloaded from: FTP Download of Full Resource Files

Configuration File Example Reference Settings:

REF = path/file.fa
INDEL_PREFIX = path/indels.sites.hg19
DBSNP_PREFIX = path/dbsnp_135_b37.rod
HM3_PREFIX = path/hapmap3.qc.poly

Configuration File

The configuration file contains the run-time options, including the software binaries and command-line arguments. A default configuration file is loaded automatically. Users must supply their own configuration file that specifies only the values that differ from the defaults.

Comments begin with a #

Format: KEY = value

Where KEY is the item being set and value is its new value
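
As an illustration of the KEY = value format, here is a minimal Python sketch of a reader for such a file. It is not UMAKE's own parser, which may behave differently. The function name `parse_config` is hypothetical, and the sketch assumes that a `#` anywhere on a line starts a comment and that values never contain a literal `#`.

```python
def parse_config(text):
    """Parse KEY = value lines into a dict, ignoring comments and blank lines.

    Assumptions: '#' starts a comment anywhere on a line, and later
    occurrences of a key override earlier ones.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop the comment part
        if not line:
            continue
        if "=" not in line:
            raise ValueError("not in KEY = value form: %r" % raw_line)
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings
```

Under these assumptions, a line such as `KEY =   # comment` yields an empty value for `KEY`.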

Required User Config Files Settings

The following Config File Settings must be specified by the user:

CHRS = space separated list of chromosomes you want
BAM_INDEX = path to the Index File of BAMs

Required on Command-Line or in Config File

The following Command-Line or Config File Settings must be specified by the user:

--outdir/OUTDIR= path to desired output directory
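
The required settings listed above can be checked before a run with a short Python sketch. The helper name `missing_settings` is hypothetical, and the sketch assumes the settings have already been read into a dict of KEY to value strings.

```python
# Settings that must appear in the user's configuration file.
REQUIRED_IN_CONFIG = ("CHRS", "BAM_INDEX")


def missing_settings(settings, outdir_on_command_line=False):
    """Return the names of required settings that are absent or empty.

    OUTDIR counts as satisfied when --outdir is given on the command line.
    """
    missing = [key for key in REQUIRED_IN_CONFIG if not settings.get(key)]
    if not outdir_on_command_line and not settings.get("OUTDIR"):
        missing.append("OUTDIR")
    return missing
```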

Targeted/Exome Sequencing Settings

For targeted/exome sequencing, specify the following:

Write loci file when performing pileup
- WRITE_TARGET_LOCI = TRUE
Specify the directory to store target information, for example: targetDir
- TARGET_DIR = targetDir

If all individuals have the same target:

Specify the single bed file, for example: target.bed
- UNIFORM_TARGET_BED = target.bed

If not all individuals have the same target:

Specify the file containing the sample id -> bed map, for example: targetMap.txt
- MULTIPLE_TARGET_MAP = targetMap.txt
  - Each line of the file contains [SM_ID] [TARGET_BED]
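
When targets differ between individuals, it can help to confirm that every sample has an entry in the map file. The Python sketch below is hypothetical and not part of UMAKE. It assumes the two fields on each line are separated by whitespace.

```python
def parse_target_map(text):
    """Read lines of the form [SM_ID] [TARGET_BED] into a dict.

    Assumption: the two fields are separated by whitespace.
    """
    target_map = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError("expected [SM_ID] [TARGET_BED]: %r" % line)
        sample_id, bed_file = fields
        target_map[sample_id] = bed_file
    return target_map


def samples_without_target(sample_ids, target_map):
    """Return sample IDs that have no BED file in the target map."""
    return [s for s in sample_ids if s not in target_map]
```

The sample IDs passed to `samples_without_target` would be the ones in the first column of the index file.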

Optional Settings:

Extend the target region by a given number of bases, for example: 50
- OFFSET_OFF_TARGET = 50
Exclude off-target regions when using samtools view (may make command line too long)
- SAMTOOLS_VIEW_TARGET_ONLY = TRUE

Configure Reference Files

See Reference Files for information on how to specify the reference files.

Chromosome X Calling

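Specify the pedigree file (PED format) that provides gender information for chromosome X calling, for example: pedfile.ped
- PED_INDEX = pedfile.ped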

Running

Running umake is straightforward:

/usr/local/biopipe/bin/umake.pl --conf umake.conf --snpcall --numjobs 2

Replace umake.conf with the appropriate path/name of the user's configuration file.

If OUTDIR is not defined in the configuration file, add --outdir followed by the path to the desired output directory.

Set the value following --numjobs to the number of jobs to run in parallel.
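
For scripted runs, the command line described above can be assembled in Python. This is a sketch under stated assumptions: the helper name `build_umake_command` is hypothetical, the default path is the one shown on this page, and only the options mentioned here (--conf, --snpcall, --numjobs, --outdir) are used.

```python
def build_umake_command(conf, numjobs, outdir=None,
                        umake="/usr/local/biopipe/bin/umake.pl"):
    """Build the umake.pl argument list for SNP calling.

    outdir is only added when given, for the case where OUTDIR is not
    defined in the configuration file.
    """
    command = [umake, "--conf", conf, "--snpcall", "--numjobs", str(numjobs)]
    if outdir is not None:
        command += ["--outdir", outdir]
    return command


# Example (not executed here):
#   import subprocess
#   subprocess.run(build_umake_command("umake.conf", 2), check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.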

Running on a Cluster

To run on a cluster, add the following settings to the configuration file:

SLEEP_MULT =     20
MOS_PREFIX =     # PREFIX FOR MOSIX COMMAND (BLANK IF UNUSED)
MOS_NODES =      # COMMA-SEPARATED LIST OF NODES TO SUBMIT JOBS
REMOTE_PREFIX =  # SET IF CLUSTER NODES SEE THE DIRECTORY DIFFERENTLY (e.g. /net/mymachine/[original-dir])

Set MOS_NODES to the appropriate node list.

Set MOS_PREFIX to the applicable prefix.

For MOSIX, use:

MOS_PREFIX = mosrun -E/tmp -t -i

@@ Line 14: / Line 14: @@
 FASTQs can be converted to this type of BAM using the [[Mapping Pipeline]].
-Additional input Files including Pedigree files (PED format) (to specify gender information in chrX calling), Target information (UCSC's BED format) in targeted or whole exome capture sequencing may be provided.
-Configuration file contains core information of run-time options including the software binaries and command line arguments. Refer to the example configuration file for further information
-[edit]
 === Index File ===
 Each line of the index file represents each individual under the following format. Note that multiple BAMs per individual may be provided.
@@ Line 57: / Line 53: @@
 === Configuration File ===
-A default configuration file is automatically loaded.  Users must specify their own configuration file specifying just the values different than the defaults.
+Configuration file contains the run-time options including the software binaries and command line arguments.  A default configuration file is automatically loaded.  Users must specify their own configuration file specifying just the values different than the defaults.
 Comments begin with a <code>#</code>
@@ Line 64: / Line 60: @@
 Where KEY is the item being set and value is its new value
 ====Required User Config Files Settings====
 The following Config File Settings must be specified by the user:
-* CHRS = # space separated list of chromosomes you want
+* CHRS = space separated list of chromosomes you want
-* BAM_INDEX = # path to the Index File of BAMs
+* BAM_INDEX = path to the Index File of BAMs
 ====Required on Command-Line or in Config File====
 The following Command-Line or Config File Settings must be specified by the user:
-* --outdir/OUTDIR= # path to desired output directory
+* --outdir/OUTDIR= path to desired output directory
 ====Targeted/Exome Sequencing Settings====
@@ Line 96: / Line 91: @@
 *  Exclude off-target regions when using samtools view (may make command line too long)
 ** SAMTOOLS_VIEW_TARGET_ONLY = TRUE
 ==== Configure Reference Files ====
@@ Line 110: / Line 104: @@
 <code>
- '''cd ~/myseq'''
+  '''/usr/local/biopipe/bin/umake.pl --conf umake.conf --snpcall --numjobs 2'''
-  '''/usr/local/biopipe/bin/umake --conf myconf ???'''
- '''make -f [out-prefix].Makefile -j [# parallel jobs]'''
 </code>
+Replace umake.conf with the appropriate path/name of the user's configuration file.
+If <code>OUTDIR</code> is not defined in the configuration file, add <code>--outdir</code> followed by the path to the user's desired output directory.
+Update the value following <code>--numjobs</code> to the appropriate number of jobs that the user wants to run in parallel.
+== Running on a Cluster ==
+To run on the Cluster, the following settings need to be added to the configuration file:
+ SLEEP_MULT =     20
+ MOS_PREFIX =     # PREFIX FOR MOSIX COMMAND (BLANK IF UNUSED)
+ MOS_NODES =      # COMMA-SEPARATED LIST OF NODES TO SUBMIT JOBS
+ REMOTE_PREFIX =  # REMOTE_PREFIX : Set if cluster node see the directory differently (e.g. /net/mymachine/[original-dir])
+Set the MOS_NODES to the appropriate node list.
+Update MOS_PREFIX to the applicable prefix.
+* For MOSIX, use:
+ MOS_PREFIX = mosrun -E/tmp -t -i

