Line 18: |
Line 18: |
| ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run snpcall on your own samples. | | ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run snpcall on your own samples. |
| *Run <code>ldrefine</code> pipeline test: | | *Run <code>ldrefine</code> pipeline test: |
− | gotcloud snpcall --test OUTPUT_DIR | + | gotcloud ldrefine --test OUTPUT_DIR |
| ** Where <code>OUTPUT_DIR</code> is the directory where you want to store the test results | | ** Where <code>OUTPUT_DIR</code> is the directory where you want to store the test results |
| ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run ldrefine on your own samples. | | ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run ldrefine on your own samples. |
Line 61: |
Line 61: |
| *** only used by Thunder (part of ldrefine pipeline) | | *** only used by Thunder (part of ldrefine pipeline) |
| *** if all samples are from the same population, population label can be skipped or you can just specify <code>ALL</code> for the population label for each sample. | | *** if all samples are from the same population, population label can be skipped or you can just specify <code>ALL</code> for the population label for each sample. |
| + | |
| + | The path to the BAM List file defaults to <code>outputDirectory/bam.list</code>. It can be overridden by setting <code>--bamlist</code>, <code>--bam_list</code>, or <code>--list</code> on the command line, or by setting <code>BAM_LIST</code> in your configuration file to the path to the BAM List file. See [[#Required_Options|Required Options]] for more information. |
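The lookup described above can be sketched in Python. This is only an illustration, not GotCloud's own code: the function name <code>resolve_bam_list</code> is hypothetical, and the precedence of the command-line value over the configuration file value is an assumption, since the text only says that either one overrides the default.

```python
import posixpath


def resolve_bam_list(output_dir, cli_value=None, config=None):
    """Pick the BAM List path (hypothetical helper, not part of GotCloud).

    Assumed order: command-line value, then BAM_LIST from the
    configuration file, then the documented default.
    """
    config = config or {}
    if cli_value:
        # Value given with --bamlist, --bam_list, or --list
        return cli_value
    if config.get("BAM_LIST"):
        # Value set in the configuration file
        return config["BAM_LIST"]
    # Documented default: outputDirectory/bam.list
    return posixpath.join(output_dir, "bam.list")
```

For example, <code>resolve_bam_list("/path/freeze5/output")</code> falls through to the default because neither override is given.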
| | | |
| === Reference Files === | | === Reference Files === |
Line 79: |
Line 81: |
| {{:GotCloud: Configuration}} | | {{:GotCloud: Configuration}} |
| | | |
− | ====Additional Required User Config Files Settings==== | + | See [[#Variant Calling Command-line Options/Configuration Settings|Variant Calling Command-line Options/Configuration Settings]] for more information on Configuration options. |
− | The following Config File Settings must be specified by the user:
| + | |
− | * CHRS = space separated list of chromosomes you want
| + | ==== Example Configuration File ==== |
− | * BAM_INDEX = path to the Index File of BAMs
| + | Example configuration file where the reference files are stored in /path/reference and the BAM list file is /path/freeze5.bam.list |
| + | CHRS = 20 22 |
| + | BAM_LIST = /path/freeze5.bam.list |
| + | OUT_DIR = /path/freeze5/output |
| + | REF_DIR = /path/reference/ |
| + | REF = $(REF_DIR)/hs37d5.fa |
| + | INDEL_PREFIX = $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 |
| + | HM3_VCF = $(REF_DIR)/hapmap3_r3_b37.sites.vcf.gz |
| + | DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz |
| + | |
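To make the <code>$(REF_DIR)</code> references in the example configuration concrete, here is a minimal Python sketch of reading <code>KEY = VALUE</code> lines and substituting earlier settings. It is not GotCloud's parser: the function name <code>parse_config</code>, the treatment of <code>#</code> as a comment marker, and the rule that only previously defined keys are substituted are all assumptions made for illustration.

```python
import re

# A shortened copy of the example configuration file.
EXAMPLE = """\
CHRS = 20 22
BAM_LIST = /path/freeze5.bam.list
OUT_DIR = /path/freeze5/output
REF_DIR = /path/reference/
REF = $(REF_DIR)/hs37d5.fa
"""


def parse_config(text):
    """Parse KEY = VALUE lines, expanding $(KEY) from earlier settings."""
    settings = {}
    for raw in text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Replace $(NAME) with an earlier value; leave unknown names as-is.
        settings[key] = re.sub(
            r"\$\((\w+)\)",
            lambda m: settings.get(m.group(1), m.group(0)),
            value,
        )
    return settings


settings = parse_config(EXAMPLE)
chromosomes = settings["CHRS"].split()  # space separated list of chromosomes
```

Note that under plain textual substitution, a trailing slash on <code>REF_DIR</code> yields a doubled slash in <code>REF</code>; most file systems treat that the same as a single slash.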
| | | |
− | ====Targeted/Exome Sequencing Settings==== | + | == Variant Calling Command-line Options/Configuration Settings == |
− | If you are running Targeted/Exome Sequencing, the user should specify:
| + | {{:GotCloud: Variant Calling Options}} |
− | * Write loci file when performing pileup
| |
− | ** WRITE_TARGET_LOCI = TRUE
| |
− | * Specify the output sub-directory to store target information, for example: targetDir
| |
− | ** Should not be a full path as this will go under the OUT_DIR directory.
| |
− | ** TARGET_DIR = targetDir
| |
| | | |
− | If all individuals have the same target:
| |
− | * Specify the single bed file, for example: target.bed
| |
− | ** UNIFORM_TARGET_BED = target.bed
| |
| | | |
− | If not all individuals have the same target:
| + | == Use Cases & Recommended Settings == |
− | * Specify the file containing the sample id -> bed map, for example: targetMap.txt
| + | === Single Sample Processing === |
− | ** MULTIPLE_TARGET_MAP = targetMap.txt
| + | To run single sample processing we recommend adding the following settings to your configuration file: |
− | *** Each line of the file contains [SM_ID] [TARGET_BED]
| + | UNIT_CHUNK = 20000000 |
| + | MODEL_GLFSINGLE = TRUE |
| + | MODEL_SKIP_DISCOVER = FALSE |
| + | MODEL_AF_PRIOR = TRUE |
| + | VCF_EXTRACT = $(REF_DIR)/snpOnly.vcf.gz |
| + | EXT = $(REF_DIR)/ALL.chrCHR.phase3.combined.sites.unfiltered.vcf.gz $(REF_DIR)/chrCHR.filtered.sites.vcf.gz |
| | | |
− | Optional Settings:
| + | Explanation of these settings: |
− | * Extend the target region by a given number of bases, for example: 50 | + | * <code>UNIT_CHUNK</code> - since this is only 1 sample, process larger regions at a time than default |
− | ** OFFSET_OFF_TARGET = 50 | + | * <code>MODEL_GLFSINGLE</code> - single sample, so model glfsingle |
| + | * <code>MODEL_SKIP_DISCOVER</code> - do not skip the variant discovery step |
| + | * <code>MODEL_AF_PRIOR</code> - use AF prior for genotyping |
| + | * <code>VCF_EXTRACT</code> - VCF file to use for extracting the site information to genotype |
| + | ** This file is included in the latest reference release: [[GotCloud:_Genetic_Reference_and_Resource_Files#hs37d5-db142|hs37d5-db142]] |
| + | * <code>EXT</code> - VCF reference files to use for the external filtering |
| + | ** These files are included in the latest reference release: [[GotCloud:_Genetic_Reference_and_Resource_Files#hs37d5-db142|hs37d5-db142]] |
| | | |
− | ==== Chromosome X Calling ====
| |
− | Making calls on the X chromosome requires the user to specify a PED file with sex information.
| |
− | * PED_INDEX = pedfile.ped
| |
| | | |
− | === Example Configuration File ===
| |
− | Example configuration file where reference files happen to be stored in /path/reference, and bam index file in /path/freeze5
| |
− | CHRS = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
| |
− | BAM_INDEX = /path/freeze5/freeze5.bam.index ### The BAM index file described above
| |
− | OUT_DIR = /path/freeze5/output ### Directory in which to put all gotcloud output
| |
− | REF = /path/reference/hs37d5.fa ### Reference sequence
| |
− | INDEL_PREFIX = /path/reference/1kg.pilot_release.merged.indels.sites.hg19 ### Known indel sites
| |
− | HM3_VCF = /path/reference/hapmap3_r3_b37.sites.vcf.gz ### HapMap variants (requires tabix index file in same directory)
| |
− | DBSNP_VCF = /path/reference/dbsnp_135.b37.sites.vcf.gz ### dbSNP variants (requires tabix index file in same directory)
| |
| | | |
| == Running == | | == Running == |
Line 132: |
Line 134: |
| * Replace <code>2</code> following <code>--numjobs</code> with the number of jobs to be run in parallel | | * Replace <code>2</code> following <code>--numjobs</code> with the number of jobs to be run in parallel |
| * If <code>OUT_DIR</code> is not defined in the configuration file, add <code>--outdir</code> followed by the path to the user's desired output directory. | | * If <code>OUT_DIR</code> is not defined in the configuration file, add <code>--outdir</code> followed by the path to the user's desired output directory. |
− |
| |
| | | |
| === Running on a Cluster === | | === Running on a Cluster === |
− | See [[#Cluster Configuration|Cluster Configuration]] settings for information on how to run on a cluster. | + | See [[#Cluster Configuration|Cluster Configuration]] for information on how to configure GotCloud to run on a cluster. |
| | | |
| == Results == | | == Results == |
Line 146: |
Line 147: |
| * glfs with a bams & samples subdirectory | | * glfs with a bams & samples subdirectory |
| * pvcfs with a subdirectory per chromosome and then per region | | * pvcfs with a subdirectory per chromosome and then per region |
− | * split with a subdirectory per chromosome | + | * '''split''' with a subdirectory per chromosome |
− | * vcfs with a subdirectory per chromosome | + | * '''vcfs''' with a subdirectory per chromosome |
| * (optionally your target directory) | | * (optionally your target directory) |
| | | |
− | Under the vcfs/chrXX directory, there should be: | + | Under the '''vcfs/chrXX''' directory, there should be: |
| * chrXX.filtered.sites.vcf | | * chrXX.filtered.sites.vcf |
| * chrXX.filtered.sites.vcf.norm.log | | * chrXX.filtered.sites.vcf.norm.log |
| * chrXX.filtered.sites.vcf.summary | | * chrXX.filtered.sites.vcf.summary |
− | * chrXX.filtered.vcf.gz | + | * '''chrXX.filtered.vcf.gz''' - final filtered variant call file |
| * chrXX.filtered.vcf.gz.OK | | * chrXX.filtered.vcf.gz.OK |
| * chrXX.filtered.vcf.gz.tbi | | * chrXX.filtered.vcf.gz.tbi |
Line 172: |
Line 173: |
| The filtered VCF is the merged.vcf after it has been run through the filters, with each variant marked PASS/FAIL. | | The filtered VCF is the merged.vcf after it has been run through the filters, with each variant marked PASS/FAIL. |
| | | |
− | Under the split/chrXX directory, there should be: | + | Under the '''split/chrXX''' directory, there should be: |
| * chrXX.filtered.PASS.split.[N].vcf.gz | | * chrXX.filtered.PASS.split.[N].vcf.gz |
| * chrXX.filtered.PASS.split.err | | * chrXX.filtered.PASS.split.err |
| * chrXX.filtered.PASS.split.vcflist | | * chrXX.filtered.PASS.split.vcflist |
− | * chrXX.filtered.PASS.gz | + | * '''chrXX.filtered.PASS.gz''' - final variant call file with only PASS variants |
| * subset.OK | | * subset.OK |
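A quick way to confirm that a run produced the final files listed above is to check for them per chromosome. The Python sketch below is an illustration only: the helper name <code>missing_outputs</code> is hypothetical, it assumes the <code>vcfs</code> and <code>split</code> subdirectories sit directly under the output directory, it assumes <code>chrXX</code> stands for <code>chr</code> followed by the chromosome name, and it checks only a subset of the listed files.

```python
from pathlib import Path


def missing_outputs(out_dir, chrom):
    """Return the expected final files that are absent for one chromosome.

    Hypothetical helper; directory layout and file names follow the
    lists above, with chrXX taken to mean "chr" + chromosome name.
    """
    name = f"chr{chrom}"
    expected = [
        Path(out_dir, "vcfs", name, f"{name}.filtered.vcf.gz"),
        Path(out_dir, "vcfs", name, f"{name}.filtered.vcf.gz.OK"),
        Path(out_dir, "vcfs", name, f"{name}.filtered.vcf.gz.tbi"),
        Path(out_dir, "split", name, f"{name}.filtered.PASS.gz"),
        Path(out_dir, "split", name, "subset.OK"),
    ]
    # Keep only the paths that do not exist on disk.
    return [path for path in expected if not path.exists()]
```

An empty list from <code>missing_outputs("/path/freeze5/output", "20")</code> would mean all five checked files are present for chromosome 20.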