Line 18: |
Line 18: |
| ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run snpcall on your own samples. | | ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run snpcall on your own samples. |
| *Run <code>ldrefine</code> pipeline test: | | *Run <code>ldrefine</code> pipeline test: |
− | gotcloud snpcall --test OUTPUT_DIR | + | gotcloud ldrefine --test OUTPUT_DIR |
| ** Where <code>OUTPUT_DIR</code> is the directory where you want to store the test results | | ** Where <code>OUTPUT_DIR</code> is the directory where you want to store the test results |
| ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run ldrefine on your own samples. | | ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run ldrefine on your own samples. |
Line 61: |
Line 61: |
| *** only used by Thunder (part of ldrefine pipeline) | | *** only used by Thunder (part of ldrefine pipeline) |
| *** if all samples are from the same population, the population label can be omitted, or you can specify <code>ALL</code> as the population label for each sample. | | *** if all samples are from the same population, the population label can be omitted, or you can specify <code>ALL</code> as the population label for each sample. |
| + | |
| + | The path to the BAM List file defaults to <code>outputDirectory/bam.list</code>. It can be overridden with <code>--bamlist</code>, <code>--bam_list</code>, or <code>--list</code> on the command line, or by setting <code>BAM_LIST</code> in your configuration file to the path of the BAM List file. See [[#Required_Options|Required Options]] for more information. |
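The lookup order described above can be sketched in Python. This is an illustration only, not GotCloud's own code: the function name <code>resolve_bam_list</code> is hypothetical, and the assumption that a command-line value takes precedence over <code>BAM_LIST</code> in the configuration file is ours, since the text only says that either can override the default.

```python
import posixpath


def resolve_bam_list(out_dir, config=None, cli_value=None):
    """Pick the BAM list path: command line, then config, then default.

    Assumption: the command-line flag wins over the configuration file.
    """
    config = config or {}
    if cli_value:
        # Value given via --bamlist, --bam_list, or --list
        return cli_value
    if config.get("BAM_LIST"):
        # BAM_LIST key from the configuration file
        return config["BAM_LIST"]
    # Documented default: <output directory>/bam.list
    return posixpath.join(out_dir, "bam.list")


if __name__ == "__main__":
    print(resolve_bam_list("/path/freeze5/output"))
    print(resolve_bam_list("/path/freeze5/output",
                           {"BAM_LIST": "/path/freeze5.bam.list"}))
```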
| | | |
| === Reference Files === | | === Reference Files === |
Line 79: |
Line 81: |
| {{:GotCloud: Configuration}} | | {{:GotCloud: Configuration}} |
| | | |
− | ====Additional Required User Config Files Settings==== | + | See [[#Variant Calling Command-line Options/Configuration Settings|Variant Calling Command-line Options/Configuration Settings]] for more information on Configuration options. |
− | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
| + | |
− | ! Configuration Key !! Command-line Flag !! Value Description !! Default Value
| + | ==== Example Configuration File ==== |
− | |-
| + | Example configuration file where the reference files are stored in /path/reference and the BAM list file is /path/freeze5.bam.list |
− | |CHRS||--chrs || space-separated list of chromosomes to process || 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X
| + | CHRS = 20 22 |
− | |-
| + | BAM_LIST = /path/freeze5.bam.list |
− | | BAM_LIST|| --list || path to the [[#BAM List File|BAM List File]] || $(OUT_DIR)/bam.list
| + | OUT_DIR = /path/freeze5/output |
− | |}
| + | REF_DIR = /path/reference/ |
| + | REF = $(REF_DIR)/hs37d5.fa |
| + | INDEL_PREFIX = $(REF_DIR)/1kg.pilot_release.merged.indels.sites.hg19 |
| + | HM3_VCF = $(REF_DIR)/hapmap3_r3_b37.sites.vcf.gz |
| + | DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz |
| + | |
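To show how the <code>$(REF_DIR)</code> references in the example configuration resolve, here is a minimal Python sketch. It is not GotCloud's parser. The helper names <code>parse_config</code> and <code>expand</code> are hypothetical, and the treatment of <code>#</code> as a comment marker and of <code>$(NAME)</code> as a reference to another key in the same file are assumptions made for illustration.

```python
import re

# A shortened copy of the example configuration above.
EXAMPLE = """\
CHRS = 20 22
BAM_LIST = /path/freeze5.bam.list
OUT_DIR = /path/freeze5/output
REF_DIR = /path/reference/
REF = $(REF_DIR)/hs37d5.fa
"""

_VAR = re.compile(r"\$\((\w+)\)")


def parse_config(text):
    """Read KEY = VALUE lines into a dict, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = value
    return settings


def expand(settings, key, _depth=0):
    """Substitute $(NAME) references in settings[key], recursively."""
    if _depth > 20:
        raise ValueError("reference chain too deep (possible cycle)")

    def repl(match):
        name = match.group(1)
        if name in settings:
            return expand(settings, name, _depth + 1)
        return match.group(0)  # leave unknown references untouched

    return _VAR.sub(repl, settings[key])


if __name__ == "__main__":
    cfg = parse_config(EXAMPLE)
    print(expand(cfg, "REF"))
```

Because <code>REF_DIR</code> in the example ends with a slash, the expanded <code>REF</code> contains a double slash (<code>/path/reference//hs37d5.fa</code>), which POSIX path resolution treats the same as a single one.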
| | | |
− | ====Targeted/Exome Sequencing Settings==== | + | == Variant Calling Command-line Options/Configuration Settings == |
− | If you are running Targeted/Exome Sequencing, the user should specify:
| + | {{:GotCloud: Variant Calling Options}} |
| | | |
− | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
| |
− | ! Configuration Key !! Value Description
| |
− | |-
| |
− | |UNIFORM_TARGET_BED|| Bed file of targeted regions (same bed for all samples)
| |
− | |-
| |
− | |MULTIPLE_TARGET_MAP|| Filename of file mapping: sample id -> bed file of targeted regions
| |
− | Each line of the file contains: [SM_ID] [TARGET_BED]
| |
− | |-
| |
− | |OFFSET_OFF_TARGET|| Number of bases by which to extend the target region
| |
− | (default is 0, do not extend the target region)
| |
− | |-
| |
− | |SAMTOOLS_VIEW_TARGET_ONLY || When performing samtools view, if set to true, exclude off-target regions
| |
− | (default is false)
| |
| | | |
− | You may not want to set this to true because it may:
| + | == Use Cases & Recommended Settings == |
− | *''make command line too long'' | + | === Single Sample Processing === |
− | *''produce an error if reads overlap multiple targeted regions'' | + | To run single sample processing we recommend adding the following settings to your configuration file: |
− | ** see: [[GotCloud: FAQs#Targetted/Exome|GotCloud: FAQs->Targetted/Exome]] | + | UNIT_CHUNK = 20000000 |
− | |}
| + | MODEL_GLFSINGLE = TRUE |
| + | MODEL_SKIP_DISCOVER = FALSE |
| + | MODEL_AF_PRIOR = TRUE |
| + | VCF_EXTRACT = $(REF_DIR)/snpOnly.vcf.gz |
| + | EXT = $(REF_DIR)/ALL.chrCHR.phase3.combined.sites.unfiltered.vcf.gz $(REF_DIR)/chrCHR.filtered.sites.vcf.gz |
| + | |
| + | Explanation of these settings: |
| + | * <code>UNIT_CHUNK</code> - since this is only 1 sample, process larger regions at a time than default |
| + | * <code>MODEL_GLFSINGLE</code> - single sample, so model glfsingle |
| + | * <code>MODEL_SKIP_DISCOVER</code> - do not skip the variant discovery step |
| + | * <code>MODEL_AF_PRIOR</code> - use AF prior for genotyping |
| + | * <code>VCF_EXTRACT</code> - VCF file to use for extracting the site information to genotype |
| + | ** This file is included in the latest reference release: [[GotCloud:_Genetic_Reference_and_Resource_Files#hs37d5-db142|hs37d5-db142]] |
| + | * <code>EXT</code> - VCF reference files to use for the external filtering |
| + | ** These files are included in the latest reference release: [[GotCloud:_Genetic_Reference_and_Resource_Files#hs37d5-db142|hs37d5-db142]] |
| | | |
− | ==== Chromosome X Calling ====
| |
− | Making calls on the X chromosome requires the user to specify a PED file with sex information.
| |
− | * PED_INDEX = pedfile.ped
| |
| | | |
− | === Example Configuration File ===
| |
− | Example configuration file where the reference files are stored in /path/reference and the BAM index file is in /path/freeze5
| |
− | CHRS = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
| |
− | BAM_INDEX = /path/freeze5/freeze5.bam.index ### The BAM index file described above
| |
− | OUT_DIR = /path/freeze5/output ### Directory in which to put all gotcloud output
| |
− | REF = /path/reference/hs37d5.fa ### Reference sequence
| |
− | INDEL_PREFIX = /path/reference/1kg.pilot_release.merged.indels.sites.hg19 ### Known indel sites
| |
− | HM3_VCF = /path/reference/hapmap3_r3_b37.sites.vcf.gz ### HapMap variants (requires tabix index file in same directory)
| |
− | DBSNP_VCF = /path/reference/dbsnp_135.b37.sites.vcf.gz ### dbSNP variants (requires tabix index file in same directory)
| |
| | | |
| == Running == | | == Running == |
Line 138: |
Line 134: |
| * Replace <code>2</code> following <code>--numjobs</code> with the number of jobs to be run in parallel | | * Replace <code>2</code> following <code>--numjobs</code> with the number of jobs to be run in parallel |
| * If <code>OUT_DIR</code> is not defined in the configuration file, add <code>--outdir</code> followed by the path to the user's desired output directory. | | * If <code>OUT_DIR</code> is not defined in the configuration file, add <code>--outdir</code> followed by the path to the user's desired output directory. |
− |
| |
| | | |
| === Running on a Cluster === | | === Running on a Cluster === |
Line 152: |
Line 147: |
| * glfs with a bams & samples subdirectory | | * glfs with a bams & samples subdirectory |
| * pvcfs with a subdirectory per chromosome and then per region | | * pvcfs with a subdirectory per chromosome and then per region |
− | * split with a subdirectory per chromosome | + | * '''split''' with a subdirectory per chromosome |
− | * vcfs with a subdirectory per chromosome | + | * '''vcfs''' with a subdirectory per chromosome |
| * (optionally your target directory) | | * (optionally your target directory) |
| | | |
− | Under the vcf/chrXX directory, there should be: | + | Under the '''vcf/chrXX''' directory, there should be: |
| * chrXX.filtered.sites.vcf | | * chrXX.filtered.sites.vcf |
| * chrXX.filtered.sites.vcf.norm.log | | * chrXX.filtered.sites.vcf.norm.log |
| * chrXX.filtered.sites.vcf.summary | | * chrXX.filtered.sites.vcf.summary |
− | * chrXX.filtered.vcf.gz | + | * '''chrXX.filtered.vcf.gz''' - final filtered variant call file |
| * chrXX.filtered.vcf.gz.OK | | * chrXX.filtered.vcf.gz.OK |
| * chrXX.filtered.vcf.gz.tbi | | * chrXX.filtered.vcf.gz.tbi |
Line 178: |
Line 173: |
| The filtered file is the merged.vcf after it has been run through the filters and marked with PASS/FAIL. | | The filtered file is the merged.vcf after it has been run through the filters and marked with PASS/FAIL. |
| | | |
− | Under the split/chrXX directory, there should be: | + | Under the '''split/chrXX''' directory, there should be: |
| * chrXX.filtered.PASS.split.[N].vcf.gz | | * chrXX.filtered.PASS.split.[N].vcf.gz |
| * chrXX.filtered.PASS.split.err | | * chrXX.filtered.PASS.split.err |
| * chrXX.filtered.PASS.split.vcflist | | * chrXX.filtered.PASS.split.vcflist |
− | * chrXX.filtered.PASS.gz | + | * '''chrXX.filtered.PASS.gz''' - final variant call file with only PASS variants |
| * subset.OK | | * subset.OK |
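A quick way to confirm that a run produced the files listed above is to check for them per chromosome. The sketch below is a hypothetical helper, not part of GotCloud. It assumes the subdirectories are named <code>vcfs</code> and <code>split</code> as in the directory listing, and that <code>XX</code> in the file names stands for the chromosome name (for example <code>20</code>). The numbered <code>split.[N]</code> files are not checked because their count varies.

```python
from pathlib import Path

# File name templates taken from the lists above; {c} is the chromosome name.
VCF_FILES = [
    "chr{c}.filtered.sites.vcf",
    "chr{c}.filtered.sites.vcf.norm.log",
    "chr{c}.filtered.sites.vcf.summary",
    "chr{c}.filtered.vcf.gz",
    "chr{c}.filtered.vcf.gz.OK",
    "chr{c}.filtered.vcf.gz.tbi",
]
SPLIT_FILES = [
    "chr{c}.filtered.PASS.split.err",
    "chr{c}.filtered.PASS.split.vcflist",
    "chr{c}.filtered.PASS.gz",
    "subset.OK",
]


def missing_outputs(out_dir, chrom):
    """Return expected output files (relative to out_dir) that do not exist."""
    out = Path(out_dir)
    expected = []
    # Assumption: per-chromosome subdirectories are vcfs/chrXX and split/chrXX
    for subdir, names in (("vcfs", VCF_FILES), ("split", SPLIT_FILES)):
        for name in names:
            expected.append(out / subdir / f"chr{chrom}" / name.format(c=chrom))
    return [p.relative_to(out).as_posix() for p in expected if not p.exists()]


if __name__ == "__main__":
    for path in missing_outputs("/path/freeze5/output", "20"):
        print("missing:", path)
```

An empty list means every listed file is present; it does not by itself show that the calls are valid.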