Line 18: |
Line 18: |
| ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run snpcall on your own samples. | | ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run snpcall on your own samples. |
| *Run <code>ldrefine</code> pipeline test: | | *Run <code>ldrefine</code> pipeline test: |
− | gotcloud snpcall --test OUTPUT_DIR | + | gotcloud ldrefine --test OUTPUT_DIR |
| ** Where <code>OUTPUT_DIR</code> is the directory where you want to store the test results | | ** Where <code>OUTPUT_DIR</code> is the directory where you want to store the test results |
| ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run ldrefine on your own samples. | | ** If you see <code>Successfully ran the test case, congratulations!</code>, then you are ready to run ldrefine on your own samples. |
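Both test runs can be scripted. The following is a minimal sketch, assuming <code>gotcloud</code> is on the <code>PATH</code> and that the success message appears on standard output or standard error (the page only says the message is shown, not where). The helper names <code>is_success</code> and <code>run_pipeline_test</code> are hypothetical.

```python
import subprocess

# Message quoted from the test instructions above.
SUCCESS_MESSAGE = "Successfully ran the test case, congratulations!"


def is_success(output: str) -> bool:
    """Return True if the captured output contains the success message."""
    return SUCCESS_MESSAGE in output


def run_pipeline_test(pipeline: str, output_dir: str) -> bool:
    """Run `gotcloud <pipeline> --test <output_dir>` and check the result.

    Assumption: gotcloud is installed and reachable on the PATH.
    """
    result = subprocess.run(
        ["gotcloud", pipeline, "--test", output_dir],
        capture_output=True,
        text=True,
    )
    # Assumption: the message may be printed to either stream.
    return is_success(result.stdout + result.stderr)
```

For example, <code>run_pipeline_test("snpcall", "/tmp/snpcall_test")</code> followed by <code>run_pipeline_test("ldrefine", "/tmp/ldrefine_test")</code> would exercise both pipelines.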
Line 61: |
Line 61: |
| *** only used by Thunder (part of ldrefine pipeline) | | *** only used by Thunder (part of ldrefine pipeline) |
| *** if all samples are from the same population, the population label can be omitted, or you can specify <code>ALL</code> as the population label for each sample. | | *** if all samples are from the same population, the population label can be omitted, or you can specify <code>ALL</code> as the population label for each sample. |
| + | |
| + | The path to the BAM List File defaults to <code>outputDirectory/bam.list</code>. Override it with <code>--bamlist</code>, <code>--bam_list</code>, or <code>--list</code> on the command line, or by setting <code>BAM_LIST</code> in your configuration file to the path of the BAM List File. See [[#Required_Options|Required Options]] for more information. |
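The lookup order can be illustrated with a minimal sketch. It assumes that a command-line value takes precedence over <code>BAM_LIST</code> in the configuration file, which the text above does not state explicitly. The function <code>resolve_bam_list</code> is hypothetical and is not part of GotCloud.

```python
import os
from typing import Optional


def resolve_bam_list(out_dir: str,
                     cli_value: Optional[str] = None,
                     config: Optional[dict] = None) -> str:
    """Pick the BAM list path.

    Assumed precedence: command-line value, then BAM_LIST from the
    configuration, then the default <out_dir>/bam.list.
    """
    if cli_value:
        return cli_value
    if config and config.get("BAM_LIST"):
        return config["BAM_LIST"]
    return os.path.join(out_dir, "bam.list")
```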
| | | |
| === Reference Files === | | === Reference Files === |
Line 79: |
Line 81: |
| {{:GotCloud: Configuration}} | | {{:GotCloud: Configuration}} |
| | | |
− | ====Additional Required User Config Files Settings====
| + | See [[#Variant Calling Command-line Options/Configuration Settings|Variant Calling Command-line Options/Configuration Settings]] for more information on Configuration options. |
− | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
| |
− | ! Configuration Key !! Command-line Flag !! Value Description !! Default Value
| |
− | |-
| |
− | |CHRS||--chrs || space-separated list of chromosomes to process || 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X
| |
− | |-
| |
− | | BAM_LIST|| --list || path to the [[#BAM List File|BAM List File]] || $(OUT_DIR)/bam.list
| |
− | |}
| |
− | | |
− | ====Targeted/Exome Sequencing Settings====
| |
− | If you are running Targeted/Exome Sequencing, the user should specify:
| |
− | | |
− | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
| |
− | ! Configuration Key !! Value Description
| |
− | |-
| |
− | |UNIFORM_TARGET_BED|| Bed file of targeted regions (same bed for all samples)
| |
− | |-
| |
− | |MULTIPLE_TARGET_MAP|| Filename of file mapping: sample id -> bed file of targeted regions
| |
− | Each line of the file contains: [SM_ID] [TARGET_BED]
| |
− | |-
| |
− | |OFFSET_OFF_TARGET|| Number of bases by which to extend the target region
| |
− | (default is 0, do not extend the target region)
| |
− | |-
| |
− | |SAMTOOLS_VIEW_TARGET_ONLY || '''true''': speeds up processing by excluding off-target regions initially when performing samtools view
| |
− | | |
− | '''false''' (default): off-target regions are not excluded when performing samtools view, but are excluded at a later step
| |
− | | |
− | '''Warning:''' You may not want to set this to true because it may:
| |
− | *''make command line too long''
| |
− | *''produce an error if reads overlap multiple targeted regions''
| |
− | ** see: [[GotCloud: FAQs#Targetted/Exome|GotCloud: FAQs->Targetted/Exome]]
| |
− | |}
| |
− | | |
− | ==== Chromosome X Calling ====
| |
− | For proper Chromosome X calling, it is recommended to specify a PED file with sex information:
| |
− | {| class="wikitable" style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
| |
− | ! Configuration Key !! Value Description
| |
− | |-
| |
− | |PED_INDEX|| ped file containing sampleID (2nd column) and sex (5th column)
| |
− | |}
| |
− | | |
− | Format of PED file:
| |
− | :<code>familyID sampleID fatherID motherID sex</code>
| |
− | * Only <code>sampleID</code> and <code>sex</code> are used
| |
| | | |
− | === Example Configuration File === | + | ==== Example Configuration File ==== |
| Example configuration file where reference files happen to be stored in /path/reference, and bam index file in path/freeze5 | | Example configuration file where reference files happen to be stored in /path/reference, and bam index file in path/freeze5 |
| CHRS = 20 22 | | CHRS = 20 22 |
Line 134: |
Line 93: |
| HM3_VCF = $(REF_DIR)/hapmap3_r3_b37.sites.vcf.gz | | HM3_VCF = $(REF_DIR)/hapmap3_r3_b37.sites.vcf.gz |
| DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz | | DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz |
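The <code>$(REF_DIR)</code> references in the example are expanded from other keys in the same file. The sketch below shows one way such <code>KEY = VALUE</code> lines could be read and expanded. It is not GotCloud's own parser: it assumes that references point only to keys defined earlier in the file and that lines starting with <code>#</code> are comments. The <code>REF_DIR</code> value is taken from the example description above.

```python
import re


def parse_config(text: str) -> dict:
    """Parse KEY = VALUE lines and expand $(KEY) references.

    Assumes references point only to keys defined earlier in the text;
    unknown references are left unchanged.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = re.sub(
            r"\$\((\w+)\)",
            lambda m: settings.get(m.group(1), m.group(0)),
            value,
        )
    return settings


example = """
CHRS = 20 22
REF_DIR = /path/reference
DBSNP_VCF = $(REF_DIR)/dbsnp_135.b37.sites.vcf.gz
"""
config = parse_config(example)
print(config["DBSNP_VCF"])  # /path/reference/dbsnp_135.b37.sites.vcf.gz
```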
| + | |
| + | |
| + | == Variant Calling Command-line Options/Configuration Settings == |
| + | {{:GotCloud: Variant Calling Options}} |
| + | |
| + | |
| + | == Use Cases & Recommended Settings == |
| + | === Single Sample Processing === |
| + | To run single sample processing we recommend adding the following settings to your configuration file: |
| + | UNIT_CHUNK = 20000000 |
| + | MODEL_GLFSINGLE = TRUE |
| + | MODEL_SKIP_DISCOVER = FALSE |
| + | MODEL_AF_PRIOR = TRUE |
| + | VCF_EXTRACT = $(REF_DIR)/snpOnly.vcf.gz |
| + | EXT = $(REF_DIR)/ALL.chrCHR.phase3.combined.sites.unfiltered.vcf.gz $(REF_DIR)/chrCHR.filtered.sites.vcf.gz |
| + | |
| + | Explanation of these settings: |
| + | * <code>UNIT_CHUNK</code> - since this is only 1 sample, process larger regions at a time than default |
| + | * <code>MODEL_GLFSINGLE</code> - single sample, so use the glfsingle model |
| + | * <code>MODEL_SKIP_DISCOVER</code> - do not skip the variant discovery step |
| + | * <code>MODEL_AF_PRIOR</code> - use AF prior for genotyping |
| + | * <code>VCF_EXTRACT</code> - VCF file to use for extracting the site information to genotype |
| + | ** This file is included in the latest reference release: [[GotCloud:_Genetic_Reference_and_Resource_Files#hs37d5-db142|hs37d5-db142]] |
| + | * <code>EXT</code> - VCF reference files to use for the external filtering |
| + | ** These files are included in the latest reference release: [[GotCloud:_Genetic_Reference_and_Resource_Files#hs37d5-db142|hs37d5-db142]] |
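A configuration can be checked against these recommendations before a run. The following is a minimal sketch, assuming the configuration has already been loaded into a dictionary of strings. <code>check_single_sample_config</code> is a hypothetical helper, not a GotCloud command.

```python
# Values copied from the recommended single-sample settings above.
RECOMMENDED = {
    "UNIT_CHUNK": "20000000",
    "MODEL_GLFSINGLE": "TRUE",
    "MODEL_SKIP_DISCOVER": "FALSE",
    "MODEL_AF_PRIOR": "TRUE",
}
# These keys hold site-specific paths, so only their presence is checked.
REQUIRED_PATH_KEYS = ("VCF_EXTRACT", "EXT")


def check_single_sample_config(config: dict) -> list:
    """Return a list of human-readable problems; empty if all is well."""
    problems = []
    for key, expected in RECOMMENDED.items():
        actual = config.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected}, found {actual}")
    for key in REQUIRED_PATH_KEYS:
        if not config.get(key):
            problems.append(f"{key}: not set")
    return problems
```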
| + | |
| + | |
| | | |
| == Running == | | == Running == |
Line 148: |
Line 134: |
| * Replace <code>2</code> following <code>--numjobs</code> with the number of jobs to be run in parallel | | * Replace <code>2</code> following <code>--numjobs</code> with the number of jobs to be run in parallel |
| * If <code>OUT_DIR</code> is not defined in the configuration file, add <code>--outdir</code> followed by the path to the user's desired output directory. | | * If <code>OUT_DIR</code> is not defined in the configuration file, add <code>--outdir</code> followed by the path to the user's desired output directory. |
− |
| |
| | | |
| === Running on a Cluster === | | === Running on a Cluster === |
Line 162: |
Line 147: |
| * glfs with a bams & samples subdirectory | | * glfs with a bams & samples subdirectory |
| * pvcfs with a subdirectory per chromosome and then per region | | * pvcfs with a subdirectory per chromosome and then per region |
− | * split with a subdirectory per chromosome | + | * '''split''' with a subdirectory per chromosome |
− | * vcfs with a subdirectory per chromosome | + | * '''vcfs''' with a subdirectory per chromosome |
| * (optionally your target directory) | | * (optionally your target directory) |
| | | |
− | Under the vcfs/chrXX directory, there should be: | + | Under the '''vcfs/chrXX''' directory, there should be: |
| * chrXX.filtered.sites.vcf | | * chrXX.filtered.sites.vcf |
| * chrXX.filtered.sites.vcf.norm.log | | * chrXX.filtered.sites.vcf.norm.log |
| * chrXX.filtered.sites.vcf.summary | | * chrXX.filtered.sites.vcf.summary |
− | * chrXX.filtered.vcf.gz | + | * '''chrXX.filtered.vcf.gz''' - final filtered variant call file |
| * chrXX.filtered.vcf.gz.OK | | * chrXX.filtered.vcf.gz.OK |
| * chrXX.filtered.vcf.gz.tbi | | * chrXX.filtered.vcf.gz.tbi |
Line 188: |
Line 173: |
| The filtered VCF is the merged.vcf after it has been run through the filters, with each variant marked PASS/FAIL. | | The filtered VCF is the merged.vcf after it has been run through the filters, with each variant marked PASS/FAIL. |
| | | |
− | Under the split/chrXX directory, there should be: | + | Under the '''split/chrXX''' directory, there should be: |
| * chrXX.filtered.PASS.split.[N].vcf.gz | | * chrXX.filtered.PASS.split.[N].vcf.gz |
| * chrXX.filtered.PASS.split.err | | * chrXX.filtered.PASS.split.err |
| * chrXX.filtered.PASS.split.vcflist | | * chrXX.filtered.PASS.split.vcflist |
− | * chrXX.filtered.PASS.gz | + | * '''chrXX.filtered.PASS.gz''' - final variant call file with only PASS variants |
| * subset.OK | | * subset.OK |
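A quick way to confirm that a run finished is to look for the final files listed above. The sketch below assumes the per-chromosome directories are <code>vcfs/chrXX</code> and <code>split/chrXX</code> under the output directory, as in the directory list, and uses the file names exactly as written on this page. The helper names are hypothetical.

```python
from pathlib import Path


def expected_outputs(chrom: str) -> list:
    """Relative paths of the key final files for one chromosome."""
    name = f"chr{chrom}"
    vcf_dir = Path("vcfs") / name
    split_dir = Path("split") / name
    return [
        vcf_dir / f"{name}.filtered.vcf.gz",      # final filtered variant calls
        vcf_dir / f"{name}.filtered.vcf.gz.OK",
        vcf_dir / f"{name}.filtered.vcf.gz.tbi",
        split_dir / f"{name}.filtered.PASS.gz",   # PASS-only variant calls
        split_dir / "subset.OK",
    ]


def missing_outputs(out_dir: str, chrom: str) -> list:
    """Return the expected files that do not exist under out_dir."""
    root = Path(out_dir)
    return [p for p in expected_outputs(chrom) if not (root / p).exists()]
```

An empty list from <code>missing_outputs(out_dir, "20")</code> suggests that chromosome 20 completed through both the filtering and split steps.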