GotCloud: Versions

From Genome Analysis Wiki

For information on installing the releases, see: Install GotCloud Software

For information on issues/resolutions for specific versions, see: FAQ: Version Problems

Version 1.17 (Full Release on 5/14/2015)

Source can be downloaded from: https://github.com/statgen/gotcloud/releases/tag/gotcloud.1.17

General

  • Add ability to run custom pipelines
  • Fix bug in libVcf/VcfFile.cpp
  • Fix some compatibility issues for CentOS5

Aligner

  • Add pipelines to run just recab & QC, and just QC.
  • VerifyBamID
    • Exclude ChrX & Y

SnpCall

Genotype Refinement

Indel

GenomeSTRiP


Version 1.16 (Full Release on 2/25/2015)

Source can be downloaded from: https://github.com/statgen/gotcloud/releases/tag/gotcloud.1.16

General

  • Update the default REF to hs37d5.fa (build 37 with decoy) and the default DBSNP_VCF to dbsnp version 142.
  • Upgrade perl scripts to use /usr/bin/env perl instead of /usr/bin/perl so they also work on systems where perl is installed elsewhere
  • Upgrade to latest versions of libStatGen and bamUtil (versions 1.0.13)
    • Fixes bug in calculating the MD5s for the fasta in polishBam
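
The shebang change above can be sketched as a small rewrite of a script's first line. This is only an illustration, not the method used for the release: the function name `portable_shebang` is hypothetical, and shebangs that carry switches (such as `-w`) are deliberately left alone, because on many systems `env` would receive `perl -w` as a single argument.

```python
import re

# Matches a first line that is exactly "#!/usr/bin/perl" (no switches).
_BARE_PERL = re.compile(r"^#!\s*/usr/bin/perl\s*$")


def portable_shebang(first_line: str) -> str:
    """Return the env-based shebang for a bare /usr/bin/perl shebang.

    Any other line, including shebangs with switches, is returned unchanged.
    (Hypothetical helper written for illustration.)
    """
    if _BARE_PERL.match(first_line):
        newline = "\n" if first_line.endswith("\n") else ""
        return "#!/usr/bin/env perl" + newline
    return first_line
```

With the env form, the first `perl` found on the user's `PATH` is used rather than a fixed install location.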

Aligner

  • Update default aligner to bwa mem
    • you can still use bwa aln (the previous default) by adding the following setting to your configuration file:
      MAP_TYPE = BWA
  • Upgrade to bwa version 0.7.12
  • No longer call verifyBamID with the --verbose option


SnpCall

Genotype Refinement

Indel

  • Cleanup pipeline.pl to reduce errors in some versions of perl

GenomeSTRiP

Version 1.15 (Full Release on 12/16/2014)

General

  • Rename BAM_INDEX to BAM_LIST
  • Change default REF_DIR
  • Add ref_dir and list as command-line options to all pipelines
  • Add bed-diff script to compare VCFs

Aligner

  • By default, create BAM_LIST
  • Use SAMPLE instead of MERGE_NAME if MERGE_NAME is not specified in FASTQ_LIST
  • No longer require fastqs to end in 'fastq.gz' or 'fastq'
  • Rename INDEX_FILE to FASTQ_LIST and infer all fields except FASTQ1, FASTQ2, and either SAMPLE or MERGE_NAME
  • Change --numcs to --numjobs and what was --numjobs to --threads
  • Update to latest BWA
    • Update aligner to pass \t instead of literal tabs in the RG field to the new version of BWA
  • By default, no longer store OQ
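
The RG change above is about passing the two-character sequence `\t` rather than a real tab character. A minimal sketch of building such a string follows; the function name `read_group_arg` and the choice of only the ID and SM fields are assumptions made for illustration, not the aligner's actual code.

```python
def read_group_arg(rg_id: str, sample: str) -> str:
    """Build a read-group header string for the aligner command line.

    Fields are joined with a backslash followed by "t" (two characters),
    not with a real tab character. (Hypothetical helper for illustration.)
    """
    return r"\t".join(["@RG", f"ID:{rg_id}", f"SM:{sample}"])
```

For example, `read_group_arg("rg1", "NA12878")` yields a string whose separators are backslash-t pairs, which the receiving program is expected to expand into tabs itself.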

SnpCall

  • Add validation that:
    • Each BAM has only 1 sample
    • BAM's sampleID matches id in BAM_LIST
      • Use --ignoreSMcheck to disable this validation
  • Updated Exome/Targeted settings
    • Set TARGET_DIR and OFFSET_OFF_TARGET (0) in defaults
    • Remove WRITE_TARGET_LOCI; the loci file is now written when UNIFORM_TARGET_BED or MULTIPLE_TARGET_MAP is set and the loci file doesn't exist, is older than the bed, or was created from a different bed
  • Add validation that tabix calls in perl scripts succeed
  • Fix some bugs in glfFlex & add region option
  • Cleanup logs so they no longer spew to the screen
  • Add ext-filt option for single sample filtering
  • Add .OK file after vcflist file successfully created
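
The BAM sample validation listed above can be sketched as a check over the text of a BAM header. This is a rough illustration only: the function names and the `ignore_sm_check` parameter (standing in for --ignoreSMcheck) are hypothetical, and it assumes the header text has already been extracted by some other tool.

```python
def samples_in_header(header_text: str) -> set:
    """Collect the distinct SM values from the @RG lines of a SAM/BAM header."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return samples


def check_bam_sample(header_text: str, expected_sample: str,
                     ignore_sm_check: bool = False) -> str:
    """Validate that a header names exactly one sample and that it matches.

    expected_sample stands for the sample ID listed in BAM_LIST.
    Only the ID comparison is skipped when ignore_sm_check is True
    (an assumption about how --ignoreSMcheck behaves).
    """
    samples = samples_in_header(header_text)
    if len(samples) != 1:
        raise ValueError(f"expected exactly 1 sample, found {sorted(samples)}")
    found = next(iter(samples))
    if not ignore_sm_check and found != expected_sample:
        raise ValueError(f"BAM sample {found!r} != list sample {expected_sample!r}")
    return found
```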

Genotype Refinement

  • Add validation that tabix calls in perl scripts succeed
  • Add .OK file after vcflist file successfully created

Indel

  • Update default region settings
  • Move output directories to an "indel" folder

GenomeSTRiP

  • Add a GenomeSTRiP pipeline


Version 1.14 (Full Release on 8/29/2014)

General

  • Add initial beagle4 support (as a new pipeline)
  • Improve input validation
    • Add chromosome name consistency checks to all tools
  • Upgrade version of bgzf
  • Upgrade libStatGen to fix mergeBam issue.

Aligner

  • Cleanup reading of fastq index/info file
    • ignore empty lines (generates a warning)
    • compress extra tabs/trim white space
  • Validate that BWA_QUAL and BWA_THREADS settings are properly formatted
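
The fastq index cleanup described above (skip empty lines with a warning, collapse extra tabs, trim white space) could look roughly like the sketch below. The function name `clean_index_lines` is hypothetical, and collapsing runs of tabs assumes that no column is ever intentionally left empty.

```python
import re
import warnings


def clean_index_lines(lines):
    """Return a list of field lists from raw tab-delimited index lines.

    Empty lines are skipped with a warning, runs of tabs are treated as
    one separator, and surrounding white space is trimmed from each field.
    (Illustrative sketch, not the aligner's actual reader.)
    """
    cleaned = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            warnings.warn(f"line {number} is empty; skipping")
            continue
        fields = [field.strip() for field in re.split(r"\t+", line)]
        cleaned.append(fields)
    return cleaned
```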

SnpCall

  • Replace glfMultiples with glfFlex
  • Validate format of BAM_INDEX file
  • Add INDEL_VCF as an alternate for INDEL_PREFIX for input indel vcfs that aren't split by chromosome.

Genotype Refinement

  • Only run beagle/thunder with more than 1 sample

Indel

  • Run mergeBams for a single sample as its own step (this didn't work before)
  • Fix bug that caused a failure when the list of files was too long
  • Add input validation
  • Validate format of BAM_INDEX file


Version 1.13 (Full Release on 7/15/2014)

General

  • Cleanup runcluster
  • Upgrade to bamUtil v1.0.12a
  • Upgrade to libStatGen v1.0.12
  • Update README to add build instructions & wiki references

Aligner

  • Increment to latest VerifyBamID

Variant Calling

  • Update glfMultiples to handle when first glf is empty
  • Add check for the output file before creating the .OK file
  • VcfPileup - improve return codes
  • Write jobfiles into a sub-directory
  • Added a snpcall monitoring utility
  • VcfSplit - update to only append .gz in the vcflist if there was at least one file
  • Write start/stop timestamps into a logfile (generated by runcluster)

Genotype Refinement

  • Update beagle2Vcf.pl to use 255 for missing PL/PL3 values
  • Update vcf2Beagle and beagle2Vcf to handle biallelic indels
    • Still doesn't handle any multiallelic variants
  • Added a ldrefine test.

Indel Calling

  • Initial version of Indel Caller
    • Still in the testing phase; if you use it, please provide feedback.

Version 1.12 (Full Release on 1/17/2014)

General

  • GotCloud now works when installed in a bin/ directory.
  • Add tabix source and build & bgzip build
  • Add some Copyright information
  • Fix printing of a failed run's return code
  • Upgrade to latest libStatGen & bamUtil. See links for version details.
    • Slightly newer than 1.0.10 for both - versions on 1/17/2014.
    • dedup & recab now ignore Secondary reads
    • mergeBam ignores PI header field when merging
    • Add PhoneHome - gotCloud applies a PhoneHome thinning (BAMUTIL_THINNING) that defaults to 10, so bamUtil does PhoneHome 10% of the time
  • Upgrade QPLOT to ignore secondary reads
  • samtools
    • Update samtools index to return an error code if it fails to build the index

Aligner

  • Upgrade BWA
    • BWA_MEM is now an option
  • Write timestamps to Makefile log as steps start & complete
  • Remove tmp files as gotCloud goes, rather than at the end.
  • Deprecate RUN_QPLOT & RUN_VERIFY_BAM_ID
    • Now the steps to run are specified in configuration.
  • Mosaik
    • Upgrade to version from Oct 29, 2013
    • Add premo for pre-Mosaik processing

Variant Calling

  • Update to properly handle empty VCFs
  • Run make with -k option to run as much as possible after a failure.
  • Update to allow steps to be dependent on BAMs (BAM_DEPEND) so they will rerun if a BAM has a newer timestamp.
  • Input Validation
    • Check that BAMs exist & are not empty prior to running steps that require BAMs.
    • Check that filters min/maxDP are numbers, not fractions.
  • GlfMultiples
    • update to use DP instead of GD and fix PL description in format field header
    • add region option
  • samtools-hybrid
    • fail on missing BGZF EOF indicator

Genotype Refinement

  • Add a default number of states to Thunder

Version 1.11 (Full Release on 9/6/2013)

Aligner

  • Remove an extra space from the Makefile for the dedup command.
  • Brought in latest bwa source, but it is not yet being used.

Variant Calling

  • Rename OUT_PREFIX to MAKE_BASE_NAME to specify the base filename for snpcall, ldrefine (beagle & thunder), & vc Makefiles. The typeOfRun.Makefile is appended to MAKE_BASE_NAME.
    • These Makefiles all used to have the same name and would overwrite each other
    • --makebasename/--make_basename/--make_base_name can be specified on the command-line
    • Default value for MAKE_BASE_NAME is umake
      • snpcall is now: $(MAKE_BASE_NAME).snpcall.Makefile (default umake.snpcall.Makefile)
      • ldrefine beagle step is now: $(MAKE_BASE_NAME).beagle.Makefile (default umake.beagle.Makefile)
      • ldrefine thunder step is now: $(MAKE_BASE_NAME).thunder.Makefile (default umake.thunder.Makefile)
      • vc is now: $(MAKE_BASE_NAME).vc.Makefile (default umake.vc.Makefile)
  • Added gotcloud beagle and gotcloud thunder commands so that beagle/thunder can be called independently rather than just through ldrefine.
  • Add command-line options to gotcloud vc for running just certain steps rather than having to set RUN...=true in the configuration
    • More than one --commandToRun can be specified at once
    • New command-line options:
      • --index (or RUN_INDEX = true in the configuration file)
      • --pileup (or RUN_PILEUP = true in the configuration file)
      • --glfMultiples (or RUN_GLFMULTIPLES = true in the configuration file)
      • --vcfPileup (or RUN_VCFPILEUP = true in the configuration file)
      • --filter (or RUN_FILTER = true in the configuration file)
      • --svm (or RUN_SVM = true in the configuration file)
      • --split (or RUN_SPLIT = true in the configuration file)
  • Cleaned up the snpcall Makefile entries for pileup. It used to print targets/commands that were never executed. These unused targets have now been removed
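
The Makefile naming scheme listed above can be summarized in a few lines. The function name `makefile_name` is hypothetical; the run types and the `umake` default come from the list above.

```python
RUN_TYPES = ("snpcall", "beagle", "thunder", "vc")


def makefile_name(run_type: str, make_base_name: str = "umake") -> str:
    """Return <MAKE_BASE_NAME>.<typeOfRun>.Makefile for a known run type.

    (Hypothetical helper that mirrors the naming pattern for illustration.)
    """
    if run_type not in RUN_TYPES:
        raise ValueError(f"unknown run type: {run_type!r}")
    return f"{make_base_name}.{run_type}.Makefile"
```

Because the run type is part of the name, the four Makefiles can no longer overwrite each other when they are written to the same directory.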

Aligner & Variant Calling

  • Remove trailing spaces from configuration values
  • Add MAKE_OPTS configuration value that allows users to add Makefile options to the make calls that run the pipelines.
  • Update gccalcstorage for better estimates and to have an option to print estimates from a starting size rather than from actual input files
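
Trimming trailing spaces from configuration values matters because a value such as `BWA ` would otherwise fail to compare equal to `BWA`. The sketch below shows one way to parse a `KEY = value` line; the function name `parse_conf_line` is hypothetical, and comment handling and variable expansion are intentionally left out because their exact rules are not described here.

```python
def parse_conf_line(line: str):
    """Split a "KEY = value" line into (key, value), trimming white space.

    Returns None for lines that contain no "=" or have an empty key.
    (Illustrative sketch, not GotCloud's Conf.pm logic.)
    """
    if "=" not in line:
        return None
    key, value = line.split("=", 1)
    key = key.strip()
    if not key:
        return None
    return key, value.strip()
```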

Version 1.10 (Full Release on 8/22/2013)

Aligner

  • Update gccalcstorage for better align estimates

Variant Calling

  • Add additional comments to umake.pl
  • Update vcf-summary to print the skipped counts
  • Add option to specify the REF_FAI file used by the umake (gotcloud) script for determining CHRs and their lengths.

Aligner & Variant Calling

  • Only print Configuration settings to a file if the file doesn't exist

Version 1.09a (Full Release on 8/08/2013)

Aligner

  • Fix relative paths
  • Upgrade to newest samtools (and add source)
  • Update gcrunsummary.pl - summary stats for the run.
  • Upgrade to newer Mosaik

Variant Calling

  • Fix minNS filter for odd number of samples. It used to give a fraction and then would be ignored.
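
The kind of problem described above can be illustrated with a small sketch. It assumes, purely for illustration, that the threshold is derived as half the number of samples; the actual formula and rounding used by GotCloud are not stated here, and the function name `min_ns` is hypothetical.

```python
def min_ns(num_samples: int) -> int:
    """Return an integer minimum-sample threshold.

    Assumption: the threshold is half the sample count, rounded down.
    Integer division keeps the result whole for odd counts, where plain
    division would give a fraction such as 2.5.
    """
    return num_samples // 2
```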

Aligner & Variant Calling

  • Cleanup phonehome script
  • Cleanup gotcloud script and add ability to run perf/audria for dev purposes.

Version 1.08 (Full Release on 7/31/2013)

Aligner

  • No aligner-only changes

Variant Calling

  • Add the ability to copy a glf to a different directory prior to running glfExtract or glfMultiples
  • Remove chromosome Y from the default CHRS. Also allow CHRS to be set on the command line via a comma-separated list specified in --chrs
  • Update glfMerge to skip glf files that only have a header.
  • Change default FILTER_MAX_SAMPLE_DP to 1000 (from 20)
  • Some SVM updates
  • Added the vc option to gotcloud, which uses the RUN_... settings to decide which steps to run.

Aligner & Variant Calling

  • Fix bug in Conf.pm that caused a failure in some versions of perl
  • Add the ability to set the GOTCLOUD_ROOT so you can test with an alternate align.pl/umake.pl script and still be able to access everything else from the standard gotcloud path.
  • Cleanup the perldoc for align/snpcall
  • Output all configuration settings into a file when running.
  • Upgrade to most current libStatGen
  • Compile as optimized

Version 1.07 (Full Release on 7/3/2013)

Aligner

  • DEPRECATED configuration settings:
    • 'BWA_MAX_MEM' is now 'SORT_MAX_MEM'
    • 'VERIFY_BAM_ID_OPTIONS' is now 'verifyBamID_USER_PARAMS'
  • ALN_TMP now defaults to $(TMP_DIR)/alignment.aln rather than $(TMP_DIR)/alignment.bwa
  • Upgrade to latest QPLOT
    • GC Content file has been renamed to have the extension: .winsize100.gc
  • Automatically generates the bam index file if BAM_INDEX is specified
  • Run DEDUP & RECAB as 1 step instead of 2
  • Update dedup, recab, qplot, & verifyBamID steps to be specified via configuration
    • Easier to insert steps between/before/after these
    • Use PER_MERGE_STEPS to disable any of these steps (see gotcloudDefaults.conf for its default setting)
      • RUN_QPLOT and RUN_VERIFY_BAM_ID are only used for validating executable/reference existence and will be deprecated completely soon
  • Fixed bug where the merge failed if there was only 1 fastq pair
  • Improve informational messages
  • Update to BWA version 0.6.1-r104
  • Bring in mergeBam updates from latest bamUtil
    • ignore PG lines with duplicate ids
  • General code cleanup
  • Add some Mosaik support
    • Added support to align.pl and a way to enable it, but the code doesn't compile
  • Calculate approximate storage needed for GotCloud so user can have an idea what is coming
  • Makefile now uses bash and pipefail to catch errors that occur within piped commands
  • Removed the md5sum calculation
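
The pipefail change above matters because, by default, a shell pipeline reports only the exit status of its last command. The sketch below demonstrates the difference by running a pipeline through bash; it assumes `bash` is available on the `PATH`, and the function name `run_pipeline` is hypothetical.

```python
import subprocess


def run_pipeline(command: str, pipefail: bool = True) -> int:
    """Run a shell pipeline under bash and return its exit status.

    With pipefail enabled, a failure in any stage of the pipeline makes
    the whole pipeline fail instead of being masked by the last stage.
    """
    script = ("set -o pipefail; " if pipefail else "") + command
    return subprocess.run(["bash", "-c", script]).returncode
```

For example, `run_pipeline("false | true", pipefail=False)` returns 0 because only the status of `true` is reported, while the same pipeline with pipefail enabled returns a non-zero status.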

Variant Calling

  • Update to always require REF
    • this fixes bug that ldrefine was not checking REF or adding the optional prefix to it.
  • SVM - fix bug on qual check in run_libsvm.pl
  • Update defaults for filtering
  • Fixed bug in libVcf/VcfFile that had FamID instead of FatID
  • Fixed bug in samtools-hybrid that caused it to fail when checking for BAI files if bam was elsewhere in the filename
  • Fix vcfPileup to accept .bam.bai or .bai in bam index filenames.
  • Fix the split logic to work if a VCF file had no PASS records

Aligner & Variant Calling

  • Add checks for required executables prior to running
  • Limit the number of jobs that can run locally (there is a flag to override this)
  • Extract configuration routines from the 2 .pl's to a common Conf.pm
  • Add FLUX support
  • 1st attempt at checking for new versions
    • Doesn't quite always work yet, but shouldn't cause a problem

Version 1.06 (Full Release on 4/17/2013)

Variant Calling

  • Update to always require REF
    • this fixes bug that ldrefine was not checking REF or adding the optional prefix to it.


Version 1.05 (Full Release on 4/17/2013)

Aligner & Variant Calling

  • Cleanup handling of BASE_PREFIX & added REF_PREFIX.
    • Allows user to specify --base_prefix or --baseprefix on command-line
    • Now used for index files & reference files in addition to fastqs (aligner) and bams (variant calling)


Version 1.04 (Full Release on 4/16/2013)

Aligner & Variant Calling

  • Update relative paths to be relative to the current working directory
    • Aligner effects:
      • INDEX_FILE as specified in the aligner configuration
      • fastq paths specified in the INDEX_FILE
    • Variant Calling effects:
      • BAM_INDEX as specified in the configuration
      • bam paths specified in the BAM_INDEX
  • Add getAbsPath() method for determining the absolute path with the additional capability of prepending an optional PREFIX (as specified in configuration) to the directory:
    • BASE_PREFIX
    • FASTQ_PREFIX (for aligner reading the fastq index file)
      • renamed from FASTQ/FASTQ_REF
    • BAM_PREFIX (for variant calling reading bam index file)
  • Improve Error detection
    • With --test option, check that the testdir exists before running the test
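
The path handling described above can be sketched as follows. This is not the actual getAbsPath() implementation: the Python name `get_abs_path` is hypothetical, and the sketch assumes that paths which are already absolute are left alone and that the prefix is only applied to relative paths.

```python
import os


def get_abs_path(path: str, prefix: str = "") -> str:
    """Resolve a path to an absolute path, optionally under a prefix.

    Assumptions: absolute paths are returned as-is (normalized), a
    non-empty prefix is prepended to relative paths, and anything still
    relative is resolved against the current working directory.
    """
    if os.path.isabs(path):
        return os.path.normpath(path)
    if prefix:
        path = os.path.join(prefix, path)
    return os.path.abspath(path)
```

Here `prefix` plays the role of a setting such as BASE_PREFIX, FASTQ_PREFIX, or BAM_PREFIX.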

Cluster Support

  • Update the mosix option to run mosbatch instead of mosrun
  • Only attempt to "fix" the CWD for mosix/mosbatch
    • Remove the warning if this "fix" fails
    • This "fix" is specific for running at UM, but should not cause a failure when running elsewhere

Includes all updates from previous Internal Only Releases.

Version 1.03a6 (Internal Only Release on 4/10/2013)

  • Cleanup the cluster support code
    • Also add support for fixing the problem with UMich directories when using Mosix
  • Update the default Reference directory to be as expected for UM
  • Variant Calling changes:
    • SVM
      • Add option to merge all chromosome sites prior to running SVM (to better support targeted sequencing)
    • Cleanup some of the Makefile dependencies to depend on files rather than phony targets (this prevents it from always rerunning those steps)

Version 1.03a5 (Internal Only Release on 4/5/2013)

  • Add pre-checks for required files & reference files prior to running
  • Add checks for deprecated configuration settings
  • Merge aligner & variant calling default configurations into a single file (bin/gotcloudDefaults.conf)
  • Aligner
    • Update to put actual values into the Makefile recipes rather than using variables
  • Variant Calling
    • Fix vcf-summary to handle chromosomes that have string names (like X,Y)

Version 1.03a4 (Internal Only Release on 4/2/2013)

  • Variant Calling:
    • Update to by default run as local
    • Target Loci file updates:
      • When WRITE_TARGET_LOCI is set to true: only generate the .loci file if the specified bed is newer than the loci file
      • When WRITE_TARGET_LOCI is set to ALWAYS, generate the .loci file regardless of the timestamps
    • Only create the glf index file for a region if it does not exist or is older than the bam index file
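
The WRITE_TARGET_LOCI rules above amount to a small timestamp comparison. The sketch below is an illustration under stated assumptions: the function name `should_write_loci` is hypothetical, timestamps are plain modification times, and a missing loci file (passed as `None`) is assumed to trigger generation when the setting is true.

```python
def should_write_loci(write_target_loci: str, bed_mtime: float,
                      loci_mtime=None) -> bool:
    """Decide whether the .loci file should be (re)generated.

    write_target_loci: the configured value, e.g. "true" or "ALWAYS".
    bed_mtime: modification time of the bed file.
    loci_mtime: modification time of the loci file, or None if missing.
    """
    mode = write_target_loci.strip().lower()
    if mode == "always":
        return True            # regenerate regardless of timestamps
    if mode != "true":
        return False           # setting not enabled
    if loci_mtime is None:
        return True            # assumption: a missing file is generated
    return bed_mtime > loci_mtime
```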

Version 1.03a3 (Internal Only Release on 3/29/2013)

  • Attempted to fix a bug where batching was not running properly
    • This version was not good (fixed in 1.03a4).

Version 1.03a2 (Internal Only Release on 3/27/2013)

  • Add the qplot source code

Version 1.03a1 (Internal Only Release on 3/26/2013)

  • Variant Calling
    • Add FILTER_MIN_NS to add the option of filtering based on the number of samples
    • Add FILTER_ADDITIONAL to add the option of adding additional filters.

Version 1.03a (Full Release on 3/22/2013)

  • Cleanup README & INSTALL instructions
  • Variant Calling
    • Fix dependency bug/error in SVM
    • Fix commands that run locally to check for pipe failures
    • Improve file open error detection in SVM logic
  • Add option to obtain the version number

Version 1.03 (Full Release on 3/15/2013)

  • Add SVM Filtering
    • There was a bug in this; please do not use this version.
    • Version 1.03a fixes this bug.

Version 1.02 (Full Release on 3/13/2013)

  • Cleanup cluster scripts
  • Rename aligner to align.pl & umake to snp
  • Add VerifyBamID source
  • MANY Updates, please use a newer version.