BAM Review Action Items

From Genome Analysis Wiki
Revision as of 09:50, 17 September 2010 by Mktrost (talk | contribs)

Review Sept 17th

Review Discussion Topics

Return Statuses

Currently, every operation on a SAM/BAM file requires checking the returned status for failure:

   SamFile samIn;
   if(!samIn.OpenForRead(argv[1]))
   {
      fprintf(stderr, "%s\n", samIn.GetStatusMessage());
      return(samIn.GetStatus());
   }

   // Read the sam header.
   SamFileHeader samHeader;
   if(!samIn.ReadHeader(samHeader))
   {
      fprintf(stderr, "%s\n", samIn.GetStatusMessage());
      return(samIn.GetStatus());
   }

A previous recommendation was to "Add an option by class that says whether or not to abort on failure. (or even an option on each method)"

I am proposing modifying the classes to throw exceptions on failure. It would then be up to the user either to catch them and handle the failure, or to leave them uncaught, which terminates the program and prints the error message.

   SamFile samIn;
   samIn.OpenForRead(argv[1]);

   // Read the sam header.
   SamFileHeader samHeader;
   samIn.ReadHeader(samHeader);

   // Open the output file for writing.
   SamFile samOut;
   try
   {
      samOut.OpenForWrite(argv[2]);
      samOut.WriteHeader(samHeader);
   }
   catch(const GenomeException& e)
   {
      std::cout << "Caught Exception:" << e.what() << std::endl;
   }
   std::cout << "Continue Processing\n";

For caught exceptions, you would see the following and processing would continue:

Caught Exception:
FAIL_IO: Failed to Open testFiles/unknown for writing
Continue Processing
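
The catch-and-continue behaviour above can be sketched in isolation. This is a minimal sketch, not the library's code: `SketchException`, `FAIL_IO` as an integer constant, `openForWrite`, and `tryOpen` are all hypothetical stand-ins, and the real GenomeException may carry its status and message differently.

```cpp
#include <exception>
#include <iostream>
#include <string>

// Hypothetical status code; the library's real status values are not shown here.
const int FAIL_IO = 1;

// Hypothetical stand-in for the proposed exception type.
// It carries a status code plus a human-readable message.
class SketchException : public std::exception
{
public:
    SketchException(int status, const std::string& message)
        : myStatus(status), myMessage(message) {}

    // Message reported by the catch block (or by terminate if uncaught).
    const char* what() const noexcept override { return myMessage.c_str(); }

    int status() const { return myStatus; }

private:
    int myStatus;
    std::string myMessage;
};

// Stand-in for an open call that always fails, to exercise the error path.
void openForWrite(const std::string& filename)
{
    throw SketchException(FAIL_IO,
                          "FAIL_IO: Failed to Open " + filename + " for writing");
}

// Attempts the open, reports any failure, and lets the caller continue.
// Returns the caught message, or an empty string on success.
std::string tryOpen(const std::string& filename)
{
    try
    {
        openForWrite(filename);
    }
    catch (const SketchException& e)   // catch by const reference to avoid slicing
    {
        std::cout << "Caught Exception:\n" << e.what() << std::endl;
        return e.what();
    }
    return "";
}
```

Calling `tryOpen("testFiles/unknown")` prints the "Caught Exception:" line followed by the FAIL_IO message and then returns, so the caller keeps running. Catching by const reference rather than by value avoids a copy and preserves the dynamic type if subclasses are added later.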

For an uncaught exception, you would see the following and processing would be stopped:

terminate called after throwing an instance of 'GenomeException'
  what():  
FAIL_IO: Failed to Open testFiles/unknown for reading
Aborted

SamFileHeader

  • Should this be renamed to SamHeader?
  • Do we like the classes being named starting with Sam? Should it be Bam?

Review June 7th

  • Move the examples from the SamFile wiki page to their own page
    • include links from the main library page and the SamFile page.
    • look into why one example has two if checks on the SamIn status: one printed the result and one set the return value. Cleaned up to use a single if statement.
  • Create 1 library for all of our library code rather than having libcsg, libbam, libfqf separated.
    • What should this library be called?
      • libdna
      • libdna++
      • libsequence++
      • libDNA
      • libgenotype
  • Add an option by class that says whether or not to abort on failure. (or even an option on each method)
    • This allows calling code to set that option and then not have to check for failures since the code it calls would abort on a failure.
    • Could/should this be achieved using exceptions? User can decide to catch them or let them terminate the program.
  • SamFile: add a constructor that takes the filename and a flag indicating whether to open for read or write (abort on failure to open).
    • Also have 2 subclasses, one that opens for read and one for write: SamReadFile, SamWriteFile? Or SamFileRead, SamFileWrite? Went with SamFileReader and SamFileWriter.
  • Add a function that says: skipInvalidRecords, validateRecords, etc.
    • That way, ReadRecord will keep reading records until a valid/parseable one is found.
  • SamFileHeader::setTag - instead of having separate ones for PG, RG, etc., have a generic one that takes which one it is as a parameter.
    • Parameters: keyID first, then value.
  • SamFileHeader::setProgramName, etc. - have specific methods for setting fields so users don't need to know the specific tags used for certain values in the header.
    • Parameters: keyID first, then value.
  • BAM write utility could add a PG field with default settings (user could specify alternate settings) when it writes a file.
  • Future methods to add:
    • SamFile::setReadSection(const std::string& refName) - take in the reference name by string since that is what most people will know.
      • An empty string ("") would indicate the records not associated with a reference.
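
The skipInvalidRecords item in the list above could behave roughly as follows. This is a minimal sketch under stated assumptions: `SketchReader`, `readRecord`, and `skippedCount` are hypothetical names, the reader works on in-memory lines rather than a real SAM/BAM file, and validity is reduced to a toy check that a line has the 11 mandatory tab-separated SAM fields.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reader over in-memory lines; not the SamFile API.
class SketchReader
{
public:
    explicit SketchReader(const std::vector<std::string>& lines)
        : myLines(lines), myNext(0), mySkipInvalid(false), mySkipped(0) {}

    // When enabled, readRecord silently moves past unparseable records.
    void skipInvalidRecords(bool skip) { mySkipInvalid = skip; }

    // Returns true and fills 'record' when a valid record is read.
    // Returns false at end of input, or at the first invalid record
    // when skipping is disabled.
    bool readRecord(std::string& record)
    {
        while (myNext < myLines.size())
        {
            const std::string& line = myLines[myNext++];
            if (isParseable(line))
            {
                record = line;
                return true;
            }
            if (!mySkipInvalid)
            {
                return false;
            }
            ++mySkipped;   // remember how many records were dropped
        }
        return false;
    }

    std::size_t skippedCount() const { return mySkipped; }

private:
    // Toy validity check: 11 mandatory fields means at least 10 tabs.
    static bool isParseable(const std::string& line)
    {
        std::size_t tabs = 0;
        for (char c : line)
        {
            if (c == '\t') ++tabs;
        }
        return tabs >= 10;
    }

    std::vector<std::string> myLines;
    std::size_t myNext;
    bool mySkipInvalid;
    std::size_t mySkipped;
};
```

With skipping enabled, a caller can loop on `readRecord` without checking for parse failures and inspect `skippedCount()` afterwards. Keeping a count of dropped records is a design choice of this sketch, so that skipping does not hide data problems entirely.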