Changes

From Genome Analysis Wiki
Jump to navigationJump to search
5,983 bytes added ,  12:01, 2 February 2017
Line 1: Line 1: −
=== Review Sept 17th ===
+
[[Category:libStatGen]]
==== Review Discussion Topics ====
+
[[Category:libStatGen BAM]]
===== Return Statuses =====
+
 
 +
== Review Sept 20th ==
 +
=== Notes ===
 +
* returning const char*
 +
* SamFileHeader change referenceContigs, etc to private from public
 +
* Add way to copy a SAM record.
 +
 
 +
== Review Sept 17th ==
 +
=== Topics Discussed ===
 +
* [[#Return Statuses|Checking if methods succeeded/failed (checking return values/return statuses)]]
 +
* [[#Accessing String Values|Strings as return values]]
 +
 
 +
=== NOTES From Meeting ===
 +
* General Notes:
 +
**InputFile should not use <code>long int</code>.  Should instead use: <code>long long</code>
 +
* Error Handling Notes:
 +
**Anytime have an error could call handleError which would have a switch to return the error, throw exception, or abort.  Call it with an error code and a string.  Maybe an error handler class where you could use everywhere.  Each class would have a member of that class type that would contain that information.
 +
*Returning values of Strings Notes:
 +
** Problems with returning const char*
 +
*** If the pointer is stored when returned, it becomes invalid if the class modifies the underlying string.
 +
** Problems with passing in std::string& as a parameter to be set.
 +
*** people typically want to operate on the return of the method.
 +
** One idea was returning a reference to a string
 +
*** Does that solve the problem?  Won't the contents change when a new one is read?  Is that what we want?
 +
 
 +
 
 +
=== Useful Links ===
 +
BAM Library FAQs: http://genome.sph.umich.edu/wiki/SAM/BAM_Library_FAQs
 +
 
 +
Source Code: http://csg.sph.umich.edu//mktrost/doxygen/html/
 +
 
 +
Test code for setting values in the library: http://csg.sph.umich.edu//mktrost/doxygen/html/WriteFiles_8cpp-source.html
 +
 
 +
=== Topics for Discussion ===
 +
==== Return Statuses ====
 
Currently anytime you do anything on a SAM/BAM file, you have to check the status for failure:
 
Currently anytime you do anything on a SAM/BAM file, you have to check the status for failure:
 
<source lang="cpp">
 
<source lang="cpp">
Line 35: Line 69:  
   {
 
   {
 
       samOut.OpenForWrite(argv[2]);
 
       samOut.OpenForWrite(argv[2]);
 +
      samOut.WriteHeader(samHeader);
 
   }
 
   }
 
   catch(GenomeException e)
 
   catch(GenomeException e)
Line 40: Line 75:  
       std::cout << "Caught an Exception" << e.what() << std::endl;
 
       std::cout << "Caught an Exception" << e.what() << std::endl;
 
   }
 
   }
 +
  std::cout << "Continue Processing\n";
 
</source>
 
</source>
 
For caught exceptions, you would see the following and processing would continue:
 
For caught exceptions, you would see the following and processing would continue:
Line 45: Line 81:  
Caught Exception:
 
Caught Exception:
 
FAIL_IO: Failed to Open testFiles/unknown for writing
 
FAIL_IO: Failed to Open testFiles/unknown for writing
 +
Continue Processing
 
</pre>
 
</pre>
   Line 55: Line 92:  
</pre>
 
</pre>
   −
=== Review June 7th ===
+
 
 +
==== Accessing String Values ====
 +
SAM/BAM files have strings in them that people will want to read out.
 +
How should we handle this interface?
 +
Currently we do a mix of returning const char*, like:
 +
<source lang="cpp">
 +
const char* SamRecord::getSequence()
 +
{
 +
    myStatus = SamStatus::SUCCESS;
 +
    if(mySequence.Length() == 0)
 +
    {
 +
        // 0 Length, means that it is in the buffer, but has not yet
 +
        // been synced to the string, so do the sync.
 +
        setSequenceAndQualityFromBuffer();
 +
    }
 +
    return mySequence.c_str();
 +
}
 +
const std::string& SamRecord::getSequence()
 +
{
 +
    myStatus = SamStatus::SUCCESS;
 +
    if(mySequence.Length() == 0)
 +
    {
 +
        // 0 Length, means that it is in the buffer, but has not yet
 +
        // been synced to the string, so do the sync.
 +
        setSequenceAndQualityFromBuffer();
 +
    }
 +
    return &mySequence;
 +
}
 +
 
 +
</source>
 +
and passing in references to strings, like:
 +
<source lang="cpp">
 +
// Set the passed in string to the header line at the specified index.
 +
// It does NOT clear the current contents of header.
 +
// NOTE: some indexes will return blank if the entry was deleted.
 +
bool SamFileHeader::getHeaderLine(unsigned int index, std::string& header) const
 +
{
 +
    // Check to see if the index is in range of the header records vector.
 +
    if(index < myHeaderRecords.size())
 +
    {
 +
        // In range of the header records vector, so get the string for
 +
        // that record.
 +
        SamHeaderRecord* hdrRec = myHeaderRecords[index];
 +
        hdrRec->appendString(header);
 +
        return(true);
 +
    }
 +
    else
 +
    {
 +
        unsigned int commentIndex = index - myHeaderRecords.size();
 +
        // Check to see if it is in range of the comments.
 +
        if(commentIndex < myComments.size())
 +
        {
 +
            // It is in range of the comments, so add the type.
 +
            header += "@CO\t";
 +
            // Add the comment.
 +
            header += myComments[commentIndex];
 +
            // Add the new line.
 +
            header += "\n";
 +
            return(true);
 +
        }
 +
    }
 +
    // Invalid index.
 +
    return(false);
 +
}
 +
</source>
 +
 
 +
http://csg.sph.umich.edu//mktrost/doxygen/html/SamRecord_8h-source.html
 +
 
 +
==== SamFileHeader ====
 +
*Should this be renamed to SamHeader?
 +
*Do we like the classes being named starting with Sam?  Should it be Bam?
 +
 
 +
Should we add the following to SamFileHeader:
 +
<source lang="cpp">
 +
    //////////////////////////////////
 +
    // Set methods for header fields.
 +
    bool setVersion(const char* version);
 +
    bool setSortOrder(const char* sortOrder);
 +
    bool addSequenceName(const char* sequenceName);
 +
    bool setSequenceLength(const char* keyID, int sequenceLength);
 +
    bool setGenomeAssemblyId(const char* keyID, const char* genomeAssemblyId);
 +
    bool setMD5Checksum(const char* keyID, const char* md5sum);
 +
    bool setURI(const char* keyID, const char* uri);
 +
    bool setSpecies(const char* keyID, const char* species);
 +
    bool addReadGroupID(const char* readGroupID);
 +
    bool setSample(const char* keyID, const char* sample);
 +
    bool setLibrary(const char* keyID, const char* library);
 +
    bool setDescription(const char* keyID, const char* description);
 +
    bool setPlatformUnit(const char* keyID, const char* platform);
 +
    bool setPredictedMedianInsertSize(const char* keyID, const char* isize);
 +
    bool setSequencingCenter(const char* keyID, const char* center);
 +
    bool setRunDate(const char* keyID, const char* runDate);
 +
    bool setTechnology(const char* keyID, const char* technology);
 +
    bool addProgram(const char* programID);
 +
    bool setProgramVersion(const char* keyID, const char* version);
 +
    bool setCommandLine(const char* keyID, const char* commandLine);
 +
   
 +
    ///////////////////////////////////
 +
    // Get methods for header fields.
 +
    // Returns the number of SQ entries in the header.
 +
    int32_t getSequenceDictionaryCount();
 +
    // Return the Sort Order value that is set in the Header.
 +
    // If this field does not exist, "" is returned.
 +
    const char* getSortOrder();
 +
/// Additional gets for the rest of the fields.
 +
</source>
 +
Should these also be added to SamHeaderRG, SamHeaderSQ, etc as appropriate....
 +
 
 +
== Review June 7th ==
    
* <S>Move the examples from the SamFile wiki page to their own page</s>
 
* <S>Move the examples from the SamFile wiki page to their own page</s>
 
** <S>include links from the main library page and the SamFile page.</s>
 
** <S>include links from the main library page and the SamFile page.</s>
 
** <S>look into why the one example have two if checks on SamIn status</s> <span style="color:blue">- one was printing the result and one was setting the return value - cleaned up to be in one if statement.</span>
 
** <S>look into why the one example have two if checks on SamIn status</s> <span style="color:blue">- one was printing the result and one was setting the return value - cleaned up to be in one if statement.</span>
* Create 1 library for all of our library code rather than having libcsg, libbam, libfqf separated.
+
* <S>Create 1 library for all of our library code rather than having libcsg, libbam, libfqf separated.</s>
** What should this library be called?
+
** <S>What should this library be called?</s> <span style="color:blue">- Created library: libstatgen and reorganized into a new repository: statgen.</span>
*** libdna
+
*** <S>libdna</s>
*** libdna++
+
*** <S>libdna++</s>
*** libsequence++
+
*** <S>libsequence++</s>
*** libDNA
+
*** <S>libDNA</s>
*** libgenotype
+
*** <S>libgenotype</s>
 
* Add an option by class that says whether or not to abort on failure.  (or even an option on each method)
 
* Add an option by class that says whether or not to abort on failure.  (or even an option on each method)
 
** This allows calling code to set that option and then not have to check for failures since the code it calls would abort on a failure.
 
** This allows calling code to set that option and then not have to check for failures since the code it calls would abort on a failure.
96

edits

Navigation menu