Changes

From Genome Analysis Wiki
3,889 bytes added, 10:16, 17 September 2010
no edit summary
Line 1: Line 1:  
=== Review Sept 17th ===
 
 
==== Review Discussion Topics ====
 
*Library FAQs: http://genome.sph.umich.edu/wiki/SAM/BAM_Library_FAQs
*Doxygen documentation: http://www.sph.umich.edu/csg/mktrost/doxygen/html/
*Example of using the library to set values: http://www.sph.umich.edu/csg/mktrost/doxygen/html/WriteFiles_8cpp-source.html
 
===== Return Statuses =====
 
 
Currently, any time you perform an operation on a SAM/BAM file, you have to check the returned status for failure:
Line 57: Line 61:  
Aborted
 
 
</pre>
 

===== Accessing String Values =====
SAM/BAM files contain string fields that users will want to read out.
How should we handle this interface?

Currently we use a mix of returning const char*, like:
<source lang="cpp">
const char* SamRecord::getSequence()
{
    myStatus = SamStatus::SUCCESS;
    if(mySequence.Length() == 0)
    {
        // 0 Length, means that it is in the buffer, but has not yet
        // been synced to the string, so do the sync.
        setSequenceAndQualityFromBuffer();
    }
    return mySequence.c_str();
}
</source>
and passing in a reference to a string for the method to append to, like:
<source lang="cpp">
// Set the passed in string to the header line at the specified index.
// It does NOT clear the current contents of header.
// NOTE: some indexes will return blank if the entry was deleted.
bool SamFileHeader::getHeaderLine(unsigned int index, std::string& header) const
{
    // Check to see if the index is in range of the header records vector.
    if(index < myHeaderRecords.size())
    {
        // In range of the header records vector, so get the string for
        // that record.
        SamHeaderRecord* hdrRec = myHeaderRecords[index];
        hdrRec->appendString(header);
        return(true);
    }
    else
    {
        unsigned int commentIndex = index - myHeaderRecords.size();
        // Check to see if it is in range of the comments.
        if(commentIndex < myComments.size())
        {
            // It is in range of the comments, so add the type.
            header += "@CO\t";
            // Add the comment.
            header += myComments[commentIndex];
            // Add the new line.
            header += "\n";
            return(true);
        }
    }
    // Invalid index.
    return(false);
}
</source>

SamRecord.h source: http://www.sph.umich.edu/csg/mktrost/doxygen/html/SamRecord_8h-source.html
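
To make the trade-off between the two styles concrete, here is a minimal sketch. It does not use the library: RecordSketch and readBothWays are hypothetical names invented for illustration, and std::string stands in for whatever string class the real record uses. The pointer-returning style avoids a copy, but the pointer is only valid until the record changes. The reference style hands the caller its own copy and a success flag, but appends, so the caller has to start from an empty string.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a record class; this is NOT the library's SamRecord.
class RecordSketch
{
public:
    explicit RecordSketch(const std::string& sequence) : mySequence(sequence) {}

    // Style 1: return a pointer into internal storage.
    // No copy is made, but the pointer is only valid until the record
    // is modified or destroyed.
    const char* getSequence() const { return mySequence.c_str(); }

    // Style 2: append to a caller-supplied string and report success.
    // The caller owns the result, but must clear 'sequence' first if the
    // previous contents should not be kept.
    bool getSequence(std::string& sequence) const
    {
        if(mySequence.empty())
        {
            return(false);
        }
        sequence += mySequence;
        return(true);
    }

    void setSequence(const std::string& sequence) { mySequence = sequence; }

private:
    std::string mySequence;
};

// Reads the sequence with both styles and returns the results joined by '|'.
std::string readBothWays(const RecordSketch& record)
{
    // Style 1: copy right away if the value must outlive later changes.
    std::string fromPointer = record.getSequence();

    // Style 2: start from an empty string because the call appends.
    std::string fromReference;
    bool found = record.getSequence(fromReference);
    assert(found || fromReference.empty());

    return fromPointer + "|" + fromReference;
}
```

Whichever style is chosen, using it consistently across SamRecord and SamFileHeader would mean callers only have to learn one set of lifetime and clearing rules.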
    
===== SamFileHeader =====
 
 
*Should this be renamed to SamHeader?
 
 
*Do we like the classes being named starting with Sam?  Should it be Bam?
 

Should we add the following methods to SamFileHeader?
<source lang="cpp">
    //////////////////////////////////
    // Set methods for header fields.
    bool setVersion(const char* version);
    bool setSortOrder(const char* sortOrder);
    bool addSequenceName(const char* sequenceName);
    bool setSequenceLength(const char* keyID, int sequenceLength);
    bool setGenomeAssemblyId(const char* keyID, const char* genomeAssemblyId);
    bool setMD5Checksum(const char* keyID, const char* md5sum);
    bool setURI(const char* keyID, const char* uri);
    bool setSpecies(const char* keyID, const char* species);
    bool addReadGroupID(const char* readGroupID);
    bool setSample(const char* keyID, const char* sample);
    bool setLibrary(const char* keyID, const char* library);
    bool setDescription(const char* keyID, const char* description);
    bool setPlatformUnit(const char* keyID, const char* platform);
    bool setPredictedMedianInsertSize(const char* keyID, const char* isize);
    bool setSequencingCenter(const char* keyID, const char* center);
    bool setRunDate(const char* keyID, const char* runDate);
    bool setTechnology(const char* keyID, const char* technology);
    bool addProgram(const char* programID);
    bool setProgramVersion(const char* keyID, const char* version);
    bool setCommandLine(const char* keyID, const char* commandLine);

    ///////////////////////////////////
    // Get methods for header fields.
    // Returns the number of SQ entries in the header.
    int32_t getSequenceDictionaryCount();
    // Return the Sort Order value that is set in the Header.
    // If this field does not exist, "" is returned.
    const char* getSortOrder();
    // Additional gets for the rest of the fields.
</source>
Should these also be added to SamHeaderRG, SamHeaderSQ, etc., as appropriate?
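
As a starting point for discussion, the sketch below shows one way the keyed setters could behave. Everything in it is an assumption rather than existing library code: HeaderSketch, getSequenceLength, and buildExampleHeader are hypothetical, keyID is assumed to be the sequence name passed earlier to addSequenceName, and the setters are assumed to return false when that key is unknown.

```cpp
#include <cassert>
#include <map>
#include <stdint.h>
#include <string>
#include <utility>

// Hypothetical sketch of keyed header setters; this is NOT SamFileHeader.
class HeaderSketch
{
public:
    // Adds an SQ entry keyed by sequence name.
    // Assumed behavior: fails if the name was already added.
    bool addSequenceName(const char* sequenceName)
    {
        return mySequenceLengths.insert(
            std::make_pair(std::string(sequenceName), 0)).second;
    }

    // Sets the length on the SQ entry identified by keyID.
    // Assumed behavior: fails if keyID was never added.
    bool setSequenceLength(const char* keyID, int sequenceLength)
    {
        std::map<std::string, int>::iterator entry = mySequenceLengths.find(keyID);
        if(entry == mySequenceLengths.end())
        {
            return(false);
        }
        entry->second = sequenceLength;
        return(true);
    }

    // Returns the number of SQ entries in the header.
    int32_t getSequenceDictionaryCount() const
    {
        return static_cast<int32_t>(mySequenceLengths.size());
    }

    // Hypothetical getter: returns the stored length, or -1 if keyID is unknown.
    int getSequenceLength(const char* keyID) const
    {
        std::map<std::string, int>::const_iterator entry = mySequenceLengths.find(keyID);
        return (entry == mySequenceLengths.end()) ? -1 : entry->second;
    }

private:
    // Maps sequence name to sequence length.
    std::map<std::string, int> mySequenceLengths;
};

// Usage: the entry is added first, then filled in by key.
bool buildExampleHeader(HeaderSketch& header)
{
    bool added = header.addSequenceName("chr1");
    bool lengthSet = header.setSequenceLength("chr1", 1000);
    assert(header.getSequenceDictionaryCount() >= 1);
    return added && lengthSet;
}
```

One question this raises for the review is whether a setter called with an unknown keyID should fail, as assumed here, or create the entry implicitly.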
    
=== Review June 7th ===
 
