Line 1: |
Line 1: |
− | === Review Sept 17th === | + | [[Category:libStatGen]] |
− | ==== Review Discussion Topics ==== | + | [[Category:libStatGen BAM]] |
− | ===== Return Statuses ===== | + | |
| + | == Review Sept 20th == |
| + | === Notes === |
| + | * returning const char* |
| + | * SamFileHeader: change referenceContigs, etc. from public to private |
| + | * Add way to copy a SAM record. |
| + | |
| + | == Review Sept 17th == |
| + | === Topics Discussed === |
| + | * [[#Return Statuses|Checking if methods succeeded/failed (checking return values/return statuses)]] |
| + | * [[#Accessing String Values|Strings as return values]] |
| + | |
| + | === NOTES From Meeting === |
| + | * General Notes: |
| + | **InputFile should use <code>long long</code> instead of <code>long int</code>. |
| + | * Error Handling Notes: |
| + | **Whenever an error occurs, the code could call handleError with an error code and a string; a switch inside it would decide whether to return the error, throw an exception, or abort. This could be an error handler class that is usable everywhere, with each class holding a member of that type that contains that information. |
| + | *Returning values of Strings Notes: |
| + | ** Problems with returning const char* |
| + | *** If the pointer is stored when returned, it becomes invalid if the class modifies the underlying string. |
| + | ** Problems with passing in std::string& as a parameter to be set. |
| + | *** People typically want to operate directly on the return value of the method. |
| + | ** One idea was returning a reference to a string |
| + | *** Does that solve the problem? Won't the contents change when a new one is read? Is that what we want? |
| + | |
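The error handling idea from the notes above can be sketched as follows. This is only a minimal illustration under stated assumptions: the names <code>ErrorHandler</code>, <code>HandlingType</code>, <code>ExampleReader</code>, and <code>errorHandlerUsage</code> are hypothetical and are not part of the library; only the <code>handleError</code> name, called with an error code and a string, comes from the meeting notes.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical sketch: ErrorHandler, HandlingType, and ExampleReader are
// illustrations only and do not exist in the library.
class ErrorHandler
{
public:
    enum HandlingType { RETURN, EXCEPTION, ABORT };

    explicit ErrorHandler(HandlingType type = RETURN)
        : myType(type), myLastCode(0) {}

    void setHandlingType(HandlingType type) { myType = type; }

    // Record the error, then return, throw, or abort depending on the
    // configured handling type. Returns false in the RETURN case so
    // callers can write "return myErrors.handleError(...)".
    bool handleError(int code, const std::string& message)
    {
        myLastCode = code;
        myLastMessage = message;
        switch(myType)
        {
            case EXCEPTION:
                throw std::runtime_error(message);
            case ABORT:
                std::cerr << "Fatal error " << code << ": "
                          << message << std::endl;
                std::abort();
            case RETURN:
            default:
                return false;
        }
    }

    int getLastCode() const { return myLastCode; }
    const std::string& getLastMessage() const { return myLastMessage; }

private:
    HandlingType myType;
    int myLastCode;
    std::string myLastMessage;
};

// Each class holds its own handler member, as suggested in the notes.
class ExampleReader
{
public:
    ErrorHandler myErrors;

    bool open(const std::string& filename)
    {
        if(filename.empty())
        {
            return myErrors.handleError(1, "empty filename");
        }
        return true;
    }
};

// Usage sketch: with RETURN handling, the caller checks the result.
inline void errorHandlerUsage()
{
    ExampleReader reader;
    bool ok = reader.open("");
    assert(!ok);
    assert(reader.myErrors.getLastCode() == 1);
}
```

With this shape, calling code that prefers not to check every return value could switch the handler to <code>EXCEPTION</code> or <code>ABORT</code> once, which is the same trade-off as the abort-on-failure option raised in the June 7th review.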
| + | |
| + | === Useful Links === |
| + | BAM Library FAQs: http://genome.sph.umich.edu/wiki/SAM/BAM_Library_FAQs |
| + | |
| + | Source Code: http://csg.sph.umich.edu//mktrost/doxygen/html/ |
| + | |
| + | Test code for setting values in the library: http://csg.sph.umich.edu//mktrost/doxygen/html/WriteFiles_8cpp-source.html |
| + | |
| + | === Topics for Discussion === |
| + | ==== Return Statuses ==== |
| Currently anytime you do anything on a SAM/BAM file, you have to check the status for failure: | | Currently anytime you do anything on a SAM/BAM file, you have to check the status for failure: |
| <source lang="cpp"> | | <source lang="cpp"> |
Line 58: |
Line 92: |
| </pre> | | </pre> |
| | | |
− | ===== SamFileHeader ===== | + | |
| + | ==== Accessing String Values ==== |
| + | SAM/BAM files have strings in them that people will want to read out. |
| + | How should we handle this interface? |
| + | Currently we do a mix of returning const char*, like: |
| + | <source lang="cpp"> |
| + | const char* SamRecord::getSequence() |
| + | { |
| + | myStatus = SamStatus::SUCCESS; |
| + | if(mySequence.Length() == 0) |
| + | { |
| + | // 0 Length, means that it is in the buffer, but has not yet |
| + | // been synced to the string, so do the sync. |
| + | setSequenceAndQualityFromBuffer(); |
| + | } |
| + | return mySequence.c_str(); |
| + | } |
| + | const std::string& SamRecord::getSequence() |
| + | { |
| + | myStatus = SamStatus::SUCCESS; |
| + | if(mySequence.Length() == 0) |
| + | { |
| + | // 0 Length, means that it is in the buffer, but has not yet |
| + | // been synced to the string, so do the sync. |
| + | setSequenceAndQualityFromBuffer(); |
| + | } |
| + | return mySequence; |
| + | } |
| + | |
| + | </source> |
| + | and passing in references to strings, like: |
| + | <source lang="cpp"> |
| + | // Set the passed in string to the header line at the specified index. |
| + | // It does NOT clear the current contents of header. |
| + | // NOTE: some indexes will return blank if the entry was deleted. |
| + | bool SamFileHeader::getHeaderLine(unsigned int index, std::string& header) const |
| + | { |
| + | // Check to see if the index is in range of the header records vector. |
| + | if(index < myHeaderRecords.size()) |
| + | { |
| + | // In range of the header records vector, so get the string for |
| + | // that record. |
| + | SamHeaderRecord* hdrRec = myHeaderRecords[index]; |
| + | hdrRec->appendString(header); |
| + | return(true); |
| + | } |
| + | else |
| + | { |
| + | unsigned int commentIndex = index - myHeaderRecords.size(); |
| + | // Check to see if it is in range of the comments. |
| + | if(commentIndex < myComments.size()) |
| + | { |
| + | // It is in range of the comments, so add the type. |
| + | header += "@CO\t"; |
| + | // Add the comment. |
| + | header += myComments[commentIndex]; |
| + | // Add the new line. |
| + | header += "\n"; |
| + | return(true); |
| + | } |
| + | } |
| + | // Invalid index. |
| + | return(false); |
| + | } |
| + | </source> |
| + | |
| + | http://csg.sph.umich.edu//mktrost/doxygen/html/SamRecord_8h-source.html |
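To make the trade-offs between the three string access styles concrete, here is a small sketch. It assumes a stand-in class, <code>ExampleRecord</code>, that stores its sequence in a <code>std::string</code>; the class, its method names, and <code>stringAccessUsage</code> are hypothetical and are not the library's <code>SamRecord</code> interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a record class; not the library's SamRecord.
class ExampleRecord
{
public:
    void setSequence(const std::string& sequence) { mySequence = sequence; }

    // Option 1: pointer into internal storage. A stored pointer may be
    // invalidated whenever mySequence is modified.
    const char* getSequencePtr() const { return mySequence.c_str(); }

    // Option 2: reference to the internal string. It stays valid for the
    // lifetime of the record, but its contents follow the record.
    const std::string& getSequenceRef() const { return mySequence; }

    // Option 3: out-parameter. The caller owns an independent copy, but
    // cannot operate directly on the method's return value.
    void getSequence(std::string& sequence) const { sequence = mySequence; }

private:
    std::string mySequence;
};

inline void stringAccessUsage()
{
    ExampleRecord record;
    record.setSequence("ACGT");

    const std::string& ref = record.getSequenceRef();
    std::string copy = record.getSequenceRef();   // caller-owned copy

    // Simulate reading the next record into the same object.
    record.setSequence("TTGGCCAA");

    assert(ref == "TTGGCCAA");   // the reference now shows the new contents
    assert(copy == "ACGT");      // the copy keeps the old contents
}
```

So a returned reference avoids the dangling pointer problem, but, as asked in the meeting notes, its contents still change when a new record is read; callers who need the old value have to copy it.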
| + | |
| + | ==== SamFileHeader ==== |
| *Should this be renamed to SamHeader? | | *Should this be renamed to SamHeader? |
| *Do we like the classes being named starting with Sam? Should it be Bam? | | *Do we like the classes being named starting with Sam? Should it be Bam? |
| | | |
− | === Review June 7th === | + | Should we add the following to SamFileHeader: |
| + | <source lang="cpp"> |
| + | ////////////////////////////////// |
| + | // Set methods for header fields. |
| + | bool setVersion(const char* version); |
| + | bool setSortOrder(const char* sortOrder); |
| + | bool addSequenceName(const char* sequenceName); |
| + | bool setSequenceLength(const char* keyID, int sequenceLength); |
| + | bool setGenomeAssemblyId(const char* keyID, const char* genomeAssemblyId); |
| + | bool setMD5Checksum(const char* keyID, const char* md5sum); |
| + | bool setURI(const char* keyID, const char* uri); |
| + | bool setSpecies(const char* keyID, const char* species); |
| + | bool addReadGroupID(const char* readGroupID); |
| + | bool setSample(const char* keyID, const char* sample); |
| + | bool setLibrary(const char* keyID, const char* library); |
| + | bool setDescription(const char* keyID, const char* description); |
| + | bool setPlatformUnit(const char* keyID, const char* platform); |
| + | bool setPredictedMedianInsertSize(const char* keyID, const char* isize); |
| + | bool setSequencingCenter(const char* keyID, const char* center); |
| + | bool setRunDate(const char* keyID, const char* runDate); |
| + | bool setTechnology(const char* keyID, const char* technology); |
| + | bool addProgram(const char* programID); |
| + | bool setProgramVersion(const char* keyID, const char* version); |
| + | bool setCommandLine(const char* keyID, const char* commandLine); |
| + | |
| + | /////////////////////////////////// |
| + | // Get methods for header fields. |
| + | // Returns the number of SQ entries in the header. |
| + | int32_t getSequenceDictionaryCount(); |
| + | // Return the Sort Order value that is set in the Header. |
| + | // If this field does not exist, "" is returned. |
| + | const char* getSortOrder(); |
| + | /// Additional gets for the rest of the fields. |
| + | </source> |
| + | Should these also be added to SamHeaderRG, SamHeaderSQ, etc. as appropriate? |
| + | |
| + | == Review June 7th == |
| | | |
| * <S>Move the examples from the SamFile wiki page to their own page</s> | | * <S>Move the examples from the SamFile wiki page to their own page</s> |
| ** <S>include links from the main library page and the SamFile page.</s> | | ** <S>include links from the main library page and the SamFile page.</s> |
| ** <S>look into why the one example has two if checks on SamIn status</s> <span style="color:blue">- one was printing the result and one was setting the return value - cleaned up to be in one if statement.</span> | | ** <S>look into why the one example has two if checks on SamIn status</s> <span style="color:blue">- one was printing the result and one was setting the return value - cleaned up to be in one if statement.</span> |
− | * Create 1 library for all of our library code rather than having libcsg, libbam, libfqf separated. | + | * <S>Create 1 library for all of our library code rather than having libcsg, libbam, libfqf separated.</s> |
− | ** What should this library be called? | + | ** <S>What should this library be called?</s> <span style="color:blue">- Created library: libstatgen and reorganized into a new repository: statgen.</span> |
− | *** libdna | + | *** <S>libdna</s> |
− | *** libdna++ | + | *** <S>libdna++</s> |
− | *** libsequence++ | + | *** <S>libsequence++</s> |
− | *** libDNA | + | *** <S>libDNA</s> |
− | *** libgenotype | + | *** <S>libgenotype</s> |
| * Add an option by class that says whether or not to abort on failure. (or even an option on each method) | | * Add an option by class that says whether or not to abort on failure. (or even an option on each method) |
| ** This allows calling code to set that option and then not have to check for failures since the code it calls would abort on a failure. | | ** This allows calling code to set that option and then not have to check for failures since the code it calls would abort on a failure. |