Line 23: |
Line 23: |
| | | |
| Documentation on the '''SamFile''' class can be found at [[C++ Class: SamFile]]. | | Documentation on the '''SamFile''' class can be found at [[C++ Class: SamFile]]. |
| + | |
| | | |
| == SAM/BAM Header == | | == SAM/BAM Header == |
Line 29: |
Line 30: |
| | | |
| | | |
− | == Setting fields in a SAM/BAM Record == | + | == SAM/BAM Record == |
− | The '''SamRecord''' class contains accessors to set the fields of a SAM/BAM record. They are used for creating a record that is not read from a SAM/BAM file. Once the record has been set up with these set methods, the fields can be read back using the get accessors, or the record can later be written as either a SAM or a BAM record.
| |
− | The methods found in the '''SamRecord''' class for setting fields are:
| |
− | {| class="wikitable" style="width:100%" border="1"
| |
− | |+ style="font-size:150%"|'''SamRecord Class Methods'''
| |
− | ! width=""|Method Name
| |
− | ! width=""|Description
| |
− | |-
| |
− | | void resetRecord()
| |
− | | Resets the record to be an empty record. This is not necessary when you are reading a SAM/BAM file, but if you are setting fields, it is a good idea to clear a record before reusing it. Clearing it means you do not have to explicitly set any fields that should be empty.
| |
− | |-
| |
− | | bool setReadName(const char* readName)
| |
− | | Sets QNAME to the passed in name.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool setFlag(int16_t flag)
| |
− | | Sets the bitwise FLAG to the passed in value.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool setReferenceName(SamFileHeader& header, const char* referenceName)
| |
− | | Sets the reference sequence name. The reference id is calculated using the header.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool set1BasedPosition(int32_t position)
| |
− | | Sets the leftmost position. The value passed in is 1-based (SAM formatted). Internal processing handles switching between SAM/BAM formats when read/written.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool set0BasedPosition(int32_t position)
| |
− | | Sets the leftmost position. The value passed in is 0-based (BAM formatted). Internal processing handles switching between SAM/BAM formats when read/written.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool setMapQuality(int8_t mapQuality)
| |
− | | Sets the mapping quality.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool setCigar(const char* cigar)
| |
− | | Sets the cigar string to the passed in CIGAR. This is a SAM formatted CIGAR string. Internal processing handles switching between SAM/BAM formats when read/written.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool setMateReferenceName(SamFileHeader& header, const char* referenceName)
| |
− | | Sets the mate reference sequence name. The mate reference id is calculated using the header.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool set1BasedMatePosition(int32_t matePosition)
| |
− | | Sets the leftmost mate position. The value passed in is 1-based (SAM formatted). Internal processing handles switching between SAM/BAM formats when read/written.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool set0BasedMatePosition(int32_t matePosition)
| |
− | | Sets the leftmost mate position. The value passed in is 0-based (BAM formatted). Internal processing handles switching between SAM/BAM formats when read/written.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool setInsertSize(int32_t insertSize)
| |
− | | Sets the inferred insert size.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool setSequence(const char* seq)
| |
− | | Sets the sequence string to the passed in string. This is a SAM formatted sequence string. Internal processing handles switching between SAM/BAM formats when read/written.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool setQuality(const char* quality)
| |
− | | Sets the quality string to the passed in string. This is a SAM formatted quality string. Internal processing handles switching between SAM/BAM formats when read/written.
| |
− | Returns true if successfully set, false if not.
| |
− | |-
| |
− | | bool addTag(const char* tag, char vtype, const char* value)
| |
− | | Adds a tag to the record with the specified tag, vtype, and value. The vtype can be either a SAM or a BAM vtype. Internal processing handles switching between SAM/BAM vtypes when read/written.
| |
− | Returns true if successfully set, false if not.
| |
− | |}
| |
− | | |
− | When set, SAM fields are validated against: [[SAM Validation Criteria]]
| |
− | | |
− | | |
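The relationship between the 1-based (SAM) and 0-based (BAM) position accessors described above can be illustrated with a minimal sketch. <code>PositionSketch</code> and <code>demoPositions</code> are hypothetical names introduced only for this illustration; this is not the library's '''SamRecord''' implementation, and the validation shown is an assumption. It only demonstrates that both conventions can share one stored value that differs by one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the position storage of a record.
// NOT the library's SamRecord; it only illustrates the two conventions.
struct PositionSketch
{
    // Stored 0-based, as in BAM; -1 means "no position".
    int32_t my0BasedPosition = -1;

    // Accepts a 1-based (SAM formatted) position; 0 means "no position".
    bool set1BasedPosition(int32_t position)
    {
        if(position < 0) return false;   // assumed validation rule
        my0BasedPosition = position - 1;
        return true;
    }

    // Accepts a 0-based (BAM formatted) position; -1 means "no position".
    bool set0BasedPosition(int32_t position)
    {
        if(position < -1) return false;  // assumed validation rule
        my0BasedPosition = position;
        return true;
    }

    int32_t get1BasedPosition() const { return my0BasedPosition + 1; }
    int32_t get0BasedPosition() const { return my0BasedPosition; }
};

// Usage: a position set in one convention reads back in either one.
void demoPositions()
{
    PositionSketch rec;
    bool ok = rec.set1BasedPosition(100);
    assert(ok);
    assert(rec.get0BasedPosition() == 99);
    assert(rec.get1BasedPosition() == 100);
}
```

Storing a single value and converting on access is one way to keep the two getters consistent regardless of which setter was called; the actual class may handle this differently.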
− | == Retrieving fields from a SAM/BAM Record ==
| |
− | The '''SamRecord''' class contains accessors to access the fields of a SAM/BAM record. They assume that the class has already been populated, either by using the set commands or by calling SamFile::ReadRecord. Not all of the values that can be retrieved using these get accessors have set methods. That is because they are internally calculated values if they were not read from a file.
| |
− | | |
− | The methods found in the '''SamRecord''' class for retrieving fields are:
| |
− | {| class="wikitable" style="width:100%" border="1"
| |
− | |+ style="font-size:150%"|'''SamRecord Class Get Methods'''
| |
− | ! width=""|Method Name
| |
− | ! width=""|Description
| |
− | |-
| |
− | | bool isValid(SamFileHeader& header)
| |
− | | Returns true if the record is valid. This performs validation steps. TODO: the method exists, but it does not yet perform any checks, so just returns true.
| |
− | |-
| |
− | | int32_t getBlockSize()
| |
− | | Returns the BAM block size of the record.
| |
− | |-
| |
− | | const char* getReferenceName(SamFileHeader& header)
| |
− | | Returns the reference sequence name (SAM format).
| |
− | |-
| |
− | | int32_t getReferenceID()
| |
− | | Returns the reference sequence ID (BAM format).
| |
− | |-
| |
− | | int32_t get1BasedPosition()
| |
− | | Returns the 1-based (SAM formatted) leftmost position.
| |
− | |-
| |
− | | int32_t get0BasedPosition()
| |
− | | Returns the 0-based (BAM formatted) leftmost position.
| |
− | |-
| |
− | | int8_t getReadNameLength()
| |
− | | Returns the length of the ReadName (QNAME).
| |
− | |-
| |
− | | int8_t getMapQuality()
| |
− | | Returns the map quality.
| |
− | |-
| |
− | | int16_t getBin()
| |
− | | Returns the BAM bin for the record.
| |
− | |-
| |
− | | int16_t getCigarLength()
| |
− | | Returns the length of the CIGAR in BAM format.
| |
− | |-
| |
− | | int16_t getFlag()
| |
− | | Returns the flag.
| |
− | |-
| |
− | | int32_t getReadLength()
| |
− | | Returns the length of the read.
| |
− | |-
| |
− | | const char* getMateReferenceName(SamFileHeader& header)
| |
− | | Returns the mate reference sequence name (SAM format), even if it is the same as the reference sequence name.
| |
− | |-
| |
− | | const char* getMateReferenceNameOrEqual(SamFileHeader& header)
| |
− | | Returns the mate reference sequence name (SAM format), unless it is the same as the reference sequence name, in which case "=" is returned.
| |
− | |-
| |
− | | int32_t getMateReferenceID()
| |
− | | Returns the mate reference sequence id (BAM format).
| |
− | |-
| |
− | | int32_t get1BasedMatePosition()
| |
− | | Returns the 1-based (SAM formatted) mate leftmost position.
| |
− | |-
| |
− | | int32_t get0BasedMatePosition()
| |
− | | Returns the 0-based (BAM formatted) mate leftmost position.
| |
− | |-
| |
− | | int32_t getInsertSize()
| |
− | | Returns the insert size.
| |
− | |-
| |
− | | int32_t get0BasedAlignmentEnd()
| |
− | | Returns the 0-based inclusive right-most position of the clipped sequence.
| |
− | |-
| |
− | | int32_t get1BasedAlignmentEnd()
| |
− | | Returns the 1-based inclusive right-most position of the clipped sequence.
| |
− | |-
| |
− | | int32_t get0BasedUnclippedStart()
| |
− | | Returns the 0-based inclusive left-most position adjusted for clipped bases.
| |
− | |-
| |
− | | int32_t get1BasedUnclippedStart()
| |
− | | Returns the 1-based inclusive left-most position adjusted for clipped bases.
| |
− | |-
| |
− | | int32_t get0BasedUnclippedEnd()
| |
− | | Returns the 0-based inclusive right-most position adjusted for clipped bases.
| |
− | |-
| |
− | | int32_t get1BasedUnclippedEnd()
| |
− | | Returns the 1-based inclusive right-most position adjusted for clipped bases.
| |
− | |-
| |
− | | const char* getReadName()
| |
− | | Returns the SAM formatted Read Name (QNAME).
| |
− | |-
| |
− | | const char* getCigar()
| |
− | | Returns the SAM formatted CIGAR string.
| |
− | |-
| |
− | | const char* getSequence()
| |
− | | Returns the SAM formatted Sequence string.
| |
− | |-
| |
− | | const char* getQuality()
| |
− | | Returns the SAM formatted Quality string.
| |
− | |-
| |
− | | bool getNextSamTag(char* tag, char& vtype, void** value)
| |
− | | Returns true if a tag was read, false if there are no more tags.
| |
− | For a true return value, tag is set to the name of the tag, vtype is set to the vtype of the tag, and value is a pointer to the value of the tag. You will then need to use a switch to cast value to int, double, char, or String.
| |
− | |-
| |
− | | bool isIntegerType(char vtype)
| |
− | | Returns true if the passed in vtype is of integer ('c', 'C', 's', 'S', 'i', 'I') type.
| |
− | |-
| |
− | | bool isDoubleType(char vtype)
| |
− | | Returns true if the passed in vtype is of double ('f') type.
| |
− | |-
| |
− | | bool isCharType(char vtype)
| |
− | | Returns true if the passed in vtype is of char ('A') type.
| |
− | |-
| |
− | | bool isStringType(char vtype)
| |
− | | Returns true if the passed in vtype is of String ('Z') type.
| |
− | |-
| |
− | |-
| |
− | |}
| |
− | | |
− | | |
− | Example of using getNextSamTag:
| |
− | <source lang="cpp">
| |
− | // record is a previously setup SamRecord.
| |
− | String recordString = "";
| |
− | char tag[3];
| |
− | char vtype;
| |
− | void* value;
| |
− | | |
− | // While there are more tags, write them to the recordString.
| |
− | while(record.getNextSamTag(tag, vtype, &value) != false)
| |
− | {
| |
− | recordString += "\t";
| |
− | recordString += tag;
| |
− | recordString += ":";
| |
− | recordString += vtype;
| |
− | recordString += ":";
| |
− | if(record.isIntegerType(vtype))
| |
− | {
| |
− | recordString += (int)*(int*)value;
| |
− | }
| |
− | else if(record.isDoubleType(vtype))
| |
− | {
| |
− | recordString += (double)*(double*)value;
| |
− | }
| |
− | else if(record.isCharType(vtype))
| |
− | {
| |
− | recordString += (char)*(char*)value;
| |
− | }
| |
− | else
| |
− | {
| |
− | // String type.
| |
− | recordString += (String)*(String*)value;
| |
− | }
| |
− | }
| |
− | | |
− | recordString += "\n";
| |
− | </source>
| |
| | | |
| + | Documentation on reading/writing a SAM/BAM Record can be found at [[C++ Class: SamRecord]]. |
| | | |
| == Suggested Improvements/Features == | | == Suggested Improvements/Features == |
− | * Add optional user flag for checking if a file is sorted when it is read. It would report an error if it is unsorted.
| |