SAM/BAM Convert Sequence

From Genome Analysis Wiki
Revision as of 18:48, 7 December 2010 by Mktrost (talk | contribs)
Jump to navigationJump to search

SAM/BAM support conversion between '=' and the base in a sequence

Request: To be able to call SamRecord::getSequence and have it return the sequence with '=' or with the correct bases.

Requester: Goncalo Abecasis & Hyun Min Kang

Date Requested: Nov 23, 2010?

Notes

It was requested to have this handled within the getSequence call so the user doesn't have to convert the sequence after calling getSequence.

Proposed Solution

....More details coming soon!

USER API UPDATES

Possibly useful to a User

Added a class, SamQuerySeqWithRef, with static methods seqWithEquals and seqWithoutEquals. SamRecord uses these methods to

// Gets the sequence with '=' in any position where the sequence matches the reference.
// NOTE: 'N' in both the sequence and the reference is not considered a match.
// Parameters:
//    	currentSeq 	sequence that should be converted
//    	seq0BasedPos 	0 based start position of currentSeq on the reference.
//   	cigar 	cigar string for currentSeq (used for determining how the sequence aligns to the reference)
//    	referenceName 	reference name associated with this sequence
//    	refSequence 	reference sequence object
//    	updatedSeq 	return parameter that this method sets to the current sequence, replacing any matches to the reference with '='.
void SamQuerySeqWithRef::seqWithEquals(const char * 		currentSeq,
                                       int32_t 	 		seq0BasedPos,
                                       Cigar &  		cigar,
                                       const char *  		referenceName,
                                       const GenomeSequence &  	refSequence,
                                       std::string &  		updatedSeq)

// Gets the sequence converting '=' to the appropriate base using the reference.
// Parameters:
//    	currentSeq 	sequence that should be converted
//    	seq0BasedPos 	0 based start position of currentSeq on the reference.
//   	cigar 	cigar string for currentSeq (used for determining how the sequence aligns to the reference)
//    	referenceName 	reference name associated with this sequence
//    	refSequence 	reference sequence object
//    	updatedSeq 	return parameter that this method sets to the current sequence, replacing any "=" with the base from the reference.
void SamQuerySeqWithRef::seqWithoutEquals(const char * 		currentSeq,
                                       int32_t 	 		seq0BasedPos,
                                       Cigar &  		cigar,
                                       const char *  		referenceName,
                                       const GenomeSequence &  	refSequence,
                                       std::string &  		updatedSeq)

INTERNAL MODIFICATIONS

  • Updated GenericSamInterface, BamInterface, and SamInterface to take TranslationType as a parameter to writeRecord.
    • SamInterface::writeRecord passes the translation parameter to SamRecord::getSequence when it obtains the sequence to write into the SAM file.