From Genome Analysis Wiki
4,673 bytes added, 16:41, 24 March 2010
= SAM/BAM File =

== Read & Write BAM/SAM Executable ==

The software reads the beginning of the input file to determine whether it is SAM or BAM.  The format of the output file is determined by its extension: if the extension is ".bam", a BAM file is written; otherwise, a SAM file is written.

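The two format checks described above can be sketched in standard C++.  This is an illustration, not the library's code: the helper names OutputIsBam and InputLooksLikeBam are hypothetical, the extension test is assumed to be case-sensitive, and the input test assumes the check looks for the gzip magic bytes (0x1f 0x8b) that every BGZF-compressed BAM file begins with.

<pre>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a library function): mirrors the documented rule
// that a ".bam" extension selects BAM output and anything else selects SAM.
// Assumes the comparison is case-sensitive.
bool OutputIsBam(const std::string& filename)
{
    const std::string ext = ".bam";
    return filename.size() >= ext.size() &&
           filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0;
}

// Hypothetical helper: BAM files are BGZF-compressed, and every BGZF block is
// a valid gzip member, so a BAM file starts with the gzip magic bytes
// 0x1f 0x8b.  A plain-text SAM file does not.
bool InputLooksLikeBam(const unsigned char* header, std::size_t length)
{
    return length >= 2 && header[0] == 0x1f && header[1] == 0x8b;
}
</pre>

One limitation of the magic-byte test: a gzip-compressed SAM file also begins with 0x1f 0x8b, so a fuller check would decompress the first block and look for the "BAM\1" magic that the SAM/BAM specification places at the start of BAM data.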
== Reading/Writing SAM/BAM Files In Your Program ==

=== Reading/Writing Standard SAM/BAM Files ===
 
The SamFile class lets you read and write SAM/BAM files.

The methods found in this class are:

|}

==== Usage Example ====
 
The following example reads in a SAM/BAM file and writes it out as a SAM/BAM file.  The SamFile class determines the format of the input file by reading it, and the format of the output file from its extension: a ".bam" extension produces a BAM file, and any other extension produces a SAM file.

<pre>
}
</pre>

=== Reading Indexed (and Sorted) BAM Files ===

The IndexedBamReader class reads BAM files that are sorted and indexed.

It lets you read only the alignments for a specific reference sequence, which avoids reading the entire file.

It takes advantage of the seeking capability of BGZF files, using the BAM index file to determine where in the BAM file to seek.

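The seek step can be made concrete.  The SAM/BAM specification stores positions in the BAM index as 64-bit virtual file offsets: the upper 48 bits are the byte offset of a BGZF block in the compressed file, and the lower 16 bits are the offset within that block after decompression.  The sketch below only packs and unpacks such offsets; the VirtualOffset struct and both function names are hypothetical and are not part of IndexedBamReader.

<pre>
#include <cassert>
#include <cstdint>

// Hypothetical type: the two halves of a BAM index virtual file offset.
struct VirtualOffset
{
    uint64_t compressedBlockStart;  // byte offset of the BGZF block in the .bam file
    uint16_t offsetInBlock;         // offset within the block once decompressed
};

// Split a virtual offset: upper 48 bits = block start, lower 16 bits = in-block offset.
VirtualOffset SplitVirtualOffset(uint64_t voffset)
{
    return VirtualOffset{voffset >> 16,
                         static_cast<uint16_t>(voffset & 0xFFFF)};
}

// Combine the two halves back into a single virtual offset.
uint64_t MakeVirtualOffset(uint64_t compressedBlockStart, uint16_t offsetInBlock)
{
    return (compressedBlockStart << 16) | offsetInBlock;
}
</pre>

Given an offset from the index, a reader would seek to compressedBlockStart in the BAM file, decompress that BGZF block, and start parsing records at offsetInBlock.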
'''Future Enhancements''': Add the ability to read only the alignments within a given start and end position on a specific reference sequence.

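Region queries of the kind described in the future enhancement rely on the binning scheme that the BAM index already uses.  The function below is the reg2bin function given in the SAM/BAM specification, shown here in C++ as an illustration; it is not part of IndexedBamReader.  It computes the single bin an alignment covering [beg, end) is assigned to; a full region query would also enumerate every bin overlapping the region (the specification's reg2bins) and use the linear index to skip early chunks.

<pre>
#include <cassert>

// Binning scheme from the SAM/BAM specification: returns the smallest bin
// that fully contains the 0-based, half-open interval [beg, end).
int reg2bin(int beg, int end)
{
    --end;  // convert to the last base covered (inclusive)
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);  // 16 kb bins
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);  // 128 kb bins
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);   // 1 Mb bins
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);   // 8 Mb bins
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);   // 64 Mb bins
    return 0;  // bin 0 spans the whole 512 Mb range
}
</pre>

For example, an alignment covering the first 100 bases lands in bin 4681, the first 16 kb bin, while one crossing a 16 kb boundary near the start falls back to the 128 kb bin 585.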
The methods found in this class are:

{| class="wikitable" style="width:100%" border="1"
|+ style="font-size:150%"|'''IndexedBamReader Class Methods'''
!  width=""|Method Name
!  width=""|Description
|- 
| bool OpenForRead(const char* bamFilename, const char* bamIndexFilename)
| Opens the BAM file for reading and reads in the corresponding index file.

Returns true if successfully opened, false if not.
|-
| bool SetReadSection(int32_t refID)
| Sets the reference ID whose records should be read from the BAM file.  Each subsequent ReadRecord call retrieves the next record for that reference ID.  Once all records for that reference ID have been retrieved, ReadRecord returns false until a new read section is set.

Pass in -1 to read the section of the BAM file that is not associated with any reference ID.

Returns true if the read section was successfully set, false if not.
|-
| bool ReadRecord(SamFileHeader& header, SamRecord& record)
| Reads the next record from the file and stores it in the passed in record.  Only alignments in the section specified by SetReadSection are read; once they have all been read, this method returns false.  If SetReadSection has not been called, the entire file is read.

Returns true if successfully read, false if not.
|-
| bool IsEOF()
| Use this method with care: if you are only reading a specific reference sequence, you may never reach the end of the file.

Returns true if the end of the file has been reached, false if not.
|-
| bool OpenForRead(const char* filename)
| This method exists, but does nothing and always returns false: a file cannot be opened without its associated index file.

Returns false.
|-
| bool OpenForWrite(const char * filename)
| This method exists, but does nothing and always returns false.  This class is only for reading.

Returns false.
|-
| bool ReadHeader(SamFileHeader& header)
| Reads the header section from the file and stores it in the passed in header.

Returns true if successfully read, false if not.
|-
| bool WriteHeader(const SamFileHeader& header)
| This method exists, but does nothing and always returns false.  This class is only for reading.

Returns false.
|-
| bool WriteRecord(SamFileHeader& header, SamRecord& record)
| This method exists, but does nothing and always returns false.  This class is only for reading.

Returns false.
|}
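The SetReadSection/ReadRecord contract in the table can be modeled with a small in-memory class.  This is a hypothetical sketch for illustration only: SectionReaderModel and Record are invented names, the model scans a vector instead of seeking through the index, and it assumes SetReadSection fails only for IDs below -1 (the real class's failure conditions are not documented here).

<pre>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an alignment record.
struct Record
{
    int32_t refID;      // -1 means not associated with any reference
    std::string name;
};

// Hypothetical model of the documented contract; not the library's code.
class SectionReaderModel
{
public:
    explicit SectionReaderModel(std::vector<Record> records)
        : myRecords(std::move(records)) {}

    // Restrict later ReadRecord calls to one reference ID (-1 allowed)
    // and restart from the beginning of the records.
    bool SetReadSection(int32_t refID)
    {
        if (refID < -1) return false;
        mySectionSet = true;
        mySection = refID;
        myPos = 0;
        return true;
    }

    // Returns the next record in the current section, or false once the
    // section is exhausted.  With no section set, returns every record.
    bool ReadRecord(Record& record)
    {
        while (myPos < myRecords.size())
        {
            const Record& candidate = myRecords[myPos++];
            if (!mySectionSet || candidate.refID == mySection)
            {
                record = candidate;
                return true;
            }
        }
        return false;  // stays false until SetReadSection is called again
    }

private:
    std::vector<Record> myRecords;
    std::size_t myPos = 0;
    bool mySectionSet = false;
    int32_t mySection = 0;
};
</pre>

With records for reference IDs 0, 0, 1 and -1, reading section 0 yields two records, after which ReadRecord keeps returning false until another section is set.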

==== Usage Example ====

This example reads the BAM file inputFilename and writes it out, one reference section at a time, to outputFilename, starting with section -1 (records not associated with any reference).  Only records that pass the validity check are written.  It also prints the number of records in each section.

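<pre>
#include <iostream>
// Also include the library headers that declare IndexedBamReader, SamFile,
// SamFileHeader and SamRecord.

int ReadIndexedBam(const char* inputFilename,
                   const char* outputFilename,
                   const char* indexFilename)
{
  // Open the BAM file and read in its index.
  IndexedBamReader samIn;
  if(!samIn.OpenForRead(inputFilename, indexFilename))
  {
      std::cerr << "Failed to open " << inputFilename << " with index "
                << indexFilename << std::endl;
      return(1);
  }

  // The output format (SAM/BAM) is chosen from the output file's extension.
  SamFile samOut;
  if(!samOut.OpenForWrite(outputFilename))
  {
      std::cerr << "Failed to open " << outputFilename << " for writing"
                << std::endl;
      return(1);
  }

  // Read the header from the BAM file and write it to the output file.
  SamFileHeader samHeader;
  if(!samIn.ReadHeader(samHeader) || !samOut.WriteHeader(samHeader))
  {
      std::cerr << "Failed to copy the header" << std::endl;
      return(1);
  }

  SamRecord samRecord;
  int numValidRecords = 0;
  int numRecords = 0;

  // Loop through each reference ID, starting with -1 (records not associated
  // with any reference).  This assumes the file has 23 reference sequences
  // (IDs 0-22); adjust the bound to match your reference.
  for(int i = -1; i < 23; i++)
  {
      int numSectionRecords = 0;
      if(!samIn.SetReadSection(i))
      {
          std::cerr << "Failed to set read section " << i << std::endl;
          continue;
      }
      // Keep reading records until they aren't read successfully.
      while(samIn.ReadRecord(samHeader, samRecord))
      {
        numSectionRecords++;
        numRecords++;
        // Successfully read a record from the file, so check to see
        // if it is valid.
        if(samRecord.isValid())
        {
            // It is valid, so write it.
            numValidRecords++;
            samOut.WriteRecord(samHeader, samRecord);
        }
      }
      std::cout << "Reference ID " << i << " has " << numSectionRecords
                << " records" << std::endl;
  }

  std::cout << "Number of records = " << numRecords << std::endl;
  std::cout << "Number of valid records = " << numValidRecords << std::endl;

  return(0);
}
</pre>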
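The example above hard-codes reference IDs -1 through 22, which assumes exactly 23 reference sequences.  A more general loop would take its upper bound from the header.  The helper below is hypothetical, not a library function: it counts the @SQ lines in SAM header text, on the assumption that those lines list the references in reference-ID order (in a BAM file the binary reference list is authoritative and normally matches them).  How to obtain the header text from a SamFileHeader is not covered here.

<pre>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: count the @SQ (reference sequence) lines in SAM
// header text.  Reference IDs 0..count-1 would then be valid sections.
int CountReferenceSequences(const std::string& headerText)
{
    std::istringstream lines(headerText);
    std::string line;
    int count = 0;
    while (std::getline(lines, line))
    {
        // A header line's record type is the tag before the first tab.
        if (line.compare(0, 4, "@SQ\t") == 0)
        {
            ++count;
        }
    }
    return count;
}
</pre>

With that count stored in a variable such as numReferences (a hypothetical name), the loop condition would become i < numReferences instead of i < 23.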

== Setting fields in a SAM/BAM Header ==