Changes

From Genome Analysis Wiki
1,283 bytes added ,  17:19, 7 December 2010
no edit summary
Line 18: Line 18:  
<p>
 
<p>
 
The code needs to figure out the strand and reverse complement the reads on the reverse strand.
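
A minimal sketch of the strand check and reverse complement, not the library's implementation. It assumes the strand is read from the SAM flag (bit 0x10 marks a reverse-strand read) and that base qualities are reversed along with the sequence; the function names `isReverseStrand`, `complementBase`, `reverseComplement` and `reverseQuality` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// True if the SAM flag marks the read as mapped to the reverse strand (bit 0x10).
bool isReverseStrand(std::uint16_t flag)
{
    return (flag & 0x10) != 0;
}

// Complement of a single base; anything other than A/C/G/T (e.g. N) is left unchanged.
char complementBase(char base)
{
    switch (base)
    {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        case 'a': return 't';
        case 'c': return 'g';
        case 'g': return 'c';
        case 't': return 'a';
        default:  return base;
    }
}

// Reverse the sequence, then complement every base.
std::string reverseComplement(const std::string& sequence)
{
    std::string result(sequence.rbegin(), sequence.rend());
    std::transform(result.begin(), result.end(), result.begin(), complementBase);
    return result;
}

// Qualities are only reversed so each one stays with its base (assumption of this sketch).
std::string reverseQuality(const std::string& quality)
{
    return std::string(quality.rbegin(), quality.rend());
}
```

A caller would apply `reverseComplement` and `reverseQuality` only to records for which `isReverseStrand` returns true, and write forward-strand records unchanged.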
 +
<p>
 +
Write one file for the first read in each pair and one file for the second read in each pair; the order of the reads in the two files must match.
 
<p>
 
<p>
 
Reverse complementing means:
Line 27: Line 29:  
** It would also error if the read is not paired.
 
*Later Release: work on unsorted BAM files.
 +
** Prefer to sort at the same time as writing the FASTQ files, rather than using one step to sort and a second step to write the FASTQs.
 +
** Would be useful to have something implemented within the library (useful for dedupping, etc.), but it might be tricky to implement as an API, since the two reads of a pair may sometimes be far apart.
 +
*** Maybe something like SamFile::getNextReadPair or SamFileHelper::getNextReadPair. Because of the bookkeeping, it may be useful to separate it out from SamFile. Either would handle the logic and return a pair of records.
 +
*** At some point it may have to start writing a file.
 +
*** Could attempt to store just the readname and FilePosition and use random access to jump around when a pair is found, but that would be inefficient if the mates are close together. Depending on how big the file is, even the readname and FilePosition might still be too much information to store.
 +
*** A two-scan approach on the original BAM may be the best.
 +
*Separate suggestion: implement a smart pileup, which retains a clone of the SamRecord until you see its mate.
 +
** Useful in the dedupper, variant caller, etc., but we probably need to discuss whether to implement it.
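
The bookkeeping behind a getNextReadPair-style helper could look roughly like the sketch below. This is an illustration under stated assumptions, not the library's API: `Read` is a hypothetical stand-in for SamRecord, and `PairMatcher` is a hypothetical helper that keeps each read in memory, keyed by read name, until its mate arrives.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical minimal record; a stand-in for the library's SamRecord.
struct Read
{
    std::string name;      // read name shared by both mates
    bool isFirstInPair;    // true for the first read of the pair
    std::string sequence;
};

// Hypothetical helper that buffers reads until their mate is seen.
class PairMatcher
{
public:
    // Returns true and fills first/second once both mates have been seen.
    // Otherwise stores the read and returns false.
    bool add(const Read& read, Read& first, Read& second)
    {
        auto it = pending_.find(read.name);
        if (it == pending_.end())
        {
            pending_.emplace(read.name, read);
            return false;
        }
        // The mate was buffered earlier: order the pair as first, second.
        first = read.isFirstInPair ? read : it->second;
        second = read.isFirstInPair ? it->second : read;
        pending_.erase(it);
        return true;
    }

    // Number of reads still waiting for their mate.
    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::unordered_map<std::string, Read> pending_;
};
```

The buffer grows with the number of reads whose mates have not yet been seen, so mates that are far apart are exactly the case where memory becomes a concern. That is where spilling to a file, storing only the readname and FilePosition, or a two-scan approach would come in.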
 +
    
=== Proposed Solution ===
