Changes

From Genome Analysis Wiki
1,047 bytes added, 14:44, 4 October 2011
add recovery section
Line 61: Line 61:  
  Sequence with Bases:  AATAACTAGATAGGG
 
  Sequence with Equals: AA======G===GGG
 
 +
 +
= BAM File Recovery =
 +
 +
A BAM file that has been corrupted or truncated by a copy or disk problem can often be partially recovered.
 +
 +
Both the BGZF container format and the binary BAM record format contain enough structural information to scan forward and resynchronize after a damaged region. Some data will be lost, but a substantial portion can often be recovered.
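To illustrate what resynchronizing on BGZF can look like, here is a minimal Python sketch that scans raw bytes for plausible BGZF block headers. It is not the implementation behind <code>bam convert --recover</code>; the function name <code>find_bgzf_blocks</code> is hypothetical, and the sketch assumes the <code>BC</code> extra subfield comes first in each header, as it does in files written by common BAM tools.

```python
import struct

# First four bytes of every BGZF block: gzip magic, deflate, FEXTRA flag set.
BGZF_MAGIC = b"\x1f\x8b\x08\x04"


def find_bgzf_blocks(data: bytes):
    """Yield (offset, block_size) for each plausible BGZF block in data.

    Hypothetical helper for illustration only. Assumes the 'BC' extra
    subfield is the first one in the gzip extra area.
    """
    pos = 0
    while True:
        pos = data.find(BGZF_MAGIC, pos)
        if pos < 0 or pos + 18 > len(data):
            return  # no further complete header fits in the buffer
        # XLEN sits at byte 10, followed by SI1=66 ('B'), SI2=67 ('C'),
        # SLEN=2, and BSIZE, which is the total block size minus one.
        xlen, si1, si2, slen, bsize = struct.unpack_from("<HBBHH", data, pos + 10)
        if xlen >= 6 and si1 == 66 and si2 == 67 and slen == 2:
            size = bsize + 1
            if pos + size <= len(data):
                yield pos, size
                pos += size  # jump straight to where the next block should start
                continue
        pos += 1  # false match or truncated block: resume one byte later
```

Bytes that lie between the end of one accepted block and the start of the next are the damaged region that a recovery pass would have to skip.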
 +
 +
When a file contains bad disk blocks, ordinary copy commands such as cp stop at the first read failure, leaving a truncated copy. To recover as much data as possible, copy the file with the dd command using the conv=noerror option, which continues past read errors.
 +
 +
A typical recovery would look like this:
 +
 +
<pre>
 +
# dd if=brokenbamfile.bam of=/tmp/brokenbamfile1.bam conv=noerror bs=4k
 +
# bam convert --recover --in /tmp/brokenbamfile1.bam --out /tmp/brokenbamfilerecovered.bam
 +
</pre>
 +
 +
Note that the recovered file must be written to a known-good filesystem.
 +
 +
Currently, no statistics are printed on how many BAM records were recovered, but the resulting file can readily be tested afterward to determine the quality of the recovery.
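One possible follow-up test is simply to count the alignment records in the recovered file. The Python sketch below does this with the standard library only, relying on the fact that a BGZF file is a series of gzip members. The function name <code>count_bam_records</code> is hypothetical and is not part of the <code>bam</code> tool; the sketch assumes the BAM header itself is intact and stops counting at the first damaged or truncated record.

```python
import gzip
import struct
import zlib


def count_bam_records(path: str) -> int:
    """Count alignment records in a BAM file, stopping at the first damage.

    Hypothetical helper for illustration; assumes the header is readable.
    """
    count = 0
    # BGZF is a series of gzip members, so gzip.open can read it sequentially.
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<i", fh.read(4))
        fh.read(l_text)                       # SAM header text
        (n_ref,) = struct.unpack("<i", fh.read(4))
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", fh.read(4))
            fh.read(l_name + 4)               # reference name plus its length field
        while True:
            try:
                head = fh.read(4)
                if len(head) < 4:
                    break                     # clean end of data
                (block_size,) = struct.unpack("<i", head)
                # Every record carries at least 32 bytes of fixed fields.
                if block_size < 32 or len(fh.read(block_size)) < block_size:
                    break                     # implausible or truncated record
            except (OSError, EOFError, zlib.error):
                break                         # damaged compressed data
            count += 1
    return count
```

Comparing the count against the number of reads expected in the original file gives a rough measure of how much was recovered.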
    
= Usage =
 