Merge two binary image files by boolean OR (ddrescue output filename mistake)

I made a silly mistake by using the wrong output filename when resuming a ddrescue run. This is what happened:

ddrescue -b 2048 -d -v /dev/sr1 IDTa.img IDTa.ddrescue.log 

Then the computer crashed and I mistakenly resumed with:

ddrescue -b 2048 -d -v /dev/sr1 IDTa.iso IDTa.ddrescue.log

I gather that both image files start out zero-filled, so I guess that if I were to boolean OR the two files together, the result would be what ddrescue would have produced had I not made the mistake?

The files are not continuations of one another (as in "How can I merge two ddrescue images?"), since I had already run ddrescue -n previously, which completed successfully. That is, IDTa.img contains most of the data, and IDTa.iso contains scattered blocks from all over the image (those blocks would be zero in IDTa.img).

Is there a simple CLI way to do this? I could prob do this in C, but I'm very rusty! Also might be a nice first exercise in Python, which I've never got round to learning! Nevertheless, don't particularly want to reinvent the wheel if something out there already exists. Not too fussed about performance.

Update: (apologies if this is the wrong place to reply to an answer. The 'comment' option seems to allow too few characters, so I'm replying here!)

I have also tried ddrescue with '--fill-mode=?' as a solution to the above, but it did not work. This is what I did:

ddrescue --generate-mode -b 2048 -v /dev/sr1 IDTa.img IDTa.img.log
cp IDTa.img IDTa.img.backup
ddrescue '--fill-mode=?' -b 2048 -v IDTa.iso IDTa.img IDTa.img.log 

To check, I looked for the first position that IDTa.iso has data:

hexdump -C IDTa.iso |less

the output was:

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
001da800  00 00 01 ba 21 00 79 f3  09 80 10 69 00 00 01 e0  |....!.y....i....|
...
001db000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
...

I looked up 001da800 in IDTa.img:

hexdump -C IDTa.img |less
/001da800

Output:

001da800  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
001db000  00 00 01 ba 21 00 7b 00  bf 80 10 69 00 00 01 e0  |....!.{....i....|
...

So, the data at position 001da800 was not copied over from IDTa.iso to IDTa.img?

Checking IDTa.img.log:

# Mapfile. Created by GNU ddrescue version 1.22
# Command line: ddrescue --fill-mode=? -b 2048 -v IDTa.iso IDTa.img IDTa.img.log
# Start time:   2021-06-28 13:52:39
# Current time: 2021-06-28 13:52:46
# Finished
# current_pos  current_status  current_pass
0x299F2000     +               1
#      pos        size  status
0x00000000  0x00008000  ?
0x00008000  0x001D2800  +
0x001DA800  0x00000800  ?
0x001DB000  0x00049000  +
...

and a reality check:

diff -q IDTa.img IDTa.img.backup

returns no difference.

Update 2:

@Kamil edited the solution (see below) by dropping the --fill-mode=? argument. Appears to work!

Answer 1

I think this can be done with ddrescue itself. You need --generate-mode.

When ddrescue is invoked with the option --generate-mode it operates in "generate mode", which is different from the default "rescue mode". That is, in "generate mode" ddrescue does not rescue anything. It only tries to generate a mapfile for later use.

[…]

ddrescue can in some cases generate an approximate mapfile, from infile and the (partial) copy in outfile, that is almost as good as an exact mapfile. It makes this by simply assuming that sectors containing all zeros were not rescued.

[…]

ddrescue --generate-mode infile outfile mapfile

(source)

Make copies of the two images, just in case. If your filesystem supports CoW-copy then use cp --reflink=always for each image to make copies virtually instantly.
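The backup step could be wrapped as follows. This is only a sketch: the function name backup_image and the ".backup" suffix are made up for illustration, and the fallback to a plain copy is an assumption about what you want when the filesystem (or your cp) cannot make a CoW clone.

```shell
# Hypothetical helper: copy an image to "<name>.backup".
# Tries an instant CoW clone first (GNU cp --reflink=always) and falls
# back to an ordinary full copy if that fails for any reason.
backup_image() {
  src=$1
  cp --reflink=always "$src" "$src.backup" 2>/dev/null ||
    cp "$src" "$src.backup"
}
```

It would be called once per image, e.g. backup_image IDTa.img and backup_image IDTa.iso.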

You need to make sure the two images are of equal size. If one of them is smaller then it should be enlarged, i.e. zeros (possibly sparse zeros) should be appended. This code will do this automatically (truncate is required):

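( f1=IDTa.img
  f2=IDTa.iso
  s1="$(wc -c <"$f1")"   # size of each image in bytes
  s2="$(wc -c <"$f2")"
  # Grow the smaller file to the size of the larger one;
  # truncate -s appends (sparse) zeros when enlarging.
  if [ "$s2" -gt "$s1" ]; then
    truncate -s "$s2" "$f1"
  else
    truncate -s "$s1" "$f2"
  fi
)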

(I used a subshell so variables die with it and the main shell is unaffected.)

Now let the tool analyze your first image and find out which sectors were probably not rescued:

ddrescue --generate-mode -b 2048 -v /dev/sr1 IDTa.img new_mapfile

Note new_mapfile here is a new file, not your IDTa.ddrescue.log. Do not touch IDTa.ddrescue.log.

After new_mapfile is generated, lines in it should show status + or ?, depending on whether the corresponding fragment was considered "rescued" or "non-tried".
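To get a quick feel for how much of the image each status covers, the mapfile can be totalled with a few lines of shell. This is a rough sketch, not part of ddrescue: the function name summarize_mapfile is made up, and it assumes the layout seen in the mapfile excerpt in the question ('#' comment lines, one current-position line, then "pos size status" lines with hexadecimal sizes).

```shell
# Sketch: total the bytes per status in a ddrescue mapfile.
# Assumed layout: '#' comments, then one current-position line,
# then "pos size status" lines with hexadecimal values.
summarize_mapfile() {
  rescued=0 untried=0 other=0 seen_current=0
  while read -r pos size status; do
    case $pos in
      ''|'#'*) continue ;;          # blank line or comment
    esac
    if [ "$seen_current" -eq 0 ]; then
      seen_current=1                # skip the current-position line
      continue
    fi
    case $status in
      '+') rescued=$((rescued + size)) ;;   # considered rescued
      '?') untried=$((untried + size)) ;;   # considered non-tried
      *)   other=$((other + size)) ;;       # any other status
    esac
  done < "$1"
  echo "rescued=$rescued non-tried=$untried other=$other"
}
```

Running summarize_mapfile new_mapfile prints the three totals in bytes (decimal). The "non-tried" total is the amount of data the next step will read from the other image.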

Now rescue the allegedly "non-tried" blocks of IDTa.img by reading data from IDTa.iso. The next command will modify IDTa.img:

ddrescue -b 2048 -v IDTa.iso IDTa.img new_mapfile

Now the modified IDTa.img along with the untouched IDTa.ddrescue.log should be as good as if you hadn't made the mistake.
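If you want some reassurance, the merge can be sanity-checked with cmp -l, which prints one line per differing byte (offset, then the byte from each file in octal). The two functions below are a sketch; their names are made up, and they assume the images are regular files as discussed in the notes. Because cmp -l emits a line for every differing byte, this can be slow on images that differ a lot.

```shell
# Before merging: count positions where BOTH files hold non-zero bytes
# that differ. 0 supports the assumption that the two images never
# contain competing data for the same position.
count_conflicts() {
  cmp -l "$1" "$2" | awk '$2 != 0 && $3 != 0 { n++ } END { print n + 0 }'
}

# After merging: count non-zero bytes of the donor image ($2) that
# differ from the merged image ($1). 0 means every non-zero byte of
# the donor is present in the merged image.
count_missing() {
  cmp -l "$1" "$2" | awk '$3 != 0 { n++ } END { print n + 0 }'
}
```

For example, count_conflicts IDTa.img IDTa.iso could be run on the copies before the merge, and count_missing IDTa.img IDTa.iso afterwards; both should print 0 if things went as expected.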

Notes:

  • Some sectors containing all zeros may actually have been rescued. --generate-mode will classify them as ?, so they will be filled with data taken from IDTa.iso "in vain". This doesn't matter for the final result because they are all zeros in the other file as well.
  • The result should be the same if you interchange IDTa.iso and IDTa.img in the entire procedure (but keep in mind that if you do this, the result will be in IDTa.iso). So there's a choice. With --generate-mode I would use the file in which I expect fewer sectors containing all zeros, because this should minimize the amount of work for the last command.
  • The method works when IDTa.iso and IDTa.img are regular files. If either of them were a block device instead, its "random" content from before your work with ddrescue would interfere and spoil the result (so there is no point in trying to fix a size mismatch in that case, where truncate wouldn't help anyway).
  • I tested the procedure after replicating your mistake while trying to rescue a flakey device.
