Here’s some testing on an iPhone 12 Pro Max ProRAW DNG file, which has multiple image data blocks.
65 image data blocks
pos: 3612, size: 145845, pad: 0
pos: 149457, size: 153732, pad: 0
pos: 303189, size: 156425, pad: 0
pos: 459614, size: 157925, pad: 0
pos: 617539, size: 158392, pad: 0
pos: 775931, size: 157352, pad: 0
pos: 933283, size: 155805, pad: 0
pos: 1089088, size: 154272, pad: 0
pos: 1243360, size: 146496, pad: 0
pos: 1389856, size: 153557, pad: 0
pos: 1543413, size: 156022, pad: 0
pos: 1699435, size: 155417, pad: 0
pos: 1854852, size: 155741, pad: 0
pos: 2010593, size: 156477, pad: 0
pos: 2167070, size: 147454, pad: 0
pos: 2314524, size: 144603, pad: 0
pos: 2459127, size: 146933, pad: 0
pos: 2606060, size: 153178, pad: 0
pos: 2759238, size: 159627, pad: 0
pos: 2918865, size: 161667, pad: 0
pos: 3080532, size: 159122, pad: 0
pos: 3239654, size: 156860, pad: 0
pos: 3396514, size: 144587, pad: 0
pos: 3541101, size: 146435, pad: 0
pos: 3687536, size: 144645, pad: 0
pos: 3832181, size: 154953, pad: 0
pos: 3987134, size: 159339, pad: 0
pos: 4146473, size: 157849, pad: 0
pos: 4304322, size: 155740, pad: 0
pos: 4460062, size: 154221, pad: 0
pos: 4614283, size: 152040, pad: 0
pos: 4766323, size: 152265, pad: 0
pos: 4918588, size: 146993, pad: 0
pos: 5065581, size: 148942, pad: 0
pos: 5214523, size: 152775, pad: 0
pos: 5367298, size: 144907, pad: 0
pos: 5512205, size: 147170, pad: 0
pos: 5659375, size: 143499, pad: 0
pos: 5802874, size: 142795, pad: 0
pos: 5945669, size: 151072, pad: 0
pos: 6096741, size: 148978, pad: 0
pos: 6245719, size: 147370, pad: 0
pos: 6393089, size: 153343, pad: 0
pos: 6546432, size: 154764, pad: 0
pos: 6701196, size: 152782, pad: 0
pos: 6853978, size: 151604, pad: 0
pos: 7005582, size: 149113, pad: 0
pos: 7154695, size: 145593, pad: 0
pos: 7300288, size: 147684, pad: 0
pos: 7447972, size: 146889, pad: 0
pos: 7594861, size: 148436, pad: 0
pos: 7743297, size: 151330, pad: 0
pos: 7894627, size: 152334, pad: 0
pos: 8046961, size: 151399, pad: 0
pos: 8198360, size: 147186, pad: 0
pos: 8345546, size: 142587, pad: 0
pos: 8488133, size: 147791, pad: 0
pos: 8635924, size: 150236, pad: 0
pos: 8786160, size: 148456, pad: 0
pos: 8934616, size: 147215, pad: 0
pos: 9081831, size: 146308, pad: 0
pos: 9228139, size: 146062, pad: 0
pos: 9374201, size: 145635, pad: 0
pos: 9519836, size: 143228, pad: 0
pos: 9663064, size: 218771, pad: 1
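A quick way to sanity-check a listing like this is to confirm that each block starts where the previous one ended. The following is only a minimal sketch: the helper names are hypothetical (they are not part of exiftool), and just the first three blocks and the last block from the listing are hard-coded to keep it short.

```perl
use strict;
use warnings;

# Each entry is [pos, size, pad], copied from the listing above
# (first three blocks and the last block only).
my @first_blocks = (
    [3612,   145845, 0],
    [149457, 153732, 0],
    [303189, 156425, 0],
);
my @last_block = ([9663064, 218771, 1]);

# Hypothetical helper: true if every block starts where the previous
# block (including its padding) ended.
sub blocks_are_contiguous {
    my @blocks = @_;
    for my $i (1 .. $#blocks) {
        my ($pos, $size, $pad) = @{ $blocks[$i - 1] };
        return 0 if $pos + $size + $pad != $blocks[$i][0];
    }
    return 1;
}

# Hypothetical helper: offset of the first byte after the last block.
sub end_of_blocks {
    my @blocks = @_;
    my ($pos, $size, $pad) = @{ $blocks[-1] };
    return $pos + $size + $pad;
}

print "contiguous\n" if blocks_are_contiguous(@first_blocks);
print "data ends at byte ", end_of_blocks(@last_block), "\n";
```

For this file the last block plus its pad byte ends at offset 9881836, which is the size of IMG.DNG reported further down, so the image data runs to the end of the file.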
So I think that we can just dump the data as it is being copied for any raw format that exiftool supports.
I just tested this out by opening a test.dat file and writing each block in the foreach loop to this file, omitting the padding.
sub CopyImageData($$$)
{
    my ($self, $imageDataBlocks, $outfile) = @_;
    my $raf = $$self{RAF};
    my ($dataBlock, $err);
    my $num = @$imageDataBlocks;
    $self->VPrint(0, " Copying $num image data blocks\n") if $num;
    # debugging only: dump the raw image data to a separate file
    my $filename = "./test.dat";
    open(my $dumpFh, '>', $filename) or die "Error creating $filename: $!\n";
    binmode($dumpFh);   # the data is binary, so avoid newline translation
    foreach $dataBlock (@$imageDataBlocks) {
        my ($pos, $size, $pad) = @$dataBlock;
        $raf->Seek($pos, 0) or $err = 'read', last;
        # dump the block plus any padding bytes (see ETA #2)
        my $buff;
        $raf->Read($buff, $size+$pad);
        print $dumpFh $buff;
        $raf->Seek($pos, 0) or $err = 'read', last; # reset for the normal copy
        my $result = CopyBlock($raf, $outfile, $size);
        $result or $err = defined $result ? 'read' : 'writ';
        # pad if necessary
        Write($outfile, "\0" x $pad) or $err = 'writ' if $pad;
        last if $err;
    }
    close($dumpFh);
    if ($err) {
        $self->Error("Error ${err}ing image data");
        return 0;
    }
    return 1;
}
The DNG file IMG.DNG is 9881836 bytes and the resulting test.dat is 9878223 bytes, which seems plausible: the 3613-byte difference equals the 3612 bytes that precede the first block plus the single pad byte after the last block.
Furthermore, when I run diff -ua test.dat IMG.DNG, there are only a few differences: 1) the stuff at the beginning of the IMG.DNG file, which clearly holds the EXIF data; and 2) no newline at the end of test.dat.
So far so good.
But there also appears to be 3) some data (roughly 48 bits) at the beginning of test.dat that isn’t in IMG.DNG, and that would change the hashes.
--- test.dat
+++ IMG.DNG
@@ -1,4 +1,21 @@
-< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
+< stuff >
@@ -21881,4 +21898,4 @@
-< stuff >
\ No newline at end of file
+< stuff >
\ No newline at end of file
So this approach will require a little more investigation and testing, but I think it will work.
ETA #1: Upon further examination, the apparent difference 3) at the beginning of the test.dat dump is actually in-line at the end of the EXIF section of IMG.DNG. It is therefore an artifact of diff comparing line by line: the binary data has no meaningful line breaks, so the first "line" of test.dat only lines up with the tail of a longer line in IMG.DNG.
So the output data matches bit-for-bit for the file types I have tested thus far.
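Because diff is line-oriented, a byte-wise comparison is a more direct way to confirm this. The following is a minimal sketch in plain Perl, not exiftool code: the helper names are hypothetical, and it assumes the dumped blocks are contiguous in the original file so that the dump should equal one continuous region starting at a known offset.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file as raw bytes.
sub slurp_bytes {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Can't open $path: $!";
    local $/;    # slurp mode
    my $data = <$fh>;
    close($fh);
    return defined $data ? $data : '';
}

# Hypothetical helper: true if the dump equals the bytes of the
# original file that start at $offset.
sub dump_matches_original {
    my ($orig_path, $dump_path, $offset) = @_;
    my $orig = slurp_bytes($orig_path);
    my $dump = slurp_bytes($dump_path);
    return 0 if $offset + length($dump) > length($orig);
    return substr($orig, $offset, length $dump) eq $dump ? 1 : 0;
}

# Demo on small stand-in files rather than a real DNG.
my ($orig_fh, $orig_path) = tempfile(UNLINK => 1);
binmode $orig_fh;
print {$orig_fh} "HEADER" . "\x00\x0a\xff image bytes";
close($orig_fh);

my ($dump_fh, $dump_path) = tempfile(UNLINK => 1);
binmode $dump_fh;
print {$dump_fh} "\x00\x0a\xff image bytes";
close($dump_fh);

print dump_matches_original($orig_path, $dump_path, 6) ? "match\n" : "differ\n";
```

For the files in this post the equivalent call would be dump_matches_original('IMG.DNG', 'test.dat', 3612), since the first block starts at offset 3612.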
ETA #2: I had incorrectly assumed that only the last dataBlock can have padding, but that’s not true based on the way exiftool reads the data.
In my testing, exiftool finds non-terminal dataBlocks that it determines need padding.
Since we want to copy each dataBlock as it exists in the original file, including any in-line garbage data that is considered padding, we should copy $size + $pad bytes for each dataBlock.
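If the goal is a hash of the image data, the same $size + $pad rule can be applied while feeding a digest instead of writing a dump file. This is only a sketch under stated assumptions: hash_image_blocks is a hypothetical helper rather than an exiftool function, MD5 (via the core Digest::MD5 module) is chosen purely for illustration, and a short read at the end of the file is tolerated on the assumption that trailing padding may not physically exist.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Temp qw(tempfile);

# Hypothetical helper: MD5 over the listed blocks, reading
# $size + $pad bytes per block. Each block is [pos, size, pad].
sub hash_image_blocks {
    my ($path, $blocks) = @_;
    open(my $fh, '<:raw', $path) or die "Can't open $path: $!";
    my $md5 = Digest::MD5->new;
    for my $block (@$blocks) {
        my ($pos, $size, $pad) = @$block;
        seek($fh, $pos, 0) or die "Seek failed: $!";
        my $got = read($fh, my $buff, $size + $pad);
        die "Read failed: $!" unless defined $got;
        # Assumption: a short read is accepted, in case padding at
        # the very end of the file is not actually present.
        $md5->add($buff);
    }
    close($fh);
    return $md5->hexdigest;
}

# Demo on a small stand-in file: a 3-byte header followed by two
# blocks, each with one in-line padding byte (X and Y).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "HDR" . "aaa" . "X" . "bbbb" . "Y";
close($fh);

my @blocks = ([3, 3, 1], [7, 4, 1]);
my $digest = hash_image_blocks($path, \@blocks);
print "padding included\n" if $digest eq md5_hex("aaaXbbbbY");
```

Hashing $size + $pad bytes means the digest covers the padding bytes exactly as they appear in the original file, which matches the copy rule described above.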