Dumping unmodified raw image data from raw files?

Is there a way to dump the unmodified raw image data from raw files, i.e. no EXIF data? I want to calculate hashes of this data.

@rt985426 Welcome to the forum! By unmodified, do you mean in its original mosaiced form without brightness adjustments? Like dcraw's document mode? If so, I don’t think RT has that, or I may have forgotten if it did.

If you’re up to a bit of C++…

A while back, I wrote a C++ program to open a raw file and save the absolute unmodified raw data to TIFF. You can find it here:

You could delete everything between lines 45 and 104 and replace it with something like this:

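// Write the raw samples as-is: width*height 16-bit values in native byte order.
FILE *f = fopen(argv[2], "wb");
if (f == NULL) { perror(argv[2]); return 1; }
fwrite(rawdata, sizeof(unsigned short), width * height, f);
fclose(f);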

Compile it with:

g++ -o rawdata rawdata.cpp -lraw

and run it with:

$ ./rawdata DSC_your_favorite_nef.NEF rawdata.dat

and now you can run hashes against rawdata.dat to your heart’s content…

Edit: Not tested, your mileage may vary.
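
For the hashing step itself, here is a minimal Python sketch, assuming the dump was written to rawdata.dat as in the command above. The function name hash_file and the choice of SHA-256 are only illustrative. Keep in mind that fwrite stores the unsigned short values in the machine's native byte order, so hashes are only comparable between machines with the same endianness.

```python
import hashlib

def hash_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:  # binary mode: hash the bytes exactly as stored
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage:
#   print(hash_file("rawdata.dat"))
```

On Linux, sha256sum rawdata.dat should give the same digest without any code.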

Yes, completely unmodified and unprocessed, i.e. just a dump of the raw image data itself.

This is important in order to ensure that the resulting hashes are not dependent on the program, version, and parameters.

Otherwise, I could just convert everything to JPEG so long as I use the same program, version, and parameters in perpetuity.

This sounds like exactly what I’m looking for. I’ll give it a try. Thanks!

@ggbutcher Could you add this feature to rawproc? dcraw often crashes on me.

Actually, it’s already there, kinda. If you save an image with .dat as the file extension, rawproc will write the data to a file, in one of a couple of formats. From the help file:

outputmode=rgb|split|channelaverage: Applies to data save. If ‘rgb’, data is printed RGBRGBRGB… If ‘split’, each channel is printed separately, in sequence. If ‘channelaverage’, each column is averaged, channel-by-channel. Default: ‘rgb’

The output is text, comma-separated floating point values. I added this to get raw data of spectrum shots for my SSF endeavors…

A couple of hints:

  1. You need libraw and libtiff libraries installed where the compiler can get them. In Ubuntu:
    $ sudo apt-get install libraw-dev libtiff5-dev
    if you don’t want to mess with libtiff, just delete the #include <tiffio.h> at line 5.

  2. The Makefile contains pkg-config invocations for rtprocess; you can delete ‘rtprocess’ from both lines, and the Makefile will probably work. I forget why I put those in… :smiley: … never mind, I just pushed a commit that takes them out.

Let me know how it goes; I may just add a really_raw.cpp to the repo, along with appropriate Makefile mods.

I think it would be useful to have an option to dump the unmodified raw image data as well, since any amount of decoding or processing makes the resulting data dependent on the program, version, and parameters used at runtime.

Yes, you have me thinking about that… rawproc is my hack raw processor, at https://github.com/butcherg/rawproc. It wouldn’t be hard to add an outputmode=rawdata to do exactly what we’ve been discussing…

Yup, this seems to do exactly what I want to do, and I guess the only limitation is what libraw can handle.

I would have to do testing to verify, though.

Being able to dump to stdout and pipe into another command would be useful, too.

It occurs to me, in rawproc the same thing you’re doing with rawdata would be a little more challenging, as rawproc’s internal data format is float, 0.0-1.0. I’d feel a little funny converting that back to 0-65535 unsigned integer and calling that ‘unmodified’…
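
To put a number on that worry, here is a rough Python sketch of the round trip. It assumes a 32-bit float and a plain divide-by-65535 scaling; neither is confirmed for rawproc, and the helper names are made up for illustration. It counts how many of the 65536 possible 16-bit values fail to come back unchanged, once with rounding and once with truncation.

```python
import struct

MAX16 = 65535  # assumed scale factor: full 16-bit range mapped onto 0.0-1.0

def to_float32(x):
    """Round a Python float to the nearest 32-bit float, then widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]

def count_mismatches(back_to_int):
    """Count 16-bit values that do not survive int -> float32 -> int."""
    bad = 0
    for value in range(MAX16 + 1):
        normalized = to_float32(value / MAX16)
        if back_to_int(normalized * MAX16) != value:
            bad += 1
    return bad

rounded = count_mismatches(round)   # convert back with round-to-nearest
truncated = count_mismatches(int)   # convert back by dropping the fraction
print("mismatches with rounding:  ", rounded)
print("mismatches with truncation:", truncated)
```

Under these assumptions the rounding variant should recover every value, because a 32-bit float carries 24 bits of precision, while truncation can land one below the original. Any other scaling, such as black-level subtraction or white-point normalization, would change the picture, so a hash of the converted data would still depend on the exact conversion used.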

Now, piping to stdout, you’d want the binary integers? or text numbers?

Yes, by unmodified, it would be as if you took the raw file and chopped out the header information, EXIF data, etc., and just output the raw image data byte for byte.

I think a binary file/output in that form would be the best and most meaningful way to calculate hashes.

You could always add rawdata output modes for anything that you think would be useful.
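
On the receiving end of such a pipe, the hashing side could look something like this Python sketch. The names hash_stream and main are hypothetical, and it assumes the dumping program writes the binary samples to stdout.

```python
import hashlib
import sys

def hash_stream(stream, algorithm="sha256", chunk_size=1 << 16):
    """Hash everything readable from a binary stream until EOF."""
    digest = hashlib.new(algorithm)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()

def main():
    # sys.stdin.buffer is the underlying byte stream; sys.stdin would decode text
    print(hash_stream(sys.stdin.buffer))
```

A script that calls main() could then sit at the end of a pipeline. Reading through sys.stdin.buffer rather than sys.stdin matters here, since text decoding would either fail on or alter arbitrary binary data.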

I came across this; not sure if it’s useful…

https://cs.brown.edu/courses/csci1290/labs/lab_raw/index.html

Extract CFA from RAW

To get the raw sensor data into Python, we use dcraw with the following options to output a 16bpp TIFF file. This will also overwrite the previously produced preliminary image.

dcraw -4 -D -T <raw_file_name>

You can now read this file into Python using

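from PIL import Image
import numpy as np

# Load the 16-bit TIFF written by dcraw and convert it to a float array
raw_data = Image.open('../sample/sample.tiff')
raw = np.array(raw_data).astype(np.double)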

which will yield the raw CFA information of the camera.

Why bother with these two steps when you can read into a numpy array directly from a raw file using rawpy?
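
A sketch of what that could look like for the hashing use case, assuming rawpy and NumPy are installed. The function names are illustrative; rawpy.imread and the raw_image attribute are part of rawpy. Note that this hashes the samples as decoded by LibRaw, not the compressed bytes stored in the file, and that raw_image includes any margin pixels.

```python
import hashlib

def hash_bytes(data):
    """SHA-256 hex digest of a bytes-like object."""
    return hashlib.sha256(data).hexdigest()

def hash_raw_samples(path):
    """Hash the sensor samples of a raw file as decoded by LibRaw via rawpy."""
    import rawpy  # third-party (pip install rawpy); imported lazily

    with rawpy.imread(path) as raw:
        samples = raw.raw_image  # NumPy array of sensor values, margins included
        # tobytes() copies the array in C order using its native byte order
        return hash_bytes(samples.tobytes())

# Hypothetical usage:
#   print(hash_raw_samples("DSC_your_favorite_nef.NEF"))
```

Since the bytes come out in native byte order, digests are only comparable across machines with the same endianness, and they may still change if a different LibRaw version decodes the file differently.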

This works as expected for my Canon CR2 files, but for some reason, libraw can’t open my Sony ARW and Panasonic RW2 files.

I’m using the latest version as of this writing (0.20), and my cameras have been supported for a while now. So, I’ll have to investigate.

btw, fopen should be using argv[2] in your code snippet.

I did see that, but most of the parameters I saw in the API were calls to functions that modify the data. It did look like something you could do with rawpy, but I’m not a programmer, so it was not immediately apparent to me. I was going from a link I found on Google and this reference (https://rcsumner.net/raw_guide/RAWguide.pdf). Clearly you have more experience and can comment and direct a lot better than I can…

Here’s one for you…I might get time to go through it…looks like it could be interesting…

https://www.dpreview.com/forums/post/56232710

Hmmm, I just opened an RW2 (Panasonic DC-LX100M2) and an ARW (Sony ILCA-77M2) with rawproc, which uses the git master branch of libraw, pulled about two weeks ago. Libraw has decent error reporting; you just need to hook into it…

Thanks; wrote that before I actually looked at the code… :crazy_face:

I haven’t actually sat down and learned Python yet. I can copy/paste others’ recipes with aplomb, however… :laughing: