Megabytes per Megapixel (MB per MP)

First, your Sony a7 IV (the ILCE-7M4, presumably) records 12 or 14 bits depending on shooting mode. This of course affects file size when compressed. The actual compression is probably lossless JPEG (the 1992 standard).

That said, I am not sure what you can actually do about this once the image is taken. You can either leave the file as is or convert it to another format, such as lossy DNG. It all depends on how much you think the very fine pixel-level detail matters to you.

Personally, while I agree that storage costs can add up, I think your time is better spent on optimizing your storage solution. E.g. if you pick a “deep archive” tier like Amazon Glacier, you can get down to around $1/TB/month, with the understanding that retrieval is not instantaneous.

I found it cheapest to run a Linux box with ZFS mirroring, upgrade HDDs whenever I need to (ZFS makes migration really smooth), and use the cloud for a daily backup from the server.

2 Likes

Interesting!

One thing to keep in mind is that 7zip, which runs on a regular CPU, can spend tons of effort (and a significant amount of time) obtaining every last bit of compression. A camera, on the other hand, needs to compress each image in a fraction of a second while running on a small battery. It’s very likely that Sony and Canon engineers made different choices when balancing frame rate, time to clear a full buffer, battery life and storage size.

3 Likes

Storing 14-bit values as a bitstream, without adding an algorithm that tries to exploit redundancy (repetitions, patterns, distributions) in the data, is not considered compression; it is just storage, at least in general IT lingo. Saving 8-bit data in an 8-bit uncompressed TIFF is not compression, even though it requires less space than saving the same data in a 16-bit uncompressed TIFF. The same goes for 12- or 14-bit raw sensor data.

2 Likes

My Nikon D800 has an “uncompressed” raw mode, which I have never used before, so I tried it. For 7424x4924 pixels (14-bit), the files are about 76 MB. 7424*4924*2 = 73111552, which is the file size minus embedded images and stuff.

So, to my surprise, Nikon stores 14-bit images in 16 bits, and calls this “uncompressed”. This isn’t the first time I have disagreed with Nikon.
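For anyone who wants to check that arithmetic, here is a quick sketch (the dimensions are the ones quoted above; the variable names are mine, and it ignores previews and metadata):

```python
# Quick check of the numbers above: what a 7424x4924, 14-bit frame would take
# if stored padded to 16 bits versus tightly packed (pixel data only).
width, height, bits = 7424, 4924, 14

pixels = width * height
padded_to_16 = pixels * 2            # one uint16 per sample
tightly_packed = pixels * bits // 8  # 14 bits per sample, no padding

print(f"padded to 16 bits: {padded_to_16:,} bytes (~{padded_to_16 / 1e6:.1f} MB)")
print(f"packed at 14 bits: {tightly_packed:,} bytes (~{tightly_packed / 1e6:.1f} MB)")
```

The ~73 MB padded figure is the one that matches the observed ~76 MB files once the embedded previews are added; a tightly packed 14-bit stream would be closer to 64 MB.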

1 Like

I’m not surprised at all. Processors have an optimal data transfer rate when they transfer a word at a time. Transferring a bit stream would normally be much slower.

1 Like

I had had the same thought, but I don’t know enough about image storage to have made a comment.

This must mean that the file is an uncompressed Sony file from a 60 MP sensor.

So yes, it compressed quite well with 7zip.

If it were already losslessly compressed, 7zip wouldn’t gain much at all.

2 Likes

Btw, some Nikons pack the 12b or 14b tightly in a bitstream, some don’t (thus adding an overhead that can be a considerable 33.33% in the 12b case).
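The padding overhead is simple arithmetic (a tiny sketch, nothing camera-specific):

```python
# Overhead when N-bit samples are padded out to 16-bit words instead of packed.
for bits in (12, 14):
    overhead = 16 / bits - 1
    print(f"{bits}-bit padded to 16: {overhead:.1%} overhead")
```

That gives the 33.3% figure for 12-bit data and about 14.3% for 14-bit data.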

Generally, the lossless compression schemes in wide use today achieve around 1.5x, 2x at best, depending on the image content.

It is possible to bitpack raw data (in DNG parlance, BitsPerSample not a multiple of 8), but in the vast majority of implementations I’ve seen, things get padded up to 16 bits because it’s so much simpler implementation-wise. Interestingly, I’ve experimented with DNGs with BPS of 10 or 12, and Lightroom chokes on those but RawTherapee does not.

Sony definitely stores 14-bit samples as uint16. So uncompressed Sony files have 2 bytes per pixel, plus a little extra for the JPEG preview and a tiny amount for headers and metadata.

Sony’s lossy compression was a fairly fixed, constant-ratio mechanism: 1 byte per pixel.

Lossless compression typically achieves around 1.5:1 to 2:1 ratios depending on content, so 1–1.5 bytes/pixel is typical. A black frame or a clipped all-white frame will compress better and may be smaller.

This is why Sony didn’t offer lossless compression until the BIONZ XR.
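To put rough numbers on those ratios (just a sketch; 60 MP is the sensor size mentioned earlier in the thread, and the bytes-per-pixel figures are the ballpark ones above, not measurements):

```python
# Rough per-frame size estimates for a 60 MP sensor, using the ballpark
# bytes-per-pixel figures discussed above (excluding previews and headers).
megapixels = 60e6

bytes_per_pixel = {
    "uncompressed (uint16)":       2.0,
    "Sony lossy (constant ratio)": 1.0,
    "lossless at ~2:1":            2.0 / 2,
    "lossless at ~1.5:1":          2.0 / 1.5,
}

for name, bpp in bytes_per_pixel.items():
    print(f"{name:30s} ~{megapixels * bpp / 1e6:5.0f} MB")
```

That works out to roughly 120 MB uncompressed, 60 MB lossy, and 60–80 MB lossless per frame.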

2 Likes

The Raspberry Pi camera also packs its 12-bit data: https://www.strollswithmydog.com/open-raspberry-pi-high-quality-camera-raw/
Of course, packing 12 bits is simpler than packing 14 bits (two leftover 4-bit halves fit into one byte), but distributing the 4x6 ‘extra’ bits (in addition to the 8 bits that go straight into a byte) across 3 bytes isn’t rocket science, either.
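For the curious, here is what the 12-bit case looks like in code (a generic sketch of the idea only; the exact bit order in the Raspberry Pi, Nikon or DNG layouts may differ, and `pack12` is just a made-up name):

```python
# Generic 12-bit packing sketch: two 12-bit samples fit exactly into three bytes
# instead of the four bytes they would occupy when padded to uint16.
def pack12(samples):
    assert len(samples) % 2 == 0
    out = bytearray()
    for a, b in zip(samples[0::2], samples[1::2]):
        out.append(a >> 4)                        # high 8 bits of a
        out.append(((a & 0x0F) << 4) | (b >> 8))  # low 4 bits of a, high 4 bits of b
        out.append(b & 0xFF)                      # low 8 bits of b
    return bytes(out)

print(pack12([0xABC, 0x123]).hex())  # 'abc123': 3 bytes for two samples
```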

1 Like

You may also want to try very aggressive compression algorithms (such as bzip2, zstd, lzip, or xz) at their maximum compression settings. For me, on CR2 and RW2 files they don’t make a whole lot of difference (1–3%), but that’s already-compressed data, so one would need to decompress first.
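If anyone wants to run that kind of test without installing extra tools, Python’s standard library wraps several of the same algorithms (a sketch; `IMG_0001.CR2` is a placeholder for whatever raw file you point it at):

```python
# Compare how much general-purpose compressors shave off an already-compressed
# raw file. zlib, bzip2 and xz (lzma) are all in the standard library.
import bz2, lzma, zlib
from pathlib import Path

data = Path("IMG_0001.CR2").read_bytes()  # placeholder file name

sizes = {
    "zlib -9":  len(zlib.compress(data, 9)),
    "bzip2 -9": len(bz2.compress(data, 9)),
    "xz -9e":   len(lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)),
}

for name, size in sizes.items():
    print(f"{name:9s} {size / len(data):6.1%} of original")
```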

Then UTF-8 encoding is a form of compression, since technically Unicode code points take up to 21 bits these days.

Yes. It’s compressed compared with UTF-32.
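A tiny illustration of that point (the example string is arbitrary):

```python
# Same characters, very different byte counts: UTF-8 spends 1 byte on ASCII,
# while UTF-32 spends 4 bytes on every code point (plus 4 for the BOM here).
text = "raw files compress, text does too: naïve ≠ naive"

print(len(text.encode("utf-8")))   # mostly 1 byte per character
print(len(text.encode("utf-32")))  # 4 bytes per character + BOM
```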

I tried .tar.xz but was not able to squeeze anything more out.
I think the main point is that the compression of the files varies a lot between manufacturers.
The vast amount of space consumed comes from resolution, of course, but there can be some penalty depending on the particular make/model of the camera and, of course, on the type of image.

A reasonable concern was the space needed - not only locally but also in the cloud.
It is good to find out that even the bigger files compress quite well. In the end, it looks like cloud storage is going to be less of an issue than initially expected (because of the compression that can be applied during backup).

Thank you all for your ideas and thoughts!

I’m using restic to backblaze b2 storage, and restic has built in compression as well as deduplication.

1 Like

Do you have an estimate of the percentage compression you are getting? (I suppose I should also ask which manufacturer’s files you are backing up.)

Since I generally shoot lossless compressed raw already, the compression doesn’t help a ton. I mostly have Nikon NEF, but also some old Canon CR2, Fuji RAF, and now Ricoh DNG. Majority NEF files, though.

Deduplication helps a little bit.

But for my 400 GB of raw files it’s like $2/month.

2 Likes

I am also on B2, with Duplicati.
Same - compression and deduplication.

You may find the following of interest (not sure how up to date it is, however)

I don’t use qBackup but I was considering it.

I was also considering MSP360, but they are dropping the perpetual license and turning towards a subscription model. And that is a no-go for me at this point.

A bag of rice here went from $18 per 18 kg to $50 per 18 kg.
So - I am sorry - rice is more important than a subscription for something that I can live without.
Besides, Duplicati is nice, open, free, and also has a supportive community.

3 Likes

Duplicati (the software) itself is free, but my understanding is that it requires a storage backend, like all alternatives.

It doesn’t really. Most raw formats are a thin veneer over some standard, e.g. those found in TIFF, or a direct adaptation or at most a tweak of some well-established algorithm (e.g. CR3).

What can vary a lot is how compressible files are, which depends on the image and a bit on the sensor noise (noise is, by definition, random, so it adds a lot to the file size). To disentangle the two, one would have to synthesize the same sensor data, convert it into each raw format, and try compression on that. I am not aware of anyone doing this; it would be a lot of work for little benefit. Generally the differences between formats are below 30–50%, and few people would choose a camera based on this, so most people just accept it as is.
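A quick way to see the noise effect with synthetic data (a sketch, not real sensor readout; the gradient and noise levels are arbitrary):

```python
# Smooth (predictable) data compresses far better than the same data with
# random noise added, even at identical size and bit depth.
import zlib
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

gradient = np.linspace(1000, 4000, n)
smooth = gradient.astype(np.uint16).tobytes()
noisy = (gradient + rng.normal(0, 50, n)).astype(np.uint16).tobytes()

for name, data in [("smooth gradient", smooth), ("gradient + noise", noisy)]:
    ratio = len(data) / len(zlib.compress(data, 9))
    print(f"{name:17s} compresses about {ratio:.1f}:1")
```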

I also use restic, and deduplication was a big feature in making that choice. When you have the same raws/files across computers/discs and back up to the same repo, you save a lot of space. Those who work from a NAS and only keep their files on one computer benefit a lot less.