Megabytes per Megapixel (MB per MP)

I had had the same thought, but I don’t know enough about image storage to have made a comment.

This must mean that the file is an uncompressed Sony file from a 60-megapixel sensor.

So yes, that compressed quite well with 7zip.

If it had already been losslessly compressed, 7zip wouldn’t gain much at all.

Btw, some Nikons pack the 12-bit or 14-bit samples tightly in a bitstream, some don’t (thus adding an overhead that can be a considerable 33.33% in the 12-bit case: 4 padding bits on top of every 12 useful ones).

Generally, all lossless compression schemes in wide use today achieve around 1.5x, 2x at best, depending on the image content.

It is possible to bitpack raw data (in DNG parlance, BitsPerSample not a multiple of 8), but in the vast majority of implementations I’ve seen, things get padded up to 16 bits because it’s so much simpler implementation-wise. Interestingly, I’ve experimented with DNGs with BPS of 10 or 12, and Lightroom chokes on those but RawTherapee does not.

Sony definitely stores 14-bit samples as uint16. So uncompressed Sony files have 2 bytes per pixel, plus a little extra for the JPEG preview and a tiny amount for headers and metadata.

Sony’s lossy compression was a fixed, constant-ratio mechanism: 1 byte per pixel.

Lossless compression typically achieves around 1.5:1 to 2:1 ratios depending on content, so 1–1.5 bytes/pixel is typical. A black frame or a clipped all-white frame will compress better and end up smaller.

This is why Sony didn’t offer lossless compression until the BIONZ XR.
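
Putting rough numbers on those per-pixel rates for a 60 MP sensor, a minimal back-of-the-envelope sketch (it only applies the figures mentioned above; real files add the JPEG preview and metadata on top):

```python
# Rough raw file sizes for a 60 MP sensor, from the bytes-per-pixel
# figures discussed above (preview and metadata overhead not included).
PIXELS = 60 * 1_000_000

estimates = {
    "uncompressed (uint16)": PIXELS * 2.0,   # 14-bit samples padded to 2 bytes
    "Sony lossy (1 B/px)":   PIXELS * 1.0,   # constant-ratio lossy mode
    "lossless, ~2:1":        PIXELS * 1.0,   # good case
    "lossless, ~1.5:1":      PIXELS * 1.5,   # busier image content
}

for label, size in estimates.items():
    print(f"{label:22s} ≈ {size / 1e6:5.0f} MB")
```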

The Raspberry Pi camera also packs its 12-bit data: https://www.strollswithmydog.com/open-raspberry-pi-high-quality-camera-raw/
Of course, packing 12 bits is simpler than packing 14 (you just place the 2x4 leftover bits into a shared byte), but distributing the 4x6 ‘extra’ bits (in addition to the 8 bits per pixel that go straight into a byte of their own) across 3 bytes isn’t rocket science, either.
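
For anyone curious what that packing looks like in practice, here is a minimal sketch of the 12-bit case (two pixels into 3 bytes). The exact byte/nibble order varies between formats, so the layout below is just one plausible arrangement, not any particular camera’s:

```python
def pack12(samples):
    """Pack pairs of 12-bit samples into 3 bytes (illustrative layout only)."""
    out = bytearray()
    for a, b in zip(samples[0::2], samples[1::2]):
        out.append(a >> 4)                        # high 8 bits of sample A
        out.append(((a & 0xF) << 4) | (b >> 8))   # low 4 of A + high 4 of B
        out.append(b & 0xFF)                      # low 8 bits of sample B
    return bytes(out)

def unpack12(data):
    samples = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i:i + 3]
        samples.append((b0 << 4) | (b1 >> 4))
        samples.append(((b1 & 0xF) << 8) | b2)
    return samples

pixels = [0x123, 0xABC, 0xFFF, 0x000]
assert unpack12(pack12(pixels)) == pixels
print(f"{len(pixels) * 2} bytes as uint16 -> {len(pack12(pixels))} bytes packed")
```

The same idea extends to 14 bits, just with four pixels sharing 7 bytes instead of two sharing 3.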

You may also want to try very aggressive compression algorithms (such as bzip2, zstd, lzip, or xz) with the maximum compression setting. For me, on CR2 and RW2 files, they don’t make a whole lot of difference (1–3%), but that is already-compressed data, so one would need to decompress it first.
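
If you want to compare compressors without installing anything extra, Python’s standard library covers zlib, bzip2, and xz; here is a small sketch (the `my_photo.ARW` path is a placeholder, and zstd would need the third-party `zstandard` package):

```python
import bz2, lzma, zlib
from pathlib import Path

data = Path("my_photo.ARW").read_bytes()   # placeholder path to a raw file

results = {
    "zlib -9":  len(zlib.compress(data, level=9)),
    "bzip2 -9": len(bz2.compress(data, compresslevel=9)),
    "xz -9e":   len(lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)),
}

print(f"original : {len(data):,} bytes")
for name, size in results.items():
    print(f"{name:9s}: {size:,} bytes ({size / len(data):.1%} of original)")
```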

Then UTF-8 encoding is a form of compression, since technically a Unicode code point needs up to 21 bits these days.

Yes. It’s compressed compared with UTF-32.
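
A quick way to see it, counting encoded bytes for a mostly-ASCII string in Python:

```python
s = "naïve café raw files"              # 20 characters, mostly ASCII
print(len(s.encode("utf-8")))           # 22 bytes: ASCII stays 1 byte, ï/é take 2
print(len(s.encode("utf-32-le")))       # 80 bytes: fixed 4 bytes per code point
```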

I tried .tar.xz but was not able to squeeze out anything more.
I think the main point is that the compression of the files varies a lot between manufacturers.
Most of the space consumed comes from resolution, of course, but there can be some penalty depending on the particular make/model of the camera and, of course, on the type of image.

A reasonable concern was the space needed, not only locally but also in the cloud.
It is good to find out that even the bigger files compress quite well. In the end, it looks like cloud storage is going to be less of an issue than initially expected (because of the compression that can be applied during backup).

Thank you all for your ideas and thoughts!

I’m using restic to back up to Backblaze B2 storage, and restic has built-in compression as well as deduplication.

Do you have an estimate of the percentage compression you are getting (and I suppose I should also ask which manufacturer’s files you are backing up)?

Since I generally shoot lossless compressed raw already, the compression doesn’t help a ton. I mostly have Nikon NEF, but also some old Canon CR2, Fuji RAF, and now Ricoh DNG. Mostly NEF files, though.

Deduplication helps a little bit.

But for my 400 GB of raw files it’s like $2/month.

I am also on B2, with Duplicati.
Same: compression and deduplication.

You may find the following of interest (not sure how up to date it is, however).

I don’t use qBackup but I was considering it.

I was also considering MSP360, but they are dropping the perpetual license and turning towards a subscription model. And that is a no-go for me at this point.

A bag of rice here moved from $18 per 18 kg to $50 per 18 kg.
So, I am sorry, rice is more important than a subscription for something I can live without.
Besides, Duplicati is nice, open, free, and also has a supportive community.

Duplicati (the software) itself is free, but my understanding is that it requires a storage backend, like all alternatives.

It doesn’t, really. Most raw formats are a thin veneer over some standard, e.g. those found in TIFF, a direct adaptation or at most a tweak of some well-established algorithm (e.g. CR3).

What can vary a lot is how compressible files are, which depends on the image and a bit on the sensor noise (noise is, by definition, random, so it adds a lot to the file size). To disentangle the two, one would have to synthesize the same sensor data, convert it into each raw format, and try compression on that. I am not aware of anyone doing this; it would be a lot of work for little benefit. Generally the differences between formats are below 30–50%, and few people would choose a camera based on this, so most people just accept it as is.
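
A toy demonstration of the noise point with synthetic data (not real sensor output, and zlib rather than a camera’s own codec, but the effect is the same in kind):

```python
import random, zlib

random.seed(0)
N = 256 * 256  # a small synthetic frame of 16-bit samples

# Smooth ramp: very predictable, hence very compressible.
clean = b"".join(i.to_bytes(2, "little") for i in range(N))

# The same ramp with a few hundred counts of random "noise" per sample.
noisy = b"".join(
    min(65535, max(0, i + random.randint(-300, 300))).to_bytes(2, "little")
    for i in range(N)
)

for label, buf in (("clean ramp", clean), ("ramp + noise", noisy)):
    ratio = len(buf) / len(zlib.compress(buf, level=9))
    print(f"{label:12s}: {ratio:5.1f}:1")
```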

I also use restic, and deduplication was a big factor in making that choice. When you have the same raws/files across computers/discs and back up to the same repo, you save a lot of space. Those who work from a NAS and only keep their files on one computer benefit a lot less.

Yes and no: you can back up to a share, FTP, or a friend’s NAS over S3, even to your own USB drive if you want. But for the cloud, I am not aware of anyone offering free cloud storage. Recently I read about this one: https://www.storj.io/; maybe a good option if you have limited needs. I have not tested it, however.

You are right. Upgrading my camera is on the far dream list for now. I am just doing my best to continue dreaming. For now I will continue squeezing whatever is left in my current one and enjoying the photos even if they are likely regular or ordinary for many :slight_smile:

It would be cool to have p2p backup, using something like the protocol that Syncthing uses to discover its peers, to share backup space among people at home. Probably not the most reliable…

From what I read (if I understood correctly), storj.io is something like that, in the sense that the backup is distributed among many nodes and participants can rent out space to others. Backup sharing among friends is possible (I have not done it), but only if the users’ NASes offer standard sharing like S3 and the users have free dynamic DNS like Duck DNS; for a limited implementation it would work (I “expect” it would work). At scale, though, it would be quite a different thing.

Yes, but with a blockchain and a shitcoin attached to it, which makes it really undesirable.

Part of my backup strategy is to sync my home server backup folder to my brother’s server using Syncthing in send-only mode (this is the “disaster level recovery” copy, 10000km away).
