What do you call "generic compression"? Are you comparing a debayered 16-bit image to the raw file compressed with 7zip or something?
The raw file is a monochrome file. If you write it out as a TIFF or something first, you're comparing a lossless encoding of an RGB 16-bit-per-channel image (48 bits per pixel) against the RAW file, which is single-channel 16-bit (16 bits per pixel).
Debayering (demosaicing) "generates" extra data, which you're then asking the compressor to deal with. You lose the flexibility of the RAW file (white balance, demosaicing, and the color matrix are already baked in) and you make the file bigger.
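The size difference is simple per-pixel arithmetic (a quick Python check; the 61-megapixel figure is just a round illustrative sensor size):

```python
# One 16-bit sample per pixel in the Bayer mosaic vs. three 16-bit samples
# per pixel after demosaicing: the data triples before compression even starts.
raw_bits_per_pixel = 16          # single-channel Bayer mosaic
rgb_bits_per_pixel = 3 * 16      # 48 bits: R, G, B at 16 bits each

blowup = rgb_bits_per_pixel / raw_bits_per_pixel  # 3.0

# For a hypothetical 61-megapixel sensor:
pixels = 61_000_000
raw_megabytes = pixels * raw_bits_per_pixel / 8 / 1e6   # 122 MB of mosaic data
rgb_megabytes = pixels * rgb_bits_per_pixel / 8 / 1e6   # 366 MB after debayering
```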
JPEG XL's near-lossless mode (VarDCT) is groundbreaking compared to the other formats out there, and is actually good at preserving small details, grain, and stuff like that (where other "modern image formats" like to smooth details away).
Use dcraw to convert the raw data to Bayer pixel data (so a 61-megapixel monochrome file), use ImageMagick to select the right pixels so you get the 4 channels in separate files (R, G0, G1, B), and compress those with JPEG XL in lossless mode. No way a generic data compressor can beat that. But like I said, there is no tool yet to reconstruct the original raw file again :).
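As a sketch, the pixel-selection step could also be done in a few lines of numpy instead of ImageMagick (RGGB layout assumed; the actual Bayer arrangement varies per camera, so check before trusting the channel labels):

```python
import numpy as np

def split_rggb(mosaic: np.ndarray):
    """Split a Bayer mosaic (RGGB layout assumed) into four
    quarter-resolution channel planes: R, G0, G1, B."""
    r  = mosaic[0::2, 0::2]  # red sites:   even rows, even cols
    g0 = mosaic[0::2, 1::2]  # green sites: even rows, odd cols
    g1 = mosaic[1::2, 0::2]  # green sites: odd rows,  even cols
    b  = mosaic[1::2, 1::2]  # blue sites:  odd rows,  odd cols
    return r, g0, g1, b

# Tiny 4x4 stand-in for the 61-megapixel mosaic dcraw would produce
m = np.arange(16, dtype=np.uint16).reshape(4, 4)
r, g0, g1, b = split_rggb(m)
```

Each plane could then be written out (e.g. as a 16-bit PGM) and handed to a JPEG XL encoder with lossless settings; the slicing is trivially reversible, which is what makes the mosaic reconstructible.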
(Although if you visit encode.su, there are a lot of generic compressors which are smart enough to "recognize image data" and actually use a lossless image compressor behind the scenes).
I've done a little test with a Sony A7R IV file I downloaded from Sony A7R IV Sample Images (SOOC JPGs + RAW) - AlphaShooters.com.
Raw, lossy (which isn't that bad at all, but people freak out about it being lossy): 61.684.736 bytes.
The uncompressed Bayer data inside of it: 122.419.200 bytes (raw pixel values, no file format header or anything).
That is 30.604.800 bytes per channel (R, G0, G1, B).
Compressing those 4 channels with JPEG XL gives me:
r: 9.042.750 bytes
g0: 10.840.330 bytes
g1: 10.835.900 bytes
b: 10.575.782 bytes
41.294.762 bytes total, 33.7% of the original raw pixel data, or 66.95% of the Sony compressed raw file.
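For what it's worth, the totals and percentages above check out (a quick Python verification using the measured sizes):

```python
# Per-channel JPEG XL lossless sizes from the test, in bytes
channels = {"r": 9_042_750, "g0": 10_840_330, "g1": 10_835_900, "b": 10_575_782}
total = sum(channels.values())      # 41_294_762 bytes

bayer_raw = 122_419_200             # uncompressed mosaic data
sony_arw  = 61_684_736              # Sony's lossy-compressed ARW

ratio_vs_raw = total / bayer_raw    # ~0.337  -> 33.7%
ratio_vs_arw = total / sony_arw     # ~0.6695 -> 66.95%
```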
7z with LZMA2, a large dictionary, and ultra mode gives me 51.176.749 bytes when I compress the original (compressed) ARW file.
Funnily enough, it gives me 47.061.852 bytes when I compress the original raw monochrome pixel data.
What I think is even more interesting - and a bit depressing - is that JPEG XR (the old 2009 Microsoft HD Photo format), which was actually meant to be used inside cameras, can compress the raw data to 43.156.396 bytes. Still lossless. JPEG XR is also an open specification, and is based on JPEG-style math with slightly bigger matrices and memory requirements. But it compresses and decompresses almost instantly on my system, should be way easier to implement in fixed-function processing chips that already do JPEG, and comes close. I never understood why it was never used. Its compression-gain-vs-processing-speed is still best in class after all these years, and it supports lossy and lossless, up to 16 bits per channel.
Anyway, if you really want to archive smaller files, save a small DNG or a preview JPG and 7zip the original RAW file. That gets you the easiest compression that is still quite good.
If you expect camera makers to use whatever algorithm you think is nice inside a RAW file, don't get your hopes up. And I prefer speed and simplicity in the camera; a bigger storage card is an easy solution.
There are external tools to compress RAW files, but you have to decompress them again before you can use them in any software (I believe, not sure). In that case, just 7zip the ARW and be done with it.
Maybe someone has the time and energy to write a tool that splits a raw file into its channels, uses an external tool to compress them, and can then reverse the process. But it would require some knowledge of all the different raw formats to get them back bit-perfect.
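The channel-shuffling half of such a round trip is the easy part; parsing and bit-perfectly rewriting each vendor's container is the real work. A minimal numpy sketch of the reassembly step (RGGB layout assumed, mirroring the split described earlier):

```python
import numpy as np

def merge_rggb(r, g0, g1, b):
    """Reassemble four quarter-resolution channel planes back into a
    full-resolution Bayer mosaic (RGGB layout assumed)."""
    h, w = r.shape
    mosaic = np.empty((2 * h, 2 * w), dtype=r.dtype)
    mosaic[0::2, 0::2] = r   # red sites back to even rows, even cols
    mosaic[0::2, 1::2] = g0  # first green plane
    mosaic[1::2, 0::2] = g1  # second green plane
    mosaic[1::2, 1::2] = b   # blue sites
    return mosaic

# Smallest possible example: four 1x1 planes -> a 2x2 mosaic
r  = np.array([[0]], dtype=np.uint16)
g0 = np.array([[1]], dtype=np.uint16)
g1 = np.array([[2]], dtype=np.uint16)
b  = np.array([[3]], dtype=np.uint16)
m = merge_rggb(r, g0, g1, b)
```

Since the slicing never touches pixel values, splitting and merging is exactly lossless; everything else (headers, metadata, vendor quirks) is where the bit-perfect guarantee gets hard.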
Compressing to a DNG would be possible, but Adobe already does that, so…
Being annoyed about the difference between 12-bit vs 16-bit per pixel, or 14-bit vs 16-bit per pixel of uncompressed data… there are bigger issues in the world to think about.