Question about Image Fileformats

pragomer · May 22, 2019, 5:56am

Hey,
I got a general question about dealing with file formats. Normally I take raw photos and process them in darktable, and I could say that raw processing is the most “lossless” processing of course.

Now let's say that, after the raw workflow, I want to fine-tune some things. To give an extreme example, let's say:
I export a full-size JPG out of darktable. Then I edit it in GIMP and overwrite the JPG. Then I open it in Photoshop, do something, and again overwrite the JPG. And so on…

Question: is it correct that with every edit-and-save cycle (JPG) the quality gets worse and worse because of JPG compression (even at 100% quality)?

And my second question: would TIF be the perfect format for this case, because it does not compress (unless compression is enabled)? And is it also correct that, thanks to TIF, I could theoretically jump between the two programs above (GIMP and Photoshop) as often as I want without losing quality?

I know TIF files are very large. In cases where I would have to “jump” between programs, I would create a TIF out of darktable, edit it, finally create a JPG, and then delete the TIF, so the size would not matter in the end.

I hope I have made my question clear enough.

Thank you folks

Kind regards

Pragomer

Tobias · May 22, 2019, 7:46am

Question: is it correct that with every edit-and-save cycle (JPG) the quality gets worse and worse because of JPG compression (even at 100% quality)?

Yes, JPEG compression is always lossy (90° rotations can be done losslessly). You should only use it as the last step, for the web.
The quality setting is not a percentage, just a number between 0 and 100.
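To illustrate that the number is not a percentage, here is a minimal Python sketch of how the IJG libjpeg library (used by many, but not all, encoders) turns the 0 to 100 quality setting into quantization steps, by scaling the example luminance table from the JPEG standard (Annex K). The function names are my own, not libjpeg's. Note that at quality 100 every step is 1, yet the DCT coefficients are still rounded to integers, which is one reason even "100%" is not lossless.

```python
# Example luminance quantization table from the JPEG standard (ITU-T T.81, Annex K),
# one 8x8 block listed row by row.
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def quality_to_scale(quality):
    """IJG-style mapping from a 0..100 quality number to a percentage scale factor."""
    q = min(max(quality, 1), 100)  # 0 is treated like 1
    return 5000 // q if q < 50 else 200 - 2 * q


def scaled_table(quality, base=BASE_LUMA, force_baseline=True):
    """Scale the base table; no step ever drops below 1, so coefficients are always rounded."""
    scale = quality_to_scale(quality)
    limit = 255 if force_baseline else 32767
    return [min(max((b * scale + 50) // 100, 1), limit) for b in base]


for q in (50, 90, 100):
    print(q, scaled_table(q)[:8])  # first row of the scaled table
```

At quality 50 you get the base table itself, at 90 the first row becomes 3, 2, 2, 3, 5, 8, 10, 12, and at 100 every step is 1.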

And my second question: would TIF be the perfect format for this case, because it does not compress (unless compression is enabled)?

Yes. Even better, TIFF supports both lossy compression (e.g. JPEG) and lossless compression (e.g. LZW, PackBits, Deflate).
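On the lossless side, PackBits is simple enough to sketch in a few lines. This is a minimal Python encoder/decoder following the PackBits scheme (header byte n: 0 to 127 means "copy the next n+1 bytes literally", 129 to 255 means "repeat the next byte 257-n times", 128 is a no-op). The function names are mine; as far as I know the TIFF specification has each image row packed separately, which this sketch leaves out. Whatever choices an encoder makes, decoding gives back exactly the original bytes.

```python
def packbits_encode(data: bytes) -> bytes:
    """Encode bytes with PackBits (runs of 2..128 equal bytes, literals of 1..128 bytes)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Length of the run of identical bytes starting at i (at most 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)  # header -(run - 1) stored as an unsigned byte
            out.append(data[i])
            i += run
        else:
            # Literal: collect bytes until a repeated pair starts or 128 bytes are reached.
            start = i
            i += 1
            while i < n and i - start < 128:
                if i + 1 < n and data[i] == data[i + 1]:
                    break
                i += 1
            out.append(i - start - 1)  # header n means "copy the next n + 1 bytes"
            out.extend(data[start:i])
    return bytes(out)


def packbits_decode(data: bytes) -> bytes:
    """Decode PackBits; header 128 (-128 as signed) is a no-op."""
    out = bytearray()
    i = 0
    while i < len(data):
        header = data[i]
        i += 1
        if header < 128:  # literal run of header + 1 bytes
            out.extend(data[i:i + header + 1])
            i += header + 1
        elif header > 128:  # repeat the next byte 257 - header times
            out.extend(data[i:i + 1] * (257 - header))
            i += 1
    return bytes(out)


raw = bytes.fromhex("AA AA AA 80 00 2A AA AA AA AA 80 00 2A 22" + " AA" * 10)
packed = packbits_encode(raw)
print(packed.hex(" ").upper())  # FE AA 02 80 00 2A FD AA 03 80 00 2A 22 F7 AA
assert packbits_decode(packed) == raw  # lossless round trip
```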

And is it also correct that, thanks to TIF, I could theoretically jump between the two programs above (GIMP and Photoshop) as often as I want without losing quality?

Yes. The only problem with TIFF is that anyone can extend it with extra stuff. But as long as you don't do anything fancy, it should not matter.

Jossie · May 22, 2019, 7:48am

Hello,

yes, if you do not destroy information during processing, TIF is lossless and you can go back and forth as often as you like.

This is not true for JPG. Because of the compression, which (as far as I know) is always applied when you save a file, you will lose information. I like to compare this to photocopying a sheet of paper: instead of always copying the original, you copy the copy, then the copy of the copy, and so on, losing information each time.

Here is a simple test. I created an image (text and a coloured rectangle) and saved it as TIF. Then I opened the TIF and saved it as JPG with maximum quality. Then I opened this JPG and again saved it as JPG with maximum quality. Finally, I divided the two JPGs by one another in ImageJ. If the process were lossless, the result would be an image with all pixels equal to 1. Here is what I got:

Left is the TIF original, right is the ratio of the two JPGs. The orange pixels have a value of exactly 1 in the ratio image. Deviations are up to ±3%.
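The ImageJ test boils down to a pixel-wise division. Here is a minimal sketch of the same check in plain Python, assuming two equally sized grayscale images given as nested lists of pixel values; the function name and the tiny sample images are made up, and pixels where the denominator is 0 are simply skipped (ImageJ may treat them differently).

```python
def ratio_stats(img_a, img_b):
    """Divide two equally sized grayscale images pixel by pixel.

    Returns (fraction of ratios exactly 1.0, largest deviation from 1.0).
    Pixels whose denominator is 0 are skipped.
    """
    ratios = [a / b
              for row_a, row_b in zip(img_a, img_b)
              for a, b in zip(row_a, row_b)
              if b != 0]
    if not ratios:
        raise ValueError("no pixels with a non-zero denominator")
    exact = sum(1 for r in ratios if r == 1.0) / len(ratios)
    max_dev = max(abs(r - 1.0) for r in ratios)
    return exact, max_dev


first_jpg = [[100, 200], [50, 0]]   # made-up 2x2 pixel values
second_jpg = [[100, 194], [50, 0]]
exact, dev = ratio_stats(first_jpg, second_jpg)
print(f"{exact:.0%} of pixels unchanged, max deviation {dev:.1%}")
```

A difference image (a minus b) would avoid the division-by-zero question altogether, at the cost of reporting absolute rather than relative deviations.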

I could imagine that some software will not recompress the file if no changes were made before saving. But this has to be found out by experiment or from the documentation. In the example above, this was obviously not the case.

As Tobias pointed out, only a few operations can be done losslessly on JPGs, such as rotation by 90°, flipping and cropping. However, the software has to support the lossless operation. If you just save the file as JPG, it will not be lossless.
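Part of why only some operations qualify: lossless tools work on the compressed DCT blocks directly, so a crop has to start on a block (iMCU) boundary, which is 8 pixels for grayscale or unsubsampled colour and 16 for the common 4:2:0 chroma subsampling. As far as I know, jpegtran's crop silently moves the upper-left corner up and left to such a boundary while keeping the lower-right corner. The helper below is hypothetical and only mimics that arithmetic; it does not touch any JPEG data.

```python
def snap_crop_to_imcu(x, y, width, height, imcu_w=16, imcu_h=16):
    """Move a crop's upper-left corner up/left onto an iMCU boundary.

    The lower-right corner stays where it was, so width/height grow accordingly.
    16x16 assumes 4:2:0 chroma subsampling; use 8x8 for grayscale or 4:4:4.
    """
    new_x = (x // imcu_w) * imcu_w
    new_y = (y // imcu_h) * imcu_h
    return new_x, new_y, width + (x - new_x), height + (y - new_y)


print(snap_crop_to_imcu(20, 10, 100, 50))        # (16, 0, 104, 60)
print(snap_crop_to_imcu(20, 10, 100, 50, 8, 8))  # (16, 8, 104, 52)
```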

Hermann-Josef

pragomer · May 22, 2019, 8:43am

Hi,

thank you so much for answering so quickly. This is also the answer I was hoping for, which means my thinking was right and I can go this way.

Again, this is such a great community here. Thanks for this.

Kind regards

Morgan_Hardwood · May 22, 2019, 9:24am

That is correct.

You can use TIFF with lossless compression, e.g. deflate. However, if you are simply passing an image from one program to the next and this is not the final image (i.e. it is an intermediate image), then file size does not matter but read/write time matters, so skip compression.
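To get a feel for the size/time trade-off, here is a rough Python sketch using zlib, which implements the Deflate algorithm behind TIFF's Deflate option. Level 0 just stores the data (slightly larger than the input), higher levels spend more time to save space, and every level decompresses to identical bytes. The synthetic 16-bit gradient is an assumption; real TIFF writers compress per strip or tile and may apply a predictor first, so real-world numbers will differ.

```python
import time
import zlib


def compare_levels(data: bytes, levels=(0, 1, 6, 9)) -> dict:
    """Compress with several Deflate levels; return {level: (size, seconds)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Deflate is lossless at every level: decompression restores the exact bytes.
        assert zlib.decompress(packed) == data
        results[level] = (len(packed), elapsed)
    return results


# Synthetic test data: a 16-bit little-endian gradient row, repeated 64 times.
row = b"".join(v.to_bytes(2, "little") for v in range(0, 65536, 16))
image = row * 64

for level, (size, seconds) in compare_levels(image).items():
    print(f"level {level}: {size} bytes ({size / len(image):.1%}), {seconds * 1000:.1f} ms")
```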

Your final image, the one you archive, should balance file size and quality. Write time does not matter at this point, as you write it once and have it forever. Compressed 8- or 16-bit TIFF, or JPEG 2000: you choose.

What else to consider?

Lossy compression is not the only source of data loss. Right off the bat, if you export a 16-bit TIFF out of darktable you’re already losing a lot of data in the “clipped” highlights and shadows (i.e. values less than 0 and more than 1 are discarded). If your workflow supports it, stick to 32-bit unclipped intermediate files. In RawTherapee you would follow the “Unclipped” RawPedia article (yes I still need to update those placeholder images with real samples).
TIFF implementations vary. Test your workflow to make sure you're not losing data and metadata along the way, or only to an acceptable degree.
Concerning re-saving JPEG files: re-saving without using the same JPEG compression settings leads to greater degradation in quality. That is, if you re-save 100 times using quality=90, and in another test re-save 100 times using a quality that hovers between 91 and 100, you will find that, despite the first test using a lower quality, its end result should look better than that of the second test. I read about this many years ago and confirmed it with a simple Bash script; it's probably still true.
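This can be illustrated with a toy model (my own simplification, not how any particular encoder works): treat each save as quantizing a handful of DCT coefficients with a uniform step and then dequantizing. Re-quantizing with the same step changes nothing after the first pass, while a varying step re-rounds the values on every save. Step 10 stands in for "always quality 90" and steps 7 to 9 for "varying, finer quality"; both are assumptions, as are the coefficient values. Real re-saves also round pixels to integers, clamp and convert colour, so even identical settings are not perfectly idempotent.

```python
def requantize(coeffs, step):
    """One toy 'save': round each coefficient to a multiple of step (quantize + dequantize)."""
    return [round(c / step) * step for c in coeffs]


def resave(coeffs, steps):
    """Apply one toy save per entry in steps."""
    for step in steps:
        coeffs = requantize(coeffs, step)
    return coeffs


def total_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


original = [123.4, -47.2, 18.9, 5.6, -2.3]  # made-up DCT coefficients

fixed = resave(original, [10] * 6)         # same, coarser step every time
varying = resave(original, [9, 7, 8] * 2)  # finer, but changing, steps

print("fixed step:  ", fixed, round(total_error(original, fixed), 2))
print("varying step:", varying, round(total_error(original, varying), 2))
```

In this toy, the fixed step settles after the first save while the varying steps keep drifting, so the accumulated error ends up larger despite every individual step being finer.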

Jossie · May 22, 2019, 2:44pm

@Morgan_Hardwood if you export a 16-bit TIFF out of darktable you’re already losing a lot of data in the “clipped” highlights and shadows (i.e. values less than 0 and more than 1 are discarded).

Could you please explain why this is so? The detector delivers pixel values in the range 0 … 65535 for a 16-bit/channel representation. So if one preserves this range, there are no clipped values. Where and why does clipping occur, then? And what does this have to do with 32-bit representation?

Hermann-Josef

PS: How does one insert a quote with the avatar into a post, as you did above?

Morgan_Hardwood · May 22, 2019, 5:17pm

@Jossie AFAIK darktable’s pipeline is 32-bit float, not 16-bit int. Having said that, it’s not just a question of bit depth, but of what happens to values outside the range you mapped to 0-max in the histogram. When saving to an unsigned integer type (a 16-bit int TIFF, for instance), those values get clipped. A floating-point format of sufficient bit depth can accurately represent values on a logarithmic scale, including values beyond that mapped maximum, so if the software allows it (as RawTherapee does, and I assume darktable does too, though I don’t use it), you can tell it not to clip values outside your final histogram.
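A quick way to see the difference is Python's struct module, whose "e" format is IEEE 754 half precision, which is, as far as I know, the type used by 16-bit float TIFFs. The sketch below round-trips a few linear values through a half float and through a clipped 16-bit integer; the helper names are mine. The integer has a fixed absolute step of 1/65535 and clips everything above 1.0, while the half float keeps a roughly constant relative precision (about 0.05%) and stores values above 1.0.

```python
import struct


def via_half_float(x: float) -> float:
    """Round-trip a value through IEEE 754 binary16 (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def via_uint16(x: float) -> float:
    """Round-trip through an unsigned 16-bit integer mapped to 0.0..1.0, with clipping."""
    code = min(max(round(x * 65535), 0), 65535)
    return code / 65535


for v in (0.001, 0.18, 1.0, 1.8, 6.0):
    h, i = via_half_float(v), via_uint16(v)
    print(f"{v:>6}: half {h:.6f} (rel. err {abs(h - v) / v:.2e}), uint16 {i:.6f}")
```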

Select the text with your mouse, and a “Quote” button pops up near your cursor. Whack it.

Claes · May 22, 2019, 5:18pm

Wow! It is even possible to expand the post by clicking Morgan.

Jossie · May 22, 2019, 5:43pm

But then the clipping is introduced by the image processing. This is what I meant.

If I set the black and white point in the histogram at those places, where the histogram starts / ends, no (noticeable) clipping will occur.

Hermann-Josef

ggbutcher · May 22, 2019, 8:16pm

Data right out of the sensor is at the ADC resolution, say, unsigned 14-bit, which corresponds to a range between 0=black and 16383=maximum saturation (some channels may saturate before that, camera specific). Since computers don’t have 14-bit integer types, this data is usually delivered in unsigned 16-bit integers.

At some point, that 14-bit data has to be scaled so the camera data’s max saturation roughly aligns with the container's maximum, but in the meantime the 16-bit container provides about 2 stops of headroom for processing to eat into. That is, if your software uses 16-bit integers as its internal image format. Me, I use floating point in rawproc, where the convention is 0.0 to 1.0, but I scale my raw data to that range in the same proportion as it exists in the integer container, so my max camera value is about 0.25 in floating point.
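These numbers are easy to check: the headroom in stops is log2 of the container maximum over the ADC maximum, and mapping the 16-bit container to 0.0 to 1.0 puts the camera's maximum at ADC maximum divided by 65535. A small sketch (the function names are hypothetical):

```python
import math


def headroom_stops(adc_bits: int, container_bits: int = 16) -> float:
    """Stops of headroom between the ADC's maximum and the integer container's maximum."""
    return math.log2((2 ** container_bits - 1) / (2 ** adc_bits - 1))


def float_white(adc_bits: int, container_bits: int = 16) -> float:
    """Where the camera's maximum lands if the container range is mapped to 0.0..1.0."""
    return (2 ** adc_bits - 1) / (2 ** container_bits - 1)


for bits in (12, 14):
    print(f"{bits}-bit raw: {headroom_stops(bits):.3f} stops headroom, "
          f"max at {float_white(bits):.4f}")
```

For a 14-bit raw that gives about 2 stops and a float maximum of about 0.25; a 12-bit raw would leave about 4 stops.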

Whatever the case, the things done in raw processing can drive the data past its original saturation, and maybe up to and beyond the container max (16-bit: 65535, float: 1.0). Here’s where the floating-point convention works better than integer: in floating point the values can just keep going past 1.0, whereas with 16-bit integers they have to be clamped to 65535, or they could just wrap around to 0, which is erroneous and bad-looking to boot. Anyway, to the point of the thread, the data’s notion of white eventually has to be scaled to the display or file format’s convention for white. Floating-point TIFF is an exception: you can store values arbitrarily larger than 1.0 in there, for subsequent use in other programs like GIMP, with maybe the opportunity to drag some or all of them back into display bounds. With integer-based formats there is no such opportunity; that data is lost in the output mangling.
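Clamping versus wrapping in an integer container, compared with a float that is simply allowed to exceed 1.0, in a short sketch. The 1.2 gain and the sample value are arbitrary; the modular wrap is what happens when an out-of-range integer is stored into an unsigned 16-bit type (e.g. `uint16_t` in C). Whether a given program clamps or wraps depends on its code.

```python
def store_clamped(v: int) -> int:
    """Clamp to the unsigned 16-bit range 0..65535."""
    return min(max(v, 0), 65535)


def store_wrapped(v: int) -> int:
    """Modular wrap-around, as when an out-of-range int is stored in a 16-bit unsigned type."""
    return v % 65536


highlight = 60000                  # a bright but unclipped 16-bit value
gain = 1.2                         # some processing step that brightens the image
pushed = round(highlight * gain)   # beyond the container max

print("clamped:", store_clamped(pushed))      # pinned to white, detail lost
print("wrapped:", store_wrapped(pushed))      # near black: visibly wrong
print("float:  ", highlight / 65535 * gain)   # just above 1.0, still recoverable
```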

For batch proofing, the only “curve” I use is a simple linear black/white point setting at the histogram limits: no base curve, no filmic, no gamma, and a lot of times that scaling works just fine for the original exposure. I don’t do many portraits, however…
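Setting the black and white points at the histogram limits, with nothing else, is a plain linear stretch. A minimal sketch on a flat list of float values; the function name and the sample values are made up:

```python
def stretch_to_histogram_limits(values):
    """Set the black point at the data minimum and the white point at the maximum (linear only)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("flat image: black and white points coincide")
    return [(v - lo) / (hi - lo) for v in values]


raw = [0.125, 0.25, 0.625]  # e.g. a dim raw frame in floating point
print(stretch_to_histogram_limits(raw))  # [0.0, 0.25, 1.0]
```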

Edit: “DAC” to “ADC” - wrong device…
Edit: There appears to be about 2 stops of headroom for a 14-bit raw in a 16-bit container…

afre · May 22, 2019, 8:45pm

Any raw processing will inevitably incur loss. The question is whether that loss is acceptable. Making any decision post-raw requires the weighing of the pros and cons. You would need to consider loss in raw processing, image compression and file compression.

Jossie · May 23, 2019, 7:48am

I think I do understand the mathematics behind the various data representations. I also see that any correction like flatfielding or ICC profiling can increase the numbers provided by the ADC. What I do not understand is why this should lead to any data loss in the final result. If the data are processed in, say, 32-bit floating point and at the end scaled to the available number space, say 0 … 65535 for 16-bit unsigned integers, then there will be no loss. Furthermore, since the eye can only distinguish roughly 256 grey levels, I do not see any issue with the number space provided by 16-bit unsigned integers.

See above. I cannot understand that statement, since the ADCs in cameras usually provide fewer than 16 bits/channel.

Hermann-Josef

Morgan_Hardwood · May 23, 2019, 10:00am

No, raw values are typically 12-bit or 14-bit, as @ggbutcher nicely explained (how useful the last bit or two is, is another story).

It depends on your idea of “preserve this range”. In typical use, the range is easily exceeded: even if you just apply an input profile and a typical s-shaped tone curve, you’re already significantly boosting values and likely pushing parts of the image past the right end of the histogram.

It’s easy to see with sample images. Here’s a sunset in RawTherapee:

Save this to a 16-bit integer TIFF file, open in GIMP, and you see this:

Now reduce exposure by 1EV in GIMP, and you get this mess:

Save from RawTherapee to a 16-bit float instead, and reducing exposure by 1EV in GIMP leads to this fine result:
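The GIMP experiment can be mimicked numerically. This sketch assumes linear values after raw processing (anything above 1.0 lies past the right end of the histogram), treats the 16-bit float TIFF as IEEE half precision via struct's "e" format, and models "-1 EV" as a plain multiplication by 0.5. The helper names and sample values are made up.

```python
import struct


def export_uint16(v: float) -> int:
    """16-bit integer export: everything outside 0.0..1.0 is clipped."""
    return min(max(round(v * 65535), 0), 65535)


def export_half(v: float) -> float:
    """16-bit float export (IEEE half precision): values above 1.0 survive."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def minus_one_ev(v: float) -> float:
    """Reduce exposure by one stop in linear light."""
    return v * 0.5


scene = [0.4, 0.9, 1.2, 1.6]  # the last two are "brighter than white"

from_int = [minus_one_ev(export_uint16(v) / 65535) for v in scene]
from_float = [minus_one_ev(export_half(v)) for v in scene]

print("16-bit int  :", [round(x, 3) for x in from_int])
print("16-bit float:", [round(x, 3) for x in from_float])
```

From the integer file, both highlights come back as the same flat 0.5 grey; from the float file they stay distinct at about 0.6 and 0.8.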

Jossie · May 23, 2019, 11:44am

@Morgan_Hardwood Thanks a lot for the examples.

So if the detector only delivers 12 or 14 bits/channel, it is even less obvious why you would get losses!

Of course, the range need not be preserved by operations like tone curves and others. But as I said, if one does all the image processing in 32-bit floating-point arithmetic and, when done, scales everything back to the range covered by 16 bits, there cannot be any losses. If there are, something was done wrong, in my opinion. Or do we have different definitions of “losses”? For me, unsaturated data are lost if you drive them into numerical saturation at either the lower or the upper end.
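That procedure can be sketched like this, assuming non-negative float data with a black level of 0; the function name and the sample values (including a bright light source at 50.0) are made up. Scaling by the peak indeed avoids any clipping and keeps ratios to within rounding. The catch, relevant to the quantization and high-dynamic-range points in this thread, is that the faintest values can fall below one 16-bit step.

```python
def rescale_to_uint16(values):
    """Linearly scale so the maximum maps to 65535, then round to integers (no clipping)."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("need at least one positive value")
    return [round(v / peak * 65535) for v in values]


scene = [0.0001, 0.5, 1.3, 50.0]  # deep shadow, mid-tone, highlight, light source
codes = rescale_to_uint16(scene)
print(codes)
print("ratio 1.3/0.5 after scaling:", codes[2] / codes[1])
```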

If it were the case that one cannot avoid losses during the reduction of CCD data (there is no difference to CMOS in this case), it would not be possible to use CCDs for quantitative work in astronomy.

Hermann-Josef

snibgo · May 23, 2019, 2:02pm

One “data loss” process occurs because of quantization, reducing the results of a computation to a certain number of bits. This always occurs, whether the result is 8-bit integer or 32-bit float (which has 24 bits of precision, IIRC). More bits in the result reduces the effect.

Quantization means that two values that were different in the input may become the same value in the output. This is a form of data loss, aka “information loss”.
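A few lines make this concrete. The sketch uses struct's "f" format (IEEE single precision, 24-bit significand) to show two different inputs collapsing to the same float32, then shows the same effect, much more coarsely, when quantizing to 8 bits, while 16 bits still tells the two values apart. The helper names and sample values are mine.

```python
import struct


def to_float32(x: float) -> float:
    """Round-trip through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_int_levels(x: float, bits: int) -> int:
    """Quantize a 0.0..1.0 value to an unsigned integer with the given bit depth."""
    top = 2 ** bits - 1
    return min(max(round(x * top), 0), top)


# float32 resolves steps of 2**-23 near 1.0, but not 2**-30.
print(to_float32(1.0 + 2 ** -23) == 1.0)  # False
print(to_float32(1.0 + 2 ** -30) == 1.0)  # True

a, b = 0.5, 0.501
for bits in (8, 16):
    print(bits, "bit:", to_int_levels(a, bits), to_int_levels(b, bits))
```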

Jossie · May 23, 2019, 2:37pm

Yes, this might also be called a rounding error. But with 65536 values available and only about 256 discernible, this is only a theoretical issue.

As far as I can see, this could only happen if the tone curve rises above 1 somewhere, which I have never seen. It will not happen with a standard s-shaped curve, which brightens dark pixels and darkens bright pixels or the other way around, unless I misunderstand tone curves.

Hermann-Josef

ggbutcher · May 23, 2019, 6:41pm

I’m not at the computer where I can dig up an example, but some of the demosaic routines in librtprocess will push certain channels/pixels well past 0.0 on the low end or 1.0 on the high end, as high as 6.0 in one case.

Since I separated out my individual raw processing options in rawproc-development (0.9, to be released Real Soon Now…), I’ve seen how camera sensor saturation can bollox up the data to require very aggressive clipping for display. When the pixels pile up at sensor saturation, not only do they lose definition but in application of white balance they separate, and the clipping required to make the blown highlights “not magenta” is to clip to the lowest pile…
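A simplified model of the "not magenta" problem, with hypothetical white-balance multipliers and the helper names being my own: when all three raw channels sit at saturation, white balance spreads them apart. Scaling so the brightest channel fits white leaves green lowest, giving a magenta cast; clipping every channel at the lowest white-balanced saturation level ("the lowest pile") makes the blown area neutral, at the cost of discarding red and blue values above that level. Real raw processors may also apply highlight reconstruction, which this sketch ignores.

```python
def white_balance(rgb, mults):
    """Multiply each raw channel by its white-balance coefficient."""
    return tuple(c * m for c, m in zip(rgb, mults))


def fit_to_brightest(rgb):
    """Scale so the brightest channel becomes 1.0 (keeps channel ratios)."""
    peak = max(rgb)
    return tuple(c / peak for c in rgb)


def clip_to_lowest_pile(rgb, mults, raw_white=1.0):
    """Clip every channel at the lowest white-balanced saturation level."""
    limit = min(raw_white * m for m in mults)
    return tuple(min(c, limit) for c in rgb)


mults = (2.0, 1.0, 1.5)   # hypothetical R, G, B multipliers
blown = (1.0, 1.0, 1.0)   # all three channels at raw saturation
wb = white_balance(blown, mults)

print("after WB:        ", wb)                              # (2.0, 1.0, 1.5)
print("fit to brightest:", fit_to_brightest(wb))            # (1.0, 0.5, 0.75): magenta cast
print("clip lowest pile:", clip_to_lowest_pile(wb, mults))  # (1.0, 1.0, 1.0): neutral
```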

So, for now I’ve moved my chase for an effective ETTR strategy to one of ETPHAAC: Expose To Preserve Highlights At Any Cost. Accordingly, my new camera has a special matrix metering mode that, instead of averaging to middle gray, “shifts left” to keep resolvable highlights. I'm still trying to tease out the specifics of its behavior, but it clearly works on the JPEG-origin histogram. It's not ETTR, but it does prevent sensor saturation.

Back to @Jossie’s inquiry, I think there are two places where clipping is to be considered: 1) exposure relative to sensor saturation, and 2) truncation to whatever display white after munging the data in post-processing. For #2, one can surely re-scale their data to fit the bounds just prior to saving or displaying, but for high-dynamic-range scenes, that scaling has to be done with care to keep some semblance of the original scene’s tone and color. My software test image has an in-scene light source; I haven’t found a curve yet that will prevent me having to clip out the brightest parts of the light source. And I think that’s okay; danged light was pretty bright, to my recollection…
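To show what "with care" can mean, here is a sketch comparing a plain linear fit (light source pegged at white) with the simple global Reinhard operator L / (1 + L). Reinhard is my swapped-in example of a highlight-compressing curve, not what rawproc uses, and the scene values are made up. The linear fit crushes the shadow to a tiny value; Reinhard keeps the shadow visible and never quite reaches 1.0, but it does so by squeezing the light source into a very narrow range near white.

```python
def linear_fit(values):
    """Scale so the brightest value (e.g. a light source) lands exactly on white."""
    peak = max(values)
    return [v / peak for v in values]


def reinhard(values):
    """Global Reinhard-style curve L / (1 + L): approaches but never reaches 1.0."""
    return [v / (1.0 + v) for v in values]


scene = [0.01, 0.18, 1.0, 50.0]  # shadow, mid-grey, diffuse white, light source (linear)
print("linear  :", [round(v, 4) for v in linear_fit(scene)])
print("reinhard:", [round(v, 4) for v in reinhard(scene)])
```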

Jossie · May 23, 2019, 7:13pm

To my understanding, all that counts with regard to tone and colour are the relative levels in an image, which are preserved by the scaling if it is done correctly. As an example, such a scaling occurs when, as a final step, you go from a 16-bit TIF to an 8-bit JPG: you scale from 0…65535 to 0…255.
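That final step as a minimal sketch, assuming a purely linear mapping (real JPEG export also applies the output profile's tone curve first, and some tools dither); the function name is mine. Relative levels are kept, but it is a many-to-one mapping, with about 257 input codes per output code, which is exactly the quantization discussed above.

```python
def to_8bit(v16: int) -> int:
    """Linearly map a 16-bit code (0..65535) to 8 bits (0..255), rounding to nearest."""
    return round(v16 * 255 / 65535)


print(to_8bit(0), to_8bit(257), to_8bit(32768), to_8bit(65535))  # 0 1 128 255

# Every one of the 65536 input codes lands on one of only 256 output codes.
levels = {to_8bit(v) for v in range(65536)}
print(len(levels))  # 256
```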

Hermann-Josef

Claes · May 23, 2019, 7:17pm

Didn’t Elle write a whole lot about this a few months ago?
Search this forum for elle float clip

Have fun!
Claes in Lund, Sweden

ggbutcher · May 23, 2019, 7:28pm

I think your success in doing so will depend on the dynamic range of the scene versus the dynamic range of your camera. You’ve prompted an experiment in that regard: I have in my possession a Nikon D50, D7000, and Z6, with progressively wider DR. I’ll find a high-DR scene and take the same picture with each camera, ETTR. Then I’ll post-process to 1) make three images all pegged to the max data at display white, and 2) make three images all tone-curved for similar shadow pull-up. Since I procured each successive camera for its improvement in this specific regard, I think I know where this will go, but I’ve never really done a comparative assessment. Might take a few days, done somewhere interleaved with major yard work…