Question about Image File Formats

Listen to @Claes. Also read Elle’s articles (check their last-updated dates, since they may be out of date with respect to current GIMP).

It isn’t just about rounding. When you make decisions such as scaling (arithmetic) and value remapping, you are changing the way values relate to one another and where they lie. This irreversibly alters the tone, colours and dynamic range, etc., and introduces abstractions and distortions of all sorts. That is why we have such a thing as scene-referred editing and display, which still sacrifices data integrity but strives to do so in a minimal way. Then we have demosaicing and white balancing, and the preceding steps. Of course, this discussion is way beyond the question of file formats. :stuck_out_tongue:


Of course, this discussion is way beyond the question of file formats. :stuck_out_tongue:

You are right, let’s bring it back to the main topic. As an alternative to TIFF, you could use EXR as a file format. It can store 32-bit float data, and at least darktable and GIMP support it.
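If it helps, here is a minimal sketch of writing a 32-bit float EXR from Python. I’m assuming an OpenCV build with OpenEXR support; the file name and data are just placeholders:

```python
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # recent OpenCV builds gate EXR I/O behind this flag

import cv2
import numpy as np

# Placeholder linear float data standing in for an edited image.
img = np.random.rand(256, 256, 3).astype(np.float32)

# EXR keeps the full float range, including values outside 0..1.
cv2.imwrite("intermediate.exr", img)
```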

No, it’s a very practical issue when talking about a file format for preserving data, especially when data from that file will be further manipulated. Remember that human vision is more sensitive to changes in low light than in bright light, so more bits need to be assigned to dark values than to light values; but when you want to manipulate the data further, what is now bright may become dark, so you need precision all along the dynamic range. If you want to capture and store, with minimal loss, a scene that contains both dark things and extremely bright things, that’s a problem on a linear 16-bit scale. That’s why log scales and floating-point formats such as OpenEXR (or flavors of TIFF) exist.
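To put rough numbers on that (an illustrative sketch, not tied to any particular camera): compare the relative quantization error of a linear 16-bit integer encoding with half-float, which OpenEXR uses, at a shadow value and at a highlight value.

```python
import numpy as np

# Relative quantization error of linear 16-bit integer vs. half-float
# at a deep-shadow value and a near-white value (0..1 scale).
for v in (0.001, 0.9):
    q16 = round(v * 65535) / 65535  # linear 16-bit encode/decode
    f16 = float(np.float16(v))      # half-float, as stored by OpenEXR
    print(f"v={v}: int16 rel. err {abs(q16 - v) / v:.1e}, "
          f"float16 rel. err {abs(f16 - v) / v:.1e}")
```

The integer encoding has a fixed absolute step, so its relative error blows up in the shadows; the float encoding keeps the relative error roughly constant across the whole range.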

You’re increasing values whenever the tone curve goes above the linear diagonal, which every practical tone curve does.

This conversation is about using an intermediate format that works for the human, and not the human working for the format.

Here you are talking about the Weber-Fechner law. The use of gamma-corrected data is one aspect of this (see Poynton 2012, page 316).

Sure, but you were talking about pushing values beyond the white point. This only happens if the tone curve is above 1.

I would say that the claimed “data losses” are part of the topic.

How could this happen? Demosaicing in its first instance is an interpolation, and as such will not push data values far out – except if you include hot/cold pixels in the interpolation process, which should not happen if they are mapped out during dark subtraction.

Yes, you are absolutely correct. But this is what we call image processing. It does not necessarily lead to a loss of information; that is all I wanted to point out. Whether an operation is reversible has to be discussed case by case. One operation that is definitely not exactly reversible is gamma correction, because at the bright end it gives the same output values for different input values. The question I see remaining is what one understands by “data integrity”. E.g. demosaicing should not throw pixel values far outside the input data range.
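To make the gamma point concrete, here is a small sketch (the gamma value and input range are just illustrative): once gamma-encoded values are quantized to 8 bits, many distinct bright inputs collapse onto the same code value, so the step cannot be undone exactly.

```python
import numpy as np

gamma = 1 / 2.2                     # illustrative encoding gamma
lin = np.linspace(0.95, 1.0, 50)    # 50 closely spaced bright values
enc = np.round(lin ** gamma * 255)  # gamma-encode, quantize to 8 bits

# The encoding curve is shallow near white, so many inputs share one code.
print(len(lin), "inputs ->", len(np.unique(enc)), "distinct 8-bit codes")
```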

Hermann-Josef

I find this discussion here very interesting and have learned a few new things :grinning: .

But I also have the feeling that we are getting lost in the details. The maximum information a camera gives us is contained in the RAW file. This should always be archived so that, if necessary, one can go back to the complete information it contains. Already with demosaicing, a considerable part of the original information is lost. Therefore, in my opinion, image processing should at best be done with a RAW editor and a non-destructive workflow, as far as possible without changing the file format. Only when it is absolutely necessary to change the format, e.g. to continue working in another image editor, should a format change be done. Also, one should always work directly from the original data to the desired result. If one later wants to follow another line, one should, in my opinion, start again with the original RAW data. An intermediate storage format as a basis for different edits doesn’t make a lot of sense in my view.

Ever stitched a pano? A lot of work and different edits on the cooked result…

Just my 2¢,
Flössie

Yes, sure. But this is more the exception than the rule.

That’s why I only do minimal and basic adjustments in darktable and export 16-bit TIFF files for stitching.


Why should that be? Which considerable information is lost? Of course you change the numbers (by flatfielding, dark subtraction, vignetting correction, …). After demosaicing you have a factor of 3 more pixel values (if you use a Bayer matrix). So in some sense you gain information :slight_smile: and most likely you have changed the original R, G, B values by interpolation! I would not call this a loss of information. The issue is whether information available in the raw file is destroyed, e.g. by (numerically) saturating a pixel which was not saturated before. I do not see that this must happen.

Hermann-Josef

There are different algorithms around for demosaicing, and new ones might get developed in the future. Depending on the image at hand and on your goals in processing this particular image, some algorithms do better than others. But once applied, the demosaicing cannot be reversed. There is even the possibility to generate a higher-resolution black & white image from the undemosaiced data. Starting from a standard image file, this is no longer possible.

@Thomas_Do
All true. But demosaicing itself loses no information; on the contrary, information is added. The raw file by itself is useless, but in the processed file you can read off colours and brightness values.

Demosaicing is not the best example. Let me take flatfielding instead to demonstrate what I understand by losing information.


These graphs show pixel values along a line in the image. Top left is the raw, bottom left the flatfield.

In the raw image, the left peak is lower than the right peak. In the flatfielded image, the reverse is the case. The latter is the true information. So the raw image by itself does not contain the correct information.

If you do the processing in 8-bit arithmetic, you get the result at lower right, which has clipped pixels to the left. However, if you process the image in e.g. 32-bit floating point and then scale the result to the range 0 … 255, you recover the correct relative intensity distribution in 8-bit space and no data are lost:
[image: the flatfielded result computed in 32-bit float and rescaled to 0 … 255, with the correct relative peak heights]

In principle, one can replace “flatfielding” by any other image processing step. This was the only point I wanted to make.
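A tiny numeric sketch of that point, with made-up values in place of the real image data:

```python
import numpy as np

raw  = np.array([180.0, 220.0])  # left and right peak values in the raw
flat = np.array([0.5, 1.0])      # flatfield gains at those positions

# 8-bit pipeline: divide, then clip to 0..255 -> the left peak saturates.
eight_bit = np.clip(raw / flat, 0, 255).astype(np.uint8)

# Float pipeline: divide in 32-bit float, rescale to 0..255 at the end.
f32 = (raw / flat).astype(np.float32)
rescaled = (f32 / f32.max() * 255).astype(np.uint8)

print(eight_bit)  # [255 220] -> the true ratio of the peaks is destroyed
print(rescaled)   # [255 155] -> correct relative intensities preserved
```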

Hermann-Josef

Dunno. I recently incorporated the librtprocess library into rawproc because it has a rich suite of demosaic routines. In my tests of the new tool that exposes them, I was surprised to find that one of the algorithms (I think it was ‘amaze’; I still haven’t had the opportunity to confirm) resulted in min and max values significantly below and very significantly above the 0.0-1.0 limits I use for eventual scaling to display black/white. The test image had sensor saturation clipping, so that might have something to do with it. My knowledge of demosaic math extends to an implementation of ‘half’, which isn’t real math but simply drag-and-drop of adjacent values. @heckflosse, the smartest demosaic person I know, may have insight…

All this talk of clipping is why I still revert to a manually-scooched spline curve for my scaling if a simple black/white point scaling doesn’t look good enough. I know that if I leave the 0,0 and 255,255* points alone, my spline will converge on those two points and not drag values out of my target display range.

*My curve tool uses a 0-255 range that is scaled to 0.0-1.0 for application to the internal image
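For what it’s worth, here’s a sketch of why the pinned endpoints keep things in range, using SciPy’s PCHIP interpolator as a stand-in (my actual curve code is implemented differently): a monotone spline through (0,0) and (255,255) can’t leave the 0..255 range.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical control points; PCHIP through increasing points stays monotone.
xs = [0, 64, 192, 255]
ys = [0, 90, 230, 255]
curve = PchipInterpolator(xs, ys)

x = np.linspace(0, 255, 1001)
y = curve(x)
print(y.min(), y.max())  # 0.0 255.0 -> nothing dragged out of range
```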

Actually, it’s more appropriate to consider it like this: measured information is supplemented with asserted information. The essential semantics of ‘demosaic’ don’t imply information loss, but the by-product of a particular algorithm might cause it…

Having the data from a raw file and knowing the algorithms, one can calculate the output image. However, having the output image and knowing the algorithms does not allow one to retrieve the original raw data.

And yes, I know that the raw data is supplemented with additional information to produce a better representation of “reality”. But in practice, demosaicing means reducing some of the (unwanted) spatial information. And if someone developed a “better” algorithm for demosaicing, it could only be applied to the full information of the raw file and not to an already processed image.


I still do not see why this should be so.

To my understanding, the principal task of demosaicing a Bayer matrix is to take the G pixels and interpolate them at the R and B positions, take the R pixels and interpolate them at the G and B positions, and finally take the B pixels and interpolate them at the G and R positions. So one ends up with three greyscale images for R, G, and B. Of course, interpolation also means smoothing, if this is what you mean by “lost spatial information”. What does “unwanted” mean? Various algorithms could be used to deal with the interpolation (see e.g. Burger & Burge 2016, chapter 22). Without knowing the details, all other (helpful) features of demosaicing should be second-order effects. Perhaps @Heckflosse could comment on whether this is correct.
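For concreteness, here is a minimal bilinear version of that interpolation for an RGGB mosaic (illustrative only; real demosaicers such as AMaZE are far more sophisticated):

```python
import numpy as np
from scipy.ndimage import convolve

def bilinear_demosaic(mosaic):
    """mosaic: 2-D float array with an RGGB Bayer pattern."""
    h, w = mosaic.shape
    r, g, b = np.zeros((3, h, w))
    r[0::2, 0::2] = mosaic[0::2, 0::2]  # scatter each channel's samples
    g[0::2, 1::2] = mosaic[0::2, 1::2]
    g[1::2, 0::2] = mosaic[1::2, 0::2]
    b[1::2, 1::2] = mosaic[1::2, 1::2]
    # Classic bilinear kernels: average the nearest samples of each channel.
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    return np.dstack([convolve(r, k_rb), convolve(g, k_g), convolve(b, k_rb)])
```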

Now I agree that this is really off-topic here.

Hermann-Josef

<OT>

Ah, but there’s a bit more going on between the original raw data and the output image besides demosaic. In a somewhat arbitrary order:

  • Black subtraction: some cameras require this to put black at 0. Inverse: addition.
  • White Balance: For all the talk about temperature/tint, this operation boils down to a per-channel multiplication. Inverse: multiply by 1/original_multiplier.
  • Demosaic: Before - each pixel location contains a measurement of one of the three primaries. After - each pixel location contains that measurement plus assertions of the other two primaries. Inverse: remove the two assertions, leaving the original measured primary.
  • Color Conversion: A set of matrix operations to drag the camera colors to a colorspace amenable to regarding on a display or printer (there may be an intermediate operation to convert to a working space). Inverse: backward application of the inverse of the matrices (this should not be considered a formal specification of the math… :smile: )
  • Tone Curve: whatever “lift” of the data is required to please the eye. Inverse: depends on how the lift is done, but some are invertible.
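A minimal sketch of how the plainly invertible steps above round-trip, with made-up numbers (this is not rawproc’s actual code):

```python
import numpy as np

cam = np.random.default_rng(0).random((4, 4, 3)).astype(np.float32)  # stand-in camera RGB

black = 0.002                                     # black level
wb = np.array([2.1, 1.0, 1.4], dtype=np.float32)  # per-channel WB multipliers
m = np.array([[ 1.6, -0.4, -0.2],                 # made-up camera-to-display matrix
              [-0.3,  1.5, -0.2],
              [ 0.0, -0.5,  1.5]], dtype=np.float32)

out = ((cam - black) * wb) @ m.T                  # subtract, scale, matrix

# Undo each step in reverse order: inverse matrix, divide, add back.
back = (out @ np.linalg.inv(m).T) / wb + black
print(np.allclose(cam, back, atol=1e-4))          # True: these steps round-trip
```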

I’m curious regarding what you’re considering to be “reduce some of the (unwanted) spatial information”. I’m learning this stuff too, and I’m finding others’ perspectives to be quite informative…

I don’t completely understand them, but I don’t know of a demosaic algorithm that doesn’t start with the raw data. Are you considering the “pre-demosaic” operations such as black subtraction, white balance, and sometimes denoise? If so, any of those could be done after demosaic, but they work better if done before…

</OT>

I need the raw file and I need to know which of the algorithms caused this. Without that, it’s hard to check.

I must be losing it. While I get some shift below 0 and above saturation, the positive shift isn’t the order of magnitude I recalled.

Opened the raw file, here are the min/max values:

channels:
rmin: 0.000000	rmax: 0.249985
gmin: 0.000000	gmax: 0.249985
bmin: 0.000000	bmax: 0.249985

Now, with amaze demosaic, they go to this:

channels:
rmin: -0.137083	rmax: 0.262437
gmin: -0.121172	gmax: 0.240829
bmin: -0.030073	bmax: 0.259702

Multiply these numbers by 65536 to get the original integer values.

The raw is here: https://glenn.pulpitrock.net/DSG_3111.NEF
(uploading to here stalled…)
License: CC BY-NC-ND

I get overflow and negative values all of the time when I use PhotoFlow’s raw module and leave them intact when I save the image for further processing in G’MIC.

Values move out of the original range for at least three reasons: white balance, demosaicing and colour space conversion. In fact, I could, and have, set them to minimize the number of outliers I get. Naturally, that wouldn’t necessarily help me make a respectable product, unless I am going for something experimental.
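As a tiny illustration of the white balance part (made-up multipliers):

```python
import numpy as np

pixel = np.array([0.98, 0.97, 0.95], dtype=np.float32)  # nearly saturated R,G,B
wb = np.array([2.0, 1.0, 1.5], dtype=np.float32)        # plausible daylight multipliers
print(pixel * wb)  # [1.96  0.97  1.425] -> red and blue now exceed 1.0
```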

Hello @ggbutcher ,
I was referring to Bayer filters. There, the information for the colour channels is sampled at spatially different positions, although this is actually not desired.
As I already wrote, I find this discussion basically very interesting, but a bit too much for this thread. We should not hijack this thread for other topics. I just wanted to make clear that you should definitely archive the original RAW file, because e.g. the chosen method of demosaicing already creates irreversible changes. Therefore, in my opinion, intermediate formats are only useful to a limited extent.

Ah, makes sense.

And I agree that is a very pertinent point to this thread. For my workflow, anything other than the original raw file is either for regarding or an intermediate to another program for further work. rawproc supports this with an “Open Source” menu selection (yes, overloaded name, can’t think of anything better…) which, when you use it to select a JPEG, for instance, will actually open the original raw file and re-apply the operations that it took to make the JPEG. One can then select an operation and change it, add new operations, or delete operations, and then save back to the JPEG or to a new file of any format (’cept raw, don’t mess with that).
