Adding 2 entirely new pixels between each adjacent pixel pair in RAW to TIFF conversion?

In a Nikon camera raw file, each Red/Green/Blue/Jade (2nd green) photosite records only 1 of the 4 colors as a 16 bit int. The 2 missing colors needed to make a 48 bit TIFF pixel are interpolated from the neighboring values with methods such as AMaZE, LMMSE or IGV.
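For anyone following along, here is a minimal sketch of where those interpolated values come from, assuming an RGGB layout. Real demosaicers like AMaZE, LMMSE and IGV are edge-aware and far more sophisticated than this bilinear toy; this only shows the mechanics of filling in a missing channel.

```c
#include <stdint.h>

/* Toy sketch: bilinear recovery of the missing green value at a red
 * photosite, assuming an RGGB Bayer layout (red at even x, even y).
 * Real demosaicers (AMaZE, LMMSE, IGV) are edge-directed and much
 * smarter; this only shows where the "missing colors" come from. */
static uint16_t green_at_red(const uint16_t *bayer, int w, int x, int y)
{
    /* In RGGB, a red site at (x, y) has green neighbors above, below,
     * left and right. Sum in 32 bits to avoid overflow, then average. */
    uint32_t sum = (uint32_t)bayer[(y - 1) * w + x]
                 + (uint32_t)bayer[(y + 1) * w + x]
                 + (uint32_t)bayer[y * w + (x - 1)]
                 + (uint32_t)bayer[y * w + (x + 1)];
    return (uint16_t)(sum / 4);
}
```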

If 2 of the 3 channel values can be conjured up at a given XY position, it should be possible to make up all 3 channels at some nearby point using similar interpolation.

I would like to introduce 2 brand new points between every pair of adjacent points. This would nearly triple both the X and the Y resolutions resulting in almost 9 times as many pixels.
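The "almost 9 times" arithmetic, spelled out as a sketch (using the 7424 x 4924 sensor size quoted later in the thread):

```c
/* Inserting 2 new points between every adjacent pair of samples turns
 * N samples per axis into N + 2*(N-1) = 3N - 2, i.e. just under 3x per
 * axis and just under 9x in total pixel count. For a 7424 x 4924 sensor
 * that is 22270 x 14770. */
static long tripled(long n)
{
    return 3 * n - 2;
}
```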

“You would be better off using Bi-Cubic interpolation in Photoshop/ImageMagick/Gimp/… to make your picture seem larger”.
Would it not be better to use the rawest raw data once rather than to reprocess the TIFF, which has already been demosaiced and rounded to uint16 quanta? The calculations could be performed in float or even double just once and then rounded to uint16, saving a second round of processing and another round-off error. It would not look like a Xerox of a Xerox.

“Your picture won’t be any sharper. There is only so much sensor data to work with. You can’t cheat physics!”
With just 1 image, that is correct. But with 2 or more images from a burst taken by hand or with a Monopod, you now have 9 times the number of points at which to align them. This should greatly help the registration alignment process.

Is there already a tool to do this?

It has one of the three colors as a 14 bit int.

Use a PixelShift Camera. Only 4 shots are needed to avoid demosaic…

It has one of the three colors as a 14 bit int.

There are 2 greens, each used for Luminosity whereas the Red and Blue are used for Chromaticity. The 2 greens are physically different photoreceptors and have different, completely separate, but usually close values.

There is no such thing as a uint14 in C or on CPUs I am familiar with, so they cram the 14 bits into a uint16. You are right that the A/D converter (on my camera at least) has settings for 12 or 14 bit RAWs.

Do the math for image size:
RAW res = 7424 x 4924, 2 bytes/pixel

m 7424 4924 2 => 73,111,552 B
lsr -s pf-2017.0620-269365.nef => 75,546,112 B

The excess 2,434,560 bytes are the multiple JPG images, the EXIF data and a bunch of Nikon maker data.
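The same size arithmetic as a tiny C sketch. The numbers are from this one file, not a general NEF rule:

```c
/* Size arithmetic: the Bayer payload is 2 bytes per photosite, and the
 * leading excess is everything else (JPG previews, EXIF, Nikon maker
 * data). Helper names here are mine, chosen for illustration. */
static long raw_payload(long xres, long yres)
{
    return 2L * xres * yres;
}

static long raw_offset(long file_size, long xres, long yres)
{
    return file_size - raw_payload(xres, yres);
}
```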

Use a PixelShift Camera

Does that work hand held in the field? Even on a Monopod? No. Frames can be dozens of pixels off from their neighboring frames.

That is only for studio work where the camera is welded in place for stability. Shooting fish in a barrel approach? :slight_smile:

For shots taken in the wild under less than perfect conditions, there is almost always a frame-to-frame alignment problem, not to mention a rotation.

Photoshop can show you which sections they took from each picture to make the mosaic, each on a separate layer. It looks like a drunken woodsman chopped up paper prints with a dull axe. Bizarre stitching algorithm.

And then they fudge the mismatches by feathering the edges to blur the discontinuities.

Worse yet, they use a trapezoidal scrunch to smash the pictures into a very poor alignment which can often be seen to be dozens of pixels worse off than the originals.

I would like to use sub-pixel alignment which is why I need higher resolution.

After aligning/rotating and merging, then resizing in Photoshop from the 9x exaggerated size down to perhaps 2x or even 4x the original size, the composite could be much sharper than any single original frame.

Am I smoking rope?

Please don’t cry in bold…

Really? I would expect 14 bits/pixel.

@Patrick Please don’t get me wrong. I will answer in more detail tomorrow. Just getting tired as it’s quite late here in Germany already…

Why would you whine about somebody accidentally using bold? Are we being Petty or would Irascible be more apropos? Does not add any information. Sorry. :wink:

Not sure how I did it. I will try to not do it again so as not to further antagonize you.

No. 14 bits is not a native storage type.

It is simply a uint16_t array in C language. They are not packed or compressed in any way.

Strangely enough, you can simply extract the rawest possible data at warp speed by figuring out the native NEF resolution:

exiftool -ImageSize pf-2017.0620-269365.nef
Image Size : 7424x4924 << constant for my camera

fopen the file
fseek to (file_size - 2 * $xres * $yres) [2,434,560 bytes into this file]
fread the rest into a uint16[]
dump it to disk. Voila!
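Those steps as a C sketch. It assumes, as observed on these files, that the uncompressed bitmap really is the last chunk of the .nef; that is an empirical hack, not a documented NEF layout, and `extract_bayer` is my name for the helper, not a real library call:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Empirical extraction sketch: assumes the uncompressed 16-bit Bayer
 * bitmap is the LAST chunk of the file, so everything before
 * (file_size - payload) is JPG previews, EXIF and Nikon maker data.
 * Caller frees the returned buffer. Returns NULL on any failure. */
static uint16_t *extract_bayer(const char *path, long xres, long yres)
{
    long payload = 2L * xres * yres;
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    long file_size = ftell(f);
    uint16_t *buf = NULL;
    if (file_size >= payload) {
        fseek(f, file_size - payload, SEEK_SET);  /* skip the JPG/EXIF excess */
        buf = malloc((size_t)payload);
        if (buf && fread(buf, 1, (size_t)payload, f) != (size_t)payload) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    return buf;
}
```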

The fseek number always varies because the 3 JPG files are compressed and vary in size.

The ones I have analyzed all have the binary bitmap as the last chunk of the file!

There is no rawer data to be had.

You should look into how smart phones combine a burst of images. I recall one paper discussing demosaicing a stack of randomly offset images from handheld bursts, which did something like what you are talking about.

If we take four photos with an exact one-pixel shift between each photo, in the x-direction or y-direction or both, we don’t need to demosaic. This is simple.

But if we have some random hand-held shift between photos, so we need a transformation to align them, how do we know the correct transformation? We look for corresponding features across the image set, and this needs an accuracy of one pixel or better. But we don’t yet know the colours of the pixels, so we don’t have that accuracy.

In practice, we might demosaic each image. Then use that data to find the alignment of the original mosaiced images. Distort the mosaiced images so they align, and now we can use the single-channel values from one image to partially complete the missing values in another image.
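A toy sketch of that alignment step on single-channel (mosaiced) data, using a brute-force sum-of-absolute-differences search over integer offsets. Real pipelines align per-patch, at sub-pixel precision, and handle rotation; this only shows that features in the data can be matched without full color:

```c
#include <stdint.h>
#include <stdlib.h>

/* Find the integer (dx, dy) translation that minimizes the sum of
 * absolute differences (SAD) between two single-channel frames, so
 * that a[y][x] ~= b[y + dy][x + dx] over the compared interior.
 * Brute force over a small search window; only a sketch of the idea. */
static void best_shift(const uint16_t *a, const uint16_t *b,
                       int w, int h, int range, int *best_dx, int *best_dy)
{
    long best = -1;
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            long sad = 0;
            /* Compare only the interior so shifted indices stay in bounds. */
            for (int y = range; y < h - range; y++)
                for (int x = range; x < w - range; x++)
                    sad += labs((long)a[y * w + x]
                              - (long)b[(y + dy) * w + (x + dx)]);
            if (best < 0 || sad < best) {
                best = sad;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
}
```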

This seems like a lot of work, and I doubt the result would be significantly better than an ordinary demosaicing.

I think there would be an inherent advantage to doing everything possible with the rawest data, the Bayer data. Every time you hack a float result down to shoehorn it into a uint16, there is an irretrievable data loss.

RGB float with Alpha immediately bloats out to 128 bits which is not a native data size and so would have to be emulated (e-mutilated) in software. 64 bit quanta are computationally perfect for 64 bit machines.

You don’t need the full RGB colors to know the alignment and the rotation. The features in the data will map even in Bayer space.

Imagine printing both frames onto analog, dye sublimation transparencies and then moving one relative to the other to get the best visual alignment. There would be a point of translation and rotation at which the 2 frames would snap into sharpest focus.

Resolution augmentation might be a solution to this rotation problem or at least reduce the problem by a notch.

Instead of cutting each layer out as if with a scissors and layering them, use 16 bit transparency calculated at the pixel to meld them! 64 bit RGBA quanta!
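A sketch of that per-pixel meld for one channel, assuming straight (non-premultiplied) 16-bit alpha:

```c
#include <stdint.h>

/* Blend two 16-bit channel values with a 16-bit alpha instead of a
 * hard cut between layers. alpha = 65535 takes all of `top`,
 * alpha = 0 takes all of `bottom`. Applied per channel, this is the
 * "64 bit RGBA quanta" meld suggested above. */
static uint16_t blend16(uint16_t top, uint16_t bottom, uint16_t alpha)
{
    /* 32-bit intermediates so the products cannot overflow. */
    uint32_t t = (uint32_t)top * alpha;
    uint32_t b = (uint32_t)bottom * (65535u - alpha);
    return (uint16_t)((t + b) / 65535u);
}
```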

It has to work and we have the computational resources to make it happen in a jiffy on 5120 CUDA cores or even on 64 X86 cores!

Fixing a rotational mismatch before merging frames would have to look much better than just translating to minimize the error.

Handheld Multi-frame Super-resolution


Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multi-frame super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves to both increase image resolution and boost signal to noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google’s flagship phone.

No, they’re all used for chroma; else, where would you get green? All three channels contribute to luma, perceptually in different proportions.

What @heckflosse is presenting with PixelShift is probably the closest tool there currently is to your ends.

Edit: Oh, welcome to the forum!

This is exactly what I want!

I need to wear a second layer of tin foil. They seem to have read my mind.

But, we need it on a real sensor. If it can be done on a toy, 12 MPix camera sensor, imagine how it would look on a real 100 MPix sensor with vastly better color depth and dynamic range.

The first part of super-resolution is supering the resolution.

If 2/3 of a pixel can be pulled out of a hat, why not an entire pixel at a nearby point? It would just be more of the same interpolation.

At least 1 in between pixel would be a great feature. 2-3 extra might be worth doing.
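A sketch of what one axis of that would look like, using plain linear interpolation at 1/3 and 2/3 just to show the mechanics; a real implementation would use a higher-order kernel. Computation stays in float and is rounded to uint16 only once, per the earlier point about round-off:

```c
#include <stdint.h>

/* Insert 2 new points between every adjacent pair of samples: N input
 * samples become 3N - 2 output samples. Linear interpolation here is a
 * placeholder for a better kernel; the point is the single float ->
 * uint16 rounding at the end. `out` must hold 3n - 2 elements. */
static long upsample3(const uint16_t *in, long n, uint16_t *out)
{
    long k = 0;
    for (long i = 0; i + 1 < n; i++) {
        float a = in[i], b = in[i + 1];
        out[k++] = in[i];
        out[k++] = (uint16_t)(a + (b - a) / 3.0f + 0.5f);        /* 1/3 point */
        out[k++] = (uint16_t)(a + 2.0f * (b - a) / 3.0f + 0.5f); /* 2/3 point */
    }
    out[k++] = in[n - 1];
    return k;  /* == 3n - 2 */
}
```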

They stated that they fixed the “small offsets”. The rotation is a much harder nut to crack.


My point was that there are 4 electron wells which catch electrons freed by the Photoelectric effect after light passes through a filter. There are not just 3. The red and blue are used primarily for Chromaticity. But you are right, they are all used for both purposes.

I don’t shoot fish in a barrel, I rarely shoot in a studio, and I need a solution for The Wild with nothing more than a Monopod. Of the last 100 people you saw with cameras, how many had tripods? Most probably didn’t. This is for them. On 6th Street in Austin, the Police can confiscate a tripod as a trip hazard.

If you need this feature this badly, start writing the code!


Thank you for your inestimable addition to the knowledge base here.

Could you contribute anything on super-resolution, interpolation, error diminution, or snark attenuation?

I am actually doing the derivation on a Bi-Quartic interpolation within a 36 pixel neighborhood. It’s pretty hairy. More later…

While @paperdigits actually made a valid point, you retort with sarcasm. Hardly seems appropriate. You could have replied with “unfortunately I don’t know how to code”, which would have been fine.

As for your suggestion on a new way of interpolation: I have read possibly two dozen papers on demosaicing raw Bayer data in various ingenious ways, some more exotic than others. My gut tells me that your suggestion is not really feasible, or that you’ve made a wrong assumption somewhere, but I can’t pinpoint an immediate flaw…
I’ll think about it some more.


Looking at the link provided by @Iain (thanks), I underestimated the complexity but also the quality of the results. For example, in that paper, transformations are found for patches, rather than a simple overall transformation. This is complex but does account for subject movement etc, so eliminates ghosting.

The results are very good.

You are more than welcome for the 33 days worth of time over the last four and a half years that I’ve spent here helping build this community.
