Adding 2 entirely new pixels between each adjacent pixel pair in RAW to TIFF conversion?

Why would you whine about somebody accidentally using bold? Are we being Petty or would Irascible be more apropos? Does not add any information. Sorry. :wink:

Not sure how I did it. I will try to not do it again so as not to further antagonize you.

No. 14 bits is not a native storage type.

It is simply a uint16_t array in the C language. They are not packed or compressed in any way.

Strangely enough, you can simply extract the rawest possible data at warp speed by figuring out the native NEF resolution:

exiftool -ImageSize pf-2017.0620-269365.nef
Image Size : 7424x4924 << constant for my camera

Fopen the file
fseek to (file_size - 2 * $xres * $yres) [2434560 bytes in on this file]
fread the rest into a uint16[]
dump it to disk. Voila!
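
For the record, here is a minimal C sketch of that recipe (my own throwaway code, not part of any tool mentioned here). It assumes, as described above, an uncompressed NEF whose Bayer dump is the last 2 * xres * yres bytes of the file, with the resolution hard-coded from the exiftool output:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main(void)
{
    const long xres = 7424, yres = 4924;       /* from exiftool -ImageSize */
    FILE *fp = fopen("pf-2017.0620-269365.nef", "rb");
    if (!fp) { perror("fopen"); return 1; }

    fseek(fp, 0, SEEK_END);
    long fsize  = ftell(fp);                   /* total file size in bytes          */
    long offset = fsize - 2L * xres * yres;    /* start of the trailing Bayer dump  */
    fseek(fp, offset, SEEK_SET);

    uint16_t *bayer = malloc(sizeof(uint16_t) * xres * yres);
    size_t n = fread(bayer, sizeof(uint16_t), (size_t)(xres * yres), fp);
    printf("offset %ld, read %zu uint16 samples\n", offset, n);

    fclose(fp);
    free(bayer);
    return 0;
}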

The fseek number always varies because the 3 JPG files are compressed and vary in size.

The ones I have analyzed all have the binary bitmap as the last chunk of the file!

There is no rawer data to be had.

You should look into how smart phones combine a burst of images. I recall one paper discussing demosaicing a stack of randomly offset images from handheld bursts which did something like what you are talking about.

If we take four photos with an exact one-pixel shift between each photo, in the x-direction or y-direction or both, we don’t need to demosaic. This is simple.

But if we have some random hand-held shift between photos, so we need a transformation to align them, how do we know the correct transformation? We look for corresponding features across the image set, and this needs an accuracy of one pixel or better. But we don’t yet know the colours of the pixels, so we don’t have that accuracy.

In practice, we might demosaic each image. Then use that data to find the alignment of the original mosaiced images. Distort the mosaiced images so they align, and now we can use the single-channel values from one image to partially complete the missing values in another image.

This seems like a lot of work, and I doubt the result would be significantly better than an ordinary demosaicing.

I think there would be an inherent advantage to doing everything possible with the rawest data, the Bayer data. Every time you hack a float result down to shoehorn it into a uint16, there is an irretrievable data loss.
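
A tiny illustration of that loss, with a made-up value just to show the rounding:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    float v = 12345.678f;                     /* some intermediate float result */
    uint16_t chan = (uint16_t)(v + 0.5f);     /* rounded into a 16-bit channel  */
    printf("%.3f stored as %u; the %.3f difference is gone for good\n",
           v, chan, (float)chan - v);
    return 0;
}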

RGB float with Alpha immediately bloats out to 128 bits which is not a native data size and so would have to be emulated (e-mutilated) in software. 64 bit quanta are computationally perfect for 64 bit machines.

You don’t need the full RGB colors to know the alignment and the rotation. The features in the data will map even in Bayer space.

Imagine printing both frames onto analog, dye sublimation transparencies and then moving one relative to the other to get the best visual alignment. There would be a point of translation and rotation at which the 2 frames would snap into sharpest focus.
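
As a code-level sketch of that "snap into focus" idea: align_bayer below (a hypothetical name, not from any existing tool) does a brute-force search for the translation that minimizes the sum of squared differences between two Bayer frames, stepping by 2 pixels so like-colored sensels are always compared with like-colored sensels. Rotation is deliberately not handled; that is the harder part.

#include <stdint.h>
#include <limits.h>

void align_bayer(const uint16_t *a, const uint16_t *b,
                 int xres, int yres, int range,
                 int *best_dx, int *best_dy)
{
    long long best = LLONG_MAX;
    for (int dy = -range; dy <= range; dy += 2) {        /* even shifts keep CFA phase */
        for (int dx = -range; dx <= range; dx += 2) {
            long long ssd = 0;
            for (int y = range; y < yres - range; y += 4)     /* sparse sampling for speed */
                for (int x = range; x < xres - range; x += 4) {
                    long d = (long)a[y * xres + x]
                           - (long)b[(y + dy) * xres + (x + dx)];
                    ssd += (long long)d * d;
                }
            if (ssd < best) { best = ssd; *best_dx = dx; *best_dy = dy; }
        }
    }
}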

Resolution augmentation might be a solution to this rotation problem or at least reduce the problem by a notch.

Instead of cutting each layer out as if with a scissors and layering them, use 16 bit transparency calculated at the pixel to meld them! 64 bit RGBA quanta!
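
A minimal sketch of what one 64-bit RGBA quantum and a per-pixel 16-bit-alpha meld could look like (rgba16 and blend16 are names made up for illustration):

#include <stdint.h>

/* One 64-bit RGBA quantum: four 16-bit channels. */
typedef struct { uint16_t r, g, b, a; } rgba16;   /* sizeof == 8 bytes */

/* Blend src over dst using src's 16-bit alpha (0..65535). */
static rgba16 blend16(rgba16 dst, rgba16 src)
{
    uint32_t a = src.a, ia = 65535u - a;
    rgba16 out;
    out.r = (uint16_t)((src.r * a + dst.r * ia) / 65535u);
    out.g = (uint16_t)((src.g * a + dst.g * ia) / 65535u);
    out.b = (uint16_t)((src.b * a + dst.b * ia) / 65535u);
    out.a = 65535u;
    return out;
}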

It has to work and we have the computational resources to make it happen in a jiffy on 5120 CUDA cores or even on 64 X86 cores!

Fixing a rotational mismatch before merging frames would have to look much better than just translating to minimize the mismatch.

Handheld Multi-frame Super-resolution

Abstract

Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multi-frame super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves to both increase image resolution and boost signal to noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google’s flagship phone.

No, they’re all used for chroma; else, where would you get green? All three channels contribute to luma, perceptually in different proportions.

What @heckflosse is presenting with PixelShift is probably the closest tool there currently is to your ends.

Edit: Oh, welcome to pixls.us!

This is exactly what I want!

I need to wear a second layer of tin foil. They seem to have read my mind.

But, we need it on a real sensor. If it can be done on a toy, 12 MPix camera sensor, imagine how it would look on a real 100 MPix sensor with vastly better color depth and dynamic range.

The first part of the super-resolution is supering the resolution.

If 2/3 of a pixel can be pulled out of a hat, why not an entire pixel at a nearby point? It would just be more of the same interpolation.

At least 1 in-between pixel would be a great feature. 2-3 extra might be worth doing.

They stated that they fixed the “small offsets”. The rotation is a much harder nut to crack.

Glen,

My point was that there are 4 electron wells which catch electrons freed by the Photoelectric effect after light passes through a filter. There are not just 3. The red and blue are used primarily for Chromaticity. But you are right, they are all used for both purposes.

I don’t shoot fish in a barrel, I rarely shoot in a studio, and I need a solution for The Wild with nothing more than a Monopod. Of the last 100 people you saw with cameras, how many had tripods? Most probably didn’t. This is for them. On 6th Street in Austin, the Police can confiscate a tripod as a trip hazard.

If you need this feature this badly, start writing the code!


Thank you for your inestimable addition to the knowledge base here.

Could you contribute anything on super-resolution, interpolation, error diminution, or snark attenuation?

I am actually doing the derivation on a Bi-Quartic interpolation within a 36 pixel neighborhood. It’s pretty hairy. More later…
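
For readers trying to picture that, here is my guess at the setup (the standard formulation, not the derivation promised above): a bi-quartic surface

f(x, y) = \sum_{i=0}^{4} \sum_{j=0}^{4} a_{ij} \, x^i y^j

has 25 unknown coefficients a_{ij}, so fitting it to a 6 x 6 = 36-pixel neighborhood gives an overdetermined system that would be solved by least squares rather than by exact interpolation.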

While @paperdigits actually made a valid point, you retort with sarcasm. Hardly seems appropriate. You could have replied with “unfortunately I don’t know how to code”, which would have been fine.

As for your suggestion on a new way of interpolation: I have read possibly two dozen papers on demosaicing raw Bayer data in various ingenious ways, some more exotic than others. My gut tells me that your suggestion is not really feasible, or that you’ve made a wrong assumption somewhere, but I can’t pinpoint an immediate flaw…
I’ll think about it some more.


Looking at the link provided by @Iain (thanks), I underestimated the complexity but also the quality of the results. For example, in that paper, transformations are found for patches, rather than a simple overall transformation. This is complex but does account for subject movement etc., so it eliminates ghosting.

The results are very good.

You are more than welcome for the 33 days worth of time over the last four and a half years that I’ve spent here helping build this community.


That is certainly not how you get the raw pixel data from any NEF I’m familiar with, at least not from compressed NEFs, which are pretty common. There the data is stored in an encoded, compressed file format, not a simple array of RGBG data. Otherwise code like this would never have needed to be written.

Really? Could you share such a file maybe? Is that straight out of your camera? I now know this is actually possible :slight_smile:

The libraries I mentioned above extract the raw values for each sensor pixel (sensel) and are able to output these raw values as is. Depending on the type of sensor, you get 10, 12, 14 or 16 bit integer values. Raw processing software like RawTherapee, darktable, ART, RawProc, PhotoFlow and others take this data and convert it to floating point immediately. Even before demosaicing. I am not sure how you can get any more raw than that. No information is lost whatsoever, which is what you seem to imply.

There may be a difference in terminology here, but I find your idea hard to follow. A “48 bit TIFF pixel” makes no sense to me whatsoever.
If I get your meaning right, you want to take this idea: (image taken from here)


But instead of having to interpolate a single missing R, G or B pixel in between, you want to supersample the image and have two pixels in between each known pixel?

Check your settings and you may find 12 and 14 bit options as well as uncompressed, lossless compressed and compressed. You may have compression turned on.

Compression greatly reduces battery life and increases write time. And with a 128GB memory card you can shoot through many battery packs. And, you have 2 memory ports. You can’t put in a bigger battery. And, you will find that 7zip with LZMA2 run on a real processor provides more efficient compression and can be done before archiving!

Here is the C Code:
// =====================================================================
// Read NEF's Bayer data (including junk) and stuff into matrix provided
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

// EV_TIME_STR and time_event() come from my timing code (not shown here).
int read_nef_bayer_rgjb(char *fname, int bayer_offset, int xres, int yres,
                        uint16_t **nrbmat, EV_TIME_STR *tsa, int debug)
{
    int fsr = 0;      // fseek() return value, s/b 0
    long sread = 0;   // uint16s read from raw file
    FILE *stream;     // FILE pointer to input file
    int brow;         // Bayer row
    if (tsa) time_event(E_READ_NEF_BAYER_RGJB, tsa, E_TIME_EVENT, 1, debug);

    if ((stream = fopen(fname, "rb")) == NULL) {  // READ BINARY
        printf("RNBM: Cannot open Input file \"%s\"!\n", fname);  exit(-3);  }
    if (bayer_offset && (fsr = fseek(stream, (long) bayer_offset, SEEK_SET))) {
        printf("RNBM: FSeek -> %d, terminating\n\n", fsr);  exit(-12);  }

    for (brow = 0; brow < yres; brow++) {
        // Read 1 row of xres uint16s at a time from the input file.
        // fread() returns the number of elements successfully read (a size_t).
        sread += fread(nrbmat[brow], sizeof(uint16_t), xres, stream);
    }
    fclose(stream);
    printf("RNBM: Read %ld USHORTs from BAYER file %s\n", sread, fname);
    return (int) sread;  // Return number of UINT16s read, NOT bytes!

} // End Read_Nef_Bayer_Rgjb().

The IEEE 754 spec for float (32) includes 24 significand bits so you are correct that it can absorb all 16 bits of a uint16_t with perfect fidelity.

The problem arises after the floating point calcs are done and a uint48_t TIFF pixel (actually a uint16_t[3]) gets written to disk. How do you smash all 7.22 float significant figures into a 16 bit storage variable which can hold no number larger than 65535? You truncate or round, which is irreversible.

This is why working with the Pristine Bayer data has inherent advantages over working with the TIFF data which has already been processed once.

Here is a trivia question for you:
the float spec allows for only 1 sign bit. How do you get a negative sign on the exponent if the only sign bit has already been used for the number itself? Ex: -123E-04 ? There are 2 negative signs! Where is the second one stored?

If you look at the PPM file Dcraw creates, you will find a 4 line header that looks a lot like this:
head -n4 pf-269361.ppm =>
P6
7360
4912
65535
After which you will find 48 bit TIFF pixels encoded as uint16[3] RGB triplets.

Do a size analysis:
lsr -sa pf-269361.ppm => 216913939 < Total disk file size
head -n4 pf-269361.ppm | wc => 4 4 19 < 19 bytes of header
216913939 - 19 = 216913920 < bitmap size
mult 216913920 /7360 /4912 == 6 < Divide bitmap size by XY Res…
Looks like 6 bytes per pixel which === 48 bits!

The Dcraw TIFF file is quite similar except that its bloated header is 1376 bytes
#define DCRAW_TIF_HEADER_SIZE 1376 // Tif overhead above raw, rgb uint16s

And, you can peek into Dave’s TIFF header to get the XY res at these offsets:
#define HDR_GRAY_XRES_IDX 16 // uint_t_ara[15] → GRAY_Image_XRES
#define HDR_GRAY_YRES_IDX 22 // uint_t_ara[21] → GRAY_Image_YRES

Yes! Think of 2 completely new pixels between pixels (0, 0) and (0, 1) at (0, ⅓) and (0, ⅔). Of course you would scale your output matrix by a factor of 3 so the new pixels would appear at (0, 1) and (0, 2) between the old pixels remapped to (0, 0) and (0, 3).

This would provide a 3x3 matrix of pixels where each original pixel was. When aligning multiple frames off by a non-integer number of pixels, you would have 9 points to select from to find the best fit.

If you can conjure up Red and Green on a Blue pixel at (1, 1), how hard would it be to apply a similar process to prestidigitate all 3 values at (1, ⅓) and (1, ⅔)?
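
A minimal 1-D sketch of that remapping, using plain linear weights as a placeholder (upsample_row_x3 is a made-up name; a bi-quartic or other kernel would slot into the same indexing):

#include <stdint.h>

/* Insert two new samples between each adjacent pair along one row, so
 * input pixel k lands at output index 3*k and the two new pixels sit at
 * 3*k+1 and 3*k+2. Output must hold 3*n - 2 samples. */
void upsample_row_x3(const uint16_t *in, uint16_t *out, int n)
{
    for (int k = 0; k < n - 1; k++) {
        uint32_t a = in[k], b = in[k + 1];
        out[3 * k]     = (uint16_t)a;
        out[3 * k + 1] = (uint16_t)((2 * a + b) / 3);   /* at k + 1/3 */
        out[3 * k + 2] = (uint16_t)((a + 2 * b) / 3);   /* at k + 2/3 */
    }
    out[3 * (n - 1)] = in[n - 1];
}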

For a single frame, this entire Quixotic exercise would be no better than an ImageMagick resize. But, when merging many layers via transparency, the data from all of the frames could be added with accurate alignment.

This should average digital errors toward zero and provide super-resolution.
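
A sketch of that merge step under the simplest possible assumption, equal weight per frame (merge_frames is a made-up name; real weights would come from the per-pixel transparency or confidence instead):

#include <stdint.h>

/* Merge N aligned frames by straight per-site averaging into a float
 * accumulator; uncorrelated noise in the mean falls off roughly as
 * 1/sqrt(N). */
void merge_frames(const uint16_t **frames, int nframes,
                  float *accum, long npixels)
{
    for (long i = 0; i < npixels; i++) {
        float sum = 0.0f;
        for (int f = 0; f < nframes; f++)
            sum += (float)frames[f][i];
        accum[i] = sum / (float)nframes;
    }
}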

I have a Nikon D5100. I have compression turned on. I can’t turn it off. Ask Nikon why…

I can’t choose the bit depth, either…

Those settings are only present on higher end Nikon cameras

And exactly how did you while away the other 1611 days? :slight_smile:

I was referring to your “go write it yourself” comment, which does not add any ideas or information.

I do thank all of the generous geniuses who write the open source code and keep these forums open!

There are ways of saving to TIFF that bypass any processing. dcraw -D…

Okay, college-professor mode kicking in here… First, back to your original question:

Well, none that anyone has offered to date in the thread, apparently. So then, a suggestion:

This is really a challenge to your initiative, no snark involved. Reading your posts, you seem to have a fundamental grasp of digital data representation, if a little misguided in places, and a clear vision of your end. So, my assignment to you would be to code up a reference implementation. To get you started, I’ll offer a program I wrote a little while ago that uses the libraw library to open raws, then save them directly to TIFF with no image data modification:

In that code, you’ll have the uint16_t image data straight from the file; adding the code to skive off the channels and maneuver them to your heart’s content should be fairly easy to do with your current level of comprehension. Clone the repository (learning a bit of git, if you haven’t already used it), and code away…
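
For anyone who wants to start from scratch instead, here is a from-memory sketch of the LibRaw C-API calls involved in getting at the unprocessed sensel values. This is not the program offered above, and field names may differ between LibRaw versions, so check the headers:

#include <stdio.h>
#include <stdint.h>
#include <libraw/libraw.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file.nef\n", argv[0]); return 1; }

    libraw_data_t *lr = libraw_init(0);
    if (libraw_open_file(lr, argv[1]) != 0) { fprintf(stderr, "open failed\n");   return 1; }
    if (libraw_unpack(lr) != 0)             { fprintf(stderr, "unpack failed\n"); return 1; }

    uint16_t *bayer = lr->rawdata.raw_image;            /* raw CFA values, one per sensel */
    int w = lr->sizes.raw_width, h = lr->sizes.raw_height;
    printf("%d x %d raw sensels, first value %u\n", w, h, bayer[0]);

    libraw_close(lr);
    return 0;
}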

I’m not a demosaic person (@heckflosse is, by the way), but I’ll be interested to review your results.


I stand corrected. I know about these options, but I always thought the data was still encoded somehow. Now I see that uncompressed really means a dump of raw integers at the end of the file. I learned something.

Yes, but what is your point? Maybe I got confused again about the way you wrote your initial post, but nobody suggested to work with tiffs except you… I fully agree that for any sort of processing the best-case scenario is to work directly from raw (converted to float or double for processing).

imo this thread’s title and first post are not clear that this is actually what you’re looking for…

Pixel-shift and the article @Iain posted are the currently available technical solutions to what you’re asking. Otherwise, I suggest you browse through some academic papers on Google Scholar regarding this topic: Google Scholar