Adding 2 entirely new pixels between each adjacent pixel pair in RAW to TIFF conversion?

I think there would be an inherent advantage to doing everything possible with the rawest data, the Bayer data. Every time you hack a float result down to shoehorn it into a uint16, there is an irretrievable data loss.

RGB float with Alpha immediately bloats out to 128 bits, which is not a native data size and so would have to be emulated (e-mutilated) in software. 64-bit quanta are computationally perfect for 64-bit machines.
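
Just to pin down what I mean by a 64-bit quantum, a throwaway sketch (the type name is only illustrative):

#include <stdint.h>

// One 16-bit-per-channel RGBA pixel: 4 x 16 = 64 bits, one machine word.
typedef struct {
    uint16_t r, g, b, a;
} rgba64_t;

// Compile-time check that the quantum really packs into 8 bytes.
_Static_assert(sizeof(rgba64_t) == 8, "rgba64_t must be exactly 64 bits");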

You don’t need the full RGB colors to know the alignment and the rotation. The features in the data will map even in Bayer space.

Imagine printing both frames onto analog, dye sublimation transparencies and then moving one relative to the other to get the best visual alignment. There would be a point of translation and rotation at which the 2 frames would snap into sharpest focus.

Resolution augmentation might be a solution to this rotation problem or at least reduce the problem by a notch.

Instead of cutting each layer out as if with scissors and layering them, use 16-bit transparency calculated at the pixel to meld them! 64-bit RGBA quanta!
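
A sketch of the per-pixel meld, assuming straight (non-premultiplied) 16-bit alpha where 0 is transparent and 65535 is opaque; the scaling by 65535 and the rounding are the only fiddly parts:

#include <stdint.h>

// Blend one 16-bit channel of the top layer over the bottom layer using
// the top layer's 16-bit alpha. Widening to 64 bits prevents overflow.
static inline uint16_t blend16(uint16_t top, uint16_t bottom, uint16_t alpha)
{
    uint64_t a = alpha;
    // out = (top*a + bottom*(65535 - a)) / 65535, rounded to nearest
    return (uint16_t)((top * a + bottom * (65535u - a) + 32767u) / 65535u);
}

Applied to R, G and B of each 64-bit quantum in turn, with the merged result treated as opaque.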

It has to work and we have the computational resources to make it happen in a jiffy on 5120 CUDA cores or even on 64 X86 cores!

Fixing a rotational mismatch before merging frames would have to look much better than just translating to minimize the error.

Handheld Multi-frame Super-resolution

Abstract

Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multi-frame super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves to both increase image resolution and boost signal to noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google’s flagship phone.

No, they’re all used for chroma; otherwise, where would you get green? All three channels contribute to luma, perceptually in different proportions.

What @heckflosse is presenting with PixelShift is probably the closest tool there currently is to your ends.

Edit: Oh, welcome to pixls.us!

This is exactly what I want!

I need to wear a second layer of tin foil. They seem to have read my mind.

But, we need it on a real sensor. If it can be done on a toy 12 MPix camera sensor, imagine how it would look on a real 100 MPix sensor with vastly better color depth and dynamic range.

The first part of the super-resolution is the supering the resolution.

If 2/3 of a pixel can be pulled out of a hat, why not an entire pixel at a nearby point? It would just be more of the same interpolation.

At least one in-between pixel would be a great feature. Two or three extra might be worth doing.
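
Something like this for the 1-D case, just as a sketch; linear weights here, but a fancier kernel would drop into the same slots:

#include <stdint.h>
#include <stddef.h>

// Expand one row of n samples to 3*n - 2 samples by inserting two
// linearly interpolated values (at 1/3 and 2/3) between each adjacent pair.
// 'out' must have room for 3*n - 2 values.
void upsample_row_x3(const uint16_t *in, size_t n, uint16_t *out)
{
    if (n == 0) return;
    for (size_t i = 0; i + 1 < n; i++) {
        uint32_t a = in[i], b = in[i + 1];
        out[3 * i]     = (uint16_t)a;                       // original sample
        out[3 * i + 1] = (uint16_t)((2 * a + b + 1) / 3);   // value at i + 1/3
        out[3 * i + 2] = (uint16_t)((a + 2 * b + 1) / 3);   // value at i + 2/3
    }
    out[3 * (n - 1)] = in[n - 1];                           // last original sample
}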

They stated that they fixed the “small offsets”. The rotation is a much harder nut to crack.

Glen,

My point was that there are 4 electron wells which catch electrons freed by the Photoelectric effect after light passes through a filter. There are not just 3. The red and blue are used primarily for Chromaticity. But you are right, they are all used for both purposes.

I don’t shoot fish in a barrel; I rarely shoot in a studio and I need a solution for The Wild with nothing more than a Monopod. Of the last 100 people you saw with cameras, how many had tripods? Most probably didn’t. This is for them. On 6th Street in Austin, the Police can confiscate a tripod as a trip hazard.

If you need this feature this badly, start writing the code!


Thank you for your inestimable addition to the knowledge base here.

Could you contribute anything on super-resolution, interpolation, error diminution, or snark attenuation?

I am actually doing the derivation on a Bi-Quartic interpolation within a 36 pixel neighborhood. It’s pretty hairy. More later…

While @paperdigits actually made a valid point, you retort with sarcasm. Hardly seems appropriate. You could have replied with “unfortunately I don’t know how to code”, which would have been fine.

As for your suggestion on a new way of interpolation: I have read possibly two dozen papers on demosaicing raw Bayer data in various ingenious ways, some more exotic than others. My gut tells me that your suggestion is not really feasible, or that you’ve made a wrong assumption somewhere, but I can’t pinpoint an immediate flaw…
I’ll think about it some more.


Looking at the link provided by @Iain (thanks), I underestimated the complexity but also the quality of the results. For example, in that paper, transformations are found for patches, rather than a simple overall transformation. This is complex but does account for subject movement etc, so eliminates ghosting.

The results are very good.

You are more than welcome for the 33 days worth of time over the last four and a half years that I’ve spent here helping build this community.


That is certainly not how you get the raw pixel data from any NEF I’m familiar with; compressed NEFs are pretty common. The data is stored in an encoded, compressed file format, not a simple array of RGBG data. Otherwise, code like this would never have needed to be written.

Really? Could you share such a file maybe? Is that straight out of your camera? I now know this is actually possible :slight_smile:

The libraries I mentioned above extract the raw values for each sensor pixel (sensel) and are able to output these raw values as is. Depending on the type of sensor, you get 10, 12, 14 or 16 bit integer values. Raw processing software like RawTherapee, darktable, ART, RawProc, PhotoFlow and others take this data and convert it to floating point immediately, even before demosaicing. I am not sure how you can get any more raw than that. No information is lost whatsoever, contrary to what you seem to imply.

There may be a difference in terminology here, but I find your idea hard to follow. A “48 bit TIFF pixel” makes no sense to me whatsoever.
If I get your meaning right, you want to take this idea: (image taken from here)


But instead of having to interpolate a single missing R, G or B pixel in between, you want to supersample the image and have two pixels in between each known pixel?

Check your settings and you may find 12 and 14 bit options as well as uncompressed, lossless compressed and compressed. You may have compression turned on.

Compression greatly reduces battery life and increases write time. And with a 128GB memory card you can shoot through many battery packs. And, you have 2 memory ports. You can’t put in a bigger battery. And, you will find that 7zip with LZMA2 run on a real processor provides more efficient compression and can be done before archiving!

Here is the C Code:
// =====================================================================
// Read NEF's Bayer data (including junk) and stuff into matrix provided
int read_nef_bayer_rgjb(char *fname, int bayer_offset, int xres, int yres,
                        uint16_t **nrbmat, EV_TIME_STR *tsa, int debug)
{
    int fsr = 0;        // fseek return value, should be 0
    long sread = 0;     // SHORTs read from raw file
    FILE *stream;       // FILE pointer to input file
    int brow;           // Bayer row index

    if(tsa) time_event(E_READ_NEF_BAYER_RGJB, tsa, E_TIME_EVENT, 1, debug);

    if((stream = fopen(fname, "rb")) == NULL)  {  // READ BINARY
        printf("RNBM: Cannot open Input file \"%s\"!\n", fname);  exit(-3);  }
    if(bayer_offset  &&  (fsr = fseek(stream, (long) bayer_offset, SEEK_SET)))  {
        printf("RNBM: FSeek -> %d, terminating\n\n", fsr);  exit(-12);  }

    for(brow = 0; brow < yres; brow++)  {
        // Read 1 row of xres uint16s at a time; fread returns the number
        // of elements successfully read as a size_t.
        sread += fread(nrbmat[brow], sizeof(uint16_t), xres, stream);
    }
    fclose(stream);
    printf("RNBM: Read %ld USHORTs from BAYER file %s\n", sread, fname);
    return (int) sread;  // Return number of UINT16s read, NOT bytes!

} // End Read_Nef_Bayer_Rgjb().

The IEEE 754 spec for float (32) includes 24 significand bits, so you are correct that it can absorb all 16 bits of a uint16_t with perfect fidelity.

The problem arises after the floating point calcs are done and a uint48_t TIFF pixel (actually a uint16_t[3]) gets written to disk. How do you smash all 7.22 float significant figures into a 16-bit storage variable which can hold no number larger than 65535? You truncate or round, which is irreversible.
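
Just to make that concrete, a throwaway sketch (the multiplier is arbitrary): the uint16 to float trip is exact, the float result back to uint16 is not:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t raw = 51234;                  // any 16-bit value fits in float's
    float f = (float)raw;                  // 24-bit significand exactly

    float result = f * 1.2345f;            // some post-processing result

    // Writing it back as a uint16 forces a truncate/round that cannot be undone.
    uint16_t stored = (uint16_t)(result + 0.5f);
    printf("float result %.4f stored as %u, fraction lost forever\n",
           result, (unsigned)stored);
    return 0;
}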

This is why working with the Pristine Bayer data has inherent advantages over working with the TIFF data which has already been processed once.

Here is a trivia question for you:
The float spec allows for only 1 sign bit. How do you get a negative sign on the exponent if the only sign bit has already been used for the number itself? Ex: -123E-04. There are 2 negative signs! Where is the second one stored?
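
For anyone who wants to peek at the bits themselves, a little sketch; it gives the answer away: the 8-bit exponent field carries no sign bit of its own, it is stored offset by a bias of 127.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float x = -123e-4f;                    // the -123E-04 example above
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);        // reinterpret the 32 bits safely

    uint32_t sign     = bits >> 31;            // 1 bit:  sign of the number
    uint32_t exponent = (bits >> 23) & 0xFFu;  // 8 bits: biased exponent (bias 127)
    uint32_t fraction = bits & 0x7FFFFFu;      // 23 bits: fraction bits of the significand

    printf("sign=%u  stored exponent=%u  true exponent=%d  fraction=0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);
    return 0;
}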

If you look at the PPM file Dcraw creates, you will find a 4 line header that looks a lot like this:
head -n4 pf-269361.ppm =>
P6
7360
4912
65535
After which you will find 48-bit pixels encoded as uint16[3] RGB triplets.

Do a size analysis:
lsr -sa pf-269361.ppm => 216913939 < Total disk file size
head -n4 pf-269361.ppm | wc => 4 4 19 < 19 bytes of header
216913939 - 19 = 216913920 < bitmap size
mult 216913920 /7360 /4912 == 6 < Divide bitmap size by XY Res…
Looks like 6 bytes per pixel which === 48 bits!

The Dcraw TIFF file is quite similar except that its bloated header is 1376 bytes
#define DCRAW_TIF_HEADER_SIZE 1376 // Tif overhead above raw, rgb uint16s

And, you can peek into Dave’s TIFF header to get the XY res at these offsets:
#define HDR_GRAY_XRES_IDX 16 // uint_t_ara[15] → GRAY_Image_XRES
#define HDR_GRAY_YRES_IDX 22 // uint_t_ara[21] → GRAY_Image_YRES

Yes! Think of 2 completely new pixels between pixels (0, 0) and (0, 1) at (0, ⅓) and (0, ⅔). Of course you would scale your output matrix by a factor of 3, so the new pixels would appear at (0, 1) and (0, 2) between the old pixels remapped to (0, 0) and (0, 3).

This would provide a 3x3 matrix of pixels where each original pixel was. When aligning multiple frames off by a non-integer number of pixels, you would have 9 points to select from to find the best fit.

If you can conjure up Red and Green on a Blue pixel at (1, 1), how hard would it be to apply a similar process to prestidigitate all 3 values at (1, ⅓) and (1, ⅔)?
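
A sketch of what those 9 candidate points buy you at alignment time, assuming both frames are already 3x-upscaled single-channel uint16 planes of the same size; the error metric here is just a sum of absolute differences:

#include <stdint.h>
#include <stdlib.h>

// Sum of absolute differences between 'ref' and 'other' (both w x h),
// with 'other' shifted by (dx, dy) on the 3x grid. Only the overlap is
// scored; a real run would normalise by the overlap size.
static uint64_t sad_shifted(const uint16_t *ref, const uint16_t *other,
                            int w, int h, int dx, int dy)
{
    uint64_t sum = 0;
    for (int y = 0; y < h; y++) {
        int oy = y + dy;
        if (oy < 0 || oy >= h) continue;
        for (int x = 0; x < w; x++) {
            int ox = x + dx;
            if (ox < 0 || ox >= w) continue;
            sum += (uint64_t)abs((int)ref[y * w + x] - (int)other[oy * w + ox]);
        }
    }
    return sum;
}

// Try the 9 offsets in {-1, 0, +1}^2, i.e. thirds of an original pixel,
// and report the one with the lowest error.
void best_third_pixel_offset(const uint16_t *ref, const uint16_t *other,
                             int w, int h, int *best_dx, int *best_dy)
{
    uint64_t best = UINT64_MAX;
    *best_dx = 0;  *best_dy = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            uint64_t s = sad_shifted(ref, other, w, h, dx, dy);
            if (s < best) { best = s;  *best_dx = dx;  *best_dy = dy; }
        }
}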

For a single frame, this entire Quixotic exercise would be no better than an ImageMagick resize. But, when merging many layers via transparency, the data from all of the frames could be added with accurate alignment.

This should average digital errors toward zero and provide super-resolution.

I have a Nikon D5100. I have compression turned on. I can’t turn it off. Ask Nikon why…

I can’t choose the bit depth, either…

Those settings are only present on higher-end Nikon cameras.

And exactly how did you while away the other 1611 days? :slight_smile:

I was referring to your “go write it yourself” comment, which does not add any ideas or information.

I do thank all of the generous geniuses who write the open source code and keep these forums open!

There are ways of saving to TIFF that bypass any processing. dcraw -D…

Okay, college-professor mode kicking in here… First, back to your original question:

Well, none that anyone has offered to date in the thread, apparently. So then, a suggestion:

This is really a challenge to your initiative, no snark involved. Reading your posts, you seem to have a fundamental grasp of digital data representation, if a little misguided in places, and a clear vision of your end. So, my assignment to you would be to code up a reference implementation. To get you started, I’ll offer a program I wrote a little while ago that uses the libraw library to open raws, then save them directly to TIFF with no image data modification:

In that code, you’ll have the uint_16 image data straight from the file; adding the code to skive off the channels and maneuver them to your heart’s content should be fairly easy to do with your current level of comprehension. Clone the repository (learning a bit of git, if you haven’t already used it), and code away…

I’m not a demosaic person (@heckflosse is, by the way), but I’ll be interested to review your results.


I stand corrected. I know about these options, but I always thought the data was still encoded somehow. Now I see that uncompressed really means a dump of raw integers at the end of the file. I learned something.

Yes, but what is your point? Maybe I got confused again about the way you wrote your initial post, but nobody suggested working with TIFFs except you… I fully agree that for any sort of processing the best-case scenario is to work directly from raw (converted to float or double for processing).

imo this thread’s title and first post are not clear that this is actually what you’re looking for…

Pixel-shift and the article @Iain posted are the currently available technical solutions to what you’re asking. Otherwise, I suggest you browse through some academic papers on Google Scholar regarding this topic: Google Scholar

This is not a “new way of interpolation”, just more of the same old way.
I have a “gut feeling” that you could do this 3x3 Bayer upscaling, take the TIFF, reduce it by the same scaling factor in Photoshop or IM and almost exactly reverse the process.

The advantage would be in alignment accuracy. With 9 frames merged via transparency, you would have a good shot at filling in that made-up inner space with real data!

I am working on performing this experiment with the TIFF data but there can be little doubt that upscaling the Bayer data would yield better results due to less processing.

Another advantage of using the Bayer vs the TIFF is that you are processing ~73 MB of data vs 217 for an uncompressed TIFF.

Have you read through the technical paper @Iain posted? How does that method differ from what you want to achieve?
The way I see it: if you combine multiple shots that are exactly the same (i.e. taken with a tripod), you sample the scene at exactly the same locations and have no useful additional information to ‘fill in the gaps’. If you combine multiple shots that are shifted by only a single pixel, you have exactly the idea of pixel-shift cameras, and you can use available algorithms (like in RawTherapee) to obtain images with high fidelity. If you combine multiple random shots, align them and then fill in the gaps, you do what the Google researchers did.

Pixel-shift is not a possible solution outside of a studio. It requires exact alignment between frames. As stated, I need a Monopod-only (hand held?) solution.

The awesome Google Research results are useful if I want to upgrade to a Google Pixel to get them.

From the Google paper referenced by @Iain,
"The input must contain multiple aliased images, sampled at different subpixel offsets. "

They seem to be using the “subpixel offsets” inherent in hand held shots in much the same way the sensor shift trick works; you try to position Red, Green, Blue and Jade pixels from various frames over each “molecule” of the subject to obviate the need for demosaicing. If you have more than 1 sample per channel per molecule, average them to increase the signal to noise ratio. Brilliant!

From the Google paper referenced by @Iain,
“we refine the block matching alignment vectors by three iterations of Lucas-Kanade [1981] optical flow image warping.”
This is a CGI fix to warp the image to force a fit.
Google is attacking the much more general problem of aligning pictures taken out the window of a moving bus. Useful for perspective changes which I don’t see. Vastly more complicated than my situation and akin to General Relativity.

This is not necessary in my case when aligning 2 frames at a time taken from an almost perfectly stationary monopod a tenth of a second apart. I am working in a non-accelerating inertial reference frame akin to the Relatively simpler Special Relativity. Other than leaf wobble, my scenes are practically stationary. I shoot mountains, not moving traffic.

From the Google paper referenced by @Iain,
“Mobile devices provide precise information about rotational movement measured by a gyroscope, which we use in our analysis.”
I don’t know if I had my Nikroscope® gyroscope energized.
This blows this approach right out of the water!

And, I didn’t notice a link to the actual code. But they supplied all of the formulas and the process.
How long will it take you to code it? Forever? Me too!

The assertion that this is a “currently available technical solution” does not appear to be supported by the available evidence.

I wrote that years ago to directly dump RGBJ data from a NEF:

 USAGE: nef2rgjb NEF [-DH] [-I] [-O output_dir] [-M mult] 
[-P] [-S]  [-T Right_Row_Trim_Pixels] [-U] [-V] <enter>    

And then a program to extract individual channels as PGM files including Green, Jade and (Green+Jade)/2:

USAGE: bay2rgb Bayer.rgbj.raw [-D] [-A] [-B basename] 
[-G 1*|2|3] [-I] [-M Mult] -x X_Res [junk_spec] [-S] 
[-T trim_top_bytes]  [-V] <enter>

Is 64 milliseconds very fast to directly extract RGBJ uint16s from a 36.6 MPix .NEF?
It could be somewhat faster if I nuked Firefox and some other processes.

bb_a  Run 2020-05-09 18:10:48
N2P: FSize=77.398016MB, bay_off=4286464, bmsize=73111552
Wrote 73.111571MB to PPM file pf-2017.0620-269369.pgm

TE: Report: Accum time for 5 events=63.56 ms  Run 2020-05-09 18:10:48
Time=  0.011 sec = 17.008%, READ_BAYER_RGBJ      , hits=1
Time=  0.052 sec = 82.046%, SYS_WRITE_TO_DISK    , hits=1
Time=  0.065 ms  =  0.103%, ALIGN_ALLOC_MATRIX   , hits=1
Time=  0.170 ms  =  0.267%, INIT_MEM_MAIA_STR    , hits=1
Time=  0.190 ms  =  0.299%, FREE_BB_MEM_ALLOC    , hits=1

identify  pf-2017.0620-269369.pgm
pf-2017.0620-269369.pgm PGM 7424x4924 7424x4924+0+0 16-bit 
    Grayscale Gray 69.7246MiB 0.090u 0:00.089

It’s very dark but it has not been engammarated! And you can see the speckling effect if you zoom in, characteristic of viewing Bayer data directly.

As predicted, the Bayer bitmap offset can be calculated directly from the EXIF XY res:

int xres = D800E_BAYER_X_RES;   // 7424: NEF raw XRes, with 46 JUNK uint16s on the end of each row
int yres = D800E_BAYER_Y_RES;   // 4924: NEF raw YRes
fsize = file_size(neffn);                  // total NEF size in bytes
bayer_offset = fsize - 2 * xres * yres;    // Bayer bitmap starts this far into the file
mbyte = 2 * D800E_BAYER_X_RES * D800E_BAYER_Y_RES;  // Bayer bitmap size in bytes (2 per sensel)

It takes ImageMagick 263 ms to convert the PGM to TIF

timeit magick pf-2017.0620-269369.pgm  pf-2017.0620-269369.tif
TI: Elapsed time = 263.148 ms 
ImageMagick 7.0.8-20 Q16 x86_64 2018-12-25

Dcraw nukes the 46 junk shorts on the end of each row (ignoring the EXIF data). Dave’s row length agrees with my channel split lengths. Good on Mr. Coffin!

#define D800E_BAYER_RIGHT_FUNKY_UINT16 46  // Garbage on R side of Bayer GRBJ

timeit  dcraw -v  -4 -D   pf-2017.0620-269369.nef 
Loading Nikon D800E image from pf-2017.0620-269369.nef ...
Building histograms...
Writing data to pf-2017.0620-269369.pgm ...
TI: Elapsed time = 444.447 ms 

identify  pf-2017.0620-269369.pgm
pf-2017.0620-269369.pgm PGM 7378x4924 7378x4924+0+0 16-bit 
    Grayscale Gray 69.2926MiB 0.060u 0:00.059

And, it can be split into 4 individual channels quite easily, again with purely raw data:

nef2rgjb_d pf-2017.0620-269369.nef -d -P
Run 2020-05-09 16:34:38
PAV: -P -> Making RGJB Portable_Gray_Map PGMs
PAV: NEF Size=77.398016MB -> Bayer at 4286464, FQP= pf-2017.0620-269369.nef 
PAV: Final_Bayer_XYRes=(7378,4924), chan_xy_res=(3689, 2462)
R2PGM: Wrote 19 Hdr_B, 18164655 tot_B to pf-2017.0620-269369.red.3689x2462.pgm 
R2PGM: Wrote 19 Hdr_B, 18164655 tot_B to pf-2017.0620-269369.gre.3689x2462.pgm 
R2PGM: Wrote 19 Hdr_B, 18164655 tot_B to pf-2017.0620-269369.jad.3689x2462.pgm 
R2PGM: Wrote 19 Hdr_B, 18164655 tot_B to pf-2017.0620-269369.blu.3689x2462.pgm 
Elapsed Nef2RGJB time 157.395 mSec

I am not going to wait for the demosaic folks to implement the new pixels. It appears to be far more difficult than I had hoped. I can do it entirely in float now on the rawest data but without the AMaZE process.

It would be interesting to compare the results with and without demosaicing.

In the Wronski paper referenced by @Iain, the gyroscope data was used for analysis of general camera movement, not to actually generate the results.

A simplified version of the paper’s method could be:

  1. Take a number of handheld photos. (Would it work with a monopod? I don’t know.)

  2. Designate one photo as the master.

  3. From each photo, extract all four raw channels.

  4. Enlarge all the extracted channels by two, and shift by one pixel as necessary.

  5. Using the master, for each other photo:
    5a. For each colour channel: find four points for alignment, and do a perspective alignment so the “other” photo matches the “master”.

  6. For each colour channel:
    6a. Average the results for that channel.

  7. Now we have one full-size image for each colour channel. Combine them into a colour image.

Doing this with ImageMagick would be fairly simple. All the parts for this are shown on my pages http://im.snibgo.com. They just need to be glued together. Of course, it wouldn’t be fast.

The alignments should be the same for each channel.

The alignments could be done with Hugin.

EDIT to add: the more photos, the better. Step (4) could enlarge by a larger factor (with increased shift), but larger factors would need more photos.

We could reduce movement ghosting by modifying step (6a). Corresponding pixels in the image set should be roughly the same. Any that aren’t should be ignored for the averaging (ie remove outliers). Or, instead of averaging, take the median.
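
A sketch of that median variant of step (6a), assuming the frames are already aligned, same-size, single-channel 16-bit planes:

#include <stdint.h>
#include <stdlib.h>

static int cmp_u16(const void *a, const void *b)
{
    uint16_t x = *(const uint16_t *)a, y = *(const uint16_t *)b;
    return (x > y) - (x < y);
}

// Merge n_frames aligned planes (each npix samples) into 'out' by taking
// the per-pixel median, which discards outliers such as movement ghosts.
void merge_median(const uint16_t *const *frames, int n_frames,
                  size_t npix, uint16_t *out)
{
    uint16_t *tmp = malloc((size_t)n_frames * sizeof *tmp);
    if (!tmp) return;
    for (size_t i = 0; i < npix; i++) {
        for (int f = 0; f < n_frames; f++)
            tmp[f] = frames[f][i];
        qsort(tmp, (size_t)n_frames, sizeof *tmp, cmp_u16);
        out[i] = tmp[n_frames / 2];             // median (upper for even counts)
    }
    free(tmp);
}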

EDIT2: If we take only one photo, we don’t need steps (5) and (6), and the algorithm simplifies to a primitive demosaicing, as shown in Demosaicing.
