Working with afre_cleantext filter and G'MIC plugin

Your pic is actually 8-bit and smaller in size, while in scanned book page processing it's common to double the pic size before any cleanup. If you can post the processed picture in TIFF without changing its size compared to the original, I can check whether subsequent processing reveals any defects introduced or left by your cleanup. :rofl:

A user or software just auto-selects the threshold and converts everything above or below it to white or black. Of course, it works well only if the background is clean and the scanned text is pre-improved (by unsharping, thickening, smoothing, despeckling, etc.) in such a way that converting the colored text to black doesn't cause loss of quality or outline, and the resulting font appearance ideally stays close to the book fonts, i.e. not too thick, fine enough, etc.
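For anyone unfamiliar with how the auto-selected threshold part works, here is a minimal sketch in plain NumPy using Otsu's classic histogram method (the function name and parameters are my own illustration, not code from any of the tools discussed in this thread):

```python
import numpy as np

def binarize(gray, threshold=None):
    """Binarize a grayscale page (2-D uint8 array) to pure black and white.

    If no threshold is given, pick one automatically with Otsu's method:
    choose the gray level that maximizes the between-class variance of
    the image histogram.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    if threshold is None:
        hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
        levels = np.arange(256, dtype=np.float64)
        w0 = np.cumsum(hist)             # pixel count at or below each level
        w1 = w0[-1] - w0                 # pixel count above each level
        sum0 = np.cumsum(levels * hist)  # intensity mass at or below each level
        with np.errstate(divide="ignore", invalid="ignore"):
            mu0 = sum0 / w0                  # mean of the dark class
            mu1 = (sum0[-1] - sum0) / w1     # mean of the light class
            between = w0 * w1 * (mu0 - mu1) ** 2
        threshold = int(np.nanargmax(between))
    return np.where(gray > threshold, 255, 0).astype(np.uint8)
```

Everything at or below the chosen level goes to black, everything above it to white — which is exactly why pre-cleaning matters: any text pixel that drifts above the cut is lost.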

Not sure what you mean by “leftover areas”? If you mean bleed-through text and text highlighting from the back page, ideally it should be completely removed if possible. If you're talking about embedded photos, they can be selected on the page to bypass processing; this is called image segmentation. The problem is that the back-page text also shows up on the front photos, so the challenge is to remove it from the photos too, possibly by a different method. :cry:

Here it is:

This is how it looks flattened and binarized at 127 KB file size. It's possible to read it, which is the ultimate goal. But some of the text outline is missing, and more font outline improvement beforehand is desired so it's not partially lost at binarization.

Ideally, the font outline should be filled without defects, and its pixel color spectrum should be narrowed during cleanup closer to black, so it isn't cut off to white later at binarization, which creates small visible defects in the outline. As a side effect, the background might also end up containing near-black pixels, and removing those may then require a different filter like despeckle. :sweat_smile:
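A rough NumPy illustration of that two-step idea — narrow the spectrum toward black, then despeckle the near-black background dots it leaves behind. The function names and the `lo`/`hi` cutoffs are my own illustrative choices, not the actual filter:

```python
import numpy as np

def darken_text(gray, lo=60, hi=180):
    """Pull text-outline pixels toward solid black before binarization.

    Values at or below `lo` become 0 (black), values at or above `hi`
    become 255 (white), and everything in between is stretched linearly.
    The cutoffs depend entirely on the scan and are only illustrative.
    """
    g = np.asarray(gray, dtype=np.float64)
    out = (g - lo) * 255.0 / (hi - lo)
    return np.clip(out, 0, 255).astype(np.uint8)

def despeckle(bw, min_neighbors=2):
    """Drop isolated black specks from a binarized page (ink=0, bg=255).

    A black pixel with fewer than `min_neighbors` black pixels in its
    3x3 neighborhood is turned white.
    """
    black = (bw == 0).astype(np.int32)
    h, w = black.shape
    p = np.pad(black, 1)
    neighbors = sum(p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
    keep = (black == 1) & (neighbors >= min_neighbors)
    return np.where(keep, 0, 255).astype(np.uint8)
```

The stretch deliberately pushes background noise toward black along with the text, which is exactly why the despeckle pass is needed afterwards.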

Talking to the site admin: since this site can't show the DJVU format when uploaded, there is little sense in blocking links to other hosting sites for DJVU files here, unless you enable showing them here. :blush:

It is too late now here in India. I will try to post another tiff file tomorrow, if you are interested.

No problem. I’m definitely interested. :sleeping:

OK. Here is a tif file obtained using the RawTherapee 5.7 program. The advantage of this is that you can apply the attached processing file to all images without having to open each one of them.
It is a 16-bit tif file generated from an 8-bit jpg file. Hence, the same processing applied to an original 16-bit file may yield a better result.

Processing file: 484RawTherapee-1.tif.out.pp3 (11.9 KB)

The text recovery is quite noticeable, but I was unable to remove enough background noise without losing some of the text outline, even with intermediate processing in a specialized book restoration package before extra processing in the DjVu cleaner & converter. In other words, we're facing the same task of narrowing the color spectrum of all the text outline pixels closer to black before binarization.

Besides, this RawTherapee TIFF can't be converted directly by the DjVu tools for some reason, so it has to be re-saved as TIFF by another graphics package first.

As mentioned, there are many ways to reach the goal. This time I am not using afre_cleantext but afre_contrastfft (I haven’t written the GUI part yet).


It looks like you're closer to the goal of what @sambul81 wants than I thought.

It may be; here's a 50 KB DjVu page to support that.

Actually, there is a large community of book lovers who want “what sambul81 wants”. :joy: And it was obvious from the start that the man is highly intelligent and a bright talent. You guys rock!

Looks like I don't need the collab after all. Don't get too excited: I haven't released the GUI yet.

I know, because… here are 2 test sets. Hope you won't forget about… :crazy_face:

There are a couple more processing steps that could be added: Lightness/Contrast and erode/dilate. Then afre would have a working alternative to afre_cleantext.
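For anyone wondering what erode/dilate does to binarized text, here is a minimal toy implementation of the two morphological operations in NumPy (my own sketch, not G'MIC's code), treating ink as 0 and background as 255:

```python
import numpy as np

def dilate(bw, r=1):
    """Grow the black ink by `r` pixels in every direction
    (morphological dilation with a square structuring element)."""
    black = (bw == 0)
    h, w = black.shape
    p = np.pad(black, r)
    grown = np.zeros_like(black)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            grown |= p[r + dy:r + dy + h, r + dx:r + dx + w]
    return np.where(grown, 0, 255).astype(np.uint8)

def erode(bw, r=1):
    """Thin the black ink by `r` pixels: the morphological dual of dilate,
    implemented by dilating the inverted image."""
    return (255 - dilate(255 - bw, r)).astype(np.uint8)
```

Dilating thickens thin strokes so they survive binarization; eroding trims blobs and speckle back down. A dilate followed by an erode (a "closing") fills small gaps in the outline without changing overall stroke weight much.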

I wonder if there has been any progress on the new “immature” plugin lately? :yum:

I have found that the easiest way to eliminate bleed-through of text from the other side of the page is to place a black sheet of paper behind the page being scanned. I use it all the time and it works great.


Would you point to a suitable black material on eBay or such? Or where did you get your sheets?

@Bilbo Yes, it is all about technique as I have been saying all along.

@sambul81 I have decided not to. You should improve your scanning technique first. Call it tough love. :slight_smile:


And just one more thing: if you ever want to improve upon your current technique, you could try to learn G'MIC and contribute. Ever since learning G'MIC, I'm far more independent in that respect. On a side note, I still need to learn C++ to finish some of my needs, since no one else wants to take it up.

I participate on other forums too. Some folks solicit commercial services for a fee on these forums, despite that being strictly prohibited by the rules. I openly objected to this abuse. In retaliation, the guy contacted every software developer around the world whom we discussed on that forum, asking them not to improve their free packages. The motive is clear: he was facing loss of illegal income. He might be a member of this forum too. However, he's not the only one.

Some “free” soft devs abuse access to users sharing knowledge on such forums to develop improved software that they then offer for a fee to some companies. The technique is primitive: complain that you're poor and unable to survive, solicit samples and detailed info about the problem and the expected resolution from narrow knowledge holders, do all this while extending fake promises, and once enough info is collected, show your real face to the forum. That works well for some too, if such conduct is allowed by the forum admins.

Another interesting possibility is using this forum to advertise skills you don't actually have. For example, one can use a popular graphics editor to clean up someone's sample, and then claim to have developed their own code to do it. Of course, when asked to show such code, what will the answer be? Empty-handed… substituting proof with some rude redirection.

I want to tell you, btw, that you should address people in the manner they address you. You can't dictate to people what to do, can't be rude (it's against forum rules), and shouldn't give unsolicited advice when asked about totally different matters. I did not scan these pages at all, because the source doesn't have an accessible scanner. That's clear from my notes above and the photos, so your comments about scans are not only unsolicited but also irrelevant to the conversation and, to be honest, quite stupid.

This is a very bad habit; don't provoke people into being rude to you by abusing access to forum members and their polite, honest attitude.

@sambul81 it isn’t clear what conspiracy theories you’re floating here and it doesn’t matter, really. Nobody is obliged to provide you anything at all, just as you are not obliged to provide them anything either. Whatever you provide is of your own volition. What you do on other forums and how people treat one another on those forums is not of consequence here.

We are all here because we have a common interest and generally enjoy solving problems around imaging. If someone happens to solve your problem 100%, that is fantastic. If they only solve it 1% or even zero, but share along that way, that is also good.

There is no floor or ceiling to sharing and everyone should share what they want, whether that is 100%, 1%, or 0%.

Seems like you got somewhere between 1% and 99%; you should be thankful.

Rudeness is not welcome here. You've already been messaged once; let's not go for more.