Background removal and cleanup of book scan

Would you be so kind as to tell me more about the different steps, so that I can rebuild them in RawTherapee? That would be helpful.
Thanks for any tip,
Pit

Download the pp3 file posted above your last post. Open your image in RT, click Open in the top-right corner, choose to open a profile, and double-click the pp3.

If this message was for me: G’MIC doesn’t work with RT. It runs with GIMP (as a GUI plugin) or from the command line (which is faster). Could you share a few more scans of various qualities so that I can test my filter script?

Hm,

I will give you a short overview of my tasks; please have a look at the results:

Try this file as a starting point. It is a very bad scan from my library.
https://www.dropbox.com/s/ruok2iknd7pbr7l/178_start.JPG?dl=0

The first step was to crop it and apply some effects (HDR etc.) with an old version of DxO, which gave me this result:
https://www.dropbox.com/s/7d531cn9i7srva2/178_start_dxo.jpg?dl=0

I would like to get this done in RT, but I am a newbie and will keep working at it to get results …

The second step is to reach the file width I want for proper printing (5000 px). For this I use an old program called SmillaEnlarger, which does a good job, but perhaps this could also be done in RT?

The result is
https://www.dropbox.com/s/axw643tljswqp28/178_start_dxo_e.jpg?dl=0
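
For the enlargement step, the target size is simply the original size scaled so that the width becomes 5000 px. A tiny Python sketch of that arithmetic; the 2000 x 3000 px input is a made-up example (not the size of the scan above), and target_size is a hypothetical helper name:

```python
def target_size(width, height, target_width=5000):
    """Scale (width, height) so the width becomes target_width, keeping the aspect ratio."""
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("all dimensions must be positive")
    scale = target_width / width
    # Round the height to the nearest whole pixel.
    return target_width, round(height * scale)


# Hypothetical 2000 x 3000 px scan enlarged to a 5000 px width.
print(target_size(2000, 3000))
```

Whatever tool does the resampling, this is the size it has to be asked for.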

Now I come to RT.
I created this file
https://www.dropbox.com/s/bhatjj3rnn0mpnn/178_start_dxo_e_result.jpg.out.pp3?dl=0

to get this result
https://www.dropbox.com/s/tr3roj0plnvavvr/178_start_dxo_e_result.jpg?dl=0

I found that the Black slider in the Exposure tab gives the best results at around 20,000.

But I must confess that the results are somewhat far from the original.
Very generous members of the forum shared some pp3 files, and I tried them all.

I hope to combine the DxO and SmillaEnlarger steps. Perhaps RT can do that?
Thanks for your time!
Later, Pit


I would say RawTherapee is a great tool for what it’s designed for: raw photos. It’s not OCR software at all.

I suggest you have a look at this page:

https://www.diybookscanner.org

Check what software they suggest. I use ABBYY FineReader, though that isn’t open source.

If you just want the image, I would go with GIMP. Certainly not RT.

Here is the full page result I get from my test G’MIC filter using the exact same settings as before. Of course, the parameters can be tweaked to make it look better. It performs best when it has been cropped beforehand.

Incredible!
That is more than I could have expected. I would like to see this filter, please.
Thank you so much.

Btw: My goal is not OCR but a crisp facsimile of old archive material. Thanks to all for thinking about how to make this possible with RT.

I tried it myself using GIMP and the plug-in.
Two filters are great:
Sharpen (Octave Sharpening) and
Iain’s Noise Sharping

I am now trying those two on some files.

Btw, the original scans from the library have only poor resolution. Which do you think is better:
First increasing the resolution and then processing?
or
First processing and then increasing the resolution?

I keep in mind that every step creates false data, and that enlarging the image enlarges that false data as well. But what is your opinion on that?

Thanks a lot.
Later, Pit

Maybe this G’MIC filter is interesting?
GIMP>G’MIC>Testing>Samj>Colors>Samj At06A 2017 VarCouleurs:


Or first

GIMP>G’MIC>Testing>Naggobot>Dodge Sketch:


And thereafter again
GIMP>G’MIC>Testing>Samj>Colors>Samj At06A 2017 VarCouleurs

Won - der - ful !!!

What values did you use for
Puissance
Couper A
Couper B
Noir et Blanc?

Thank you so much!
Later, Pit

About the same as in my second-to-last post:

first image

Please play with variables.

My filter is coming soon. I think you would enjoy it since it is very simple. Just figuring out a few things before I make a pull request.

Can you please explain how you apply two different filters to one scan? Do you write the result to a new file, leave G’MIC open, and then start the second filter on this new image? Or do you write the results to different layers and combine them later?

I have a bunch of scans here and I want to make the work easier with shortcuts. Batch processing is not possible, because I have to check every scan individually.

Thanks for any help!
Pit

A standard way (regardless of whether the image is a scan) is to set, in the Input/Output section:

  1. Input Layers as Active (default),
  2. Output layers as New Active Layer(s).

Then use the Apply button to apply the filter (not the OK button).

This way, G’MIC does not close, and the second filter is applied to the latest image, i.e. the result of the first filter.

@shreedhar answers your question excellently.

Maybe I’m more clumsy. I push the OK-button after a filter. Then I view the result in GIMP, because the end result is sometimes different from the preview. In this case I continue with the last result and restart G’MIC for the next filter.

@afre made a nice filter yesterday/today :+1::

GIMP>G’MIC>Testing>Afre>Clean Text

Maybe you can try that filter also:


1. I have changed the category to Processing and added the tags gmic, rawtherapee, scanner. I think we have room to discuss how to make better scans.


2. iarga is correct. Clean Text is available as a GIMP-G’MIC plugin filter and also as a CLI command (afre_cleantext). Unfortunately, I am having trouble retrieving it from the servers. I don’t know if you have the same issue ATM when you try updating the filters. Do the following:

For GUI plugin

gmic-afre

Then search for Clean Text.


For CLI

gmic update
gmic sample tiger afre_cleantext

3. You can totally do batch work using the GUI or CLI. Try the filter on the two most different scans and find a happy medium.

Then in GIMP, use Open as Layers..., set Input layers to All in the plugin, and apply the filter to all of the pages at once.

Or in CLI, you could copy all of the scans to a single new folder and do

gmic input_glob *.jpg repeat $! local[$>] afre_cleantext , o _{b}.jpg endlocal done

Hint: replace the comma with your parameters. The comma by itself means that it will use the default values I have set.
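
If the `$!`, `$>` and `*.jpg` in that one-liner get mangled by your shell, the same batch can be driven one file at a time from a small script. Here is a minimal Python sketch, not a definitive implementation. It assumes gmic is on your PATH and that afre_cleantext is available after gmic update. The names build_commands and run_batch are made up for illustration. Unlike the one-liner, it writes each `_<name>.jpg` next to its scan, and it skips files that already start with an underscore so that a second run does not reprocess its own output.

```python
from pathlib import Path
import shutil
import subprocess


def build_commands(folder, params=","):
    """Return one gmic argument list per JPEG scan in `folder`, sorted by name."""
    commands = []
    for scan in sorted(Path(folder).iterdir()):
        # Accept .jpg and .JPG; skip earlier results, which start with "_".
        if scan.suffix.lower() != ".jpg" or scan.name.startswith("_"):
            continue
        # Mirror the `_{b}.jpg` naming: underscore + base name of the scan.
        output = scan.with_name("_" + scan.stem + ".jpg")
        # `params` defaults to "," (the filter's default values).
        commands.append(
            ["gmic", str(scan), "afre_cleantext", params, "o", str(output)]
        )
    return commands


def run_batch(folder, params=",", dry_run=True):
    """Print the commands (dry run) or execute them one scan at a time."""
    commands = build_commands(folder, params)
    have_gmic = shutil.which("gmic") is not None
    for cmd in commands:
        if dry_run or not have_gmic:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_batch("scans") only prints what would be executed for a folder named scans (a placeholder); run_batch("scans", dry_run=False) actually runs gmic. Because the arguments are passed as a list, no shell is involved and nothing needs quoting.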


PS If you found my filter and instructions helpful, perhaps we could as a community write a tutorial together, with instructions on how to digitize a book. I don’t have the time for that currently but what do you think @patdavid @paperdigits?

Anyone is free to contribute an article! I can help with the copy editing and all the git stuff if you’re not inclined.

Our git repo for the main website is public: pixlsus/website on GitHub (the PIXLS.US website).

Unfortunately I cannot download afre’s filter :frowning:

Is it possible to download it from somewhere else?
Later, Peter

You may need to hit the refresh button a few times.