Working with afre_cleantext filter and G'MIC plugin

Digression

I will be frank with you for a moment: I won’t be able to deliver on what you ask. I have a challenging life and am here as an outlet for wholesomeness. Keeping in the theme of the season, I am one whom you would call the “least of these”.

Secondly, my filters are meant to be minimalist, though powerful; they aren’t meant to be a full solution stack. Of course, you may use them at any stage of your workflow. There are also plenty of free, open source and commercial products out there that are much more feature complete than my code.

As you may have noticed, I have been directing the discussion toward problem solving instead of relying on particular tools. Tools are interchangeable but not skill and experience. That said, your discussion has given me a sense of what to do next with this filter.

Lastly, I would like to reiterate the issue of copyright. It seems like you want to scan the whole textbook. Maybe not this particular one. Now, I won’t tell you what to do but I encourage you to honour those who made this textbook.

Back to the Q&A

@Reptorian says it is not. I don’t have time to explore Paint.NET and will defer to his reply.

The result is inaccurate because the processing is applied to the preview itself, at whatever size and zoom level it has, plus whatever else the plugin+app combo does to the preview. In other words, you aren’t filtering the actual image at all. For some filters that is completely fine, but for more sophisticated ones, not so much, or not at all.

You listed a bunch of items. As long as the features aren’t too different in kind and shade, they should be preserved. Keep in mind that the filter does global processing and doesn’t have machine learning to detect objects and other advanced processing of that sort. It will most certainly not preserve the photographs. PDF conversion programs do multiple passes for different content. In any case, even the best apps and services have problems with this. It takes human intervention to get it right.

The Black and White parameters already allow you to narrow the grey-scale range to a minimum. All you would need to do is apply a final threshold and convert the image to B&W, which is a very basic operation and better left to the app’s native tools.
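If you want to script that final threshold step outside the app, here is a minimal sketch in Python with Pillow and NumPy (my own example, not part of afre_cleantext; the file names and threshold value are assumptions to be tuned per scan):

```python
# Minimal sketch: final threshold to pure black and white.
# Assumes an already-cleaned grayscale scan saved as "page_clean.png".
from PIL import Image
import numpy as np

THRESHOLD = 180  # assumed value; lower keeps more faint strokes

img = np.asarray(Image.open("page_clean.png").convert("L"))
bw = np.where(img < THRESHOLD, 0, 255).astype(np.uint8)  # dark -> black, rest -> white
Image.fromarray(bw).save("page_bw.png")
```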

In terms of storage space, as hinted in a previous post, your goal should ultimately be OCR. Text costs nothing. Textbooks are mostly empty space anyway.

I can certainly improve the filter and no doubt I will eventually. However, many of these steps are things that you should look into yourself. Moreover, as said, high quality scanning would definitely help. Processing is only as good as the inputs it is given and how skilled and experienced the person doing it is.

Not sure I understand that? Besides, I don’t intend to scan any book entirely, just various pages, which is permitted by copyright law. OCR would be overkill given the time required for manual proofreading of the text. I really appreciate your intention to improve the filter, and your processing suggestions.

To offer more ideas: adding an algorithm that identifies mirrored letters and changes their colour to white would make it easy to clean back-side text bleed-through on the thin, semi-transparent pages of various document scans, even handwritten ones. An interesting standalone case is cleaning the background of pages flattened with BookRestorer, since flattening introduces some deviations from standard font outlines. As you may know, some packages like Acrobat identify the standard fonts used on scanned pages during OCR, and reuse those fonts for the OCR text to preserve the document’s format and appearance. :sweat_smile:
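Not the mirrored-letter matching suggested above, but as a much simpler stand-in here is a rough Python/Pillow/NumPy sketch of the basic move of pushing suspected bleed-through pixels to white, relying only on back-side text being fainter than front-side ink (the band limits and file names are assumptions):

```python
# Rough sketch of bleed-through suppression by intensity band; this is NOT
# mirrored-letter detection, just a crude stand-in for illustration.
from PIL import Image
import numpy as np

FRONT_INK_MAX = 110   # assumed: front-page ink is darker than this
PAPER_MIN = 225       # assumed: clean paper is lighter than this

gray = np.asarray(Image.open("scan_page.png").convert("L")).copy()

# Pixels between the two limits are treated as show-through from the back page
# (fainter than front ink, darker than paper) and are pushed to white.
bleed = (gray > FRONT_INK_MAX) & (gray < PAPER_MIN)
gray[bleed] = 255
Image.fromarray(gray).save("scan_page_debled.png")
```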

What he likely means is that it’s not possible NOW. What I mean is asking the plugin dev to make it possible in the near future. :sob:

No, it is more a limitation of the plugin API system in Paint.NET. In Krita, I can keep the G’MIC window open, minimize it to do whatever, then apply a G’MIC filter after maximizing, because Krita doesn’t have that issue. You need to ask Rick Brewster about anything related to the PDN plugin system; we can’t do anything about PDN development.

If you do not mind using GIMP 2.10, here is a one-click output.

I used the Color to Gray method listed under the Desaturate entry in the Colors menu. This operation takes time under its default parameters, but for your images you can tweak the parameters to make it act instantaneously. The parameters I tried were: Radius: 300, Samples: 4 and Iterations: 1. (I have cropped the image a bit and also resized it to save this forum’s server space.)

EDIT: I just noticed that this method does not work well if photos are present in the text. I tried the Mono Mixer method of desaturation and then tweaked the contrast using the Levels tool to get this:


This method will work for normal pages too.
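For readers who want to reproduce the Mono Mixer + Levels route outside GIMP, here is a rough Python equivalent (not GIMP’s actual code; the mixer weights and level points are assumptions to be tuned per scan):

```python
# Rough Python equivalent of the "channel mix + levels" idea described above.
from PIL import Image
import numpy as np

R_W, G_W, B_W = 0.2, 0.7, 0.1       # assumed channel-mixer weights
BLACK_POINT, WHITE_POINT = 60, 200  # assumed Levels input points

rgb = np.asarray(Image.open("scan_page.jpg").convert("RGB")).astype(np.float64)
gray = rgb[..., 0] * R_W + rgb[..., 1] * G_W + rgb[..., 2] * B_W

# Levels: map [BLACK_POINT, WHITE_POINT] to [0, 255], clipping everything else.
gray = (gray - BLACK_POINT) / (WHITE_POINT - BLACK_POINT) * 255.0
gray = np.clip(gray, 0, 255).astype(np.uint8)
Image.fromarray(gray).save("scan_page_gray.png")
```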


@shreedhar Good preprocessing step. It doesn’t have to be hard. :slight_smile:


Thanks. What is the purpose of this step of converting colour to grayscale? It will only help if it doesn’t cause pixel loss in the front-page font, so this page is a good example to try. The ultimate cleanup goal is to remove the semi-transparent text showing through from the back page. The image would then go to Book Restorer to be flattened. If the cleanup was done improperly, major background dirt will appear at flattening (geometric correction).

Once flattened, the image would go to a DjVu package for conversion, where binarization will likely follow to cut the resulting DjVu file size. DjVu Solo 3.1 is free and still the most efficient compression choice, but it lacks cleanup and image-enhancement tools and custom presets. For that, the DjVu Document Express Desktop & Enterprise combo is often used, or their popular derivatives.

Success at this step depends directly on the quality of the background cleanup and font-appearance improvements in the previous steps, though the automatic segmentation of a page into text and pictures by some derivative DjVu packages, with different processing for each segment, can improve scan quality. If binarization produces a serious loss in font quality, it is omitted, but the resulting DjVu file will be larger and background cleanliness may worsen. I’m not even talking about the PDF alternative, given its larger file size and scarce cleanup tools.

Some books are copyright protected, and others aren’t anymore. Imagine an old encyclopedia with 1500-2000 pages: that’s where per-page file size really matters, especially for reading on mobile devices. The better the scan and cleanup quality, the higher the chance of binarizing it with good results for a smaller file size. :pleading_face:

I found that thread, but the examples in it were too easy to fix with existing cleanup tools, and they are not typical of real-life dirty old paper archive scans, or reference-book page scans with very thin, semi-transparent pages. It was a good starting point though. :yum:

I think I might have some idea of how to solve your issue, but first I would need to know how binarization works. Then I would need to know whether the leftover areas are smaller than the text.

I don’t really know what you mean by pixel loss, but here is the Mono Mixer + Levels method applied to the good example. You can download this and put it through your process. I will be interested to know whether it is good enough!

Your pic is actually 8-bit and smaller in size, while in scanned-book-page processing it’s common to double the image size before any cleanup. If you can post the processed picture as TIFF without changing its size compared to the original, I can check whether subsequent processing reveals any defects introduced or left by your cleanup. :rofl:

A user or the software just selects a threshold (automatically or manually), then converts everything above it to white and everything below it to black. Of course, it works well only if the background is clean and the scanned text has been pre-improved (by unsharp masking, thickening, smoothing, despeckling, etc.) in such a way that converting the coloured text to black doesn’t cause quality or outline loss, and the resulting fonts ideally stay close to the book’s fonts, i.e. not too thick, fine enough, and so on.
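For the “software auto selects the threshold” case, here is a small self-contained sketch using Otsu’s classic method in Python with NumPy and Pillow (file names are placeholders; DjVu encoders use their own binarizers, this just illustrates the idea):

```python
# Auto-threshold binarization with Otsu's method (illustrative only).
import numpy as np
from PIL import Image

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the threshold that maximises between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    probs = hist / gray.size
    cum_probs = np.cumsum(probs)                    # weight of the "dark" class
    cum_means = np.cumsum(probs * np.arange(256))   # cumulative intensity mean
    global_mean = cum_means[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (global_mean * cum_probs - cum_means) ** 2 / (cum_probs * (1.0 - cum_probs))
    return int(np.argmax(np.nan_to_num(between)))

gray = np.asarray(Image.open("page_flattened.tif").convert("L"))
t = otsu_threshold(gray)
bw = np.where(gray <= t, 0, 255).astype(np.uint8)   # below threshold -> black, above -> white
Image.fromarray(bw).save("page_binarized.png")
print("auto-selected threshold:", t)
```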

Not sure what you mean by “leftover areas”? If you mean the semi-transparent text and text highlighting showing through from the back page, ideally it should be removed completely if possible. If you mean embedded photos, they can be selected on the page to bypass processing; this is called image segmentation. The problem is that the back-page text also shows up over the front-page photos, so the challenge is to remove it from the photos too, possibly by a different method. :cry:

Here it is:

This is how it looks flattened and binarized, at a 127 KB file size. It is readable, which is the ultimate goal, but some of the text outline is missing, and more font-outline improvement is desired beforehand so it isn’t partially lost at binarization.

Ideally the font outline should be filled without defects, and during cleanup its pixel colour spectrum should be narrowed towards black, so it isn’t cut off to white later at binarization, leaving small visible defects in the outline. The background might also contain near-black pixels as a result, and their removal may then require a different filter such as despeckle. :sweat_smile:
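As an illustration of that two-step idea (narrow the ink spectrum towards black, then despeckle the stray near-black background dots), here is a hedged Python sketch; the near-black limit and the median-filter size are assumptions:

```python
# Sketch: push near-black text pixels fully to black, then despeckle
# isolated dark dots with a small median filter.
from PIL import Image, ImageFilter
import numpy as np

NEAR_BLACK = 90   # assumed: anything darker than this is treated as text ink

gray = np.asarray(Image.open("page_cleaned.png").convert("L")).copy()
gray[gray <= NEAR_BLACK] = 0          # narrow the ink spectrum towards pure black

img = Image.fromarray(gray)
img = img.filter(ImageFilter.MedianFilter(size=3))   # simple despeckle pass
img.save("page_ink_narrowed.png")
```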

A word to the site admin: since this site can’t display the DjVu format when it is uploaded, there is little sense in blocking links to other hosting sites for DjVu files here, unless you enable displaying them here. :blush:

It is too late now here in India. I will try to post another tiff file tomorrow, if you are interested.

No problem. I’m definitely interested. :sleeping:

OK. Here is a TIFF file obtained using RawTherapee 5.7. The advantage of this approach is that you can apply the attached processing profile to all images without having to open each one of them.
It is a 16-bit TIFF generated from an 8-bit JPEG, so the same processing applied to an original 16-bit file may yield a better result.

Processing file: 484RawTherapee-1.tif.out.pp3 (11.9 KB)
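If anyone wants to batch-apply that .pp3 from a script rather than from the RawTherapee GUI, something along these lines should work (a sketch that assumes rawtherapee-cli is installed and on your PATH; double-check the flags against rawtherapee-cli --help for your version, and the folder names are placeholders):

```python
# Batch-apply a .pp3 processing profile to every JPEG in "scans/" via rawtherapee-cli.
import subprocess
from pathlib import Path

PROFILE = "484RawTherapee-1.tif.out.pp3"
OUT_DIR = Path("processed")
OUT_DIR.mkdir(exist_ok=True)

for jpg in sorted(Path("scans").glob("*.jpg")):
    subprocess.run(
        ["rawtherapee-cli",
         "-o", str(OUT_DIR),   # output directory
         "-p", PROFILE,        # processing profile to apply
         "-t",                 # write TIFF output
         "-c", str(jpg)],      # input file (must come last)
        check=True,
    )
```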

The text recovery is quite noticeable, but I was unable to remove enough background noise without losing some of the text outline, even with intermediate processing in a specialised book-restoration package before further processing in a DjVu cleaner and converter. In other words, we’re facing the same task of narrowing the colour spectrum of all the text-outline pixels closer to black before binarization.

Besides, this RawTherapee TIFF output can’t be converted directly by the DjVu tools for some reason, and so has to be re-saved as TIFF by another graphics package.
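If a plain re-save is all that is needed, even a two-line Pillow script (just one possible “another graphics package”, chosen here for illustration; file names are placeholders) can do it:

```python
# Re-save the RawTherapee TIFF through Pillow so other tools will accept it.
from PIL import Image

Image.open("page_rawtherapee.tif").save("page_resaved.tif")
```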

As mentioned, there are many ways to reach the goal. This time I am not using afre_cleantext but afre_contrastfft (I haven’t written the GUI part yet).


It looks like you’re closer to the goal of what @sambul81 wants than I thought.

It may be; here’s a 50 KB DjVu page to support that.

Actually, there is a large community of book lovers who want “what sambul81 wants”. :joy: And it was obvious from the start that the man is highly intelligent and a bright talent. You guys rock!

Looks like I don’t need the collab after all. Don’t get too excited: I haven’t released the GUI yet.

I know, because… here are 2 test sets. Hope you won’t forget about… :crazy_face: