… I’m looking to set up a script or the like which can read in an OCRed PDF and output another …
I’m not sure what an “OCRed PDF” is.
There are two cases: (1) scanned documents, where each PDF page is a single raster image (the scan of one page), and (2) documents where text is recorded as vector data (infinitely scalable).
(1) Scanned documents.
I would use pdfimages to extract all the raster images from the PDF. Then I would use ImageMagick to change the colours to whatever I wanted. For my old eyes, I like white text on a black background, which is often a simple “-negate” operation. But you can have any colours you like.
(I would not use ImageMagick to read the source PDF files. This is because IM will rasterize each PDF page, but when each page is already a raster image, this causes re-sampling of the image, which lowers the readability.)
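As a rough sketch of the scanned-document route, something like the following might work. It assumes poppler's pdfimages and ImageMagick v7 (magick) are on the PATH, and that each page holds exactly one image. The function name negate_scanned_pdf and the "page" and "neg-" file prefixes are just names I made up for illustration.

```shell
#!/bin/sh
# Sketch only: invert a scanned PDF without re-sampling the page images.
# Assumes one raster image per page; negate_scanned_pdf is a hypothetical name.
negate_scanned_pdf() {
    in_pdf=$1
    out_pdf=$2
    work=$(mktemp -d) || return 1

    # Extract every embedded raster at its native resolution as PNG:
    # page-000.png, page-001.png, ...
    pdfimages -png "$in_pdf" "$work/page" || return 1

    # Negate each extracted image (black on white becomes white on black).
    for f in "$work"/page-*.png; do
        magick "$f" -negate "$work/neg-$(basename "$f")" || return 1
    done

    # Reassemble the negated images, one per page, into a new PDF.
    magick "$work"/neg-page-*.png "$out_pdf" || return 1

    rm -rf "$work"
}
```

Usage would be "negate_scanned_pdf in.pdf out.pdf". Any text layer in the source PDF is not carried over, and the output page size may need a -density setting that matches the scan resolution.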
(2) Text is vector data.
From the OP comment …
And the OCR info has been lost. But I can live with that too, I can have one read aloud while I look at the other.
… I think this is the case.
“Ordinary” PDF documents, such as those in scientific journals, typically do not have rasterised text. A PDF viewer can change colours. For example: Adobe Acrobat Reader, Edit, Preferences, Accessibility, tick “Use High-Contrast colors” for white on black or a few others, or click on “Custom Color” for other combinations. Sadly, some PDF documents use gray text instead of black. Adobe Acrobat Reader can’t make this text actually black or white. Annoying.
Ideally, there would be a FOSS PDF editor that could do this simple change of colours. I am not aware of any such tool.
ImageMagick or Gimp can be used to rasterize each page, at some specified dpi. (IM does this via Ghostscript.) Then we can do whatever changes we want. This loses the “vector” nature of text, so it is no longer searchable. For example, using IM:
magick -density 300 -background White "in.pdf[0-9]" -alpha Background -alpha off -negate out-0-9.pdf
This converts just the first ten pages (page indices start at zero).
We can use pdfunite, if we want, to join the first ten pages to the next ten pages, and so on.
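The ten-pages-at-a-time idea could be scripted along these lines. This is only a sketch: page_ranges and negate_pdf_in_chunks are hypothetical helper names, the "part-" file names are my own choice, and the total page count is passed in by hand.

```shell
#!/bin/sh
# page_ranges TOTAL CHUNK: print zero-based page ranges, one per line,
# e.g. "page_ranges 25 10" prints 0-9, 10-19, 20-24.
page_ranges() {
    total=$1
    chunk=$2
    start=0
    while [ "$start" -lt "$total" ]; do
        end=$((start + chunk - 1))
        if [ "$end" -ge "$total" ]; then
            end=$((total - 1))
        fi
        printf '%s-%s\n' "$start" "$end"
        start=$((end + 1))
    done
}

# negate_pdf_in_chunks IN OUT TOTAL: rasterize and negate ten pages at a
# time, then join the pieces. Sketch only; names are hypothetical.
negate_pdf_in_chunks() {
    in_pdf=$1
    out_pdf=$2
    parts=""
    for r in $(page_ranges "$3" 10); do
        magick -density 300 -background White "${in_pdf}[$r]" \
            -alpha Background -alpha off -negate "part-$r.pdf" || return 1
        parts="$parts part-$r.pdf"
    done
    # pdfunite takes the input files first and the output file last.
    # $parts is left unquoted on purpose so it splits into separate names.
    pdfunite $parts "$out_pdf"
}
```

For a 25-page document the call would be "negate_pdf_in_chunks in.pdf out.pdf 25". If poppler is installed, pdfinfo reports the page count on its "Pages:" line.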