I finally got around to trying out the Tesseract optical character recognition software. It works way better than any other OCR software I’ve used, and I’ve used quite a few.

There are Debian packages for both Tesseract and Leptonica, but I wanted to try out OCRopus too because it has page layout analysis capabilities. So I had to download and install a few items myself. Thankfully, this page supplied all the information I needed:

Ocroscript at the Docunext Wiki