How to add an OCRed PDF to search

Sorry if this is an old question.

This is how I create the PDF:

  1. Scanned images (PNG) are OCRed by Google Vision.
  2. The Google output is converted to hOCR text.
  3. The hOCR and PNG are combined into a PDF. (The output file is big, but it works.)
  4. The PDF is searchable in Acrobat.

NextCloud can OCR image files, but not this kind of PDF. How do I make the PDF searchable in NextCloud?

Thanks for your help.

Regards,

Almond Wong

Hello Almond,

Please check out https://apps.nextcloud.com/apps/ocr

Works pretty well for me.

Cheers

I am already using this for normal images. Tesseract works great for Latin characters; we use ocrmypdf with very good results. We use Google Vision mainly for Traditional Chinese.

Is there any way to make that kind of PDF searchable? Does it require creating a workflow or writing a small module?

Please advise.

Best regards,

Almond Wong