How to OCR all files at once

Hello everyone,

I have a Nextcloud instance running with several thousand files uploaded. I have set up a workflow that automatically performs OCR upon file upload using the OCR file workflow.

My problem is that I have lots of files that were uploaded before the workflow was installed and I would like to OCR those.

I have already set up an OCR workflow that performs the OCR operation upon tag assignment, which works. For this, though, a user must assign the OCR tag to every file that should be OCR’d.

Is there a way to OCR all the old files at once?

Thanks and have a good day!

Phil

As far as I understand it, you’ll need to index them once:

sudo -u www-data php /var/www/nextcloud/occ fulltextsearch:index

I’m currently trying to set it up too, but I ran into a few installation issues, so I’m still troubleshooting at the moment.

The readme of the OCR workflow app describes that the workflow calls an external command line tool to do the OCR processing. Check how this tool needs to be called on the console to process files. Once you’ve clarified this, you could create a batch script which, for example, uses the find command to locate all .pdf files and pass them to the external OCR tool.
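
A rough sketch of such a batch script could look like the following. It assumes the external tool is OCRmyPDF (invoked as ocrmypdf), that it takes an input path followed by an output path, and that the data directory is /var/www/nextcloud/data. None of that is confirmed in this thread, so check the readme first. The function name ocr_all is made up for illustration.

```shell
#!/bin/sh
# Sketch only: run an OCR command on every PDF below a directory.
# Assumptions: the OCR tool takes "<input> <output>" as its last two
# arguments and may overwrite the input file in place.

ocr_all() {
    dir=$1
    shift
    # The remaining arguments form the OCR command, e.g. "ocrmypdf --skip-text".
    # find calls a small inner shell once per PDF, which appends the file
    # as both input and output path and reports failures on stderr.
    find "$dir" -type f -iname '*.pdf' -exec sh -c '
        file=$1
        shift
        "$@" "$file" "$file" || echo "OCR failed: $file" >&2
    ' sh {} "$@" \;
}

# Example call (assumed path and tool, adjust to your installation):
# ocr_all /var/www/nextcloud/data ocrmypdf --skip-text
```

Try it on a copy of a few files first, since the files are overwritten in place. Because the files are changed outside of Nextcloud, you will probably need to rescan them afterwards, for example with sudo -u www-data php /var/www/nextcloud/occ files:scan --all.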