I have a Nextcloud instance running with several thousand files uploaded. I have set up a workflow that automatically performs OCR upon file upload using the OCR file workflow.
My problem is that I have lots of files that were uploaded before the workflow was installed and I would like to OCR those.
I have already set up an OCR workflow that performs the OCR operation upon tag assignment, which works. For this, though, a user must assign the OCR tag to all the files that should be OCR’d.
The README of the OCR workflow app states that the workflow calls an external command-line tool to do the OCR processing. Check how this tool needs to be called on the console to process files. Once you’ve clarified this, you could create a batch script which, for example, uses the find command to locate all .pdf files and pass them to the external OCR tool.
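As a rough sketch of such a batch script, here is a Python version of the "find all PDFs and hand them to the OCR tool" idea. It assumes the external tool is OCRmyPDF (`ocrmypdf`), called as `ocrmypdf <input> <output>` with `--skip-text` to leave pages that already have a text layer alone. Please verify the tool name and options against the README and your installed version. The function names are made up for illustration, and the script defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path

# Assumption: the external OCR tool is OCRmyPDF. Replace this with
# whatever command the OCR workflow app's README actually names.
OCR_TOOL = "ocrmypdf"


def find_pdfs(root):
    """Return all .pdf files below root (any letter case), sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def build_command(pdf, tool=OCR_TOOL):
    """Build the command line that OCRs one file in place."""
    # Assumption: --skip-text skips pages that already contain text.
    # Input and output are the same path, so the file is replaced.
    return [tool, "--skip-text", str(pdf), str(pdf)]


def ocr_tree(root, tool=OCR_TOOL, dry_run=True):
    """OCR every PDF below root and return the files that failed."""
    failed = []
    for pdf in find_pdfs(root):
        cmd = build_command(pdf, tool)
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
            continue
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

You would first call `ocr_tree("/path/to/your/files")` (a placeholder path) to review the printed commands, then run it again with `dry_run=False`. Test it on a copy of a few files before touching the real data. If you run it directly on the Nextcloud data directory, Nextcloud will probably not notice the changed files until you rescan them, for example with `occ files:scan`.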