PDF Workflow Not Working for OCR

The Basics

  • Nextcloud Server version (e.g., 29.x.x):
    • 31.0.0
  • Operating system and version (e.g., Ubuntu 24.04):
    • Ubuntu 22.04
  • Web server and version (e.g., Apache 2.4.25):
    • Apache 2.4.25
  • Reverse proxy and version (e.g., nginx 1.27.2):
    • none
  • PHP version (e.g., 8.3):
    • PHP 8.2
  • Is this the first time you’ve seen this error? (Yes / No):
    • Yes
  • When did this problem seem to first start?
    • When I upload documents with a .pdf extension
  • Installation method (e.g., AIO, NCP, Bare Metal/Archive, etc.):
    • Bare metal, installed by hand without any container
  • Are you using Cloudflare, mod_security, or similar? (Yes / No)
    • No

Summary of the issue you are facing:

I have set up a Flow rule that runs OCR on PDF files whenever they are uploaded to or updated in Nextcloud, so that their text becomes searchable through Nextcloud's search. However, the files are not scanned, and their content is not searchable, unless I manually run the following command from the command line:

occ fulltextsearch:index
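For anyone trying to reproduce this, the manual invocation can be sketched as below. The install path `/var/www/nextcloud` and the web server user `www-data` are assumptions (typical Ubuntu/Apache defaults), not values confirmed for this server, and the `run_occ` helper is purely illustrative. By default it only prints the command; set `RUN=1` to actually execute it.

```shell
#!/bin/sh
# Assumptions (not confirmed for this server): adjust both values as needed.
NC_DIR="${NC_DIR:-/var/www/nextcloud}"   # assumed Nextcloud root directory
WEB_USER="${WEB_USER:-www-data}"         # assumed Apache user on Ubuntu

# Hypothetical helper: prints the occ command, or runs it when RUN=1.
run_occ() {
    if [ "${RUN:-0}" = "1" ]; then
        sudo -u "$WEB_USER" php "$NC_DIR/occ" "$@"
    else
        echo "sudo -u $WEB_USER php $NC_DIR/occ $*"
    fi
}

# The indexing command that currently has to be run by hand.
run_occ fulltextsearch:index
```

Running occ as the web server user avoids file ownership problems in the Nextcloud data directory.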

Steps to replicate it (hint: details matter!):

  1. Settings > Flow > add new OCR flow
  2. Configure the rule for .pdf files: the condition is "File MIME type is PDF"
  3. OCR languages: Italian and English
  4. OCR mode: skip text

Log entries

There are no log entries for this failed flow.

Apps

Installed apps: Flow, Full text search, Full text search - Elasticsearch Platform, Full text search - Files, Full text search - Files - Tesseract OCR, and Full text search - Bookmarks