Fulltextsearch with Tesseract: content of old PDF files is not indexed

The Basics

  • Nextcloud Server version (e.g., 29.x.x):
    • 32
  • Operating system and version (e.g., Ubuntu 24.04):
    • Debian 13.3
  • Web server and version (e.g., Apache 2.4.25):
    • nginx 1.29.5 / PHP-FPM 8.3.30
  • Reverse proxy and version (e.g., nginx 1.27.2):
    • Nginx Proxy Manager 2.13.7
  • PHP version (e.g., 8.3):
    • 8.3.30
  • Is this the first time you’ve seen this error? (Yes / No):
    • Yes
  • When did this problem seem to first start?
    • Setting up a new instance, moving over old data
  • Installation method (e.g., AIO, NCP, Bare Metal/Archive, etc.):
    • Docker Compose
  • Are you using Cloudflare, mod_security, or similar? (Yes / No)
    • No

Summary of the issue you are facing:

  • Installed a new Nextcloud instance with Docker, using the official Nextcloud image with the Tesseract packages added.
  • Added a large number of old files.
  • Added Fulltextsearch with the Elasticsearch backend, plus the Files and Tesseract apps.
  • Ran the search index.
  • Expected result: all files are indexed.
  • Actual result: PDF contents are missing from the index (searching for text from a PDF returns no results). Other files (e.g., JPEG) are found, including their OCR results.
  • When a PDF is re-uploaded, it gets indexed (found by filename and by content).
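
In case it helps anyone narrow this down, here is a rough sketch of the settings that seem worth checking for this symptom. It is not something confirmed to fix the problem. The Compose service name `app`, the `www-data` user, and the two app-config key names (`files_pdf`, `tesseract_pdf`) are assumptions and may differ on other setups. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: inspect PDF-related full-text search settings.
# Dry run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Wrapper around occ inside the container.
# ASSUMPTION: Compose service "app", occ run as user www-data.
occ() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "docker compose exec -u www-data app php occ $*"
    else
        docker compose exec -u www-data app php occ "$@"
    fi
}

check_pdf_settings() {
    # PDF extraction in the Files provider (key name is an assumption).
    occ config:app:get files_fulltextsearch files_pdf
    # OCR on PDFs in the Tesseract app (key name is an assumption).
    occ config:app:get files_fulltextsearch_tesseract tesseract_pdf
    # Overview of the configured search platform and content providers.
    occ fulltextsearch:check
}

check_pdf_settings
```

Running it with `DRY_RUN=0` from the directory that holds the Compose file would execute the three commands against the running container.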

Steps to replicate it (hint: details matter!):

  1. Install a new Nextcloud instance.

  2. Copy the old files and run “php occ files:scan”.

  3. Add the Fulltextsearch apps and the Tesseract app.

  4. Run “php occ fulltextsearch:index”.
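
The steps above can be sketched roughly as the following script. The Compose service name `app`, the `www-data` user, the `--all` option on the scan, and the use of `app:enable` (which presumes the apps are already downloaded) are assumptions for illustration, not the exact commands that were run. It prints the commands by default and only executes them with `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch of reproduction steps 2 to 4.
# Dry run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Wrapper around occ inside the container.
# ASSUMPTION: Compose service "app", occ run as user www-data.
occ() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "docker compose exec -u www-data app php occ $*"
    else
        docker compose exec -u www-data app php occ "$@"
    fi
}

reproduce() {
    # Step 2: register the copied files in the file cache.
    # ASSUMPTION: --all; the original scan may have targeted a single user.
    occ files:scan --all
    # Step 3: enable the full-text search apps and the Tesseract OCR app.
    occ app:enable fulltextsearch files_fulltextsearch \
        fulltextsearch_elasticsearch files_fulltextsearch_tesseract
    # Step 4: build the index.
    occ fulltextsearch:index
}

reproduce
```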

Log entries

Nextcloud

Please provide the log entries from your Nextcloud log that are generated during the time of problem (via the Copy raw option from Administration settings->Logging screen or from your nextcloud.log located in your data directory). Feel free to use a pastebin/gist service if necessary.

No log entries related to fulltextsearch have appeared since the installation.

Web Browser

If the problem is related to the Web interface, open your browser inspector Console and Network tabs while refreshing (reloading) and reproducing the problem. Provide any relevant output/errors here that appear.

Not related to the browser; the search was tested with several browsers on several operating systems.

Web server / Reverse Proxy

The output of your Apache/nginx/system log in /var/log/____:

No relevant entries

Configuration

Nextcloud

The output of occ config:list system or similar is best, but, if not possible, the contents of your config.php file from /path/to/nextcloud is fine (make sure to remove any identifiable information!):

{
    "system": {
        "memcache.local": "\\OC\\Memcache\\APCu",
        "apps_paths": [
            {
                "path": "\/var\/www\/html\/apps",
                "url": "\/apps",
                "writable": false
            },
            {
                "path": "\/var\/www\/html\/custom_apps",
                "url": "\/custom_apps",
                "writable": true
            }
        ],
        "memcache.distributed": "\\OC\\Memcache\\Redis",
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "redis": {
            "host": "***REMOVED SENSITIVE VALUE***",
            "password": "***REMOVED SENSITIVE VALUE***",
            "port": 6379
        },
        "upgrade.disable-web": true,
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "nc..."
        ],
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "dbtype": "mysql",
        "version": "32.0.6.1",
        "overwrite.cli.url": "https:\/\/nc...",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "memories.exiftool": "\/var\/www\/html\/custom_apps\/memories\/bin-ext\/exiftool-amd64-glibc",
        "memories.vod.path": "\/var\/www\/html\/custom_apps\/memories\/bin-ext\/go-vod-amd64",
        "memories.vod.ffmpeg": "\/usr\/bin\/ffmpeg",
        "memories.vod.ffprobe": "\/usr\/bin\/ffprobe",
        "enabledPreviewProviders": [
            "OC\\Preview\\Image",
            "OC\\Preview\\HEIC",
            "OC\\Preview\\TIFF",
            "OC\\Preview\\Movie"
        ],
        "maintenance": false,
        "memories.vod.disable": false,
        "memories.vod.external": true,
        "memories.vod.connect": "go-vod:47788",
        "memories.vod.vaapi": true,
        "preview_max_memory": 512,
        "preview_max_filesize_image": 100,
        "memories.db.triggers.fcu": true,
        "memories.gis_type": 1,
        "app_install_overwrite": [
            "facerecognition",
            "maps"
        ],
        "onlyoffice": {
            "DocumentServerUrl": "\/ds-vpath\/",
            "DocumentServerInternalUrl": "http:\/\/onlyoffice-document-server\/",
            "StorageUrl": "https:\/\/nc....\/",
            "jwt_secret": "***REMOVED SENSITIVE VALUE***"
        },
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpmode": "smtp",
        "mail_sendmailmode": "smtp",
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpauth": true,
        "mail_smtpsecure": "ssl",
        "mail_smtpport": "465",
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***"
    }
}

Apps

The output of occ app:list (if possible).

Enabled:
  - activity: 5.0.0
  - app_api: 32.0.0
  - bookmarks: 16.1.3
  - bruteforcesettings: 5.0.0
  - calendar: 6.2.0
  - circles: 32.0.0
  - cloud_federation_api: 1.16.0
  - comments: 1.22.0
  - contacts: 8.3.3
  - contactsinteraction: 1.13.1
  - dashboard: 7.12.0
  - dav: 1.34.2
  - deck: 1.16.3
  - federatedfilesharing: 1.22.0
  - federation: 1.22.0
  - files: 2.4.0
  - files_downloadlimit: 5.0.0-dev.0
  - files_external: 1.24.1
  - files_fulltextsearch: 32.0.2
  - files_fulltextsearch_tesseract: 32.0.0
  - files_pdfviewer: 5.0.0
  - files_reminders: 1.5.0
  - files_sharing: 1.24.1
  - files_trashbin: 1.22.0
  - files_versions: 1.25.0
  - firstrunwizard: 5.0.0
  - fulltextsearch: 32.0.0
  - fulltextsearch_elasticsearch: 32.0.2
  - keeweb: 0.6.22
  - logreader: 5.0.0
  - lookup_server_connector: 1.20.0
  - mail: 5.7.1
  - maps: 1.6.0
  - memories: 7.8.2
  - nextcloud_announcements: 4.0.0
  - notes: 4.13.0
  - notifications: 5.0.0
  - notify_push: 1.3.0
  - oauth2: 1.20.0
  - onlyoffice: 9.13.0
  - password_policy: 4.0.0
  - photos: 5.0.0
  - polls: 8.6.3
  - previewgenerator: 5.13.0
  - privacy: 4.0.0
  - profile: 1.1.0
  - provisioning_api: 1.22.0
  - recognize: 10.0.7
  - recommendations: 5.0.0
  - related_resources: 3.0.0
  - richdocuments: 9.0.3
  - serverinfo: 4.0.0
  - settings: 1.15.1
  - sharebymail: 1.22.0
  - sociallogin: 6.3.1
  - spreed: 22.0.9
  - support: 4.0.0
  - survey_client: 4.0.0
  - systemtags: 1.22.0
  - tasks: 0.17.1
  - text: 6.0.1
  - theming: 2.7.0
  - twofactor_backupcodes: 1.21.0
  - updatenotification: 1.22.0
  - user_status: 1.12.0
  - viewer: 5.0.0
  - weather_status: 1.12.0
  - webhook_listeners: 1.3.0
  - whiteboard: 1.5.6
  - workflowengine: 2.14.0
Disabled:
  - admin_audit: 1.22.0
  - encryption: 2.20.0
  - facerecognition: 0.9.70 (installed 0.9.70)
  - richdocumentscode: 25.4.901 (installed 25.4.901)
  - suspicious_login: 10.0.0
  - twofactor_nextcloud_notification: 6.0.0
  - twofactor_totp: 14.0.0
  - user_ldap: 1.23.0
