OCR only working with embedded pictures

  • Tesseract installed
  • Folder contains 2 files:
  1. A Word file with the picture attached above embedded in it
  2. The attached picture itself, as a separate JPG

  • A search for “2017” finds only the Word file that contains the picture, not the picture itself.

File names:

  • Test OCR nextant.docx
  • Image in Word file.jpg

Any idea?


Server configuration

Operating system: Linux owncloud 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64

Web server: Apache/2.4.18 (Ubuntu) (apache2handler)

Database: mysql 5.7.17

PHP version: 7.0.15-0ubuntu0.16.04.4
Modules loaded: Core, date, libxml, openssl, pcre, zlib, filter, hash, Reflection, SPL, session, standard, apache2handler, mysqlnd, PDO, xml, calendar, ctype, curl, dom, mbstring, fileinfo, ftp, gd, gettext, iconv, imap, intl, json, ldap, exif, mcrypt, mysqli, pdo_mysql, pdo_pgsql, pdo_sqlite, pgsql, Phar, posix, readline, redis, shmop, SimpleXML, smbclient, sockets, sqlite3, sysvmsg, sysvsem, sysvshm, tokenizer, wddx, xmlreader, xmlwriter, xsl, zip, libsmbclient, Zend OPcache

Nextcloud version: 11.0.2 (stable) - 11.0.2.7

Updated from an older Nextcloud/ownCloud or fresh install: Upgrade from OC 9

Where did you install Nextcloud from: tech-and-me VM

List of activated apps:

Enabled:

  • activity: 2.4.1
  • activitylog: 0.0.1
  • admin_notifications: 1.0.0
  • announcementcenter: 3.0.0
  • apporder: 0.3.3
  • audioplayer: 1.5.1
  • bookmarks: 0.9.1
  • calendar: 1.5.2
  • comments: 1.1.0
  • contacts: 1.5.3
  • dav: 1.1.1
  • external: 1.2
  • federatedfilesharing: 1.1.1
  • federation: 1.1.1
  • files: 1.6.1
  • files_accesscontrol: 1.1.2
  • files_automatedtagging: 1.1.1
  • files_downloadactivity: 1.0.1
  • files_external: 1.1.2
  • files_markdown: 1.0.1
  • files_pdfviewer: 1.0.1
  • files_retention: 1.0.1
  • files_sharing: 1.1.1
  • files_texteditor: 2.2
  • files_trashbin: 1.1.0
  • files_versions: 1.4.0
  • files_videoplayer: 1.0.0
  • firstrunwizard: 2.0
  • gallery: 16.0.0
  • issuetemplate: 0.2.1
  • keeweb: 0.3.1
  • logreader: 2.0.0
  • lookup_server_connector: 1.0.0
  • nextant: 1.0.6
  • nextcloud_announcements: 1.0
  • notes: 2.2.0
  • notifications: 1.0.1
  • passman: 2.1.1
  • password_policy: 1.1.0
  • previewgenerator: 1.0.5
  • provisioning_api: 1.1.0
  • qownnotesapi: 17.3.0
  • rainloop: 4.28.1
  • serverinfo: 1.1.1
  • sharebymail: 1.0.1
  • spreed: 1.2.0
  • spreedme: 0.3.8
  • survey_client: 0.1.5
  • systemtags: 1.1.3
  • tasks: 0.9.5
  • theming: 1.1.1
  • twofactor_backupcodes: 1.0.0
  • twofactor_totp: 1.1.0
  • twofactor_u2f: 1.2.0
  • updatenotification: 1.1.1
  • weather: 1.3.5
  • workflowengine: 1.1.1

Disabled:

  • admin_audit
  • deck
  • encryption
  • files_opds
  • files_reader
  • gluusso
  • gpxedit
  • gpxpod
  • mail
  • news
  • registration
  • templateeditor
  • user_external
  • user_ldap
  • user_saml

Soooooorryyyyyyyy … my fault! :kissing_smiling_eyes:

I forgot to include JPG files under “Edit your filters”. Now that’s set, indexing is running, and the tesseract process is under high CPU load! :slight_smile: :+1:

I was happy too soon: the search still finds no JPGs. :frowning:
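One way to rule out Tesseract itself is to run it directly on the JPG, outside Nextant. Below is a minimal Python sketch, assuming the `tesseract` binary is installed and on the PATH and that the image is in the current directory. The helper names `ocr_image` and `contains_term` are hypothetical and not part of Nextant or Nextcloud. Passing `stdout` as the output base makes the tesseract CLI print the recognized text instead of writing a `.txt` file.

```python
import shutil
import subprocess


def ocr_image(path: str) -> str:
    """Run the tesseract CLI on an image and return the recognized text.

    Hypothetical helper. Assumes the `tesseract` binary is on PATH.
    The argument list is passed without a shell, so file names with
    spaces (like "Image in Word file.jpg") need no extra quoting.
    """
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract binary not found on PATH")
    result = subprocess.run(
        ["tesseract", path, "stdout"],  # "stdout" = print text instead of writing a file
        check=True,                     # raise CalledProcessError on a non-zero exit code
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,        # decode output as text (works on Python 3.5+)
    )
    return result.stdout


def contains_term(text: str, term: str) -> bool:
    """Case-insensitive check whether the search term occurs in the OCR output."""
    return term.lower() in text.lower()
```

For example, `contains_term(ocr_image("Image in Word file.jpg"), "2017")` returning `True` would suggest that Tesseract can read the year from the image, so the problem is more likely on the indexing or search side than in OCR itself.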