OCR only working with embedded pictures

  • Tesseract installed
  • Folder contains 2 files:
  1. A Word file with the picture attached above embedded in it
  2. The attached picture itself

  • A search for “2017” finds only the Word file that contains the picture, not the picture itself.

File names:

  • Test OCR nextant.docx
  • Image in Word file.jpg

Any ideas?

Server configuration

Operating system: Linux owncloud 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64

Web server: Apache/2.4.18 (Ubuntu) (apache2handler)

Database: mysql 5.7.17

PHP version: 7.0.15-0ubuntu0.16.04.4
Modules loaded: Core, date, libxml, openssl, pcre, zlib, filter, hash, Reflection, SPL, session, standard, apache2handler, mysqlnd, PDO, xml, calendar, ctype, curl, dom, mbstring, fileinfo, ftp, gd, gettext, iconv, imap, intl, json, ldap, exif, mcrypt, mysqli, pdo_mysql, pdo_pgsql, pdo_sqlite, pgsql, Phar, posix, readline, redis, shmop, SimpleXML, smbclient, sockets, sqlite3, sysvmsg, sysvsem, sysvshm, tokenizer, wddx, xmlreader, xmlwriter, xsl, zip, libsmbclient, Zend OPcache

Nextcloud version: 11.0.2 (stable)

Updated from an older Nextcloud/ownCloud or fresh install: Upgraded from ownCloud 9

Where did you install Nextcloud from: tech-and-me VM

List of activated apps:

Soooooorryyyyyyyy … my fault! :kissing_smiling_eyes:

I forgot to include JPG files under “Edit your filters”. Now that it’s set, indexing is running and the tesseract process is under high CPU load! :slight_smile: :+1:

I was happy too soon: the search still doesn’t find any JPGs. :frowning: