Tesseract on Nextcloud AIO docker

Without the quotes (") I get:
Unable to find image 'tesseract-ocr:latest' locally
docker: Error response from daemon: pull access denied for tesseract-ocr, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

Has anybody installed OCR on Nextcloud with Docker?

setup full text search for Nextcloud on docker (fariszr.com)

@szaimen Setting up Nextcloud full-text search isn’t very clear, especially with Docker, so in this post I’ll explain how to set up full-text search with Elasticsearch and Tesseract using Nextcloud’s official Docker images.

Looks fine. Was it added to the nextcloud container after restarting the containers from the AIO interface? If not, please post the nextcloud-aio-nextcloud container logs here. You can retrieve them with sudo docker logs nextcloud-aio-nextcloud


2023-11-01 08:59:47.618105+00
(1 row)

+ '[' -f /dev-dri-group-was-added ']'
++ find /dev -maxdepth 1 -mindepth 1 -name dri
+ '[' -n '' ']'
+ set +x
Installing ocrmypdf via apk…
Installing tesseract-ocr via apk…
Installing tesseract-ocr-eng via apk…
ERROR: unable to select packages:
tesseract-ocr-eng (no such package):
required by: world[tesseract-ocr-eng]
The packet tesseract-ocr-eng was not installed!
Enabling Imagick…
WARNING: opening from cache https://dl-cdn.alpinelinux.org/alpine/v3.18/main: No such file or directory
WARNING: opening from cache https://dl-cdn.alpinelinux.org/alpine/v3.18/community: No such file or directory
Configuring Redis as session handler…
Setting php max children…
Applying one-click-instance settings…
System config value one-click-instance set to boolean true
System config value one-click-instance.user-limit set to integer 100
System config value one-click-instance.link set to string All-in-one - Nextcloud
support already enabled
Adjusting log files…
System config value upgrade.cli-upgrade-link set to string What can I do if Nextcloud shows `Update needed`? · nextcloud/all-in-one · Discussion #2726 · GitHub

If I search for text in a photo, nothing is found.

You found the problem then

Also see https://pkgs.alpinelinux.org/packages?name=*tesseract*&branch=v3.18&repo=&arch=x86_64&maintainer=
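The rejected package names can be pulled out of the container log mechanically. A minimal sketch, assuming the apk error lines look like the log excerpt above; missing_apks is a hypothetical helper name, not something AIO ships:

```shell
#!/bin/sh
# Hypothetical helper: read a container log on stdin and print every
# package name that apk rejected with "(no such package)".
missing_apks() {
    sed -n 's/^[[:space:]]*\([^[:space:]]*\) (no such package):.*$/\1/p'
}

# Usage against the container named in this thread (needs Docker):
#   sudo docker logs nextcloud-aio-nextcloud 2>&1 | missing_apks
```

Each name it prints can then be looked up in the Alpine package search linked above.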

Well, I'll stop with it. The commands are not working or not complete, etc.

It would work if you added the correct package.

command: sh -c "apt update && apt-get install -y --no-install-recommends tesseract-ocr tesseract-ocr-eng tesseract-ocr-nld && mkdir -p /var/log/supervisord && mkdir -p /var/run/supervisord supervisor && supervisord -c /supervisord.conf"

tesseract-ocr is already the newest version (4.1.1-2.1build1).
tesseract-ocr-eng is already the newest version (1:4.00~git30-7274cfa-1.1).

So the package name is not the problem.

It takes too much time…

The correct package name in Alpine is tesseract-ocr-data-eng, not tesseract-ocr-eng, which you would have noticed if you had visited the link. Adding that to the variable should install it automatically.
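The naming difference between the apt-based image and the Alpine-based AIO container can be sketched as a small translation helper. This is only an illustration: to_alpine_pkg is a hypothetical name, and the assumption that every language pack follows the tesseract-ocr-data-LANG pattern is confirmed in this thread only for eng, so other results should be checked against the Alpine package search first.

```shell
#!/bin/sh
# Hypothetical helper: translate a Debian/Ubuntu-style Tesseract language
# package name into the Alpine naming scheme used inside the AIO container.
# Assumption: Alpine language packs are named tesseract-ocr-data-LANG.
to_alpine_pkg() {
    case "$1" in
        # Already an Alpine-style name: pass through unchanged.
        tesseract-ocr-data-*) printf '%s\n' "$1" ;;
        # Debian-style language pack: insert "data-" before the language code.
        tesseract-ocr-*) printf 'tesseract-ocr-data-%s\n' "${1#tesseract-ocr-}" ;;
        # Anything else (tesseract-ocr itself, ocrmypdf, ...) is left alone.
        *) printf '%s\n' "$1" ;;
    esac
}

# Example: to_alpine_pkg tesseract-ocr-eng
```

Meta-packages such as tesseract-ocr-all are not handled sensibly by this mapping, so those need to be replaced by hand with the individual language packs.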

I have tried that before:

E: Unable to locate package tesseract-ocr-data-eng

So…

sudo docker run --sig-proxy=false --name nextcloud-aio-mastercontainer --restart always --publish 80:80 --publish 8080:8080 --publish 8443:8443 --volume nextcloud_aio_mastercontainer:/mnt/docker-aio-config --volume /var/run/docker.sock:/var/run/docker.sock:ro --env NEXTCLOUD_ADDITIONAL_APKS="ocrmypdf tesseract-ocr-data-eng" nextcloud/all-in-one:latest

The log output is THIS:

Installing ocrmypdf via apk…
Installing tesseract-ocr-all via apk…
ERROR: unable to select packages:
tesseract-ocr-all (no such package):
required by: world[tesseract-ocr-all]
The packet tesseract-ocr-all was not installed!
Enabling Imagick…
WARNING: opening from cache https://dl-cdn.alpinelinux.org/alpine/v3.18/main: No such file or directory
WARNING: opening from cache https://dl-cdn.alpinelinux.org/alpine/v3.18/community: No such file or directory

Did you already restart the containers from the AIO interface?

Yes, but no OCR of images.
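Before looking at the OCR app itself, it may help to confirm that Tesseract inside the container actually sees the language data. A rough sketch, assuming tesseract is on the PATH in the nextcloud-aio-nextcloud container and that its language listing prints one code per line; has_tess_lang is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the language code given as $1 appears as
# a full line in the output of `tesseract --list-langs` read from stdin.
has_tess_lang() {
    grep -qxF -- "$1"
}

# Usage against the running container (needs Docker):
#   sudo docker exec nextcloud-aio-nextcloud tesseract --list-langs 2>&1 \
#       | has_tess_lang eng && echo "eng is available"
```

If the language is missing here, the package was not installed and the container log is the place to look; if it is present, the problem lies further along, for example in the workflow_ocr configuration.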

It seems that you need to install LibreOffice packages as well.

Conclusion: it's a mess. Nextcloud has its own office, while the apps work with others.

If you want to use OCR and SMB, then build a fresh Nextcloud install. Stay away from Docker or snap. It will not work.

You could simply add libreoffice as an additional package to the variable if that is needed.

It's not simple… there is too much "try this, maybe that", wrong version, not today, maybe when it's raining…

Let it be…

{"reqId":"kMR0nEL4laaAxgTSVMSL","level":3,"time":"2023-11-02T14:18:35+00:00","remoteAddr":"","user":"--","app":"workflow_ocr","method":"","url":"--","message":"OCRmyPDF did not produce any output for file /admin/files/Documents/20211204_104657.jpg","userAgent":"--","version":"27.1.3.2","exception":{"Exception":"OCA\\WorkflowOcr\\Exception\\OcrResultEmptyException","Message":"OCRmyPDF did not produce any output for file /admin/files/Documents/20211204_104657.jpg","Code":0,"Trace":[{"file":"/var/www/html/custom_apps/workflow_ocr/lib/Service/OcrService.php","line":117,"function":"ocrFile","class":"OCA\\WorkflowOcr\\OcrProcessors\\OcrMyPdfBasedProcessor","type":"->","args":[["OC\\Files\\Node\\File"],["OCA\\WorkflowOcr\\Model\\WorkflowSettings"],["OCA\\WorkflowOcr\\Model\\GlobalSettings",""]]},{"file":"/var/www/html/custom_apps/workflow_ocr/lib/BackgroundJobs/ProcessFileJob.php","line":67,"function":"runOcrProcess","class":"OCA\\WorkflowOcr\\Service\\OcrService","type":"->","args":[14254,"admin",["OCA\\WorkflowOcr\\Model\\WorkflowSettings"]]},{"file":"/var/www/html/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OCA\\WorkflowOcr\\BackgroundJobs\\ProcessFileJob","type":"->","args":[["admin",14254,"{\"tagsToAddAfterOcr\":[10],\"ocrMode\":2}"]]},{"file":"/var/www/html/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->","args":[["OC\\BackgroundJob\\JobList"]]},{"file":"/var/www/html/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->","args":[["OC\\BackgroundJob\\JobList"]]},{"file":"/var/www/html/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->","args":[["OC\\BackgroundJob\\JobList"],["OC\\Log"]]}],"File":"/var/www/html/custom_apps/workflow_ocr/lib/OcrProcessors/OcrMyPdfBasedProcessor.php","Line":90,"message":"OCRmyPDF did not produce any output for file /admin/files/Documents/20211204_104657.jpg","exception":[],"CustomMessage":"OCRmyPDF did not produce any output for file /admin/files/Documents/20211204_104657.jpg"},"id":"6543b0db76f03"}