Tesseract on Nextcloud AIO docker

I am running Nextcloud AIO v9.1.0 with Nextcloud Hub 8 (29.0.3) on AWS (after fiddling around a bit).
As we have a lot of scanned PDF files to process, a functioning OCR solution is a requirement.
I was a bit surprised to learn that the provided fulltextsearch container comes without OCR capabilities.
Any hints (ideally a recipe to install it) would be appreciated.

My script for now is:

# Stop and remove ALL running containers on this host
# (docker ps -q prints only the container IDs)
aio=$(sudo docker ps -q)
sudo docker stop $aio
sudo docker rm $aio

sudo docker pull linuxserver/libreoffice
sudo docker pull jbarlow83/ocrmypdf-alpine

echo "Will start AIO master docker now"
sudo docker run \
--init \
--sig-proxy=false \
--name nextcloud-aio-mastercontainer \
--restart always \
--publish 80:80 \
--publish 8080:8080 \
--publish 8443:8443 \
--volume nextcloud_aio_mastercontainer:/mnt/docker-aio-config \
--volume /var/run/docker.sock:/var/run/docker.sock:ro \
--env NEXTCLOUD_ADDITIONAL_APKS="libreoffice ocrmypdf" \
-d \
nextcloud/all-in-one:latest

echo "Go to the AIO console and start the containers"

Next step, until I find a way to add ocrmypdf to the PATH: open a shell in the Nextcloud container

sudo docker exec -it <nextcloud container> bash

and change "ocrmypdf" to "/usr/bin/ocrmypdf" in the workflow_ocr processor:

vi custom_apps/workflow_ocr/lib/OcrProcessors/OcrMyPdfBasedProcessor.php
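Before patching the PHP file it may be worth checking whether /usr/bin really is missing from the PATH inside the container. The following is only a sketch: path_contains is a hypothetical helper, and the container name nextcloud-aio-nextcloud and the user www-data are assumptions about a default AIO setup. Note also that the PATH seen by the PHP process may differ from the one an exec'd shell reports.

```shell
#!/bin/sh
# path_contains DIR PATHSTRING: succeed if DIR is one of the
# colon-separated entries of PATHSTRING (hypothetical helper).
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage against the running container (name and user are assumptions):
#   p=$(sudo docker exec --user www-data nextcloud-aio-nextcloud sh -c 'echo "$PATH"')
#   if path_contains /usr/bin "$p"; then echo "/usr/bin is on PATH"; else echo "/usr/bin is missing"; fi
#   sudo docker exec --user www-data nextcloud-aio-nextcloud sh -c 'command -v ocrmypdf'
```

If command -v prints /usr/bin/ocrmypdf, the binary is found through the PATH of that shell and the edit in the PHP file may not be needed.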

Still missing: how to add languages.

The OCRmyPDF README on GitHub (ocrmypdf/OCRmyPDF: OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched) mentions:

By default the Docker image includes English, German, Simplified Chinese, French, Portuguese and Spanish, the most popular languages for OCRmyPDF users based on feedback. You may add other languages by creating a new Dockerfile based on the public one.

So I do not know why this happens, or what I should do to avoid the following message:

OCRmyPDF succeeded with warning(s): OCR engine does not have language data for the following requested languages: eng
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions: Installing additional language packs (ocrmypdf 16.4.2.dev2+gd544342 documentation)
Note: most languages are identified by a 3-letter ISO 639-2 Code. For example, English is 'eng', German is 'deu', and Spanish is 'spa'. Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.

I cannot select a language in the flow app at this point.
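To see which language data Tesseract actually has inside the container, its --list-langs option can be compared against the languages you need. This is a rough sketch: missing_langs is a hypothetical helper, the container name nextcloud-aio-nextcloud is an assumption, and the grep simply drops the "List of available languages" header line that Tesseract prints first.

```shell
#!/bin/sh
# missing_langs "WANTED CODES" "INSTALLED CODES": print each wanted
# language code that is not in the installed list, one per line.
missing_langs() {
    for lang in $1; do
        found=no
        for have in $2; do
            if [ "$lang" = "$have" ]; then found=yes; fi
        done
        if [ "$found" = no ]; then echo "$lang"; fi
    done
}

# Usage against the running container (container name is an assumption):
#   installed=$(sudo docker exec nextcloud-aio-nextcloud tesseract --list-langs 2>&1 | grep -v '^List of')
#   missing_langs "eng deu fra" "$installed"
```

An empty result means every requested language is present; any code it prints still needs its language data package.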

After adding the language packs

--env NEXTCLOUD_ADDITIONAL_APKS="libreoffice ocrmypdf tesseract-ocr-data-eng tesseract-ocr-data-deu tesseract-ocr-data-fra" \

I can select languages in the flow app, but now get another error:

cURL error 7: Failed to connect to xxx.xxx.xx port 443 after 1 ms: Couldn’t connect to server (see libcurl - Error Codes) for https://xxx.xxx.xx/hosting/capabilities

Failed to fetch capabilities: cURL error 7: Failed to connect to aws.chricar.at port 443 after 1 ms: Couldn’t connect to server (see libcurl - Error Codes) for https://xxx.xxx.xx/hosting/capabilities

It seems to me that ocrmypdf is not fully "integrated" in the NC container.
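The failing request can be reproduced outside of Nextcloud to see whether the host is reachable on port 443 at all. The /hosting/capabilities path looks like the endpoint of the office (Collabora) integration rather than something ocrmypdf calls, so the two problems may be unrelated. A minimal sketch, assuming curl is installed where you run it; both function names are hypothetical and the host is a placeholder:

```shell
#!/bin/sh
# capabilities_url HOST: build the URL from the error message for HOST.
capabilities_url() {
    echo "https://$1/hosting/capabilities"
}

# probe_capabilities HOST: print the HTTP status code of the request;
# curl reports connection errors on stderr (-sS keeps them visible).
probe_capabilities() {
    curl -sS --max-time 10 -o /dev/null -w '%{http_code}\n' "$(capabilities_url "$1")"
}

# Usage (replace the placeholder host with your own domain):
#   probe_capabilities xxx.xxx.xx
```

Running the same curl command through sudo docker exec in the Nextcloud container (again assuming curl exists there) shows whether the container itself can reach the domain, which is where the error above originates.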


AIO works for me now. To be honest, I do not know why OCR works now; probably a container restart helped.

sudo docker run \
--init \
--sig-proxy=false \
--name nextcloud-aio-mastercontainer \
--restart always \
--publish 80:80 \
--publish 8080:8080 \
--publish 8443:8443 \
--volume nextcloud_aio_mastercontainer:/mnt/docker-aio-config \
--volume /var/run/docker.sock:/var/run/docker.sock:ro \
--env NEXTCLOUD_ADDITIONAL_APKS="libreoffice ocrmypdf tesseract-ocr-data-eng tesseract-ocr-data-deu tesseract-ocr-data-fra" \
nextcloud/all-in-one:latest
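To keep the package list and the OCR language selection in sync, both can be derived from one list of language codes. This is a sketch under assumptions: apk_list and ocr_langs are hypothetical helpers, the tesseract-ocr-data-<code> naming is taken from the three packages used above and is assumed to hold for other languages, and the container name and file paths in the usage lines are placeholders.

```shell
#!/bin/sh
# apk_list CODE...: print the base packages plus one
# tesseract-ocr-data-<code> package per language code.
apk_list() {
    list="libreoffice ocrmypdf"
    for code in "$@"; do
        list="$list tesseract-ocr-data-$code"
    done
    echo "$list"
}

# ocr_langs CODE...: join the codes with '+', the form that
# ocrmypdf's -l option accepts for multiple languages.
ocr_langs() {
    joined=""
    for code in "$@"; do
        joined="${joined:+$joined+}$code"
    done
    echo "$joined"
}

# Usage (container name and PDF paths are assumptions):
#   --env NEXTCLOUD_ADDITIONAL_APKS="$(apk_list eng deu fra)" \
#   sudo docker exec --user www-data nextcloud-aio-nextcloud \
#       ocrmypdf -l "$(ocr_langs eng deu fra)" /tmp/scan.pdf /tmp/scan-ocr.pdf
```

Running ocrmypdf by hand on one scanned file is a quick way to confirm that the language data is found, independent of the flow app.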