I am running Nextcloud AIO v9.1.0 with Nextcloud Hub 8 (29.0.3) on AWS (after fiddling around a bit).
Since we have a lot of scanned PDF files to process, a functioning OCR solution is a requirement.
I was a bit surprised to learn that the provided fulltextsearch container comes without OCR capabilities.
Any hints (ideally an install recipe) would be appreciated.
My script for now is:
aio=$(sudo docker ps -q)   # IDs of all running containers
sudo docker stop $aio
sudo docker rm $aio
sudo docker pull linuxserver/libreoffice
sudo docker pull jbarlow83/ocrmypdf-alpine
echo "Will start AIO master docker now"
sudo docker run \
--init \
--sig-proxy=false \
--name nextcloud-aio-mastercontainer \
--restart always \
--publish 80:80 \
--publish 8080:8080 \
--publish 8443:8443 \
--volume nextcloud_aio_mastercontainer:/mnt/docker-aio-config \
--volume /var/run/docker.sock:/var/run/docker.sock:ro \
--env NEXTCLOUD_ADDITIONAL_APKS="libreoffice ocrmypdf" \
-d \
nextcloud/all-in-one:latest
echo "Go to the AIO console and start the containers"
Next step, until I find a way to add ocrmypdf to the PATH: open a shell in the Nextcloud container
sudo docker exec -it <nextcloud container> bash
and change "ocrmypdf" to "/usr/bin/ocrmypdf" in the OCR processor:
vi custom_apps/workflow_ocr/lib/OcrProcessors/OcrMyPdfBasedProcessor.php
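Before patching the PHP file it may help to confirm what the container actually sees. The following is only a sketch: the function name check_ocr_tools is made up for illustration, and the container name is whatever "sudo docker ps" shows for the Nextcloud container on your system.

```shell
# Sketch: show where ocrmypdf is installed inside a container and which
# Tesseract language packs are available there.
# $1 = container name (hypothetical argument, take it from "sudo docker ps").
check_ocr_tools() {
    container="$1"
    # "command -v" prints the full path of ocrmypdf if it is on the PATH;
    # "tesseract --list-langs" lists the installed language data.
    sudo docker exec "$container" sh -c 'command -v ocrmypdf; tesseract --list-langs'
}
```

Usage, assuming the container is called nextcloud-aio-nextcloud (an assumption, please check your own setup): check_ocr_tools nextcloud-aio-nextcloud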
Still missing: how to add languages.
The OCRmyPDF Docker documentation mentions:
By default the Docker image includes English, German, Simplified Chinese, French, Portuguese and Spanish, the most popular languages for OCRmyPDF users based on feedback. You may add other languages by creating a new Dockerfile based on the public one.
So I do not know why I get the following message, or what I should do to avoid it:
OCRmyPDF succeeded with warning(s): OCR engine does not have language data for the following requested languages: eng Please install the appropriate language data for your OCR engine. See the online documentation for instructions: Installing additional language packs — ocrmypdf 16.4.2.dev2+gd544342 documentation Note: most languages are identified by a 3-letter ISO 639-2 Code. For example, English is ‘eng’, German is ‘deu’, and Spanish is ‘spa’. Simplified Chinese is ‘chi_sim’ and Traditional Chinese is ‘chi_tra’.
I cannot select a language in the flow app either.
After adding the language packs
--env NEXTCLOUD_ADDITIONAL_APKS="libreoffice ocrmypdf tesseract-ocr-data-eng tesseract-ocr-data-deu tesseract-ocr-data-fra" \
I can select languages in the flow app, but now get another error:
cURL error 7: Failed to connect to xxx.xxx.xx port 443 after 1 ms: Couldn’t connect to server (see libcurl - Error Codes) for https://xxx.xxx.xx/hosting/capabilities
Failed to fetch capabilities: cURL error 7: Failed to connect to aws.chricar.at port 443 after 1 ms: Couldn’t connect to server (see libcurl - Error Codes) for https://xxx.xxx.xx/hosting/capabilities
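The message itself describes a failed HTTPS connection to port 443, so one way to narrow it down is to repeat the same request by hand. This is a rough sketch only: probe_capabilities is a made-up helper name, and it assumes the curl command-line tool is available wherever you run it.

```shell
# Sketch: request https://<host>/hosting/capabilities and print the HTTP
# status code, or curl's own error message if no connection can be made.
# $1 = host name exactly as it appears in the error message.
probe_capabilities() {
    host="$1"
    curl --silent --show-error --output /dev/null \
         --max-time 10 --write-out '%{http_code}\n' \
         "https://$host/hosting/capabilities"
}
```

Running it once on the Docker host and once inside the Nextcloud container (the error is raised from there) should show whether the problem is specific to the container.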
It seems to me that ocrmypdf is not fully "integrated" in the NC container.
AIO works for me now. To be honest, I do not know why OCR works now; probably some container restart helped. My current command is:
sudo docker run \
--init \
--sig-proxy=false \
--name nextcloud-aio-mastercontainer \
--restart always \
--publish 80:80 \
--publish 8080:8080 \
--publish 8443:8443 \
--volume nextcloud_aio_mastercontainer:/mnt/docker-aio-config \
--volume /var/run/docker.sock:/var/run/docker.sock:ro \
--env NEXTCLOUD_ADDITIONAL_APKS="libreoffice ocrmypdf tesseract-ocr-data-eng tesseract-ocr-data-deu tesseract-ocr-data-fra" \
nextcloud/all-in-one:latest
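The package list in the command above follows a simple pattern, so it can be assembled from the language codes instead of being typed by hand. A minimal sketch: build_additional_apks is a made-up helper, and the tesseract-ocr-data-<code> naming is only confirmed here for eng, deu and fra; for other codes it is an assumption that should be checked against the Alpine package index.

```shell
# Sketch: build the value for NEXTCLOUD_ADDITIONAL_APKS from a list of
# Tesseract language codes (eng, deu, fra, ...).
build_additional_apks() {
    apks="libreoffice ocrmypdf"
    for lang in "$@"; do
        # Assumed naming pattern: tesseract-ocr-data-<code>
        apks="$apks tesseract-ocr-data-$lang"
    done
    printf '%s\n' "$apks"
}
```

The result can then be passed to the run command as --env NEXTCLOUD_ADDITIONAL_APKS="$(build_additional_apks eng deu fra)".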