Tesseract on Nextcloud AIO docker

Hi,

I can’t get OCR working: photos are not scanned, and the language list is empty.
How do I install this correctly on AIO docker?

Hi,

tesseract-ocr is not included by default due to security issues.

Based on
Think about a solution for adding external dependencies into the Nextcloud container (Issue #1162, nextcloud/all-in-one, GitHub),
WIP: allow to add dependencies into the Nextcloud container permanently by szaimen (Pull Request #1163, nextcloud/all-in-one, GitHub) and
allow to add dependencies and php extensions into the Nextcloud container by szaimen (Pull Request #1377, nextcloud/all-in-one, GitHub),
I would think you should be able to add tesseract-ocr and its language packs to NC-AIO, but I cannot figure out how that would work, especially without breaking it with the next AIO update.

If anyone has figured it out, please leave a note.

Kind regards.

Hi, see GitHub - nextcloud/all-in-one: Nextcloud AIO stands for Nextcloud All-in-One and provides easy deployment and maintenance with most features included in this one Nextcloud instance.

Thanks a lot for pointing out the relevant part of the documentation!

I’m getting

You've set NEXTCLOUD_ADDITIONAL_APKS but not to an allowed value.
It needs to be a string. Allowed are small letters a-z, digits 0-9, spaces, hyphens, dots and '_'.
It is set to '"imagemagick"'.

imagemagick is installed (Version: ImageMagick 6.9.11-60)

I tried it with tesseract-ocr only - same result.

It somehow doesn’t recognize it as a string. I’m using Portainer Stacks (docker compose):


environment:
- APACHE_PORT=8100
- APACHE_IP_BINDING=0.0.0.0
- NEXTCLOUD_DATADIR=…
- NEXTCLOUD_MOUNT=…
- NEXTCLOUD_ADDITIONAL_APKS="imagemagick"

I tried it with both " " and ' '. Any idea what else I could try?
Thanks!

Try - NEXTCLOUD_ADDITIONAL_APKS=imagemagick (without quotes).
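For anyone wondering why the quotes matter: in the list form of a compose environment section, everything after the = is taken literally, so the quote characters become part of the value. The error message above lists the allowed characters, and a quote is not among them. Here is a minimal sketch of such a check. The function name is_allowed_apks is made up for illustration, and the pattern is only my reading of the error message, not necessarily the exact check AIO runs:

```shell
# Hypothetical helper: succeeds only if the value consists of the characters
# named in the error message (a-z, 0-9, spaces, hyphens, dots and '_').
is_allowed_apks() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9 ._-]+$'
}

# The value with literal quote characters is rejected ...
if is_allowed_apks '"imagemagick"'; then echo "allowed"; else echo "rejected"; fi

# ... while the bare value passes.
if is_allowed_apks 'imagemagick'; then echo "allowed"; else echo "rejected"; fi
```

A space-separated list such as ocrmypdf tesseract-ocr tesseract-ocr-eng also passes this check, since spaces and hyphens are allowed.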

What is the full command to install OCR, the languages, etc.? I’ve tried many wizard spells.

sudo docker exec --user www-data -it nextcloud-aio-nextcloud php occ ocrmypdf tesseract-ocr-deu

You need to add it to the variable to add it permanently

“docker exec” is for starting programs that are inside the running container.

Instead of

sudo docker exec --user www-data -it nextcloud-aio-nextcloud php occ ocrmypdf tesseract-ocr-deu

use something like

sudo docker run … --env NEXTCLOUD_ADDITIONAL_APKS="ocrmypdf tesseract-ocr-deu" nextcloud/all-in-one:latest

to start your NC-AIO with the environment variable NEXTCLOUD_ADDITIONAL_APKS. This tells your container which external apps it should be able to use. Don’t forget to install them on your server first.

Thanks again. You’re right, no quotes did the trick.

Must I first install it locally?

Unable to find image 'tesseract-ocr:latest' locally
docker: Error response from daemon: pull access denied for tesseract-ocr, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

No. Adding them to the variable and then restarting the containers from the AIO interface should be enough.

How do you do that?

What’s the command?

See GitHub - nextcloud/all-in-one: Nextcloud AIO stands for Nextcloud All-in-One and provides easy deployment and maintenance with most features included in this one Nextcloud instance.

@szaimen Your documentation is not clear.

~$ sudo docker run --env NEXTCLOUD_ADDITIONAL_APKS=ocrmypdf tesseract-ocr nextcloud/all-in-one:latest
Unable to find image 'tesseract-ocr:latest' locally
docker: Error response from daemon: pull access denied for tesseract-ocr, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
sudo docker exec --env NEXTCLOUD_ADDITIONAL_APKS=ocrmypdf tesseract-ocr nextcloud/all-in-one:latest
Error response from daemon: No such container: tesseract-ocr
ncadmin@mysharebox:~$

It is stated:

If it was started already, you will need to stop the mastercontainer, remove it (no data will be lost) and recreate it using the docker run command that you initially used

And

You can do so by adding --env NEXTCLOUD_ADDITIONAL_APKS="imagemagick dependency2 dependency3" to the docker run command of the mastercontainer (but before the last line nextcloud/all-in-one:latest !

So

First stop all containers, then delete the mastercontainer.
Then run

sudo docker run \
--sig-proxy=false \
--name nextcloud-aio-mastercontainer \
--restart always \
--publish 80:80 \
--publish 8080:8080 \
--publish 8443:8443 \
--volume nextcloud_aio_mastercontainer:/mnt/docker-aio-config \
--volume /var/run/docker.sock:/var/run/docker.sock:ro \
--env NEXTCLOUD_ADDITIONAL_APKS=ocrmypdf tesseract-ocr \
--env NEXTCLOUD_ADDITIONAL_APKS=ocrmypdf tesseract-ocr-eng \
nextcloud/all-in-one:latest

Start the containers.
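The stop-and-remove step for the mastercontainer could be sketched like this, assuming the container name nextcloud-aio-mastercontainer from the run command in this thread. The function name remove_mastercontainer is made up for illustration. The other containers are stopped from the AIO interface, not by this snippet:

```shell
# Hypothetical helper: stop the mastercontainer, then remove it.
# Only the container is removed; the named volume
# nextcloud_aio_mastercontainer is not touched by these two commands.
remove_mastercontainer() {
  name="nextcloud-aio-mastercontainer"
  docker stop "$name" && docker rm "$name"
}

# Usage (prepend sudo to the docker calls if your setup needs it):
# remove_mastercontainer
```

After that, the mastercontainer can be recreated with the docker run command, including the --env line.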

The command should look something like this:

sudo docker run \
--sig-proxy=false \
--name nextcloud-aio-mastercontainer \
--restart always \
--publish 80:80 \
--publish 8080:8080 \
--publish 8443:8443 \
--volume nextcloud_aio_mastercontainer:/mnt/docker-aio-config \
--volume /var/run/docker.sock:/var/run/docker.sock:ro \
--env NEXTCLOUD_ADDITIONAL_APKS="ocrmypdf tesseract-ocr tesseract-ocr-eng" \
nextcloud/all-in-one:latest
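The quotes around the package list matter on the command line because the shell splits unquoted text at spaces. A small sketch to make that visible; show_args is a made-up helper that just prints each argument it receives:

```shell
# Hypothetical helper: print every argument on its own line, in brackets.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Unquoted: tesseract-ocr ends up as a separate argument.
show_args --env NEXTCLOUD_ADDITIONAL_APKS=ocrmypdf tesseract-ocr

# Quoted: the whole package list stays one argument.
show_args --env NEXTCLOUD_ADDITIONAL_APKS="ocrmypdf tesseract-ocr tesseract-ocr-eng"
```

In the unquoted case docker run sees tesseract-ocr as the first non-option argument and treats it as the image name, which matches the "Unable to find image 'tesseract-ocr:latest' locally" error earlier in this thread.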

There is still no language option in the flow; the list is empty.

Did you already restart the containers from the AIO interface after starting the mastercontainer with the fresh command?
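One way to check whether the packages actually arrived after the restart is to ask tesseract inside the Nextcloud container which languages it knows. This is only a sketch: it assumes the container is named nextcloud-aio-nextcloud (as in the docker exec command earlier in this thread) and that the installed package puts a tesseract binary on the PATH. The function name list_ocr_langs is made up:

```shell
# Hypothetical helper: list the languages tesseract can see inside the
# Nextcloud container. Fails if tesseract is not installed there.
list_ocr_langs() {
  docker exec nextcloud-aio-nextcloud tesseract --list-langs
}

# Usage (prepend sudo to the docker call if your setup needs it):
# list_ocr_langs
```

If the command fails or the list lacks the expected language, the packages were probably not installed, which would point to the variable or the restart rather than to the OCR app itself.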

I stopped all containers, deleted the mastercontainer, and ran:

sudo docker run --sig-proxy=false --name nextcloud-aio-mastercontainer --restart always --publish 80:80 --publish 8080:8080 --publish 8443:8443 --volume nextcloud_aio_mastercontainer:/mnt/docker-aio-config --volume /var/run/docker.sock:/var/run/docker.sock:ro --env NEXTCLOUD_ADDITIONAL_APKS="ocrmypdf tesseract-ocr tesseract-ocr-eng" nextcloud/all-in-one:latest

Then this showed up:

Trying to fix docker.sock permissions internally…
Adding internal www-data to group ping
Initial startup of Nextcloud All-in-One complete!
You should be able to open the Nextcloud AIO Interface now on port 8080 of this server!
E.g. https://internal.ip.of.this.server:8080

If your server has port 80 and 8443 open and you point a domain to your server, you can get a valid certificate automatically by opening the Nextcloud AIO Interface via:
https://your-domain-that-points-to-this-server.tld:8443
{"level":"info","ts":1698773046.9023373,"msg":"using provided configuration","config_file":"/Caddyfile","config_adapter":""}
[31-Oct-2023 17:24:06] NOTICE: fpm is running, pid 120
[31-Oct-2023 17:24:06] NOTICE: ready to handle connections
[Tue Oct 31 17:24:07.012384 2023] [mpm_event:notice] [pid 112:tid 140029440207688] AH00489: Apache/2.4.58 (Unix) OpenSSL/3.1.3 configured -- resuming normal operations
[Tue Oct 31 17:24:07.013059 2023] [core:notice] [pid 112:tid 140029440207688] AH00094: Command line: 'httpd -D FOREGROUND'
Pr