[Docker] Broken occ. Failed to connect to database after upgrade

:warning: This issue respects the following points: :warning:

  • This is a bug, not a question or a configuration/webserver/proxy issue.
  • This issue is not already reported on Github (I’ve searched it).
  • Nextcloud Server is up to date. See Maintenance and Release Schedule for supported versions.
  • Nextcloud Server is running on 64bit capable CPU, PHP and OS.
  • I agree to follow Nextcloud’s Code of Conduct.

Bug description

After upgrading my Nextcloud Docker image from version 24 to version 25, the occ command is completely broken and reports that it failed to connect to the database (Postgres in my case). However, according to the logs, Postgres is running fine and no database version upgrade was performed.

Running any command with occ (even occ -h) results in the following error message:
Doctrine\DBAL\Exception: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[08006] [7] timeout expired in /var/www/html/lib/private/DB/Connection.php:139

I upgraded my server by turning on maintenance:mode, then pulling the new Docker image with docker-compose and restarting the container. After the upgrade, I was unable to turn off maintenance mode or use the occ command at all.

Steps to reproduce

  1. sudo docker exec -u www-data -it nextcloud php occ maintenance:mode --on
  2. sudo docker-compose pull
  3. sudo docker-compose up -d
  4. sudo docker exec -u www-data -it nextcloud php occ maintenance:mode --off

Expected behavior

Expected the upgrade to go smoothly and not break occ.

Installation method

Community docker container

Operating system

Debian/Ubuntu

PHP engine version

PHP 8.1

Web server

Apache (supported)

Database engine version

PostgreSQL

Is this bug present after an update or on a fresh install?

Updated to a major version (ex. 22.2.3 to 23.0.1)

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

  • Default user-backend (database)
  • LDAP/ Active Directory
  • SSO - SAML
  • Other

Configuration report

occ is broken.

List of activated Apps

occ is broken. Here is a list of installed apps instead:

drwxr-xr-x 11 www-data www-data 4096 Mar 20 02:57 activity
drwxr-xr-x  6 www-data www-data 4096 Sep 13  2021 admin_audit
drwxr-xr-x  9 www-data www-data 4096 Mar 20 02:57 bruteforcesettings
drwxr-xr-x 12 www-data www-data 4096 Mar 20 02:57 circles
drwxr-xr-x  6 www-data www-data 4096 Mar 20 02:57 cloud_federation_api
drwxr-xr-x  7 www-data www-data 4096 Mar 20 02:57 comments
drwxr-xr-x  6 www-data www-data 4096 Sep 13  2021 contactsinteraction
drwxr-xr-x  8 www-data www-data 4096 May  5  2022 dashboard
drwxr-xr-x  9 www-data www-data 4096 Mar 20 02:57 dav
drwxr-xr-x 10 www-data www-data 4096 Sep 13  2021 encryption
drwxr-xr-x  9 www-data www-data 4096 Oct 26 03:27 federatedfilesharing
drwxr-xr-x 10 www-data www-data 4096 Sep 13  2021 federation
drwxr-xr-x 11 www-data www-data 4096 Mar 20 02:57 files
drwxr-xr-x 11 www-data www-data 4096 Mar 20 02:57 files_external
drwxr-xr-x  8 www-data www-data 4096 Mar 20 02:57 files_pdfviewer
drwxr-xr-x  8 www-data www-data 4096 Mar 20 02:57 files_rightclick
drwxr-xr-x 10 www-data www-data 4096 Mar 20 02:57 files_sharing
drwxr-xr-x  8 www-data www-data 4096 Mar 20 02:57 files_trashbin
drwxr-xr-x  7 www-data www-data 4096 May  5  2022 files_versions
drwxr-xr-x  9 www-data www-data 4096 Mar 20 02:57 firstrunwizard
drwxr-xr-x 10 www-data www-data 4096 Mar 20 02:57 logreader
drwxr-xr-x  6 www-data www-data 4096 Sep 13  2021 lookup_server_connector
drwxr-xr-x  8 www-data www-data 4096 Mar 20 02:57 nextcloud_announcements
drwxr-xr-x  9 www-data www-data 4096 Mar 20 02:57 notifications
drwxr-xr-x  7 www-data www-data 4096 May  5  2022 oauth2
drwxr-xr-x  9 www-data www-data 4096 Mar 20 02:57 password_policy
drwxr-xr-x  9 www-data www-data 4096 Mar 20 02:57 photos
drwxr-xr-x  8 www-data www-data 4096 Mar 20 02:57 privacy
drwxr-xr-x  7 www-data www-data 4096 Sep 13  2021 provisioning_api
drwxr-xr-x  6 www-data www-data 4096 Mar 20 02:57 recommendations
drwxr-xr-x  8 www-data www-data 4096 Mar 20 02:57 related_resources
drwxr-xr-x  9 www-data www-data 4096 Mar 20 02:57 serverinfo
drwxr-xr-x 11 www-data www-data 4096 Sep 13  2021 settings
drwxr-xr-x  8 www-data www-data 4096 Oct 26 03:27 sharebymail
drwxr-xr-x 10 www-data www-data 4096 Mar 20 02:57 support
drwxr-xr-x  9 www-data www-data 4096 Mar 20 02:57 survey_client
drwxr-xr-x 11 www-data www-data 4096 Mar 20 02:57 suspicious_login
drwxr-xr-x 10 www-data www-data 4096 Mar 20 02:57 systemtags
drwxr-xr-x  9 www-data www-data 4096 Mar 20 02:57 text
drwxr-xr-x 10 www-data www-data 4096 Oct 26 03:27 theming
drwxr-xr-x  8 www-data www-data 4096 May  5  2022 twofactor_backupcodes
drwxr-xr-x 11 www-data www-data 4096 Mar 20 02:57 twofactor_totp
drwxr-xr-x  9 www-data www-data 4096 Sep 13  2021 updatenotification
drwxr-xr-x 11 www-data www-data 4096 May  5  2022 user_ldap
drwxr-xr-x  8 www-data www-data 4096 May  5  2022 user_status
drwxr-xr-x  7 www-data www-data 4096 Mar 20 02:57 viewer
drwxr-xr-x  6 www-data www-data 4096 May  5  2022 weather_status
drwxr-xr-x  8 www-data www-data 4096 May  5  2022 workflowengine

Nextcloud Signing status

system stuck in maintenance mode

Nextcloud Logs

Configuring Redis as session handler,
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.20.0.2. Set the 'ServerName' directive globally to suppress this message,
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.20.0.2. Set the 'ServerName' directive globally to suppress this message,
[Wed Mar 22 03:57:31.314287 2023] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.54 (Debian) PHP/8.1.17 configured -- resuming normal operations,
[Wed Mar 22 03:57:31.314322 2023] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND'

Additional info

No response

Sounds like a simple case of the database connection details not being correct or similar. How have you configured your NC’s DB connection settings? Generally speaking that is what you should focus on, and also more specific information is needed to debug this.

I haven’t done any database connection configuration as I use the standard docker image.
The docker containers are on the same network, are configured correctly from a docker-perspective (and were working fine before the update), and from what I can tell no connection attempt is even being made.

What specific information would you need to debug this?

I am not the one to debug it - you are :smiley: We can just guide you in the right direction.

You know that your NC tells you that it times out when trying to connect to the database.

You can do several things, e.g. exec into the NC container and from within that see if you can manually connect to the database server. You can check the connection settings to make sure that they really match what should work. You can check that the database server is really running.
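To make the "check the connection settings" step concrete, here is a minimal sketch. It assumes the Nextcloud config lives in config/config.php below the web root and that the file is readable from wherever you run this (on the host via a bind mount, or inside the container). The helper name show_db_settings and the example path are made up for illustration. As far as I know, the POSTGRES_* environment variables of the Docker image are only applied during the initial installation, so the values stored in config.php are the ones occ actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the database connection keys from a Nextcloud
# config.php, leaving out dbpassword so the output can be shared safely.
show_db_settings() {
    grep -E "'db(type|host|port|name|user)'" "$1"
}

# Example call (the path is an assumption; adjust it to your volume mapping):
#   show_db_settings /path/to/nextcloud/config/config.php
```

If the 'dbhost' value printed here does not match the name of the database service on the Docker network, that mismatch would be the first thing to look at.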

Which image are you using exactly? Are you saying it is an image with both NC and a database in it?

I was using the generic you.
I’ll try out the suggestions, thanks.

I’m just using the standard Nextcloud docker image, with Postgres (Version 13.5 pinned), and Redis.

Cheers. To avoid any misunderstandings, can you please point us to the exact image you use, e.g. the URL on Docker Hub or such?

Here you go mate.

I’m going through the NC docs to figure out what the config should look like right now.

Do you happen to know something about the occ command? Because I don’t quite understand why it wouldn’t even work when running occ -h. Seems like a php issue to me, unless occ always connects to the DB first.

Which exact tag are you using? Can you also please share the contents of your docker-compose.y[a]ml file?

Here is my docker-compose.yml:

version: '3.3'

volumes:
   nextcloud:
   nextcloud-db:

networks:
   nextcloud-net:
        external: true
   nginx-proxy-manager_default:
        external: true

services:

    postgres:
        container_name: postgres
        restart: always
        #command: --transaction-isolation=READ-COMMITTED --binlog-format=ROW
        volumes:
            - '/media/storagedrive/nextcloud-db:/var/postgresql/data'
        environment:
            - POSTGRES_USER=<REDACTED>
            - POSTGRES_PASSWORD=<REDACTED>
        #network_mode: nextcloud-net
        image: postgres:13.5
        networks:
            - nextcloud-net

    redis:
        image: redis:alpine
        networks:
            - nextcloud-net
        container_name: nextcloud-redis
        restart: always

    nextcloud:
        container_name: nextcloud
        restart: always
        ports:
            - 8080:80
        volumes:
            - '/media/storagedrive/nextcloud:/var/www/html'
        environment:
            - POSTGRES_DB=<REDACTED>
            - POSTGRES_USER=<REDACTED>
            - POSTGRES_PASSWORD=<REDACTED>
            - POSTGRES_HOST=<REDACTED>
            - NEXTCLOUD_TRUSTED_DOMAINS= 192.168.0.101:8080 <REDACTED> <REDACTED> <REDACTED>
            - REDIS_HOST=redis
        depends_on:
            - postgres
            - redis
        links:
            - postgres:postgres
        networks:
            - nextcloud-net
            - nginx-proxy-manager_default
        image: nextcloud:25.0.4

What exact tag of the nextcloud image are you using? The compose file asks for effectively :latest, but if you pulled or built the nextcloud image yourself somehow we don’t know what it is.

Anyway, as you can see, your configuration specifies which host NC should connect to in order to reach PG. You should debug why the NC container is seemingly unable to reach the host named in the POSTGRES_HOST environment variable.

Any idea if there’s any logging on that? Because as far as I can tell from the logs, NC isn’t even trying to reach out to the database. There isn’t even a failed or refused connection.

Which log are you looking at when you (don’t) see this?

Not really, I mean you have a message saying that it timed out when trying to connect, I think it’s reasonable to assume that it has then tried to connect.

What you should do is get into the container and see if you can reach the database from in there.

likely you see the most important things on the container STDOUT - use “docker logs” to see it (--tail <N> shows only the last N lines, -f follows the output in real time)

maybe there is some DNS issue and the application simply doesn’t know where to connect… I agree it sounds strange that this worked before and doesn’t work after the upgrade… but you need to troubleshoot step by step if you want to cover all possible issues, rather than rely on trial and error and wild guesses.

Which log are you looking at when you (don’t) see this?

Docker logs, nextcloud.log inside the datafolder, postgres docker logs. No sign anywhere of a failed connection attempt in any way shape or form.

I’ll try and connect to the DB manually today, but I’m new to php and haven’t found how to in the docs yet.

likely you see the most important things on the container STDOUT - use “docker logs” to see it (--tail <N> shows only the last N lines, -f follows the output in real time)

Yeah the docker logs are basically empty, even after trying to run the occ-command.

Yeah it’s strange that it doesn’t work after the upgrade. I tried to look at some other people’s docker-compose configs but mine seems to be the norm, having the PG DB on the same machine and stack as NC and communicating internally without exposing ports or the like.

I’ve seen one or two posts now on stack-exchange or github, where the issue was related to php configuration, namely the php module pgsql missing, which is also the case for me, but I’m not quite sure it’s ever been installed, and it’s not listed as a requirement in the nextcloud documentation.

However, I’m also not able to install packages from inside the container so I can’t really test if that’s the issue.

There are also some people suggesting that it’s a php.ini configuration issue. Now again here I’m not sure what it should be like as the nextcloud-docker configuration seems to be quite different from the documentation.
Here is what the php --ini output is for me:

Configuration File (php.ini) Path: /usr/local/etc/php
Loaded Configuration File:         (none)
Scan for additional .ini files in: /usr/local/etc/php/conf.d
Additional .ini files parsed:      /usr/local/etc/php/conf.d/docker-php-ext-apcu.ini,
/usr/local/etc/php/conf.d/docker-php-ext-bcmath.ini,
/usr/local/etc/php/conf.d/docker-php-ext-exif.ini,
/usr/local/etc/php/conf.d/docker-php-ext-gd.ini,
/usr/local/etc/php/conf.d/docker-php-ext-gmp.ini,
/usr/local/etc/php/conf.d/docker-php-ext-imagick.ini,
/usr/local/etc/php/conf.d/docker-php-ext-intl.ini,
/usr/local/etc/php/conf.d/docker-php-ext-ldap.ini,
/usr/local/etc/php/conf.d/docker-php-ext-memcached.ini,
/usr/local/etc/php/conf.d/docker-php-ext-opcache.ini,
/usr/local/etc/php/conf.d/docker-php-ext-pcntl.ini,
/usr/local/etc/php/conf.d/docker-php-ext-pdo_mysql.ini,
/usr/local/etc/php/conf.d/docker-php-ext-pdo_pgsql.ini,
/usr/local/etc/php/conf.d/docker-php-ext-redis.ini,
/usr/local/etc/php/conf.d/docker-php-ext-sodium.ini,
/usr/local/etc/php/conf.d/docker-php-ext-zip.ini,
/usr/local/etc/php/conf.d/nextcloud.ini,
/usr/local/etc/php/conf.d/opcache-recommended.ini,
/usr/local/etc/php/conf.d/redis-session.ini

a driver is definitely required to connect to the database. the requirement is documented in the docs and installed within docker container

so the driver should be there…

please double and triple check your config - maybe there is some dumb issue like a typo. please verify the DB is up and running - maybe there is an issue there… you don’t have many tools for troubleshooting inside the container… the easiest might be to run something like “docker exec nextcloud curl ${POSTGRES_HOST} -v” which will definitely fail because there is no webserver running, but this will show if the DB is visible in the DNS…

if this doesn’t help start installing troubleshooting tools in the container…
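One caveat about the curl test: curl talks to port 80 by default, while PostgreSQL normally listens on 5432, so the curl call mostly confirms name resolution. A minimal sketch of a direct port check follows. It assumes bash and the coreutils timeout command are available in the container (they should be in a Debian-based image, but I have not verified that for every tag). The function name check_tcp is made up for illustration; /dev/tcp is a bash feature, not a real device file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a TCP connection to HOST:PORT succeeds
# within WAIT seconds (default 5). Uses bash's built-in /dev/tcp redirection.
check_tcp() {
    local host="$1" port="$2" wait="${3:-5}"
    if timeout "$wait" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open"
    else
        echo "unreachable"
        return 1
    fi
}

# Example call from inside the Nextcloud container, assuming the database
# service is named "postgres" and uses the default port:
#   check_tcp postgres 5432
```

If the call fails immediately, nothing is listening on that port. If it only fails after the full wait, packets are usually being dropped somewhere along the way, for example by a firewall rule.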

Sorry for being unclear. pdo_pgsql is installed, but pgsql is not.

DB is up and running. Installing troubleshooting tools inside the container doesn’t seem possible, however, as apt-get can’t reach any Debian domains for some reason. Might be a firewall thing?

I tried the curl command, and I think it can reach the database. Running sudo docker exec nextcloud curl postgres -v doesn’t give any issues and outputs:

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.20.0.3:80...
  0     0    0     0    0     0      0      0 --:--:--  0:00:32 --:--:--     0^C

I’m very thankful for the help. Very kind of you two to help out, but I’ve spent two entire days on this already before making a post here and on GH, and this is not the first dumb issue of this kind that I’ve run into. I don’t seem to be the first to run into an issue like this either, as I’ve seen quite a few similar posts, many of which don’t have any solution.
I’m considering spending the rest of my limited vacation time moving my files out of Nextcloud and into something simpler and a bit more reliable.
Still happy to try things to fix this, maybe structurally for NC or anyone with a similar issue in the future, but after running NC for 3 years I think it might not be suited for me.

If Debian servers are not reachable from inside the containers, DNS resolution or networking doesn’t seem to work, which is completely unrelated to Nextcloud; I just wanted to point this out. Basically you need to fix your Docker networks, as you already found out.

Docker seems to by default use the host-system’s DNS servers (which work fine). Using a non-default bridge network in docker (as I do) should not result in DNS resolution issues.
Also, the NC container is able to find the postgres DB just fine under the name postgres, so that shouldn’t really be what keeps the occ command from working.
Thanks for the suggestion though.