Nextcloud docker postgres error "Temporary failure in name resolution"

I have recently got a new Nextcloud server up and running using docker compose (and Portainer).

It seems to work fine, with no real problems. But there is an error that repeats multiple times every day in the Nextcloud log. It is worded in a few different ways, and the Application column in the log varies between index, PHP and core, but the main message is the same:

Failed to connect to the database: An exception occurred in the driver: SQLSTATE[08006] [7] could not translate host name "postgres" to address: Temporary failure in name resolution

The “postgres” in the error is (I assume) the name of my postgres container.
I'm not sure how to go about solving this. Is this a DNS issue? Could it be something else? What can I do to find out what is causing this?

The whole Nextcloud instance, and all of its containers, are in the same Portainer stack. Two networks are used: nextcloud-network and traefik-network. I created each of them using this command:

docker network create network-name

My containers and their network(s):
collabora: nextcloud-network, traefik-network
nextcloud: nextcloud-network, traefik-network
nextcloud-backups: nextcloud-network
postgres: nextcloud-network
redis: nextcloud-network
traefik: traefik-network

Any obvious error or mistake there? If someone can help, I'd be grateful.
Please let me know if more info is needed.

What Docker Engine version?

Please also post your Compose file.

Sure,

sudo docker version

returns this:

docker@docker:~$ sudo docker version
[sudo] password for docker:
Client: Docker Engine - Community
 Version:           27.0.3
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        7d4bcd8
 Built:             Sat Jun 29 00:02:33 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.0.3
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       662f78c
  Built:            Sat Jun 29 00:02:33 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.18
  GitCommit:        ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version:          1.1.13
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

My compose file is based on the one from here: https://www.heyvaldemar.com/install-nextcloud-using-docker-compose.

I have edited it, so my whole compose file (or Portainer stack, as I use it) now looks like this. I have removed comments and tried to obscure values. The variables are loaded in the Portainer stack, originally imported from a .env file:

networks:
  nextcloud-network:
    external: true
  traefik-network:
    external: true

volumes:
  nextcloud-data:
  nextcloud-config:
  redis-data:
  nextcloud-postgres:
  nextcloud-postgres-backup:
  nextcloud-data-backups:
  nextcloud-database-backups:
  traefik-certificates:

services:
  postgres:
    container_name: postgres
    image: ${NEXTCLOUD_POSTGRES_IMAGE_TAG}
    volumes:
      - nextcloud-postgres:/var/lib/postgresql/data
    environment:
      TZ: ${NEXTCLOUD_TIMEZONE}
      POSTGRES_DB: ${NEXTCLOUD_DB_NAME}
      POSTGRES_USER: ${NEXTCLOUD_DB_USER}
      POSTGRES_PASSWORD: ${NEXTCLOUD_DB_PASSWORD}
    networks:
      - nextcloud-network
    healthcheck:
      test: [ "CMD", "pg_isready", "-q", "-d", "${NEXTCLOUD_DB_NAME}", "-U", "${NEXTCLOUD_DB_USER}" ]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 60s
    restart: unless-stopped

  redis:
    environment:
      TZ: ${NEXTCLOUD_TIMEZONE}
    image: ${NEXTCLOUD_REDIS_IMAGE_TAG}
    container_name: redis
    command: ["redis-server", "--requirepass", "$NEXTCLOUD_REDIS_PASSWORD"]
    volumes:
      - redis-data:/data
    networks:
      - nextcloud-network
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 60s
    restart: unless-stopped

  nextcloud:
#    hostname: ${NEXTCLOUD_HOSTNAME}
    image: ${NEXTCLOUD_IMAGE_TAG}
    container_name: nextcloud
    volumes:
      - nextcloud-data:${DATA_PATH}
      - nextcloud-config:${DATA_PATH}
      - /mnt/nextcloud/data:/var/www/html/data
      - /mnt/nextcloud/config:/var/www/html/config
    environment:
      TZ: ${NEXTCLOUD_TIMEZONE}
      POSTGRES_HOST: postgres
      DB_PORT: 5432
      POSTGRES_DB: ${NEXTCLOUD_DB_NAME}
      POSTGRES_USER: ${NEXTCLOUD_DB_USER}
      POSTGRES_PASSWORD: ${NEXTCLOUD_DB_PASSWORD}
      REDIS_HOST: redis
      REDIS_HOST_PORT: 6379
      REDIS_HOST_PASSWORD: ${NEXTCLOUD_REDIS_PASSWORD}
      NEXTCLOUD_ADMIN_USER: ${NEXTCLOUD_ADMIN_USERNAME}
      NEXTCLOUD_ADMIN_PASSWORD: ${NEXTCLOUD_ADMIN_PASSWORD}
      NEXTCLOUD_TRUSTED_DOMAINS: ${NEXTCLOUD_HOSTNAME}
      TRUSTED_PROXIES: ${TRAEFIK_IP}
      OVERWRITECLIURL: ${NEXTCLOUD_URL}
      OVERWRITEPROTOCOL: https
      OVERWRITEHOST: ${NEXTCLOUD_HOSTNAME}

    networks:
      - nextcloud-network
      - traefik-network
    expose:
      - "443"        
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:80/"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 90s
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.nextcloud.rule=Host(`${NEXTCLOUD_HOSTNAME}`)"
      - "traefik.http.routers.nextcloud.service=nextcloud"
      - "traefik.http.routers.nextcloud.entrypoints=websecure"
      - "traefik.http.services.nextcloud.loadbalancer.server.port=80"
      - "traefik.http.routers.nextcloud.tls=true"
      - "traefik.http.routers.nextcloud.tls.certresolver=letsencrypt"
      - "traefik.http.services.nextcloud.loadbalancer.passhostheader=true"
      - "traefik.http.routers.nextcloud.middlewares=compresstraefik"
      - "traefik.http.middlewares.compresstraefik.compress=true"
      - "traefik.docker.network=traefik-network"

      #HSTS
      - "traefik.http.routers.nextcloud.middlewares=nextcloudHeader"
      - "traefik.http.middlewares.nextcloudHeader.headers.stsSeconds=15552000"
      - "traefik.http.middlewares.nextcloudHeader.headers.stsIncludeSubdomains=true"
      - "traefik.http.middlewares.nextcloudHeader.headers.stsPreload=true"
      - "traefik.http.middlewares.nextcloudHeader.headers.forceSTSHeader=true"
      
    restart: unless-stopped
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      traefik:
        condition: service_healthy
        
  nextcloud-collabora:
    image: collabora/code:latest
    container_name: collabora
    restart: unless-stopped
    ports:
      - 127.0.0.1:9980:9980
    expose:
      - "9980"        
    environment:
      #should work as "domain=cloud1\.nextcloud\.com|cloud2\.nextcloud\.com"
      - TZ=${NEXTCLOUD_TIMEZONE}
      - domain=${COLLABORA_DOMAIN}
      - aliasgroup1=cloud.example.com
      - 'dictionaries=en_US,se_SE'
      - VIRTUAL_PROTO=http
      - VIRTUAL_PORT=9980
      - VIRTUAL_HOST=${COLLABORA_HOSTNAME}
      - username=${COLLABORA_USERNAME}
      - password=${COLLABORA_PASSWORD}
      - "extra_params=--o:ssl.enable=false  --o:ssl.termination=true"
    networks:
    - nextcloud-network
    - traefik-network
    cap_add:
      - MKNOD
    tty: true
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=traefik-network"
      - "traefik.http.routers.collabora.rule=Host(`office.example.com`)"
      - "traefik.http.routers.collabora.entrypoints=web"
      - "traefik.http.middlewares.collabora-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.collabora.middlewares=collabora-https-redirect"
      - "traefik.http.routers.collabora-secure.entrypoints=websecure"
      - "traefik.http.routers.collabora-secure.rule=Host(`office.example.com`)"
      - "traefik.http.routers.collabora-secure.tls=true"
      - "traefik.http.routers.collabora-secure.tls.certresolver=letsencrypt"

  traefik:
    image: ${TRAEFIK_IMAGE_TAG}
    container_name: traefik
    environment:
      TZ: ${NEXTCLOUD_TIMEZONE}
    command:
      - "--log.level=${TRAEFIK_LOG_LEVEL}"
      - "--accesslog=true"
      - "--api.dashboard=true"
      - "--api.insecure=true"
      - "--ping=true"
      - "--ping.entrypoint=ping"
      - "--entryPoints.ping.address=:8082"
      - "--entryPoints.web.address=:80"
      - "--entryPoints.websecure.address=:443"
      - "--providers.docker=true"
      - "--providers.docker.endpoint=unix:///var/run/docker.sock"
      - "--providers.docker.exposedByDefault=false"
      - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
#      - "--certificatesresolvers.letsencrypt.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
      - "--certificatesresolvers.letsencrypt.acme.email=${TRAEFIK_ACME_EMAIL}"
      - "--certificatesresolvers.letsencrypt.acme.storage=/etc/traefik/acme/acme.json"
      - "--metrics.prometheus=true"
      - "--metrics.prometheus.buckets=0.1,0.3,1.2,5.0"
      - "--global.checkNewVersion=true"
      - "--global.sendAnonymousUsage=false"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - traefik-certificates:/etc/traefik/acme
    networks:
      - traefik-network
    ports:
      - "80:80"
      - "8081:8080"
      - "443:443"
    healthcheck:
      test: ["CMD", "wget", "http://localhost:8082/ping","--spider"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 5s
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.dashboard.rule=Host(`${TRAEFIK_HOSTNAME}`)"
      - "traefik.http.routers.dashboard.service=api@internal"
      - "traefik.http.routers.dashboard.entrypoints=websecure"
      - "traefik.http.services.dashboard.loadbalancer.server.port=8080"
      - "traefik.http.routers.dashboard.tls=true"
      - "traefik.http.routers.dashboard.tls.certresolver=letsencrypt"
      - "traefik.http.services.dashboard.loadbalancer.passhostheader=true"
      - "traefik.http.routers.dashboard.middlewares=authtraefik"
      - "traefik.http.middlewares.authtraefik.basicauth.users=${TRAEFIK_BASIC_AUTH}"
      - "traefik.http.routers.http-catchall.rule=HostRegexp(`{host:.+}`)"
      - "traefik.http.routers.http-catchall.entrypoints=web"
      - "traefik.http.routers.http-catchall.middlewares=redirect-to-https"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
    restart: unless-stopped

  backups:
    image: ${NEXTCLOUD_POSTGRES_IMAGE_TAG}
    container_name: nextcloud-backups
    command: >-
      sh -c 'sleep $BACKUP_INIT_SLEEP &&
      while true; do
        pg_dump -h postgres -p 5432 -d $NEXTCLOUD_DB_NAME -U $NEXTCLOUD_DB_USER | gzip > $POSTGRES_BACKUPS_PATH/$POSTGRES_BACKUP_NAME-$(date "+%Y-%m-%d_%H-%M").gz &&
        tar -zcpf $DATA_BACKUPS_PATH/$DATA_BACKUP_NAME-$(date "+%Y-%m-%d_%H-%M").tar.gz $DATA_PATH &&
        find $POSTGRES_BACKUPS_PATH -type f -mtime +$POSTGRES_BACKUP_PRUNE_DAYS | xargs rm -f &&
        find $DATA_BACKUPS_PATH -type f -mtime +$DATA_BACKUP_PRUNE_DAYS | xargs rm -f;
        sleep $BACKUP_INTERVAL; done'
    volumes:
      - nextcloud-postgres-backup:/var/lib/postgresql/data
      - nextcloud-data:${DATA_PATH}
      - nextcloud-data-backups:${DATA_BACKUPS_PATH}
      - nextcloud-database-backups:${POSTGRES_BACKUPS_PATH}
    environment:
      TZ: ${NEXTCLOUD_TIMEZONE}
      NEXTCLOUD_DB_NAME: ${NEXTCLOUD_DB_NAME}
      NEXTCLOUD_DB_USER: ${NEXTCLOUD_DB_USER}
      PGPASSWORD: ${NEXTCLOUD_DB_PASSWORD}
      BACKUP_INIT_SLEEP: ${BACKUP_INIT_SLEEP}
      BACKUP_INTERVAL: ${BACKUP_INTERVAL}
      POSTGRES_BACKUP_PRUNE_DAYS: ${POSTGRES_BACKUP_PRUNE_DAYS}
      DATA_BACKUP_PRUNE_DAYS: ${DATA_BACKUP_PRUNE_DAYS}
      POSTGRES_BACKUPS_PATH: ${POSTGRES_BACKUPS_PATH}
      DATA_BACKUPS_PATH: ${DATA_BACKUPS_PATH}
      DATA_PATH: ${DATA_PATH}
      POSTGRES_BACKUP_NAME: ${POSTGRES_BACKUP_NAME}
      DATA_BACKUP_NAME: ${DATA_BACKUP_NAME}
    networks:
      - nextcloud-network
    restart: unless-stopped
    depends_on:
      postgres:
        condition: service_healthy

I don’t see the problem in your compose file. A few recommendations, but nothing serious.

You don’t really need an external nextcloud-network, since Compose itself creates a default network ({project}_default) for the stack. This is only a small improvement, though, and not the cause of your problem.

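In the nextcloud service it looks like you mount data and config twice: the two named volumes on ${DATA_PATH}, plus the two bind mounts under /var/www/html?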

I think you have to troubleshoot your Docker installation:

  • First check whether the postgres container restarts from time to time for some reason. Use docker compose ps to see the uptime of the containers, and docker logs postgres to see the postgres container logs.
  • Review the containers using docker inspect {container name} (double-check whether there is anything weird in "Dns": [], "DnsOptions": [], "DnsSearch": [], "ExtraHosts": []).
  • Review the networks with docker network inspect nextcloud-network to verify that the container is always connected to the network.
  • Continuously run docker compose exec nextcloud getent hosts postgres (or docker-compose for old Compose versions) to verify DNS resolution for “postgres” in the “nextcloud” container. If it fails or the address changes, this could provide some hints.
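
To make the last check easier to leave running, here is a minimal sketch of a polling loop in POSIX shell. The function name watch_dns, its three arguments and the log format are just illustrative choices of mine, nothing that Docker provides. It only wraps whatever check command you pass in, so it assumes getent is available inside the nextcloud image and that the container is named nextcloud as in the stack above.

```shell
#!/bin/sh
# watch_dns CHECK_CMD COUNT INTERVAL
#
# Hypothetical helper (not part of Docker): runs CHECK_CMD COUNT times,
# waiting INTERVAL seconds between runs, and prints one timestamped line
# per run. A final "failures=N" line summarises how many runs failed.
watch_dns() {
    check_cmd=$1
    count=$2
    interval=$3
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        ts=$(date '+%Y-%m-%d %H:%M:%S')
        # Capture stdout and stderr so a resolver error ends up in the log.
        if out=$(sh -c "$check_cmd" 2>&1); then
            echo "$ts OK   $out"
        else
            failures=$((failures + 1))
            echo "$ts FAIL $out"
        fi
        i=$((i + 1))
        # Do not sleep after the final run.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    echo "failures=$failures"
}

# Example call (assumes a container named "nextcloud", as in the stack above):
#   watch_dns 'docker exec nextcloud getent hosts postgres' 8640 10 >> dns-watch.log
```

With 8640 runs at a 10 second interval this polls for roughly a day. Comparing the timestamps of any FAIL lines with the Nextcloud log and with docker logs postgres should show whether the resolution failures line up with a container restart. I used docker exec with the container name in the example because a stack deployed from Portainer may not have a compose file on disk for docker compose exec to pick up; if you do have one, the docker compose exec form from the list works the same way.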

These really look like great pointers for me to dive into. I am away from this stuff for a while now, but will look into it when I get back next week. I will report back with results and probably more questions. Thank you so much for now!