Nextcloud AIO becomes unresponsive even though all containers are healthy

Description:
I have recently started experiencing a strange issue with my Nextcloud AIO installation. Roughly every 6 to 8 hours, the entire instance becomes completely unresponsive: the web interface doesn’t load, the desktop client cannot connect, and the Talk client stops working as well.

However, in Portainer all AIO containers still show the status “Healthy”, as if everything were running normally.
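Portainer’s “Healthy” label only reflects the result of each container’s Docker HEALTHCHECK command, which may test something narrower than a full page load through the proxy. One way to see what the check actually returned is to read the container’s health log. A minimal Python sketch, assuming the Docker CLI is on the PATH and the container is named nextcloud-aio-nextcloud; the fields read (Status, FailingStreak, and ExitCode/Output of the last Log entry) are those printed by docker inspect under .State.Health:

```python
import json
import subprocess


def summarize_health(health_json: str) -> dict:
    """Condense the JSON printed by: docker inspect --format '{{json .State.Health}}'."""
    health = json.loads(health_json)
    if health is None:
        # Containers without a HEALTHCHECK print "null" here.
        return {"status": "none", "failing_streak": 0, "last_exit": None, "last_output": ""}
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status", "unknown"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit": last.get("ExitCode"),
        "last_output": (last.get("Output") or "").strip(),
    }


def inspect_health(container: str) -> dict:
    # Ask Docker for the health block of a single container.
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State.Health}}", container],
        capture_output=True, text=True, check=True,
    ).stdout
    return summarize_health(out)


if __name__ == "__main__":
    print(inspect_health("nextcloud-aio-nextcloud"))
```

If last_output shows the check passing while the web interface hangs, the healthcheck is simply not exercising the part that is stuck.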

At first, I thought the issue was caused by the Nextcloud Mail app, so I disabled it, but the problem continues to occur even with Mail disabled.

At the moment, the only way I can temporarily restore functionality is by restarting the Docker container nextcloud-aio-nextcloud. After that, everything works again for several hours before the issue reappears.

I have not been able to determine what exactly causes this situation or where to find logs that would explain it.

I would like to ask:

  1. Are there any known reasons why the entire AIO instance could appear “healthy” but be completely dead from the user’s perspective?
  2. What is the best way to diagnose this kind of issue?
  3. Which logs should I check, and where exactly can I find them within the AIO setup?
  4. Is there any recommended way to fix or recover the instance permanently, instead of having to restart the Nextcloud container manually?
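For question 2, a check that sees the instance the way users do is to request Nextcloud’s public status.php endpoint through the reverse proxy with a short timeout. A minimal sketch, assuming Python 3 and a hypothetical URL https://cloud.example.com/status.php (replace with your own domain); the installed and maintenance keys are part of the JSON that status.php returns:

```python
import http.client
import json
import urllib.request


def classify_status(body: bytes) -> str:
    """Map a status.php response body to a short verdict."""
    try:
        data = json.loads(body)
    except ValueError:
        # HTML error pages from the proxy (or undecodable bytes) end up here.
        return "bad-response"
    if not isinstance(data, dict):
        return "bad-response"
    if not data.get("installed", False):
        return "not-installed"
    if data.get("maintenance", False):
        return "maintenance"
    return "ok"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch status.php; a hang, HTTP error or connection error counts as 'unreachable'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.read())
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return "unreachable"


if __name__ == "__main__":
    # Hypothetical domain: replace with your own.
    print(probe("https://cloud.example.com/status.php"))
```

Run from another machine (or via cron) every few minutes, this records the moment the instance stops answering even while Portainer still reports every container as healthy.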

Environment:

  • Nextcloud AIO: 11.9.0 and also 11.10.0
  • Nextcloud Server: 31.0.9
  • OS: Ubuntu Server 24.04 LTS (latest updates)
  • Virtualized on: Proxmox VE 9
  • Reverse proxy: Nginx Proxy Manager
  • VM resources: 12 vCPUs and 16 GB RAM

Original topic: Nextcloud AIO becomes unresponsive even though all containers are healthy (nextcloud/all-in-one, Discussion #6996 on GitHub)

Hi, have you checked the server resource usage (with htop, for example) while this is happening?

@szaimen Thanks for your response.

I have checked btop:

  • CPU usage: 3-6%
  • RAM: over 8 GB free
  • docker stats: [screenshot not preserved]

The VM runs on an NVMe drive, so disk I/O should not be an issue.
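A one-off look at docker stats is hard to compare between a working and a stuck state. A minimal sketch that takes a snapshot in machine-readable form and sorts containers by memory share, assuming the Docker CLI’s --no-stream and --format '{{json .}}' options, which print one JSON object per container with string fields such as Name, CPUPerc, MemPerc and PIDs. The PIDs column is kept because process counts are not visible in the CPU and RAM figures above:

```python
import json
import subprocess


def _pct(value: str) -> float:
    # "12.34%" -> 12.34; placeholders such as "--" become 0.0.
    try:
        return float(value.rstrip("%"))
    except ValueError:
        return 0.0


def parse_stats(lines: str) -> list[tuple[str, float, float, int]]:
    """Turn docker stats JSON lines into (name, cpu %, mem %, pids), highest memory first."""
    rows = []
    for line in lines.splitlines():
        if not line.strip():
            continue
        s = json.loads(line)
        pids = s.get("PIDs", "")
        rows.append((
            s["Name"],
            _pct(s.get("CPUPerc", "")),
            _pct(s.get("MemPerc", "")),
            int(pids) if pids.isdigit() else -1,
        ))
    return sorted(rows, key=lambda r: r[2], reverse=True)


def snapshot() -> list[tuple[str, float, float, int]]:
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_stats(out)


if __name__ == "__main__":
    for name, cpu, mem, pids in snapshot():
        print(f"{name:40} cpu={cpu:6.2f}% mem={mem:6.2f}% pids={pids}")
```

Saving one snapshot while the instance works and another during an outage makes any difference easy to spot.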

Hm… Can you post the output of sudo docker info here?

Here you are:

Client: Docker Engine - Community
 Version:    28.5.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.29.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.40.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 25
  Running: 25
  Paused: 0
  Stopped: 0
 Images: 35
 Server Version: 28.5.1
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b98a3aace656320842a23f4a392a33f46af97866
 runc version: v1.3.0-0-g4ca628d1
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-85-generic
 Operating System: Ubuntu 24.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 14.64GiB
 Name: nextcloud-aio
 ID: 4a4be920-ff83-4d48-bdbc-f4cd6ac04a2f
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false

I’m not really sure what the cause is, and this is just a wild guess, but I would try uninstalling ClamAV and see if the problem still occurs.


While Nextcloud was down, I checked the logs of the individual Nextcloud containers through Portainer. Unfortunately, I didn’t find anything that would point to an actual issue.
I was worried it could be caused by a sudden RAM spike, but both the Proxmox graphs and Grafana show that RAM usage stays constant without any sudden jumps.
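Going through containers one by one in Portainer is slow during an outage. A minimal sketch that dumps the recent output of several AIO containers into one timestamped folder in a single step, using docker logs with --since and --timestamps. Only nextcloud-aio-nextcloud is named in this thread; the other container names are assumptions based on AIO’s usual naming, so adjust the list to what docker ps shows:

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Assumed AIO container names; only nextcloud-aio-nextcloud is confirmed in this thread.
CONTAINERS = [
    "nextcloud-aio-nextcloud",
    "nextcloud-aio-apache",
    "nextcloud-aio-database",
    "nextcloud-aio-redis",
    "nextcloud-aio-clamav",
]


def log_command(container: str, since: str = "30m") -> list[str]:
    """Build the docker logs call for one container."""
    return ["docker", "logs", "--since", since, "--timestamps", container]


def dump_logs(out_dir: str = "aio-logs", since: str = "30m") -> Path:
    """Write each container's recent stdout and stderr to its own file."""
    target = Path(out_dir) / datetime.now().strftime("%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    for name in CONTAINERS:
        with open(target / f"{name}.log", "w") as fh:
            # docker logs replays the container's stderr on its own stderr, so merge both.
            subprocess.run(log_command(name, since), stdout=fh, stderr=subprocess.STDOUT, check=False)
    return target


if __name__ == "__main__":
    print(dump_logs())
```

Running it right after noticing an outage, before restarting anything, keeps the window around the hang in one place for comparison.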

I’ll wait for the next crash and might temporarily disable the ClamAV container.


Just a quick addition as to why I think it could be ClamAV.

While I didn’t experience the exact same issue, I can confirm that ClamAV can impact performance and cause lock-ups. I’m currently testing AIO and ran into a similar issue when uploading large files: while the chunks were being reassembled, the reassembly took forever and everything stopped responding. Without ClamAV, however, I was able to upload the same 5 GB file with no issues, and the reassembly took only a few seconds.

By the way, the next thing I’d look at, if it wasn’t ClamAV, would be the Fulltext Search.


That could just be a coincidence. I often upload large files myself, but I haven’t noticed any issues so far.
Maybe it’s because my Nextcloud server has enough performance and hardware resources, so the problem doesn’t show up that clearly.

Interestingly, the issue seems to occur at random: it doesn’t matter whether the system is synchronizing or “idle”.
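Whether the outages follow a schedule (the original report mentions roughly every 6 to 8 hours) or are truly random can be checked once a few outage start times have been written down, for example from a probe log. A minimal sketch that computes the gaps between outages and their spread relative to the mean; the timestamps in the example are made up. A small spread would hint at a periodic trigger, but this is only a heuristic, not proof:

```python
from datetime import datetime
from statistics import mean, pstdev


def outage_gaps(times: list[str]) -> list[float]:
    """Hours between consecutive outage start times (ISO 8601 strings, any order)."""
    parsed = sorted(datetime.fromisoformat(t) for t in times)
    return [(b - a).total_seconds() / 3600 for a, b in zip(parsed, parsed[1:])]


def describe(gaps: list[float]) -> str:
    """Summarize the gaps with their mean and coefficient of variation."""
    if len(gaps) < 2:
        return "need at least three outages"
    avg = mean(gaps)
    if avg == 0:
        return "duplicate timestamps"
    cv = pstdev(gaps) / avg  # spread of the gaps relative to their mean
    return f"mean gap {avg:.1f} h, variation {cv:.0%}"


if __name__ == "__main__":
    # Example timestamps only; replace them with your own observations.
    sample = ["2025-10-20T02:10", "2025-10-20T09:05", "2025-10-20T15:40", "2025-10-20T22:55"]
    print(describe(outage_gaps(sample)))
```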

I’m definitely curious to see what you find out about the Fulltext Search.