Description:
I have recently started experiencing a strange issue with my Nextcloud AIO installation. Roughly every 6 to 8 hours, the entire instance becomes completely unresponsive: the web interface doesn’t load, the desktop client cannot connect, and the Talk client stops working as well.
However, in Portainer all AIO containers still show the status “Healthy”, as if everything were running normally.
At first, I thought the issue was caused by the Nextcloud Mail app, so I disabled it, but the problem continues to occur even with Mail disabled.
At the moment, the only way I can temporarily restore functionality is by restarting the Docker container nextcloud-aio-nextcloud. After that, everything works again for several hours before the issue reappears.
I have not been able to determine what exactly causes this situation or where to find logs that would explain it.
I would like to ask:
Are there any known reasons why the entire AIO instance could appear “healthy” but be completely dead from the user’s perspective?
What is the best way to diagnose this kind of issue?
Which logs should I check, and where exactly can I find them within the AIO setup?
Is there any recommended way to fix or recover the instance permanently, instead of having to restart the Nextcloud container manually?
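One way to pin down exactly when the instance stops answering, independent of what the Docker health checks report, is to poll Nextcloud's `status.php` endpoint from another machine and log every result with a timestamp. The sketch below is a minimal example under stated assumptions, not an official AIO tool: the address `https://cloud.example.com`, the 60-second interval, and the function names are all placeholders chosen for illustration.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Hypothetical address; replace with the domain of your own instance.
BASE_URL = "https://cloud.example.com"


def classify(http_status, body):
    """Turn a status.php response into a short verdict string."""
    if http_status != 200:
        return f"http-{http_status}"
    try:
        data = json.loads(body)
    except ValueError:
        return "invalid-json"
    if not isinstance(data, dict) or not data.get("installed"):
        return "not-installed"
    if data.get("maintenance"):
        return "maintenance"
    return "ok"


def probe(base_url=BASE_URL, timeout=10):
    """Request status.php once and return a verdict instead of raising on network errors."""
    try:
        with urllib.request.urlopen(f"{base_url}/status.php", timeout=timeout) as resp:
            return classify(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 502 or 504.
        return classify(exc.code, "")
    except (urllib.error.URLError, OSError) as exc:
        # No usable answer at all: timeout, refused connection, DNS failure.
        return f"unreachable: {exc}"


def watch(interval=60):
    """Print one timestamped verdict per check until interrupted."""
    while True:
        now = datetime.now(timezone.utc).isoformat(timespec="seconds")
        print(f"{now} {probe()}", flush=True)
        time.sleep(interval)
```

Calling `watch()` and redirecting its output to a file gives a timeline of the outages. The first line that is not `ok` marks the moment to look for in the container logs, and the kind of failure (timeout versus an HTTP error from the proxy) hints at which component stopped answering.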
While Nextcloud was down, I checked the logs of the individual Nextcloud containers through Portainer. Unfortunately, I didn’t find anything that would point to an actual issue.
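For the next outage, it may help to save the recent output of every AIO container to files, so the timestamps can be compared side by side instead of clicking through each container in Portainer. This is only a rough sketch. It assumes the Docker CLI is available on the host and that all AIO container names contain `nextcloud-aio`, as `nextcloud-aio-nextcloud` does. The output directory and function names are made up for the example.

```python
import subprocess
from pathlib import Path


def list_cmd(name_filter="nextcloud-aio"):
    """Command that prints the names of running containers whose name matches the filter."""
    return ["docker", "ps", "--filter", f"name={name_filter}", "--format", "{{.Names}}"]


def logs_cmd(container, since="30m"):
    """Command that prints a container's recent log lines with timestamps."""
    return ["docker", "logs", "--timestamps", "--since", since, container]


def snapshot(out_dir="aio-logs", since="30m"):
    """Write the recent logs of each matching container to <out_dir>/<name>.log."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    listing = subprocess.run(list_cmd(), capture_output=True, text=True, check=True)
    names = listing.stdout.split()
    for name in names:
        result = subprocess.run(logs_cmd(name, since), capture_output=True, text=True)
        # docker logs relays the container's stdout and stderr separately; keep both.
        (out / f"{name}.log").write_text(result.stdout + result.stderr)
    return names
```

Running `snapshot()` before restarting `nextcloud-aio-nextcloud` matters, because the interesting lines are the ones written just before the instance stopped responding.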
I was worried it could be caused by a sudden RAM spike, but both the Proxmox graphs and Grafana show that RAM usage stays constant without any sudden jumps.
I’ll wait for the next crash and might temporarily disable the ClamAV container.
Just a quick addition as to why I think it could be ClamAV.
While I didn’t experience the exact same issue, I can confirm that ClamAV can impact performance and cause lock-ups. I’m currently testing AIO and ran into something similar when uploading large files: while the chunks were being reassembled, the reassembly took forever and everything stopped responding. Without ClamAV, however, I was able to upload the same 5 GB file with no issues, and the reassembly took only a few seconds.
By the way, the next thing I’d look at, if it wasn’t ClamAV, would be the Fulltext Search.
It could just be a coincidence causing all this. I often upload large files myself, but I haven’t noticed any issues so far.
Maybe it’s because my Nextcloud server has enough hardware resources, so the problem doesn’t show up that clearly.
Oddly, this issue happens more or less “randomly”. It doesn’t matter whether the system is synchronizing or “idle”.
I’m definitely curious to see what you find out about the Fulltext Search.