[SOLVED] Slow opening of any application

Hi All,

This is just to share an issue and its resolution.

For quite some time, I had been experiencing a weird slowdown of my Nextcloud instance.

It took ~10 s to open any application, whether Dashboard, Files or anything else.

It was weird, as I had previously run this instance on a small Odroid N2 (4 GB, ARMv8).
It now runs on an i7-1360P with plenty of RAM and SSDs for everything except the /data folder.
Architecture-wise, it runs in Docker as PHP-FPM with an Nginx frontend, backed by Postgres and, more recently, Redis. Everything sits behind a local PiHole, as the containers get their IPs dynamically, etc.

I enabled all the PHP-FPM slow logging, and the same on Nginx, which showed the delay at the UI level. Postgres didn't show any slow queries; everything was below 300 ms.
There were no signs of load on any of the components.

Reading through other questions, I saw people complaining about Nextcloud being slow on larger instances with more than 1.5M files. Well, I have more than 4M, and it is just a family server.

What was missing (probably knowledge on my side) was how to manually run the PHP script reported by PHP-FPM to see which exact call takes the time. This is where I failed, so I took another route in the hope of a fast resolution: I set up another, fresh Nextcloud instance with the same apps and a roughly similar configuration of frontend, backend, etc.
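For anyone stuck at the same point: PHP-FPM's slow log (enabled per pool with the `slowlog` and `request_slowlog_timeout` directives) records a PHP backtrace for every request that runs longer than the timeout, and the first frame of each backtrace is the call that was executing at that moment. The sketch below counts those innermost frames across the whole log, so a call that repeatedly hangs (for example a network or name lookup) stands out. It assumes the usual slowlog layout shown in the comments; check it against your own log before relying on it. The function name `innermost_calls` is hypothetical.

```python
import re
from collections import Counter

# Assumed slowlog layout (verify against your own log file):
#   [25-Dec-2023 10:00:00]  [pool www] pid 1234
#   script_filename = /var/www/html/index.php
#   [0x00007f...] some_function() /var/www/html/lib/file.php:123
#   [0x00007f...] caller() /var/www/html/lib/other.php:45
HEADER = re.compile(r"^\[[^\]]+\]\s+\[pool [^\]]+\] pid \d+")
FRAME = re.compile(r"^\[0x[0-9a-fA-F]+\]\s+(\S+)\(\)\s+(\S+)")


def innermost_calls(log_text: str) -> Counter:
    """Count the first (innermost) stack frame of every slowlog entry."""
    counts: Counter = Counter()
    expecting_first_frame = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if HEADER.match(line):
            # A new slow request starts; the next frame line is the innermost call.
            expecting_first_frame = True
            continue
        if expecting_first_frame:
            m = FRAME.match(line)  # skips the script_filename line
            if m:
                function, location = m.groups()
                counts[f"{function} @ {location}"] += 1
                expecting_first_frame = False
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python slowlog_top.py /path/to/php-fpm-slow.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for call, n in innermost_calls(fh.read()).most_common(10):
                print(f"{n:5d}  {call}")
```

If one function dominates the counts, the file and line next to it show where in Nextcloud (or an app) the request was waiting.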

This one was fast, super fast, but it didn't have all the files yet. The next steps were to copy all the files, build a filecache of the same size, etc. Copying the DB wasn't the best choice, as I would then have had to proceed by elimination.

I had a gut feeling that it smelled like a timeout, a DNS timeout.
That gut feeling turned out to be right, and checking it was a good shot.
Nextcloud issued a number of DNS queries, and the query for postgres was followed by another one for the hostname set as the LocalAI FQDN. That query was forwarded by PiHole to another DNS server which was down and therefore did not respond, hence the ~10 s delay. Somehow PiHole did not answer other queries while waiting for that response; this seems to be a flaw in PiHole's logic for queries from the same client IP.
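A quick way to test the DNS suspicion is to time name resolution of every hostname Nextcloud talks to (database, Redis, and any FQDN configured in an app such as LocalAI), from the same host or container network Nextcloud uses, if Python is available there. The sketch below uses the system resolver through `socket.getaddrinfo`, so it goes through the same path (e.g. PiHole) as other programs on that machine; a lookup that takes seconds, or fails only after a long wait, is the kind of delay described above. The hostnames in the usage line are hypothetical placeholders.

```python
import socket
import sys
import time


def time_lookup(hostname: str) -> tuple[bool, float]:
    """Resolve hostname with the system resolver; return (resolved?, seconds taken)."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, None)
        ok = True
    except socket.gaierror:
        ok = False
    return ok, time.monotonic() - start


def main(names: list[str]) -> None:
    for name in names:
        ok, elapsed = time_lookup(name)
        status = "ok" if ok else "FAILED"
        print(f"{name:<30} {status:<7} {elapsed:6.2f}s")


if __name__ == "__main__":
    # Example (hypothetical hostnames):
    #   python dns_timing.py postgres redis localai.example.lan
    main(sys.argv[1:])
```

Running it twice in a row is useful: a healthy resolver answers the second round almost instantly, while a forwarder that waits on a dead upstream server keeps showing the same multi-second delay.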

The solution was to remove the LocalAI FQDN, as this is what slowed everything down.
There was nothing in the Nextcloud logs pointing in that direction, even at debug level (or I missed it). Another useful thing I learnt during the investigation is that there is a nice calculator for setting PHP-FPM parameters based on memory: PHP-FPM Process Calculator
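The idea behind such calculators is simple arithmetic: the memory you can give to PHP divided by the average size of one PHP-FPM worker gives `pm.max_children`, and the spare-server settings are derived from that. The sketch below uses a commonly quoted rule of thumb (spare servers at roughly 25% and 75% of `pm.max_children`, and `pm.start_servers` halfway between them, which is also PHP-FPM's own default for that setting); it is an assumption, not necessarily the exact formula of the calculator mentioned above. The function name and the example numbers are hypothetical; measure the real average worker size on your own system.

```python
from dataclasses import dataclass


@dataclass
class PoolSettings:
    max_children: int
    start_servers: int
    min_spare_servers: int
    max_spare_servers: int


def pm_dynamic_settings(total_ram_mb: int, reserved_ram_mb: int,
                        avg_process_mb: int) -> PoolSettings:
    """Rule-of-thumb sizing for a 'pm = dynamic' PHP-FPM pool (assumed heuristic)."""
    if avg_process_mb <= 0:
        raise ValueError("avg_process_mb must be positive")
    # Memory left for PHP workers after the OS, database, cache, etc.
    available = max(total_ram_mb - reserved_ram_mb, 0)
    max_children = max(available // avg_process_mb, 1)
    min_spare = max(max_children // 4, 1)
    max_spare = max((max_children * 3) // 4, min_spare)
    # PHP-FPM requires min_spare <= start_servers <= max_spare <= max_children.
    start = min_spare + (max_spare - min_spare) // 2
    return PoolSettings(max_children, start, min_spare, max_spare)


if __name__ == "__main__":
    # Hypothetical host: 16 GB RAM, 4 GB kept for OS/Postgres/Redis, ~80 MB per worker.
    print(pm_dynamic_settings(16384, 4096, 80))
    # prints PoolSettings(max_children=153, start_servers=76, min_spare_servers=38, max_spare_servers=114)
```

The four values map to the `pm.max_children`, `pm.start_servers`, `pm.min_spare_servers` and `pm.max_spare_servers` directives of the pool configuration.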

This is not a complaint or a bug report; it is to share with others what could be the issue.
If someone wants to raise it, please do. I've seen the Nextcloud team leave issues unresolved in much more serious situations (e.g. the Nextcloud desktop sync issue, or Thai characters in server core), hence I'm just sharing it here.

With that said, I do not agree with what is being said in multiple places, that Nextcloud can't keep up with more than 1.5M files. I'm 100% sure Nextcloud has production customers/enterprises with hundreds of times more than that. The architecture is set up pretty nicely, and some clustering seems to be possible too, even if they don't share documentation on how to set it up.