Poor performance: Lazy initial state provider for status took [20+] seconds


Nextcloud version: 29.0.3
Operating system and version: NixOS 24.05, NixOS container (also reproduced on host)
Webserver version: Caddy 2.7.6 (also reproduced with Nginx 1.26.1)
PHP version: PHP 8.3.8

The issue you are facing: Very poor performance. Without any other clients attempting to connect, loading a page in the web client with an admin user who has no files takes 40+ seconds, usually even longer on subsequent page loads.

There is no apparent bottleneck. CPU, memory, and disk usage remain near-zero.

Is this the first time you’ve seen this error? No; it has eventually happened after starting from a fresh database multiple times, both in a container and on the host.

Steps to replicate it: On NixOS, run occ upgrade? I’m not sure that’s what actually caused it. I’m asking here for feedback on what other logs I should collect before I wipe the database and wait for the bug to trigger a fourth time.

I’ve attempted maintenance:repair, as well as all db:add-missing-*.


Journals/logs and config can all be found in this gist. I’ve cut everything to just one instance of systemctl start container@nextcloud > wait for startup > https://nextcloud.REDACTED.ts.net/apps/files/files?XDEBUG_TRIGGER=Debugger > wait for page to load (~40s) > systemctl stop container@nextcloud.

The XDEBUG_TRIGGER did not seem to work as I expected, perhaps someone can tell me what I did wrong there.
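For reference, with Xdebug 3 the trigger only fires if the extension is configured to start on request with a matching trigger value; something like this in php.ini (a sketch — the output path is a placeholder, and the trigger value must match the URL parameter):

```ini
; Xdebug 3: only start when the XDEBUG_TRIGGER GET/POST/cookie value matches
xdebug.mode = profile
xdebug.start_with_request = trigger
; must match the value passed in the URL (?XDEBUG_TRIGGER=Debugger)
xdebug.trigger_value = Debugger
; where profiler output is written (placeholder path)
xdebug.output_dir = /tmp/xdebug
```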


I didn’t run it long enough for that to happen here, in the interest of keeping the logs concise, but eventually a ton of php and postgres UPDATE waiting processes build up, all relating to the oc_authtoken table. My oc_authtoken table has only 8 rows.

You can see a few select ... from pg_stat_activity ...; calls in the postgresql journal. That was me in psql checking for blocked queries building up. It does not happen immediately, so it could be a side effect of whatever is actually going on. At one point ~40 queries were blocked, and after shutting down php-fpm it took 15–20 minutes for all of them to clear.
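For anyone wanting to reproduce the check: on Postgres 9.6+ one way to list blocked sessions together with what is blocking them is pg_blocking_pids (a diagnostic sketch, run from psql):

```sql
-- sessions currently blocked, with the pids blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       now() - query_start AS waiting_for,
       left(query, 60)     AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```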


If there is other data which I should collect, please let me know. I hope that it is something very simple that I misconfigured.

Thank you!

What kind of system are you using?

For the setup, there are a few things to consider:

  • use real system cronjobs (so background jobs are not run when you open a page),
  • use caching in PHP (reduces the load on the database; you seem to have done that),
  • tune the database caches (so frequently used data is read from memory rather than disk), …
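For Nextcloud specifically, the PHP and database caching above maps to the memcache settings in config.php; a minimal sketch, assuming APCu and a local Redis are installed (host and port are placeholders):

```php
// config/config.php (excerpt)
'memcache.local' => '\OC\Memcache\APCu',      // per-process cache
'memcache.distributed' => '\OC\Memcache\Redis',
'memcache.locking' => '\OC\Memcache\Redis',   // avoids DB-based file locking
'redis' => [
  'host' => 'localhost',  // placeholder
  'port' => 6379,
],
```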

There have been obscure causes such as hostname lookups in Apache: for logging, it tries to resolve a hostname for each IP, and this can take some time…
And with containers behind reverse proxies, things can get a bit complicated. However, if it takes 40 s and you connect at 12:00:00, you should be able to go through the logs of your webserver and database to see whether each layer gets the request straight away, or whether one layer takes much longer.

Still another idea: just put a static web site instead of Nextcloud. Does it load quickly?

ODroid H3+, 32GB RAM

real system cronjobs

It uses real cronjobs. (nextcloud-cron.timer)

just put a static web site instead of Nextcloud

I’ve run Jellyfin simultaneously under the-same-subdomain.$TAILNET.ts.net/jellyfin/ without issue, if that answers your question. I’ll put up a static site as well.

So, the server restarted the container while I was out, and the postgres blocking queries built up, maxing out the db connections. Here’s the output I just got, and here’s a ps -ef filtered for php and postgres processes. I’m fairly certain this is happening without any clients connected.

if you connect at 12:00:00, you should be able to go through the logs of your webservers and database, if they all get the request straight away and if there is one level that takes much longer.

We can see that if we compare the HAR file I pulled from Firefox devtools and the postgres logs:

  • Firefox.HAR: "startedDateTime": "2024-07-05T15:17:00.192-05:00",
  • Postgres.journal: Jul 05 15:17:00 nextcloud postgres[832]: [832] LOG: statement: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED

The database is hit within a second. Looking closer at the logs, [832] seems to bridge the gap. If you want to see what I’m talking about, check out

curl https://gist.githubusercontent.com/xPMo/2af1f9d603a6da8e5be9e3a50dc8e7ed/raw/cc4ad46a22e71bbe689520853742f35884f60f72/postgresql.journal | grep -F '[832]'

Abridged to highlight the time jumps:

$ < postgresql.journal rg -F '[832]'
Jul 05 15:17:00 nextcloud postgres[832]: [832] LOG:  statement: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED
<SNIP>
Jul 05 15:17:00 nextcloud postgres[832]: [832] LOG:  statement: BEGIN
Jul 05 15:17:00 nextcloud postgres[832]: [832] LOG:  execute <unnamed>: INSERT INTO "oc_mounts" ("storage_id","root_id","user_id","mount_point","mount_id","mount_provider_class") SELECT $1,$2,$3,$4,$5,$6 FROM "oc_mounts" WHERE "root_id" = $7 AND "user_id" = $8 AND "mount_point" = $9 HAVING COUNT(*) = 0
Jul 05 15:17:00 nextcloud postgres[832]: [832] DETAIL:  parameters: $1 = '1', $2 = '1', $3 = 'root', $4 = '/root/', $5 = NULL, $6 = 'OC\Files\Mount\LocalHomeMountProvider', $7 = '1', $8 = 'root', $9 = '/root/'
Jul 05 15:17:00 nextcloud postgres[832]: [832] LOG:  statement: COMMIT
Jul 05 15:17:01 nextcloud postgres[832]: [832] LOG:  execute <unnamed>: UPDATE "oc_authtoken" SET "last_check" = $1, "last_activity" = $2 WHERE "id" = $3
Jul 05 15:17:01 nextcloud postgres[832]: [832] DETAIL:  parameters: $1 = '1720210620', $2 = '1720210620', $3 = '9'
Jul 05 15:17:18 nextcloud postgres[832]: [832] LOG:  execute <unnamed>: SELECT "provider_id", "enabled" FROM "oc_twofactor_providers" WHERE "uid" = $1
Jul 05 15:17:18 nextcloud postgres[832]: [832] DETAIL:  parameters: $1 = 'root'
<SNIP>
Jul 05 15:17:18 nextcloud postgres[832]: [832] LOG:  execute <unnamed>: UPDATE "oc_user_status" SET "status" = $1, "status_timestamp" = $2 WHERE "id" = $3
Jul 05 15:17:18 nextcloud postgres[832]: [832] DETAIL:  parameters: $1 = 'offline', $2 = '1720210638', $3 = '2'

I didn’t notice this until now, but it’s an UPDATE "oc_authtoken" statement at 15:17:01, then 17 seconds of nothing before the next query at 15:17:18.

There has to be something weird going on with either that table or that kind of query. I shut down php after making this post, and postgres is slowly clearing the blocked queries; it will probably take over an hour to get through them all. Ten minutes later it’s down to 132 of the original 159.

SELECT * FROM "oc_authtoken" is instant.

You have over 100 php processes running? Do you really want to allow that many parallel connections? This also consumes RAM…

I don’t know postgresql at all, but why are some queries blocked? If Nextcloud is waiting on them, that might explain a certain delay.
Also, logging all queries can take quite some resources; don’t you have an option to log only slow queries?
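In postgresql.conf that would look roughly like this (the 250 ms threshold is an arbitrary example value):

```ini
log_statement = 'none'            # stop logging every statement
log_min_duration_statement = 250  # only log statements slower than 250 ms
```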
If you use some database caching, a number of updates to the sessions should be no problem; you could also use redis for session handling:
https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/caching_configuration.html#using-the-redis-session-handler
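The linked page boils down to pointing PHP’s session handler at Redis in php.ini; roughly (the address is a placeholder):

```ini
; php.ini — store sessions in Redis instead of on-disk files
session.save_handler = redis
session.save_path = "tcp://127.0.0.1:6379"
; per the Nextcloud docs, enable session locking so parallel
; requests don't clobber each other's session data
redis.session.locking_enabled = 1
redis.session.lock_retries = -1
redis.session.lock_wait_time = 10000
```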