Nextcloud eats all PostgreSQL connections

Nextcloud version (eg, 20.0.5): 20.0.7
Operating system and version (eg, Ubuntu 20.04): Ubuntu 20.04
Apache or nginx version (eg, Apache 2.4.25): Apache 2.4.41
PHP version (eg, 7.4): 7.4
Database: Postgresql 12

The issue you are facing:
Nextcloud exhausts all PostgreSQL connections and then stops working until the connections clear on their own.

Is this the first time you’ve seen this error? (Y/N): No, it happens repeatedly

Steps to replicate it:

  1. My guess is that this is triggered by the background jobs (cron.php), which are set to run every 5 minutes
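
For reference, the background jobs mentioned above are usually wired up with a crontab entry along these lines (the install path and the idea that it runs as the web server user are assumptions about a standard setup, not taken from this thread):

```crontab
# Run Nextcloud background jobs every 5 minutes as the web server user.
# /var/www/nextcloud is an assumed path -- adjust to your installation.
*/5 * * * * php -f /var/www/nextcloud/cron.php
```

Each run opens its own DB connection(s), so if a run blocks on a lock, the next runs pile up on top of it.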

The output of your Nextcloud log in Admin > Logging:

"Message":"Failed to connect to the database: An exception occurred in driver: SQLSTATE[08006] [7] FATAL:  remaining connection slots are reserved for non-replication superuser connections"

Additional info:

The PostgreSQL connections are filled by the same two queries:

  1. SELECT * FROM "oc_appconfig" → which is in an idle state with wait_event = ClientRead

  2. UPDATE "oc_appconfig" SET "configvalue" = '1623767609' WHERE ("appid" = 'user_ldap') AND ("configkey" = '_lastChange') AND (("configvalue" IS NULL) OR ("configvalue" <> '1623767609')) → which is in an active state with wait_event_type = Lock; this looks like it’s updating the last time LDAP was checked for changes, and some of these queries take as long as 6 minutes once the DB connections build up
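
To see how the connection slots are being consumed, a summary over pg_stat_activity like the following can help (this is a generic diagnostic sketch, not something from the thread; run it as a superuser or a role with pg_monitor, otherwise other sessions’ query text is hidden; the columns used exist in PostgreSQL 12):

```sql
-- Count sessions per state / wait event to see what is hogging the slots,
-- and how long the oldest one has been running.
SELECT state,
       wait_event_type,
       wait_event,
       count(*)                      AS sessions,
       max(now() - query_start)      AS longest_running
FROM pg_stat_activity
WHERE datname = current_database()   -- assumes you are connected to the Nextcloud DB
GROUP BY state, wait_event_type, wait_event
ORDER BY sessions DESC;
```

Lots of "idle" sessions waiting on ClientRead means clients are holding connections open without sending queries; an "active" session with wait_event_type = Lock, as in query 2 above, is blocked behind another transaction.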

My question is, how can I fix this and prevent the DB from reaching the max connections? Could it be a Nextcloud bug that doesn’t close the DB connections or maybe a misconfiguration on the postgresql server?


Hey @Emi,

Did you find any fix for this? I’m seeing the same behaviour on NC 21.0.5 with NGINX.

Hi @hmihov,

Unfortunately no, I haven’t managed to find the exact cause or a fix for this problem. When this started happening, one of the first things I did was upgrade Nextcloud. After that the problem stopped for a while, but it came back after a month or so. Then I upgraded again, if I recall correctly (running 21.0.4 at the time of writing this comment), and I don’t think it has happened since. At least not lately. But if you find any additional info on this issue, I would be grateful if you shared it.

Also, some things to note: I manage multiple Nextcloud instances (with LDAP auth), but only this one instance encountered this issue. It is also the biggest one I manage: 550–600 users coming from LDAP, ~100 active within an hour span, and 17TB of data.

Oh, now I regret skipping the 21.0.4 version …

How many DB connections have you allowed to the database?
I have a similarly sized instance, maybe about 450 users with ~110 online during the day, and 40TB of data.

I’ve had to allow 400 DB connections to deal with the spikes of idle connections.
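
For anyone comparing numbers, the relevant postgresql.conf knobs look like this (the 400 is the value from this post; the reserved-slots line is the PostgreSQL default and is an assumption about this setup — it is also why the error message mentions slots "reserved for … superuser connections"):

```ini
# postgresql.conf -- values for illustration, not a recommendation
max_connections = 400                 # raised to absorb spikes of idle connections
superuser_reserved_connections = 3    # slots held back for superusers (PG default)
```

Note that raising max_connections also raises PostgreSQL’s worst-case memory use, so it interacts with the OOM issues discussed further down.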

Only 150, and most of them are unused when the problem doesn’t appear. We didn’t increase the number of connections, as we figured they would be eaten no matter how many we allowed; instead we tried to find and fix the problem, with emphasis on “tried”.

I think we have another issue of eating/not closing connections too. I believe it started surfacing after we added more SMB storages and began running notify as a service. It might be separate from your issue, though.

Hello @Emi and @hmihov

I know this is very old, but I’m facing the same problem with the same SQL queries.
Nextcloud 27.1.5, PostgreSQL 13 and nginx on Debian 12.

What have you done to solve this? I also ran into this problem recently, even though it was working fine before.

Thanks for your help.

They started the right way: checking the number of users and the number of connections they allow in order to adjust the limits. What setup are you working with, and how many parallel connections do you allow?

In terms of database connections, caching mechanisms can reduce the number of queries, but at a setup of this size I suppose that has already been done.
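
For completeness, the caching mentioned above is configured in Nextcloud’s config/config.php; a typical setup looks like the sketch below (the Redis socket path is an assumption, and these are fragments to merge into the existing config array, not a complete file):

```php
<?php
// Fragments for config/config.php -- typical Nextcloud caching setup.
// The Redis socket path below is an assumed example; adjust to your host.
'memcache.local'       => '\OC\Memcache\APCu',   // fast per-PHP-process cache
'memcache.distributed' => '\OC\Memcache\Redis',  // cache shared across workers
'memcache.locking'     => '\OC\Memcache\Redis',  // transactional file locking
'redis' => [
    'host' => '/var/run/redis/redis-server.sock',
    'port' => 0,                                 // 0 = unix socket
],
```

With memcache.distributed and memcache.locking in place, many of the small oc_appconfig/locking queries never reach PostgreSQL at all.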

Yes, I’ve done a lot of investigating.

The problem has appeared since I upgraded from Nextcloud 26 to 27. Same PostgreSQL config, more or less the same usage (number of users, apps, etc.).

But since this upgrade, the OOM killer regularly terminates PostgreSQL. I increased the number of simultaneous connections and the cache size. This is better, but not solved.

I was also suspicious of NC 27 and looked at my MySQL instances. No OOM kills, but very high RAM consumption. Unfortunately there has been a lot of movement on this server, so I can’t compare RAM usage before and after the Nextcloud upgrade.

In the psql logs, I noticed Deck SQL queries without LIMIT or WHERE clauses. I don’t know whether they touch large amounts of data and cause RAM overuse, but this is not good practice IMHO.

If you have tips for identifying problematic apps or queries, they are welcome!
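
One generic way to attribute load to specific queries (and from the query text, to specific apps like Deck or user_ldap) is the pg_stat_statements extension; it is not mentioned in this thread, so treat this as a suggestion. It requires `shared_preload_libraries = 'pg_stat_statements'` in postgresql.conf, a server restart, and `CREATE EXTENSION pg_stat_statements;` in the Nextcloud database. The column names below are the PostgreSQL 13 ones:

```sql
-- Top 20 statements by total execution time since stats were last reset.
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 1)  AS mean_ms,
       rows,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```

Alternatively, setting `log_min_duration_statement = 1000` in postgresql.conf logs every statement slower than one second, which would catch the long-running Deck queries without installing an extension.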