After upgrade to 27.0.2: 502/504 errors after some time

Hello,

Since upgrading to Debian Bookworm with PHP 8.2, and from Nextcloud 25 to Nextcloud 27.0.2, we have been facing random 502 or 504 web server errors. We are using the nginx web server.

To get it working again, we need to restart the php8.2-fpm service.
We’re using opcache and redis-server.
Nginx is configured to use a unix socket.

We were using Debian 11 and PHP 7.4 before without any problem.

Unfortunately, we do not have any relevant errors in the nextcloud.log file nor in the other system logs.

We’ve changed the pm.* values in pool.d without managing to resolve the issue.

Thank you,

I found errors in the PHP logs that confirm it is a PHP performance-tuning problem.

Here is the log extract:

seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 100 total children
[11-Sep-2023 13:53:05] WARNING: [pool nextcloud] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 106 total children
[11-Sep-2023 13:53:06] WARNING: [pool nextcloud] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 112 total children

I modified the settings and restarted the PHP service with these values:

[nextcloud]

pm = dynamic
pm.max_children = 432
pm.start_servers = 108
pm.min_spare_servers = 108
pm.max_spare_servers = 324
pm.max_requests = 500

We have 800 users and 1 TB of files. The server has 16 GB of RAM.
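
For reference, a common way to sanity-check pm.max_children is to derive it from measured worker memory rather than guessing. A minimal sketch, assuming the Debian process name php-fpm8.2 and roughly 4 GB reserved for the OS, nginx, Redis and caches (both figures are assumptions to verify on your host):

# Average resident memory per php-fpm worker, in MB:
ps -C php-fpm8.2 --no-headers -o rss | awk '{s+=$1; n++} END {if (n) printf "%.0f MB avg over %d workers\n", s/n/1024, n}'
# If workers average ~60 MB, the budget on 16 GB RAM minus a ~4 GB reserve is:
echo $(( (16 - 4) * 1024 / 60 ))   # ~204 workers

If the average really is in that range, pm.max_children = 432 could overcommit a 16 GB machine well before the pool limit is reached, pushing the box into swap and producing exactly these 502/504 symptoms.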

Do you think those values are correct, or at least realistic?

Thank you,

Now we have a significant error in the Nextcloud log.

The error is related to the database: “Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [1040] Too many connections”

MariaDB runs on another server with the default values. It was working normally before the Nextcloud upgrade.

We increased max_connections on the MariaDB server to 500, but it still crashes.
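
Note that each busy PHP-FPM worker typically holds its own database connection, so a pool allowed to grow to 432 children can exhaust a 500-connection limit on its own during a spike. A quick way to check how close you actually get, run on the MariaDB host (the connection user name nextcloud is an example, adjust it to yours):

mysql -e "SHOW GLOBAL VARIABLES LIKE 'max_connections';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections';"
# Connections currently held by the Nextcloud database user:
mysql -e "SHOW PROCESSLIST;" | grep -c nextcloud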

Could it be linked in one way or another to redis-server?

Hello,

I'm facing similar issues on Nextcloud 26 on a new server we have just migrated to.
We noticed that some old sync clients cause huge re-uploads of old files, which doesn't help, but that alone doesn't seem to explain it.

The php-fpm logs are not good; here is an example from just before a crash.

[11-Sep-2023 10:48:59] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 281 idle, and 735 total children
[11-Sep-2023 10:51:30] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 282 idle, and 756 total children
[11-Sep-2023 10:51:54] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 278 idle, and 766 total children
[11-Sep-2023 10:51:55] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 282 idle, and 771 total children
[11-Sep-2023 10:52:43] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 248 idle, and 790 total children
[11-Sep-2023 10:52:44] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 258 idle, and 798 total children
[11-Sep-2023 10:52:45] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 274 idle, and 814 total children
[11-Sep-2023 10:52:46] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 282 idle, and 823 total children
[11-Sep-2023 10:53:17] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 242 idle, and 839 total children
[11-Sep-2023 10:53:18] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 249 idle, and 847 total children
[11-Sep-2023 10:53:19] WARNING: [pool www] server reached pm.max_children setting (851), consider raising it

I raised pm.max_children to 1500 somewhat arbitrarily, because it's hard to find good documentation on the Internet for a server like mine.
We have 1.4 TB of data.
The server has 94 GB of RAM, and all the tutorials are written for smaller servers, so I don't really know how to adjust.

For the moment, here are the parameters in www.conf:

pm.max_children = 1500
; Default Value: (min_spare_servers + max_spare_servers) / 2
pm.start_servers = 425
pm.min_spare_servers = 283
pm.max_spare_servers = 800
; Default Value: 32
;pm.max_spawn_rate = 32
; Default Value: 10s
;pm.process_idle_timeout = 10s;
pm.max_requests = 2000
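
To size this against 94 GB, it may help to watch the pool's real footprint over time rather than relying on tutorials. A small sketch, assuming the worker process name php-fpm8.2 (adjust to your PHP version):

while sleep 5; do
  printf '%s  ' "$(date +%T)"
  ps -C php-fpm8.2 --no-headers -o rss | awk '{n++; s+=$1} END {printf "%d workers, %.1f GiB RSS\n", n, s/1048576}'
done

If 851 workers were alive at the moment of the crash, multiplying the observed per-worker RSS by pm.max_children = 1500 tells you whether that ceiling even fits in RAM.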


Here are the steps I followed for the moment (I will update this post):

1. glances shows that there is often CPU iowait, so I searched the Internet and saw (with journalctl --disk-usage) that my journal logs were big (4 GB). I cleaned them up once with
sudo journalctl --vacuum-size=100M
and then set SystemMaxUse=100M in /etc/systemd/journald.conf, followed by systemctl daemon-reload.

2. Also, nginx seems to use a lot of CPU. My theory is that it blocks waiting to write its log to disk, so in /etc/nginx/nginx.conf I changed access_log on to access_log off (a buffered alternative is sketched after this list).

3. There was a user with a full quota whose computer kept retrying the upload of the last file forever. It pollutes the logs and maybe the traffic? I fixed it with him and updated the sync client to the latest version, 3.9.4 (hopefully it no longer retries endlessly).
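
Regarding step 2: instead of losing the access log entirely, nginx can buffer it so the disk is hit far less often. A hedged sketch; the path, buffer size and flush interval are examples:

# In the http block of /etc/nginx/nginx.conf, instead of "access_log off;":
#   access_log /var/log/nginx/access.log combined buffer=64k flush=5s;
nginx -t && systemctl reload nginx   # validate the config, then reload without dropping connections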


We made some modifications that seem to have (at least for the moment) solved the problem.

Here are the steps we've followed:

  • Added quota to users that had reached their limit.
  • Upgraded all clients to the latest version (work still in progress).
  • Disabled the access log by adding access_log off; in nginx.conf.
  • Changed the values in pool.d/nextcloud.conf:
pm.max_children = 532
pm.start_servers = 108
pm.min_spare_servers = 108
pm.max_spare_servers = 324
pm.max_requests = 500
  • Changed max_connections on the MariaDB server to 800.
  • Changed 'memcache.local' in config.php from Redis to APCu (see the sketch after this list).
  • Changed the opcache configuration in php.ini (I can't remember exactly which parameters we modified, but here are the overridden values):
opcache.enable_cli=1
opcache.memory_consumption=512
opcache.interned_strings_buffer=32
opcache.max_accelerated_files=10000
opcache.revalidate_freq=60
opcache.save_comments=1
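
For the cache change, the same thing can be done with occ instead of editing config.php by hand. A minimal sketch, assuming the web server user is www-data and that you run it from the Nextcloud installation directory:

sudo -u www-data php occ config:system:set memcache.local --value '\OC\Memcache\APCu'
# Equivalent entry in config/config.php:
#   'memcache.local' => '\OC\Memcache\APCu',

Redis can still be kept for file locking ('memcache.locking' => '\OC\Memcache\Redis'), so dropping it as the local cache does not mean dropping it everywhere.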

The server worked for 24 hours, but PHP crashed again with the same error message…
Still investigating…

I have exactly the same issue…

I've tried so many things already; please let me know if you find a resolution, and I will also keep trying… best of luck.

What do your PHP and MariaDB/MySQL logs say?

For me, they are in /var/log/php8.1-fpm.log and /var/log/mysql/mysql_error.log.

The MariaDB logs were in syslog in my case.
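
If the unit names match yours, the journal is a convenient way to follow both services at once (mariadb and php8.1-fpm are the Debian defaults, adjust as needed):

journalctl -u mariadb -u php8.1-fpm -f    # follow both services live
grep -i mariadb /var/log/syslog | tail    # when MariaDB only logs to syslog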

Something interesting!

There are a lot of errors there, like:

Aborted connection to db (Got an error reading communication packets)
Also some "Too many connections" errors.

The php-fpm log says: WARNING seems busy, you may need to increase pm.start_servers, etc.
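
"Aborted connection … Got an error reading communication packets" often indicates clients timing out or oversized packets rather than raw load. A couple of quick, read-only checks on the MariaDB host:

mysql -e "SHOW GLOBAL STATUS LIKE 'Aborted_c%';"             # Aborted_clients / Aborted_connects counters
mysql -e "SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'wait_timeout';"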

Exactly the same log errors for MySQL and php-fpm.

[14-Sep-2023 11:39:27] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 280 idle, and 458 total children

I think they were caused by nginx spamming the disk with its access logs; in fact, they stopped when I disabled the nginx log.
They reappeared today, but not as much.

Also, that's why I raised pm.max_spare_servers.

We also noticed that our database went from ~1.5 GB to ~30 GB in a few days, just because the oc_activity table grew huge due to old sync clients re-uploading the same files over and over.

For sure, that doesn't help database write speed.

In my config/config.php we have 'activity_expire_days' => 14, so it starts cleaning that up after 14 days. You might want to set a lower number to speed up the cleaning.
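
To see whether oc_activity is the culprit on your side, and to tighten the retention window, something like this should do (the database name nextcloud is an assumption):

# Ten biggest tables in the Nextcloud database:
mysql -e "SELECT table_name, ROUND((data_length+index_length)/1048576) AS size_mb FROM information_schema.tables WHERE table_schema='nextcloud' ORDER BY size_mb DESC LIMIT 10;"
# Shorten the activity retention (value in days):
sudo -u www-data php occ config:system:set activity_expire_days --value 7 --type integer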

Hello, thank you for your feedback!

I’ll check the database size tomorrow.

Thanks,

My dump file is about 2.1 GB.
I can't remember precisely what size it was before the upgrade.
I could check in the backups, but I think it was approximately this size.

How is it going, @rastaferraille and @ralphy95?
For me, it's been good since my last changes :pray:

Hi,

Thanks for asking!

I'm glad that it is OK for you!

It's better, but we still have one crash per day.
The MySQL logs are still showing errors, but it seems that the php-fpm crash does not occur at the same time.

Still working on that ! I will keep you informed.

Thanks,

Hello!

After a user error that deleted thousands and thousands of files, I had to restore them, creating a peak of server demand and… it crashed. I had to restart php-fpm and mariadb…
What is weird is that the RAM plateaus at ~30%.

I just read this awesome blog post: PHP-FPM tuning: Using 'pm static' for max performance.
I think next time I'm going to change my php-fpm conf to pm = static.

Using ‘pm static’ to achieve your server’s max performance

The PHP-FPM pm static setting depends heavily on how much free memory your server has. If you suffer from low server memory, then pm ondemand or dynamic may be better options. On the other hand, if you have the memory available, you can avoid much of the PHP process manager (PM) overhead by setting pm static to the max capacity of your server.

In other words, when you do the math, pm.static should be set to the max amount of PHP-FPM processes that can run without creating memory availability or cache pressure issues. Also, not so high as to overwhelm CPU(s) and have a pile of pending PHP-FPM operations.
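
For reference, a pm = static sketch in the same pool-file format as the snippets earlier in this thread; the numbers are placeholders and must be derived from your own measured per-worker memory, not copied:

pm = static
pm.max_children = 600    ; placeholder: (RAM budget for PHP) / (average worker RSS)
pm.max_requests = 1000   ; recycle workers periodically so leaks cannot accumulate

With pm = static, pm.start_servers and the spare-server settings are ignored, so max_children becomes the single knob to get right.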

What do you think of that, @rastaferraille?

Hey, thank you Quentin! We've just decided to make this adjustment today, following the advice of the partner who is now helping us on this case. We will soon see if it makes a difference. I'll keep you posted.

Which adjustment?