Nextcloud 18 + Nextcloud Talk 8 | php-fpm and mysqld eat CPU time (load average: 6.54)

I recently moved a whole Nextcloud instance from an old dedicated Core i3 machine with old HDDs to a vserver with an 800 GB SSD, 8 vcores and 32 GB RAM, running Arch Linux.

this instance is accessed by 500 users daily and on the old machine it was running really smooth…
2-10% cpu not much ram.

on the new machine mysqld wastes 150% cpu during peaks and several php-fpm processes use 30-60% each, which adds up to a load average of more than 6!

on the old machine i ran php 7.3 on ubuntu which is the only difference i know of :innocent:
(of course this is a virtual machine now… but not a bad one… could it be … ? …)

im running

nextcloud/18.0.3
nginx/1.16.1
mysqld/10.4.12-MariaDB
php/7.4.4

i am using apcu for local caching and redis for the rest
all recommended extensions are active and configured :white_check_mark:

the number of mysql connections can’t be the problem because the cpu load is sometimes very high with 26 active connections and almost zero another time with 65 connections.
sql slowlog shows nothing
processlist shows 2-3 running queries and 10 sleeping processes during heavy load… doesn’t seem too much to me?

the sql database uses ~300MB
all of the files ~200GB

in php slowlog on the other hand the following scripts show up constantly

script_filename = /srv/http/cloud/remote.php
script_filename = /srv/http/cloud/ocs/v2.php
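
To see which entry points dominate the slowlog, the `script_filename` lines can be tallied. This is only a minimal sketch: it assumes the php-fpm slowlog is a plain text file containing `script_filename = <path>` lines like the two quoted above, and the function name is made up for illustration.

```shell
# Tally slowlog entries per script, most frequent first.
# $1: path to the php-fpm slowlog (set by the "slowlog" option of the pool).
slowlog_top_scripts() {
    sed -n 's/^script_filename = //p' "$1" | sort | uniq -c | sort -rn
}
```

Called as `slowlog_top_scripts /path/to/php-fpm-slow.log` (the path is a placeholder), it prints one line per script with its hit count in front.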

i’ve reduced the number of child processes for php-fpm down to 20 but the php log complains it’s not enough… so i will probably go back to 60 because it didn’t change anything on the cpu load but it produced lags on the site…
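
Since lowering the child count did not change the CPU load, the limit is probably better sized from memory than from CPU. A common rule of thumb (not something taken from this thread or the Nextcloud manual) is memory budget divided by average worker size. The sketch below assumes a Linux `ps` from procps and that the workers are named `php-fpm`; both function names are hypothetical.

```shell
# Rough upper bound for pm.max_children: memory budget divided by worker size.
# $1: memory budget for php-fpm in MiB, $2: average worker RSS in KiB.
estimate_max_children() {
    if [ "$2" -le 0 ]; then
        echo "worker size must be positive" >&2
        return 1
    fi
    echo $(( $1 * 1024 / $2 ))
}

# Average resident set size (KiB) of the running workers.
# $1: process name as shown by ps; "php-fpm" is an assumption, some distros
# append the version (for example "php-fpm7.4").
avg_worker_rss_kib() {
    ps -o rss= -C "${1:-php-fpm}" |
        awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n; else print 0 }'
}
```

For example, `estimate_max_children 8192 "$(avg_worker_rss_kib)"` gives the number of workers that would fit into an 8 GiB budget. It says nothing about CPU, so it only guards against swapping.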

i reworked every config a hundred times now and read the nextcloud/latest manual twice… used all of the config examples there… nothing changes…

tried with http2 and without
with query cache and without
php, mysql, nginx log show no errors…

nevertheless the whole instance sometimes feels buggy… talk sometimes doesn’t show the chat history on first try. user management shows users twice on first try… if i move a folder and enter another the moved folder is listed in the other folder until i reload…

one last thing:
nextcloud log itself shows a LOT of php errors… i tried a clean installation on the same machine to make sure it’s not my old instance that’s buggy - same !

Error	PHP	Trying to access array offset on value of type bool 
Error	PHP	Trying to access array offset on value of type int 
Fatal	webdav	Sabre\DAV\Exception\BadRequest: Expected filesize of 
Error	PHP	imagettftext(): Problem doing text layout at 
Error	text	OCP\Files\NotFoundException: 

i really hope someone comes up with an idea…
thank you very much in advance !!

First of all, nice post - informative!

Second, you could use the Nextcloud VM scripts as a reference. Do they cause the same errors?

thank you!
reading your configuration scripts finally showed me how to get rid of “rich workspaces” (my users always accidentally create readme.md files ) :innocent:

as for the php errors… these seem to be related to php 7.4 (i guess… at least i don’t see them on the old installation with ubuntu && php 7.3)

i would have downgraded to 7.3 days ago but on arch (rolling release) this is kinda not trivial ^^ (and nextcloud should be working with 7.4 anyway)

unfortunately my nextcloud instance is already running and there is not much room for experiments on this server (the service is under heavy use right now because of corona home-schooling)

Check out the 20.04_testing branch - it’s 7.4. :slight_smile:

don’t get me wrong… but this still installs nextcloud/18.0.4 right? on ubuntu… i don’t see how this could help me figure out what the problem in my configuration is…

i just installed a nextcloud daily snapshot parallel to my current installation with a blank new database and it spammed the logs with

Error PHP Trying to access array offset on value of type null at /srv/http/nextcloud/3rdparty/leafo/scssphp/src/Compiler.php#5230

but i don’t think this is even related to the heavy cpu load…

i’m browsing the install scripts to figure out possible configuration option i could try…
https://github.com/nextcloud/vm/blob/20.04_testing/nextcloud_install_production.sh

i also ran every occ command to make sure the database is in a sane state…

currently the machine is under heavy load (load average 5.80)
the mysql command “show processlist;” shows:

MariaDB [(none)]> SHOW PROCESSLIST;
+---------+------+-----------+-----------+---------+------+--------------------------+------------------+----------+
| Id      | User | Host      | db        | Command | Time | State                    | Info             | Progress |
+---------+------+-----------+-----------+---------+------+--------------------------+------------------+----------+
|       1 | user |           | NULL      | Daemon  | NULL | InnoDB purge coordinator | NULL             |    0.000 |
|       2 | user |           | NULL      | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|       3 | user |           | NULL      | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|       4 | user |           | NULL      | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|       5 | user |           | NULL      | Daemon  | NULL | InnoDB shutdown handler  | NULL             |    0.000 |
| 1225557 | cadm | localhost | nextcloud | Sleep   |   24 |                          | NULL             |    0.000 |
| 1225710 | cadm | localhost | nextcloud | Sleep   |   16 |                          | NULL             |    0.000 |
| 1225731 | cadm | localhost | nextcloud | Sleep   |   14 |                          | NULL             |    0.000 |
| 1225755 | cadm | localhost | nextcloud | Sleep   |   14 |                          | NULL             |    0.000 |
| 1225770 | cadm | localhost | nextcloud | Sleep   |   13 |                          | NULL             |    0.000 |
| 1225779 | cadm | localhost | nextcloud | Sleep   |    0 |                          | NULL             |    0.000 |
| 1225783 | cadm | localhost | nextcloud | Sleep   |   13 |                          | NULL             |    0.000 |
| 1225840 | cadm | localhost | nextcloud | Sleep   |   10 |                          | NULL             |    0.000 |
| 1225895 | cadm | localhost | nextcloud | Sleep   |    7 |                          | NULL             |    0.000 |
| 1225911 | cadm | localhost | nextcloud | Sleep   |    6 |                          | NULL             |    0.000 |
| 1225919 | cadm | localhost | nextcloud | Sleep   |    6 |                          | NULL             |    0.000 |
| 1225933 | cadm | localhost | nextcloud | Sleep   |    5 |                          | NULL             |    0.000 |
| 1225998 | cadm | localhost | nextcloud | Sleep   |    0 |                          | NULL             |    0.000 |
| 1226034 | cadm | localhost | nextcloud | Sleep   |    0 |                          | NULL             |    0.000 |
| 1226039 | cadm | localhost | nextcloud | Sleep   |    0 |                          | NULL             |    0.000 |
| 1226055 | cadm | localhost | nextcloud | Sleep   |    0 |                          | NULL             |    0.000 |
| 1226057 | cadm | localhost | nextcloud | Sleep   |    0 |                          | NULL             |    0.000 |
+---------+------+-----------+-----------+---------+------+--------------------------+------------------+----------+

max used connections: 26

it seems all processes are sleeping… and should not use much cpu horsepower… right?
is this okay for mysql ?

if so… i could start concentrating on php-fpm or other things
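
A single processlist snapshot is easy to misread, so counting threads per `Command` over several snapshots gives a clearer picture of how many are really working. A minimal sketch, assuming the tab-separated batch output of the `mysql` client with the same columns as the table above; the function name is made up.

```shell
# Count threads per Command value from tab-separated processlist output.
# Expects a header line first and Command in the fifth column, as produced by:
#   mysql -B -e 'SHOW PROCESSLIST'
summarize_processlist() {
    awk -F '\t' 'NR > 1 { count[$5]++ }
                 END { for (c in count) printf "%s %d\n", c, count[c] }' | sort
}
```

Running `mysql -B -e 'SHOW PROCESSLIST' | summarize_processlist` every few seconds during a peak shows whether anything besides `Sleep` and `Daemon` shows up. Very short queries can still burn CPU between snapshots, so an all-sleeping list does not prove the database is idle.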

Sorry, the VM scripts switched away from MariaDB/MySQL some years ago.

If you want and have time, try PostgreSQL instead.

A good read for you: https://www.techandme.se/we-migrated-to-postgresql/

Please @enoch85 me when writing, else I might miss it.

@enoch85 although i don’t think it is actually a mariadb/sql problem but a php-fpm/nginx misconfiguration i’m trying to find here, i’m willing to try another database (especially if what i just read in your blog post is true - concerning the performance)

but…
is it really that easy ? (this is from your script)

# Create DB
cd /tmp || exit
sudo -u postgres psql <<END
CREATE USER $NCUSER WITH PASSWORD '$PGDB_PASS';
CREATE DATABASE nextcloud_db WITH OWNER $NCUSER TEMPLATE template0 ENCODING 'UTF8';
END
check-command service postgresql restart

# Convert DB
sudo -u www-data php /var/www/nextcloud/occ db:convert-type --all-apps --password "$PGDB_PASS" pgsql $NCUSER 127.0.0.1 nextcloud_db
sudo -u www-data php /var/www/nextcloud/occ maintenance:repair

this reads like: create a postgre database and user, then convert your nextcloud database to a postgre database… the docs say that the occ command also makes the necessary changes in the config — and that’s it?

i’m going to give this a try on a separate clean installation i run in parallel on the same server…
any tips / caveats i need to know of before i try ?

thx !!

Yes, it’s really that easy! :slight_smile:

so… i…

  • informed everybody of going into maintenance mode at 10pm

  • switched to maintenance mode
    sudo -u http php occ maintenance:mode --on

  • made a backup of my mariadb nextcloud database
    mysqldump --single-transaction -h localhost -u **** -p nextcloud > nextcloud-sqlbkp_$(date +"%Y%m%d").bak

  • made a snapshot of the vserver

  • created the postgresql user & database
    sudo -u postgres psql <<END
    CREATE USER ***** WITH PASSWORD '******';
    CREATE DATABASE nextcloud_db WITH OWNER ******* TEMPLATE template0 ENCODING 'UTF8';
    END

  • ran

occ db:convert-type (does everything - even alters config.php)
occ maintenance:repair
occ db:convert-filecache-bigint
occ db:add-missing-indices

  • removed mysql.utf8mb4 line from config.php

and that’s it… the conversion took 30 minutes…
now i’m waiting for my users to come back :wink:
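
The occ part of the steps above can be sketched as one function. This is a hedged outline, not a tested migration script: the `run` helper, the function name and the argument order are made up here, the final `maintenance:mode --off` is an assumption (the post does not say when maintenance mode was switched off), and the backup, snapshot and `psql` steps are left out and must happen first.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: web server user (http on Arch, www-data on Ubuntu)
# $2: Nextcloud directory, $3: PostgreSQL user, $4: PostgreSQL database
convert_to_pgsql() {
    occ="$2/occ"
    run sudo -u "$1" php "$occ" maintenance:mode --on || return
    # No --password given here, so occ is expected to ask for it.
    run sudo -u "$1" php "$occ" db:convert-type --all-apps pgsql "$3" 127.0.0.1 "$4" || return
    run sudo -u "$1" php "$occ" maintenance:repair || return
    run sudo -u "$1" php "$occ" db:convert-filecache-bigint || return
    run sudo -u "$1" php "$occ" db:add-missing-indices || return
    # Assumption: maintenance mode is switched off once everything succeeded.
    run sudo -u "$1" php "$occ" maintenance:mode --off
}
```

By default it only prints the commands, for example `convert_to_pgsql http /srv/http/cloud ncuser nextcloud_db`; setting `DRY_RUN=0` in front executes them and stops at the first failure.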

did it help?
i guess i can tell tomorrow at 10am (usually the peak in homeschooling activities)

@enoch85
well… i think it DID help …

Load average went down from 5-7 to 2-3
mysqld vanished from the top of the processlist
and postgresql uses 2-10% max…

with mariadb i had to write a config file in order to optimize the db for nextcloud… aren’t there any postgresql settings i need to tweak? php.ini settings are already set up as stated in the docs…

why isn’t postgres the default ?

well… php-fpm is still very cpu hungry but without mysqld eating all of the cpu time the problem is not so big anymore…

now i’m continuing my php nginx investigation

thx for the postgresql tip!!! :grinning:

You’d have to ask the devs why PostgreSQL isn’t the default. :slight_smile:

Great to hear everything is better. There’s some PHP-FPM tuning made in the VM as well. Have a look. :wink:

Last Saturday I upgraded to Ubuntu 20.04 from 18.04. That means a lot of packages got upgraded including:

  • PHP-FPM from 7.2 to 7.4
  • MariaDB from 10.1 to 10.3
  • Nginx from 1.14.0 to 1.17.10
  • Redis (not sure about the versions)

After that I also upgraded Nextcloud from 18.0.3 to 18.0.4. I suppose that was a mistake and I should have waited a few days before doing that.
Since then I have been observing CPU usage peaks lasting a few minutes, similar to what @valueerror described. During these peaks the following processes each use over 50% of CPU:

  • PHP-FPM (php-fpm: pool www)
  • MariaDB (mysqld)
  • Redis server

Haven’t done any deep digging yet but

  • I do not see any errors in any (nextcloud, syslog, php) log
  • I double checked that there weren’t any major config changes to mariadb, php-fpm or redis. The few changes I found were mainly in the mariadb config and related to barracuda and innodb settings
  • All peaks occurred right after nextcloud’s cron.php was fired

I’m going to monitor this. For now my best guess is preview-generator, although only up to 30 or so previews were generated during the peaks…
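
One way to check the cron.php suspicion is to sum CPU per process name while a cron run is active and again outside of it. A small sketch with a made-up function name, assuming `ps` from procps. Note that `pcpu` in `ps` is CPU time divided by process lifetime, so it smooths out short spikes.

```shell
# Sum %CPU per command name and list the top consumers.
# Reads "pcpu comm" pairs on stdin, for example from: ps -eo pcpu=,comm=
# $1: number of entries to show (default 5).
cpu_by_command() {
    awk '{ cpu[$2] += $1 }
         END { for (c in cpu) printf "%.1f %s\n", cpu[c], c }' |
        sort -rn | head -n "${1:-5}"
}
```

For example, `ps -eo pcpu=,comm= | cpu_by_command 3` run from a loop around the cron schedule shows whether php-fpm, the database and redis rise together.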

while the cron job is running php-fpm stays around 35% and postgre runs up to 55%… i guess cron.php is doing some heavy lifting here and i thought that’s ok

i couldn’t see any differences between 18.0.3 and 18.0.4 concerning that behaviour

in my opinion the only thing that made a difference was the upgrade from php 7.3 to 7.4

Ok, complete newbie here. I just tried following your steps in converting my db to postgresql .
I am using Nginx and mysql at the moment, on Ubuntu 20 something. Nextcloud is version 20.2 I think.

When I get to the convert command I get “Failed to connect to the database: An exception occurred in the driver: could not find driver”

Also when I do php -m I do not see the pgsql module loaded. I have tried adding the line extension=php_pgsql.so to what I think is the correct config.php .

Any ideas as to what the problem might be?

EDIT:
Alright, this is bizarre. I installed the php driver (I got a choice between php and php8.0; I am using php7.4), then created the user and the DB. It turned out I did indeed need the php7.4-pgsql module. After that the driver loaded fine. But then I found out that the DB and user I had created did not exist, even though I got no errors and the creation seemed to go fine. Ah well: probably just lack of coffee. Got it working now, thanks for your short tutorial!
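
The "could not find driver" message can be checked before running the conversion by looking for both PostgreSQL modules in the `php -m` output. A minimal sketch with a hypothetical function name. The package name `php7.4-pgsql` comes from the post above and will differ for other PHP versions or distros.

```shell
# Succeeds only if both pgsql and pdo_pgsql appear in a module list on stdin.
has_pgsql_driver() {
    modules=$(cat)
    echo "$modules" | grep -qix 'pgsql' &&
        echo "$modules" | grep -qix 'pdo_pgsql'
}
```

Typical use: `php -m | has_pgsql_driver || sudo apt install php7.4-pgsql`. The CLI and php-fpm can load different ini files, so php-fpm may need a restart before the web side sees the module.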

EDIT2:
Alright, dunno why but for some reason after the DB conversion Apache2 started. Meaning NGINX could not open the socket and did not work. After removing Apache from the system everything works ok now. But kind of strange that one of the commands sets apache to auto start again?