The cron.php tasks run for hours, constant CPU load, queries on oc_filecache

Nextcloud version (eg, 18.0.2): 18.0.6
Operating system and version (eg, Ubuntu 20.04): Debian 10.4
Apache or nginx version (eg, Apache 2.4.25): Apache/2.4.38 (Debian)
PHP version (eg, 7.1): 7.3.14-1~deb10u1

The issue you are facing:

CPU has been near 100% for over 24 hours now, on a dedicated VPS running only Nextcloud, with one client (me) who is not doing anything. At least one instance of cron.php appears to be running at all times. It is triggered by a 15 minute job in cron.d, and background jobs in the Nextcloud config are set to cron. Looking at the database processlist, there is nearly always at least one instance of this query running: SELECT path FROM oc_filecache WHERE (storage = 3) AND (size < 0) ORDER BY fileid DESC. It returns about 90,000 rows from a table of over 200,000 entries. Nextcloud is naturally sluggish when used. I tried killing the process, but the same problem just starts again after a while.

Is this the first time you’ve seen this error? (Y/N): Y

Steps to replicate it:

  1. Run Nextcloud with background jobs set to cron
  2. Run a cron job every 15 minutes that starts cron.php
  3. Observe CPU and database activity

The output of your Nextcloud log in Admin > Logging:

Empty

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):

$CONFIG = array (
  'instanceid' => '**************',
  'passwordsalt' => '*************************',
  'secret' => '************************************',
  'trusted_domains' => 
  array (
    0 => 'cloud.example.com',
    1 => 'example.cloud',
  ),
  'datadirectory' => '/var/lib/nextcloud',
  'overwrite.cli.url' => 'https://example.cloud',
  'dbtype' => 'mysql',
  'version' => '18.0.6.0',
  'dbname' => 'nextcloud',
  'dbhost' => 'localhost',
  'dbport' => '',
  'dbtableprefix' => 'oc_',
  'dbuser' => 'nextcloud',
  'dbpassword' => '************',
  'installed' => true,
  'memcache.local' => '\\OC\\Memcache\\APCu',
  'maintenance' => false,
  'filelocking.enabled' => true,
  'memcache.locking' => '\\OC\\Memcache\\Redis',
  'redis' => 
  array (
    'host' => '/var/run/redis/redis.sock',
    'port' => 0,
    'timeout' => 0.0,
  ),
  'theme' => '',
  'loglevel' => 2,
  'logfile' => '/var/log/nextcloud/nextcloud.log',
  'mysql.utf8mb4' => true,
  'updater.release.channel' => 'stable',
  'auth.bruteforce.protection.enabled' => false,
);

The output of your Apache/nginx/system log in /var/log/____:

Nothing unusual

Can you post the output of “top”, “htop” and/or ps -ef?
Can you remove cron.php from your crontab and wait, e.g. one hour or one day?
You can also remove cron.php from your crontab and reboot.
Then please compare the CPU usage and process list.

Current tasks mentioning nextcloud are:

root@bsr:/var/www/nextcloud/www/config# ps aux | grep nextcloud
nextclo+  2574  0.2  4.4 288868 45336 ?        S    14:01   0:04 php-fpm: pool nextcloud
nextclo+  2583  0.2  4.5 288872 45604 ?        S    14:01   0:04 php-fpm: pool nextcloud
nextclo+  2584  0.1  5.3 313756 54512 ?        S    14:01   0:02 php-fpm: pool nextcloud
nextclo+  2585  0.3  4.3 286840 43504 ?        S    14:01   0:06 php-fpm: pool nextcloud
nextclo+  6237  0.0  0.0   2388   676 ?        Ss   10:15   0:00 /bin/sh -c /usr/bin/php /var/www/nextcloud/public_html/cron.php
nextclo+  6246  0.4  6.1 267332 61804 ?        D    10:15   1:09 /usr/bin/php /var/www/nextcloud/public_html/cron.php
nextclo+  9602  0.0  0.0   2388   756 ?        Ss   14:30   0:00 /bin/sh -c /usr/bin/php /var/www/nextcloud/public_html/cron.php
nextclo+  9604  1.2  8.3 281512 84180 ?        S    14:30   0:02 /usr/bin/php /var/www/nextcloud/public_html/cron.php
root     10395  0.0  0.0   6212   876 pts/0    S+   14:32   0:00 grep nextcloud
nextclo+ 14947  0.2  5.7 388928 58088 ?        S    12:46   0:18 php-fpm: pool nextcloud
nextclo+ 15275  0.3  5.2 313544 53292 ?        S    12:47   0:19 php-fpm: pool nextcloud
nextclo+ 15650  0.3  5.4 313760 55324 ?        S    12:49   0:19 php-fpm: pool nextcloud
nextclo+ 15694  0.2  5.5 313540 55684 ?        S    12:49   0:17 php-fpm: pool nextcloud
nextclo+ 15768  0.3  5.4 313760 55044 ?        S    12:49   0:21 php-fpm: pool nextcloud
nextclo+ 27405  0.3  5.4 313760 55484 ?        S    13:34   0:13 php-fpm: pool nextcloud

The CPU is being used almost entirely by the database, repeatedly running the query quoted above. Screenshot of top here.
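
To put a number on "repeatedly", one option is to sample the processlist and count how many rows are running that query. This is only a rough sketch: the function name is made up, the pattern is a guess at how the query text appears in the processlist, and it assumes the mysql client can log in without prompting (for example via ~/.my.cnf).

```shell
#!/bin/sh
# Count processlist rows that are running the oc_filecache "size < 0" query.
# Reads processlist text on stdin and prints the number of matching lines.
# The pattern is deliberately loose so quoting around the names still matches.
count_filecache_queries() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c 'oc_filecache.*size.* < 0' || true
}

# Example (assumes passwordless client login, which is an assumption here):
#   mysql -e 'SHOW FULL PROCESSLIST' | count_filecache_queries
```

Run from a loop with a short sleep, this gives a crude picture of how often the query is in flight.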

I can disable the cron job but individual jobs are running for many hours. In theory, isn’t cron.php supposed to not run if a previous activation is still running?

nextclo+  6237  0.0  0.0   2388   676 ?        Ss   10:15   0:00 /bin/sh -c /usr/bin/php /var/www/nextcloud/public_html/cron.php
nextclo+  6246  0.4  6.1 267332 61804 ?        D    10:15   1:09 /usr/bin/php /var/www/nextcloud/public_html/cron.php
nextclo+  9602  0.0  0.0   2388   756 ?        Ss   14:30   0:00 /bin/sh -c /usr/bin/php /var/www/nextcloud/public_html/cron.php
nextclo+  9604  1.2  8.3 281512 84180 ?        S    14:30   0:02 /usr/bin/php /var/www/nextcloud/public_html/cron.php

cron.php is running for too long or too often.
Please deactivate the cron.php execution and wait (or reboot).

If the CPU is then normal, execute it manually and post the logs.

As root, use sudo to run it as your Nextcloud user (nextclo+):

sudo -u nextcloud /usr/bin/php /var/www/nextcloud/public_html/cron.php

The situation existed earlier, and the system was rebooted early yesterday morning. But the cron.php that started at 12:30 ran at least until I stopped looking and went to bed. It looks to have finished, but to have been replaced by others, as shown in ps aux | grep nextcloud above. Here is a picture of the CPU load, invariably caused by that SQL query.

It is not normal for more than one cron.php to be running at the same time. Solve this problem first.

How do you mean it is not normal? The documentation appears to say that cron.php should be triggered every 15 minutes by cron. (Some places say every 5 minutes). What are you suggesting I should be doing differently?

Cron.d job:

# /etc/cron.d/bsr-nextcron
# Set with initial configuration by server installation scripts
#
MAILTO="martin@example.com"
# Every 15 minutes run the Nextcloud cron job
# */15 * * * * nextcloud /usr/bin/php /var/www/nextcloud/public_html/cron.php

Yes, this is OK. But because your system has problems, you cannot execute cron.php at this interval. cron.php does not finish within the interval, so several instances run at once and cause more problems.

Use a longer interval, or deactivate the cron execution as a test.
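
A third option is to leave the interval alone and skip a run when the previous one has not finished. A minimal sketch using flock from util-linux follows; the function name and the lock file path are made up for illustration, and nothing in this thread confirms whether cron.php has its own protection against overlapping runs.

```shell
#!/bin/sh
# Run a command only if no earlier run still holds the lock file.
# With -n, flock gives up immediately (non-zero exit) instead of waiting.
run_exclusive() {
    lockfile=$1
    shift
    flock -n "$lockfile" "$@"
}

# The same idea written directly as a cron.d entry (lock path is hypothetical):
#   */15 * * * * nextcloud flock -n /tmp/nextcloud-cron.lock /usr/bin/php /var/www/nextcloud/public_html/cron.php
```

This only stops instances piling up. It does nothing about a single run taking hours.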

OK, but won’t the problem then come back again? It did after I rebooted.

OK. Now you can kill the processes, stop/start apache2 and Nextcloud, or reboot.

Rebooted. Started cron.php manually. It’s now been running for 30 minutes. Huge CPU by the database server, constantly running the same query. Although it is a large result set, the actual query only takes a few seconds, so I don’t understand why it is being run repeatedly.

$ whoami
nextcloud
$ date
Thu 18 Jun 14:50:21 BST 2020
$ /usr/bin/php /var/www/nextcloud/public_html/cron.php

Nothing in the Nextcloud log, nothing in the PHP-FPM log, nothing in the Apache error log.
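
For manual test runs like this one, it may help to cap the run time and record how long it took, rather than watching the terminal. This is a sketch using timeout from coreutils; the function name and the 600 second limit in the example are arbitrary, and whether it is safe to interrupt a background job part-way is not something established in this thread.

```shell
#!/bin/sh
# Run a command with a time limit and report the exit code and duration.
# Prints "exit=<code> seconds=<n>"; timeout uses exit code 124 when the
# limit was reached and the command was terminated.
run_with_limit() {
    limit=$1
    shift
    start=$(date +%s)
    rc=0
    timeout "$limit" "$@" || rc=$?
    end=$(date +%s)
    echo "exit=$rc seconds=$((end - start))"
    return "$rc"
}

# Example, run as the nextcloud user:
#   run_with_limit 600 /usr/bin/php /var/www/nextcloud/public_html/cron.php
```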

No real idea. But once cron.php has stopped, is the CPU normal and does Nextcloud work fine? Is it only a cron.php problem, or is Nextcloud also slow?

I’ve killed cron.php. CPU has tumbled to below 10% and load average is falling fast (was running at 1.5 or more, now down to 0.15 or less). Database is mostly doing nothing.

If Nextcloud now also works fine with an active user, the problem is the cron.php script.

Start the script and post the logs.

Thanks. Well, yes, I think the problem is cron.php, although it’s hard to know what it is doing, as it runs through a number of tasks. Nextcloud works even when CPU is near 100%, but it is more responsive when it isn’t. There isn’t anything helpful in the logs to post. I’m not sure what further information is available. Did you have particular logs in mind?
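
One piece of further information that can be pulled straight from the database is how many oc_filecache rows have size < 0 for each storage, since that is the condition the recurring query selects on. A small sketch that only prints the SQL; the function name is made up, and the storage and size columns are taken from the query quoted earlier.

```shell
#!/bin/sh
# Print SQL that counts filecache rows with size < 0, grouped by storage.
# $1 is the table prefix; it defaults to 'oc_' as in the config.php above.
incomplete_by_storage_sql() {
    prefix=${1:-oc_}
    printf 'SELECT storage, COUNT(*) AS incomplete FROM %sfilecache WHERE size < 0 GROUP BY storage;\n' "$prefix"
}

# Example (database name from the config above; client login is assumed):
#   incomplete_by_storage_sql oc_ | mysql nextcloud
```

Comparing the counts before and after a cron.php run would show whether the number of such rows is shrinking at all.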

Perhaps the PHP configuration is not good and only cron.php is causing problems. It may be worth checking your PHP configuration (memory, …).

You could also switch from cron.php to AJAX for a few days in the Nextcloud GUI (Administration -> Basic settings). I also use a hosted Nextcloud with AJAX.

On the same page you can find information about your background jobs.

It doesn’t look like memory is the problem. Running PHP from a command line, memory is only constrained by the physical memory, which is 1GB. I can’t think of a PHP configuration that would create a problem without causing a crash. I was trying to get away from Ajax as it clearly must have a detrimental effect on user performance! It wasn’t spectacular before. Use of cron is recommended in the documentation (and the admin UI says to run every 5 minutes). But I’ve switched to Ajax - not sure if there is a way to get it on to cron.

Ran for some hours on Ajax. Now been running for several hours at near 100% cpu. Same SQL query being run all the time. SELECT path FROM oc_filecache WHERE ( storage = 3) AND ( size < 0) ORDER BY fileid DESC.

Looks to be stacking up PHP instances:

nextclo+  6980  0.5  6.4 338136 65284 ?        S    14:50   2:26 php-fpm: pool nextcloud
nextclo+  7240  0.3  5.1 313584 52120 ?        S    14:51   1:29 php-fpm: pool nextcloud
nextclo+  7283  0.3  5.1 313540 52260 ?        S    14:51   1:25 php-fpm: pool nextcloud
nextclo+  8510  0.3  5.6 389180 57512 ?        S    14:55   1:25 php-fpm: pool nextcloud
nextclo+  8935  0.3  5.8 389152 59640 ?        S    14:57   1:20 php-fpm: pool nextcloud
nextclo+ 15581  0.2  4.5 288864 45528 ?        S    21:44   0:02 php-fpm: pool nextcloud
nextclo+ 15603  0.5  4.5 288868 45880 ?        S    21:44   0:05 php-fpm: pool nextcloud
nextclo+ 19349  0.0  0.7 281908  7744 ?        S    21:59   0:00 php-fpm: pool nextcloud
nextclo+ 19366  0.3  5.7 388060 58356 ?        S    15:37   1:10 php-fpm: pool nextcloud
root     19598  0.0  0.0   6212   876 pts/2    S+   22:00   0:00 grep nextcloud
nextclo+ 28582  0.2  5.1 313540 52132 ?        S    18:21   0:36 php-fpm: pool nextcloud
nextclo+ 32306  0.3  5.4 313836 55148 ?        S    16:29   1:03 php-fpm: pool nextcloud

Restarting PHP-FPM clears out the long running processes, but doesn’t really solve the problem.
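
To spot the long-running ones without reading start times by eye, ps can print elapsed seconds, which can then be filtered. A rough sketch, assuming a procps version of ps that supports the etimes column; the function name and the one hour threshold in the example are arbitrary.

```shell
#!/bin/sh
# Filter "pid etimes args" lines, keeping PHP processes whose elapsed time
# in seconds (second column) is greater than the threshold given as $1.
older_than() {
    awk -v min="$1" '$2 + 0 > min && /php/ { print }'
}

# Example: PHP processes (php-fpm workers, cron.php) older than one hour.
#   ps -eo pid=,etimes=,args= | older_than 3600
```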

I can confirm the described issue. Did you check the GitHub issue tracker?

No, but I’ve looked now. The entry https://github.com/nextcloud/server/issues/18960 looks similar, but doesn’t seem to be the same. It doesn’t appear to have made any progress.
