Unused Nextcloud installation hitting the database with 10,000 queries per second!

Nextcloud version : 19.0.10
Operating system and version (eg, Ubuntu 20.04): Debian buster
Apache or nginx version (eg, Apache 2.4.25): nginx/1.14.2
PHP version (eg, 7.4): 7.3.27-1

The issue you are facing:
I have a largish Nextcloud installation which performs really slowly. I just installed MariaDB monitoring, and it is seeing between 2,000 and 5,000 queries per second.

In order to help narrow down the problem, I installed an exact copy of the installation on a VM with 8 cores, 24GB RAM, and a fast SSD disk. This copy sees almost no user activity (only me). Scarily, it is seeing 10,000 queries per second!

I have Redis and APCu configured (see config below).

I have a cron job running every 5 minutes to run php -f /data/nextcloud/html/cron.php. But ps reveals there is only a single job running, which started 7 minutes ago.
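For reference, the entry in the crontab looks roughly like this (which user runs it is an assumption; adjust to whatever user owns your PHP processes):

```
*/5 * * * * php -f /data/nextcloud/html/cron.php
```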

Is this the first time you’ve seen this error? (Y/N): N

The output of your Nextcloud log in Admin > Logging:

There are no log entries since yesterday (when I took the MariaDB server down briefly for some tweaks).

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):

<?php
$CONFIG = array (
  'passwordsalt' => 'removed',
  'secret' => 'removed',
  'trusted_domains' => 
  array (
    0 => 'localhost',
    1 => 'removed',
  ),
  'datadirectory' => '/data/nextcloud/ncdata',
  'dbtype' => 'mysql',
  'version' => '19.0.4.2',
  'overwrite.cli.url' => 'http://localhost',
  'dbname' => 'nextcloud',
  'dbhost' => 'localhost',
  'dbport' => '',
  'dbtableprefix' => 'oc_',
  'dbuser' => 'nextcloud',
  'dbpassword' => 'removed',
  'installed' => true,
  'instanceid' => 'removed',
  'activity_expire_days' => 1,
  'memcache.locking' => '\\OC\\Memcache\\Redis',
  'redis' => 
  array (
    'host' => 'localhost',
    'port' => 6379,
    'timeout' => 1.5,
    'password' => '',
  ),
  'memcache.distributed' => '\\OC\\Memcache\\Redis',
  'memcache.local' => '\\OC\\Memcache\\APCu',
  'mail_from_address' => 'ddat+cloud',
  'mail_smtpmode' => 'smtp',
  'mail_sendmailmode' => 'smtp',
  'mail_domain' => 'removed',
  'mail_smtphost' => '127.0.0.1',
  'mail_smtpport' => '25',
  'maintenance' => false,
  'default_locale' => 'en-GB',
  'default_language' => 'en-GB',
  'loglevel' => 2,
  'updater.release.channel' => 'stable',
  'theme' => '',
);

The output of your Apache/nginx/system log in /var/log/____:
I don’t seem to have a system log. The error log is empty. The access log shows, every 30 seconds or so:

GET /ocs/v2.php/apps/notifications/api/v2/notifications

Can you post a few examples of the queries to MariaDB?

Can you perhaps deactivate Redis temporarily, to rule it out as the cause?


How do I find out what the queries are?
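(For anyone finding this thread later: the per-statement output shown further down comes from MariaDB’s general query log, which can be switched on at runtime. It logs every single statement, so only enable it briefly; the file path below is just an example.)

```sql
-- Enable the general query log temporarily (logs every statement; heavy!)
SET GLOBAL general_log_file = '/var/log/mysql/mysql-general.log';
SET GLOBAL general_log = 'ON';
-- ... reproduce the load, then switch it back off:
SET GLOBAL general_log = 'OFF';
```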

Redis seems to be working; here is the log:

18690:M 06 May 2021 11:38:35.071 * 10 changes in 300 seconds. Saving...
18690:M 06 May 2021 11:38:35.072 * Background saving started by pid 25393
25393:C 06 May 2021 11:38:35.087 * DB saved on disk
25393:C 06 May 2021 11:38:35.089 * RDB: 0 MB of memory used by copy-on-write
18690:M 06 May 2021 11:38:35.173 * Background saving terminated with success
18690:M 06 May 2021 11:43:36.021 * 10 changes in 300 seconds. Saving...
18690:M 06 May 2021 11:43:36.022 * Background saving started by pid 26367
26367:C 06 May 2021 11:43:36.037 * DB saved on disk
26367:C 06 May 2021 11:43:36.038 * RDB: 0 MB of memory used by copy-on-write
18690:M 06 May 2021 11:43:36.122 * Background saving terminated with success


OK, figured it out:

210506 12:03:16  11360 Query    SELECT `path` FROM `oc_filecache` WHERE (`storage` = 4407) AND (`fileid` = 22860)
                 11360 Query    SELECT `path` FROM `oc_filecache` WHERE (`storage` = 4407) AND (`fileid` = 22860)
                 11360 Query    SELECT `path` FROM `oc_filecache` WHERE (`storage` = 4407) AND (`fileid` = 22860)
                 11360 Query    SELECT `path` FROM `oc_filecache` WHERE (`storage` = 4407) AND (`fileid` = 22860)
                 11360 Query    SELECT `path` FROM `oc_filecache` WHERE (`storage` = 4407) AND (`fileid` = 22860)
                 11360 Query    SELECT `path` FROM `oc_filecache` WHERE (`storage` = 4407) AND (`fileid` = 22860)
                 11360 Query    SELECT `path` FROM `oc_filecache` WHERE (`storage` = 4407) AND (`fileid` = 22860)
                 11360 Query    SELECT `path` FROM `oc_filecache` WHERE (`storage` = 4407) AND (`fileid` = 22860)
                 11360 Query    SELECT `path` FROM `oc_filecache` WHERE (`storage` = 4407) AND (`fileid` = 22860)
                 11360 Query    SELECT `path` FROM `oc_filecache` WHERE (`storage` = 4407) AND (`fileid` = 22860)

I only had logging on for a few seconds, and there are 125,000 lines like this!

Running this query manually against the database returns no rows.


Perhaps @tflidd can help you because of this thread:

How to shrink oc_filecache (440GB, vs 360GB files hosted) - #6 by wbk


Can’t see anything that strikes me as relevant there.

It looks like the code is looping, trying to fetch a single record from oc_filecache, and trying again when there isn’t one. This happens when I log in as the admin, who is in all the groups and can access all the files.

As soon as I log out, it calms down again.

If I had a clue what the filecache was for, what the storage and fileid columns represent, and which bit of code was running, I might get further.

Any knowledge or explanation welcome.

I suspect the problem is here, in the getPath function in lib/private/Files/View.php:

		foreach ($mounts as $mount) {
			/**
			 * @var \OC\Files\Mount\MountPoint $mount
			 */
			if ($mount->getStorage()) {
				// One database query per mount: getPathById() runs a
				// SELECT against oc_filecache for this mount's storage.
				$cache = $mount->getStorage()->getCache();
				$internalPath = $cache->getPathById($id);
				if (is_string($internalPath)) {
					// Only the mount that actually contains the file gets here.
					$fullPath = $mount->getMountPoint() . $internalPath;
					if (!is_null($path = $this->getRelativePath($fullPath))) {
						return $path;
					}
				}
			}
		}

Nextcloud is displaying about 800 folders, and the loop above runs over about 800 mounts. Every folder, times every mount, results in a query to the database!
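To put rough numbers on that, a back-of-the-envelope sketch (the counts of 800 are my instance’s, not anything Nextcloud-defined):

```python
# Toy cost model of the getPath loop above: each folder is resolved by probing
# every mount, and each probe is one SELECT on oc_filecache.

def worst_case_queries(num_items: int, num_mounts: int) -> int:
    """Queries issued if no probe short-circuits (match at the last mount, or none)."""
    return num_items * num_mounts

def average_queries(num_items: int, num_mounts: int) -> float:
    """Queries issued if, on average, the matching mount is found halfway through."""
    return num_items * (num_mounts + 1) / 2

print(worst_case_queries(800, 800))  # 640000
print(average_queries(800, 800))     # 320400.0
```

Either way, hundreds of thousands of queries just to render one folder listing, which matches the 125,000+ lines captured in a few seconds of logging.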

Can you check what file it is? Perhaps a shared file… or an app?

You could temporarily stop the cron jobs and see whether the large number of queries continues when there is no cron job at all (and if no client is connected, there is no reason for so many of them).

I’m ahead of you there. It isn’t the cron job.

Our setup involves lots of users, divided into about 800 groups (some users are in more than one group, and one group contains everyone). Most groups have a group folder (via the Group folders app).

The admin user is in all the groups.

We have a folder called Library that is shared read-only with all our users. A few dozen subfolders within it are shared read-write with particular groups, so a group can put material in its read-write share and it becomes available read-only to everyone via the Library. I don’t know if it is relevant, but all the shared library folders were originally named (e.g.) “Media Group (r/w)”, shared with the group, and then the actual folder in Library was renamed without the “(r/w)”, e.g. “Media Group”.

If I log on as the admin user, who is in all the groups, the transaction rate accelerates to the maximum the database can handle. This is clearly because of the code quoted above: for every file or folder being displayed, a query is made to the database for every group it might be in, to get the path. Usually only one of those queries actually finds it.

An obvious solution is to go directly to the database and find the first cache entry which matches one of the groups, but I am a bit confused by the fact that getPath seems to be called recursively. I really need help from someone who understands the design and the code (I don’t really do PHP, other than fiddling around the edges).
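To illustrate the idea (this is not Nextcloud’s actual API), here is a sketch using sqlite3 as a stand-in for MariaDB: resolve a file id against all candidate storages in a single query, instead of one SELECT per mount. The table mirrors oc_filecache’s relevant columns; the data and storage ids are invented.

```python
# Sketch: batch the per-mount path lookups into one IN-list query.
# Table/column names mirror oc_filecache; everything else is made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oc_filecache (storage INTEGER, fileid INTEGER, path TEXT)")
conn.executemany(
    "INSERT INTO oc_filecache VALUES (?, ?, ?)",
    [(4407, 22860, "files/Library/Media Group"), (17, 5, "files/other")],
)

def get_path_batched(conn, storage_ids, fileid):
    """One query over all candidate storages, instead of one SELECT per mount."""
    placeholders = ",".join("?" * len(storage_ids))
    row = conn.execute(
        f"SELECT storage, path FROM oc_filecache "
        f"WHERE fileid = ? AND storage IN ({placeholders}) LIMIT 1",
        [fileid, *storage_ids],
    ).fetchone()
    return row  # (storage, path) tuple, or None if no storage has the file

print(get_path_batched(conn, list(range(4400, 4410)), 22860))
# -> (4407, 'files/Library/Media Group')
```

With ~800 mounts this would turn ~800 round-trips per file into one; the recursion in getPath is why it isn’t this simple in practice.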

If it is just certain files that pop up, there might be something special about them; perhaps several users have them open in a text editor (which might be normal).
Regarding your database, did you review your cache settings? For recurring queries that makes a huge difference. The other thing you can track is slow queries.

The question is also a bit what to expect with such a number of users.

The queries return really fast (it is managing 10,000/second). The problem is the sheer number of queries: why does it take over 100,000 queries to populate the top-level file list (about 800 folders and 10 shared files)?

For details about the code, it is better to get in touch with the developers directly. I have no idea how this is managed, or whether there is, for example, a difference between normal shared folders and the Group folders app.
general NC server → Issues · nextcloud/server · GitHub
groupfolder app → https://github.com/nextcloud/groupfolders

Version 20 has a small optimisation which reduces the amount of database access significantly. It’s still really inefficient, but not to quite such a mind-boggling degree!