[Solved] High IO wait caused by MySQL read IOPS

Nextcloud version (eg, 29.0.5): 27.1.4
Operating system and version (eg, Ubuntu 29.04): Ubuntu 22.04
Apache or nginx version (eg, Apache 2.4.25): 2.4.52
PHP version (eg, 8.3): 8.1

The issue you are facing:

Is this the first time you’ve seen this error? (Y/N): y

I have a Nextcloud instance used for calendar only, and MariaDB is consuming all the disk I/O.
Most of the IOPS are reads, at a constant 200-300 MB/s, which causes high I/O wait.

Looking at MariaDB, I see a few processes that look like they are looping, for example:

| 4935608 | ncuser | localhost | nextdb | Query   |    0 | Sending data | SELECT `uri`, `calendardata`, `lastmodified`, `etag`, `size` FROM `oc_schedulingobjects` WHERE `principaluri` = 'principals/users/USER1'
| 4935621 | ncuser | localhost | nextdb | Query   |    0 | Sending data | SELECT `uri`, `calendardata`, `lastmodified`, `etag`, `size` FROM `oc_schedulingobjects` WHERE `principaluri` = 'principals/users/USER2'
| 4935750 | ncuser | localhost | nextdb | Query   |    0 | Sending data | SELECT `uri`, `calendardata`, `lastmodified`, `etag`, `size` FROM `oc_schedulingobjects` WHERE `principaluri` = 'principals/users/USER3'
| 4936034 | ncuser | localhost | nextdb | Query   |    1 | Sending data | SELECT `uri`, `calendardata`, `lastmodified`, `etag`, `size` FROM `oc_schedulingobjects` WHERE `principaluri` = 'principals/users/USER4'
| 4936179 | ncuser | localhost | nextdb | Query   |    0 | Sending data | SELECT `uri`, `calendardata`, `lastmodified`, `etag`, `size` FROM `oc_schedulingobjects` WHERE `principaluri` = 'principals/users/USER5'
| 4936457 | ncuser | localhost | nextdb | Query   |    0 | Sending data | SELECT `uri`, `calendardata`, `lastmodified`, `etag`, `size` FROM `oc_schedulingobjects` WHERE `principaluri` = 'principals/users/USER6'
| 4936575 | ncuser | localhost | nextdb | Query   |    0 | Sending data | SELECT `uri`, `calendardata`, `lastmodified`, `etag`, `size` FROM `oc_schedulingobjects` WHERE `principaluri` = 'principals/users/USER7'

These users all use macOS as their calendar (CalDAV) client.
I don’t see these queries for users on Thunderbird, for example…

If I repeatedly run ‘show full processlist;’ in MySQL, I always see these same users
and these same queries.
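
To put numbers on "always the same users", several processlist snapshots can be saved to a text file and tallied per principal. This is only a minimal sketch: the function name `tally_principals` and the sample text are made up for illustration (the second USER1 row simulates a later snapshot), and it assumes the rows look like the ones pasted above.

```python
import re
from collections import Counter

# Matches the principal in the WHERE clause of the query shown above.
PRINCIPAL_RE = re.compile(r"`principaluri` = '([^']+)'")


def tally_principals(processlist_text):
    """Count oc_schedulingobjects queries per principal in pasted processlist output."""
    counts = Counter()
    for line in processlist_text.splitlines():
        if "oc_schedulingobjects" not in line:
            continue  # ignore unrelated processlist rows
        match = PRINCIPAL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative input: two rows as posted, plus a made-up repeat for USER1.
SAMPLE = """\
| 4935608 | ncuser | localhost | nextdb | Query | 0 | Sending data | SELECT `uri`, `calendardata`, `lastmodified`, `etag`, `size` FROM `oc_schedulingobjects` WHERE `principaluri` = 'principals/users/USER1'
| 4935621 | ncuser | localhost | nextdb | Query | 0 | Sending data | SELECT `uri`, `calendardata`, `lastmodified`, `etag`, `size` FROM `oc_schedulingobjects` WHERE `principaluri` = 'principals/users/USER2'
| 4999999 | ncuser | localhost | nextdb | Query | 0 | Sending data | SELECT `uri`, `calendardata`, `lastmodified`, `etag`, `size` FROM `oc_schedulingobjects` WHERE `principaluri` = 'principals/users/USER1'
"""

for principal, hits in tally_principals(SAMPLE).most_common():
    print(principal, hits)
```

Principals that show up in nearly every snapshot are the ones worth comparing against the Apache log.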

I’m not sure, but it may be related to macOS and the CalDAV inbox (I don’t know exactly
what it is for), so I have added Apache logs below.

The user backend is LDAP, but the clients authenticate with app tokens.

I don’t know what else to try or how to continue debugging this.
Any help or ideas for finding out what is going on would be appreciated.

The output of your Nextcloud log in Admin > Logging:

Nothing related to this.

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):

<?php
$CONFIG = array (
  'instanceid' => 'SECRET',
  'skeletondirectory' => '',
  'defaultapp' => 'calendar',
  'passwordsalt' => 'SECRET',
  'secret' => 'SECRET',  
  'trusted_domains' =>
  array (
    0 => 'REMOVED.TLD',
  ),
  'datadirectory' => '/var/www/nextcloud/data',
  'dbtype' => 'mysql',
  'version' => '27.1.4.1',
  'overwrite.cli.url' => 'https://REMOVED.TLD',
  'dbname' => 'nextdb',
  'dbhost' => 'localhost',
  'dbport' => '',  
  'dbtableprefix' => 'oc_',
  'mysql.utf8mb4' => true,
  'dbuser' => 'ncuser',
  'dbpassword' => 'SECRET',
  'installed' => true,
  'ldapProviderFactory' => 'OCA\\User_LDAP\\LDAPProviderFactory',
  'mail_from_address' => 'noreply',
  'mail_smtpmode' => 'smtp',
  'mail_sendmailmode' => 'smtp',
  'mail_domain' => 'REMOVED:TLD',
  'mail_smtphost' => 'localhost',
  'mail_smtpport' => '25',
  'maintenance' => false, 
  'session_lifetime' => 14400,
  'remember_login_cookie_lifetime' => 172800,
  'token_auth_enforced' => true,
  'loglevel' => 2,
);

The output of your Apache/nginx/system log in /var/log/____:

Related lines that I see often look like these, sometimes with more bytes returned:

172.19.204.98 - USER5 [02/Jul/2024:19:10:36 +0000] "PROPFIND /remote.php/dav/calendars/USER5/inbox/ HTTP/1.1" 207 112937 "-" "macOS/14.4.1 (23E224) dataaccessd/1.0"
172.19.204.98 - USER5 [02/Jul/2024:19:10:39 +0000] "REPORT /remote.php/dav/calendars/USER5/inbox/ HTTP/1.1" 207 546884 "-" "macOS/14.4.1 (23E224) dataaccessd/1.0"

Output errors in nextcloud.log in /var/www/ or as admin user in top right menu, filtering for errors. Use a pastebin service if necessary.

Nothing related.

Before anyone troubleshoots this much further, you’ll likely save everyone (including yourself) some time and hassle by, at a minimum, upgrading to the final maintenance release of v27 (27.1.11), published before v27 went end of support. Your v27 is very out of date: Nextcloud server changelog

Ideally you bump up to a still supported major release as well (e.g. v28), but that’s at your discretion since v27.1.11 just hit end-of-support.

Hi Jtr, I’m waiting for some fixes, because there is a regression in the calendar email notifications in 27.1.5 and newer. This one: [stable27] fix(scheduling): don't send iMIP emails to rooms / resources by backportbot-nextcloud[bot] · Pull Request #41315 · nextcloud/server · GitHub, and others… It will also take some time to test the new version and look for regressions like this one.

I’ve attached a screenshot of iotop as an example, showing MariaDB doing lots of reads.

I think it is related to that inbox endpoint… For some users, requests to that path take around 100 seconds, for example:

"REPORT /remote.php/dav/calendars/USER/inbox/ HTTP/1.1" 207 479986 99 "-" "macOS/14.5 (23F79) dataaccessd/1.0"

The 99 is the number of seconds Apache took to finish the request.
Yet if I run this SELECT in the MySQL console, it takes about 1 second for that user:

SELECT `uri`, `calendardata`, `lastmodified`, `etag`, `size` FROM `oc_schedulingobjects` WHERE `principaluri` = 'principals/users/USER'

But for some reason that SELECT is repeated many times when USER
requests that path. Any idea what it could be?
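
To see how many users are affected, the access log can be scanned for slow inbox requests. The sketch below assumes the log line layout shown above, with the request duration in seconds as an extra field right after the byte count (presumably `%T` added to the Apache `LogFormat`, which is a guess). The name `slow_inbox_requests`, the 10-second threshold, and the second and third sample lines are made up for illustration.

```python
import re

# Request line, status, bytes, then the duration in seconds (assumed layout).
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) (?P<seconds>\d+) '
)


def slow_inbox_requests(lines, min_seconds=10):
    """Return (seconds, method, path) for slow CalDAV inbox requests, slowest first."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # line has no duration field or another layout
        seconds = int(match.group("seconds"))
        path = match.group("path")
        if path.endswith("/inbox/") and seconds >= min_seconds:
            hits.append((seconds, match.group("method"), path))
    return sorted(hits, reverse=True)


# First line is the one quoted above; the other two are invented for contrast.
SAMPLE_LINES = [
    '"REPORT /remote.php/dav/calendars/USER/inbox/ HTTP/1.1" 207 479986 99 "-" "macOS/14.5 (23F79) dataaccessd/1.0"',
    '"PROPFIND /remote.php/dav/calendars/USER/inbox/ HTTP/1.1" 207 112937 2 "-" "macOS/14.5 (23F79) dataaccessd/1.0"',
    '"REPORT /remote.php/dav/calendars/USER/personal/ HTTP/1.1" 207 5000 45 "-" "macOS/14.5 (23F79) dataaccessd/1.0"',
]

for seconds, method, path in slow_inbox_requests(SAMPLE_LINES):
    print(seconds, method, path)
```

For a real log, the lines of the access log file can be passed in place of `SAMPLE_LINES`.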

Are the developers aware of all these bugs? You should have open github issues for those. For your linked pull request, there is only one linked issue that was solved with this pull request: fix(caldav): When message is a reply compare the message sender not the recipient by SebastianKrupinski · Pull Request #44893 · nextcloud/server · GitHub
It got into NC 27 last week as well; however, the last version of NC 27, 27.1.11, was released at the end of June, and there won’t be another release for NC 27.
So for any problem you have, make sure there is a GitHub issue linked to it, and that it will hopefully be fixed in NC 28. There have been security issues in the meantime, so staying on the old version is not great …

Regarding your database problem, using caches reduces the load on the database by a lot. On top of that, you can optimize database caches as well, which is another speedup (it won’t have to do disk read/writes every time).

Hi tflidd, yes, I opened a GitHub issue and it was fixed in:

I will try some caching options and see what happens…
But a week ago I didn’t have this issue; that is the weird part.

It looks like something is wrong. This is 14 days of history; green is reads.

Question: Is there a tool or an occ command to check whether a user’s calendar is “ok”?

Oh, I think I’m hitting this problem: Remove old scheduling objects from INBOX and `oc_schedulingobjects` via cron · Issue #43621 · nextcloud/server · GitHub … Time to update and see what regressions we find :smiley:
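
A quick way to check whether old scheduling objects have piled up is to count rows and bytes per principal. The sketch below wraps that query in Python so it can be tried without touching the real database: the in-memory SQLite table and its rows are made-up stand-ins, the column list is taken from the SELECT above, and treating `lastmodified` as a Unix timestamp is an assumption. The SQL itself is plain and should also run in the MariaDB console.

```python
import sqlite3

# Rows, total size and oldest entry per principal, largest first.
PER_PRINCIPAL_SQL = """
SELECT principaluri,
       COUNT(*)          AS objects,
       SUM(size)         AS total_bytes,
       MIN(lastmodified) AS oldest
FROM oc_schedulingobjects
GROUP BY principaluri
ORDER BY total_bytes DESC
"""


def scheduling_objects_per_principal(conn):
    """Run the per-principal summary on any DB-API connection."""
    return conn.execute(PER_PRINCIPAL_SQL).fetchall()


# Stand-in table with invented rows, only so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oc_schedulingobjects ("
    "principaluri TEXT, uri TEXT, calendardata BLOB, "
    "lastmodified INTEGER, etag TEXT, size INTEGER)"
)
conn.executemany(
    "INSERT INTO oc_schedulingobjects VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("principals/users/USER1", "a.ics", b"", 1700000000, "e1", 1000),
        ("principals/users/USER1", "b.ics", b"", 1710000000, "e2", 2000),
        ("principals/users/USER2", "c.ics", b"", 1719000000, "e3", 500),
    ],
)

for row in scheduling_objects_per_principal(conn):
    print(row)
```

If a handful of principals hold thousands of old rows, every inbox request from their clients has to read all of them, which would fit the read load described above.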

As you can see, the merged code goes into 28.0.8, which will be released later this month… (It was merged for NC 27 as well, but there won’t be a new version.) If this fix is critical for you, you can also apply it manually yourself.

The fix should already be in current versions (e.g. 27.1.11).

In your place, I’d update to 27.1.11 first, then perhaps apply the patch to the other bug and check if your system works properly. Don’t forget to do a full backup now, and then when you are at 27.1.11. After that, if you don’t see major problems, you could go to NC 28. Even if you discover problems, they are only relevant for NC 28 because NC 27 is not supported any more.

No, here are a few commands but nothing for checks:
https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/occ_command.html#dav-commands

You could come up with checks and run them separately; if they really help, one might think about integrating them. But what exactly would you check?

For the future:
Do still look into the potential cache optimizations; they will speed up the interface and also have a positive impact on upload and download speeds (number of files).
As for versions, it is no problem to stay a bit behind on major versions. Minor versions should be applied fairly quickly because they fix bugs and security issues. If you find regressions, report them as early as possible.
In a more ideal world, you would help a bit with testing: keep a test setup where you can check new versions and report problems very early. The advantage is that you can test what is important to you, so it is not missed.

Hi guys, I’m marking this as solved. I updated to 27.1.11 and applied the patch I needed. It now runs smoothly, with no more ~20k IOPS. Thanks for the steps and suggestions. Regards.
