Nextcloud / MySQL crashes overnight

welcometopwn · May 18, 2022, 2:52pm

Nextcloud version: 23.0.4
Operating system and version: Deebian 11 (Bullseye)
Nginx version: nginx/1.21.6
PHP version: PHP 7.4.29
MySQL version: MariaDB 10.7.3

The issue you are facing:
Nextcloud stops working after some period, usually after 12-20 hours and i’m forced to reboot the vm in order to make it work again. When this issue occurs i am not able to login to MySQL or restart the service. I’ve reinstalled MariaDB by purging the config files and re-creating the database (mysqlcheck shows no errors).

Is this the first time you’ve seen this error? (Y/N): No

The output of your Nextcloud log in Admin > Logging:

[webdav] Fatal: The REQUEST_URI (/remote.php/dav/files/user//) did not end with the contents of PATH_INFO (/dav/files/user/). This server might be misconfigured.

PROPFIND /remote.php/dav/files/user//
from 192.168.1.91 by user at 2022-05-18T14:38:51+00:00

[cron] Error: An exception occurred while executing a query: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

at 2022-05-18T14:38:51+00:00

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):

<?php
$CONFIG = array (
  'instanceid' => 'x',
  'passwordsalt' => 'x',
  'secret' => 'x',
  'trusted_domains' => 
  array (
    0 => 'x.com',
    1 => '192.168.1.8',
    2 => 'www',
    3 => 'www.local',
  ),
  'datadirectory' => '/var/www/_data/nextcloud/',
  'dbtype' => 'mysql',
  'version' => '23.0.4.1',
  'overwrite.cli.url' => 'https://x.com',
  'dbname' => 'nextcloud',
  'dbhost' => 'localhost',
  'loglevel' => 2,
  'logfile' => '/var/log/nextcloud/nextcloud.log',
  'log_rotate_size' => 5242880,
  'dbport' => '3306',
  'dbtableprefix' => 'oc_',
  'dbuser' => 'nextcloud',
  'dbpassword' => 'x',
  'installed' => true,
  'mail_from_address' => 'x',
  'mail_smtpmode' => 'smtp',
  'mail_domain' => 'x',
  'mail_smtpsecure' => 'tls',
  'mail_smtphost' => 'x',
  'mail_smtpport' => '587',
  'mail_smtpauth' => 1,
  'mail_smtpname' => 'x',
  'mail_smtppassword' => 'x',
  'preview_libreoffice_path' => '/usr/bin/libreoffice',
  'htaccess.RewriteBase' => '/',
  'remember_login_cookie_lifetime' => 7776000,
  'session_lifetime' => 19440000,
  'simpleSignUpLink.shown' => false,
  'auth.bruteforce.protection.enabled' => true,
  'lost_password_link' => '#',
  'default_phone_region' => 'US',
  'enable_previews' => true,
  'enabledPreviewProviders' => 
  array (
    0 => 'OC\\Preview\\Image',
    1 => 'OC\\Preview\\MP3',
    2 => 'OC\\Preview\\TXT',
    3 => 'OC\\Preview\\PDF',
    4 => 'OC\\Preview\\Movie',
    5 => 'OC\\Preview\\Photoshop',
    6 => 'OC\\Preview\\TIFF',
    7 => 'OC\\Preview\\SVG',
    8 => 'OC\\Preview\\OpenDocument',
  ),
  'filelocking.enabled' => 'true',
  'memcache.local' => '\\OC\\Memcache\\APCu',
  'memcache.locking' => '\\OC\\Memcache\\Redis',
  'redis' => 
  array (
    'host' => '/var/run/redis/redis-server.sock',
    'port' => 0,
  ),
  'maintenance' => false,
  'theme' => '',
  'mysql.utf8mb4' => true,
  'updater.release.channel' => 'stable',
  'mail_sendmailmode' => 'smtp',
  'app_install_overwrite' => 
  array (
    0 => 'dicomviewer',
    1 => 'bruteforcesettings',
    2 => 'previewgenerator',
    3 => 'metadata',
    4 => 'camerarawpreviews',
  ),
);

nginx logs:

error.log

2022/05/18 16:05:08 [error] 70754#70754: *45003 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.1.1, server: mydomain.com, request: "GET /apps/dashboard/ HTTP/2.0", upstream: "fastcgi://127.0.0.1:9000", host: "mydomain.com"

access.log

192.168.1.1 - - [18/May/2022:16:05:08 +0200] "GET /apps/dashboard/ HTTP/2.0" 504 569 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36"
192.168.1.1 - - [18/May/2022:16:06:08 +0200] "GET /favicon.ico HTTP/2.0" 504 569 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36"

nextcloud.log

https://pastebin.com/1mGBG7ei

php7.4-fpm.log

[16-May-2022 23:56:01] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[16-May-2022 23:56:14] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[17-May-2022 18:36:06] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[18-May-2022 11:41:18] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

welcometopwn · May 20, 2022, 12:33pm

I highered pm.max_children to 12 but that’s not helping at all. Anyone has some clue what the issue could be?

welcometopwn · May 24, 2022, 10:04am

I deactivated Preview Generator since i thought i might have this issue Query on oc_filecache uses wrong index - Cron job runs very long · Issue #24401 · nextcloud/server · GitHub

Now again my Nextcloud is no longer working but this time i was able to login to MySQL and check the processes. Any idea what could cause them to not finish the query?

MySQL Processes

welcometopwn · May 28, 2022, 2:34am

Does someone have a clue? I really don’t feel like reinstalling everything from scratch.

tflidd · May 29, 2022, 9:56am

If your database crashes, the database logs would be the most interesting logfile.

For the cronjobs, you can directly execute them from command line and see if there are errors: sudo -u www-data php -f cron.php

And for the auth-tokens, check if the table gets extremely large, there was a report on the bugtracker: Hundreds of odd oc_authtoken entries for a user slow down performance · Issue #27603 · nextcloud/server · GitHub

welcometopwn · June 1, 2022, 4:54am

Thanks for the reply!

After deactivating the Preview Generator i am able to login into MySQL just fine, but as mentioned above i do have those processes that are just being stuck. It’s either in oc_filecache, oc_jobs or oc_authtoken. Since the service/db is not fully crashing it also doesn’t return any errors.

Output of journalctl -u mariadb.service

-- Boot 26554ea3cc0f4fcfb46e175ae564f0cc --
May 28 04:37:25 www systemd[1]: Starting MariaDB 10.7.3 database server...
May 28 04:37:25 www mariadbd[234]: 2022-05-28  4:37:25 0 [Note] /usr/sbin/mariadbd (server 10.7.3-MariaDB-1:10.7.3+maria~bullseye-log) starting as process 234 ...
May 28 04:37:27 www systemd[1]: Started MariaDB 10.7.3 database server.
May 28 04:37:28 www /etc/mysql/debian-start[356]: Looking for 'mysql' as: /usr/bin/mysql
May 28 04:37:28 www /etc/mysql/debian-start[356]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
May 28 04:37:28 www /etc/mysql/debian-start[356]: This installation of MariaDB is already upgraded to 10.7.3-MariaDB.
May 28 04:37:28 www /etc/mysql/debian-start[356]: There is no need to run mysql_upgrade again for 10.7.3-MariaDB.
May 28 04:37:28 www /etc/mysql/debian-start[356]: You can use --force if you still want to run mysql_upgrade
May 28 04:37:28 www /etc/mysql/debian-start[415]: Checking for insecure root accounts.
-- Boot 5277aaeca832492d825e403aeca07f8c --
May 30 23:08:24 www systemd[1]: Starting MariaDB 10.7.3 database server...
May 30 23:08:25 www mariadbd[233]: 2022-05-30 23:08:25 0 [Note] /usr/sbin/mariadbd (server 10.7.3-MariaDB-1:10.7.3+maria~bullseye-log) starting as process 233 ...
May 30 23:08:27 www systemd[1]: Started MariaDB 10.7.3 database server.
May 30 23:08:27 www /etc/mysql/debian-start[304]: Upgrading MySQL tables if necessary.
May 30 23:08:27 www /etc/mysql/debian-start[362]: Checking for insecure root accounts.

The cronjob runs just fine when executing it directly.

oc_authtoken only contains 2 rows. I emptied the tables before just to rule that issue out.

Right now there are new processes being “stuck”.

MariaDB [(none)]> show processlist;
+------+-----------+-----------+-----------+---------+------+-----------------+------------------------------------------------------------------------------------------------------+----------+
| Id   | User      | Host      | db        | Command | Time | State           | Info                                                                                                 | Progress |
+------+-----------+-----------+-----------+---------+------+-----------------+------------------------------------------------------------------------------------------------------+----------+
| 6326 | nextcloud | localhost | nextcloud | Query   | 2294 | Update          | INSERT INTO `oc_filecache` (`mimepart`, `mimetype`, `mtime`, `size`, `etag`, `storage_mtime`, `permi |    0.000 |
| 6345 | nextcloud | localhost | nextcloud | Query   | 2174 | Update          | INSERT INTO `oc_filecache` (`mimepart`, `mimetype`, `mtime`, `size`, `etag`, `storage_mtime`, `permi |    0.000 |
| 6360 | nextcloud | localhost | nextcloud | Query   | 2003 | init for update | DELETE FROM `oc_filecache` WHERE `fileid` = 18766                                                    |    0.000 |
| 6601 | root      | localhost | NULL      | Query   |    0 | starting        | show processlist                                                                                     |    0.000 |
+------+-----------+-----------+-----------+---------+------+-----------------+------------------------------------------------------------------------------------------------------+----------+
4 rows in set (0.000 sec)

welcometopwn · June 5, 2022, 9:56am

It was running fine for 4 days and it’s starting again.

Screenshot of the processes

tflidd · June 5, 2022, 7:47pm

oc_jobs table contains for me 66 lines. So all queries should be fast. time_ms is the runtime of the command or timestamp when it was run?

welcometopwn · June 7, 2022, 9:04am

That would be the runtime same as the time column but in ms.

“The value displayed in the TIME and TIME_MS columns is the period of time that the given thread has been in its current state. Thus it can be used to check for example how long a thread has been executing the current query, or for how long it has been idle.”

The strange part is it is running for quite some days just fine and at some point it starts to pile up those processes and they are basically stuck and i can’t kill the process.
oc_jobs contains 37 rows in total.

CREATE TABLE `oc_jobs` (
	`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
	`class` VARCHAR(255) NOT NULL DEFAULT '' COLLATE 'utf8mb4_bin',
	`argument` VARCHAR(4000) NOT NULL DEFAULT '' COLLATE 'utf8mb4_bin',
	`last_run` INT(11) NULL DEFAULT '0',
	`last_checked` INT(11) NULL DEFAULT '0',
	`reserved_at` INT(11) NULL DEFAULT '0',
	`execution_duration` INT(11) NULL DEFAULT '0',
	`argument_hash` VARCHAR(32) NULL DEFAULT NULL COLLATE 'utf8mb4_bin',
	`time_sensitive` SMALLINT(6) NOT NULL DEFAULT '1',
	PRIMARY KEY (`id`) USING BTREE,
	INDEX `job_class_index` (`class`) USING BTREE,
	INDEX `job_argument_hash` (`class`, `argument_hash`) USING BTREE,
	INDEX `job_lastcheck_reserved` (`last_checked`, `reserved_at`) USING BTREE,
	INDEX `jobs_time_sensitive` (`time_sensitive`) USING BTREE
)
COLLATE='utf8mb4_bin'
ENGINE=InnoDB
ROW_FORMAT=COMPRESSED
AUTO_INCREMENT=36527
;

oc_jobs
---
| id | class | argument | last_run | last_checked | reserved_at | execution_duration | argument_hash | time_sensitive | 
| ---: | --- | --- | ---: | ---: | ---: | ---: | --- | ---: | 
| 1 | OCA\Files_Trashbin\BackgroundJob\ExpireTrash | null | 1654592102 | 1654592402 | 0 | 0 | some hash | 1 | 
| 2 | OCA\DAV\BackgroundJob\CleanupDirectLinksJob | null | 1654515601 | 1654592401 | 0 | 0 | some hash | 0 | 
| 3 | OCA\DAV\BackgroundJob\UpdateCalendarResourcesRoomsBackgroundJob | null | 1654590901 | 1654592401 | 0 | 0 | some hash | 1 | 
| 4 | OCA\DAV\BackgroundJob\CleanupInvitationTokenJob | null | 1654515601 | 1654592402 | 0 | 0 | some hash | 0 | 
| 5 | OCA\Federation\SyncJob | null | 1541441701 | 1654605301 | 0 | 0 | some hash | 1 | 
| 6 | OCA\Files_Sharing\DeleteOrphanedSharesJob | null | 1654591502 | 1654592402 | 0 | 0 | some hash | 1 | 
| 7 | OCA\Files_Sharing\ExpireSharesJob | null | 1654515601 | 1654592402 | 0 | 0 | some hash | 0 | 
| 8 | OCA\Files_Sharing\BackgroundJob\FederatedSharesDiscoverJob | null | 1654515601 | 1654592402 | 0 | 0 | some hash | 1 | 
| 9 | OCA\Files_Versions\BackgroundJob\ExpireVersions | null | 1654592102 | 1654592402 | 0 | 0 | some hash | 1 | 
| 10 | OCA\UpdateNotification\Notification\BackgroundJob | null | 1654515601 | 1654592402 | 0 | 66 | some hash | 1 | 
| 11 | OCA\Files\BackgroundJob\ScanFiles | null | 1654592402 | 1654592402 | 0 | 0 | some hash | 1 | 
| 12 | OCA\Files\BackgroundJob\DeleteOrphanedItems | null | 1654590901 | 1654592402 | 0 | 0 | some hash | 1 | 
| 13 | OCA\Files\BackgroundJob\CleanupFileLocks | null | 1654592102 | 1654592402 | 0 | 0 | some hash | 1 | 
| 14 | OCA\NextcloudAnnouncements\Cron\Crawler | null | 1580998534 | 1654605301 | 0 | 1 | some hash | 1 | 
| 16 | OCA\Activity\BackgroundJob\EmailNotification | null | 1654592402 | 1654592402 | 0 | 0 | some hash | 1 | 
| 17 | OCA\Activity\BackgroundJob\ExpireActivities | null | 1654515667 | 1654592402 | 0 | 0 | some hash | 0 | 
| 18 | OCA\Support\BackgroundJobs\CheckSubscription | null | 1541453402 | 1654605301 | 0 | 0 | some hash | 1 | 
| 19 | OC\Authentication\Token\DefaultTokenCleanupJob | null | 1653050809 | 1654605301 | 0 | 0 | some hash | 1 | 
| 20 | OC\Log\Rotate | null | 1654592402 | 1654592402 | 0 | 0 | some hash | 1 | 
| 21 | OC\Preview\BackgroundCleanupJob | null | 1654590901 | 1654592402 | 0 | 0 | some hash | 1 | 
| 26 | OCA\UpdateNotification\ResetTokenBackgroundJob | null | 1654592402 | 1654592402 | 0 | 0 | some hash | 1 | 
| 31 | OCA\DAV\BackgroundJob\RefreshWebcalJob | {"principaluri":"principals\/users\/user1","uri":"user1"} | 1654388702 | 1654592402 | 0 | 9 | some hash | 1 | 
| 208 | OC\Core\BackgroundJobs\CleanupLoginFlowV2 | null | 1654590901 | 1654592402 | 0 | 0 | some hash | 1 | 
| 1794 | OCA\DAV\BackgroundJob\EventReminderJob | null | 1654592102 | 1654592402 | 0 | 0 | some hash | 1 | 
| 1795 | OCA\Text\Cron\Cleanup | null | 1654592102 | 1654592402 | 0 | 0 | some hash | 1 | 
| 1797 | OCA\Files\BackgroundJob\CleanupDirectEditingTokens | null | 1654591502 | 1654592402 | 0 | 0 | some hash | 1 | 
| 1890 | OCA\AdminAudit\BackgroundJobs\Rotate | null | 1654588802 | 1654592402 | 0 | 0 | some hash | 1 | 
| 12802 | OCA\WorkflowEngine\BackgroundJobs\Rotate | null | 1654588802 | 1654592402 | 0 | 0 | some hash | 1 | 
| 12803 | OCA\ContactsInteraction\BackgroundJob\CleanupJob | null | 1654515667 | 1654592402 | 0 | 0 | some hash | 0 | 
| 21523 | OCA\Activity\BackgroundJob\DigestMail | null | 1654590901 | 1654592402 | 0 | 0 | some hash | 1 | 
| 21525 | OCA\UserStatus\BackgroundJob\ClearOldStatusesBackgroundJob | null | 1654592402 | 1654592402 | 0 | 0 | some hash | 1 | 
| 21968 | OC\Security\Bruteforce\CleanupJob | null | 1654515667 | 1654592402 | 0 | 0 | some hash | 0 | 
| 26338 | OCA\DAV\BackgroundJob\CalendarRetentionJob | null | 1654576202 | 1654592402 | 0 | 0 | some hash | 1 | 
| 26339 | OCA\ServerInfo\Jobs\UpdateStorageStats | null | 1654588802 | 1654592402 | 0 | 0 | some hash | 1 | 
| 26341 | OCA\Circles\Cron\Maintenance | null | 1653306903 | 1654605301 | 0 | 0 | some hash | 1 | 
| 30444 | OCA\Notifications\BackgroundJob\GenerateUserSettings | null | 1654515667 | 1654592402 | 0 | 0 | some hash | 1 | 
| 30445 | OCA\Notifications\BackgroundJob\SendNotificationMails | null | 1654592402 | 1654592402 | 0 | 0 | some hash | 1 |

tflidd · June 7, 2022, 11:24am

can you analyze the cache settings of the database (mysqltuner.pl, tuning-primer.sh)?

Your jobs table looks similar to mine. Not each job runs each time, not sure if it is obvious in the logs somewhere if a different job creates the database problem.

welcometopwn · June 7, 2022, 6:03pm

I just ran both of the mentioned tools, never heard of them before… pretty neat!

tuning-primer.sh
mysqltuner.pl

I too noticed that when i posted the table that some of the jobs are not running each cron (i suppose they should?) sadly i can’t find anything in any logs related to that.

It’s probably not best practise but should i upgrade my db version (maybe php as well) to see if it makes any difference?

Thank you so far for your help!

welcometopwn · June 29, 2022, 8:34pm

I reinstalled Nextcloud and just kept the data that i later migrated after finishing the setup and it is still crashing. 504 Gateway error. I’m really out of ideas…

running mysql processes

mcnesium · July 14, 2022, 8:52am

For a couple of weeks I am also seeing those messages pretty much daily:

PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away in /var/www/html/3rdparty/doctrine/dbal/src/Driver/PDO/Statement.php:92

This is in a container environment within their separate nextcloud_default Docker network using nextcloud:stable and mariadb:10.5 images, and the error message is caused by the docker exec -u www-data -i cloud_app php -f cron.php cron job.

Not sure whether that is related but about a month ago I started using Watchtower to update both these images and I think these error messages appeared only afterwards. Is it possible that the watchtower scans somehow disrupts the nextcloud environment? Another option is that it only started after upgrading to NC v24.

infecticide · November 13, 2022, 7:42pm

I’m having the same issue here, a query gets stuck on oc_filecache or oc_jobs. This most recent lockup shows the following in SHOW PROCESSLIST;

Id User Host db Command Time State Info Progress
44446 nextcloud_user localhost nextcloud Query 1747 Update INSERT INTO oc_filecache (mimepart, mimetype, mtime, size, etag, storage_m time, `permi 0.000

Full Query is:

—TRANSACTION 952900, ACTIVE 5700 sec inserting
mysql tables in use 1, locked 1
1 lock struct(s), heap size 1128, 0 row lock(s), undo log entries 1
MariaDB thread id 44446, OS thread handle 139644487255616, query id 274803 localhost nextcloud Update
INSERT INTO oc_filecache (mimepart, mimetype, mtime, size, etag, storage_mtime, permissions, name, parent, checksum, path_hash, path, storage) VALUES(‘1’, ‘2’, ‘1668366450’, ‘-1’, ‘637140729cd4c’, ‘1668366450’, ‘31’, ‘5’, ‘100858’, ‘’, ‘e3beec3a110151336b4946242594c106’, ‘appdata_ocg64ev6513q/preview/d/8/b/5’, ‘1’)

This happens every couple of days. Haven’t been able to find a cause yet.