MariaDB/MySQL crash/restart cycle

Hey there,

I run Nextcloud
on a Raspberry Pi 2
with up to date Raspbian Jessie
on Linux kernel 4.9.24-v7+,
Apache/2.4.25
mod_php 7.0.18-1
MariaDB 10.0.30.

Yesterday I had a total server crash, starting to show up with nextcloud not being reachable anymore. The first errors I recovered now on nextcloud log were:

Fatal	webdav	Sabre\DAV\Exception\ServiceUnavailable: HTTP/1.1 503 Doctrine\DBAL\Exception\DriverException: An exception occurred while executing 'SELECT * FROM `oc_bruteforce_attempts` WHERE (`occurred` > ?) AND (`subnet` = ?) AND (`action` = ?)' with params [1494213647, "192.168.178.21\/32", "login"]: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away	2017-05-08T17:20:47+0200
Error	no app in context	Doctrine\DBAL\Exception\DriverException: An exception occurred while executing 'SELECT * FROM `oc_bruteforce_attempts` WHERE (`occurred` > ?) AND (`subnet` = ?) AND (`action` = ?)' with params [1494213647, "192.168.178.21\/32", "login"]: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away	2017-05-08T17:20:47+0200
Error	PHP	PDOStatement::execute(): Error reading result set's header at /var/www/owncloud/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php#91	2017-05-08T17:20:47+0200
Error	PHP	PDOStatement::execute(): MySQL server has gone away at /var/www/owncloud/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php#91	2017-05-08T17:20:47+0200

This error messages were repeating on every attempt to open nextcloud web ui, connect with dav or cron.php, resulting in a crash and restart of mysql:

May  8 17:21:18 micha mysqld: 170508 17:21:18 [ERROR] mysqld got signal 4 ;
May  8 17:21:18 micha mysqld: This could be because you hit a bug. It is also possible that this binary
May  8 17:21:18 micha mysqld: or one of the libraries it was linked against is corrupt, improperly built,
May  8 17:21:18 micha mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
May  8 17:21:18 micha mysqld:
May  8 17:21:18 micha mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
May  8 17:21:18 micha mysqld:
May  8 17:21:18 micha mysqld: We will try our best to scrape up some info that will hopefully help
May  8 17:21:18 micha mysqld: diagnose the problem, but since we have already crashed,
May  8 17:21:18 micha mysqld: something is definitely wrong and this may fail.
May  8 17:21:18 micha mysqld:
May  8 17:21:18 micha mysqld: Server version: 10.0.30-MariaDB-0+deb8u1
May  8 17:21:18 micha mysqld: key_buffer_size=16777216
May  8 17:21:18 micha mysqld: read_buffer_size=131072
May  8 17:21:18 micha mysqld: max_used_connections=1
May  8 17:21:18 micha mysqld: max_threads=52
May  8 17:21:18 micha mysqld: thread_count=1
May  8 17:21:18 micha mysqld: It is possible that mysqld could use up to
May  8 17:21:18 micha mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 130185 K  bytes of memory
May  8 17:21:18 micha mysqld: Hope that's ok; if not, decrease some variables in the equation.
May  8 17:21:18 micha mysqld:
May  8 17:21:18 micha mysqld: Thread pointer: 0x73fa0008
May  8 17:21:18 micha mysqld: Attempting backtrace. You can use the following information to find out
May  8 17:21:18 micha mysqld: where mysqld died. If you see no messages after this, something went
May  8 17:21:18 micha mysqld: terribly wrong...
May  8 17:21:18 micha mysqld: stack_bottom = 0x769f6de4 thread_stack 0x30000
May  8 17:21:18 micha mysqld:
May  8 17:21:18 micha mysqld: Trying to get some variables.
May  8 17:21:18 micha mysqld: Some pointers may be invalid and cause the dump to abort.
May  8 17:21:18 micha mysqld: Query (0x55c15018): is an invalid pointer
May  8 17:21:18 micha mysqld: Connection ID (thread ID): 3
May  8 17:21:18 micha mysqld: Status: NOT_KILLED
May  8 17:21:18 micha mysqld:
May  8 17:21:18 micha mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
May  8 17:21:18 micha mysqld:
May  8 17:21:18 micha mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
May  8 17:21:18 micha mysqld: information that should help you find out what is causing the crash.
May  8 17:21:18 micha mysqld:
May  8 17:21:18 micha mysqld: We think the query pointer is invalid, but we will try to print it anyway.
May  8 17:21:18 micha mysqld: Query: SELECT * FROM `oc_bruteforce_attempts` WHERE (`occurred` > '1494213678') AND (`subnet` = '192.168.178.21/32') AND (`action` = 'login')
May  8 17:21:18 micha mysqld:
May  8 17:21:18 micha mysqld_safe: Number of processes running now: 0
May  8 17:21:18 micha mysqld_safe: mysqld restarted
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] /usr/sbin/mysqld (mysqld 10.0.30-MariaDB-0+deb8u1) starting as process 918 ...
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Using mutexes to ref count buffer pool pages
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: The InnoDB memory heap is disabled
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Compressed tables use zlib 1.2.8
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Using Linux native AIO
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Not using CPU crc32 instructions
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Initializing buffer pool, size = 128.0M
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Completed initialization of buffer pool
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Highest supported file format is Barracuda.
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: The log sequence numbers 1122994427 and 1122994427 in ibdata files do not match the log sequence number 1279369285 in the ib_logfiles!
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Database was not shutdown normally!
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Starting crash recovery.
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Reading tablespace information from the .ibd files...
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: Restoring possible half-written data pages
May  8 17:21:18 micha mysqld: 170508 17:21:18 [Note] InnoDB: from the doublewrite buffer...
May  8 17:21:19 micha mysqld: 170508 17:21:19 [Note] InnoDB: 128 rollback segment(s) are active.
May  8 17:21:19 micha mysqld: 170508 17:21:19 [Note] InnoDB: Waiting for purge to start
May  8 17:21:19 micha mysqld: 170508 17:21:19 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.35-80.0 started; log sequence number 1279369285
May  8 17:21:19 micha mysqld: 170508 17:21:19 [Note] Plugin 'FEDERATED' is disabled.
May  8 17:21:19 micha mysqld: 170508 17:21:19 [Note] Plugin 'FEEDBACK' is disabled.
May  8 17:21:19 micha mysqld: 170508 17:21:19 [Note] Server socket created on IP: '127.0.0.1'.
May  8 17:21:19 micha mysqld: 170508 17:21:19 [Note] /usr/sbin/mysqld: ready for connections.
May  8 17:21:19 micha mysqld: Version: '10.0.30-MariaDB-0+deb8u1'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Raspbian)

Before I saw that mysql was the error, I tried to resolve it by restarting apache2 which made apache also crash. Looking at the apache log I realized, that the malfunction already started one night before:

[Mon May 08 00:23:18.816612 2017] [:error] [pid 638] [client 192.168.178.21:51493] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] No su$
[Mon May 08 00:23:18.821868 2017] [:error] [pid 638] [client 192.168.178.21:51493] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] No su$
[Mon May 08 15:30:22.196331 2017] [:error] [pid 640] [client 192.168.178.21:51501] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] No su$
[Mon May 08 15:30:23.460774 2017] [:error] [pid 640] [client 192.168.178.21:51501] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] No su$
...
...
[Mon May 08 17:19:51.076168 2017] [mpm_prefork:notice] [pid 626] AH00169: caught SIGTERM, shutting down
[Mon May 08 17:20:17.195993 2017] [mpm_prefork:notice] [pid 586] AH00163: Apache/2.4.25 (Raspbian) OpenSSL/1.0.2k configured -- resuming normal operations
[Mon May 08 17:20:17.199810 2017] [core:notice] [pid 586] AH00094: Command line: '/usr/sbin/apache2'
[Mon May 08 17:20:45.754630 2017] [:error] [pid 693] [client 192.168.178.21:52289] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 17:20:45.761301 2017] [:error] [pid 693] [client 192.168.178.21:52289] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 17:35:42.106924 2017] [:error] [pid 720] [client 192.168.178.21:52340] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 17:35:42.112498 2017] [:error] [pid 720] [client 192.168.178.21:52340] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 17:35:42.461523 2017] [:error] [pid 693] [client 192.168.178.21:52341] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 17:35:42.466391 2017] [:error] [pid 693] [client 192.168.178.21:52341] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 18:05:41.540229 2017] [:error] [pid 695] [client 192.168.178.21:52446] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 18:05:41.544724 2017] [:error] [pid 695] [client 192.168.178.21:52446] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 18:35:41.529881 2017] [:error] [pid 696] [client 192.168.178.21:52550] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 18:35:41.534437 2017] [:error] [pid 696] [client 192.168.178.21:52550] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 19:05:41.588284 2017] [:error] [pid 698] [client 192.168.178.21:52655] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 19:05:41.592769 2017] [:error] [pid 698] [client 192.168.178.21:52655] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 19:35:41.646038 2017] [:error] [pid 700] [client 192.168.178.21:52892] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 19:35:41.650591 2017] [:error] [pid 700] [client 192.168.178.21:52892] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 19:46:16.587373 2017] [autoindex:error] [pid 698] [client 192.168.178.21:52925] AH01276: Cannot serve directory /var/www/owncloud/: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by O$
[Mon May 08 19:48:11.948414 2017] [autoindex:error] [pid 17087] [client 192.168.178.21:52975] AH01276: Cannot serve directory /var/www/owncloud/: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by$
[Mon May 08 19:48:43.919354 2017] [autoindex:error] [pid 696] [client 192.168.178.21:53023] AH01276: Cannot serve directory /var/www/owncloud/: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by O$
[Mon May 08 20:05:41.691529 2017] [:error] [pid 2995] [client 192.168.178.21:53145] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conn$
[Mon May 08 20:05:41.696027 2017] [:error] [pid 2995] [client 192.168.178.21:53145] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conn$
[Mon May 08 20:29:36.931417 2017] [autoindex:error] [pid 720] [client 192.168.178.21:53295] AH01276: Cannot serve directory /var/www/owncloud/: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by O$
[Mon May 08 20:35:41.735594 2017] [:error] [pid 695] [client 192.168.178.21:53312] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 20:35:41.740307 2017] [:error] [pid 695] [client 192.168.178.21:53312] PHP Fatal error:  Uncaught Doctrine\\DBAL\\DBALException: Failed to connect to the database: An exception occured in driver: SQLSTATE[HY000] [2002] Conne$
[Mon May 08 21:20:59.638302 2017] [autoindex:error] [pid 2995] [client 192.168.178.21:53534] AH01276: Cannot serve directory /var/www/owncloud/: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by $
[Mon May 08 23:31:20.836474 2017] [autoindex:error] [pid 698] [client 192.168.178.21:56451] AH01276: Cannot serve directory /var/www/owncloud/: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by O$
[Mon May 08 23:51:14.331756 2017] [mpm_prefork:notice] [pid 586] AH00169: caught SIGTERM, shutting down
[Tue May 09 00:17:42.746276 2017] [mpm_prefork:notice] [pid 581] AH00163: Apache/2.4.25 (Raspbian) OpenSSL/1.0.2k configured -- resuming normal operations
[Tue May 09 00:17:42.748362 2017] [core:notice] [pid 581] AH00094: Command line: '/usr/sbin/apache2'
[Tue May 09 00:19:32.395113 2017] [mpm_prefork:notice] [pid 581] AH00171: Graceful restart requested, doing restart
[Tue May 09 00:19:32.825239 2017] [mpm_prefork:notice] [pid 581] AH00163: Apache/2.4.25 (Raspbian) OpenSSL/1.0.2k configured -- resuming normal operations
[Tue May 09 00:19:32.825410 2017] [core:notice] [pid 581] AH00094: Command line: '/usr/sbin/apache2'

It seems, that if you use the apache mysql modules and mysql does not work, apache also crashes on restart, showing this error, right?

AH01276: Cannot serve directory /var/www/owncloud/: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by...

I remembered, that the morning before the whole server stuck and I made a reboot, which seemed to fix it. I guess during that crash the mysql database got corrupted. Sadly I was not able to dump and restore the database, because mysqldump also stopped with error, that mysql gone away. After trying back and forth, I finally was able to mysqlcheck [–optimize] (rebuild and analyze for innodb) all mysql databases. At first also mysqlcheck stuck/stopped at 2. nextcloud table (oc_activity). But also after all this worked without error, I was not able to connect to nextcloud or dumping the database for same reason/error. Also google didn’t help me, I gave up and finally found no other way than purging mysql-server and reinstall it and therefore loose all dbs. At least I have all the data/calendar/contacts on clients to restore it finally. Also ownBackup dump were there, but since it does not work anymore with nc 12, dbs were not up-to-date and also I was not able to restore them with mysql -u -p [<table>] < <filename>, are they somehow packed and only restoreable via ownBackup app itself?

So basically I am restoring stuff now from clients, maybe later I will set up the whole server new. Runs already since 2 years with maaaany experiments, changes, upgrades, wheezy → Jessie ownCloud → Nextcloud etc. etc., so maybe it can resolve some sleeping issues.

My questions are, if due to some server crash the mysql dbs get corrupted and mysqldump is not working, is there some other way to backup the databases? And how to restore ownBackup dumps without web interface working?

The initial problem might have been caused due to changes in php.ini according to the new recommendations of opcache settings since Nextcloud 12 Beta.

I always get problems with suddenly stuck apache, being not able to be restarted with some SEGV error when enabling this to settings in php.ini:

opcache.max_accelerated_files=10000
opcache.memory_consumption=128

Commenting them out makes everything work again. Might be due to memory limitations of my Raspberry Pi 2? I checked memory consumption in between and actually there is enough still available, but I have no idea how some services might do memory reservation or whatever or how much available memory changes during loading the nextcloud web ui with caches (opcache and apcu and redis for file locking) might change, overload and crash in result?

Is there some way to check, if really the limited memory is the problem? In this case increasing the memory values for opcache seems to be no good advice in nextcloud admin panel, as some systems might not provide enough.

My nextcloud web ui is just not loadable again with server can’t resolve request error. I checked every log one-by-one and found some syntax error in 20-apcu.ini. I checked that file and found some reaaaaly crazy content:

%xtension=apcu.so
apc.enabled91
apc.enable_cli=1
aqc.shm_s)ze=128M
apc.ttl=3&00
apk.qser_ttl=7200
apc.wc_ttl=360

After fixing this back to what it should be, nextcloud web ui started again perfectly.

More and more I am thinking that maybe my sd card is dying, or does anyone have an idea how this strange file changes could occur??
I there some way to check the filesystem of the running system device? fsck just works for unmounted devices…