General error: 2006 MySQL server has gone away on multiple file upload

Nextcloud version: 21.0.0 beta4
Operating system and version: Ubuntu Server LTS 20.04
Apache version: 2.4.41
PHP version: 7.4.3
Galera cluster: 10.3.25-MariaDB (READ-COMMITTED)
Redis version: 5.0.7
Haproxy version: 2.0.13

The issue you are facing:
When uploading multiple folders with many files, some of the files fail with a
"MySQL connection lost" error. The failures tend to occur for larger files (200–400 MB).

Is this the first time you’ve seen this error? (Y/N):
N

Steps to replicate it:

  1. Upload a single 25 GB folder (contains 14,387 files, 1,263 folders)

The output of your Nextcloud log in Admin > Logging:

Doctrine\DBAL\Exception\ConnectionLost: An exception occurred while executing 'SELECT `filecache`.`fileid`, `storage`, `path`, `path_hash`, `filecache`.`parent`, `name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`, `permissions`, `checksum`, `metadata_etag`, `creation_time`, `upload_time` FROM `oc_filecache` `filecache` LEFT JOIN `oc_filecache_extended` `fe` ON `filecache`.`fileid` = `fe`.`fileid` WHERE (`storage` = ?) AND (`path_hash` = ?)' with params [5, "c6824ec9d0d18193586e90315284931f"]: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):

{
    "system": {
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "***REMOVED SENSITIVE VALUE***"
        ],
        "trusted_proxies": "***REMOVED SENSITIVE VALUE***",
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "dbtype": "mysql",
        "version": "21.0.0.11",
        "overwriteprotocol": "https",
        "overwrite.cli.url": "https:\/\/0.0.0.0",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "skeletondirectory": "",
        "filelocking.enabled": true,
        "filelocking.debug": false,
        "memcache.local": "\\OC\\Memcache\\Redis",
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "memcache.distributed": "\\OC\\Memcache\\Redis",
        "htaccess.RewriteBase": "\/",
        "ldapProviderFactory": "OCA\\User_LDAP\\LDAPProviderFactory",
        "app_install_overwrite": [
            "files_automatedtagging",
            "socialsharing_email"
        ],
        "maintenance": false,
        "data-fingerprint": "9d8df6d381f52560cd2ed6712ab596bc",
        "updater.release.channel": "beta",
        "theme": "",
        "loglevel": 2,
        "redis.cluster": {
            "seeds": [
                "***REMOVED SENSITIVE VALUE***:6379",
                "***REMOVED SENSITIVE VALUE***:6379",
                "***REMOVED SENSITIVE VALUE***:6379",
                "***REMOVED SENSITIVE VALUE***:6379",
                "***REMOVED SENSITIVE VALUE***:6379",
                "***REMOVED SENSITIVE VALUE***:6379"
            ],
            "timeout": 0,
            "read_timeout": 0,
            "failover_mode": 1
        }
    }
}

php.ini settings:

post_max_size = 16384M
upload_max_filesize = 16384M
max_file_uploads = 65536
memory_limit = 2048M
max_input_time = 3600
max_execution_time = 3600
output_buffering = 0

opcache.enable=1
opcache.enable_cli=1
opcache.memory_consumption=128
opcache.interned_strings_buffer=8
opcache.max_accelerated_files=10000
opcache.save_comments=1
opcache.revalidate_freq=1

php-fpm settings:

pm = dynamic
pm.max_children = 120
pm.start_servers = 12
pm.min_spare_servers = 6
pm.max_spare_servers = 18

Is there a reason you use the beta version? Did it work in the regular version, or is the problem only related to the beta? (In that case it is more of a development/regression problem and should be discussed directly with the developers.)

Here your database connection simply went away; you might want to check why that is the case. Is it just a timeout (not configured correctly, or other problems making the query take that long), or did the database crash entirely?
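One way to tell a timeout apart from a crash is to inspect the relevant MariaDB variables and the server uptime on the node HAProxy routed to (standard MariaDB/MySQL variable names, nothing Nextcloud-specific):

```sql
-- "MySQL server has gone away" usually means the connection sat idle
-- longer than wait_timeout, or a packet exceeded max_allowed_packet.
SHOW GLOBAL VARIABLES LIKE 'wait_timeout';
SHOW GLOBAL VARIABLES LIKE 'interactive_timeout';
SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';

-- A small Uptime means the server (or this Galera node) restarted,
-- which would also drop client connections.
SHOW GLOBAL STATUS LIKE 'Uptime';
```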

It occurred in the stable release too, so I tried the beta channel in the hope that the problem would be gone.

The Galera MySQL cluster is running fine and not crashing, and so is HAProxy. I moved the MySQL data to an Optane volume and the problem still happened.

I have now created a Redis cluster and configured memory caching in Nextcloud, and moved PHP session handling to Redis as well.

The problem does not seem to happen anymore, but there was a single deadlock case when uploading a large 280 GB folder.

Configuring HAProxy with only one active MySQL backend server and two backup servers also reduced deadlocks a lot.
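For reference, the one-active-plus-backups setup described above looks roughly like this in haproxy.cfg (hostnames, the check user, and the timeout values are placeholders, not my actual config):

```
listen mysql-galera
    bind *:3306
    mode tcp
    option mysql-check user haproxy_check
    # Must be at least as long as the PHP/MySQL timeouts, or HAProxy
    # itself will cut long-idle connections during big uploads.
    timeout client 3600s
    timeout server 3600s
    # A single active node avoids cross-node write conflicts:
    # Galera certification failures surface to clients as deadlocks.
    server db1 db1.example.com:3306 check
    server db2 db2.example.com:3306 check backup
    server db3 db3.example.com:3306 check backup
```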

If I configure least-connections load balancing with all MySQL backend servers active, I get lots of deadlocks when uploading files.
Note: MySQL transaction isolation is configured as READ-COMMITTED.
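For completeness, the isolation level can be verified and set like this (on MariaDB 10.3 the variable is still called tx_isolation):

```sql
-- Verify the level Nextcloud expects (READ COMMITTED):
SELECT @@GLOBAL.tx_isolation, @@SESSION.tx_isolation;

-- Set it at runtime (persist it via transaction-isolation in my.cnf):
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
```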

I configured timeouts according to the Nextcloud big file uploads documentation. Are there also some timeouts to configure on the MySQL side?
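MariaDB has its own idle-connection and packet-size limits, independent of the PHP and web-server timeouts; a my.cnf sketch (the values are illustrative, not tested recommendations):

```ini
[mysqld]
# Keep connections from long-running PHP upload processes alive;
# the default (28800 s) is usually fine unless a proxy cuts in earlier.
wait_timeout        = 3600
interactive_timeout = 3600
# Too-small packets also trigger "MySQL server has gone away".
max_allowed_packet  = 128M
```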

There are larger setups that seem to run fine with current versions.
For the PHP <-> MySQL timeouts, see PHP: mysqli::options - Manual.
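From that manual page, the client-side timeouts look roughly like this (illustration only; Nextcloud manages its connections through Doctrine DBAL, so this shows the raw mysqli API rather than something to paste into Nextcloud):

```php
<?php
$db = mysqli_init();
// Seconds to wait while establishing the connection.
$db->options(MYSQLI_OPT_CONNECT_TIMEOUT, 10);
// Seconds to wait for a query result (mysqlnd driver only).
$db->options(MYSQLI_OPT_READ_TIMEOUT, 3600);
$db->real_connect('db-host', 'user', 'password', 'nextcloud');
```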