"connection to server lost" and can't find the solution

Hi all,

I’m having issues with my locally hosted, self-built install of Nextcloud. This has been going on for some time, but it is really getting irritating now. I can’t add new users, I can’t add new devices and I can’t change the config.

Here’s the config specs:
Nextcloud version (eg, 12.0.2): 17.0.1
Operating system and version (eg, Ubuntu 17.04): Ubuntu 18.04.3
Apache or nginx version (eg, Apache 2.4.25): NGINX 1.14.0
PHP version (eg, 7.1): 7.2
MySQL Server: 5.8

The issue you are facing:
When logging in to the web console, both regular users and the admin get the message “Connection to server is lost”. It is not possible to authenticate new devices, view folders or files through the web interface, or retrieve config info via the website.
However, file sync with already known clients works fine.

Is this the first time you’ve seen this error? (Y/N): N, had the same issue in v16. In fact, that is where it started.

Steps to replicate it:
The server is a virtual machine running on VMware vSphere (ESXi + vCenter) v6.7.
The problem first appeared after a network disruption severed the SAN connection. The Nextcloud VM was corrupted and was recovered using an Ubuntu recovery CD. The issue has existed ever since. We recently upgraded from v16.x to v17.0.1, but the error remained. It feels like a database connection problem or something comparable.

The output of your Nextcloud log in Admin > Logging:

Can't retrieve logging, "Connection to server lost"

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):

<?php
$CONFIG = array (
  'instanceid' => 'ocd28a9v7r7m',
  'passwordsalt' => 'k6kDR+X217NYmnaaTPqE8n+tnEgI5m',
  'secret' => '88888888',
  'trusted_domains' =>
  array (
    0 => 'lair.internetdomain.net',
    1 => '172.16.100.80',
  ),
  'datadirectory' => '/cloudstore/',
  'dbtype' => 'mysql',
  'version' => '17.0.1.1',
  'overwrite.cli.url' => 'https://lair.internetdomain.net',
  'dbname' => 'nextcloud',
  'dbhost' => 'localhost',
  'dbport' => '',
  'dbtableprefix' => 'oc_',
  'dbuser' => 'nextclouduser',
  'dbpassword' => '************', // I removed the password
  'installed' => true,
  'mail_smtpmode' => 'smtp',
  'mail_sendmailmode' => 'smtp',
  'mail_from_address' => 'filecloud',
  'mail_domain' => 'internetdomain.net',
  'mail_smtphost' => '10.10.10.40',
  'mail_smtpport' => '25',
  'twofactor_enforced' => 'false',
  'twofactor_enforced_groups' =>
  array (
    0 => 'users',
    1 => 'admin',
  ),
  'twofactor_enforced_excluded_groups' =>
  array (
  ),
  'mysql.utf8mb4' => true,
  'maintenance' => false,
  'data-fingerprint' => '9e7450beb58d8e188f7a29f5b3cf7d93',
  'logtimezone' => 'Europe/Amsterdam',
  'log_type' => 'file',
  'logfile' => '/var/log/filecloud-debug.log',
  'syslog_tag' => 'filecloud',
  'loglevel' => 0,
);

The output of your Apache/nginx/system log in /var/log/____:

**Error.log:**
2019/11/17 22:59:34 [crit] 17747#17747: *48365 SSL_do_handshake() failed (SSL: error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol) while SSL handshaking, client: 93.113.125.89, server: 0.0.0.0:443
2019/11/18 00:47:05 [crit] 1554#1554: *1 connect() to unix:/run/php/php7.2-fpm.sock failed (2: No such file or directory) while connecting to upstream, client: 172.16.40.45, server: , request: "PROPFIND /remote.php/dav/files/alex/ HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "lair.internetdomain.net"
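The second log line reports that nginx could not find the PHP-FPM socket when handling a request, which may be worth checking separately from the SSL error. The following is a minimal sketch, not a definitive diagnostic: it pulls such upstream-connect failures out of nginx error-log lines and checks whether the socket file exists right now. The names `UPSTREAM_FAIL`, `find_upstream_failures` and `socket_exists` are hypothetical, and the pattern assumes the log format shown above.

```python
import os
import re

# Matches nginx "connect() to unix:<socket> failed (<errno>: <reason>)" entries
# and captures the request that triggered them (format assumed from the log above).
UPSTREAM_FAIL = re.compile(
    r'connect\(\) to unix:(?P<sock>\S+) failed '
    r'\((?P<errno>\d+): (?P<reason>[^)]*)\)'
    r'.*?request: "(?P<request>[^"]*)"'
)


def find_upstream_failures(lines):
    """Return one dict per log line that reports a failed upstream socket connect."""
    failures = []
    for line in lines:
        match = UPSTREAM_FAIL.search(line)
        if match:
            failures.append({
                "socket": match.group("sock"),
                "errno": int(match.group("errno")),
                "reason": match.group("reason"),
                "request": match.group("request"),
            })
    return failures


def socket_exists(path):
    """True if the socket path is present on this machine at the moment of the call."""
    return os.path.exists(path)


# The log line from the post, used here as sample input.
SAMPLE = (
    '2019/11/18 00:47:05 [crit] 1554#1554: *1 connect() to '
    'unix:/run/php/php7.2-fpm.sock failed (2: No such file or directory) '
    'while connecting to upstream, client: 172.16.40.45, server: , '
    'request: "PROPFIND /remote.php/dav/files/alex/ HTTP/1.1", '
    'upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", '
    'host: "lair.internetdomain.net"'
)
```

Calling `find_upstream_failures([SAMPLE])` returns the socket path and the failed request. Passing that path to `socket_exists` on the server shows whether the socket is present at that moment. A missing socket would suggest PHP-FPM is not running or is listening somewhere else, but that is an assumption to verify, not a conclusion.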

That’s strange. Your file system shouldn’t be that fragile.

In any case, I would consider setting up a fresh installation and migrating to it, if your VM was that badly damaged.

Well, after I fixed the VM itself, which is now running fine, Nextcloud is the remaining problem. I have no reason to assume that the VM has sustained permanent damage in any way. I have also taken additional measures so I have more options in case the issue happens again. That is good for the next time, but it does not solve the problem now.

I did my best to secure the install. I’ve enabled HTTP/2, SSL, etcetera. I also enabled 2-factor authentication. It may or may not have anything to do with the issues I see, but as I cannot modify the configuration via the web GUI and I otherwise do not know where to start, it is hard to troubleshoot.

i’d look into this error:

SSL routines:tls_early_post_process_client_hello:unsupported protocol

maybe you’ve configured your server a little too securely (some clients need older/less secure protocols)?
you can also debug this with the openssl s_client command (i did this a long time ago so i do not know its syntax right now).
GOOD LUCK!
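As an alternative to `openssl s_client`, here is a minimal sketch that probes which TLS versions a server accepts, using only Python's standard `ssl` module. The names `make_context` and `probe` are hypothetical. Certificate checks are deliberately skipped because only the protocol version is being tested. Note that a failure for an old version can also come from the local OpenSSL build or its security policy, not just from the server.

```python
import socket
import ssl

# TLS versions to try, oldest first.
VERSIONS = [
    ssl.TLSVersion.TLSv1,
    ssl.TLSVersion.TLSv1_1,
    ssl.TLSVersion.TLSv1_2,
    ssl.TLSVersion.TLSv1_3,
]


def make_context(version):
    """Build a client context pinned to exactly one TLS version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Only the protocol is being probed, so certificate validation is disabled.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host, port=443, timeout=5.0):
    """Try one handshake per TLS version; return {version name: outcome}."""
    results = {}
    for version in VERSIONS:
        try:
            ctx = make_context(version)
            with socket.create_connection((host, port), timeout=timeout) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    # tls.version() reports the protocol actually negotiated.
                    results[version.name] = tls.version() or "ok"
        except (ssl.SSLError, OSError, ValueError) as exc:
            # ValueError covers versions the local OpenSSL build cannot select.
            results[version.name] = f"failed: {exc}"
    return results
```

For example, `probe("lair.internetdomain.net")` would return one entry per version. Versions the server rejects show up as `failed: ...`, which corresponds to what a client limited to that protocol would run into.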

Hey Pete,

You could be right. Then again, it worked like a charm prior to the crash. It would be strange that it suddenly would not. Just to be sure, I’ll give it a go. Thanks!

ok, fixed the SSL errors… no joy on the rest.

So, I tried a couple of small things (entering the MySQL port in config.php, adding the internet name to the hosts file after localhost). No joy with those.

I also tried throwing away my config.php and having the system generate a new one. After logging in with the newly added admin account, the first message you see is “Connection to server lost”.

So, the issue does not seem to be found in the config.php file. Still something is going wrong.

Maybe another sign that helps to tell what is wrong: in the interface, the notification icon is not loading or being displayed. All other icons to the left are fine.

My concern is if a SAN connection loss ripped your file system a new one that badly, and you repaired the OS with a recovery disc, corruption in your database likely remains unaddressed. That could potentially be involved in your problems. Unfortunately I don’t think I can personally offer any guidance on that one besides try to export the DB and load it on a new VM and see if that fixes it (or carries the problem to the new system). In any case it would help narrow it down.
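The check-and-export idea can be sketched as follows. This is only an illustration under assumptions: the database name and user are taken from the config.php posted earlier, the dump path is hypothetical, and the helper `db_check_and_export_commands` is made up for this example. It only builds the `mysqlcheck` and `mysqldump` command lines so they can be reviewed before anything is run.

```python
import shlex


def db_check_and_export_commands(db_name, db_user, dump_path):
    """Build (but do not run) commands to check a database and dump it to a file."""
    # mysqlcheck --check scans the tables of the named database for errors.
    check = [
        "mysqlcheck", "--check",
        f"--user={db_user}", "--password",  # --password without a value prompts
        "--databases", db_name,
    ]
    # mysqldump --single-transaction takes a consistent snapshot of InnoDB tables.
    dump = [
        "mysqldump", "--single-transaction",
        f"--user={db_user}", "--password",
        f"--result-file={dump_path}",
        db_name,
    ]
    return [check, dump]


def as_shell(commands):
    """Render each command as a copy-pasteable shell line."""
    return [shlex.join(cmd) for cmd in commands]


# Values from the posted config.php; the dump path is an assumption.
COMMANDS = db_check_and_export_commands(
    "nextcloud", "nextclouduser", "/tmp/nextcloud-dump.sql"
)
```

`as_shell(COMMANDS)` gives the two command lines. The resulting dump could then be loaded into an empty database on the new VM with the `mysql` client, which would show whether the problem travels with the data.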

Thanks Karl, it’s highly appreciated anyway. Sometimes troubleshooting is just having someone to bounce your ideas off of. :) Unfortunately, this time it did not bring me any further. I agree with your assessment; however, I would also like to know WHAT goes wrong, so I can fix it when I run into it again and the server isn’t as easily rebuilt as this one.

So I’ll keep a copy of the server around, in case anyone has an idea. In the meantime I’m building a new one next to it.

Well… You have a layer of VMFS and then a layer of, I assume, ext4? Both are journaling file systems that should (should) be able to weather a disruption like that. But file system journaling is just like the seatbelt in your car. It’s great to have, but you don’t really want to find out if it works as expected.

Don’t forget to address whatever caused your storage disruption too. What you’re describing is some nightmarish data corruption.