Intermittent “The document could not be saved” error

About 3 out of 5 times when I try to open a document on my NextCloud server using OnlyOffice I get the dreaded error:
“The document could not be saved. Please check connection settings or contact your administrator…”

NextCloud V24.0.7
OnlyOffice Plugin V7.5.8
OnlyOffice V7.2.1.34

NextCloud and OnlyOffice run on separate servers.

Both servers are fully patched Ubuntu 20.04

The certificates are fine on both servers

Both servers have the same ‘secret’

Running this from the NextCloud server
curl https://only.office.mydomain.com/healthcheck
gives:
true

sudo -u www-data ./occ onlyoffice:documentserver --check
gives:
Document server https://only.office.mydomain.com/ version 7.2.1.34 is successfully connected

Both servers can see each other at all times because they are VMs on the same subnet on the same ESXi server

It seems like a very common problem but what I don’t understand is that it doesn’t ‘always’ do it on my system.

Is there a problem with the latest version of something?

What else can I check/try?

I have done more investigation. I pulled copies of the NextCloud and OnlyOffice VMs down from the server and ran them in a test environment on my workstation. Everything seems fine there, so it’s something about the ESXi environment that they don’t like. That doesn’t make a lot of sense though: under ESXi they have excellent connectivity, including redundant NICs and switches.

Just can’t see what the problem could be.
As far as I can tell to get OO and NC to work together we need:

  • NextCloud server can connect to OnlyOffice Server - checked, works
  • OnlyOffice server can connect to NextCloud Server - checked, works
  • User can https to NextCloud server - checked, works
  • User can https to OnlyOffice server - checked, works

What am I missing?

OK, I have solved my problem.

The problem was that I had the NextCloud server performing time sync and I had ESXi syncing the VM as well. I turned off the ESXi to VM time sync and the problem went away.
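
One way to check a guest for the same double sync, as a rough sketch only. It assumes the guest uses systemd (so timedatectl is available) and has open-vm-tools installed (which provides vmware-toolbox-cmd). The function name sync_conflict is made up for illustration, and the guest's NTP setup may differ from what timedatectl reports if a non-standard time service is in use.

```shell
#!/bin/sh
# Report whether two time-sync mechanisms are active at the same time.
# $1: "yes" or "no"            (is an NTP service enabled in the guest?)
# $2: "Enabled" or "Disabled"  (is periodic host-to-guest sync on?)
sync_conflict() {
    if [ "$1" = "yes" ] && [ "$2" = "Enabled" ]; then
        echo "conflict"
    else
        echo "no conflict"
    fi
}

# Example use inside the guest (assumes systemd and open-vm-tools):
#   guest_ntp=$(timedatectl show -p NTP --value)
#   host_sync=$(vmware-toolbox-cmd timesync status)
#   sync_conflict "$guest_ntp" "$host_sync"
#
# To switch off the periodic host-to-guest sync from inside the guest:
#   sudo vmware-toolbox-cmd timesync disable
```

Disabling the sync from inside the guest is only one option; the same setting can also be changed in the VM's options on the ESXi side.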

I started noticing that an application in another VM, which uses a timer that should be accurate to within a second, was only accurate to within 30 seconds. That led me down a long winding road, and I discovered that it is very bad to have both the guest OS and ESXi attempt to keep the guest OS time synced.

After fixing that, and thereby my timer issue, I wondered whether it had also been behind the NextCloud/OnlyOffice error. After letting it run in this new arrangement for nearly a week, I have not had the error since.
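
To see how far apart the two clocks are at a given moment, one rough approach is to compare the Date header in the document server's HTTP response with the local clock on the NextCloud server. The sketch below assumes GNU date (as on Ubuntu) and curl, and that the server answers a HEAD request with a Date header. The function name clock_offset is hypothetical, and the header only has one-second resolution, so this catches drift of several seconds but not sub-second jitter.

```shell
#!/bin/sh
# Print (remote time - local time) in whole seconds.
# $1: value of an HTTP Date header, e.g. "Thu, 01 Jan 1970 00:01:00 GMT"
# $2: local time as a Unix epoch in seconds
clock_offset() {
    # GNU date parses the RFC-style timestamp and prints it as an epoch.
    remote_epoch=$(date -u -d "$1" +%s) || return 1
    echo $(( $remote_epoch - $2 ))
}

# Example use from the NextCloud server (URL as in the post above):
#   hdr=$(curl -sI https://only.office.mydomain.com/healthcheck \
#         | tr -d '\r' | sed -n 's/^[Dd]ate: //p')
#   clock_offset "$hdr" "$(date +%s)"
```

Running the example repeatedly over a few minutes should show a steady value near zero; a value that jumps around would point to the clocks being pulled in different directions.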

Hopefully this will help someone else running these on ESXi.