Nextcloud refuses to connect to my Collabora server

OK, I admit I’m a bit of a cowboy and at times make things hard for myself. Still, if you can prioritise lending a hand over judging and preaching: I have what is, on the face of it, a rather simple problem, but one which demands one step more expertise or documentation than I have or have found. That is the point at which I turn to a forum plea or GitHub issue, so here I am.

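Basically it’s a common problem: on the Collabora Online settings page, Nextcloud reports “Could not establish connection to the Collabora Online server.”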

Not rocket science. A simple enough error, and one reported over and over. Yet, having followed many guides and read many reports, I have not nailed this one. Let me explain; you can even try some of it yourself. The server in question looks fine and seems to work fine.

$ systemctl status loolwsd
● loolwsd.service - LibreOffice Online WebSocket Daemon
     Loaded: loaded (/lib/systemd/system/loolwsd.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2020-09-08 20:51:52 AEST; 15min ago
   Main PID: 1487893 (loolwsd)
      Tasks: 7 (limit: 18988)
     Memory: 106.7M
     CGroup: /system.slice/loolwsd.service
             ├─1487893 /usr/bin/loolwsd --version --o:sys_template_path=/opt/lool/systemplate --o:child_root_path=/opt/lool/child-ro>
             ├─1487915 /usr/bin/loolforkit         --losubpath=lo --systemplate=/opt/lool/systemplate --lotemplate=/opt/collaboraoff>
             └─1487917 /usr/bin/loolforkit         --losubpath=lo --systemplate=/opt/lool/systemplate --lotemplate=/opt/collaboraoff>

Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487918 2020-09-08 10:52:05.572178 [ accept_poll ] DBG  StreamSocket ctor #21|>
Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487918 2020-09-08 10:52:05.572261 [ accept_poll ] DBG  Accepted socket has fa>
Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487918 2020-09-08 10:52:05.572295 [ accept_poll ] DBG  Accepted client #21| n>
Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487918 2020-09-08 10:52:05.572323 [ accept_poll ] DBG  Inserting socket #21 i>
Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487918 2020-09-08 10:52:05.572352 [ accept_poll ] DBG  #21 Thread affinity se>
Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487919 2020-09-08 10:52:05.572485 [ websrv_poll ] DBG  #21 Thread affinity se>
Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487919 2020-09-08 10:52:05.572676 [ websrv_poll ] INF  #21: Client HTTP Reque>
Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487919 2020-09-08 10:52:05.572716 [ websrv_poll ] INF  Handling request: /loo>
Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487919 2020-09-08 10:52:05.572758 [ websrv_poll ] INF  Admin request: /lool/a>
Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487919 2020-09-08 10:52:05.572784 [ websrv_poll ] INF  Admin::handleInitialRe>

It responds fine and you can check this yourself:

https://cadmus.thumbs.place/hosting/discovery
https://cadmus.thumbs.place/hosting/capabilities
https://cadmus.thumbs.place/loleaflet/dist/admin/admin.html

Even this:

https://cadmus.thumbs.place/lool/adminws

while hanging in the browser, reports credibly fine activity in the collabora debug log when loaded:

Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487919 2020-09-08 10:52:05.572716 [ websrv_poll ] INF  Handling request: /lool/adminws| wsd/LOOLWSD.cpp:2291
Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487919 2020-09-08 10:52:05.572758 [ websrv_poll ] INF  Admin request: /lool/adminws| wsd/LOOLWSD.cpp:2330
Sep 08 20:52:05 nephele loolwsd[1487893]: wsd-1487893-1487919 2020-09-08 10:52:05.572784 [ websrv_poll ] INF  Admin::handleInitialRequest bad request| wsd/Admin.cpp:376

I presume it reports a bad request because this interface operates as a web socket and wants more than a plain browser request supplies.

Now I know I’m taking some risks here because, wait for it, this is not behind Apache, nor Nginx. In fact it matters not what it’s behind, but all the guides provide Apache and Nginx server configs and none that I have found actually explain WTF they do or, said another way, what Collabora needs. So I’m left guessing a little.

For the most part it’s a simple reverse proxy of requests to port 9980, but in two cases, usually described as the Admin Console websocket (tested above) and the Main websocket, the samples seem to add two headers to the request, like:

Connection: Upgrade
Upgrade: upgrade

if I read the Nginx configs right, though the Apache configs seem not to worry about it. This is kind of shady territory, as I’m not really sure what these headers are for or how necessary they are. I do my best to emulate them and have tried a few variants, but they are hard to test, as I would need Collabora to log the full request, headers included, to confirm what it sees.

For now though I’m blown away that I can even use the WOPI discovery URL to guess at a URL like:

https://cadmus.thumbs.place/loleaflet/ed4f732/loleaflet.html?Test.ods

I’m no pro and clearly have that WOPI URL mangled (I am only guessing at what comes after the ? in a WOPI URL). But the stunning thing is that even that URL opens a clear LibreOffice sheet menu system, with one error message about bad WOPI parameters.

Now slowly you’re getting my drift. Everything I can test so far works just fine. Still, Nextcloud is unhappy. So the question is, “What does Nextcloud need and how do I diagnose that?” To find out I tried this:

journalctl -fu loolwsd

That is with the log level on Collabora set to debug (trace just floods the log with pulses). Anyhow, this follows the log so I can watch it as I make various requests in the browser. In this case I watched it while clicking the Save button on the Collabora Online setup page I shared a screenshot of above, thinking that clicking Save does something that returns with that response: Could not establish connection to the Collabora Online server.

Only, clicking Save produces precisely zero response in the logs, neither the Collabora log above nor the Nextcloud log, suggesting that whatever it’s doing does not even get as far as Collabora. So I checked my web server access and error logs, watching them as I clicked Save: also no sign of life.

So, we pull out the big guns and sniff the network traffic clicking Save produces. Turns out it’s a POST to:

https://mynextcloud.tld/index.php/apps/richdocuments/ajax/admin.php

albeit a cryptic kind of POST, with no data posted and only two headers that seem to carry any consequence: a Cookie and a requesttoken.

Either way the POST returns a 500 error, and the browser console even graces us with some feedback:

Error: Request failed with status code 500
    exports createError.js:16
    exports settle.js:17
    onreadystatechange xhr.js:61
    exports xhr.js:36
    exports xhr.js:12
    exports dispatchRequest.js:52
    promise callback*c.prototype.request Axios.js:61
    e Axios.js:86
    exports bind.js:9
    n AdminSettings.vue:445
    c runtime.js:45
    _invoke runtime.js:274
    t runtime.js:97
    j admin.js:450
    i admin.js:450
    I admin.js:450
    I admin.js:450
    updateSettings AdminSettings.vue:442
    t AdminSettings.vue:431
    c runtime.js:45
    _invoke runtime.js:274
    t runtime.js:97
    j admin.js:450
    i admin.js:450
    I admin.js:450
    I admin.js:450
    updateServer AdminSettings.vue:428
    submit AdminSettings.vue:1
    VueJS 3
AdminSettings.vue:437
    t AdminSettings.vue:437
    c runtime.js:45
    _invoke runtime.js:274
    t runtime.js:97
    j admin.js:450
    a admin.js:450
    (Async: promise callback)
    j admin.js:450
    i admin.js:450
    I admin.js:450
    I admin.js:450
    updateServer AdminSettings.vue:428
    submit AdminSettings.vue:1
    VueJS 3
        He
        n
        _wrapper

I wish there was some greater clue. I kind of wish the message Nextcloud provided was a tad more useful.

As I’m not using Apache or Nginx, my intuition suggests the issue lies in the request preparation and specifically in what these Nginx configs do:

    # main websocket
    location ~ ^/lool/(.*)/ws$ {
        proxy_pass http://localhost:9980;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $http_host;
        proxy_read_timeout 36000s;
    }

    # Admin Console websocket
    location ^~ /lool/adminws {
        proxy_pass http://localhost:9980;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $http_host;
        proxy_read_timeout 36000s;
    }

These are recommended in all the guides, yet none I have found explain what it is they do or, moreover, what it is Collabora and/or Nextcloud need or expect.

I have what seems a fully functional Collabora server that Nextcloud is snubbing, and it’s not being clear why. Forsooth. Any guidance is appreciated. I’m stuck, though I can still debug the PHP behind /apps/richdocuments/ajax/admin.php that is producing the 500 error. I’m hoping there’s a lower-effort path ahead, with some support, than debugging the Nextcloud app … hmmm.

My crystal ball says the PHP is sending a request to the Collabora server and not getting the response it desires, but without debugging the PHP I can’t see what request it’s sending or what expectation it has that is unmet.

Looks like I’ve stumped the collective audience here! But I did drill down into the PHP, as stated above, with a debugger (I run Eclipse with PHP Development Tools and Chromium on the client side, and xdebug on the server side). Aside from the expensive mire of indirections and request handling procedures it sucks you down into, it becomes clear that what should be documented and isn’t is that the URL it tests for connection is in fact the discovery URL:

https://cadmus.thumbs.place/hosting/discovery

and the reason it fails, which is very badly reported in the UI (because it’s lumped into the general category of 500 errors before delivery to the UI, grrrr), is that the SSL cert did not validate. Doh! If only it had just said so.
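
For anyone who wants to run the same check without a debugger, here is a minimal Python sketch. It assumes only what the debugging above turned up, namely that the connection test fetches /hosting/discovery over HTTPS. The function names are mine (hypothetical), looking for the string wopi-discovery in the body is an assumption about what the discovery XML contains, and none of this is Nextcloud's actual code. The point is simply that a certificate failure can be told apart from other failures and reported as such.

```python
import ssl
import urllib.error
import urllib.request


def explain_failure(exc: Exception) -> str:
    """Turn a failed request into a message that names the actual cause."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"server answered with HTTP {exc.code}"
    # urllib wraps lower-level errors in URLError and keeps the cause in .reason
    cause = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(cause, ssl.SSLCertVerificationError):
        return f"TLS certificate did not validate: {cause}"
    return f"could not connect: {cause}"


def check_discovery(base_url: str, timeout: float = 10.0) -> str:
    """Fetch <base_url>/hosting/discovery with normal certificate checks."""
    url = base_url.rstrip("/") + "/hosting/discovery"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
    except OSError as exc:  # URLError and SSLError are both OSError subclasses
        return explain_failure(exc)
    if b"wopi-discovery" in body:  # assumed marker of a discovery document
        return "ok"
    return "connected, but the response does not look like WOPI discovery XML"
```

Calling check_discovery("https://cadmus.thumbs.place") from the Nextcloud box would have pointed at the certificate straight away in my case.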

I didn’t notice the little insecure warning in the browser, it seems. It’s quite subtle. What was happening was that my webserver delivered the wrong cert, and there was a caching issue that took a while to resolve before I got it delivering the right one! (This was a hangover from the Collabora install, where I installed a spare cert I had just to get rolling before I went and generated a valid one for the server, and when I did I forgot to point the webserver at it, silly me.)

So now it connects fine!

Alas, it doesn’t work (yet). The next error has to do with the “Content-Security-Policy” header, which is blocking the load of the Collabora services. I’ll fix that next and see how far it gets.

Added the Collabora server to the Content Security Policy for frame-src and it now works … well, sort of. It looks good until I try to actually edit a file. Then I get an empty Collabora frame (well, menus and such, like a LibreOffice sheet, say, without the sheet) ;-)

And the reason is the call to the main websocket returns 400. Bad Request. So looks like it boils down to this nginx config:

    # main websocket
    location ~ ^/lool/(.*)/ws$ {
        proxy_pass http://localhost:9980;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $http_host;
        proxy_read_timeout 36000s;
    }

and how, given I’m not using nginx, I do the same. This is where providing a template like this without an explanation is not very helpful to some of us. But I’ll read up on nginx, on what exactly this does, and on how I might do the same.

The equivalent Apache configs are generally cited as:

  # Main websocket
  ProxyPassMatch "/lool/(.*)/ws$" ws://127.0.0.1:9980/lool/$1/ws nocanon

nocanon is documented:

http://httpd.apache.org/docs/current/mod/mod_proxy.html

and for the nginx configs $http_upgrade is sort of documented.

Still, what I’d give to see what a full request (with headers) looks like. I’ll keep drilling.
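
For anyone with the same wish: going by the WebSocket spec (RFC 6455) rather than anything Collabora-specific, the opening request of a websocket connection looks roughly like what this little Python sketch builds. The function name is made up for illustration, and a real browser adds more headers (Origin, cookies and so on), so treat it as the minimum shape, not a capture of real traffic.

```python
import base64
import os


def websocket_handshake(host: str, path: str) -> str:
    """Return the text of a minimal WebSocket opening request (RFC 6455)."""
    # The key is 16 random bytes, base64 encoded; the server echoes a hash of it.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    # Header lines end with CRLF and an empty line closes the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


print(websocket_handshake("cadmus.thumbs.place", "/lool/adminws"))
```

A plain page load of the same URL carries neither the Upgrade nor the Sec-WebSocket-* headers.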

And it is ALL solved. What the docs don’t state clearly and should is that it is kind of this simple:

  1. Two types of request hit the collabora server, standard (GET/POST style) and websocket requests.
  2. They all go through port 9980 but …
  3. Websocket requests have a request header “Upgrade: websocket” which has a single hop lifetime. So your reverse proxy needs to re-apply this header if present. It must be accompanied with a “Connection: upgrade” header.

Get that right and all works fine! Yeah!
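
To make point 3 concrete, here is a minimal Python sketch of the header handling, independent of any particular proxy. The names HOP_BY_HOP and forwarded_headers are hypothetical, and the list of single-hop headers is the usual HTTP/1.1 set rather than anything taken from the Collabora docs. It is a sketch of the idea, not a proxy implementation.

```python
# Headers that apply to a single hop only and are normally dropped by a proxy.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def forwarded_headers(incoming: dict[str, str]) -> dict[str, str]:
    """Headers a reverse proxy would pass on to the backend on port 9980."""
    # Header names are case-insensitive, so look them up in lower case.
    lowered = {name.lower(): value for name, value in incoming.items()}
    wants_websocket = lowered.get("upgrade", "").lower() == "websocket"

    # Drop everything that is only valid for the client-to-proxy hop.
    outgoing = {
        name: value
        for name, value in incoming.items()
        if name.lower() not in HOP_BY_HOP
    }

    # Re-apply the websocket pair for the proxy-to-backend hop if it was asked for.
    if wants_websocket:
        outgoing["Upgrade"] = "websocket"
        outgoing["Connection"] = "Upgrade"
    return outgoing
```

This mirrors what the two proxy_set_header lines in the Nginx samples do: the Upgrade value from the client is passed along and Connection is set to "Upgrade".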

One last learning getting this up:

  1. In your Collabora config (/etc/loolwsd/loolwsd.xml) make sure that under storage/wopi you specifically allow both the collabora server and nextcloud servers. Or you’ll be looking WOPI authorization issues in the face.
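
A small Python sketch of how one might double check those entries. Everything in SAMPLE_CONFIG is an assumption: it is a cut-down imitation of the storage/wopi section with placeholder host names, not a copy of a real loolwsd.xml, and treating each entry as a regex that must match the whole host name is also my assumption. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed excerpt of loolwsd.xml; the host names are placeholders.
SAMPLE_CONFIG = r"""
<config>
  <storage>
    <wopi allow="true">
      <host allow="true">localhost</host>
      <host allow="true">collabora\.example\.org</host>
      <host allow="true">nextcloud\.example\.org</host>
    </wopi>
  </storage>
</config>
"""


def allowed_wopi_patterns(config_xml: str) -> list[str]:
    """Collect the host patterns marked allow="true" under storage/wopi."""
    root = ET.fromstring(config_xml)
    return [
        host.text.strip()
        for host in root.findall("./storage/wopi/host")
        if host.get("allow") == "true" and host.text
    ]


def is_wopi_host_allowed(hostname: str, patterns: list[str]) -> bool:
    """True if any pattern matches the whole host name (assumed semantics)."""
    return any(re.fullmatch(pattern, hostname) for pattern in patterns)
```

With the sample above, nextcloud.example.org passes and any host not listed does not, which is the situation that shows up as WOPI authorization errors.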

My Collabora for Nextcloud seems now to be fully functional. Let’s see what surprises it throws up next ;-). That was an ordeal. And it could have been made a 15 minute job but for two things (hint, feedback to Nextcloud and Collabora):

  1. When attempting to connect Nextcloud to a Collabora instance, Nextcloud tries to access ‘/hosting/discovery’. If there are any errors in the response these are tossed out and a 500 error is sent to the Nextcloud UI. Tsk, tsk. Bad move, costing significant tracing and diagnostic effort to even learn this much, let alone find what the error actually was and fix it.

  2. To access the Collabora server you should use a reverse proxy which forwards all requests to port 9980 on the Collabora server, but for URLs matching (^/lool/adminws|^/lool/(.*)/ws) make sure to include the headers “Connection: upgrade” and “Upgrade: websocket” in the forwarded request, and make sure that both the Collabora server and the Nextcloud server are enabled WOPI hosts in the Collabora config (the storage/wopi section of /etc/loolwsd/loolwsd.xml).
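
For those writing the routing rule by hand, the URL test in tip 2 can be sketched in Python like this. The pattern is the one quoted in the tip; the function name and the example paths are made up, and the path is assumed to be the request path without any query string.

```python
import re

# The two websocket URL shapes from the tip, as a single pattern.
WEBSOCKET_PATHS = re.compile(r"^/lool/adminws|^/lool/(.*)/ws")


def needs_upgrade_headers(path: str) -> bool:
    """True for the admin console websocket and the main document websocket."""
    return WEBSOCKET_PATHS.search(path) is not None
```

Everything else, such as /hosting/discovery or the /loleaflet/ pages, goes through as an ordinary proxied request.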

These two tips will help anybody trying to install Collabora for Nextcloud behind a server/proxy that isn’t Apache or Nginx … (yes, the world is not as tidy as it might be).

This is a reply of deep gratitude! I was about to give up on Nextcloud/Collabora; the documentation for both is appalling. I have been trying for weeks to make things work behind an Apache reverse proxy serving a number of separate machines. Originally, I had Nextcloud and Collabora on the same machine behind the proxy. No joy! Even though many times I thought I was almost there. In the end I put Collabora on the proxy server, but still nothing worked until I added the Nextcloud server to the allowed WOPI hosts. It seems OK now but, like you, I’m expecting problems. Does your setup still work?
The main problem for me (I’m a lawyer not a developer - although I have long worked with Linux in the hope of one day being free of Microsoft) is that the documentation provides no ‘big picture’. Up until now I was not even sure it was possible to get it working unless everything was in the same box and ‘docker’ I found to be a nightmare. Anyway, many thanks again.

Yes indeed, it has been working fine since then. I run it all behind lighttpd myself and have Collabora and Nextcloud on the same box, albeit with different hostnames. More than happy to share all the configs above, with the caveat that mine are lighttpd configs on the gateway (reverse proxy) and the server proper.