Talk + external signaling server (HPB) - only working with mobile app

Hi Talk users & developers,

I’ve finally succeeded in installing the HPB on a separate machine. So far, everything looks fine: the Janus, NATS and nextcloud-signaling services are running and listening on the right ports. coTURN is still installed on my NC machine and working fine as well. Ports are open, and checking coTURN and the external signaling server via the NC admin GUI in “Settings - Talk” returns OK.

So far, so good.

The problem is: every time I want to start a call using the NC Talk GUI in my web browser (Firefox 84 or Chrome 87), NC shows “Failed to establish signaling connection. Retrying…” and nothing happens.

When I use my mobile phone’s Talk app (latest version), however, it works pretty well (with some of the usual hiccups like poor stability and poor connectivity). Basically, it works, and it works equally well in both my local network and my mobile carrier’s network.

But why doesn’t Talk connect when I use it via NC Hub’s browser based GUI?

To give an overview, here’s what I’ve observed:

Desktop Windows Firefox via local network: Failed to establish signaling connection.
Desktop Windows Edge via local network: Failed to establish signaling connection.
Desktop Ubuntu Firefox via local network: Failed to establish signaling connection.

Desktop Windows Firefox via remote (mobile carrier) network: Failed to establish signaling connection.
Desktop Windows Edge via remote (mobile carrier) network: Failed to establish signaling connection.

Mobile Android App via local network: working
Mobile Android App via remote (mobile carrier) network: working

Mobile Webbrowser Firefox Android via local network: Failed to establish signaling connection.
Mobile Webbrowser Chrome Android via local network: Failed to establish signaling connection.
Mobile Webbrowser Firefox Android via remote (mobile carrier) network: Failed to establish signaling connection.
Mobile Webbrowser Chrome Android via remote (mobile carrier) network: Failed to establish signaling connection.

Logs do not look suspicious and there aren’t any error messages. It’s just the NC GUI which returns this “Failed to establish signaling connection” notification in Talk and then keeps waiting forever.

I use the latest versions of NC, Firefox, Edge, Chrome, Windows 10 and Ubuntu. Android version is 10.

The signaling server is set up on a 64-bit ARM Ubuntu machine using Janus 10.5 and the spreed standalone signaling server 0.2.0. NATS is also the latest version and runs as a Docker container.

Thanks in advance for your thoughts and hints!

Niklas

I’ve been struggling with a similar-looking error for days now. I thought the signaling server was reachable only from the internet (i.e. from outside my local network) and experimented, for example, with the DNS rebind protection options of my router.

Your post then made me check whether the mobile phone works with Wi-Fi turned on, which it does. Then somehow the WebSocket protocol came to mind, and I remembered that I’m forwarding a port directly to my server, bypassing my Traefik reverse proxy, which terminates the TLS connections and routes to the NC resources in the local network.

Long story short:

Have a look at your browser config (about:config) and check the parameter “network.websocket.allowInsecureFromHTTPS”. If it is FALSE you have three options:

  1. Fix your certificate issue
  2. Set it to TRUE
  3. Get WebSocket protocol through your TLS reverse proxy???
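
For anyone wondering why that about:config switch exists: as far as I understand it, a page loaded over https may only open wss:// sockets unless the browser is told otherwise. Here is a rough sketch of that rule in Python. The function name and the example URLs are made up for illustration, and real browsers have more exceptions than this models.

```python
from urllib.parse import urlsplit


def websocket_allowed(page_url: str, socket_url: str, allow_insecure: bool = False) -> bool:
    """Rough model of the mixed-content rule for WebSockets.

    allow_insecure stands in for network.websocket.allowInsecureFromHTTPS.
    """
    page_scheme = urlsplit(page_url).scheme
    socket_scheme = urlsplit(socket_url).scheme
    if socket_scheme not in ("ws", "wss"):
        raise ValueError("not a WebSocket URL: " + socket_url)
    # An https page opening a plain ws socket is the only blocked combination here.
    if page_scheme == "https" and socket_scheme == "ws":
        return allow_insecure
    return True


print(websocket_allowed("https://cloud.example/", "ws://cloud.example:8080/spreed"))
print(websocket_allowed("https://cloud.example/", "wss://cloud.example/spreed"))
```

If the page already asks for a wss:// URL, the switch makes no difference, and the question becomes whether the TLS endpoint behind that URL is trusted.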

Sadly, I don’t understand this whole setup (NATS, coTURN, Janus, spreed), so I can’t judge whether the risk of setting the parameter to TRUE is acceptable.

Can someone bring some clarity to these components in terms of functionality? What is the purpose of each?

Bobby,

thanks for your reply!

I’ve just changed “network.websocket.allowInsecureFromHTTPS” to TRUE but that didn’t help. In the end, I put it back to FALSE as I’m also not fully aware of the consequences for my setup (which pretty much follows the standard setup for the external signaling server).

Nevertheless, I think you’re on the right track with your suggestion that the certificate and/or the websocket may have something to do with my problem.

I’ll check that over the course of the coming days and will report my findings here :slight_smile:

Cheers,

Niklas

It might help if you share more information about your whole setup. Private or corporate network? Internet via FritzBox/…? Ports? Dockerfile or docker-compose file?

It’s a private network based on FritzBox 4040 connected via LTE Huawei E3372:

Nextcloud Server (incl. coTURN): LTE->FritzBox4040->Windows Host->VirtualBox->Debian 10.3 x86 guest
Ports open and forwarded to server: 443 (HTTPS), 3478 TCP (TURN), 3478 UDP (TURN)

Spreed Standalone Signaling Server (incl. Janus, NATS):
LTE->FritzBox4040->Ubuntu 20.04 LTS arm64
Ports open and forwarded to server: 8443 (HTTPS)

default-ssl.conf:

ProxyPass "/standalone-signaling/" "ws://127.0.0.1:8086/"
RewriteEngine On
RewriteRule ^/standalone-signaling/spreed$ - [L]
RewriteRule ^/standalone-signaling/api/(.*) http://127.0.0.1:8086/api/$1 [L,P]

ports.conf:
Listen 8443

Janus 10.5 Gateway Installation via ppa:fancycode/janus
janus.jcfg:
stun_server = "mydomain.net"
stun_port = 3478
nice_debug = false
full_trickle = true
turn_server = "mydomain.net"
turn_port = 3478
turn_type = "udp"
turn_rest_api_key = "mykey12345"
janus.transport.http.jcfg:
json = "indented"
base_path = "/janus"
http = true
port = 8088
ip = 127.0.0.1
https = false
janus.transport.websockets.jcfg:
ws = true
ws_port = 8188
ws_ip = 127.0.0.1
wss = false

NATS installation via docker pull nats:latest
docker run --name=natsserver -d -p 127.0.0.1:4222:4222 -ti nats:latest

Spreed standalone signaling server installation via make build from https://github.com/strukturag/nextcloud-spreed-signaling.git

server.conf:
[http]
listen = 127.0.0.1:8086
[sessions]
hashkey =
blockkey =
[backend]
backends = backend1
[backend1]
url = https://mydomain.net/nextcloud
secret = <secret 1>
[nats]
url = nats://localhost:4222
[mcu]
type = janus
url = ws://127.0.0.1:8188
[turn]
apikey = <apikey>
secret = <secret 2>
servers = turn:mydomain.net:3478?transport=udp,turn:mydomain.net:3478?transport=tcp
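
A quick way to rule out a forgotten value in server.conf is to parse it and list required settings that are empty. The following is only a sketch: the REQUIRED table is my own assumption about which keys matter for this setup, not an official list, and the blank hashkey/blockkey lines above are presumably just redacted.

```python
import configparser

# Assumed minimum for this setup; adjust to your own server.conf.
REQUIRED = {
    "http": ["listen"],
    "sessions": ["hashkey", "blockkey"],
    "backend1": ["url", "secret"],
    "nats": ["url"],
    "mcu": ["type", "url"],
}


def missing_settings(conf_text: str) -> list:
    """Return 'section.key' for every required setting that is absent or empty."""
    # Only '=' is a delimiter, so values containing ':' (URLs, host:port) stay intact.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(conf_text)
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            value = parser.get(section, key, fallback="").strip()
            if not value:
                missing.append(section + "." + key)
    return missing
```

Calling missing_settings(open("server.conf").read()) on a complete file should return an empty list.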

That’s pretty much a standard setup… :expressionless:

Thanks,
Niklas

Okay, in my words: You’re running two machines behind your router. Each one is addressed on https using different ports.

To sum up:

  • 443/https routes to NC on Debian
  • 3478/tcp+udp routes to TURN on Debian
  • 8443/https routes to the signaling installation on Ubuntu
    – Webserver listens to 8443 and routes to 8086 on localhost

Idea:

  • Use either “localhost” or “127.0.0.1” consistently. I don’t think it’s a problem, but just to be sure: compare the “docker run” command with the NATS URL in the config file.

Unclear:

  • How did you set the config for Signaling and TURN in NC settings?
  • Where does TLS termination happen for both hosts? Where is the config?
  • Do the quick checks on the config site work?

And appending to the ‘Unclear’ section :wink:

  • Did you have a look into your browser log already, when starting the call?

Background: I had a similar problem today, even though it was not exactly the same error message. I had the signaling set up in the Talk settings and the checks passed, but no call was established. In the browser log I eventually found an error that showed me my problem. :wink:

Bobby:

  • I’ve now replaced all 127.0.0.1 with “localhost” and then changed it back to “127.0.0.1”, but neither solved the issue.

  • NATS is listening on 127.0.0.1:4222, as it should.

  • the config for signaling and TURN in NC->Settings->Talk:
    TURN:
    mydomain.net:3478 ===> Ok
    config:
    listening-port=3478
    fingerprint
    use-auth-secret
    static-auth-secret=
    realm=mydomain.net
    total-quota=0
    bps-capacity=0
    stale-nonce
    syslog
    no-multicast-peers
    external-ip=192.168.178.25/10.0.2.15
    no-tlsv1
    no-tlsv1_1

SIGNALING:
https://mydomain.net:8443/standalone-signaling ===> Ok
I did NOT activate the validation of the SSL certificate. When I activated it, NC returned “unknown error”, which didn’t bother me because 8443 is routed to ws://127.0.0.1:8086 and ws doesn’t use a certificate. Hence I deactivated the validation and then I got “OK”. Was this right? Or was NC’s objection to the certificate related to my default https snakeoil certificate rather than to my non-existent wss certificate?

Jotoeri:
Browser’s web console returns the following error when trying to access the call:

Firefox can’t establish a connection to the server at wss://mydomain.net:8443/standalone-signaling/spreed. signaling.js:573:15

error { target: WebSocket, isTrusted: true, srcElement: WebSocket, currentTarget: WebSocket, eventPhase: 2, bubbles: false, cancelable: false, returnValue: true, defaultPrevented: false, composed: false, … } signaling.js:593:10

Could it be that the NC web browser client is trying to establish the connection using wss whereas I’ve configured ws only and via https (see default-ssl.conf above): -> https://mydomain.net:8443 -> ws://127.0.0.1:8086 ???

Is this the same issue which you’ve encountered?
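
Comparing the URL in the console error with the one configured in the Talk settings, the web client seems to swap https for wss and append /spreed. A small sketch of that mapping is below. It is inferred from the two URLs in this thread, not taken from the Talk source, and the function name is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def signaling_websocket_url(configured_url: str) -> str:
    """Guess the WebSocket URL the browser will open for a configured signaling URL."""
    parts = urlsplit(configured_url)
    # https pages get a TLS-protected socket, http pages a plain one.
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme, parts.scheme)
    path = parts.path.rstrip("/") + "/spreed"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


print(signaling_websocket_url("https://mydomain.net:8443/standalone-signaling"))
```

If that reading is right, wss towards port 8443 is expected: the TLS part ends at Apache, and only the hop from Apache to 127.0.0.1:8086 is plain ws. The browser then has to trust the certificate Apache presents on 8443.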

Thanks,

Niklas

You stated:

I did NOT activate the validation of the SSL certificate. When I activated it, NC returned “unknown error”, which didn’t bother me because 8443 is routed to ws://127.0.0.1:8086 and ws doesn’t use a certificate. Hence I deactivated the validation and then I got “OK”. Was this right? Or was NC’s objection to the certificate related to my default https snakeoil certificate rather than to my non-existent wss certificate?

Just to be sure, the certificates at :8086 are self-signed, right?
If yes, try importing your certificate into your browser.
If it works, then you should think about getting a proper CA-signed certificate, for example by using Let’s Encrypt.
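
To see whether a certificate is the culprit without touching the browser, you can try a verified TLS handshake from a script. The sketch below uses only Python’s standard library and checks against the system CA store, which is roughly, but not exactly, what a browser does. The function name is made up, and host and port are whatever your signaling endpoint uses.

```python
import socket
import ssl


def certificate_problem(host: str, port: int, timeout: float = 5.0):
    """Return None if the handshake verifies, otherwise a short reason string."""
    context = ssl.create_default_context()  # verifies chain and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as err:
        # Typical for self-signed or snakeoil certificates.
        return "certificate rejected: " + err.verify_message
    except (ssl.SSLError, OSError) as err:
        return "connection failed: " + str(err)
```

For the setup in this thread the call would be certificate_problem("mydomain.net", 8443). A “certificate rejected” result would point at the certificate rather than at the WebSocket proxying.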

Just for completeness:
When I join a conversation, the JS client successfully connects using this URL:
Connecting to wss://sub.domain.tld:443/spreed for …

Yes, the SSL certificate for this machine is self-signed. It’s the standard certificate that comes with Ubuntu 20.04 Server. As for ws://127.0.0.1:8086, I’m unsure whether this certificate would apply. I thought that in my setup this certificate would only apply to https traffic via port 8443. Furthermore, I thought that I don’t need to set up wss and a certificate to use the signaling server. Perhaps I’m wrong; I’m by no means a professional when it comes to these kinds of issues.

I’ve just tried to import ssl-cert-snakeoil.pem as you suggested, but the browser reports that this is not a certificate for a certification authority.

Nevertheless, do you think it’s worth trying to replace it with a Let’s Encrypt certificate and then activating the SSL validation in the NC settings for Talk?

Thanks,
Niklas

If I understand you correctly, the signaling server is listening on both 8443/https and 8086/http? And you just redirected https traffic to the http port? That surely doesn’t work then. If your browser initiates an encrypted connection, there must be a corresponding endpoint listening for it.

The certificate on your Ubuntu machine might be in the wrong format. I’m not an expert on certs either.

In my opinion more security is always worth it :slight_smile:

To get a bit off-topic within this thread:

Regarding certs:
In your case it depends on the use case of the NC installation. Test setup for experimental use? Go unsecured if you must. Production-ready with real-life users? TLS is an absolute must!

Does your NC have a certificate, or are you accessing it unsecured, which I would strongly advise against? How did you create that certificate?

Regarding Let’s encrypt:
I’m using a dockerized reverse proxy setup with three containers, no dealing with certificates, loving it! :smiley:

  • Traefik as reverse proxy
  • NC
  • Signaling + STUN + TURN

Would also be interesting to understand why you are running it in a VM and on different hosts?

Huh, ok. So -

I think it would be worth a try to create a Let’s Encrypt cert. It’s not much work, and afterwards you can rule this out as the cause.
For Let’s Encrypt, I use certbot, installed from snap as recommended on their page:

# Install up-to-date snap
apt install snapd fuse; snap install core; snap refresh core
# Just to be sure, there is no apt certbot package
apt remove certbot
# Install certbot
snap install --classic certbot

Then you can create your cert with certbot certonly --apache. That’s it, quite easy stuff; even renewal should be set up automatically. But verify that once everything is working.

However, if I see that right, you use an Apache reverse proxy in front, no? Then Apache handles the SSL part, and it is perfectly fine to use plain ws between Apache and the signaling server, as this happens within the same machine and SSL would just add overhead.
Which domain did you configure in Talk? And just a dumb question to be sure: you seem to use Apache’s default-ssl.conf. Did you set the VHost to listen on 8443, too?

Bobby,
no, the signaling server is listening on 8443/https only and then through Apache’s default-ssl.conf (“ProxyPass”) the traffic is reverse proxied to ws://127.0.0.1:8086. There’s no additional certificate beside the SSL snakeoil certificate which came with Ubuntu 20.04 Server and which is referred to within default-ssl.conf to secure 8443/https.

And yes, this setup is for experimental use. It’s a prototype which will be migrated to a professional environment (VM) at a later stage.

The NC server on Debian (VM) is in production and has its own Let’s Encrypt certificate. I installed it using apt install certbot python-certbot-apache && certbot certonly --apache

My idea was to install the signaling server separately and to leave the NC server untouched. This was meant to help in identifying issues if things did not work out as planned. For the signaling server I chose a small physical machine over a VM. The intention was again to reduce complexity for setting up and testing this prototype by avoiding the additional layer (NAT) from the virtual host.

Jotoeri,
yes, I use the apache reverse proxy and it’s listening to 8443!

I’ve just installed the letsencrypt certificate on the signaling server’s apache and this seems to have solved a big part of the problem! :+1:

The initial error message is gone now and connection is established. Participants can join the call!

But, similar to the initial problem, the audio and video data reaches the android app only. Browser based clients do not receive any audio or video data. And again, this is not related to the type of internet connection. (local/remote/WiFi/LTE)

Firefox Web console says:

Content Security Policy: The page’s settings blocked the loading of a resource at eval (“script-src”). [1aa60d21-8dd3-40bb-bb16-fb86167082e6:27:22]

As far as I know, content security policy is handled on the server’s side. Probably, this error is related to the upgrade to NC20.0.5 this morning…

The error implies that there was an attempt to call an external script. I don’t know what that could be:

The corresponding source code is:
contentWindow.eval(
"(" + injectedToString() + ")('" + eventName + "', true);"
);
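
Reading the console message literally, it says that a call to eval was blocked by the script-src directive. Whether a given policy permits eval can be checked from the Content-Security-Policy response header, which the browser’s network tab shows for the page. Below is a sketch of such a check. The function name is made up, the header strings in the example are invented and not Nextcloud’s actual policy, and it does not tell you where the blocked eval call comes from.

```python
def script_src_allows_eval(csp_header: str) -> bool:
    """Check whether a Content-Security-Policy header value permits eval()."""
    directives = {}
    for part in csp_header.split(";"):
        tokens = part.split()
        if tokens:
            # If a directive is repeated, only its first occurrence counts.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    # script-src falls back to default-src when it is not set.
    sources = directives.get("script-src", directives.get("default-src", []))
    return "'unsafe-eval'" in [source.lower() for source in sources]


print(script_src_allows_eval("default-src 'none'; script-src 'self' 'nonce-abc'"))
print(script_src_allows_eval("script-src 'self' 'unsafe-eval'"))
```

A policy without 'unsafe-eval' in script-src (or in default-src as fallback) makes the browser refuse eval, which would match the message above.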

When using Edge, it’s working fine! :+1:

To sum up:

  • problem is solved: apparently NC Talk objects to a self-signed certificate, and replacing it with a Let’s Encrypt certificate did the trick
  • Firefox blocked data and seems to be more restrictive than Edge and the Android Talk app with regard to content security. Could it be that this is a bug in NC 20.0.5, and should I report it?

Anyway, thank you both for your help, I really appreciate it! :+1:

Cheers,
Niklas

no, the signaling server is listening on 8443/https only and then through Apache’s default-ssl.conf (“ProxyPass”) the traffic is reverse proxied to ws://127.0.0.1:8086.

If you proxy the traffic to a port where no one is listening … how is the traffic supposed to be heard? :smiley: Might also be the problem which was solved by using the cert bot.

I would not call it a bug if a certificate that is not signed by a certificate authority does not work properly. This is obviously a browser configuration matter and has little to do with Nextcloud.

You’re right and I was imprecise - the signaling server is listening on 127.0.0.1:8086

Regarding the bug: I was referring to the second issue where Firefox blocked a part of NC code and referred to the content security policy. In my opinion, this issue is unrelated to the certificate.

It’s rather a question of NC’s content security policy which usually is defined server side. In my particular case, Firefox seems to try to prevent cross-site-scripting whereas Edge doesn’t see a problem.

It’s just that I’m unsure whether cross-site scripting is really occurring on my NC server, as implied by the Firefox error.