Coturn on VPS with NGINX Proxy Manager Connecting to VM in a VLAN (All This Trouble for Nextcloud Talk)

Hi all,

I’ve been trying to sort this out for well over a week and it is exhausting. I have made over 100 test calls and followed so many guides in German and English - plus I think I found one in Russian.

Firstly, my topology: 2 servers are involved, 1 VPS and 1 VM on Proxmox in a segregated VLAN. The VPS uses NPM and Tailscale to proxy to the Nextcloud AIO Docker VM on the Apache port - this works perfectly fine. At first, I also used the NPM stream function to expose 3478 from the VM via the VPS; calls would work, but I suspect they were using the tailnet (giving the illusion of the same network), and I had quite a few issues with people dropping off the call or audio not being heard by random participants. This is why I now have coturn, as it seems to be the better solution.

It should be noted that the VM with Nextcloud is within a VLAN with no access to my main lab or the MacBook I am using for testing. I would assume it would just connect over WAN and ignore any local connections, as most of our users are outside the network anyway.

Below is my coturn docker-compose:

services:
  coturn:
    container_name: coturn
    image: coturn/coturn
    network_mode: host
    tmpfs:
      - "/var/lib/coturn"
    command: "--listening-port 3478 --fingerprint --use-auth-secret --static-auth-secret=***** --realm=***** --total-quota=0 --bps-capacity=0 --stale-nonce --no-multicast-peers --verbose --min-port=49100 --max-port=49200 --prometheus"
    restart: always

Aside from some strange 401 Unauthorized errors in the log, nothing truly stands out as a problem to me. The WAN IPs of both users are shown and it attempts to connect, but no connection is established. Telnet is successful on 3478 and the UDP range has been opened.

Now in Nextcloud Talk (isn’t there a separate Docker container for this?), I have seen some strange log entries like the ones below, including mDNS resolver errors. Google unfortunately has not given me anything concrete.

[WARN] [210253568013808] ICE failed for component 1 in stream 1, but let's give it some time... (trickle pending, answer received, alert not set)
[WARN] [4705455986961398] Error resolving mDNS address (ab23f4fe-eeef-4187-a25e-be9f6649d9af.local): Error resolving "ab23f4fe-eeef-4187-a25e-be9f6649d9af.local": Try again
[WARN] [210253568013808] ICE failed for component 1 in stream 1, but we're still waiting for some info so we don't care... (trickle pending, answer received, alert not set)
[WARN] [210253568013808] ICE failed for component 1 in stream 1, but we're still waiting for some info so we don't care... (trickle pending, answer received, alert not set)
[WARN] [210253568013808] ICE failed for component 1 in stream 1, but we're still waiting for some info so we don't care... (trickle pending, answer received, alert not set)
[WARN] [210253568013808] ICE failed for component 1 in stream 1, but we're still waiting for some info so we don't care... (trickle pending, answer received, alert not set)
[WARN] [210253568013808] ICE failed for component 1 in stream 1, but we're still waiting for some info so we don't care... (trickle pending, answer received, alert not set)
[WARN] [210253568013808] ICE failed for component 1 in stream 1, but we're still waiting for some info so we don't care... (trickle pending, answer received, alert not set)
[WARN] [210253568013808] ICE failed for component 1 in stream 1, but we're still waiting for some info so we don't care... (trickle pending, answer received, alert not set)

Is there an additional step I may be missing? At this point, I was tempted to port forward 3478 in OPNsense straight to the VM, but I would like to avoid this and have everything go via the VPS.

I did notice during my testing that if I switch to my local VLAN, the webcam loads during the test call (no loading spinner), but the other user is unable to connect either way. Firewall logs did show that the Nextcloud VM was trying to connect to my laptop (despite being on different VLANs). The other test user is entirely external on another network and therefore WAN - no connection has been successful there.

Happy to explain further and give more details!

Hey, did I understand correctly that you want to access the TURN port through nginx? This is not going to work, as the TURN protocol is not HTTP-based.

This sounds like your problem. ICE (Interactive Connectivity Establishment) is the method used to connect call participants through firewalls and NAT devices. This message looks like your server cannot reach the TURN server - there must be a connection from Talk to the coturn server (UDP is best for media). Ideally you expose the coturn server as close as possible to the internet - a direct port forward in the router is best. Each additional component adds delay, which reduces media quality.

good reference in German: ICE, Kandidaten, STUN und TURN

I should have been a bit clearer on the NPM: I use it on the VPS to connect to the VM running Nextcloud, but I don’t use it for coturn. Coturn runs on the VPS in Docker and already has a public IP.

I can telnet the public IP on the coturn port of 3478 and that works, but ICE still can’t communicate with any client aside from the local client if I change the VLAN rules.

Telnet uses TCP, which is not recommended for media (and maybe not possible at all).

Each participant - in other words, every user, plus the Talk server - must have “line of sight” to the TURN server using udp/3478. If your network/routing/firewall/VPN prevents such connections, media will fail.
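One way to verify that line of sight without a full Talk call is to fire a raw STUN Binding request over UDP at the TURN server from every vantage point (the Talk VM, each client network). A minimal sketch, assuming Python is available on the probing machine; turn.example.com is a placeholder for your coturn host:

```python
import os
import socket
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value from RFC 5389
BINDING_REQUEST = 0x0001

def build_binding_request() -> bytes:
    """Minimal STUN Binding request: 2-byte type, 2-byte length (0),
    4-byte magic cookie, 12-byte random transaction ID."""
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, STUN_MAGIC_COOKIE, os.urandom(12))

def stun_probe(host: str, port: int = 3478, timeout: float = 2.0) -> bool:
    """True if `host` answers a STUN Binding request on udp/<port>."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(), (host, port))
        try:
            data, _ = sock.recvfrom(2048)
        except socket.timeout:
            return False
    # 0x0101 = Binding success response
    return len(data) >= 20 and struct.unpack("!H", data[:2])[0] == 0x0101

# e.g.: print(stun_probe("turn.example.com"))  # placeholder hostname
```

If this returns True from one network but False from another, you have found the blocked UDP path - telnet succeeding only ever proves the TCP side.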

I’m having a hard time following your explanations. Maybe you can provide a picture plus more config and logs.

Sure, let me phrase it differently.

I have 2 servers, each with a set of functions/services. Server #1 is a Hetzner VPS with a dedicated IP running coturn and Nginx Proxy Manager, both in Docker. Nginx has no issues and functions well - none reported by users. I use Nginx to proxy my second server.

Server #2 runs as a VM on a Proxmox server; it has Nextcloud AIO including the Talk functionality. As this VM/server is separated from my usual network by a VLAN, it only has connections to the VPS, a few other Tailscale hosts, and the internet as a whole.

Regarding firewalls, the VPS aka Server #1 has port 3478 open for TCP and UDP specifically for coturn (the UDP min/max range stated in the Docker command line is also open). Port 443 is open for Nginx.

As a side note, before I realised that coturn would be a requirement, I simply used the NPM stream functionality to stream port 3478 (UDP and TCP) through the VPS, thinking that it would simply “work” - however, users kept getting audio drop-offs and other performance issues, hence why I moved to coturn.

Also, Server #2 is behind an OPNsense firewall, if it makes a difference; the ISP router is in bridge mode, so no double NAT. I am not sure if Nextcloud Talk needs additional changes on the internal firewall, as all the other functionality works fine.

With this setup I would expect clients from the internet to have no problems connecting to the TURN server living on server #1 - but maybe something prevents your Talk server from connecting there? I would analyse the connections one by one, checking whether every possible combination works as expected.

Browsers have the very handy about:webrtc (Firefox) and chrome://webrtc-internals (Chromium-based) pages showing you active connections and candidates, so you can easily see the IPs and connection metrics…
Maybe the testing procedure described in the docs helps you further?

Look at the further references provided there as well - I find them very useful.

AIO is supposed to run its own TURN server - I’m not sure how you configured your new coturn server (and whether AIO falls back to the built-in one upon e.g. a restart), so you have to troubleshoot everything step by step.

Thanks, I’ve read through these and this is what I’ve found so far - albeit no solution yet.

Firstly, I ran the turnutils_uclient command and the output looks fine, so I don’t think coturn is the issue as such. I ran this on the VPS with coturn and also on a random VM in the lab (output from the VM below).

$ turnutils_uclient -p 3478 -W **KEY** -v -y **DOMAIN**
0: : IPv4. Connected from: **TEST SERVER**:38397
0: : IPv4. Connected from: **TEST SERVER**:38397
0: : IPv4. Connected to: **COTURN SERVER**:3478
0: : allocate sent
0: : allocate response received: 
0: : allocate sent
0: : allocate response received: 
0: : success
0: : IPv4. Received relay addr: **COTURN SERVER**:49118
0: : clnet_allocate: rtv=12284642123838701597
0: : refresh sent
0: : refresh response received: 
0: : success
0: : IPv4. Connected from: **TEST SERVER**:45135
0: : IPv4. Connected to: **COTURN SERVER**:3478
0: : IPv4. Connected from: **TEST SERVER**:34888
0: : IPv4. Connected to: **COTURN SERVER**:3478
0: : IPv4. Connected from: **TEST SERVER**:51758
0: : IPv4. Connected to: **COTURN SERVER**:3478
0: : IPv4. Connected from: **TEST SERVER**:46688
0: : IPv4. Connected to: **COTURN SERVER**:3478
0: : allocate sent
0: : allocate response received: 
0: : allocate sent
0: : allocate response received: 
0: : success
0: : IPv4. Received relay addr: **COTURN SERVER**:49119
0: : clnet_allocate: rtv=0
0: : refresh sent
0: : refresh response received: 
0: : success
0: : allocate sent
0: : allocate response received: 
0: : allocate sent
0: : allocate response received: 
0: : success
0: : IPv4. Received relay addr: **COTURN SERVER**:49190
0: : clnet_allocate: rtv=12387908135660666924
0: : refresh sent
0: : refresh response received: 
0: : success
0: : allocate sent
0: : allocate response received: 
0: : allocate sent
0: : allocate response received: 
0: : success
0: : IPv4. Received relay addr: **COTURN SERVER**:49191
0: : clnet_allocate: rtv=0
0: : refresh sent
0: : refresh response received: 
0: : success
0: : allocate sent
1: : allocate response received: 
1: : allocate sent
1: : allocate response received: 
1: : success
1: : IPv4. Received relay addr: **COTURN SERVER**:49196
1: : clnet_allocate: rtv=672973193270116376
1: : refresh sent
1: : refresh response received: 
1: : success
1: : channel bind sent
1: : cb response received: 
1: : success: 0x655e
1: : channel bind sent
1: : cb response received: 
1: : success: 0x7eff
1: : channel bind sent
1: : cb response received: 
1: : success: 0x7987
1: : channel bind sent
1: : cb response received: 
1: : success: 0x5527
1: : Total connect time is 1
1: : start_mclient: msz=4, tot_send_msgs=0, tot_recv_msgs=0, tot_send_bytes ~ 0, tot_recv_bytes ~ 0
2: : start_mclient: msz=4, tot_send_msgs=0, tot_recv_msgs=0, tot_send_bytes ~ 0, tot_recv_bytes ~ 0
3: : start_mclient: msz=4, tot_send_msgs=5, tot_recv_msgs=5, tot_send_bytes ~ 500, tot_recv_bytes ~ 500
4: : start_mclient: msz=4, tot_send_msgs=7, tot_recv_msgs=6, tot_send_bytes ~ 700, tot_recv_bytes ~ 600
5: : start_mclient: msz=4, tot_send_msgs=15, tot_recv_msgs=15, tot_send_bytes ~ 1500, tot_recv_bytes ~ 1500
6: : start_mclient: msz=4, tot_send_msgs=15, tot_recv_msgs=15, tot_send_bytes ~ 1500, tot_recv_bytes ~ 1500
6: : done, connection 0x71d2a892e010 closed.
6: : done, connection 0x71d2a7fdf010 closed.
6: : done, connection 0x71d2a7fbe010 closed.
6: : done, connection 0x71d2a7f9d010 closed.
6: : start_mclient: tot_send_msgs=20, tot_recv_msgs=20
6: : start_mclient: tot_send_bytes ~ 2000, tot_recv_bytes ~ 2000
6: : Total transmit time is 5
6: : Total lost packets 0 (0.000000%), total send dropped 0 (0.000000%)
6: : Average round trip delay 32.000000 ms; min = 30 ms, max = 43 ms
6: : Average jitter 1.100000 ms; min = 0 ms, max = 11 ms

I found something strange while following the OCA.Talk.SimpleWebRTC.webrtc.config.peerConnectionConfig.iceTransportPolicy = ‘relay’ exercise. I expected the call to fail and it did; however, the console also returned:

Could not connect to server using backend url https://*****/ocs/v2.php/apps/spreed/api/v3/signaling/backend {id: '1', type: 'error', error: {…}}

Clicking the link returns an Access denied page with “CSRF check failed”. Google didn’t return any solutions here, and clearing all site info in Chrome ruled out a caching issue.

Lastly, webrtc-internals indicates a few things:

  1. The ICE candidate pair always shows (not connected).
  2. connectionstatechange always results in failed.
  3. The ICE candidate grid only shows output when the Nextcloud server (not the VPS; the VPS is just a proxy) and the laptop being used for testing are both on the same local network with no VLAN restrictions. The call still fails, as the other peer has no local access to the server.

This leads me to think that the Nextcloud AIO Docker image requires some additional step for Talk to function over WAN that is not well documented.

Any other ideas?

same for me

This is exactly the proof that coturn doesn’t work as expected: either one or the other client cannot access the TURN server. A connection within the same network segment doesn’t require any STUN/TURN server and for this reason always works.

I’m not sure it helps, but here is the output of a successful call. You can see the caller and callee are in different networks and the candidates of type host failed… the connection happens using “srflx” candidates, which are provided by the STUN server (this is the “outgoing” client ip:port combination as seen from the STUN server’s perspective). In my case there must be a problem with the TURN server as well, as there are no relay candidates… but that is another story.

You must see srflx and relay candidates if your STUN/TURN server works right. If there are no such candidates, the server doesn’t work (maybe only for one side).
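When scanning candidate lists in about:webrtc or webrtc-internals, the type is the typ field inside each candidate line. A small helper to classify them - the sample candidate strings below are made up for illustration:

```python
import re

def candidate_type(candidate: str) -> str:
    """Return the ICE candidate type ('host', 'srflx', 'prflx' or 'relay')
    from an SDP candidate line, or 'unknown' if it is missing."""
    match = re.search(r"\btyp (host|srflx|prflx|relay)\b", candidate)
    return match.group(1) if match else "unknown"

# Example lines in the shape browsers display them (values invented):
samples = [
    "candidate:1 1 udp 2122260223 192.168.11.203 54321 typ host",
    "candidate:2 1 udp 1686052607 203.0.113.7 54321 typ srflx raddr 192.168.11.203 rport 54321",
    "candidate:3 1 udp 41885439 198.51.100.9 49160 typ relay raddr 203.0.113.7 rport 54321",
]
for line in samples:
    print(candidate_type(line))  # prints: host, srflx, relay
```

host candidates are local interface addresses, srflx comes from the STUN binding, and relay is the allocation on the TURN server - the last one is what is missing when coturn is unreachable.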

This is very useful, thank you - I’ll focus on coturn then.

Does coturn need port 443? I don’t think it does, but I’m curious, as I hope it isn’t clashing with the reverse proxy.

I spent quite a long time making my coturn work as expected, so I’ll write down the results of my testing :wink:

  • port 443 is not required
  • max-bps=, bps-capacity= and user-quota=, if set to a non-zero value, always resulted in error 486: Allocation Quota Reached in the log, and as a result no relay candidates were generated. More investigation is required here.

Running a test tool like e.g. https://ourcodeworld.com/articles/read/1526/how-to-test-online-whether-a-stun-turn-server-is-working-properly-or-not doesn’t work with a shared secret as used by Nextcloud, so use a hard-coded user/password combination for testing (uncomment #lt-cred-mech and #user=hello:world) - as soon as your network works, you can switch back to the shared-secret config.

Don’t be scared that I ended up with a mixed config, with some settings in the compose file and others in turnserver.conf. There is no real reason behind it - I just wanted the “more dynamic” values exposed in the compose file, while things I consider constant remain in turnserver.conf.

turnserver.conf
#verbose
#user-quota=0
#max-bps=0
#bps-capacity=0
max-allocate-lifetime=3600
fingerprint
no-tlsv1
no-tlsv1_1
no-tlsv1_2
#lt-cred-mech
#user=hello:world
no-loopback-peers
no-multicast-peers
docker-compose.yml
services:
  coturn:
    # switch from instrumentisto/coturn to coturn/coturn
    image: coturn/coturn
    container_name: coturn
    restart: unless-stopped
    ports:
      - 3478:3478
      - 3478:3478/udp
      - 50000-50099:50000-50099/udp
    environment:
      - DETECT_EXTERNAL_IP=yes
      - DETECT_RELAY_IP=yes
    command:
      - -n
      - --log-file=/var/turn.log
      - --realm=${COTURN_FQDN}
      - --use-auth-secret
      - --static-auth-secret=${COTURN_SECRET}
      - --verbose
    volumes:
      - ./coturn/:/var/
      - ./turnserver.conf:/etc/coturn/turnserver.conf

With a working config, a test tool should output some candidates of srflx and relay type - as soon as you see them, you can assume your TURN server’s network config works right.

I now see more or less the following in the coturn log (verbose) for each user session:

91: (14): INFO: session 000000000000000006: realm <turn.mydomain.tld> user <>: incoming packet message processed, error 401: Unauthorized
91: (14): INFO: session 000000000000000006: new, realm=<turn.mydomain.tld>, username=<1725003225:ch/ObBxxxxxxxwwjm>, lifetime=600
91: (14): INFO: session 000000000000000006: realm <turn.mydomain.tld> user <1725003225:ch/ObBxxxxxxxwwjm>: incoming packet ALLOCATE processed, success
91: (14): INFO: session 000000000000000006: peer 192.168.11.203 lifetime updated: 300
91: (14): INFO: session 000000000000000006: realm <turn.mydomain.tld> user <1725003225:ch/ObBxxxxxxxwwjm>: incoming packet CREATE_PERMISSION processed, success
91: (14): INFO: session 000000000000000006: peer 172.28.0.2 lifetime updated: 300
91: (14): INFO: session 000000000000000006: realm <turn.mydomain.tld> user <1725003225:ch/ObBxxxxxxxwwjm>: incoming packet CREATE_PERMISSION processed, success
91: (14): INFO: session 000000000000000006: peer 172.28.0.2 lifetime updated: 300
91: (14): INFO: session 000000000000000006: realm <turn.mydomain.tld> user <1725003225:ch/ObBxxxxxxxwwjm>: incoming packet CREATE_PERMISSION processed, success
91: (14): INFO: session 000000000000000006: peer 83.xxx.zzz.86 lifetime updated: 300
91: (14): INFO: session 000000000000000006: realm <turn.mydomain.tld> user <1725003225:ch/ObBxxxxxxxwwjm>: incoming packet CREATE_PERMISSION processed, success
100: (14): INFO: session 000000000000000006: refreshed, realm=<turn.mydomain.tld>, username=<1725003225:ch/ObBxxxxxxxwwjm>, lifetime=0
100: (14): INFO: session 000000000000000006: realm <turn.mydomain.tld> user <1725003225:ch/ObBxxxxxxxwwjm>: incoming packet REFRESH processed, success
100: (14): INFO: session 000000000000000006: TCP socket closed remotely 178.197.223.179:43016
100: (14): INFO: session 000000000000000006: usage: realm=<turn.mydomain.tld>, username=<1725003225:ch/ObBxxxxxxxwwjm>, rp=21, rb=2692, sp=11, sb=1136
100: (14): INFO: session 000000000000000006: peer usage: realm=<turn.mydomain.tld>, username=<1725003225:ch/ObBxxxxxxxwwjm>, rp=4, rb=256, sp=14, sb=1400
100: (14): INFO: session 000000000000000006: closed (2nd stage), user <1725003225:ch/ObBxxxxxxxwwjm> realm <turn.mydomain.tld> origin <>, local 172.28.0.2:3478, remote 178.197.223.179:43016, reason: TCP connection closed by client (callback)
100: (14): INFO: session 000000000000000006: delete: realm=<turn.mydomain.tld>, username=<1725003225:ch/ObBxxxxxxxwwjm>
100: (14): INFO: session 000000000000000006: peer 83.xxx.zzz.86 deleted
100: (14): INFO: session 000000000000000006: peer 172.28.0.2 deleted
100: (14): INFO: session 000000000000000006: peer 192.168.11.203 deleted

Looks like Nextcloud creates an individual user/password for each client using the shared secret :slight_smile:
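This is the ephemeral-credentials convention from the TURN REST API draft, which coturn’s --use-auth-secret switch implements: the username is an expiry timestamp (optionally “timestamp:user”, the form visible in the log above), and the password is base64(HMAC-SHA1(secret, username)). A sketch for generating a test credential yourself - the secret below is a placeholder for your --static-auth-secret value:

```python
import base64
import hashlib
import hmac
import time

def turn_rest_credentials(shared_secret: str, ttl: int = 3600, user: str = ""):
    """Ephemeral TURN credentials: username is a Unix expiry timestamp
    (optionally 'timestamp:user'), password = base64(HMAC-SHA1(secret, username))."""
    username = str(int(time.time()) + ttl)
    if user:
        username = f"{username}:{user}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()

# Placeholder secret - substitute the real one from your compose file.
u, p = turn_rest_credentials("myStaticAuthSecret", ttl=600)
print(u, p)
```

The resulting pair can then be fed to turnutils_uclient with -u and -w for testing; coturn accepts it until the timestamp passes, which is also why stale clock skew between Talk and coturn can break auth.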

I am using the Nextcloud All-In-One installer (the primary installation method recommended by Nextcloud) and also see this error on a default installation.


Access /ocs/v2.php/apps/spreed/api/v3/signaling/backend

Results in Access forbidden / CSRF check failed