Connection issue with Talk high performance backend (on-prem)

Support intro

Sorry to hear you’re facing problems :slightly_frowning_face:

help.nextcloud.com is for home/non-enterprise users. If you’re running a business, paid support can be accessed via portal.nextcloud.com where we can ensure your business keeps running smoothly.

In order to help you as quickly as possible, before clicking Create Topic please provide as much of the below as you can. Feel free to use a pastebin service for logs; otherwise, indent short log examples with four spaces:

example

Or for longer, use three backticks above and below the code snippet:

longer
example
here

Some or all of the below information will be requested if it isn’t supplied; for fastest response please provide as much as you can :heart:

Some useful links to gather information about your Nextcloud Talk installation:
Information about Signaling server: /index.php/settings/admin/talk#signaling_server
Information about TURN server: /index.php/settings/admin/talk#turn_server
Information about STUN server: /index.php/settings/admin/talk#stun_server
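The same information can also be read from the command line; a sketch assuming a standard occ setup (the key names `signaling_servers`, `turn_servers`, and `stun_servers` are assumptions based on current Talk versions — verify against your instance):

```shell
# Hypothetical sketch: read Talk's signaling/TURN/STUN settings via occ.
# Run as the web user from the Nextcloud root; key names are assumptions.
#php occ config:app:get spreed signaling_servers
#php occ config:app:get spreed turn_servers
#php occ config:app:get spreed stun_servers
echo "spreed app-config keys: signaling_servers turn_servers stun_servers"
```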

Nextcloud version (eg, 24.0.1): 31.0.9
Talk Server version (eg, 14.0.2): 21.1.5
Custom Signaling server configured: yes version 2.0.4~docker
Custom TURN server configured: yes
Custom STUN server configured: yes

In case the web version of Nextcloud Talk is involved:
Operating system (eg, Windows/Ubuntu/…): Win
Browser name and version (eg, Chrome v101): Firefox/Edge/…

In case mobile Nextcloud Talk apps are involved:
Mobile was not involved as far as I know.

The issue you are facing:

I tried to set up NC Talk with the high performance backend (HPBE). I had it running without the HPBE and faced performance issues at 4~5 participants, who were not able to join a room. Loading time grew with every participant, without any significant load on the server or the TURN server (neither CPU-, RAM-, nor network-wise).

After installing the HPBE, it apparently became better, but not yet good. Yesterday we had a meeting with 4~5 persons and it worked… okay. For the 4 participants it worked, but with every person joining it became more laggy: the NC itself was still running smoothly, but joining (I suspect the WebRTC part with HPBE/Janus) took longer with each participant. The last user took 60 seconds to join the meeting.

Once connected, the session went well as far as I can tell. No audio glitches, and video was present (somewhat blurry for individual participants, but this could be their network connection).

I still have the feeling that my setup is quite brittle:

  • I had the call summary bot active in the room. Suddenly (for no apparent reason), the call summary was printed in the middle of the call. No one had quit, joined, or anything like that.
  • After the incident with the bot, one user had audio issues and wanted to reload/rejoin. He was not able to join (as a guest), as the web frontend indicated there was no ongoing call (the join icon was grayed out).
  • He then logged in and started a new call in the same room. He was alone (both audio and video, just showing the waiting-for-participants screen), but the participant list on the right side showed the other 4 people as being in the same call. We were talking in that very room at the time.

All in all, I have the impression that my setup is not configured correctly and some fallback to the non-HPBE path is triggered, or that the HPBE somehow gets out of sync with the NC Talk state. I have no real clue what is going on, but yeah, that is why I am here. I call it “the frontend got out of sync”.

We tried the setup earlier with only 2 participants (plus some guests via mobile phone on remote networks like UMTS) and saw similar behavior. At the time, I thought the problem was the capacity of the UMTS network.

So long story short:
I want to debug the issue of a brittle talk setup to have a stable and working video conferencing solution (for our small use case). Ideally, I want the solution to run smoothly.

Is this the first time you’ve seen this error? (Y/N):

Sort of yes: it is the first time I have used the self-hosted HPBE in a larger (4~5 participants) meeting.

Steps to replicate it:

Unfortunately, I do not know how to replicate it. I have seen it now twice on the instance.

Logging

In general, I have no clue where to look, and the logs got quite lengthy, so I cannot effectively filter for what to look at exactly. Therefore, I will not paste them verbatim here but instead post a link to a gist.

As there is personal data in them, I might decide to remove the logs once this issue is solved.

The output of your Nextcloud log in Admin > Logging or errors in nextcloud.log in /var/www/:

See gist

The output of your Apache/nginx/system log in /var/log/____:

My Apache error log is full of errors for other pages at that time (it looks like a script kiddie tried to get in). Instead, I looked for issues in the access_log.
I put them in the gist as well. There are filtered files for convenience, too (1, 2).

Your browser log if relevant (javascript console log, network log, etc.):

See gist

Additional information

I have separated the HPBE from the NC server. I know the HPBE can get greedy in terms of resources, and I do not meet the recommended 8 GB of RAM, for example. However, I looked at the output of htop, iotop, and iftop while the session was open. There was no high load visible that could explain reduced performance.

Config of HPBE

I want to keep search engines from finding my TURN server (I already get enough spam traffic there). In the configs below, I write TURNCW as a redaction; the real address is the subdomain turn of christian-wolf.click. Similarly, HPBCW is the subdomain hpb.

Docker-compose file:

networks:
  default: {}
  traefik:
    external: true
    name: traefik

volumes:
  coturn: {}

services:
  nats:
    image: nats
    restart: unless-stopped
  
  janus:
    build: ./janus
    restart: unless-stopped
    volumes:
      - ./janus.jcfg:/etc/janus/janus.jcfg:ro
  
  coturn:
    image: coturn/coturn
    restart: unless-stopped
    network_mode: host
    # ports:
    #   - 3478:3478
    #   - 3478:3478/udp
    #   - 5349:5349
    #   - 5349:5349/udp
    #   - 50000-55000:50000-55000/udp
    volumes:
      - coturn:/var/lib/coturn
      - ./coturn.conf:/coturn.conf:ro
    command:
      - -c
      - /coturn.conf
  
  hpb:
    image: strukturag/nextcloud-spreed-signaling
    restart: unless-stopped
    volumes:
      - ./server.conf:/config/server.conf:ro
    ports:
      - 127.0.0.1:8081:8080
    depends_on:
      - nats
      - janus
    networks:
      - default
      - traefik
    labels:
      - com.centurylinklabs.watchtower.enable=true
      - traefik.enable=true
      - traefik.http.routers.hpb.rule=Host(`HPBCW`)

Coturn config (redacted):

min-port=50000
max-port=55000
listening-port=3478
fingerprint
#lt-cred-mech # Only on coTURN below v4.5.0.8!
use-auth-secret
static-auth-secret=Y<REDACTED>
realm=TURNCW
total-quota=0
bps-capacity=0
stale-nonce
#no-loopback-peers # Only on coTURN below v4.5.1.0!
no-multicast-peers
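With use-auth-secret, coturn expects ephemeral credentials derived from the static secret rather than fixed user/password pairs (the HPBE derives them for you via its [turn] section). A sketch of the derivation, with "testsecret" as a placeholder for the real static-auth-secret:

```shell
# Ephemeral TURN credentials under use-auth-secret (TURN REST API scheme).
# "testsecret" is a placeholder; use the static-auth-secret from coturn.conf.
SECRET="testsecret"
# The username is an expiry timestamp (here: one hour from now);
# coturn rejects it once that time has passed.
USERNAME=$(( $(date +%s) + 3600 ))
# The password is base64(HMAC-SHA1(secret, username)).
PASSWORD=$(printf '%s' "$USERNAME" | openssl dgst -sha1 -hmac "$SECRET" -binary | base64)
echo "user=$USERNAME pass=$PASSWORD"
# Handy for testing the TURN server directly with coturn's test client:
#   turnutils_uclient -u "$USERNAME" -w "$PASSWORD" TURNCW
```

If credentials derived this way are rejected, the secret in coturn.conf and the one in the HPBE's [turn] section do not match.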

HPBE server config (redacted):

[http]
listen = :8080

#[https]
#certificate = /etc/nginx/ssl/server.crt
#key = /etc/nginx/ssl/server.key

[app]
debug = false

[sessions]
hashkey = f<REDACTED>
blockkey = 9<REDACTED>

[clients]
internalsecret = 2<REDACTED>

[backend]
backends = backend_tsc,backend_rur,backend_slt
allowall = false
# secret = <nextcloud-secret-key>
timeout = 10
connectionsperhost = 8

[backend_tsc]
url = https://cloud.tsc-vfl.de
secret = 3<REDACTED>

[backend_rur]
url = <REDACTED>
secret = <REDACTED>

[backend_slt]
url = <REDACTED>
secret = <REDACTED>


[nats]
url = nats://nats:4222

[mcu]
type = janus
url = ws://janus:8188

[turn]
apikey = u<REDACTED>
secret = Y<REDACTED>
# Spaces added here only to keep search engines from finding my TURN server too easily; there are no spaces on the right-hand side in the real config
servers = turn:TURNCW:3478?transport=udp, turn:TURNCW:3478?transport=tcp

[geoip]

[geoip-overrides]

[stats]

Janus config (redacted):


general: {
        configs_folder = "/etc/janus"                   # Configuration files folder
        plugins_folder = "/usr/lib/x86_64-linux-gnu/janus/plugins"                      # Plugins folder
        transports_folder = "/usr/lib/x86_64-linux-gnu/janus/transports"        # Transports folder
        events_folder = "/usr/lib/x86_64-linux-gnu/janus/events"                        # Event handlers folder
        loggers_folder = "/usr/lib/x86_64-linux-gnu/janus/loggers"                      # External loggers folder
        debug_level = 4                                                 # Debug/logging level, valid values are 0-7
        admin_secret = "janusoverlord"  # String that all Janus requests must contain
        protected_folders = [
                "/bin",
                "/boot",
                "/dev",
                "/etc",
                "/initrd",
                "/lib",
                "/lib32",
                "/lib64",
                "/proc",
                "/sbin",
                "/sys",
                "/usr",
                "/var",
                "/opt/janus/bin",
                "/opt/janus/etc",
                "/opt/janus/include",
                "/opt/janus/lib",
                "/opt/janus/lib32",
                "/opt/janus/lib64",
                "/opt/janus/sbin"
        ]
}
certificates: {
}
media: {
}
nat: {
        stun_server = "TURNCW"
        stun_port = 3478
        nice_debug = false
        full_trickle = true
        turn_server = "TURNCW"
        turn_port = 3478
        turn_type = "udp"
        turn_rest_api_key = "u<REDACTED>"
        ice_ignore_list = "vmnet"
}
plugins: {
}
transports: {
        disable = "libjanus_rabbitmq.so,libjanus_mqtt.so,libjanus_pfunix.so"
}
loggers: {
}
events: {
}

Logging of HPBE

I watched the logs of the HPBE (hpb server/coturn/janus) in parallel. Unfortunately, I had to copy them from tmux, so there are no timestamps. As they are lengthy as well, I put them in the gist too.

I found a few suspicious entries in the logs. Some of them I think are uncritical, but I do not know whether they point in a direction:

Interesting log entries I identified
Wrong TURN IP address
janus-1   | [WARN] Could not set TURN server, is the address correct? (TURNCWIP:3478)

Here, TURNCWIP is the IPv4 address of the TURNCW (and HPBCW) machine.

Adding of remote candidate failed
janus-1   | [WARN] [8997184525103053] Failed to add some remote candidates (added 0, expected 1)
Janus could not send packets
janus-1   | [ERR] [ice.c:janus_ice_outgoing_stats_handle:4374] [1540167836497906] Got 1 SRTP/SRTCP errors in the last few seconds (last error: srtp_err_status_replay_fail)

This seems rather normal although marked as ERR.

HPBE cannot reach NC server?
hpb-1     | room_ping.go:176: Error sending combined ping session entries [{UserId:franziska.schnell SessionId:IPvAc8S/yh0fHan3N8gqUUX7lHHTr2Elt8lWzGBoNJPIpFxzQaLl+J7qM8Hs1IE5X0VUp4Q+9hQVATvuURRpzQNbymh1tAWiNYYo2oqFXqcwGmaoihGb5invfrZ6
WvUFbiDi1ou+atSfzT51TCfSVR2Pvx7uauoW4xPrj6RUNZWJOfjcuQShJrbHmuidrZZurS+WlVougjJX5OeQ/eEYMSw4frqzH1JGFF06Cw1TJoKUndMWFkTT5S+Bwpj71hb} {UserId:stefan.dietl SessionId:CgJgWTUKK+MG5w0POdpe5bU5+dDGt82kcwcvlQ5K84m4am02yb2EWS6CIdkeS4MuB5gn4Dh
Pf2dbDLrpn2ZpnkYydLEHglSAG/FFg8+PsGxRGqUYeb9uWiptzsE5yk8mhRF1E8mWif1/klKiP2wvKeRuOLXT9bPOp+Yg0gtDKvbAkpV/Krtvt9iWt9gQfYSWCMmT7t7mQKsfm+Qd8fhPnoWqcTkuamxL5c1oP+G3ZDGcVq2fQKxAzLj0qvPC6om} {UserId: SessionId:cIynbxb+l7r1OTkP78vk47G8O4DAta
xHS+bbo+o670hdKnhbcH779lmCrsYDTKa3mY2rP/M4bzftXZWTCl75M9ATgdmSm5saopJ6EC7P2zMoNj30nQkA4vGJKvjlg4+EmXOo+cgBTN6eNO5tmwEMatksQHzLMMg6cDN47gEwNRlovhQa0l7cfrOZqNnKYfoyQlhJyESzSX2EXjg3xQGguIjKUx+VDQcNYRf877Nzqqh1S5j7LWuaxwJ2oaysqH4} {UserId:
christianwolf SessionId:CIJbo8nC6qj3aesnfVwv0rCd08MAiI92vE/Lma7pUDr0aK9q69214weeGxDymJkIv9KcN2syp8ponL/rPb1RyCY0vD+g995JIS8lQdViVX1Tx4n1/NeMnrWfFMFh4kATD9oJnX3+tjI9/spgT/XK+0oRvYS1rTD8uGxnxx634Szp32KR035adByyOvioJZs4l6Qg334WD8u7V8U1qKX
nKougDtZFru+howuwVp554f9Rm61RJWLVttbd8ETFEn7} {UserId: SessionId:py0T47YGIIf6fkfNnq8aaoPvgc8YN2W9UBaZntfa/e2NLVXIYqT8B8KJfII3tKoHco2J15a6SiatnVue5Zwb2JVdIn95BZc075G+v3w/hENC0xQKZAU5298OwBz73xDee6pL7hZG1x5wsXmUpIGF+sjgJ9g+rBRIqxnUCPCary
tJyWs7fu1jMDB/wo58FeZoRxFeWdiRTLrsJzXzfPsLJVCPrZh/LYao8tbG3jHAUiHs5PaoX7RI8PktOkDs1sG}] to https://cloud.tsc-vfl.de/ocs/v2.php/apps/spreed/api/v1/signaling/backend: Post "https://cloud.tsc-vfl.de/ocs/v2.php/apps/spreed/api/v3/signaling
/backend": context deadline exceeded
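“context deadline exceeded” means the signaling server's HTTP request to Nextcloud did not finish within the timeout = 10 configured in [backend]. A sketch to measure that round trip (run from the HPB host; URL taken from the log entry above):

```shell
# Time the backend request from the HPB host; values close to the [backend]
# timeout of 10s would explain "context deadline exceeded".
BACKEND_URL="https://cloud.tsc-vfl.de/ocs/v2.php/apps/spreed/api/v3/signaling/backend"
echo "timing ${BACKEND_URL}"
# Uncomment on the real host:
#curl -o /dev/null -s \
#  -w 'dns=%{time_namelookup}s tls=%{time_appconnect}s total=%{time_total}s\n' \
#  "${BACKEND_URL}"
```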
User not in the meeting

See e.g. line 1565ff

hpb-1     | backend_server.go:444: User map[actorId:alexander.kieper actorType:users displayName:Alexander Kieper inCall:0 lastPing:0 participantPermissions:1 participantType:3 sessionId:0 userId:alexander.kieper] is not in the
meeting, ignoring

I haven’t read through everything, but that seems suspicious at least. Can you double-check the URL against the webserver logs? Also check whether the signaling server was blocked by NC’s BFP by accident.

The last user took 60 seconds to join the meeting.

That does not sound right. Are enough workers available on the webserver? Also check Server system requirements - Nextcloud Talk API documentation; not meeting them could have weird side effects.

Edit: Missed a “not” when writing

Just to be 100% precise: I checked the webserver logs on the NC server for whether the requests arrived — that is what you meant, correct?

There is a bunch of these lines, all returning HTTP status 200.

cloud.tsc-vfl.de 217.154.228.83 - - [20/Oct/2025:18:04:21 +0200] "POST /ocs/v2.php/apps/spreed/api/v3/signaling/backend HTTP/1.1" 200 106 "-" "nextcloud-spreed-signaling/2.0.4~docker"
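To get an overview rather than single lines, the signaling backend requests can be summarized by status code. A sketch assuming the vhost-prefixed log format of the line above (status is the 10th field); the here-doc uses that line as sample input, point awk at the real access_log instead:

```shell
# Summarize HTTP status codes of signaling backend POSTs in the access log.
# Assumption: vhost_combined-style format as in the excerpt (status = $10).
awk '/\/ocs\/v2\.php\/apps\/spreed\/api\/v3\/signaling\/backend/ { codes[$10]++ }
     END { for (c in codes) print c, codes[c] }' <<'EOF'
cloud.tsc-vfl.de 217.154.228.83 - - [20/Oct/2025:18:04:21 +0200] "POST /ocs/v2.php/apps/spreed/api/v3/signaling/backend HTTP/1.1" 200 106 "-" "nextcloud-spreed-signaling/2.0.4~docker"
EOF
# prints: 200 1
```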

Aaah, sorry, what is BFP?

Uuhm, I do not know. I currently use the docker image nextcloud:31-fpm-alpine as a basis, although I am considering migrating to nextcloud:31 for a simpler config. I did not alter the defaults.
I know that there was no significant other load at that time (no other Talk session), so I suspect most workers were available (even if there is a certain background noise).

I am currently on PHP-FPM + mpm_event. I did not look into the wasm or tflite stuff. At least I see no errors in the browser logs.

I now see some errors there, but these could be recent, as I kept the browser window open. So they could be a red herring.

Also, I do not have TURN on port 443, but no user is connecting from a highly restricted environment.

Bruteforce Protection

Nope, there are no attempts listed for the IP.