[Project] Single-command full HA Nextcloud deployment (MariaDB Galera, Redis Cluster, RustFS S3, Talk HA)

Hi everyone,

I’d like to share an open-source project I’ve been building: NXT Maxscale, a fully automated script that deploys a production-grade, high-availability Nextcloud infrastructure with a single command.


What it does

curl -fsSL https://raw.githubusercontent.com/oboeglen/Azure-NXT-Maxscale/main/deploy.sh \
  -o /tmp/deploy.sh && sudo bash /tmp/deploy.sh

That’s it. The script handles everything: OS detection, Docker installation, interactive configuration, SSL certificates, and full deployment with real-time health monitoring.


What gets deployed

Component HA level
Nextcloud FPM N nodes behind nginx + HAProxy leastconn
MariaDB Galera N nodes (odd, quorum-safe)
Redis Cluster ≥ 6 nodes (3 masters + 3 replicas)
RustFS S3 Erasure coding — survives N/2 disk failures
Collabora CODE N nodes, auto-patched binary
Whiteboard N nodes + dedicated Redis Streams
Talk HA N spreed-signaling nodes + NATS 3-node cluster + coturn TURN/STUN
Notify Push Real-time WebSocket notifications
HAProxy SSL termination, load balancing, /stats dashboard

Key design decisions

Talk cross-node signaling — A known race condition in nextcloud-spreed-signaling causes cross-node WebRTC sessions to fail silently. I diagnosed it, filed an upstream issue (#1261), and implemented a retry fix in the gRPC server handler. Also discovered that nats://loopback cannot relay messages between signaling nodes — an external NATS cluster (3 nodes) is mandatory.

PHP-FPM auto-sizing — The default pm.max_children=5 is far too low for concurrent Whiteboard saves (causes 6–10s delays). The script measures actual PHP-FPM PSS via /proc/$pid/smaps (ps_mem approach) and calculates optimal pool settings per node based on available RAM.

Zero-touch deployment — Every secret is generated automatically, DNS is verified before certbot runs, HAProxy backends are dynamically regenerated on every scale operation, and the entire health check phase uses a compact real-time progress bar instead of verbose scrolling output.


Scale without data loss

[1] Quick update   — pull new images
[2] Scale nodes    — add/remove FPM, DB, Redis, Collabora, Signaling nodes
[3] Full redeploy  — regenerate everything from scratch

Node counts, secrets, and configuration are preserved across scale operations. MariaDB requires an odd number for quorum; Redis requires even ≥ 6; everything else scales freely.


GitHub

Feedback, bug reports, and contributions very welcome. I’m especially interested in whether anyone has experience getting the upstream gRPC patch accepted, or alternative approaches to Talk HA cross-node relay.