[Project] Single-command full HA Nextcloud deployment (MariaDB Galera, Redis Cluster, RustFS S3, Talk HA)

AzureInformatique · June 4, 2026, 1:46pm

Hi everyone,

I’d like to share an open-source project I’ve been building: NXT Maxscale, a fully automated script that deploys a production-grade, high-availability Nextcloud infrastructure with a single command.

What it does

curl -fsSL https://raw.githubusercontent.com/oboeglen/Azure-NXT-Maxscale/main/deploy.sh \
  -o /tmp/deploy.sh && sudo bash /tmp/deploy.sh

That’s it. The script handles everything: OS detection, Docker installation, interactive configuration, SSL certificates, and full deployment with real-time health monitoring.

What gets deployed

| Component | HA level |
| --- | --- |
| Nextcloud FPM | N nodes behind nginx + HAProxy `leastconn` |
| MariaDB Galera | N nodes (odd, quorum-safe) |
| Redis Cluster | ≥ 6 nodes (3 masters + 3 replicas) |
| RustFS S3 | Erasure coding — survives N/2 disk failures |
| Collabora CODE | N nodes, auto-patched binary |
| Whiteboard | N nodes + dedicated Redis Streams |
| Talk HA | N spreed-signaling nodes + NATS 3-node cluster + coturn TURN/STUN |
| Notify Push | Real-time WebSocket notifications |
| HAProxy | SSL termination, load balancing, `/stats` dashboard |

Key design decisions

Talk cross-node signaling — A known race condition in nextcloud-spreed-signaling causes cross-node WebRTC sessions to fail silently. I diagnosed it, filed an upstream issue (#1261), and implemented a retry fix in the gRPC server handler. I also found that nats://loopback cannot relay messages between signaling nodes, so an external NATS cluster (3 nodes) is mandatory.

PHP-FPM auto-sizing — The default pm.max_children=5 is far too low for concurrent Whiteboard saves (causes 6–10s delays). The script measures actual PHP-FPM PSS via /proc/$pid/smaps (ps_mem approach) and calculates optimal pool settings per node based on available RAM.
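For anyone curious how this kind of sizing works in principle, here is a minimal sketch of a PSS-based calculation. It is not the code from deploy.sh: the function names, the `php-fpm: pool` process-title match, and the reserve value in the usage note are assumptions for illustration.

```bash
#!/bin/sh
# Sketch only: names and defaults are illustrative, not taken from deploy.sh.

# Sum every "Pss:" line of a smaps-format file and print the total in kB.
sum_pss_kb() {
    awk '/^Pss:/ { total += $2 } END { print total + 0 }' "$1"
}

# Print the average PSS (kB) across running PHP-FPM workers.
# Assumes workers carry the usual "php-fpm: pool <name>" process title.
avg_worker_pss_kb() {
    fpm_total=0
    fpm_count=0
    for fpm_pid in $(pgrep -f 'php-fpm: pool'); do
        # A worker may exit between pgrep and the read; skip it if so.
        fpm_pss=$(sum_pss_kb "/proc/$fpm_pid/smaps" 2>/dev/null) || continue
        [ -n "$fpm_pss" ] || continue
        fpm_total=$(( fpm_total + fpm_pss ))
        fpm_count=$(( fpm_count + 1 ))
    done
    [ "$fpm_count" -gt 0 ] || return 1
    echo $(( fpm_total / fpm_count ))
}

# pm.max_children = (available RAM - reserve) / per-worker PSS, never below 1.
# All three arguments are in kB.
calc_max_children() {
    [ "$3" -gt 0 ] || return 1
    mc=$(( ($1 - $2) / $3 ))
    [ "$mc" -ge 1 ] || mc=1
    echo "$mc"
}
```

With 4 GiB available, 1 GiB held back for the OS and other containers, and 64 MiB per worker, `calc_max_children 4194304 1048576 65536` prints 48. PSS is used rather than RSS because it splits shared pages across the processes that map them, so the per-worker figure is not inflated by memory the workers share.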

Zero-touch deployment — Every secret is generated automatically, DNS is verified before certbot runs, HAProxy backends are dynamically regenerated on every scale operation, and the entire health check phase uses a compact real-time progress bar instead of verbose scrolling output.
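As a rough illustration of the first two points, secret generation and a DNS pre-check can be sketched in a few lines of shell. This is an assumption about how such steps might look, not the script's actual implementation; `gen_secret`, `dns_points_to`, and the variable name in the usage note are hypothetical.

```bash
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from deploy.sh.

# Print a random secret as lowercase hex; $1 = number of random bytes (default 32).
gen_secret() {
    head -c "${1:-32}" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Succeed only if domain $1 resolves over IPv4 to the expected address $2.
# Assumes a glibc system where `getent ahostsv4` is available.
dns_points_to() {
    getent ahostsv4 "$1" | awk '{ print $1 }' | grep -qxF "$2"
}
```

A caller could then write something like `DB_ROOT_PASSWORD=$(gen_secret)` and run `dns_points_to cloud.example.com 203.0.113.10 || exit 1` before invoking certbot, so that a missing DNS record fails fast instead of burning a certificate request.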

Scale without data loss

[1] Quick update   — pull new images
[2] Scale nodes    — add/remove FPM, DB, Redis, Collabora, Signaling nodes
[3] Full redeploy  — regenerate everything from scratch

Node counts, secrets, and configuration are preserved across scale operations. MariaDB requires an odd node count for quorum; Redis requires an even node count of at least 6; everything else scales freely.
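The node-count rules can be expressed as small checks. The sketch below uses hypothetical function names and is not lifted from the script; whether the script enforces a higher minimum for Galera is not stated here, so only "odd" is checked.

```bash
#!/bin/sh
# Sketch only: function names are hypothetical, not taken from deploy.sh.

# Galera: an odd count means a network split always leaves one side with a majority.
valid_galera_count() {
    [ "$1" -ge 1 ] && [ $(( $1 % 2 )) -eq 1 ]
}

# Redis Cluster: even and at least 6, so each of the >= 3 masters has one replica.
valid_redis_count() {
    [ "$1" -ge 6 ] && [ $(( $1 % 2 )) -eq 0 ]
}

# Node failures an n-node Galera cluster can absorb while a strict majority survives.
galera_tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}
```

For example, `galera_tolerated_failures 5` prints 2: with five equally weighted nodes, three must remain reachable to keep quorum. Going from 3 to 4 nodes adds no extra tolerance, which is why even counts are rejected.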

GitHub

Repo: https://github.com/oboeglen/Azure-NXT-Maxscale
Latest release: v2.6.0
Deployment guide: DEPLOYMENT.md

Feedback, bug reports, and contributions very welcome. I’m especially interested in whether anyone has experience getting the upstream gRPC patch accepted, or alternative approaches to Talk HA cross-node relay.