Nextcloud site replication

Hi,

I’m looking into setting up a Nextcloud configuration replicated between two sites. HAProxy’s only purpose would be to redirect a user to the other site should there be any health issue on the local site (see the rough sketch after the list below). The replication itself would be achieved by:

  • Application layer synchronisation via Syncthing

  • Database layer synchronisation via Galera

  • Storage layer synchronisation via Syncthing
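
On the HAProxy side, the idea is nothing more than an active/backup backend, roughly like this (addresses and server names are placeholders; /status.php is just one possible health check):

```
# Excerpt from haproxy.cfg (addresses and names are placeholders).
# Each site prefers its local Nextcloud and only falls back to the
# remote site when the local health check fails.
backend nextcloud
    option httpchk GET /status.php
    server local_site  10.0.1.10:443 ssl verify none check
    server remote_site 10.0.2.10:443 ssl verify none check backup
```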

However, if the only thing not replicated is session-state data (which would require a user to log in again after being redirected), would this setup avoid the READ-COMMITTED isolation issue last addressed by @DasLeo in @JasonBayton 's thread “Help me test this 3 node cluster”?

As such, I’d appreciate any caveats and/or advice from the old-timers such as @guidtz @dev0 @tsueri @joergschulz @Krischan @Heracles31 @zeigerpuppy and @aventrax.

Hi,

Honestly, I would not try to achieve Active - Active redundancy without professional, official support for it. First, if your need really is that critical, that alone is evidence that you need professional support. Second, the risk of corrupting something is much higher.

As for file replication, my storage is based on ZFS (TrueNAS), and ZFS replication is the mechanism I use to synchronise multiple backends.
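
In practice that is just snapshot-based replication; TrueNAS has replication tasks for this in its UI, but under the hood it boils down to something like the sketch below (pool, dataset, snapshot and host names are only examples):

```bash
# Take a snapshot of the Nextcloud data dataset on the primary site.
zfs snapshot tank/nextcloud-data@snap1

# Initial full replication to the secondary site.
zfs send tank/nextcloud-data@snap1 | ssh site-b zfs receive -F tank/nextcloud-data

# Later, take a new snapshot and send only the incremental changes.
zfs snapshot tank/nextcloud-data@snap2
zfs send -i @snap1 tank/nextcloud-data@snap2 | ssh site-b zfs receive tank/nextcloud-data
```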

For the database, I would use native database replication: configure the first MySQL instance as a master and the second as a slave.
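
As a rough sketch only (host names, user and password are placeholders, and the same statements work on MariaDB, which Nextcloud commonly uses):

```bash
# On the master: my.cnf needs a unique server-id and binary logging, e.g.
#   [mysqld]
#   server-id = 1
#   log_bin   = mysql-bin
# Then create a replication account:
mysql -u root -p -e "
  CREATE USER 'repl'@'%' IDENTIFIED BY 'change-me';
  GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';"

# On the slave (server-id = 2 in its my.cnf), point it at the master using
# the binlog coordinates reported by SHOW MASTER STATUS on the master:
mysql -u root -p -e "
  CHANGE MASTER TO MASTER_HOST='site-a-db', MASTER_USER='repl',
    MASTER_PASSWORD='change-me',
    MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4;
  START SLAVE;"

# Check replication health on the slave:
mysql -u root -p -e "SHOW SLAVE STATUS\G"
```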

As for the applications, that will depend heavily on which apps you are using and how they work. For example, you can easily point two mail apps to the same IMAP account. For an OnlyOffice backend, you can have two active at once, but a workload will not move from one to the other. As another example, two instances of Talk would be problematic because, by definition, multiple users must interact through the same server; if they are on two different servers, there will be issues…

Start by having a proper and complete DR solution: complete backups of data, configs, database, apps, … and the ability to restore them from scratch.
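
Something along these lines is usually enough as a starting point (paths, database name and credentials are the common defaults and will differ per install):

```bash
#!/bin/sh
# Basic Nextcloud backup sketch: consistent DB dump plus a copy of the
# application, config and data directories.

BACKUP=/backup/nextcloud-$(date +%Y%m%d)
mkdir -p "$BACKUP"

# Maintenance mode keeps files and database consistent during the backup.
sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --on

mysqldump --single-transaction -u nextcloud -p nextcloud > "$BACKUP/db.sql"
rsync -a /var/www/nextcloud/ "$BACKUP/nextcloud/"
rsync -a /srv/nextcloud-data/ "$BACKUP/data/"

sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --off
```

And it only counts as DR once you have actually tested restoring it on a clean machine.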

Once that is working, go for a warm site: the same mechanism used for DR, deployed to a dedicated site and run at a higher frequency. In case of trouble, you can recover faster and with a more recent copy of the data.
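
Concretely, that can be as simple as running the same scripts from cron more often against the warm site (the script names here are hypothetical):

```bash
# Crontab on the primary site: nightly full DR backup, plus replication
# of data and database to the warm site every 15 minutes.
0 2 * * *     /usr/local/bin/nextcloud-backup.sh
*/15 * * * *  /usr/local/bin/replicate-to-warm-site.sh
```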

Once that is achieved, go for Active - Passive: two setups online, but only one is active and the second is updated in real time. When needed, you do a manual failover.
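
A manual failover in that setup would look roughly like this (names and paths are placeholders, and it assumes the master/slave database replication sketched above):

```bash
# 1. On the old active site, stop accepting writes (if it is still reachable).
sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --on

# 2. On the passive site, promote the database replica to a writable master.
mysql -u root -p -e "STOP SLAVE; RESET SLAVE ALL; SET GLOBAL read_only = OFF;"

# 3. Redirect users to the passive site, e.g. by updating DNS or by changing
#    which server HAProxy treats as the backup.
```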

Up to that point, I would try to work it out myself. But Active - Active is a completely different game, and that I would not do without professional support.