I’ve just finished setting up a cluster across 3 servers with MariaDB Galera and Syncthing for almost instant replication.
It seems to be working very well, and I can’t force it to error out with large uploads; however, if anyone would like to try to break it I’d be most appreciative!
If it remains stable I’ll have a guide written up sometime this coming week.
Links:
Links removed
Username and password: user
These servers are currently HTTP-only with no cache enabled, just so you’re aware. They’re also on v11.0.3, simply because that’s what comes down when grabbing latest.zip.
Topology diagram will be available tomorrow if anyone’s interested.
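For anyone wanting to reproduce the database side before the guide lands, the Galera config is roughly the following sketch; the IPs, node name, and cluster name are placeholders rather than my actual values:

```ini
# /etc/mysql/conf.d/galera.cnf (sketch) -- one of three nodes, placeholder addresses
[mysqld]
# Galera needs row-based replication and InnoDB
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2

wsrep_on              = ON
wsrep_provider        = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name    = "nc-cluster"
wsrep_cluster_address = "gcomm://10.0.0.1,10.0.0.2,10.0.0.3"
wsrep_node_name       = "db1"        # unique per node
wsrep_node_address    = "10.0.0.1"   # this node's own IP
wsrep_sst_method      = rsync        # full state transfer for new/rejoining nodes
```

On MariaDB 10.1 the first node is bootstrapped with `galera_new_cluster`; the other two start normally and sync from it.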
So it’s all well and good having servers each accessing one database in a remote location, and accessing those servers via individual hostnames, but it’s not exactly a transparent, seamless experience.
So I’ve spun up a couple of HAProxy servers to handle the load balancing, and now there’s one hostname, http://j.son.bz/nccweb0, and one database hostname for all servers to connect to.
HAProxy handles it well, currently favouring one combination of servers unless they go down, then dynamically switching to the other(s) within ~3 seconds. Not too bad.
I’ve got HAProxy configured for session persistence by default, so a client hits one node and sticks to it unless that node goes down.
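For the curious, the relevant bit of the HAProxy config looks roughly like this simplified sketch (hostnames and IPs are placeholders):

```
# haproxy.cfg (sketch) -- cookie-based persistence so a client sticks to one node
frontend nc_front
    bind *:80
    mode http
    default_backend nc_web

backend nc_web
    mode http
    balance roundrobin
    # HAProxy inserts a cookie naming the chosen node; the client keeps hitting
    # that node until its health check fails, then fails over to another
    cookie NODE insert indirect nocache
    server web1 10.0.1.1:80 check cookie web1
    server web2 10.0.1.2:80 check cookie web2
    server web3 10.0.1.3:80 check cookie web3
```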
Yesterday, in conjunction with running a load test, I switched it to favour least connected, so with every connection it calculated the node with the least usage and forwarded requests there.
I then started getting the infamous CSRF check failed error when loading the cluster hostname.
I’m aware this is likely a configuration error on my part with PHP, so I won’t raise an issue, but if anyone has advice on how I can overcome this in a clustered environment I’d be super grateful.
The next thing is: what if user A edits a doc/odt file with Collabora or OnlyOffice on node A, and user B edits that same file on node C? Will they see each other then? If not, there will surely be data loss.
@LukasReschke thank you, I’ll have a look. First impressions suggest Redis doesn’t do multi-master, so if this is spread across multiple DCs I’d wonder how it’d scale.
OK, it looks like session storage is set up. No more errors under load. I’ll have to come back to clustering Redis somehow, though, as that won’t scale.
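For anyone else hitting the CSRF error behind a least-connections balancer: the cure is shared session storage, so whichever node answers a request sees the same PHP session. A minimal sketch, assuming the phpredis extension and a placeholder Redis host:

```ini
; php.ini (sketch) -- store PHP sessions in a shared Redis rather than local files
session.save_handler = redis
session.save_path    = "tcp://10.0.2.1:6379"
```

Nextcloud’s own caching and file locking can point at the same Redis via the `memcache.distributed` and `memcache.locking` entries in config.php.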
Interestingly I just noted this on the deployment recommendations:
When master fails another slave can become master.
A multi-master setup with Galera cluster is not supported, because we require READ-COMMITTED as transaction isolation level. Galera doesn’t support this with a master-master replication which will lead to deadlocks during uploads of multiple files into one directory for example.
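To make the docs’ point concrete: READ-COMMITTED is a per-server setting you can check and set as below, but Galera certifies transactions cluster-wide at commit time, so concurrent writes on two masters can still abort with a deadlock error regardless of the level. A quick sketch, using the MariaDB 10.1-era variable name:

```sql
-- check the isolation level this node is running
SELECT @@tx_isolation;

-- set it server-wide (or put transaction-isolation = READ-COMMITTED in my.cnf)
SET GLOBAL tx_isolation = 'READ-COMMITTED';
```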
I guess that basically brings this experiment to a close as master-master has been pretty fundamental to what I’m trying to achieve here. Without instant replication to and from any node using Galera, I can’t see any way of having users on nodes throughout the world being able to instantly sync changes back and forth seamlessly…
Furthermore, I don’t think Redis is going to be suitable: it doesn’t look like it’s designed to be run open on the net, which it would need to be since no master-master configuration is supported, and running VPN/tunnelling between nodes isn’t super convenient.
Note that Nextcloud doesn’t support master-master at the database level; this can lead to nasty problems. We usually recommend using something like MaxScale to split reads and writes across different databases.
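In case it helps anyone following along, a minimal MaxScale read/write-split sketch along those lines; server addresses, credentials, and section names are placeholders:

```ini
# maxscale.cnf (sketch) -- writes go to one designated node, reads are spread out
[db1]
type=server
address=10.0.0.1
port=3306
protocol=MySQLBackend

[db2]
type=server
address=10.0.0.2
port=3306
protocol=MySQLBackend

[Galera-Monitor]
type=monitor
module=galeramon        # designates one Galera node as the write master
servers=db1,db2
user=maxuser
password=maxpwd

[RW-Split]
type=service
router=readwritesplit
servers=db1,db2
user=maxuser
password=maxpwd

[RW-Split-Listener]
type=listener
service=RW-Split
protocol=MySQLClient
port=4006
```

Nextcloud then connects to MaxScale on port 4006 as if it were a single database.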
The idea was straightforward: if someone uploads a doc in Amsterdam it updates all the databases, so someone in the UK sees it on their node instantaneously, and similarly from the UK back to Amsterdam. This way a master DB can sit alongside each node, they all stay in sync whoever does a write from whatever node, and winner-winner chicken dinner.
Doing a master-slave setup would be OK in terms of reads (a slave alongside each node), but for writes the latency will grow the further the node is from the master, obviously…
@LukasReschke what’s the resolution in this case? Don’t say Global Scale (yet).
[quote=“guidtz, post:17, topic:12863, full:true”]
Very interesting.
Did you try it with MariaDB 10.1, which integrates Galera master-master?
And what do you use for data sync?
Regards
Guidtz
[/quote]
Galera master-master is in place and working fine with 6 web nodes at the moment; each node that does a write is directed via HAProxy to any master, and the others replicate immediately. It’s beautiful.
Data sync is Syncthing, which I’ve now set up to replicate basically the whole Nextcloud install folder, not just data. I did this for two reasons:
1. The nodes are, and should remain, identical.
2. I wanted to upgrade to 12 without doing it manually on each of the 6 nodes, given the DB only needs upgrading once. The upgrade of all 6 nodes was incredible: I left the cluster in maintenance mode while it synced, the upgraded node had replicated to all the others within about 3 minutes, and I then took it out of maintenance mode and everything worked perfectly (rough sequence sketched below).
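Roughly the sequence, sketched with a placeholder install path and occ run as the web user:

```bash
# upgrade one node, let Syncthing fan the files out, migrate the shared DB once (sketch)
sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --on
# ...drop the v12 release into the install folder on this node...
# ...wait for Syncthing to replicate the folder to the other 5 nodes...
sudo -u www-data php /var/www/nextcloud/occ upgrade
sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --off
```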
Good, very good. I also use master-master with HAProxy in front in another project, and it works fine; adding a new MariaDB master is very simple.
What is “Data sync”? I tried GlusterFS, lsyncd, and DRBD to find a multidirectional data sync, and I want only open-source solutions.
The problem: GlusterFS and DRBD make storage slower, and lsyncd isn’t designed for bidirectional sync.