Multi-master vs. master-slave replication

Dear community,

I have been reading about highly available Nextcloud clusters on bare metal that use a Galera cluster for database replication. However, I read the following:

A multi-master setup with Galera cluster is not supported, because we require READ-COMMITTED as transaction isolation level. Galera doesn’t support this with a master-master replication, which will lead to deadlocks during uploads of multiple files into one directory for example.

My question is whether this is still true today. That is, is the best architecture for a highly available Nextcloud a master-slave database setup rather than a multi-master one?
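
(For reference, this is the setting the manual is talking about; on MariaDB/MySQL it can be checked and pinned roughly like this - just a sketch, and the exact variable name differs between versions:)

```sql
-- check the current isolation level (the variable is called tx_isolation on
-- older MariaDB/MySQL releases and transaction_isolation on newer ones)
SHOW VARIABLES LIKE '%isolation%';

-- apply it for new connections; to make it permanent, put
--   transaction-isolation = READ-COMMITTED
-- under [mysqld] in the server configuration
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
```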

Cheers!!!

There are ways to accomplish this. Be aware, though, that the trade-off might be slower database operations, which could be the opposite of what you want to achieve.

To have true multi-master (real load balancing), you will need to either write the data to all downstream database masters simultaneously - locking the transaction until it is confirmed that the entries exist on all masters before the write is considered complete - or set up multi-directional replication between all masters. Both methods require extra resources and bandwidth at each node, or suffer from worse performance.
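
To make the second option concrete: with Galera, the "multi-directional replication between all masters" boils down to the wsrep settings on every node, roughly like this (a sketch only - the IPs, cluster name and library path are placeholders and depend on your distribution, and every write is still certified by all nodes before it commits):

```ini
# /etc/mysql/conf.d/galera.cnf - illustrative values, not a drop-in config
[mysqld]
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2

wsrep_on              = ON
wsrep_provider        = /usr/lib/galera/libgalera_smm.so   # path varies per distro
wsrep_cluster_name    = nextcloud_cluster
wsrep_cluster_address = gcomm://10.0.0.11,10.0.0.12,10.0.0.13
wsrep_node_address    = 10.0.0.11                          # this node's own IP
```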

PostgreSQL has multi-master replication in various forms, natively. However, even the PostgreSQL documentation warns that performance advantages are very hard to achieve. When using middleware replication solutions like Galera, auto-increment columns, GUIDs and other unique key generation must be explicitly addressed, or you need to appoint a primary (hence one master and the rest slaves).
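
For the auto-increment part specifically, the classic trick is to interleave the key ranges per writer so that generated IDs can never collide (illustrative values below; Galera can also handle this automatically through wsrep_auto_increment_control):

```ini
# node 1 of 3 - repeat on each writer with its own offset
[mysqld]
auto_increment_increment = 3   # total number of writer nodes
auto_increment_offset    = 1   # 1 on node 1, 2 on node 2, 3 on node 3
```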

1 Like

Hi @averagejoe, welcome to the community :handshake:

This community is driven by volunteers and mostly addresses SOHO usage, so don’t expect to find much experience with these kinds of setups here.

From my long professional experience I would recommend against any HA setup until you absolutely have to - and once you really have to, you have already crossed the line where you should get professional support. The reason, in short: HA is always complex and makes maintenance and troubleshooting harder, longer and more expensive. In my 20+ years of professional career I have hardly seen scenarios with a real need for HA on the DB backend. IMHO, in 99.9% of cases it is easier and better to have really big hardware covering your needs and a good backup concept before you even start thinking about HA.

4 Likes

I completely agree with @wwe, but would like to add a few thoughts of my own:

If the level of availability that a single enterprise-grade server already provides (dual power supplies, hot-swappable drives in a fault-tolerant array, etc.) is really not enough for you, then it might make more sense to implement HA with a hypervisor like Proxmox VE in combination with a distributed file system such as Ceph:

But be aware that if you really want to go down this route, you’ll need at least three enterprise-grade servers, preferably with multiple NVMe drives in each, and a 10 Gbit network connection, better 25 Gbit.

If that is out of your budget, a single entry-level enterprise-class server with redundant power supplies and multiple disks in a ZFS array, plus e.g. hourly ZFS snapshots replicated to a NAS, and a good backup strategy, should be more than enough.
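
The snapshot-and-replicate part of that can be as small as an hourly cron job along these lines (a rough sketch - pool, dataset and host names are made up, and the incremental send assumes the previous snapshot still exists on both sides):

```sh
#!/bin/sh
# hourly: snapshot the Nextcloud dataset and send the delta to the NAS
DATASET="tank/nextcloud"
PREV=$(zfs list -H -t snapshot -o name -s creation -d 1 "$DATASET" | tail -n 1)
NOW="$DATASET@$(date +%Y-%m-%d_%H%M)"

zfs snapshot "$NOW"
zfs send -i "$PREV" "$NOW" | ssh backup-nas zfs receive -F backup/nextcloud
```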

Oh, and just in case you’re using consumer hardware and the real reason you want to cluster your software stack is to overcome some of the shortcomings of the hardware you’re using, don’t do it; it will end in disaster sooner or later.

3 Likes

Thank you all!!! @wwe @bb77 @Kerasit

Appreciate all your replies! As @bb77 described, I do already have a setup like that today. On top of that I have an offsite backup that is split into backing up the VM and the /data folder. The /data folder is bidirectionally replicated and is connected to the VM via an NFS share. Hence, if my house goes down I can quickly recreate the VM at the remote location, since I don’t need to recreate the /data folder. However, I would prefer a system that does this automatically instead of me manually having to reconfigure the IP addresses and the backup. Hence my interest in an HA setup that is geographically spread.

Now, the reason I am asking is that I want to learn how to build geographically spread HA applications. The setup I have in mind is the following:

  1. At location A, a Proxmox HA cluster (with Ceph) of 3 nodes running the Nextcloud VM. This cluster will be the write cluster.
  2. At location B, another Nextcloud VM that is bidirectionally replicated from A (x km away from A).
  3. At location C, another Nextcloud VM that is bidirectionally replicated from A (y km away from A).
  4. GeoDNS using PowerDNS for GeoIP routing (a rough sketch of this follows below).

The idea is that traffic gets routed to the geographically nearest server, and when one location goes down its users are routed to the remaining locations that are still alive.
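
For step 4, what I had in mind is something like PowerDNS’s LUA records, where the authoritative server hands out whichever address is geographically closest to the resolver asking (an untested sketch - the IPs are placeholders for locations A and C, it needs enable-lua-records=yes plus a GeoIP database, and I would double-check the exact syntax against the PowerDNS docs):

```
; zone snippet - cloud.example.com resolves to the "closest" of the two IPs
cloud.example.com. 300 IN LUA A "pickclosest({'192.0.2.10', '198.51.100.10'})"
```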

I don’t have experience with this yet, and I hope you could give me some example scenarios for why you would not want this. In my humble opinion, if one location goes down, e.g. location A, then the cluster will just stop replicating to that location and continue on locations B and C. To recover A, you then simply recover from whichever node is most up to date. But then again the question is: how do commercial VPS providers do this? Don’t they use a similar approach?

Cheers!!

I’d say they don’t. Traditional VPS offerings don’t provide an out-of-the-box solution for HA across multiple sites. After all, a VPS is just a VM with an operating system of your choice and that’s it. To achieve what you want, you would need to spin up a number of identical VPSs spread across your VPS provider’s various data centres and then build everything from scratch, just as you would with physical hardware in different locations.

Of course, there are also various SaaS providers that offer distributed databases, load balancers, DNS, etc., or you could use one of the big ones like AWS or Azure, where you have everything under one roof and can mix and match their services as you like.

However, if you are using a distributed database from a SaaS provider for example, you are obviously not going to learn how to build a distributed database from scratch, because the provider has already done the heavy lifting for you, and the hardest part to learn is probably how to keep the costs under control :wink: So in the end it depends on what you specifically want to learn and, of course, your budget.

That being said, I am not an expert and cannot give you more specific advice on how best to build a highly available and geographically distributed Nextcloud. I know it can be done, but that’s pretty much where my knowledge ends :wink:

Thank you @bb77,

I know that I could just use a commercial provider to solve all of these problems. However, I want to learn how to do the things you listed myself, without using commercial providers. Otherwise, as you said, you don’t learn anything.

Thank you !!!

It literally requires “just” 4 Raspberry Pis to build the lab setup you need in order to learn. So if this is for learning, that should be enough.

However, you describe various geographical cases as well. Those require literally geographically spread nodes - how else would you make balancing decisions based on response times?

I say go for it. Galera Cluster is the right approach and there are good examples out there for getting it right. There are even a few articles to be found on this very forum.

If you would consider another database product, give PostgreSQL a shot. It actually has master-slave, master-master and even middleware component support:
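
If you go that route, the built-in master-slave (streaming) replication is a good first experiment; the moving parts look roughly like this (a sketch with placeholder hosts, paths and credentials, assuming PostgreSQL 12 or newer):

```sh
# on the primary, in postgresql.conf / pg_hba.conf:
#   wal_level = replica
#   max_wal_senders = 10
#   host replication repl 10.0.0.0/24 scram-sha-256
# plus a replication role:
#   CREATE ROLE repl WITH REPLICATION LOGIN PASSWORD '...';

# on the standby: clone the primary and start it as a read-only hot standby
pg_basebackup -h primary.example.com -U repl \
              -D /var/lib/postgresql/16/main -R -X stream -P
# -R writes standby.signal and primary_conninfo, so the standby
# streams WAL from the primary as soon as PostgreSQL starts
```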

1 Like

Nextcloud has Global Scale:

I don’t know how much this covers high availability.

I found this one - which is actually the best solution I have seen so far:

1 Like

I had a look indeed, @Kerasit, and I think it is a good plan. I tried it with Syncthing instead of GlusterFS and this seems to work fine. The only thing I am struggling with right now is how to host my own GSLB (HAProxy, for instance). Imagine I host a GSLB at location A and a server at location C that is replicated from A. Then person P makes a request to website W from location C. P’s request will first travel to A, then to C, then back to A, and then back to person P at location C. Shouldn’t the GSLB make a direct connection between person P and C instead of traveling back and forth to A? Thank you!!

I am using HAProxy for everything reverse-proxy related, including load balancing. I have not tested your particular scenario; however, it is much more complex.

You will need to somehow determine - at the time the request hits your load balancer (which sits in one place) - which region the request is coming from (IP geolocation), and then choose which downstream should handle it. This will never truly work.
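
For completeness, the crude version of that in HAProxy would be source-address ACLs fed from per-region CIDR lists, something like the sketch below (file names and backends are placeholders, and it inherits every caveat above - the lists have to come from a GeoIP database and will never be fully accurate):

```
frontend fe_nextcloud
    bind :443 ssl crt /etc/haproxy/certs/cloud.pem

    # per-region CIDR lists, e.g. exported from a GeoIP database
    acl src_asia   src -f /etc/haproxy/geo/asia.cidr
    acl src_europe src -f /etc/haproxy/geo/europe.cidr

    use_backend be_asia   if src_asia
    use_backend be_europe if src_europe
    default_backend be_europe
```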

1:
If you have 1 domain, it needs 1 public IP. That one will already be “fixed” to a geographical location. There ARE incredibly expensive DNS providers that deliver enterprise DNS solutions, with DNS servers all over the world, which can point traffic to different IP addresses for the same domain based on the geographical location of the requestor.

2:
So you will have 1 load balancer in front. Do you have further load balancers at each location? If not, then the traffic will flow this way regardless:

Client (Asia) → Load Balancer (Europe) → “Cluster ASIA”.

Or

Client (Europe) → Load Balancer (Europe) → “Cluster Europe”.

At the end of the day you are actually making performance worse, UNLESS you are also serving this to INTERNAL traffic (on the LAN) at each location. In that scenario it makes perfect sense. But then all you need is a local DNS that points to a local reverse proxy, which has the local instance as its first priority and the “rest” of the sites as secondary priority if the local one is down.
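
“Local instance first, the rest only as fallback” is exactly what HAProxy’s backup flag gives you, for example (a sketch; names and addresses are placeholders):

```
backend be_nextcloud_local_first
    option httpchk GET /status.php
    # the local instance takes all traffic while its health check passes;
    # the remote sites only kick in when it is down
    server local  10.0.0.10:443           ssl verify none check
    server site-b b.cloud.example.com:443 ssl verify none check backup
    server site-c c.cloud.example.com:443 ssl verify none check backup
```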

So you will have 1 load balancer in front. Do you have further load balancers at each location? If not, then the traffic will flow this way regardless:

Client (Asia) → Load Balancer (Europe) → “Cluster ASIA”.

My question is: will the traffic then stay routed through LB Europe, or will it have a direct local connection to the ASIA cluster after the LB has directed it there? In my mind the traffic goes like this:

Client (Asia) → Load Balancer (Europe) → “Cluster ASIA” → Load Balancer (Europe) → Client (Asia).

Or is this not the case? If it is, the traffic would constantly travel via Europe, which would defeat the purpose of the geolocation.

Thanks!!!

Yes. The traffic will stay in the same “lane”, so the reverse proxy that serves the request will also send the response. So yes, that assumption is correct. :slight_smile:

1 Like

So then a GSLB is useless, right? It only makes the system less efficient in the end? I am looking for a solution that connects the user to the nearest server directly. I think maybe PowerDNS can do something like that: forwarding requests to the IP that is closest to the client’s location?

That solution can work if you use one of the big cloud vendors, as “regions” are one of their main selling points. The best solution would be to have a URL for each region, and then a simple web server that redirects based on geolocation.
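
Since HAProxy is already in the picture, that “simple web server with a redirect” could even be the proxy itself, along these lines (a sketch; the region URLs and the CIDR list are placeholders):

```
frontend fe_geo_redirect
    bind :443 ssl crt /etc/haproxy/certs/cloud.pem

    acl src_america src -f /etc/haproxy/geo/america.cidr
    # American clients get the US URL, everyone else the EU URL
    http-request redirect prefix https://us.cloud.example.com code 302 if src_america
    http-request redirect prefix https://eu.cloud.example.com code 302
```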

Unless you are willing to buy global “virtual” IP solutions from vendors like Akamai, of course. :frowning:

So yes, there is no easy solution for one domain.

and then a simple web server that redirects based on geolocation

What do you mean by this? Two URLs are just two different IP addresses, right? Do you mean one URL for Europe and one URL for America, for instance?

Exactly what I meant.