Make a private Nextcloud highly available

Hello,

I already have experience installing a standalone Nextcloud. But now I want to run an HA Nextcloud across 2 different locations, with 2 internet connections, running on two RPis. If one node loses its internet connection or its power in case of a blackout, the other has to take over the job automatically. I can't imagine that this constellation is so exotic that there is no how-to for it.

Do I need DRBD, Pacemaker, Heartbeat and a reverse proxy? And how do I configure all of this?

Please help

It is certainly something that would be great to have. Unfortunately, it is not that easy. Running two instances is no problem; the question is what you do after both have gone out of sync. What if a file has been changed on both setups? What if some database IDs are the same but point to different entries?

You have a similar problem with the synchronization of the clients but there the server wins.

For larger setups, there were some deployment recommendations in the documentation:
https://docs.nextcloud.com/server/11/admin_manual/installation/deployment_recommendations.html

But there you have more than just 2 servers, it is not about syncing across two locations, and you often still have a single point of failure.

There was an announcement a year or so ago about supporting multiple locations. I found a repository (GitHub - nextcloud/globalsiteselector: The Nextcloud Portal allows you to run multiple small Nextcloud instances and redirect users to the right server) but no recent news. And since such features are very interesting for enterprise customers, there is probably less public information available.

I might be wrong, but having run HA servers for a while, I never had such a situation, simply because good, tight servers with a specific configuration are built from the ground up to prevent that.

Would you mind sharing how you set this up?

Sharing in detail, no, because I don't have all the configuration and sources. Furthermore, as my company paid for it, the management team chose to build clusters on Red Hat Enterprise Linux 6+ with the High Availability Add-On and Resilient Storage Add-On, using GFS2, which unfortunately is not free.

The big picture is about setting policies for RHEL High Availability clusters, in several steps:

  • Red Hat’s OpenStack built in cluster mode
  • Deploying a Ceph Storage environment and integrating it with OpenStack
  • Specific network configuration using the Virtual Extensible LAN (VXLAN) network type and the Modular Layer 2 (ML2) plugin
  • Setting up specific environments using Pacemaker and HAProxy (see the sketch after this list)
  • CloudForms implementation across different data centers
  • Creation of High Availability and Resilient Storage with Corosync using GFS2
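To give a rough idea of the Pacemaker/HAProxy part, here is a minimal sketch of how a floating IP and HAProxy can be tied together with pcs. This is not our actual configuration; the cluster name, node names, IP address and resource names are made-up placeholders (RHEL 7-style pcs syntax):

# authenticate the nodes and create a two-node cluster (names are examples)
pcs cluster auth node1 node2 -u hacluster
pcs cluster setup --name nc-cluster node1 node2
pcs cluster start --all
# floating IP that follows the active node
pcs resource create nc-vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24 op monitor interval=30s
# HAProxy as a cluster resource, grouped with the VIP so they always move together
pcs resource create nc-haproxy systemd:haproxy op monitor interval=10s
pcs resource group add nc-frontend nc-vip nc-haproxy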

This kind of setup is out of the public scope, as it requires:

  • license fees (quite expensive ones)
  • bare-metal hardware
  • data center hosting
  • specific, really high-speed links
  • KNOWLEDGE or external help (quite expensive too!)

When all this is done, you end up with an HA server network limited to a maximum of 16 nodes.

Unfortunately, there are still some limitations, like the following (from my reminder notes):

Overall architecture
Oracle RAC on GFS2 is unsupported on all versions of RHEL
Staged/rolling upgrades between major releases are not supported, for example a rolling upgrade from Red Hat Enterprise Linux 5 to Red Hat Enterprise Linux 6.

Hardware
Cluster node count greater than 16 is unsupported

Storage
Usage of MD RAID for cluster storage is unsupported
Snapshotting of clustered logical volumes is unsupported unless that volume has been activated exclusively on one node (as of release lvm2-2.02.84-1.el5 in RHEL 5.7 or lvm2-2.02.83-3.el6 in RHEL 6.1)
Using multiple SAN devices to mirror GFS/GFS2 or clustered logical volumes across different subsets of the cluster nodes is unsupported

Networking
Corosync using broadcast instead of multicast in RHEL 6 is unsupported (except for demo and pre-sales engagements)
In RHEL 5.6+ broadcast mode is supported with certain restrictions as an alternative to multicast.
In RHEL 6.2+ UDP unicast is fully supported as an alternative to multicast.
Corosync’s Redundant Ring Protocol was a Technology Preview in RHEL 6.0 - 6.3; it became fully supported in RHEL 6.4+.
The supported limits for the heartbeat token timeout are described in the following reference: What are the supported limits for heartbeat token timeout in Red Hat Cluster Suite?

High Availability Resources
Usage of NFS in an active/active configuration on top of either GFS or GFS2 is unsupported
Usage of NFS and Samba on top of same GFS/GFS2 instance is unsupported
Running the Red Hat High Availability Add-On or clusters on virtualized guests has limited support.

The conclusion of this: HA is very high-end in terms of hardware, the overall cost is considerable, and the required IT management knowledge is quite challenging!

But if you have two locations and the connection between them goes down, each one can accumulate different changes. Or if the internet connection between the nodes is slower than the connection to some clients, the delay can cause problems.

Not with HA, as only one node is considered to be the master node.

When the downed node comes back, it stays a secondary node; it does not come back as a primary. But at that moment a full replication of the data is made, so it becomes an up-to-date secondary, ready to become the primary node if the active primary goes down.

There is a difference between replicated nodes and HA nodes.

This doesn’t resolve @tflidd’s comment about a split-brain, should the connection between both nodes go down. HA across different locations is really tough and usually requires redundant and independent network connections in order to avoid a split-brain situation. I don’t think that this is a viable way to tackle this.

If HA is a priority, do it locally and start off by not using RPis as your nodes. And consider the costs of doing so: power, hardware, labour (continuously, of course, since an HA system means a lot of regular work!).

And now…consider again…

With the Red Hat HA servers, you can set up a maximum of 16 nodes.

Usually, each node is hosted in a Tier 2-3 data center in a different geographical location.

My company has a 3-node HA setup, hosted in Tier 3 data centers (LUX, TWN, US). Over 3 years, we have experienced 1 hardware failure on one node. Having 3 HA nodes go down at the same time is called the end of the world, or a terrorist act.

Split-brain is for sure a real problem, but not in a real HA environment with 3 or more nodes.

Furthermore, the AFR translator in GlusterFS uses extended attributes to keep track of the operations on a file. These attributes determine which brick is the source and which brick is the sink for a file that requires healing, in case of data or availability inconsistencies originating from two separate data sets being maintained (because of the network design, or a failure condition where servers stop communicating and synchronizing their data with each other).
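To make that a bit more concrete, this is roughly how the healing state is usually inspected; the volume name testvol and the brick path are just examples:

# list files that need healing, and files caught in split-brain
gluster volume heal testvol info
gluster volume heal testvol info split-brain
# on a brick, the AFR changelog attributes of a single file can be read directly
getfattr -d -m . -e hex /bricks/brick1/path/to/file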

Remember, we are not in a simple replication scenario, but in a real HA setup.

I forgot to say there is a way to prevent split-brain problems, which is to configure server-side and client-side quorum.

The quorum configuration in a trusted storage pool determines the number of server failures that the trusted storage pool can sustain. On a 16-node pool, for example, you could require that at least 10 nodes remain online.

If an additional failure occurs, the trusted storage pool becomes unavailable, preventing any data loss or split-brained files. The trusted storage pool remains safe so the downed nodes can be rebuilt.

This is managed by the glusterd service.

gluster volume set all cluster.server-quorum-ratio PERCENTAGE

In the case of a 2-node pool, PERCENTAGE needs to be greater than 50, which gives:
gluster volume set all cluster.server-quorum-ratio 51%

In this example, the quorum ratio setting of 51% means that more than half of the nodes in the trusted storage pool must be online and have network connectivity between them at any given time. If part of the storage pool gets disconnected from the network, the bricks running on the disconnected nodes are stopped to prevent further writes.
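For completeness, here is a minimal sketch of both quorum sides mentioned above; testvol is an example volume name, and the right ratio depends on your pool size:

# server-side quorum: glusterd stops the bricks of a node that falls out of quorum
gluster volume set testvol cluster.server-quorum-type server
gluster volume set all cluster.server-quorum-ratio 51%
# client-side quorum: writes are only allowed while the majority of a replica set is reachable
gluster volume set testvol cluster.quorum-type auto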

Example of a secured GlusterFS pool:

gluster volume info testvol
Volume Name: redacted
Type: Distributed-Replicate
Volume ID: redacted
Status: Created
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: server1:/bricks/brick1
Brick2: server2:/bricks/brick2
Brick3: server3:/bricks/brick3
Brick4: server4:/bricks/brick4

… until brick9

and the AFR output looks like:

getfattr -d -e hex -m. brick5/advchck.log 
# file: brick5/advchck.log
security.selinux=0x726f6f743a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-2=0x000000000000000000000000
trusted.afr.vol-client-3=0x000000000200000000000000
trusted.gfid=0x307a5c9efddd4e7c96e94fd4bcdcbd1b

Brick          |    Replica set    |    Brick subvolume index
---------------------------------------------------------------
/gfs/brick1    |         0         |            0
/gfs/brick2    |         0         |            1
/gfs/brick3    |         1         |            2
/gfs/brick4    |         1         |            3
/gfs/brick5    |         2         |            4
/gfs/brick6    |         2         |            5

and so on

Didn’t mean to downplay, but…

…consider the topic of this thread: “Make a private Nextcloud highly available”

We also operate two HA clusters on our company network, spanning two locations, but we’re talking private use here… Heck, the OP talked about RPis for running his NC on - hardly a setup which qualifies for any serious HA installation.


git-annex can manage different sources and locations and is quite tolerant if they are only temporarily available (you can even put it on USB keys and keep them as backup). I never really used it, so I don’t know how stable it is and how easily you can manage conflicts.

One simple and cheap way would be to synchronise the same client with two servers.
I.e.: the client has a directory to synchronize, and it syncs the directory independently with both servers.
Thus the data is duplicated.
If the client also provides for sharing on both servers… that’s good.

Naturally the user name aspect has to be handled manually.
But for a very small family it should be OK.
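One way to try this idea without writing any client code could be the command-line version of the sync client, nextcloudcmd, run periodically (e.g. from cron). The server names, user and folder below are placeholders:

# sync the same local folder independently with two Nextcloud servers
nextcloudcmd -u alice -p 'secret' --non-interactive ~/Family https://cloud1.example.org
nextcloudcmd -u alice -p 'secret' --non-interactive ~/Family https://cloud2.example.org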

Developing such a client is neither simple nor cheap.

I don’t understand why it should not be cheap:

I understood that the client keeps track of what has been synchronized in a database which seems to be at the root of each synchronized directory.

The difficulty is a collision of names for this database.
If the client database file names are the same for the 2 servers, then it won’t work.
If the database file names are different, then it will work as I expect/described.

It can be cheap if you do it yourself, but then you have to invest the time, or pay someone to do it.

Well, the interesting part is when both sides change things at the same time, or within a very short time; that will create some conflicts. That is the interesting part of all sync solutions: how this is handled. If it is just your account, you can look into it, but doing this on a large scale between instances is hard, unless you have clear master/slave roles for the servers, or you implement locking, etc.

In fact I have tried it and it works.
The client seems to keep the database in files that have a different name for each server (in my test).
I don’t know yet how the name is calculated, but it seems promising:

-rw-r--r--   1 denis  staff   225280 15 fév 08:33 ._sync_0b0ac1a25c4a.db
-rw-r--r--   1 denis  staff    32768 15 fév 08:33 ._sync_0b0ac1a25c4a.db-shm
-rw-r--r--   1 denis  staff        0 15 fév 08:33 ._sync_0b0ac1a25c4a.db-wal
-rw-r--r--   1 denis  staff   200704 15 fév 08:34 ._sync_e25d520da88f.db
-rw-r--r--   1 denis  staff    32768 15 fév 08:34 ._sync_e25d520da88f.db-shm
-rw-r--r--   1 denis  staff        0 15 fév 08:34 ._sync_e25d520da88f.db-wal 

Now I have to check in the code whether those files are the only important things, and if there is a possibility of collision.

After thinking through the various cases, the following does not work properly in this concept and should be avoided / managed with care: synchronisation of shared trees.

                      +----> Server 1 <----+
                      |                    |
User 1 / Client 1 <---+                    +---> User 2 / Client 2
                      |                    |
                      +----> Server 2 <----+
In this case there may be continuous synchronisation, with a modification propagating from server 1 to user 2 / client 2, then to server 2, then to user 1 / client 1, then back to server 1…
To be tested.