Hi all.
I have been using Nextcloud since 2016, and I rely on it for my self-hosted services. Historically, I have run my Nextcloud instance in an LXD (LXC) container, and it’s operated well. One of the things I had on my ‘to-do’ list was to create a truly seamless fail-over experience, because downtime was sometimes more than just inconvenient. I suspect my journey is not over, but recently I have set up a new installation that addresses many of my needs:
- I have three instances of Nextcloud running, each on a different physical server.
- Using keepalived, one of them is always hot and faces the internet (this is very easy to configure).
- My files are stored on glusterfs replicated storage (which is hosted on each of the three servers, inside dedicated storage VMs).
==> I mount the glusterfs storage in my containers in two different directories: one for the Nextcloud installation with all the configs/settings/apps (typically at /var/www/nextcloud) and one for the much more voluminous user data (I personally mount mine at /var/www/nextcloud-data, yours may be different). You could use e.g. ceph instead of gluster, but it does my head in trying to get it to work reliably, so I went with glusterfs. Your mileage may vary. There are other solutions here too, but basically you need some storage you can sync very fast.
- I synchronize the databases using Galera. This is the magic: this is what keeps everything properly in sync. It’s actually easy to configure.
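
As a rough illustration of how the database side could feed into the failover decision, here is a minimal Python sketch of a node health check. It is a sketch under assumptions, not the exact setup described in this post: the function names are made up for the example, and it assumes the `mysql` client can log in locally without a password prompt. It relies on the standard Galera status variables `wsrep_ready`, `wsrep_cluster_status` and `wsrep_local_state_comment`.

```python
import subprocess

# Values a Galera node reports when it is in sync and safe to serve traffic.
HEALTHY = {
    "wsrep_ready": "ON",
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
}


def parse_status(text):
    """Turn tab-separated `name<TAB>value` lines into a dict."""
    status = {}
    for line in text.splitlines():
        name, sep, value = line.partition("\t")
        if sep:  # skip lines that have no tab separator
            status[name.strip()] = value.strip()
    return status


def node_is_healthy(status):
    """True only if every expected wsrep variable has its healthy value."""
    return all(status.get(key) == want for key, want in HEALTHY.items())


def check_local_node():
    """Return 0 if the local node looks healthy, 1 otherwise (exit-code style)."""
    try:
        # -N drops the header row, -B gives tab-separated batch output.
        result = subprocess.run(
            ["mysql", "-N", "-B", "-e", "SHOW GLOBAL STATUS LIKE 'wsrep_%'"],
            capture_output=True,
            text=True,
        )
    except OSError:
        return 1  # mysql client missing or not executable
    if result.returncode != 0:
        return 1  # could not connect or query failed
    return 0 if node_is_healthy(parse_status(result.stdout)) else 1
```

A small wrapper that calls `sys.exit(check_local_node())` could then be pointed at from a keepalived `vrrp_script` block, so that a node whose database has dropped out of the cluster stops holding the public-facing address. Whether that extra check is worth it depends on how your reverse proxy and keepalived are arranged.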
I have to say, to date, I am very pleased with this setup. It’s my best ever “failover” service. I can stop any one server and another one picks up where I was. It’s not 100% fool-proof, but it’s about 98+% there so far. I may be able to improve my reverse proxy or something to fix the brief issues I do see, but they are very “livable” and are similar to glitches on commercial services, where you have to refresh a page occasionally.
A key result of my setup is that all my files, settings, apps, app-data, shares, contacts etc. are simultaneously updated and made immediately available on every node. Of course, you double (or in my case, triple) storage requirements, since three redundant copies means, well, three lots of storage.
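
To put numbers on the storage cost, here is a tiny Python sketch of the arithmetic for a single replicated set of bricks. The function name and the sizes are made up for illustration, and it assumes a plain replicated volume where every brick holds a full copy, so the smallest brick sets the limit.

```python
def replicated_capacity(brick_sizes_gb):
    """Return (usable_gb, raw_gb) for one fully replicated set of bricks.

    Every brick holds a complete copy of the data, so the usable space
    is capped by the smallest brick, while the raw space is the total.
    """
    if not brick_sizes_gb:
        raise ValueError("need at least one brick")
    usable = min(brick_sizes_gb)
    raw = sum(brick_sizes_gb)
    return usable, raw


# Hypothetical example: three servers with a 2000 GB brick each.
usable, raw = replicated_capacity([2000, 2000, 2000])
overhead = raw / usable  # how many GB of disk each GB of data costs
```

With three equal bricks the overhead comes out at 3, which is the "three lots of storage" mentioned above.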
I can still get downtime (power cut, ISP), but things like me rebooting a server (or more likely, me accidentally breaking one (again)) no longer impact my Nextcloud uptime at all.
I just wanted to post this to let people know this is possible, as I have seen many articles that make this harder work than it now has to be. I stress this is still not 100% perfect - when I have deliberately tried to break this, I can get momentary disconnections, but nothing that a page refresh doesn’t fix - something I think we all deal with on just about any service.
A downside of clustering is that it is slightly slower, but in my use-case, it’s literally about 25% slower - to where I don’t really notice it. If you want absolute best speed, a single server using flash storage on bare metal with super fast cpu/memory is unbeatable. But I wanted substantial redundancy as I store a lot of personally valuable data.
I am not suggesting this is the only way to do it, it’s just my (new) way. If anyone is interested to know more about this, feel free to pm me or better yet message me (likely faster) on Twitter (@ogselfhosting) or on Mastodon (@OGSH). I don’t claim to be an expert, but after 8 years or so, I have finally managed to get enough of this right that it seems to work well for me.
Happy Nextclouding, y’all