Hybrid on-prem and cloud - Replication of storage?


I currently have Nextcloud installed on a VM on-prem. Using a simple split-DNS, users can connect to the server with an internal IP when on the LAN and with the public IP (over the Internet) when outside.
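For reference, the split-DNS part can be a one-line override on the internal resolver. A dnsmasq sketch, assuming the hostname and LAN IP below (both placeholders):

```
# /etc/dnsmasq.conf on the internal DNS server: answer with the LAN IP
# for the Nextcloud hostname; resolvers outside still get the public IP.
address=/nextcloud.example.com/192.168.1.10
```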

However, users connecting over the Internet have a less optimal experience because of reduced speed and higher latency. I’m thus planning to create a replica of NC in the cloud (IaaS). I understand that NC supports replicas: the application itself is stateless, MySQL can be replicated, and there’s no cache in use (the load isn’t that big). The problem is the replication of storage.

What is the best way to replicate the storage from my on-prem server to the cloud server? There will be only 2 servers (1 on-prem, 1 on the cloud), and I don’t think there’s budget for a third server (many solutions require a third one for quorum). There’s a site-to-site VPN already in place between the local network and the VNet in the cloud.

So, hang on, do you want to copy your install (once) or keep two installs synced? Not quite following you (although I haven’t had much sleep, so bear with me if I’m slow…)

Sync the two instances :slight_smile: replication will happen over a slow-ish connection, but that should be fine as traffic won’t be too big.

I would imagine that copying one instance the first time and then using rsync + “occ files:scan” would probably be your best option. The only thing I wonder about is whether you’d run into any issues with an app storing absolute paths in the database and then having an incorrect path after the data was rsynced. Still, if that happens, it’s more of an issue with that app, which should be fixed anyway…

So you’d recommend using rsync, which can definitely work. Why would I need “occ files:scan” in this case, though? I’m just replicating the underlying files, while the index is kept in MySQL (which is replicated separately).

Actually, if you’re replicating the database as well as the files in the data folder, then you don’t need to do a rescan with occ. That’s mainly for when files are created through a method Nextcloud doesn’t know about, like FTP or rsync, without the database being updated.

Not with NC, but I replicate masses of data using Bittorrent Sync. I’d imagine you’re going to want files accessible from both sides almost instantaneously to avoid hampering workflow, but unless you’re running rsync almost constantly there’s going to be a delay. BTS will keep everything up to date as files change.

I think 2 separate installs with federated sharing between the two NC servers might be your best bet.

That way the least amount of unnecessary data will get synced.

Admittedly this means users that are primarily offsite will need to have a separate URL from those that are primarily onsite, so that their logins/sync clients don’t have issues.


Thank you all for your comments.

@JasonBayton I’ve tried BT Sync before but I was not too happy with it. It creates a new torrent for each file, so it’s pretty resource-hungry.

@andrewlsd sorry, federation is not an option if it requires users to continuously think about which network they’re connected to and switch accounts…

@AlessandroS my suggestion for using federation would be to use it this way:

c1.yourdomain at office
c2.yourdomain in cloud/datacentre

userA, is usually at the office, and so would always connect to c1
userB is usually offsite, and so would always connect to c2

when userB visits the office, userB would continue to connect to c2 and so might experience a reduced speed to c2.
when userA is away from the office, userA would continue to connect to c1 and so might experience a reduced speed to c1.

Isn’t that exactly what he’s attempting to overcome?

Hi @JasonBayton. Yes. What this does is improve the normal, everyday performance for users that are normally at the office and users that are normally working remotely.

I mentioned the reduced performance for the unusual case where a user that is normally at the office happens to be working remotely, and vice versa.

A 2-node Gluster plus 2-node MySQL cluster should work if you want to be able to access either side. Until it doesn’t, that is; then you might have your work cut out figuring out which side has the most valid data. GlusterFS should be able to handle that; it’s the database that scares me. Worst case, you might need a total resync from one side to the other.

I have done a 3-node POC setup using gluster with Master-Master-Master MySQL using severalnines.com’s clustercontrol to handle the MySQL management. You should be able to get away with the third node being a database-only node, and not needing to have any of the ownCloud app or data files.
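For the MySQL side, a master-master pair is usually kept from colliding on auto-increment keys with interleaved offsets. An illustrative my.cnf fragment for node 1 (node 2 would use server-id = 2 and auto_increment_offset = 2; all values are examples, not taken from the POC above):

```
# Illustrative values only; paths and IDs depend on your setup.
[mysqld]
server-id                = 1
log_bin                  = mysql-bin
auto_increment_increment = 2   # both masters step IDs by 2...
auto_increment_offset    = 1   # ...node 1 takes odd IDs, node 2 even
```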

The success/failure is likely to be tied to the file-sync frequency and file-in-memory caching between the two servers. A busy Nextcloud environment would probably be more likely to have an issue than a quieter one.

@andrewlsd I don’t have “office users” and “remote users”; it’s the same users that just roam in and out, so I need something that can be done with just split DNS (or something equally simple).

I’m also not really crazy about using Gluster with just 2 nodes. The local network and the cloud are connected with a VPN and failures will happen, so I want to avoid split-brain scenarios. Additionally, in the cloud we will have only 1 VM, so there’s no SLA.

That’s definitely not an unusual case :slight_smile:

@AlessandroS with BTSync off the table, I’d go back to looking at rsync running on regular intervals via cron. Or maybe even on demand: http://wolfgangziegler.net/auto-rsync-local-changes-to-remote-server

The biggest downside I see is conflicts in shared folders and which side would overwrite the other come sync time.
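For the interval approach, the whole thing can be a single crontab entry on the on-prem server; the host and paths below are placeholders:

```
# crontab -e: push changes to the cloud replica every 5 minutes.
*/5 * * * * rsync -az --delete /var/nextcloud/data/ cloud-nc:/var/nextcloud/data/
```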

Jason, I believe I’ll try this route next week :slight_smile: It sounds like the most reliable one.

I guess the only other options are (1) go all-in on the datacentre-hosted Nextcloud (as if you were using Dropbox or Google Drive) or (2) get a SaaS Nextcloud service, where you pay per GB (à la Dropbox).

Remember you can use external storage providers, perhaps that might help reduce the storage requirements on your VM and/or provide some storage resilience.

We’re looking at a way to replicate NC in a managed fashion, for a combination of off-site backup and the scenario described above. Based on experience we’re assuming this replication should happen at an “NC-aware” level, with some similarities to federation, with options on a per node basis such as:

choose partner(s) to replicate with (considering scenario where company has multiple small offices with varying storage/bandwidth situations);

priority replication of particular folders;

ability to delay/bandwidth limit replication of large files;

ability to decide default replication interval and/or schedule;

ability to choose not to replicate some files or folders, e.g. hyper-sensitive content, work in progress, personal files (already being synced from a laptop/mobile device and therefore already backed up).

Let me know if this sounds of interest (or impossible!) and when work commences we’ll be sure to keep you in the loop.

What about using Unison for syncing the files? cis.upenn.edu/~bcpierce/unison/
I have several installations using this across ADSL VPN links to keep file systems in sync. I use MySQL master-to-master replication as well, and it gives me more problems than Unison does.
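For anyone trying this, a minimal Unison profile might look like the following (paths and hostname are placeholders; `prefer = newer` is one possible conflict policy, not the only one):

```
# ~/.unison/nextcloud.prf — run with: unison nextcloud
root = /var/nextcloud/data
root = ssh://cloud-nc//var/nextcloud/data
batch = true        # don't prompt; suitable for cron
prefer = newer      # on conflict, keep the most recently modified copy
times = true        # sync modification times as well
```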

@putt1ck the extension of Nextcloud sounds pretty much like the one I’m searching for. Would it enable my installation to have some kind of satellite instance in another department or on the web?

It would be very interesting to have the ability to define access priorities too: depending on the app configuration or the user’s location, the nearest storage location should be chosen.

Is your solution in development, or are you still looking at the possible options available?