Hybrid on-prem and cloud - Replication of storage?

Not with NC, but I replicate masses of data using BitTorrent Sync. I'd imagine you're going to want files accessible from both sides almost instantaneously to avoid hampering workflow, but unless you're running rsync almost constantly there's going to be a delay. BTS will keep everything up to date as files change.

I think 2 separate installs with federated sharing between the two NC servers might be your best bet.

Least amount of unnecessary stuff will get synced.

Admittedly this means users that are primarily offsite will need to have a separate URL from those that are primarily onsite, so that their logins/sync clients don't have issues.


Thank you all for your comments.

@JasonBayton I've tried BT Sync before but I was not too happy with it. It creates a new torrent for each file, so it's pretty resource-hungry.

@andrewlsd sorry, federation is not an option if it requires users to continuously think about what network they're connected to and switch accounts…

@AlessandroS my suggestion for using federation would work this way:

c1.yourdomain at office
c2.yourdomain in cloud/datacentre

userA, is usually at the office, and so would always connect to c1
userB is usually offsite, and so would always connect to c2

when userB visits the office, userB would continue to connect to c2 and so might experience a reduced speed to c2.
when userA is away from the office, userA would continue to connect to c1 and so might experience a reduced speed to c1.

Isn't that exactly what he's attempting to overcome?

Hi @JasonBayton. Yes. What this does is improve the normal, everyday performance for users that are normally at the office and users that are normally working remotely.

I mentioned the reduced performance for the unusual case where a user that is normally at the office happens to be working remotely, and vice versa.

A 2-node Gluster plus 2-node MySQL cluster should work if you want to be able to access either side. Until it doesn't, at which point you might have your work cut out figuring out which side has the most valid data. GlusterFS should be able to handle that; it's the database that scares me. Worst case scenario might be a total resync from one side to the other.

I have done a 3-node POC setup using gluster with Master-Master-Master MySQL using severalnines.comā€™s clustercontrol to handle the MySQL management. You should be able to get away with the third node being a database-only node, and not needing to have any of the ownCloud app or data files.
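For reference, a replicated Gluster volume along these lines could be sketched like this (hostnames and brick paths are placeholders; the third, metadata-only "arbiter" brick is one common way to get quorum without a full third copy of the data):

```shell
# Run on office1 after installing glusterfs-server on all three nodes (sketch).
gluster peer probe cloud1.example.com
gluster peer probe arbiter1.example.com

# replica 3 arbiter 1: two full data bricks plus an arbiter brick that
# holds only metadata, giving the cluster quorum and avoiding the
# 2-node split-brain problem.
gluster volume create ncdata replica 3 arbiter 1 \
    office1.example.com:/bricks/ncdata \
    cloud1.example.com:/bricks/ncdata \
    arbiter1.example.com:/bricks/ncdata
gluster volume start ncdata

# mount on each Nextcloud host
mount -t glusterfs localhost:/ncdata /srv/nextcloud/data
```

The arbiter node maps nicely onto the database-only third node mentioned above, since it needs very little disk.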

The success/failure is likely to be tied to the file-sync frequency and file-in-memory caching between the two servers. A busy Nextcloud environment would probably be more likely to have an issue than a quieter one.

@andrewlsd I don't have "office users" and "remote users"; it's the same users that just roam in and out, so I need something that can be done with just split DNS (or something equally simple).
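For what it's worth, split DNS like that can be sketched with BIND views (all networks, names and addresses below are placeholders): office clients resolve the Nextcloud hostname to the on-prem box, everyone else gets the cloud VM.

```
// named.conf fragment (sketch; networks, zone names and IPs are placeholders)
view "office" {
    match-clients { 192.168.0.0/24; };    // the office LAN
    zone "example.com" {
        type master;
        file "office/example.com.zone";    // nc.example.com A 192.168.0.10
    };
};
view "external" {
    match-clients { any; };                // everyone else
    zone "example.com" {
        type master;
        file "external/example.com.zone";  // nc.example.com A 203.0.113.10
    };
};
```

Note this only steers clients to the nearer endpoint; both endpoints still need replicated storage behind them, which is the hard part being discussed here.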

I'm also not really crazy about using Gluster with just 2 nodes. The local network and the cloud are connected with a VPN and failures will happen, so I want to avoid split-brain scenarios. Additionally, on the cloud we will have 1 VM only, so there's no SLA.

That's definitely not an unusual case :slight_smile:

@AlessandroS with BTSync off the table, I'd go back to looking at rsync running on regular intervals via cron. Or maybe even on demand: http://wolfgangziegler.net/auto-rsync-local-changes-to-remote-server

The biggest downside I see is conflicts in shared folders and which side would overwrite the other come sync time.
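A minimal sketch of the cron-driven variant (paths, host and schedule are placeholders; this assumes a one-way push with SSH keys already set up):

```
# /etc/cron.d/nc-sync (sketch)
# -a preserves permissions/ownership/times, -z compresses over the VPN,
# --delete propagates deletions -- risky if both sides change, which is
# exactly the conflict problem mentioned above.
*/5 * * * * www-data rsync -az --delete /srv/nextcloud/data/ cloud.example.com:/srv/nextcloud/data/
```

After each sync you'd likely also want an `occ files:scan` on the receiving side so Nextcloud notices files that arrived outside its own APIs.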

Jason I believe I'll try this route next week :slight_smile: It sounded like the most reliable one.

I guess the only options are (1) go all-in on the datacentre-hosted Nextcloud (as if you were using Dropbox or Google Drive) or (2) get a SaaS Nextcloud service, where you pay per GB (à la Dropbox).

Remember you can use external storage providers; perhaps that might help reduce the storage requirements on your VM and/or provide some storage resilience.
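As a sketch, an external storage mount can be added from the command line with occ (the bucket name and credentials below are placeholders, and the files_external app must be enabled):

```shell
sudo -u www-data php occ app:enable files_external

# amazons3 backend with access-key auth; S3-compatible object storage
# gives you durability without growing the VM's disk.
sudo -u www-data php occ files_external:create "/CloudStorage" \
    amazons3 amazons3::accesskey \
    -c bucket=my-nc-bucket -c key=ACCESS_KEY -c secret=SECRET_KEY

sudo -u www-data php occ files_external:list
```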

We're looking at a way to replicate NC in a managed fashion, for a combination of off-site backup and the scenario described above. Based on experience we're assuming this replication should happen at an "NC-aware" level, with some similarities to federation, with options on a per-node basis such as:

choose partner(s) to replicate with (considering scenario where company has multiple small offices with varying storage/bandwidth situations);

priority replication of particular folders;

ability to delay/bandwidth limit replication of large files;

ability to decide default replication interval and/or schedule;

ability to choose not to replicate some files or folders, e.g. hyper-sensitive content, work in progress, personal files (already being synced from a laptop/mobile device and therefore already backed up).
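Purely as illustration, those per-node options might end up looking something like this in configuration (every key here is hypothetical; nothing like it exists yet):

```yaml
# hypothetical per-node replication settings (sketch, not a real NC config)
replication:
  partners:
    - node: office2.example.com
      interval: 15m          # default replication schedule
      bandwidth_limit: 2MBps # cap for this small-office link
  priority_paths:
    - /Projects/Active       # replicate these folders first
  defer_files_over: 1GB      # delay/throttle large files
  exclude:
    - /HR/Confidential       # hyper-sensitive content stays local
    - /Users/*/Personal      # already synced/backed up elsewhere
```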

Let me know if this sounds of interest (or impossible!) and when work commences we'll be sure to keep you in the loop.

What about using Unison for syncing the files? cis.upenn.edu/~bcpierce/unison/
I have several installations using this across ADSL VPN links to keep file systems in sync. I use MySQL Master to Master replication as well and this gives me more problems than Unison.
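For anyone curious, a Unison profile for that kind of two-way sync looks roughly like this (roots, host and ignore patterns are placeholders):

```
# ~/.unison/nextcloud.prf (sketch)
root = /srv/nextcloud/data
root = ssh://cloud.example.com//srv/nextcloud/data

batch = true       # run unattended, e.g. from cron
prefer = newer     # on conflict, keep the newer copy
times = true       # propagate modification times

# skip transient Nextcloud state
ignore = Path */cache
ignore = Path */uploads
```

Unlike one-way rsync, Unison detects changes on both sides and only falls back to `prefer` when the same file changed on both.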

@putt1ck the extension of Nextcloud sounds pretty much like the one I'm searching for. Will it enable my installation to have some kind of satellite instance in another department or on the web?

It would be very interesting to have the ability to define access priorities too. Depending on your app configuration or location, the nearest storage location should be chosen.

Is your solution in development, or are you still searching for possible options?

Regards,
Pepe

We're still speccing it out, but it seems clear that a replication approach at an NC level is the only way to provide the functionality we're interested in. We'll open a GitHub project at some point soon and use the wiki to flesh it out, then look at what resources are needed to achieve it.

Hey putt1ck,

This is exactly the sort of thing I am interested in. What stops me from using NextCloud is that it's not easy to have exact real(ish) time copies.

To put it in some more context, I'm a home user looking to replace my QNAP and RPi rsnapshot backup with something a bit more feature-full and easier to manage. I'm keen to move onto Raspberry Pis and cheap HDs; when some hardware breaks I can just throw another one in its place and click replicate.

Is there any news on these features being added to NextCloud?

Matt

We're assuming we'll have to develop it, with at least one client interested in this functionality; would be good to get NC core devs to say "not possible", "already in planning" or "hey, sounds good, go for it" before we start though! Client who's interested is still migrating to NC from legacy solutions, so we're unlikely to start working on this for a month or so.

@jospoortvliet @nickvergessen @LukasReschke

One of those may be able to confirm or deny :thumbsup:

We have no direct plans to create a way for multiple Nextcloud servers to replicate data.

However Global Scale might be a step in the right direction. See
https://nextcloud.com/blog/nextcloud-announces-global-scale-architecture-as-part-of-nextcloud-12/

But Global Scale is not available in the community version…