Redundant Storage

I’d like to propose a feature to mirror folders between two locations. One reason I’m hesitant to rely on Nextcloud is that I don’t want to depend on a single storage solution. I’d like to be able to upload a file to a folder and, in the background, have it automatically uploaded to two different storage locations, such as local storage and S3. I’d even be happy with an option to keep two folders in sync on a schedule instead of on file upload or creation.

I like it! It would be worth filing a feature request. In the meantime, your best bet is to use rclone. Or check out Borg Backup.

Slightly different, but see this request for mirroring federated data locally. It takes a different design, but it is also about mirroring a directory between two locations (and two users).

Well, syncing all the clients to the server is already not easy, and there are still issues (conflict files). Doing the same between servers would have to be even more reliable, because a user can’t easily check it. There have been several attempts at this here, but I have seen nothing that is easy to set up and use.

For business users, there is the Global Scale feature. I’m not really sure whether it relies on storage solutions to spread data over several distributed locations. Unfortunately, there is not much documentation about this.

How would you use rclone? Do you mean to sync data on a client device using the Nextcloud client so it gets synchronized twice? If that’s what you mean, the problem with that is the double upload and, in my case, frequent errors. My thought is that it should be uploaded to the Nextcloud server once and then saved in two locations.

Rclone can sync your chosen directories (or Nextcloud’s entire /data directory) to multiple locations. It is well documented, with lots of detail on the rclone website. You would use cron to schedule when and where backups are saved. I recommend running it on a separate machine for managing your backups/one-way mirrors.

You can also use it as a sort of all-purpose external storage mounting tool.
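For example, a couple of rclone sync calls driven by cron could produce one-way mirrors in two places. This is a minimal sketch: the remote names s3remote and linode are placeholders you would define with rclone config, and the data path is an assumption about your install.

```bash
# Hypothetical one-way mirrors of Nextcloud's data directory to two
# destinations; remote names and paths are placeholders.
rclone sync /var/www/nextcloud/data s3remote:nextcloud-mirror --log-file /var/log/rclone-s3.log
rclone sync /var/www/nextcloud/data linode:nextcloud-mirror --log-file /var/log/rclone-linode.log
```

Put those two lines in a script and schedule it from crontab (nightly, say) on whatever machine manages the backups.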

The problem is that you need the database in sync as well. For larger setups, there are database clusters and storage backends that different servers can connect to at the same time, but those machines sit in close proximity with fast connections.
For remote locations, you have to manage a certain delay and try to avoid conflicts. I don’t know whether there are working solutions, or what conditions you would need.

Are you referring to the rclone suggestion or my suggestion? I don’t see why a database sync would be necessary when the only thing NC would really be doing is saving in location A and saving in location B. Obviously there are specifics around ensuring data integrity between the two and deciding which copy to use when the files differ, but the gist of it would be separate file entries in the DB, just as there are now.

I foresee a “virtual directory” table that simply references the records for the real files in the two locations.
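To make that concrete, here is a rough sketch of what such a table pair might look like. It uses sqlite3 purely for illustration; the table and column names are made up and do not correspond to Nextcloud’s actual filecache schema.

```bash
# Hypothetical schema for the "virtual directory" idea; illustrative only.
sqlite3 /tmp/mirror-demo.db <<'SQL'
CREATE TABLE virtual_files (
  id   INTEGER PRIMARY KEY,
  path TEXT NOT NULL              -- the single path the user sees
);
CREATE TABLE file_copies (
  virtual_id INTEGER NOT NULL REFERENCES virtual_files(id),
  storage    TEXT NOT NULL,       -- e.g. 'local' or 's3'
  checksum   TEXT                 -- used to detect a corrupted copy
);
SQL
```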

It depends on what exactly you want. You wrote about mirroring two locations, which implies you can modify both sides, and then the database is involved somehow. If it is just a backup or a read-only copy, that is different.

You could even use Syncthing or something similar to keep two locations in sync, and use that folder as external storage in Nextcloud. That way the two locations stay in sync, and Nextcloud mainly provides access to the outside.

Yes, I said mirroring two locations, but not in that sense. It’s similar to a RAID mirror, where a change is written to two or more drives but you aren’t making changes directly to either of them. That was my original desire. Then I said I’d be happy if it kept two locations in sync, but that was more of an “if I can’t get mirroring, then sync would suffice”.

Let me give some more detail on what I’m thinking:

Nextcloud presents a virtual folder to the user. When a file or folder is uploaded or created, it is saved in both Location A and Location B. To users, the file appears to exist once, but on the backend it is actually stored twice, for redundancy and availability. If Location A goes down, the file is still accessible from Location B; and if the file in Location B gets corrupted (its hash doesn’t match, for example), the copy in Location A still exists and can repair the one in Location B.
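Roughly, the write and repair paths could behave like the sketch below. This is a hedged illustration in shell, not a design: LOC_A and LOC_B are placeholder mount points, and a real implementation would live inside Nextcloud’s storage layer and compare each copy against a checksum recorded at upload time rather than blindly trusting one side.

```bash
#!/usr/bin/env bash
# Sketch of the proposed mirror-on-write behaviour; paths are placeholders.
set -euo pipefail

FILE="$1"                        # the file that was just uploaded
LOC_A="/mnt/location-a"          # e.g. local storage
LOC_B="/mnt/location-b"          # e.g. an S3 mount
name=$(basename "$FILE")

# On upload: write the file to both locations.
cp "$FILE" "$LOC_A/$name"
cp "$FILE" "$LOC_B/$name"

# Later, during a scrub: if the hashes diverge, repair B from A.
hash_a=$(sha256sum "$LOC_A/$name" | awk '{print $1}')
hash_b=$(sha256sum "$LOC_B/$name" | awk '{print $1}')
if [ "$hash_a" != "$hash_b" ]; then
  echo "Location B copy is corrupt; repairing from Location A" >&2
  cp "$LOC_A/$name" "$LOC_B/$name"
fi
```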

If you run your Nextcloud server virtualized, this can be achieved using VMware Fault Tolerance:
https://docs.vmware.com/en/VMware-vSphere/6.0/com.vmware.vsphere.avail.doc/GUID-623812E6-D253-4FBC-B3E1-6FBFDF82ED21.html

FT provides continuous availability… by creating and maintaining another VM that is identical and continuously available to replace it in the event of a failover situation.

I do, but it’s not VMware.

But then you need the database as well. And the user must connect to the second location; who decides when to switch?

Just for the files, you could use distributed storage: GlusterFS, …

Such setups get complex, which can create errors of its own (it even happens to the Googles and Amazons from time to time). With a RAID system in a stable location and good backups, you can get pretty good availability; even after a full system failure, you might restore everything from backup within a couple of hours.

There have been a few topics about this before:

I do that with the underlying storage solution. In my case I run NC in a FreeNAS VM, and I replicate the FreeNAS storage (including the NC VM) to a remote location.

I wouldn’t add that complexity to NC.

That’s assuming you’re using local storage. I’m using GlusterFS, but I’ve experienced a lot of issues since it doesn’t handle small files well. I use object storage for most of my storage, but that means placing all of my trust in Amazon or someone else. With redundancy I could keep one copy in S3 and one copy in Linode Object Storage. Not everybody runs their instance on a server they control; I don’t control the underlying hardware of my server, so I, and likely others, have to rely on other solutions.

The database is the easy part.

Not everyone has control of the underlying VM host.

A Proxmox cluster can do that as well.

Hypervisor-level FT is over the top, and usually expensive. As a VMware pro, I prefer simple HA, but that’s another story.

Building a cluster, etc. is not an easy task and introduces a LOT of complexity. Believe me, I tried :upside_down_face:

Having a single instance on a reliable virtualization layer, combined with a backup and an off-site backup (just in case the building burns down), is the way to go. No need for paranoia, or to file-level sync everything all over the globe.

Maybe you should indeed take a look at Proxmox for HA then.

If no HA is needed, this is indeed a good idea. Proxmox already has an integrated backup tool for VMs, so backing up is easy.