Hello, I am about to install Nextcloud (Docker, VM, … don’t know yet).
After extensive reading I came to the conclusion that I would not be able to “easily” achieve HA and FT, and that is okay.
I am focusing on a robust disaster recovery though.
Here is my planned setup:
1x VM - Nextcloud will be installed here.
1x QNAP or similar - all my files/data live here, attached to the VM via NFS.
The volumes of all containers live on the QNAP as well.
As for disaster recovery:
1x snapshot of the VM (every time I upgrade Nextcloud or Docker, or run yum update).
recurrent snapshots of QNAP.
In case the VM crashes and does not restart, my questions are:
Would restoring from the snapshot be sufficient?
Do I need to introduce Borg backups of the database and/or anything else?
Do I need recurrent snapshots of the VM if there have been no changes in the meantime?
Can I move all volumes of the containers to the QNAP? I read somewhere that the Master and Redis volumes should remain on the VM.
Suppose the VM crashes while data is being written: is there any chance, even a remote one, that anything gets corrupted (database, files, …)?
Then you need to know what you want to protect. For disaster recovery you want to be back online quickly, and snapshots are really great for that because they contain the data and all the configuration. However, they are rather large, so you don’t want to take them too often. For many things it is perhaps already good enough to get everything back, even if it is one or more days old. You also need a very fast connection to restore the backup (or a second independent disk, which might not protect against a local fire, …).
I also run a separate backup of just the data with rsnapshot. It covers shorter time intervals and also lets me go back in history (e.g. find a file that was deleted 3 months ago), and it only saves the differences. Getting this data back is a bit more difficult, but it is meant for critical data that must not get lost and is not time critical.
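To illustrate what “just saves the differences” can look like, here is a minimal Python sketch of a hard-link snapshot, the general idea that rsnapshot builds on. It is not rsnapshot itself: the function names snapshot and _unchanged are made up for this example, change detection is simplified to size plus modification time, and symlinks and permissions are not handled specially.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def _unchanged(current: Path, older: Path) -> bool:
    """Treat a file as unchanged if size and mtime (whole seconds) match."""
    a, b = current.stat(), older.stat()
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def snapshot(source: Path, previous: Optional[Path], target: Path) -> None:
    """Create a full-looking snapshot of source in target.

    Files that are unchanged compared to the previous snapshot are
    hard-linked instead of copied, so they take no extra disk space.
    """
    target.mkdir(parents=True, exist_ok=True)
    for src in source.rglob("*"):
        rel = src.relative_to(source)
        dst = target / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and _unchanged(src, old):
            os.link(old, dst)        # same data blocks, new directory entry
        else:
            shutil.copy2(src, dst)   # new or changed file: real copy, keeps mtime
```

Every snapshot directory looks like a complete copy, so finding a file that was deleted months ago is just a matter of browsing an older directory, while only changed files cost additional space. Hard links require the previous and the new snapshot to be on the same filesystem.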
From my long experience in IT I would recommend avoiding HA/FT setups if possible. It is not easy to implement HA/FT correctly, and often it is not really required.
The starting point of every DR strategy must be proper backup/restore 101: backup what and why (not how). This is the only way to cover all kinds of outages (and no, snapshots are not a proper backup).
HA/FT comes on top of this and helps to avoid or shorten an outage at the price of added hardware cost and operational complexity. As long as you don’t really need it, I would recommend focusing on backup/restore and only considering HA/FT if your recovery time objective (RTO) is shorter than the restore duration and you accept the added cost and understand and can manage the additional complexity.
We have had a few discussions already; you will likely find more detail in topics tagged high-availability.
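As a rough way to check the “RTO shorter than restore duration” condition, the following Python sketch estimates how long copying a backup back would take and compares that with the RTO. The function names, the overhead_hours parameter and the numbers in the usage note are assumptions for illustration only; real restores also depend on things like database import time and verification.

```python
def restore_hours(backup_gib: float, throughput_mib_s: float) -> float:
    """Estimated hours to copy a backup back at a sustained throughput."""
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = backup_gib * 1024 / throughput_mib_s
    return seconds / 3600


def restore_exceeds_rto(rto_hours: float, backup_gib: float,
                        throughput_mib_s: float,
                        overhead_hours: float = 0.0) -> bool:
    """True if the estimated restore (plus fixed overhead) would miss the RTO."""
    total = restore_hours(backup_gib, throughput_mib_s) + overhead_hours
    return total > rto_hours
```

For example, restore_hours(500, 100) comes out at about 1.4 hours, so with an RTO of 4 hours plain backup/restore would be enough, while an RTO of 1 hour would not be met and something on top of backup/restore would be worth considering.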
My disaster protection is a “daily full backup of the VM” for my business and three times per week for my private VM. I only keep snapshots while testing the VM after updates. If everything works as expected, I remove the snapshots.