A question about backup

Hello everybody,

I have a question about backup. I hope I’m in the right subforum here. And I hope it’s ok if I leave out all the details like PHP version, Nextcloud version, etc. for now. I think my question is independent of these details.

I would like to integrate my Nextcloud data into a backup process and keep different versions of the backup. In principle, this goes in the direction of “grandfather-father-son”: daily, weekly and monthly backups …

I have a virtual server hosted in a data centre. Nextcloud runs on this server. Locally at home I have another server with a lot of disk space. The server is integrated into a backup system.

How do I get the data from the Nextcloud onto my local server? I have seen that there is a command-line version of the Nextcloud client. Is it meant for use cases like this?

Or would it be better to have the VPS back up the Nextcloud database, the web directory and the data directory daily, create a tar.gz archive from them and then push it to the local server via scp?
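Roughly, I imagine a daily job on the VPS something like this (paths, database name and hostnames are just placeholders):

# Put Nextcloud into maintenance mode so files and database stay consistent.
sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --on

# Dump the database (credentials would come from a config file in practice).
mysqldump --single-transaction nextcloud > /var/backups/nextcloud-db.sql

# Pack web dir, data dir and the database dump into one archive.
tar -czf /var/backups/nextcloud-$(date +%F).tar.gz \
    /var/www/nextcloud /srv/nextcloud-data /var/backups/nextcloud-db.sql

sudo -u www-data php /var/www/nextcloud/occ maintenance:mode --off

# Push the archive to the local server.
scp /var/backups/nextcloud-$(date +%F).tar.gz backupuser@home-server:/backups/nextcloud/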

What is the most sensible way to proceed? What are the advantages and disadvantages?

Thank you and best regards
Jonc

Translated with DeepL (free version)

Well, a snapshot-style backup requires a filesystem that can do snapshots, like BTRFS or ZFS, or snapshots at the hypervisor level.
Forget scp. rsync is your best friend. It supports delta transfers, compression, copying only newer versions of files and many other features. It uses SSH as its transport, so it needs nothing special, other than rsync on the source and the destination.
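As a minimal sketch, pulling the Nextcloud data from the VPS could look something like this (user, host and paths are placeholders):

# -a preserves permissions/ownership/timestamps, -z compresses in transit,
# --delete mirrors deletions so the copy matches the source.
rsync -az --delete backupuser@vps.example.com:/var/www/nextcloud/data/ /tank/backups/nextcloud/data/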

rsnapshot does this; you can define intervals and keep hourly, daily, monthly etc. backups. In the rsync world there is a whole ecosystem of different variants…
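For illustration, the retention part of an rsnapshot.conf could look roughly like this (paths and host are placeholders, and cmd_ssh has to be enabled for a remote pull):

# rsnapshot requires tabs, not spaces, between fields.
snapshot_root	/tank/backups/rsnapshot/
retain	daily	7
retain	weekly	4
retain	monthly	6
backup	backupuser@vps.example.com:/var/www/nextcloud/	nextcloud/

Cron then calls rsnapshot daily, rsnapshot weekly and rsnapshot monthly on the matching schedule.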


Thanks a lot for your good tips. True, rsync is a good way to synchronise the data; I had almost lost sight of that tool. I have ZFS running on the local server and wanted to use it to implement the snapshots. But rsnapshot sounds good too. Thanks for your suggestions :)

If you already have ZFS, use that. It adds almost no overhead and is very solid. Use rsync so that you always have the newest data backed up.
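A minimal sketch of that combination, assuming the backups live in a dataset called tank/nextcloud (names and paths are placeholders):

#!/bin/sh
# Pull the newest data from the VPS into the ZFS dataset.
rsync -az --delete backupuser@vps.example.com:/var/www/nextcloud/ /tank/nextcloud/

# Then freeze today's state as a read-only ZFS snapshot.
zfs snapshot tank/nextcloud@backup-$(date +%Y-%m-%d)

Pruning old snapshots down to a daily/weekly/monthly scheme can be scripted by hand or left to tools like zfs-auto-snapshot or sanoid.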

ZFS is great for more reasons than snapshotting. It is also the filesystem with the best integrity monitoring.
All storage and databases get corrupt sectors from time to time. This is why high-availability environments that need very high file integrity use RAID. Even with only one disk, ZFS has the advantage, compared to almost any other filesystem, that it discovers corrupt files (even in snapshots) and tries to repair them from its mirror if the pool has one (RAID). If you do not have a mirror, it will clearly tell you which file(s) you can recover from a backup. You literally just transfer those files from the backup to the destination it shows you, with any command you like, for example scp or rsync. Then you run a scrub again (sometimes twice), verify there are no more known permanent errors, and clear the status. Use ZFS if you want the most accurate possible indicator of file integrity, including in snapshots.

Run this daily:

sudo zpool status -v

ZFS runs scrubs routinely, but you can initiate one if needed with:

sudo zpool scrub poolname

If you need to recover files, it will show you the exact files in a list under permanent errors.
Use that list to restore the files from your backup.
When you are certain you have recovered all files, run a scrub again and check that no more permanent errors are known. A status report with real file paths in the error list looks like this:

  pool: mypool
  state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Mon Apr 17 13:10:23 2023
	228G scanned at 350M/s, 51.6G issued at 79.3M/s, 296G total
	0B repaired, 17.44% done, 00:52:36 to go
config:

	NAME        STATE     READ WRITE CKSUM
	mypool     ONLINE       0     0     0
	  mydev    ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        mypool/containers/containername:/rootfs/var/www/nextcloud/data/userid/files/somepicture.jpg
        mypool/containers/containername:/rootfs/etc/apache2/somefile.conf

Errors like these, on the other hand, are references to files that either no longer exist or have already been recovered but whose error records have not been cleared yet. They can be ignored:

  pool: mypool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 01:20:53 with 0 errors on Thu Apr 13 14:03:31 2023
config:

	NAME        STATE     READ WRITE CKSUM
	mypool      ONLINE       0     0     0
	  mydev     ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0x19b83>:<0x437a>
        <0x19b99>:<0x44a6>
        <0x19ac0>:<0x10bfc>
        <0x199f4>:<0x782>
        <0x199fa>:<0x3576>

You can clear your pool status with:
sudo zpool clear poolname
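
Putting the recovery workflow together, a rough sketch might look like this (the file path and backup host are only placeholders based on the example output above):

# Copy the damaged file back from the backup server (scp or rsync both work).
rsync -a backupuser@home-server:/backups/nextcloud/data/userid/files/somepicture.jpg \
    /var/www/nextcloud/data/userid/files/somepicture.jpg

# Re-run the scrub (sometimes twice) and check that no real file paths remain in the error list.
sudo zpool scrub poolname
sudo zpool status -v

# Finally clear the recorded errors.
sudo zpool clear poolname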

The method I use on my systems relies on good old-fashioned dump in one of its forms. The master program contacts each VM or remote machine in turn and runs a small slave program once for each filesystem. The slave switches between dump(8) and xfsdump(8) as appropriate and sends the zipped data over the network to the master. On the second Wednesday of the month I run at level 0, on other Wednesdays at level 1 and on the remaining days at level 2.
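This is not my actual script, but as a rough illustration of the per-filesystem step, a level-1 dump streamed back to the master could look something like this (device, mount point and host are placeholders):

# Level-1 dump of an ext4 filesystem: -u updates /etc/dumpdates, -f - writes to stdout.
dump -1 -u -f - /dev/vda1 | gzip | ssh master 'cat > /backups/vm1-root.level1.dump.gz'

# The xfsdump equivalent for an XFS filesystem mounted at /srv ("-" sends it to stdout;
# -L and -M supply the session and media labels so it does not prompt).
xfsdump -l 1 -L vm1-srv -M stdout - /srv | gzip | ssh master 'cat > /backups/vm1-srv.level1.xfsdump.gz'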

There is also a tar-based slave for filesystems that are neither extX nor xfs, such as /boot/efi. To keep track of dump dates and provide the same level capability as dump, it uses an sqlite3 database. In the past I’ve also run cpio-based dumps in a similar way.

I have provision for an arbitrary script to be run using the same mechanisms; currently my only use for it is to dump the xfs inventory, which is otherwise excluded. If you are running a DBMS such as MariaDB you would probably need to dump the database at this point; remember that an open file will not dump safely.
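For example (database name and credentials file are placeholders), such a pre-dump step for MariaDB might be something like:

# --single-transaction gives a consistent snapshot of InnoDB tables without locking,
# so the resulting file can then be picked up safely by the filesystem dump.
mysqldump --defaults-extra-file=/etc/mysql/backup.cnf --single-transaction \
    nextcloud | gzip > /var/backups/nextcloud-db.sql.gz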

The dumped data is recorded on an external USB disk, which is moved off-site after the level 0 dump and a new target brought into use. Apart from L0 day, it’s all automatic and I just glance at the emailed report when it arrives.