Backup in reasonable time

Hello,

I have the following setup:

  • Ubuntu 18.04 Server (older hardware, 2 GB RAM), small SSD for the system
  • 4 TB HDD for Nextcloud data (currently about 1.2 TB used)
  • 5 TB external HDD (2.5 inch, USB 3) for backup, LUKS encrypted

Before automating anything I tried to do a manual backup (as suggested here, in German):

cd /var/www/html/nextcloud
sudo -u www-data php occ maintenance:mode --on
sudo tar -cpzf /mnt/NC_Backup_scripts_`date +"%Y%m%d"`.tar.gz -C /var/www/html/nextcloud .
mysqldump --single-transaction -h localhost --all-databases -u username -p > /mnt/NC_Backup_DB_`date +"%Y%m%d"`.sql
sudo tar -cpzf /mnt/NC_Backup_DataDir_`date +"%Y%m%d"`.tar.gz -C /media/storage/data .
cd /var/www/html/nextcloud
sudo -u www-data php occ maintenance:mode --off

Problem: Backing up the data dir that way would take at least 16 hours (!) for one TB.
I had the server in maintenance mode between midnight and 8 in the morning (about 500 GB backed up by then), but then had to take it out of maintenance mode again so that it could be used.


So I am looking for ideas on how to speed this up significantly, so that I can do a daily backup in some form.

  • Shall I use an unencrypted backup drive?
  • A faster backup HDD?
  • Do I need a faster CPU/server to do the compressing/encrypting in a reasonable time?
  • Or do I need some big RAID/LVM system where I can make snapshots etc.?
  • Or do something like an rsync to a second internal HDD, and then back up that one (which can take the whole day…)?

Thanks for any ideas for getting a proper setup. I am ready to buy a new machine, new HDDs etc. if needed. I would just like to follow some best practice and not reinvent the wheel.

Thanks,

Thomas

Hi,

Maybe you could speed up the backup by parallelising rsync with fpsync, a tool provided by fpart.

Then create the tar archive after the rsync processes have finished.
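
As a minimal sketch of what that could look like (the paths are examples and the flags are taken from the fpart/fpsync documentation, so double-check them against your installed version):

# split the source tree into chunks of at most 5000 files and run 2 rsync jobs in parallel
fpsync -n 2 -f 5000 -o "-a" /media/storage/data/ /mnt/backup/data/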

Around 17 MB per second (which is roughly what 1 TB in 16 hours works out to) is not that bad for an old 2-core CPU. Try skipping compression; that will speed up your backup a bit, but only if you are hitting CPU limits.

Try rsync or fpsync. You do not even need to put the server into maintenance mode. I use a script like this to simply back up to a remote filesystem.

Just run something like:

rsync -a --exclude=data/updater* --exclude=*.ocTransferId*.part --partial --info=progress2 --delete NextCloudPath BackupFolder/nextcloud/

I personally hit my network bandwidth limit with plain rsync already, so fpsync would not bring me any benefit.
Investigate what your bottleneck is (see the quick checks sketched after this list):

  1. Hitting CPU limits - try disabling compression. Encryption will also consume CPU time.
    I also do it like this: my NC server has an old 2-core CPU, so I move the compression and encryption work to the backup machine:
    tar -cvpf - /NC/FOLDER | nc -q 0 receivingHost 8080
    On the backup server side do the compression (and the encryption via LUKS) there, because that CPU is more capable:
    nc -l 8080 | gzip > backup.tar.gz

  2. Hitting HDD limits - check your HDD speed; maybe it is old and cannot handle a higher write rate.

  3. Hitting USB limits - try moving the HDD to another machine and going over the network (if you have more than 1 Gbps networking). But it seems you are using USB 3.0, so have a look at the first hint.
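
To narrow down which of the three it is, a few quick checks can help (device names, mount points and the test file path are only examples; adjust them to your system):

top                                                                              # is tar/gzip pinning a core? then the CPU is the limit
iostat -xz 5                                                                     # from the sysstat package; %util near 100 on the backup disk means the disk is the limit
sudo hdparm -t /dev/sdX                                                          # raw sequential read speed of a drive
dd if=/dev/zero of=/mnt/testfile bs=1M count=1024 oflag=direct status=progress   # raw write speed of the backup mount
rm /mnt/testfile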

Thanks for the info. Yes, it seems the CPU is the limit. The load has been around 2-3 since the backup started, which means the server is quite slow (web interface really sluggish etc.).

I am still thinking about the overall concept of the backup.

E.g. simply copying files (RAID or rsync) is not a “real” backup, because there are no file versions (“restore the file from 2 weeks ago”) and accidentally deleted files may be gone from the copy too (in RAID for sure, with rsync depending on the options).

But it is also unrealistic to restore a whole Nextcloud server with 1 TB of data to a two-week-old state just because one file is missing.
(I am not sure whether the Nextcloud server itself keeps older file versions?)

So an idea is to just copy the server data for the purpose of a restore (if the server/hdd breaks) - which means that this is not a real backup.

And then do the “real” file backups on the clients with software like Déjà Dup (or Time Machine on a Mac…) - backing up the Nextcloud sync folder. Such software would allow restoring older versions etc. via a nice GUI.
Problem with that: several laptops only have 256 GB SSDs, so only a small part of what’s on the server can be kept in sync. At least one client would be needed with enough space to hold ALL server data.

I do fear the situation where the server (hardware/system SSD…) breaks, which means I have a nice backup of all data, but it takes me days to reinstall everything and get a new server with the data up and running.
To be safer there it may make sense to keep a second server (with rsynced data) running (and maybe use that rsynced server to do the actual backup work to external HDDs).


It is all quite confusing. In the past I trusted Google Drive and a cloud-to-cloud backup (Spanning); now I have switched to Nextcloud. It works really well, but I have not yet figured out how to do backups well (including a clear path to restoring the server when hardware breaks etc.).

I did not quite catch this use case. If a user deletes one or a few files, you do not need to restore the whole X GB of files; just find them in the backup, copy them into the user folder and run a command to rescan it:

sudo -u www-data php occ files:scan --all rescans the whole data folder (will take a while).

sudo -u www-data php occ files:scan --path="user_id/files/path/to/file" rescans only the restored file or folder.
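
Put together, restoring a single file could look like this (the backup path, user name and file name are made up for the example; the data directory is the one from the original post):

# copy the file back from the backup into the user's data folder and fix ownership
sudo cp /mnt/backup/data/alice/files/Documents/report.odt /media/storage/data/alice/files/Documents/
sudo chown www-data:www-data /media/storage/data/alice/files/Documents/report.odt
# let Nextcloud pick it up again
cd /var/www/html/nextcloud
sudo -u www-data php occ files:scan --path="alice/files/Documents"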

Or you could use filesystem snapshots (btrfs, ZFS, etc.) to restore a snapshot, though that will not help in a disaster where your HDD dies. There is even an app to manage them: Snapshots, in the Nextcloud App Store.

With the Versions app - yes. This app is part of the official delivery.
You can even set it up to keep all versions for at least X days: https://docs.nextcloud.com/server/15/admin_manual/configuration_server/config_sample_php_parameters.html#file-versions
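
For example, to keep all versions for at least 30 days and let Nextcloud expire them automatically afterwards, you could set the documented config.php parameter via occ (the value “30, auto” is just an example; check the linked docs for the exact semantics):

sudo -u www-data php occ config:system:set versions_retention_obligation --value="30, auto"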

A backup of NC should have two parts: a backup of the DB (you did that with the mysqldump command) and a backup of the data folder. Optionally also the config or the whole NC folder with the apps.

If you are going for the “cheap” solution without RAID - just rsync periodically to your external HDD. That will keep your data reasonably safe. Otherwise have a look at RAID.
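
A minimal sketch of that “cheap” variant with a LUKS-encrypted USB disk like yours (device name and mount point are only examples):

sudo cryptsetup luksOpen /dev/sdb1 backup                                # unlock the external disk
sudo mount /dev/mapper/backup /mnt/backup
sudo rsync -a --delete /media/storage/data/ /mnt/backup/nextcloud-data/  # mirror the data folder
sudo umount /mnt/backup && sudo cryptsetup luksClose backup

Note that this keeps only the latest state (no versions); for versioned copies see the hardlink/rsnapshot approaches further down.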

That is possible with rsync as well, or use restic (see Rsync to cloud storage for backups? - #2 by Reiner_Nippes), but without a GUI :slight_smile:
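
If you want to try restic, the basic flow is roughly as follows (the repository path is an example; restic will ask for a repository password):

restic -r /mnt/backup/restic init                                            # create the repository once
restic -r /mnt/backup/restic backup /media/storage/data                      # incremental, deduplicated snapshots
restic -r /mnt/backup/restic snapshots                                       # list existing snapshots
restic -r /mnt/backup/restic forget --keep-daily 7 --keep-weekly 4 --prune   # thin out old snapshots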

Current status:
Found out: the server has only USB 2.0. The max data transfer is about 30 MB/sec uncompressed, and about 20 MB/sec with gzip (CPU limit).

Also, my tar backup command stopped working. It worked just once (the first backup), backing up about 1 TB. Now the amount of data has grown slightly (to about 1.3 TB) and the command stops at about 100-200 GB (no matter whether I use uncompressed tar or tar with gzip; I tried both).

So rsync seems to be the only way to go, making one synced version per day (hardlinking the unchanged files - I need to learn more about that).
I read that rsync needs lots of RAM, so I am sceptical whether that will work on this machine (1.3 TB of data, over 300k files).

I would need to add a USB 3.0 expansion card and more RAM - or just get a newer (used) machine.
It’s amazing how well Nextcloud handles/serves that amount of data on such an old machine, but when it comes to backup (and restore…) I see real limits.

I use rsnapshot. It only transfers (and stores) the difference since the last backup and lets you access the state x hours/days/weeks back. There is a whole range of similar scripts; choose the one that suits you best.
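
For reference, a minimal rsnapshot setup looks roughly like this (paths are examples; fields in rsnapshot.conf must be separated by tabs, and older versions use the keyword interval instead of retain):

# excerpt from /etc/rsnapshot.conf
snapshot_root   /mnt/backup/rsnapshot/
retain  daily   7
retain  weekly  4
backup  /media/storage/data/    localhost/

Then run rsnapshot daily (and rsnapshot weekly) from cron.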

I now use an rsync script based on (German): https://wiki.ubuntuusers.de/Skripte/Backup_mit_RSYNC/

I think it is similar to rsnapshot. Basically it creates a new folder for every day, copies only new or changed files and hardlinks all unchanged files.
(This is, by the way, also what Apple’s Time Machine does, just with a fancier interface.)
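
The hardlink trick boils down to rsync’s --link-dest option; a stripped-down sketch of such a daily run (paths are examples):

TODAY=$(date +%Y-%m-%d)
# files unchanged since the last run are hardlinked from "latest" instead of being copied again
rsync -a --delete --link-dest=/mnt/backup/latest /media/storage/data/ /mnt/backup/$TODAY/
ln -nsf /mnt/backup/$TODAY /mnt/backup/latest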

I had no problems with RAM - the machine has 2 GB of RAM and uses about 1 GB of swap (on an SSD) - rsync worked well (syncing over 300k files…). During the backup the load goes up quite a bit, but Nextcloud is still usable.

I am now thinking about getting a slightly newer (desktop) machine, for USB 3.0, more RAM and a faster CPU. LVM/LVM snapshots could also be helpful.
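
With LVM the idea would be to snapshot the data volume, back up from the frozen snapshot while Nextcloud keeps running, and drop the snapshot afterwards. A rough sketch (the volume group and LV names are made up, and the snapshot needs free space in the volume group):

sudo lvcreate --size 10G --snapshot --name nc-snap /dev/vgdata/nc-data
sudo mount -o ro /dev/vgdata/nc-snap /mnt/snap
# back up /mnt/snap with rsync or tar at leisure
sudo umount /mnt/snap
sudo lvremove -y /dev/vgdata/nc-snap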

Exactly.

Or directly use a COW filesystem (ZFS/btrfs). Even NextcloudPi on a Raspberry Pi uses btrfs.
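
On btrfs, for example, a read-only snapshot of the data subvolume is a one-liner and can then be backed up or sent to another btrfs disk (paths are assumptions; this requires the data directory to be its own subvolume and the target to be a mounted btrfs filesystem):

sudo btrfs subvolume snapshot -r /media/storage/data /media/storage/.snapshots/$(date +%F)
sudo btrfs send /media/storage/.snapshots/$(date +%F) | sudo btrfs receive /mnt/backup/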

Hi Thomas,

I ran into the same problem and I’m curious how a professional service would solve this situation (i.e. uptime vs. data security). If I provided a professional cloud service, I would have issues with having to enable maintenance mode (for private use this is somewhat tolerable). IMHO this is something Nextcloud has to address in the future, i.e. provide a zero-downtime (no maintenance mode) backup solution.

The above thread contained some good pointers for me to stitch together a dedicated script and enhance it with some additional stuff:

  • using the nextcloud.export command (this is the snap syntax) to export everything but the data
  • enabling maintenance mode during the synchronization of the data, because not doing so can result in corrupted data being backed up when a file that rsync is currently copying is modified concurrently by Nextcloud
  • better removal of old backups using a find <...> -mtime +${KEEPDAYS} <...> implementation

On my machine (HP ProLiant ML330 G6, Xeon E5606 @ 2.13 GHz) the initial virgin copy of 800 GB took 14h16m from a RAID 1 array to a simple internal SATA backup drive. I run the backup via cron every day at 3:00 AM.

Even though the script is pretty specific to my setup I post it here in case it could be helpful to others:

#!/bin/bash

DATADIR=/media/data0/nextclouddata
BACKDIR=/media/backup/nextcloud
PREFIX=`date '+%Y-%m-%d'`
KEEPDAYS=60

NC_BIN=/snap/bin

# log to stdout and show progress when run in terminal
if [ -t 1 ]; then
        RSYNCOP=--info=progress2
        LOG=/dev/stdout
else
        RSYNCOP=-v
        LOG=${BACKDIR}/log/${PREFIX}.log
fi

mkdir -p ${BACKDIR}/config
mkdir -p ${BACKDIR}/data
mkdir -p ${BACKDIR}/log

# Export nextcloud setup (data option -d does not work with external data dir).
# Assumes a symbolic link at /var/snap/nextcloud/common/backups pointing to
# ${BACKDIR}/export
${NC_BIN}/nextcloud.export -abc >> ${LOG} 2>&1

# Create compressed archive (rsynching not useful here because the source data is
# freshly generated on each export)
CONFTARBZ=${BACKDIR}/config/${PREFIX}.tar.bz2
(tar -c ${BACKDIR}/export/* | pbzip2 -c -p2 > ${CONFTARBZ}) >> ${LOG} 2>&1
rm -rf ${BACKDIR}/export/*
ln -nsf ${CONFTARBZ} ${BACKDIR}/config/latest

# Export nextcloud data
${NC_BIN}/nextcloud.occ maintenance:mode --on
rsync -aR --delete --link-dest=${BACKDIR}/data/latest ${RSYNCOP} ${DATADIR}  ${BACKDIR}/data/${PREFIX} >> ${LOG} 2>&1
ln -nsf ${BACKDIR}/data/${PREFIX} ${BACKDIR}/data/latest
${NC_BIN}/nextcloud.occ maintenance:mode --off

# Remove backups older than KEEPDAYS
find ${BACKDIR}/data/   -mindepth 1 -maxdepth 1 -type d -mtime +${KEEPDAYS} -not -name "latest" -exec rm -rf {} \;
find ${BACKDIR}/config/ -mindepth 1 -maxdepth 1 -type f -mtime +${KEEPDAYS} -not -name "latest" -exec rm -rf {} \;
find ${BACKDIR}/log/    -mindepth 1 -maxdepth 1 -type f -mtime +${KEEPDAYS} -not -name "latest" -exec rm -rf {} \;
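
For the daily 3:00 AM run mentioned above, a root crontab entry along these lines does the job (the script path is of course specific to your setup):

# crontab -e (as root)
0 3 * * * /usr/local/sbin/nextcloud-backup.sh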

Addendum:
It is possible to avoid long maintenance-mode windows by

  1. running rsync without maintenance mode for the large sync
  2. running it again with maintenance mode enabled, to resync anything that changed during step 1 and correct any potentially corrupted data

The latter run will finish significantly faster than the former, keeping the downtime to a minimum (sketched below).
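
As a sketch, using the same variables and rsync options as in the script above:

# pass 1: pre-sync while Nextcloud is live (a few files may be copied in an inconsistent state)
rsync -aR --delete --link-dest=${BACKDIR}/data/latest ${DATADIR} ${BACKDIR}/data/${PREFIX}
# pass 2: short final sync with maintenance mode on, fixing anything that changed during pass 1
${NC_BIN}/nextcloud.occ maintenance:mode --on
rsync -aR --delete --link-dest=${BACKDIR}/data/latest ${DATADIR} ${BACKDIR}/data/${PREFIX}
${NC_BIN}/nextcloud.occ maintenance:mode --off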

Veeam has a free standalone Linux agent that’s pretty good. You might give that a try. The first backup will take long, but incrementals after that are pretty quick, depending on the rate of data change.