Nextcloud version (eg, 20.0.5): 20.0.6 - 20.0.6.1 (via SNAP)
Operating system and version (eg, Ubuntu 20.04): Linux 5.4.83-v7l+ #1379 SMP Mon Dec 14 13:11:54 GMT 2020 armv7l
Apache or nginx version (eg, Apache 2.4.25): Apache (fpm-fcgi)
PHP version (eg, 7.4): 7.4.14
MY PROBLEM:
I back up my nextcloud data directory with a script. This script packs the data into a .tar.gz file. I run nextcloud on a raspberry pi with an external hard disk (usb 3.0). The backup is stored on another hard disk which is connected to my fritzbox (with fritz-nas, usb 3.0). The connection between the fritzbox and the raspberry pi is over ethernet.
Basically this is working for me. The script is able to copy the data over the network from the data directory to the backup directory. The problem is that it is very slow. For my roughly 400-500 GB it takes more than a night to back up the files.
Now I’m wondering what to improve here. I imagine there should be a smarter way of doing this. Does anybody have a suggestion?
I thought of doing it with rsync instead of tar. Would that be a good way?
Here is the basic way I’m doing the backup:
#!/bin/bash
# backup script for snap installation of nextcloud
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
user=pi
bakdir="/media/fbox/ncbackup/20210309" # backup dir on fritz box nas, mounted on /media/fbox
ncdatadir="/media/WD1" # data directory, hd directly via usb3.0 connected
fn_BakData="nextcloud_data.tar.gz" # file name of tar file for data
# create bakdir
if [ ! -d "${bakdir}" ]
then
    mkdir -p "${bakdir}"
    chown -R "${user}:${user}" "${bakdir}"
else
    # errorecho was never defined in this script; print to stderr instead
    echo "ERROR: backup dir ${bakdir} already exists!" >&2
    exit 1
fi
# maintenance mode on for snap
nextcloud.occ maintenance:mode --on
snap restart nextcloud.php-fpm # not sure if this is still needed
snap stop nextcloud
# backup data
tar zcpf "${bakdir}/${fn_BakData}" -C "${ncdatadir}" .
# maintenance mode off
snap start nextcloud
nextcloud.occ maintenance:mode --off
snap restart nextcloud.php-fpm # not sure if this is still needed
It does an initial full backup and after that, only changes have to be transferred, or something like that. You should read their website for the technical details.
I saw a really nice tutorial and usable sample script at c-rieger.de (sorry, german only).
rsync is a good one for a task like this.
Even better would be rsync-time-backup, which also uses rsync and keeps a history of backups on the backup drive (the file system on the backup drive needs to support hard links).
The Raspi's compute performance could be an issue as well. Compressing 500 GB of data needs a lot of computing…
I suggest running some tests, such as copying a few large files to your backup destination; that shows how well your network performs. If the result is bad, focus on improving network speed… otherwise take a look at the CPU/RAM usage of the Raspi while the backup runs - continuous CPU usage over 80% means the bottleneck is there. In that case skipping compression may already help.
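The throughput test suggested above could be sketched roughly like this. It is only a minimal sketch: the helper name write_test, the test file name and the 1000 MiB default size are assumptions made for illustration. It writes zeros with dd and prints dd's summary line; conv=fsync makes dd flush the data to the device before it reports, so the page cache does not inflate the rate.

```shell
#!/bin/bash
# Rough write-throughput test: write a file of zeros into a target
# directory and print dd's summary line (bytes, seconds, rate).
# write_test is a hypothetical helper name, not part of any tool.
write_test() {
    local target_dir="$1"          # e.g. /media/fbox/test or /media/WD1/test
    local size_mib="${2:-1000}"    # file size in MiB (assumed default)
    local testfile="${target_dir}/throughput_test.bin"

    mkdir -p "${target_dir}" || return 1
    # LC_ALL=C keeps the summary in English regardless of system locale;
    # conv=fsync flushes to the device before dd prints its summary.
    LC_ALL=C dd if=/dev/zero of="${testfile}" bs=1M count="${size_mib}" conv=fsync 2>&1 | tail -n 1
    rm -f "${testfile}"
}
```

Calling write_test /media/fbox/test and then write_test /media/WD1/test gives one number for the NAS mount and one for the local disk. To check the CPU side, watching top in a second terminal while the real backup runs should be enough.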
TEST 2: create 1GB file on fritzbox and copy via rsync to raspi:
1048576000 bytes (1.0 GB, 1000 MiB) copied, 94.007 s, 11.2 MB/s
copy: /media/fbox/test/test.bin > /media/WD1/test/test.bin
rsync: send_files failed to open "/media/fbox/test/test.bin": Stale file handle (116)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1207) [sender=3.1.3]
file creation time: 95 s
rsync time: 0 s
file size: 1048576000
Summary:
Test 2 is very slow at copying the file: 11 MB/s compared to 67 MB/s
rsync from fritzbox to raspi is not even working: Stale file handle error
→ a first Google search pointed to mounting problems. Any suggestions here?
@ Test 1 the rsync speed was ~11 MB/s → this is exactly the copying speed from USB on the fritzbox
→ the bottleneck seems to be the USB on the fritzbox (NextcloudUser you’re right ;-))
Rough estimation: copying 500 GB at 11 MB/s takes nearly 13 hours. That’s too long in my eyes, right?
So maybe I will have a look at BorgBackup.
I guess the performance of my raspi is not the issue then, right?
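The rough estimate in this post can be reproduced with a few lines of shell. This is just the arithmetic, assuming decimal units (1 GB = 1000 MB) as dd reports them; the helper name estimate_hours is made up for illustration.

```shell
#!/bin/bash
# Estimate transfer time in hours for a given size (GB) and rate (MB/s).
# Assumes decimal units: 1 GB = 1000 MB.
estimate_hours() {
    local size_gb="$1" rate_mb_s="$2"
    awk -v s="${size_gb}" -v r="${rate_mb_s}" \
        'BEGIN { printf "%.1f\n", s * 1000 / r / 3600 }'
}

estimate_hours 500 11   # prints 12.6 (hours at the measured 11 MB/s)
estimate_hours 500 67   # prints 2.1  (hours at the 67 MB/s comparison figure)
```

Either way, this only describes a full transfer of all the data.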
Sorry, I don’t see the point in your tests as they just measure the network speed very roughly.
Maybe you haven’t understood yet how rsync or borgbackup work. When doing a backup for the very first time, the complete data needs to be transferred, no matter whether you use tar.gz, rsync or borgbackup. Things change with the next backup. Then rsync and borgbackup will do an incremental sync (sending only the data that changed since the previous backup) - contrary to just copying a tar.gz file, which will transfer the whole bunch of data again and again.
When using rsync or borgbackup, do not compress your data directory into a tar.gz file before backing it up. Just transfer the uncompressed data folder as it is to your backup location and let rsync or borgbackup handle any compression to reduce the size of the transferred data.
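As a minimal sketch of the rsync variant described above: the function name sync_data and the example destination path are assumptions, and only standard rsync options are used. The first run copies everything; later runs skip files whose size and modification time have not changed.

```shell
#!/bin/bash
# Mirror a source directory to a destination with rsync.
# sync_data is a hypothetical helper name used for illustration.
sync_data() {
    local src="$1"   # e.g. /media/WD1
    local dst="$2"   # e.g. /media/fbox/ncbackup/data (hypothetical path)

    mkdir -p "${dst}" || return 1
    # -a        archive mode: recursive, keeps permissions, times, symlinks
    # --delete  remove files from dst that no longer exist in src
    # The trailing slash on src copies the directory's contents,
    # not the directory itself.
    rsync -a --delete "${src}/" "${dst}/"
}
```

Two caveats: when both paths are local mounts, rsync copies a changed file as a whole by default, so the saving comes from skipping unchanged files. And if the destination file system cannot store ownership or symlinks, -a may report errors for those attributes.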
Thanks for the explanation. I think I mixed up some things in my head. I originally thought of having a kind of history backup solution with a different folder each week. This would mean that the script copies a full backup every time. But in the meantime I have rejected this idea.
It is now clear to me that an incremental backup is a smart idea, and I think I will try out one of these tools.
My fritzbox configuration says it’s 3.0. But I will recheck the configuration and unplug and replug the disk.
Yes, I’ll do this when I have time in the next few days.
Exactly this is what rsync_time_backup does: with every invocation of rsync-tmback.sh a folder with a complete backup will be created at the backup destination. It comes with an expire strategy, so older backups are automatically deleted (e.g.: within 24 hours, all backups are kept. Within one month, the most recent backup for each day is kept. For all previous backups, the most recent of each month is kept.)
I have no experience with borgbackup, but I assume it works quite similar to rsync_time_backup.
No.
To create a full backup, only the data that changed compared to the previous backup needs to be copied to the backup destination, as unchanged data is already there (from the previous backup).
The magic behind all this is hard links: unchanged files do not have to be created again, because a hard link to the same file in the previous backup is enough. This saves bandwidth during the backup process and disk space at the backup destination.
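The hard-link idea can be illustrated with rsync's --link-dest option. This is a sketch of the underlying principle, not the rsync-time-backup script itself; the function name snapshot and the snapshot names are assumptions for illustration.

```shell
#!/bin/bash
# Create a named snapshot. Files unchanged since the previous snapshot
# become hard links into it instead of new copies.
# snapshot is a hypothetical helper name.
snapshot() {
    local src="$1"           # directory to back up
    local backup_root="$2"   # absolute path where snapshots live
    local name="$3"          # new snapshot name, e.g. 2021-03-09
    local prev="$4"          # previous snapshot name, may be empty

    mkdir -p "${backup_root}/${name}" || return 1
    if [ -n "${prev}" ]; then
        # --link-dest: hard-link unchanged files against the previous snapshot
        rsync -a --link-dest="${backup_root}/${prev}" "${src}/" "${backup_root}/${name}/"
    else
        # first run: nothing to link against, so this is a full copy
        rsync -a "${src}/" "${backup_root}/${name}/"
    fi
}
```

Every snapshot folder then looks like a complete backup, while an unchanged file occupies disk space only once. The backup_root should be an absolute path, because rsync resolves a relative --link-dest against the destination directory.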
I use duplicacy for backups and it does a great job. You can adjust the intervals for full and incremental backups, and it is quite reliable, both from my experience and from what I’ve read about it.
The most important difference between rsync and borgbackup or restic: rsync stores full copies of the files at the destination, while the others create their own backup repository format using deduplication: if multiple files have identical parts, these repeated parts are stored only once at the destination. The advantage is a much smaller backup size; the disadvantage is that they need much more computing power to identify all the unique parts and, especially during a restore, to reassemble the files from the pieces stored in the backup. One practical effect of this design decision: an rsync restore is as simple as copying files from the archive, while restoring backups from borgbackup or restic requires the program itself and the knowledge of how to restore a backup with it. restic always encrypts the backup (encryption is optional for borgbackup), which is good if someone gains access to your backup storage and bad if you lose the password…
The hard links used by rsync_time_backup work like de-duplication as well. The de-duplication is not as fine-grained as with restic and borgbackup, but I don’t think it makes a big difference for typical scenarios built of almost static documents, hundreds of photographs and videos…
There are lots of backup programs and strategies. You should choose one which fits your needs in terms of speed, reliability and security. It makes no sense to switch to the very best program if you are happy and familiar with the second best. In fact, differences within the same backup category are small - every common backup solution out there works well, provided the setup is right and the admin knows how to run the system on a daily basis.
The most important truth about backups: you should always set up automatic backups to different destinations (3-2-1 rule) and verify the restore procedure from time to time - but from your original post it looks like this is what you are looking for.
You could also give rclone a try. It works in the background similarly to rsync, with different sources and providers. It also offers a web interface for all the mouse movers among us.
I tested this now more in detail and found a problem with links.
When I connect my HD (with ext4) to my fritzbox, I’m not able to run rsync-time-backup to completion, because it tells me that I’m not allowed to create the link to the /latest/ directory.
When I try the same with the HD connected directly to my laptop, it works.
Somehow the fritzbox USB mount is not able to create these links.
Here is the error message (in german):
ln: die symbolische Verknüpfung '/mnt/hannelore/bak/latest' konnte nicht angelegt werden: Die Operation wird nicht unterstützt
It says basically:
symbolic link cannot be created: operation not supported
Google turns up nothing about this problem with the fritzbox and symbolic links.
I’ve sent a support request to AVM and am waiting for their input.
If this is not possible, I have to think about another backup solution: my next try will be borgbackup.
That symbolic link is just a nice-to-have shortcut for finding the latest backup quickly. If you sort the backup folder by name, you will easily find the latest backup without that link. The backups are created and valid without that symbolic link.
You could just comment out line 617, so the link won’t be created. Then the script should run through completely without errors.
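Finding the latest backup without the symlink could look like the sketch below. It assumes the backup folders carry date-based names that sort chronologically (for example 2021-03-09-120000); the helper name latest_backup is made up for illustration.

```shell
#!/bin/bash
# Print the newest backup folder by sorting the folder names.
# Assumes date-based names that sort chronologically.
# latest_backup is a hypothetical helper name.
latest_backup() {
    local root="$1"   # directory that contains the backup folders
    find "${root}" -mindepth 1 -maxdepth 1 -type d | sort | tail -n 1
}
```

For example, latest_backup /mnt/hannelore/bak prints the path of the most recent backup folder, which is the same information the latest link would have provided.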