I've got a failing hdd and I'm not sure if I understand BTRFS snapshots

Absolute_axolotl · July 22, 2021, 7:54pm

Before I start, I should say that I have read the documentation on nextcloudpi and BTRFS snapshots, but I’m still not sure that I understand it. Quite possibly I misunderstood something when I set up my Pi 4 with this version of nextcloudpi, or maybe not!

I think my 3TB hdd with the data directory on it is on the brink of failing; it seems to be connected only in fleeting moments. It’s formatted as BTRFS. I have a flash drive that takes regular snapshots using nc-snapshot-auto. If the hdd dies, will the snapshots be enough to restore everything? Have I totally misunderstood how the snapshots work? I have a spare hdd that I can swap in; I just need to know whether I can use the usb snapshots to recover the data directory, or am I starting again?

eyduh · July 22, 2021, 10:38pm

In what way is it failing? Which version of ncp, nc and kernel are you running?
nc-info under ncp-config has a lot of useful troubleshooting info. uname -r gives you your kernel.
btrfs problems can occur because the whole volume has been taken up by snapshots, because the particular kernel has an issue with btrfs (very common with pre-5.10.8 kernels), because the filesystem has become fragmented, etc.
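
A quick way to check which side of 5.10.8 your kernel is on is sketched below. It is only a rough helper, not something from NCP: version_lt is a name I made up, and it assumes your sort supports -V (GNU coreutils does).

```shell
#!/bin/sh
# Succeeds (exit 0) if version $1 is strictly older than version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Strip any suffix such as "-v7l+" from the running kernel's release string.
kernel="$(uname -r | cut -d- -f1)"

if version_lt "$kernel" "5.10.8"; then
    echo "Kernel $kernel is older than 5.10.8"
else
    echo "Kernel $kernel is 5.10.8 or newer"
fi
```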

Do you use the HDD for anything other than nextcloud, i.e. are the snapshots just of nextcloud data or are they used for other data as well?
If it’s only nextcloud data, I would first of all use the ncp-backup feature to back up to the new location. This will create a tar archive of your data, which you can use to restore everything should anything go topsy-turvy. I would also say that rsync -a is your friend.
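
For the rsync route, a minimal sketch could look like this. The function name copy_datadir and the paths in the comment are placeholders I made up, so swap in your own mount points, and run it as root if you want file ownership preserved.

```shell
#!/bin/sh
# Copy a data directory to a backup location with rsync in archive mode.
# -a preserves permissions, timestamps and symlinks, and recurses into
# subdirectories. The trailing slash on the source copies the directory's
# contents rather than the directory itself.
copy_datadir() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" && rsync -a "$src"/ "$dst"/
}

# Hypothetical paths; adjust to your own mounts before running, e.g.:
# copy_datadir /media/oldDrive/ncdata /media/newDrive/ncdata-copy
```

Running the same command again later only transfers files that have changed, so it is cheap to repeat.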

Have a look at “How to recover a BTRFS partition” on Own your bits, which has a lot of useful info on btrfs. DuckDuckGo and Startpage are also your friends.

This is also a very useful article on btrfs:
https://www.oracle.com/technical-resources/articles/it-infrastructure/admin-advanced-btrfs.html
Which refers to this article when you get to the snapshots section:
https://www.sanitarium.net/golug/rsync+btrfs_backups_2011.html

Long answer short, I think you should still do a regular old-school backup using the ncp-backup tool or rsync. The snapshots are basically just duplicated metadata referring back to files on the drive that is “failing”.

Absolute_axolotl · July 24, 2021, 8:37pm

After a bit of investigation, it may just be that the cable to the hdd is damaged. The reason I suspected the drive was failing is that on a few occasions recently, when trying to log in, the data dir has been missing, so effectively there was no access to nextcloud.
ncp v1.37.0
nc 20.0.8
kernel 5.4.79-v7l+

The drive is only used for nextcloud. Hopefully the cable replacement is all that’s needed, but I think I’m going to be extra cautious and use rsync. As for snapshots taking up too much room, is there a way to remove some?
Thanks for the help, it’s very much appreciated.

Absolute_axolotl · July 24, 2021, 8:51pm

make that
nc 20.0.11

eyduh · July 25, 2021, 1:53pm

The standard NCP configuration should make sure you don’t end up with too many snapshots when using auto:

All your deleted data are stored in the snapshots for specific retention time that is preconfigured in NCP. These configurations are: one per hour, limit: 24, one per week, limit: 4, one per day, limit: 30, one per month, limit: 12 snapshots. So effectively you have one year retention time for your files in cloud.

Sauce: https://docs.nextcloudpi.com/en/how-to-backup-and-restore-using-nc-snapshot/
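
If you ever want to remove a snapshot by hand rather than wait for those limits, a rough sketch with the plain btrfs tools is below. This is not how NCP does it internally: delete_snapshot and the DRY_RUN switch are names I made up, and the snapshot path is purely hypothetical. By default it only prints what it would do.

```shell
#!/bin/sh
# To see which snapshots exist on a btrfs volume, run (as root):
#   sudo btrfs subvolume list -s /mount-point
# The -s flag restricts the listing to snapshot subvolumes.

# Delete one snapshot by path. With DRY_RUN=1 (the default here) the
# command is only printed; set DRY_RUN=0 to actually delete.
delete_snapshot() {
    snap="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: btrfs subvolume delete $snap"
    else
        sudo btrfs subvolume delete "$snap"
    fi
}

# Hypothetical snapshot path, for illustration only.
delete_snapshot "/mount-point/snapshots/example"
```

Deleting a snapshot only frees the space held by data that no other snapshot or the live filesystem still references, so the gain can be smaller than expected.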

To see how your drive is doing spacewise you can use:
sudo btrfs device usage /mount-point
or
sudo btrfs filesystem df /mount-point

I’m no expert, but I seem to remember reading somewhere that filesystems that use snapshots benefit from having at least 10% unused. But don’t quote me on that q:
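
To check that 10% rule of thumb, something like the sketch below would do. percent_free is a helper name I made up, and the mount point is set to / only so the example runs anywhere; point it at your data drive instead. Bear in mind that plain df figures on btrfs are estimates, and sudo btrfs filesystem usage /mount-point gives a more detailed picture.

```shell
#!/bin/sh
# Print the percentage of free space, given used ($1) and total ($2)
# sizes in the same units. Integer arithmetic, rounded down.
percent_free() {
    echo $(( ($2 - $1) * 100 / $2 ))
}

mount_point="/"   # placeholder; replace with your data drive's mount point

# df -P prints POSIX-format columns: filesystem, total, used, available, ...
line="$(df -P "$mount_point" | awk 'NR==2 {print $3, $2}')"
used="${line% *}"
total="${line#* }"

if [ "${total:-0}" -gt 0 ]; then
    free="$(percent_free "$used" "$total")"
    if [ "$free" -lt 10 ]; then
        echo "Only ${free}% free on $mount_point, consider pruning snapshots"
    else
        echo "${free}% free on $mount_point"
    fi
fi
```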