Q: Retiring a Nextcloud instance > archive all data for all users > clean path forward?

Hi, I wonder if anyone can help me figure out a resolution to this question. I’ve dug through forums and Google searches and found various discussions, but I can’t actually see how to do what I want / was hoping.

I set up a Nextcloud instance for a local non-profit I help with IT stuff about 6 years ago. It got modest use for about 3-4 years, but due to staff turnover and changes in ‘business flow’ and ‘preferred work tools’, they no longer use Nextcloud at all. One user had been going into it maybe 3-4x per year to refer to an old file here and there from a prior member of staff / for ‘legacy archival access’, sort of thing.

So, I don’t wish to leave this Nextcloud instance alive and on the internet as a security liability / maintenance burden, since it is effectively now retired from service.

My goal is to make a “good, clean, complete copy” of the full extent of all data folders for all users on the Nextcloud instance. Ideally it will be something as simple as possible, i.e., so that a non-technical user may browse the file structure and easily find anything and everything which was stored in the old Nextcloud.

I know, based on the Nextcloud data dir size, that we have about 50 GB as the total footprint of this Nextcloud instance. This includes all users, all files, all versions, all trash bins.

I know, for example, that one particular old staff user account has ~15 GB of data. I was able to set up and use the ‘rclone’ tool in a nearby, current Debian VM and harvest all the data for that particular user account

but

there are about 20 user accounts

and this is kind of clunky and gross.

Originally, when I did the rclone attempt, I foolishly assumed this one user could see ~everything, but now, after doing more digging, I realize:

• we have a few users with 10-15 GB of files
• many users with ~trivial use (0-100 MB)
• misc things in the trash and version directories

and all of this together adds up to the ‘total footprint’.

So, I am wondering if someone can hint to me the cleanest way to do this.

Is there a way I can create a new “mega-access-migrate-data-user” who inherently has all-data access? Then I could do my rclone sync / harvest the ‘latest copy of all files, all folders’, and that would give me what I want. (I.e., I am not fussed about versions and trash.)

OR

Is it really just a matter of me manually pulling the data tree out of my Nextcloud instance and then manually doing some data tidy-up? In the root of the Nextcloud data tree we can see basically a list of all users (each has their own folder),

and then inside every user: cache / files / files_trashbin,

and I’ll just manually copy-mulch-move-simplify,

so that my end product is an external USB HDD with a structure of:

USERNAME_1
USERNAME_2
…
USERNAME_10

and inside each username dir is all the directories of stuff they worked on.

Anyhow, at the end of the day I think I can just do this. But I am hoping to confirm there is no better / cleaner / nicer way to do this than:

  1. WinSCP > file-transfer pull, as the root user, from the Nextcloud data dir
  2. harvest to a local dir on a Windows machine with USB storage attached
  3. dump the data in there
  4. once it is all tidied up > copy the whole mess to a second drive, so we have no less than 2 copies of the archival data for ‘safe keeping’

Many thanks for the help / if you have read this far.

Tim

rsync is your best friend here. rsync can filter on patterns, folder names, and all such things, so you can make sure to omit the cache folders, the appdata folders, and so on, while still copying the rest of the relevant files.
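For example, a minimal sketch; the source and destination paths here are just placeholders, adjust them to your actual data directory and USB mount:

```bash
# Pull only the latest copies of everyone's files, skipping trash,
# versions, caches, chunked-upload leftovers, and Nextcloud's internal
# appdata_* folder. Paths are placeholders; adjust to your setup.
rsync -av --progress \
  --exclude 'appdata_*' \
  --exclude 'files_trashbin/' \
  --exclude 'files_versions/' \
  --exclude 'cache/' \
  --exclude 'uploads/' \
  /var/www/nextcloud/data/ /mnt/usb/nextcloud-archive/
```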

OK, thank you. Is it correct for me to assume that the ‘Nextcloud data repository’ tree will basically contain all the files, and that if I use some method (rsync or WinSCP or whatever) to get them onto an NTFS-filesystem external USB HDD, the files will be there, complete, and will ‘just work’?

In the past I had the vague impression / memory that the ‘actual file data’ was hidden from sight on the Linux back end of Nextcloud, and all we had was metadata, pointers, and a complex file<>lookup structure. But this does not appear to be the case, so that is good. I figured I should sanity-check, though.

thank you!

Tim

Hi Tim,

I recently handled a similar situation during a migration of a Nextcloud AIO instance from one server to another. While my setup might not match yours exactly, I hope my approach can offer some insight or be a possible solution for you.

First, let me mention that I’m a Linux user, and this process turned out to be quite straightforward on a Linux system. However, I understand that your options might differ depending on your environment, so take this as one potential way to approach the task.

My Approach

  1. Direct Data Copy:
    I accessed the Nextcloud data directory directly on the server and used Linux tools like rsync to back up the files to a local disk. This allowed me to exclude unnecessary folders like files_versions and files_trashbin during the copy. It kept things clean and focused on the latest versions of user files.

  2. Filesystem Considerations:
    For the backup, I used a Linux-native filesystem (like XFS) on the local disk. I avoided NTFS entirely since it doesn’t preserve Linux-specific attributes and could introduce potential issues when handling metadata or symbolic links. Depending on your setup, you might want to ensure the filesystem you use is suitable for your needs.

  3. Backup Tool Alternative:
    If your Nextcloud instance supports it, you might consider using Borg Backup. This tool is built into newer versions of Nextcloud AIO and can handle incremental backups efficiently. If your instance is older or doesn’t support Borg, direct copying or another backup tool might be necessary.

  4. User Structure:
    The Nextcloud data directory usually has a straightforward structure where each user’s data resides in their dedicated folder (roughly as sketched below). After copying, you can organize these folders for better accessibility, making it easier for non-technical users to navigate.
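For illustration, the layout on a typical install looks roughly like this; exact folder names vary by Nextcloud version and installed apps, and the user names here are placeholders:

```
data/
├── alice/
│   ├── cache/
│   ├── files/            <- the current files you actually want
│   ├── files_trashbin/
│   └── files_versions/
├── bob/
│   └── ...
├── appdata_oc.../        <- Nextcloud-internal, safe to skip for an archive
└── nextcloud.log
```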

A Guide You Can Refer To

I’ve written a guide on migrating data in a similar context, which might help you better understand the process:
Migrácia Proxmox VM na nový VM (“Migrating a Proxmox VM to a new VM”)

The guide is in Slovak, but you can use AI translation tools to easily translate it into your preferred language. While the guide focuses on Proxmox, the principles for managing and migrating data are applicable here as well.

Unanswered Questions

It’s unclear what type of Nextcloud instance you’re running (AIO, Snap, Docker, or manual install), or what tools are available in your environment. If you’re not using Linux, some of these steps might need adjustment. Knowing this information could open up additional possibilities or simplify the process further.


A Simple Suggestion for Your Case

  • If you can access the Nextcloud data directory directly on the server:
    • Use rsync or a similar tool to copy the data to your local storage.
    • Exclude unnecessary subdirectories like files_versions and files_trashbin to keep it clean.
  • If the Nextcloud instance supports Borg Backup, it might be worth exploring, as it simplifies archiving and compression (see the sketch after this list).
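If you go the standalone Borg route (rather than the backup integrated into AIO), a minimal sketch might look like this; the repository and data paths are just placeholders:

```bash
# Standalone BorgBackup sketch -- paths are placeholders, adjust to your setup.
# Create an encrypted repository on the USB disk, then archive the data
# directory while skipping the trash and version folders.
borg init --encryption=repokey /mnt/usb/nc-borg-repo
borg create --stats --progress \
  --exclude '*/files_trashbin' \
  --exclude '*/files_versions' \
  /mnt/usb/nc-borg-repo::archive-{now} \
  /var/www/nextcloud/data
```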

I hope this helps, and feel free to share more details about your setup if you’d like tailored advice.

Thank you! This is all super clear and very good / much appreciated

My setup is:

• LXC container running on Proxmox
• since it is a few years old, it is a bit stale

Sounds like I basically want to harvest the data out of the Nextcloud data dir, and that will give me all the data, optionally using rsync or similar to pull the desired files and exclude the unwanted ones.

Since we’re going to a trivial ‘holding tank’ config, I will probably do a one-shot pull of the data tree using WinSCP and dump it to an NTFS local disk on a ‘data migration computer’, and that should be fine.

I’ll do a sanity test first (a pull of a smaller subdirectory) to make sure things all go to plan before doing the real, full, bigger pull.
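(Side note, in case it helps anyone later: an rsync dry run over SSH would be another way to do that sanity test. The host and paths below are illustrative, not my real ones.)

```bash
# -n = dry run: lists what would be copied without copying anything.
# Drop the -n for the real pull. 'nc-host' and 'someuser' are made up.
rsync -avn root@nc-host:/var/www/nextcloud/data/someuser/files/ ./test-pull/
```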

I’ll post a follow-up here to confirm once it’s all done and good, etc.

thank you!

Tim


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.