De-duplication of file data across the entire system?

Is there any way to have Nextcloud de-duplicate data across the whole system, even across different users? I know there is an extension that can ask whether you want to delete duplicate files within your personal file store. While useful, that’s not what I’m talking about.

Take the situation where multiple users have uploaded the same file or set of files to their personal file store. This is wasted space on the server: multiple copies of the same data. Is there a way to enforce a single instance of each unique file (determined by SHA-256 or a better hash), and then use hard links or something similar to provide a copy to every user or component that references that data? Ideally this would have “copy-on-write” semantics and would also delete the backend data when no references remain. I.e., Single Instance Storage.
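
To make the idea concrete, here is a minimal in-memory sketch of single-instance storage with reference counting. It is only an illustration of the concept described above; the class `SingleInstanceStore` and its methods are hypothetical names, not anything Nextcloud provides, and a real system would keep blobs on disk rather than in a dictionary.

```python
import hashlib


class SingleInstanceStore:
    """Toy single-instance store (hypothetical, not a Nextcloud API)."""

    def __init__(self):
        self.blobs = {}      # SHA-256 hex digest -> file content
        self.refcounts = {}  # digest -> number of (user, path) references
        self.files = {}      # (user, path) -> digest

    def put(self, user, path, data):
        """Store data for a user; identical content is kept only once."""
        digest = hashlib.sha256(data).hexdigest()
        old = self.files.get((user, path))
        if old == digest:
            return digest  # unchanged content, nothing to do
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
        # Copy-on-write in spirit: only this user's reference is re-pointed,
        # other users keep referencing the old blob.
        self.files[(user, path)] = digest
        if old is not None:
            self._release(old)
        return digest

    def get(self, user, path):
        return self.blobs[self.files[(user, path)]]

    def delete(self, user, path):
        self._release(self.files.pop((user, path)))

    def _release(self, digest):
        """Drop one reference and free the blob when none remain."""
        self.refcounts[digest] -= 1
        if self.refcounts[digest] == 0:
            del self.refcounts[digest]
            del self.blobs[digest]


store = SingleInstanceStore()
store.put("alice", "kids.jpg", b"photo bytes")
store.put("bob", "kids.jpg", b"photo bytes")
print(len(store.blobs))  # both users reference one stored blob
```

If one user later overwrites their copy with different content, only that user's reference moves to a new blob, and the original blob is freed once the last reference to it is deleted.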

My use case: I want to de-google my life, and a big part of that is my gargantuan, hundreds-of-gigabytes photo archive. My wife and I share many of the same photos, because kids. Literally the same photos, not just similar ones. It seems like the actual data contained in these photos should be de-duplicated and stored once across the whole system, while still showing my wife and me our own copies of those photos.

That’s asking two different things. I don’t think you can have one copy of a file in two different accounts. Just keep all photos in one account and share the top folder with your other user. Within a single account, you can get an app from the app store to de-duplicate files.

Yeah, I am expecting Nextcloud to do this at the system level, not through some app, which would have to run as a user.

I’m imagining this working as hard links in a Linux filesystem with copy-on-write semantics.
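
A small sketch of what plain hard links do on a Linux filesystem may help here. It creates two names for one file in a temporary directory (the file names are made up for illustration) and shows that both names point at the same inode, so an in-place write through one name is visible through the other. In other words, hard links on their own share data but do not give copy-on-write behaviour.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file, hard-link it, and modify it in place via the link."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "alice_photo.jpg")  # hypothetical names
        linked = os.path.join(root, "bob_photo.jpg")

        with open(original, "wb") as f:
            f.write(b"original")

        os.link(original, linked)  # second directory entry, same inode

        same_inode = os.stat(original).st_ino == os.stat(linked).st_ino
        link_count = os.stat(original).st_nlink

        # Overwrite the content in place through the second name.
        with open(linked, "r+b") as f:
            f.write(b"MODIFIED")

        # The change is visible through the first name as well.
        with open(original, "rb") as f:
            seen_via_original = f.read()

        return same_inode, link_count, seen_via_original


print(hard_link_demo())
```

Deleting one of the two names only removes that directory entry; the data is freed when the link count drops to zero, which matches the "delete when no references remain" part of the idea.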

This is quite a technically complicated issue, so I doubt it could be solved by Nextcloud itself. However, you might be able to address it at the filesystem level, for example with ZFS’s deduplication setting. Just note that it can come with a hefty performance hit.

The more straightforward solution is to structure your file storage so that users share a single copy of each file, either through the share mechanism or through shared folders such as group folders, and to manually move or de-duplicate shared content with something like fdupes: How I’ve used it in the recent past
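
For anyone who wants to see how much is actually duplicated before reorganizing, here is a rough Python sketch of the kind of check a tool like fdupes performs. It is not fdupes itself, and the function names are made up for this example. It only reads files and reports groups with identical SHA-256 digests; it does not delete or link anything.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large photos are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return {digest: [paths]} for files under root with identical content."""
    # Group by size first: files of different sizes cannot be identical,
    # so most files never need to be hashed.
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[sha256_of(path)].append(path)

    return {d: sorted(p) for d, p in by_hash.items() if len(p) > 1}
```

Calling `find_duplicates` on a directory that contains several users' folders returns one entry per set of identical files, which gives a list of candidates to move into a shared folder.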
