My remarks may be obsolete, so just in case, my version: Nextcloud Hub 3 (25.0.10)
I was annoyed by a bug in the communication protocol between (at least) the Android client app and my server (8 vCPU + 32 GB + up to 500 Mbit/s) while uploading big files, like videos. I suppose it occurs because the client times out before the server sends the ack. The client then retries and endlessly creates copies, because I configured the conflict handling as: “retry and keep both versions if a file with the same name exists, by renaming with (2)”, and I do not want to change this.
I ended up writing a small script that lists all files in the data directory and identifies duplicates based on md5, sha256 and size.
My script scanned all files and found that, out of my 500 GB of data, 100 GB (20%) were wasted on such duplicates. There are also files from the phone clients of 2 users who synchronize their WhatsApp picture folders, but the impact is minimal, because there is no endless loop and the files are compressed (by WhatsApp).
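The approach can be sketched roughly as follows. This is a minimal Python illustration, not the original script: the function names `file_digests`, `find_duplicates` and `wasted_bytes` are made up for this example, and it assumes the data directory is readable by the user running it. Files are first grouped by size so that only files sharing a size get hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_digests(path, chunk_size=1 << 20):
    """Return (md5, sha256) hex digests, reading the file in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def find_duplicates(root):
    """Group regular files under root by (size, md5, sha256).

    Returns only the groups that contain more than one file.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            by_size[os.path.getsize(path)].append(path)

    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot be a duplicate
        for path in paths:
            groups[(size, *file_digests(path))].append(path)
    return {key: sorted(p) for key, p in groups.items() if len(p) > 1}


def wasted_bytes(duplicates):
    """Bytes that would be freed by keeping one copy per group."""
    return sum(key[0] * (len(paths) - 1) for key, paths in duplicates.items())
```

Calling `find_duplicates("/path/to/nextcloud/data")` (a placeholder path) and passing the result to `wasted_bytes` gives a figure comparable to the 100 GB mentioned above.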
I then explored a bit and came up with a couple of suggestions. Do what you want with them; I can also post them elsewhere if that is more appropriate.
- continuous: the backend stores the md5 and sha256 of each file. The client can compare them with its local copy when a file with the same name already exists, to avoid unexpected duplicates
- daemon: create a size optimizer daemon which compares all files (by size + md5 + sha256, regardless of owner) to detect duplicates. When one is found, it is moved into a data_duplicate folder, and the actual duplicates are replaced by symbolic links pointing at this file. The daemon would also clean up by removing files in the data_duplicate folder that are no longer pointed at by any symbolic link
- It would be nice to have a “skip trash” option when deleting a file, with a confirmation overlay containing a “don’t ask again today” checkbox. → When one tries to free up storage and goes through several folders, it is annoying to remove a file, then go to the trash and delete it again.
- It is not possible to sort by size (to start by completely removing the biggest files, for example)
- It is not possible to search for a file easily unless one knows the EXACT name including its extension (no wildcard/regexp/partial/per-type option)
- If a sharee (not the owner but the one receiving the share) deletes a file, this file seems to end up in the trash of both the sharee and the owner. Maybe only the owner should see the file in his trash, or some trick should avoid duplicating the file on the system for every user (symbolic link?). I did not check the behaviour with 10 sharees, i.e. whether each of them gets such a duplicate…
- The search overlay hides the last columns (shared status, size, last modified) of the first 5 lines of results. A “quick filter” line (between the title line and the first element), in addition to the current search, might be appropriate. It would offer search options suited to each column (string search in the Name column, size bigger and/or smaller than, modified before xx and/or after xx, shared yes/no checkbox). This filter would not descend into child folders, but apply only to the current folder.
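
To make the daemon suggestion above a bit more concrete, here is a minimal sketch in Python of its two steps: replacing duplicates with symbolic links into a data_duplicate folder, and cleaning up stored files that nothing links to any more. The function names `sha256_of`, `deduplicate` and `clean_store` are hypothetical, nothing here is an existing Nextcloud API, and it assumes a POSIX filesystem with symlink support. It also ignores Nextcloud's own file index entirely, and I have not checked how Nextcloud would react to symlinks in its data directory, so this only illustrates the idea.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the sha256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deduplicate(paths, store_dir):
    """Keep one copy in store_dir and replace each identical file in
    paths with a symlink to it. Files whose content differs from
    paths[0] are left untouched."""
    os.makedirs(store_dir, exist_ok=True)
    digest = sha256_of(paths[0])
    target = os.path.abspath(os.path.join(store_dir, digest))
    for path in paths:
        if sha256_of(path) != digest:
            continue  # not an exact copy
        if os.path.exists(target):
            os.remove(path)           # a stored copy already exists
        else:
            shutil.move(path, target)  # first copy becomes the stored one
        os.symlink(target, path)
    return target


def clean_store(store_dir, data_root):
    """Delete stored files that no symlink under data_root points at.

    Returns the list of removed paths."""
    referenced = set()
    for dirpath, _dirnames, filenames in os.walk(data_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                referenced.add(os.path.realpath(path))
    removed = []
    for name in os.listdir(store_dir):
        stored = os.path.join(store_dir, name)
        if os.path.realpath(stored) not in referenced:
            os.remove(stored)
            removed.append(stored)
    return removed
```

Naming the stored file after its sha256 means a second run over the same content reuses the existing copy instead of storing another one.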