Symbolic link support

Please take care here: there are two ways in which symlinks can be supported by a cloud storage sync system.

  1. The first one is to store the link without ever dereferencing it.

    • It is the approach asked for in my previous post #18 (Symbolic link support).
    • It is by definition totally safe and can be supported on a per-client basis.
    • It can be useful to users who rely on symbolic links to organize material and provide shortcuts within document sets that have an internal directory structure or must obey specific rules about where things are found (LaTeX projects, source code, linked spreadsheets, etc.).
    • It should be relatively easy to implement.
    • It is consistent with what popular software like git does with symbolic links.
  2. The second one is to dereference the links, so that content living outside the directory that is meant to be synced gets synced anyway.

    • It works by placing a symbolic link to the content to be synced inside the directory under the control of the synchronization software.
    • It implies dereferencing, which may be inherently troublesome.
    • It is the Dropbox way of doing things. But it is at best quirky (since the sync client cannot be notified of changes happening outside the directory that is meant to be synced, which means that the sync operation cannot be reliably started right after a file is changed). There are a lot of messages on the web about this issue, and Dropbox itself discourages using it (https://www.dropbox.com/help/syncing-uploads/selective-sync-preferences-wont-update#symlinks).
    • IMHO, now that Nextcloud/Owncloud support multiple sync directories, it is not a very interesting feature, since a similar behavior can be obtained in a more robust way.
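To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not how any Nextcloud or Dropbox client works; `snapshot_entry` and `demo` are hypothetical names, and the sketch only handles links to regular files on a platform that supports symlinks.

```python
import os
import tempfile


def snapshot_entry(path, follow_links):
    """Return what a hypothetical sync client would upload for `path`."""
    if os.path.islink(path) and not follow_links:
        # Approach 1: record only the link text; the target is never opened.
        return {"kind": "symlink", "target": os.readlink(path)}
    # Approach 2 (or a plain file): open() follows the link and reads the target.
    with open(path, "rb") as fh:
        return {"kind": "file", "data": fh.read()}


def demo():
    """Build a throwaway tree and snapshot one link both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        sync_root = os.path.join(tmp, "sync")
        os.makedirs(sync_root)
        outside = os.path.join(tmp, "outside.txt")
        with open(outside, "wb") as fh:
            fh.write(b"data outside the sync root")
        link = os.path.join(sync_root, "shortcut")
        os.symlink(outside, link)
        return snapshot_entry(link, False), snapshot_entry(link, True)
```

With approach 1 only the short link text leaves the machine; with approach 2 the content of a file outside the sync directory does, which is why the second case needs more care.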

When asking about the symlink feature, I would like to recommend making clear which one of the two behaviors is being asked for. Ideally, there should be two independent feature requests for the two.

@callegar That is a great breakdown between the two.

Personally I am looking for the first scenario. All references are made within the sync directories.

Cheers,

I had several terabytes of files that I wanted to share with users on my new installation of Nextcloud. I did not want to upload them (and duplicate them). I did not want to move them from their current location.

My user folder in the data directory just has a symlink or two to the volume with the files. It looks nice and works well. I end up having to edit Storage/Local.php (or whichever one it is) with each upgrade, but it’s a single-line edit (switching a false to true).

Of course, with 12, it’s complaining that the hash doesn’t match what it’s expecting. Wish I knew how to turn that off.

There are all sorts of use cases for symlinks; limiting them to arbitrary use cases is just misguided. If there should be any limitations at all, those should be up to the admin, maybe on a per-user basis.

I do not think that the previous suggestion, to make clear what is being asked for, limits symlinks to arbitrary use cases.

The point is that there is a fundamental difference between two scenarios: one where links are never dereferenced (aka followed) by either the client or the server of Nextcloud, which is inherently secure, and one where they need to be dereferenced (followed, which I understand is your use case), which can be troublesome in terms of security, resource usage and timeliness of sync, and thus needs to be handled with more care.

To mention a few potential issues:

  • What if, deep in your data, you end up with a symbolic link to a place holding private, sensitive material of yours? This can happen because, say, your data is music and someone maliciously gives you a directory of mp3 files, and you do not notice that one of the files is not a file but a link to ~/Private. If links are dereferenced, you immediately start sharing sensitive data with many users. If users can upload symbolic links, the implications can be even more serious.

  • What if, deep in your data, you end up with a cycle of symbolic links: A -> B -> C -> D -> E -> A? Unless some (possibly costly) algorithm is added to the sync code, it may start looping on this cycle rather than doing what it should do.

  • What if your sync dir contains a link to a volume, as you say, and a file in the volume is changed? In many situations the sync software cannot be reliably notified of this change, and the syncing of the modified file may start with a significant delay.
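The first two issues can be illustrated with a small Python sketch of a tree walk that does follow links. It is only an illustration under stated assumptions: `walk_following_links` and its status strings are hypothetical, POSIX-style paths are assumed, and the change-notification issue is not addressed at all.

```python
import os


def walk_following_links(root):
    """Yield (path, status) for entries under root, following symlinks.

    status is "ok", "escapes-root", "unresolvable" or "revisit".
    """
    real_root = os.path.realpath(root)
    seen = set()  # (device, inode) of every directory already entered

    def visit(directory):
        st = os.stat(directory)  # stat() follows links
        key = (st.st_dev, st.st_ino)
        if key in seen:
            # Reached a directory a second time, e.g. via a link to an ancestor.
            yield directory, "revisit"
            return
        seen.add(key)
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.islink(path):
                if not os.path.exists(path):
                    # Dangling link, or a pure link-to-link cycle (A -> B -> A).
                    yield path, "unresolvable"
                    continue
                target = os.path.realpath(path)
                if os.path.commonpath([real_root, target]) != real_root:
                    # Following this link would expose data outside the root.
                    yield path, "escapes-root"
                    continue
            if os.path.isdir(path):
                yield from visit(path)
            else:
                yield path, "ok"

    yield from visit(root)
```

The `seen` set is the extra bookkeeping referred to above: without it, a link that points back to one of its own parent directories makes the walk recurse until it hits a path-length or recursion limit.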

This is not to say that your usage scenario is not interesting (even if I think that in some cases the possibility to sync multiple folders can substitute for your usage of symbolic links). It is to say that only users who know what they are doing should IMHO activate it. Dropbox itself (which only supports the scenario you are talking about) actively discourages relying on it, because it can lead to bad surprises for users who do not fully understand what they are doing.

This is why, IMHO, the two cases should be considered separately and, if both implemented, have independent and mutually exclusive ways to enable them.

As a final clarification, Nextcloud currently implements (but disables by default for security reasons and due to the hash mismatches) the scenario you are talking about, but does not implement the one (never dereference) that I am talking about.

Hey,

After reading this thread I am not quite sure what to conclude. Is symlink support something NC is considering? And not just on *NIX, but on Windows as well?

Thanks!

I am also very interested in this feature. Is there any plan to support it? I want it especially for Unix systems!

From an efficiency standpoint there’s little reason not to implement symlinks.
The cost of the cycle-checking algorithm is O(n), where n is the number of symlinks stored. It’s reasonably fast. In fact, optimizations can be made if you don’t assume that every single symlink participates in a trip through the graph: the cost of a check is then bounded by the number of nodes in the largest connected subgraph.
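Assuming the algorithm meant here is cycle detection over the stored links, the O(n) bound can be sketched as follows. Each link has exactly one target, so every link is walked at most once. The function name and the dict representation are hypothetical, chosen only for illustration.

```python
def find_link_cycles(links):
    """Find cycles among symlinks.

    `links` maps a link name to its target name; a target that is not
    itself a key is treated as a regular file or directory.
    Returns a list of cycles, each a list of link names.
    """
    state = {}  # name -> "active" (on the current chain) or "done"
    cycles = []
    for start in links:
        if start in state:
            continue  # already handled as part of an earlier chain
        chain = []
        node = start
        # Follow targets until we leave the link graph or meet a known node.
        while node in links and node not in state:
            state[node] = "active"
            chain.append(node)
            node = links[node]
        if state.get(node) == "active":
            # We ran back into the chain we are building: that tail is a cycle.
            cycles.append(chain[chain.index(node):])
        for name in chain:
            state[name] = "done"
    return cycles
```

For example, `find_link_cycles({"A": "B", "B": "C", "C": "D", "D": "E", "E": "A", "X": "A"})` reports the single cycle through A to E, while X merely points into it and is not reported.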

From a security standpoint, that’s a whole other can of worms.

Yes, please support symbolic links!! (with no dereferencing) We are trying to migrate our lab’s infrastructure to NextCloud, and the fact that symlinks aren’t copied at all breaks a lot of our workflow. :frowning:

Just read through the topic. Yep, I vote for symlinks without dereferencing as well.
It would be reaaaally nice if on Windows clients they were created as .lnk files automatically, and on Apple systems as their respective counterpart.

An optional feature to dereference symlinks would be nice of course, e.g. automatically dereferencing symlinks that point to something outside the data directory. If they are just used as an internal structuring tool, dereferencing of course doesn’t make any sense at all, as at least two copies would be synced.

  • But besides the issues already mentioned, how does the client know that it must not sync back the non-link file that is then on the server? It would need some client-side stored info about which files should stay links on the client side and which (different) files to sync back from the server.
  • Also, what happens if the file is changed server side? Sync the changed file back to the client, which then has two copies, including the older one that was linked to :thinking:? It would be nice if the symlink stayed and the target file were changed… BUT uhh, that is impossible, or at least unreliable, due to permissions, and an even worse security risk than mentioned above.
  • No idea how Dropbox handles these questions, but one way or the other, there will be heavily surprised end users…
  • From my point of view, because of these issues and questions, and because the complexity means users might expect things to be resolved differently than the software resolves them, symlinks should never be dereferenced. If one needs to have a file available in two locations client side, place the real file inside the Nextcloud data dir and the link outside.

I suggest doing something like this in small steps. The original use case was to support internal symlinks (aka no server-side de-reference), so do that first.

The implementation should be kept to this minimum, plus the notes about validating symlinks to ensure they are inside the sync directory (see my other comment). By doing it this way you cover the majority of use cases for symlinks while also minimising headaches for users.
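A minimal sketch of the validation step, assuming POSIX-style paths and assuming (my choice, not something stated in this thread) that absolute targets are simply rejected. `link_stays_inside` is a hypothetical helper, and the check is purely lexical: it never touches the filesystem.

```python
import posixpath


def link_stays_inside(sync_root, link_path, target):
    """Return True if `target`, as stored in the symlink at `link_path`,
    resolves to a location inside `sync_root`.

    `sync_root` and `link_path` are absolute POSIX paths; `target` is the
    raw link text.
    """
    if posixpath.isabs(target):
        # Assumption: absolute targets are refused outright.
        return False
    root = posixpath.normpath(sync_root)
    # A relative target is interpreted relative to the link's own directory.
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(link_path), target)
    )
    # commonpath compares whole components, so "/x/Nextcloud-old" is not
    # mistaken for something inside "/x/Nextcloud".
    return posixpath.commonpath([root, resolved]) == root
```

Because the check is lexical, it does not notice an intermediate directory that is itself a symlink leading outside the root; a real client would have to decide how to treat that case.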

The server-side de-referencing idea above is interesting but would need much more thought, and probably will take much longer to code and validate and may cause more support tickets being raised when it goes wrong. See how hard the first stage is to implement before taking this larger piece of work on.

Hello, I find this comment very clarifying. I understand that there are different use cases that might benefit from either implementation. I just restate the symbolic link processing strategies:

  • TYPE A: No dereferencing
  • TYPE B: Client dereferencing

Some opinions:
I came because I needed type B (deref). I implemented a PoC. It is only a PoC since it does not address the issues you mention (loop check, file system notifications).
I agree that the multiple sync folders functionality partially solves this issue, but it still has the problem of copying the data in the synced folder under the root folder. With symbolic links you can preserve your document organization and just symlink the material you want to publish under the root, without wasting space.
Regarding security, indeed, this should be added as an advanced opt-in option, targeted at power users who want this convenience.
I can also see that it would be more practical to have this option per synced root (per client).
Note that this is a client implementation. The server does not know anything about it. Clients on platforms that do not support symlinks likewise do no special processing.

Regarding type A (noderef), I can see use cases, but I think it should be limited to symbolic links targeting paths inside the sync root. To distribute links that link outside the root is not so practical, since:

  1. The path environment might differ from client to client (e.g. client a has a folder /opt/data/, client b does not). If you need a client to modify a path outside the sync root, with type B (since you know what you are doing) you can add a symlink only in the environments where you want it (e.g. only on client a and not on b).
  2. There is no cross-platform meaning for, e.g., “C:\data” vs “/home/data”.

I also consider this to be more difficult to implement, since you need to modify both client and server to be aware of a new file type. In addition, the client has to handle each platform specifically, and you need to check that links point inside the sync root.

And like you said, if both are to be implemented, they should be mutually exclusive. The client should also handle various corner cases:

  • client a type A, client b type B =>
    • handle name clashes, on both ends, ignore if type mismatch,
    • b skips remote links,
  • client a type A, client b no link processing
    • b skip remote links
  • client a type A, then change setting to no processing
    • what do we do on the remote? Do we delete the links? Just leave them? (Maybe this suggests that the link setting should be a remote setting shared by all clients?)
  • Another thought: links are first classified as external (linking outside the sync root) or internal. Then the type A setting applies only to internal links and type B only to external ones. Again, what if I change one link from external to internal?
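The corner cases above boil down to a small decision table. The following Python sketch spells it out; the enum, the function names and the action strings are all hypothetical, and it leaves the open questions (name clashes, switching settings later) unanswered.

```python
from enum import Enum


class LinkMode(Enum):
    NONE = "no link processing"
    TYPE_A = "no dereferencing"
    TYPE_B = "client dereferencing"


def action_for_local_link(mode):
    """What a client in `mode` does with a symlink found in its sync root."""
    if mode is LinkMode.TYPE_A:
        return "upload-as-link"
    if mode is LinkMode.TYPE_B:
        return "upload-target-content"
    return "ignore"


def action_for_remote_link(mode):
    """What a client in `mode` does with a link entry stored on the server."""
    if mode is LinkMode.TYPE_A:
        return "create-local-link"
    # Type B clients and clients without link processing skip remote links.
    return "skip"


def mode_for_link(is_internal):
    """The last idea in the list: pick the strategy per link, not per client."""
    return LinkMode.TYPE_A if is_internal else LinkMode.TYPE_B
```

Writing it down this way makes the awkward part visible: under the per-link variant, the result of `mode_for_link` flips when a link's target moves across the sync root boundary, and nothing here says what should then happen to the copy already on the server.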

Brain damage…

Anyway I might improve the type B thing.

Hello,

Just another view: I am using the fact that Nextcloud ignores links as a feature, to embed Git repositories inside directory trees synchronized with Nextcloud. To avoid issues, it is best to keep Nextcloud from updating files in Git repositories.

  • One solution would be to add every Git repository to Nextcloud’s list of ignored files. But that is not efficient if you have many repositories.

  • The other solution, which I adopted, was to create the original Git repositories in a hidden subfolder (e.g. “.subfolders”) and then use symlinks to present the Git repositories in the Nextcloud tree where I want them to be.
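The layout from the second solution can be sketched in Python as follows. `expose_repo` is a hypothetical helper that only builds the directory and the relative symlink; it makes no claim about how any sync client treats the hidden folder or the link.

```python
import os


def expose_repo(tree_root, repo_name, visible_dir):
    """Create <tree_root>/.subfolders/<repo_name> and a relative symlink to
    it at <tree_root>/<visible_dir>/<repo_name>. Returns the link path."""
    real_repo = os.path.join(tree_root, ".subfolders", repo_name)
    os.makedirs(real_repo, exist_ok=True)
    link_parent = os.path.join(tree_root, visible_dir)
    os.makedirs(link_parent, exist_ok=True)
    link = os.path.join(link_parent, repo_name)
    # A relative target keeps the link valid if the whole tree is moved.
    os.symlink(os.path.relpath(real_repo, link_parent), link)
    return link
```

For example, `expose_repo("/home/me/Nextcloud", "myrepo", "projects")` would leave the repository in `.subfolders/myrepo` and a link `projects/myrepo` pointing at `../.subfolders/myrepo` (the paths here are made up).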

In case Nextcloud is “upgraded” to deal with symlinks, I would like to have a setting that tells Nextcloud to ignore them (as now).

Hi,

I would like to emphasize the point again that symlinks used to be synchronized. When this feature was removed it totally broke my setup so I’m still using an ancient version of the desktop client.

@cgraefe what is the last version that supports symbolic links?

This is a long thread. But as I understand it, you’re discussing allowing and managing filesystem symlinks as made by ln -s?

In my opinion it would be better to introduce internal links, similar to shares. This way we solve security issues as well as access control. A bonus is that we do not depend on whether the underlying filesystem supports symlinks.

We already support links to individual objects. We do need a way to represent them as file links, and possibly deal with conversion of symlinks to NC links.

For example:

The link could be stored in a plain file.link containing something like this, but preferably in xml or json.

[NC16 link] 
https://example.com/index.php/f/167231
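As a rough idea of what the JSON variant could look like, here is a Python sketch. The field names `format` and `url` are invented for this example; nothing like this file format exists in Nextcloud as far as this thread shows.

```python
import json


def make_link_file(url, version="NC16"):
    """Serialize a hypothetical link file as JSON text."""
    return json.dumps({"format": f"{version} link", "url": url}, indent=2)


def read_link_file(text):
    """Return the URL stored in a hypothetical JSON link file."""
    data = json.loads(text)
    return data["url"]
```

A client that does not understand the format would still sync the file as plain text, which is part of the appeal of this approach.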

Dropbox has this solved somehow. Maybe check how they do it?

Symlink support is essential for many non-trivial use-cases on both MacOS and Linux. It’s not a magical thing. I’m honestly surprised that Nextcloud doesn’t support them, this should be flagged as a bug.

Again, other cloud storages support symlinks, so why not copy their method?

Well, you may try it with the Link editor app, which is more about external links to websites; however, you may use it for this specific case.

I was searching for a way to use symlinks in the web interface and I ended up trying “Link editor” app.
Two problems:

  1. It works only on the web interface, so the links become useless on native file managers.
  2. Nextcloud sees it as an external link, so instead of loading the target folder or file with Ajax, it reloads the entire webpage.

They have an open source client? With a compatible license?

As a side note, Nextcloud symlinks (with or without real filesystem symlinks) would be essential for creating a normal picture album app where the albums may include images from file storage, or where several albums include the same image with different permissions, sharing attributes and comments.

The other album solution is basically using the database to create a symlinking service just for the albums, which sounds like a waste of time implementing a common useful feature for just one app; it would be a generic useful feature to share groups of files with different groups of people, having separate comments and chats.
