Symbolic link support

I understand why symlinks have been disabled, and I'm rethinking the way I do and organise things. But I'm encountering a problem: I can no longer modify the affected folders through the web interface or through the sync client. I tried to delete these symlinks directly on the server, but it seems they are still in the MySQL database, so the client keeps restoring folders I try to delete.


OK. First, delete the symlinks. Then run sudo -u www-data php occ files:scan user_ID (on Debian).
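If you are unsure which entries are the leftovers, here is a rough Python sketch that lists every symlink under a user's files folder without following any of them, so they can be removed before the rescan. The function name find_symlinks and the example path are only illustrations; the real location of the data directory depends on your installation.

```python
import os
from pathlib import Path

def find_symlinks(root):
    """List every symlink below root without following any of them."""
    found = []
    # With the default followlinks=False, os.walk lists symlinked directories
    # in dirnames but never descends into them.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                found.append(path)
    return sorted(found)

if __name__ == "__main__":
    # Hypothetical location; adjust to your own data directory and user.
    for link in find_symlinks("/var/www/nextcloud/data/user_ID/files"):
        print(link)
```

Removing each result with os.unlink(path) deletes only the link itself, not its target; the occ files:scan step should then bring the database back in line with the disk.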

Hi folks, +1 from me on this one.

Symlinks may seem like a gimmick, but for some of us they're hugely useful. May I make some implementation suggestions?

  • no dereferencing server-side, only client-side
  • symlinks have to point to other files inside your Nextcloud data dir (which to me means relative links only, and no use of '..' to escape the data dir), so the server would need to validate this whenever a link is uploaded or modified

I believe these will avoid most security issues, including the /etc/passwd suggestion above. The only vector this doesn't handle is loops: if some malicious user creates an infinite loop of symlinks, the apps reading it on the client's operating system should handle this anyway, since such loops can also be created on systems not running Nextcloud.
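A minimal sketch of the server-side check from the second bullet, assuming the server sees each link as a path relative to the user's root plus the raw target string. The name is_safe_link_target is hypothetical. This is a purely lexical check: it can be fooled when the link's own path passes through another symlinked directory, because the OS applies '..' after following that intermediate link, so a stricter server would also compare the fully resolved path (e.g. os.path.realpath) against the data dir.

```python
import posixpath

def is_safe_link_target(link_path: str, target: str) -> bool:
    """Lexically check that a symlink stays inside the user's root.

    link_path: location of the link relative to the root, e.g. "docs/foo.txt"
    target:    raw target string as uploaded, e.g. "../bar.txt"
    """
    # Relative links only. Rejecting backslashes is an extra assumption so that
    # Windows-style targets such as "..\\..\\x" cannot slip through.
    if not target or posixpath.isabs(target) or "\\" in target:
        return False
    # Resolve the target against the directory that contains the link.
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(link_path), target))
    # normpath keeps leading ".." components, so an escape shows up here.
    return resolved != ".." and not resolved.startswith("../")
```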

Other apps, notably git, make use of these, and now that NTFS's support for symlinks is more public, their use will only increase under Windows. It'd be a great feature to have and might open up some new possibilities for how Nextcloud can be used. Building in symlink support will also allow Nextcloud to understand symlinks that come in from external storage providers.

J


I also agree: it would need to be supported securely, but this is an excellent feature.


+1 to this.

I had proposed a subset of this functionality. It may not be useful for this backup situation, but it is related to the main topic of symbolic links.

Hi, thanks for Nextcloud!

I would like to know if there is any news on the state of symlink support with respect to the sync client.

First, I would like to clarify a few points:

  1. In talking about symlinks, I am not interested in them as a way of using Nextcloud as a backup strategy.

  2. I am interested in storing symlinks "as they are", with no dereferencing whatsoever. With no dereferencing, symlinks cannot be a security risk.

  3. I know that not all platforms support symlinks equally well. To me, it would be OK to have the sync client:

    • Recognize the existence of a symlink on the client host and store it on the server as a file. For instance, if foo.txt is a symbolic link to ../bar.txt, I would be happy to see on the server a foo.txt file with the content symlink -> ../bar.txt. Then, the other way round, when fetching something that is on the server but not on the client, check whether the item to be synced is a file whose content matches the symlink magic and, (i) if so and (ii) if on a platform supporting symlinks, change the file on the client into a symbolic link.
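The round trip described in point 3 could look roughly like this in Python. It assumes the magic is exactly the prefix symlink -> used in the example (a real client would probably want a less collision-prone marker), and the helper names are hypothetical. It also assumes the downloaded path does not exist yet, since os.symlink refuses to overwrite an existing entry.

```python
import os

MAGIC = "symlink -> "  # assumed marker, copied from the example above

def link_to_placeholder(path: str) -> str:
    """Content uploaded to the server in place of the symlink at `path`."""
    return MAGIC + os.readlink(path)

def placeholder_target(content: str):
    """Return the link target if `content` is a placeholder, else None."""
    if content.startswith(MAGIC) and "\n" not in content:
        target = content[len(MAGIC):]
        return target or None
    return None

def materialize(path: str, content: str, symlinks_supported: bool) -> None:
    """Write a downloaded item, turning placeholders back into symlinks."""
    target = placeholder_target(content)
    if target is not None and symlinks_supported:
        # (i) content matches the magic and (ii) the platform supports links.
        os.symlink(target, path)
    else:
        # Ordinary data, or a platform without symlinks: keep it a plain file.
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
```

This mirrors what git does when core.symlinks is false: symbolic links are checked out as small plain files that contain the link text.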

Rationale

  1. To me, symlink support is critical to storing and sharing stuff in Nextcloud, because symlinks are part of how data is made accessible and searchable in large directory structures. Furthermore, when certain LaTeX documents are stored in a directory structure, symlinks ensure that LaTeX finds things where it expects them to be, without having to duplicate files. When certain spreadsheets are stored in a directory structure, they ensure that linked spreadsheets are found where they are expected to be without duplicating files on the client. When some code is stored in a directory structure, symlinks may be used to ensure that the makefile finds things where it expects them to be.

  2. The behaviour described in point 3 above is not disruptive; it can be switched on/off per client; it does not require any support in the server (just in the sync client); it is acceptable for clients running on OSs that do not support symlinks; it is compatible with WebDAV; it lets symlinks be discovered and explored easily with the web interface, since they are stored as regular files that are easily recognizable by their content; and, since links are dereferenced neither by the server nor by the sync software, it does not pose any security issue.

  3. The behaviour described in point 3 above is consistent with the symlink support provided by other software (e.g. Git for Windows, aka msysgit) on platforms that do not have good support for symlinks.


I'm posting to this thread to add my support for symlinks, because they are immensely useful. Security considerations can easily be addressed, so they are not a valid reason to avoid implementing this feature. Also, Dropbox supports symlinks.


@miguelg Are you sure Dropbox supports symlinks?

Also, here is a link to a feature request that is related to symlinks

I'm definitely sure that Dropbox supports it, @kmcb. Here's a screenshot that proves it:


Please take care here:

There are two ways in which symlinks can be supported by a cloud storage sync system.

  1. The first one is to store the link without ever dereferencing it.

    • It is the approach asked for in my previous post #18 (Symbolic link support)
    • It is by definition totally safe and can be supported on a per-client basis.
    • It can be useful to users who use symbolic links to organize material and provide shorthands within document sets that have an internal directory structure or must obey specific rules about where things must be found (LaTeX projects, source code, linked spreadsheets, etc.)
    • It should be relatively easy to implement.
    • It is consistent with what popular software like git does with symbolic links.
  2. The second one is to dereference the links, so that material living outside the directory that is meant to be synced gets synced anyway.

    • It works by placing a symbolic link to the material to be synced inside the directory under the control of the synchronization software.
    • It implies some dereferencing, which may inherently be troublesome.
    • It is the Dropbox way of doing things. But it is quirky at best: the sync client cannot be notified of changes happening outside the directory that is meant to be synced, so the sync operation cannot reliably start right after a file is changed. There are a lot of messages on the web about this issue, and Dropbox itself discourages using it (https://www.dropbox.com/help/syncing-uploads/selective-sync-preferences-wont-update#symlinks).
    • IMHO, now that Nextcloud/ownCloud support multiple sync directories, it is not a very interesting feature, since similar behavior can be obtained in a more robust way.

When asking about the symlink feature, I would like to recommend making clear which one of the two behaviors is being asked for. Ideally, there should be two independent feature requests for the two.


@callegar That is a great breakdown between the two.

Personally I am looking for the first scenario. All references are made within the sync directories.

Cheers,

I had several terabytes of files that I wanted to share with users on my new installation of Nextcloud. I did not want to upload them (and duplicate them). I did not want to move them from their current location.

My user folder in the data directory just has a symlink or two to the volume with the files. It looks nice and works well. I end up having to edit Storage/Local.php (or whichever one it is) with each upgrade, but it's a single-line edit (switching a false to true).

Of course, with version 12, it complains that the hash doesn't match what it expects. I wish I knew how to turn that off.

There are all sorts of use cases for symlinks; limiting them to arbitrary use cases is just misguided. If there should be any limitations at all, they should be up to the admin, maybe on a per-user basis.


I do not think that the previous suggestion to make clear what is being asked for limits things to arbitrary use cases.

The point is that there is a fundamental difference between two scenarios: one where links are never dereferenced (aka followed) by either the Nextcloud client or server, which is inherently secure, and one where they need to be dereferenced (followed, which I understand is your use case), which can be troublesome in terms of security, resource usage and timeliness of sync, and thus needs to be handled with more care.

To mention a few potential issues:

  • What if, deep in your data, you end up having a symbolic link to a place with private, sensitive stuff of yours? You may get that because your data is music and someone maliciously gives you a directory of mp3 files, and you do not notice that one of the files is not a file but a link to ~/Private. If links are dereferenced, you immediately start sharing sensitive data with many users. If users can upload symbolic links, the implications can be even more serious.

  • What if, deep in your data, you end up having a cycle of symbolic links: A -> B -> C -> D -> E -> A? Unless some (possibly costly) algorithm is added to the sync code, it may start looping endlessly on this cycle rather than doing what it should do.

  • What if your sync dir contains a link to a volume, as you say, and a file in the volume is changed? In many situations the sync software cannot be reliably notified of this change, and syncing the modified file may start only after a significant delay.

This is not to say that your usage scenario is not interesting (even if I think that in some cases the possibility to sync multiple folders can substitute for your use of symbolic links). It is to say that IMHO only users who know what they are doing should activate it. Dropbox itself (which only supports the scenario you are talking about) actively discourages relying on it, because it can lead to bad surprises for users who do not fully understand what they are doing.

This is why, IMHO, the two cases should be considered separately and, if both are implemented, have independent and mutually exclusive ways to enable them.

As a final clarification, Nextcloud currently implements (but disables by default for security reasons and due to the hash mismatches) the scenario you are talking about, but does not implement the one (never dereference) that I am talking about.


Hey,

After reading this thread I am not quite sure what to conclude. Is symlink support something NC is considering? And not just on *NIX, but on Windows as well.

Thanks!

I am also very interested in this feature. Is there any plan to support it? I want it especially for Unix systems!


From an efficiency standpoint there's little reason not to implement symlinks.
The cost of a loop-detection algorithm is O(n), where n is the number of symlinks stored. It's reasonably fast. In fact, optimizations can be made if you don't assume that every single symlink participates in a trip through the graph; the cost is then limited by the maximum number of nodes in a connected subgraph.
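To make the O(n) claim concrete, here is a Python sketch of loop detection over a table of links, assuming each link's target has already been normalized to the same path format as the keys (a hypothetical pre-processing step). Each link is followed at most once across all walks, and a walk only touches links reachable from its starting point, which matches the point about connected subgraphs.

```python
def find_looping_links(links: dict[str, str]) -> set[str]:
    """Return the links whose chain of targets never reaches a non-link.

    `links` maps each symlink's normalized path to the normalized path it
    points to. Every link is pushed onto a chain at most once in total,
    so the running time is O(n) in the number of links.
    """
    verdict: dict[str, bool] = {}   # link -> True if it ends up in a loop
    for start in links:
        if start in verdict:
            continue
        chain, on_chain = [], set()
        node = start
        # Follow targets until we leave the link table, meet a link whose
        # verdict is already known, or come back to a link on this chain.
        while node in links and node not in verdict and node not in on_chain:
            chain.append(node)
            on_chain.add(node)
            node = links[node]
        loops = node in on_chain or verdict.get(node, False)
        for link in chain:
            verdict[link] = loops
    return {link for link, loops in verdict.items() if loops}
```

With the cycle from the earlier example plus one extra link pointing into it, every link in the cycle and the one leading into it are reported, while a chain ending at a real file is not.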

From a security standpoint, that's a whole other can of worms.

Yes, please support symbolic links!! (with no dereferencing) We are trying to migrate our lab's infrastructure to Nextcloud, and the fact that symlinks aren't copied at all breaks a lot of our workflow. :frowning:

Just read through the topic. Yep, I vote for symlinks without dereferencing as well.
It would be really nice if on Windows clients they were automatically created as .lnk files, and on Apple systems as their respective counterpart.

An optional feature to dereference symlinks would be nice, of course, e.g. automatically dereferencing symlinks if they point to something outside the data directory. If they are just used as an internal structuring tool, dereferencing of course makes no sense at all, since at least two copies would be synced.

  • But besides the issues already mentioned, how does the client know that it doesn't have to sync back the non-link file that is then on the server? It would need some client-side stored information about which files on the client side should stay links and which (different) files to sync back from the server.
  • Also, what happens if the file is changed on the server side? Sync the changed file back to the client, which then has two copies, including the older one that was linked to :thinking:? It would be nice if the symlink stayed but the target file were changed... but that is impossible, or at least unreliable, due to permissions, and an even worse security risk than those mentioned above.
  • I have no idea how Dropbox handles these questions, but one way or the other, there will be heavily surprised end users...
  • From my point of view, because of these issues and questions, and because its complexity means users might expect it to be resolved differently than the software does, symlinks should never be dereferenced. If one needs to have a file available in two locations on the client side, place the real file inside the Nextcloud data dir and the link outside.

I suggest doing something like this in small steps. The original use case here was to support internal symlinks (i.e. no server-side dereferencing), so do that first.

The implementation should be kept to this minimum, plus the notes about validating symlinks to ensure they are inside the sync directory (see my other comment). By doing it this way you cover the majority of use cases for symlinks while also minimising headaches for users.

The server-side dereferencing idea above is interesting but would need much more thought; it will probably take much longer to code and validate, and may cause more support tickets to be raised when it goes wrong. See how hard the first stage is to implement before taking on this larger piece of work.

Hello, I find this comment very clarifying. I understand that there are different use cases that might benefit from either implementation. Let me restate the symbolic link processing strategies:

  • TYPE A: No dereferencing
  • TYPE B: Client dereferencing

Some opinions:
I came because I needed type B (deref). I implemented a PoC. It is only a PoC since it does not address the issues you mention (loop check, file system notifications).
I agree that the multiple sync folders functionality partially solves this issue, but again it has the problem of copying the data into a synced folder under the root folder. With symbolic links you can preserve your document organization and just symlink the stuff you want to publish under the root, without wasting space.
Regarding security, this should indeed be added as an advanced opt-in option, targeted at power users who want this convenience.
Also, I think it would be more practical to have this option per synced root (per client).
Note that this is a client implementation; the server does not know anything about it. Clients on platforms that do not support symlinks also do no special processing.
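One common way to add the missing loop check to a type B (dereferencing) client is to remember the (st_dev, st_ino) pair of every directory already walked and refuse to enter it again. The Python sketch below does that with os.walk(followlinks=True); the function name is hypothetical, it only de-duplicates directories (a file reached through two different links is still reported twice), and it does nothing about file system notifications.

```python
import os

def walk_following_links(root):
    """Yield regular files reachable from root, following symlinks, without
    entering the same directory twice (this is what breaks link loops)."""
    visited = set()
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        try:
            st = os.stat(dirpath)          # follows symlinks
        except OSError:
            dirnames[:] = []
            continue
        key = (st.st_dev, st.st_ino)
        if key in visited:
            dirnames[:] = []               # already walked: do not descend again
            continue
        visited.add(key)
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skips dangling and self-referencing links, which os.walk
            # reports as plain names in filenames.
            if os.path.isfile(path):
                yield path
```

Because os.walk is top-down by default, each directory is reported before its children are visited, so pruning dirnames on the second visit is enough to stop a cycle.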

Regarding type A (noderef), I can see use cases, but I think it should be limited to symbolic links targeting paths inside the sync root. Distributing links that point outside the root is not very practical, since:

  1. The path environment might differ from client to client (e.g. client a has folder /opt/data/, client b does not). If you need a client to modify a path outside the sync root, with type B (since you know what you are doing) you can add a symlink only in the environments you want (e.g. only on client a and not on b).
  2. There is no cross-platform meaning for such paths, e.g. "C:\data" vs "/home/data".

Also, I consider this more difficult to implement, since you need to modify both client and server to be aware of a new file type. In addition, the client has to handle each platform specifically. You also need to sanitize links so that they point inside the sync root.

And as you said, if both are implemented, they should be mutually exclusive. The client should also handle various corner cases:

  • client a type A, client b type B =>
    • handle name clashes on both ends; ignore on type mismatch
    • b skips remote links
  • client a type A, client b no link processing
    • b skips remote links
  • client a type A, then its setting changes to no processing
    • what do we do on the remote? Do we delete the links? Just leave them? (Maybe this suggests that the link setting should be a remote setting shared by all clients?)
  • Another thought: links are first classified as external (pointing outside the sync root) or internal. Then the type A setting applies only to internal links and type B only to external ones. Again, what if I change a link from external to internal?
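The internal/external split in the last bullet could be computed on the client by resolving each link and checking whether the result stays under the sync root. A Python sketch with hypothetical names; the type A / type B mapping follows the bullet. Because the answer depends on where the link currently resolves, it would have to be recomputed on every scan, which is exactly why a link switching from external to internal is awkward.

```python
import os

def classify_link(link_path: str, sync_root: str) -> str:
    """Return "internal" if the symlink resolves inside sync_root, else "external"."""
    root = os.path.realpath(sync_root)
    # realpath follows the whole chain, including intermediate directory links.
    target = os.path.realpath(link_path)
    try:
        inside = os.path.commonpath([root, target]) == root
    except ValueError:  # e.g. paths on different drives on Windows
        inside = False
    return "internal" if inside else "external"

def action_for(link_path: str, sync_root: str) -> str:
    """Hypothetical policy from the bullet: type A for internal, type B for external."""
    if classify_link(link_path, sync_root) == "internal":
        return "store link as-is (type A)"
    return "dereference and sync target (type B)"
```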

Brain damage...

Anyway, I might improve the type B implementation.