Using Nextcloud as a data management solution for AI training

Hi all,

Has anyone used Nextcloud as a data management solution for AI training, as a replacement for NFS? Say I have some imaging datasets, and I use Nextcloud to store, view, or share them with permissions. Now I want to mount that dataset folder for the actual training. Is that an OK solution?

The specs of my system are:
TrueNAS: 256 GB RAM, 40-core CPU, 150 TiB storage capacity
Nextcloud installed as a TrueNAS app

Thank you all

What are your requirements? If you access the data directly via WebDAV, that shouldn't be a problem, but is it fast enough?
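For reference, Nextcloud exposes each user's files over WebDAV under `remote.php/dav/files/<user>/`. The sketch below uses only the Python standard library to build a GET request for one file and stream its body in chunks. The host, user name, app password, and file path are placeholders, and a real client would add error handling and TLS settings appropriate to the LAN.

```python
import base64
import urllib.parse
import urllib.request
from typing import Iterator


def build_webdav_get(base_url: str, user: str, app_password: str, rel_path: str) -> urllib.request.Request:
    """Build (but do not send) a WebDAV GET request for one file in a Nextcloud user's storage."""
    # Nextcloud's per-user WebDAV endpoint: <base>/remote.php/dav/files/<user>/<path>
    quoted_path = urllib.parse.quote(rel_path.lstrip("/"))  # keeps "/" and escapes spaces etc.
    url = f"{base_url.rstrip('/')}/remote.php/dav/files/{urllib.parse.quote(user)}/{quoted_path}"
    # HTTP Basic auth; assumes an app password rather than the account password
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    return urllib.request.Request(url, method="GET", headers={"Authorization": f"Basic {token}"})


def stream_file(req: urllib.request.Request, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Send the request and yield the body in chunks so large images are never fully buffered."""
    with urllib.request.urlopen(req) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Usage (placeholders, requires a reachable Nextcloud instance):
# req = build_webdav_get("https://nextcloud.lan", "usera", "app-password", "datasets/D/img_0001.png")
# raw = b"".join(stream_file(req))
```

Every sample fetched this way is a separate HTTP request handled by Nextcloud's PHP stack, which is one reason raw NFS or SMB access is usually faster for bulk reads.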

If you are looking for the best raw speed, other solutions are faster (NFS, SCP, …), or Syncthing if you need to sync across different devices, …

I have a large folder containing some datasets, about 100 TiB, hosted on TrueNAS SCALE and accessible only over the LAN (10 Gbit/s connection).
I want a simple ACL solution for those datasets, for example giving a specific user write permission but not delete permission. I also want to use the full 10G link speed.
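The "write but not delete" requirement fits permission models that treat delete as its own right, as NFSv4-style ACLs do (they list reading data, writing data, appending data, and deleting as separate permissions). The toy model below only illustrates that idea; the user and dataset names are hypothetical, and it is not how TrueNAS stores or evaluates ACLs.

```python
from enum import Flag


class Perm(Flag):
    """Toy permission bits, loosely modeled on NFSv4-style ACL rights."""
    NONE = 0
    READ = 1
    WRITE = 2
    APPEND = 4
    DELETE = 8


# Hypothetical ACL: dataset -> user -> granted rights
ACL = {
    "dataset_D": {
        "admin": Perm.READ | Perm.WRITE | Perm.APPEND | Perm.DELETE,
        "user_a": Perm.READ,                                 # read-only, e.g. for training
        "user_b": Perm.READ | Perm.WRITE | Perm.APPEND,      # may add/modify files, may not delete
    },
}


def allowed(user: str, dataset: str, wanted: Perm) -> bool:
    """Return True only if every requested right is granted; unknown users get nothing."""
    granted = ACL.get(dataset, {}).get(user, Perm.NONE)
    return (granted & wanted) == wanted
```

For example, `allowed("user_b", "dataset_D", Perm.DELETE)` is `False` while `allowed("user_b", "dataset_D", Perm.WRITE)` is `True`.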

I haven't tried NFS; I'm somewhat overwhelmed and don't know how to set up an ACL.
SCP and Syncthing might not be an option, since none of the client PCs has large storage.

I imagine a basic use case like this:

  • User A wants to train a network with dataset D
  • User A contacts an administrator to get read permission on dataset D
  • User A then mounts dataset D via WebDAV or some other direct method and points his/her code at it
  • User A then trains directly from the mounted storage, without any copying.
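The last two steps amount to treating the mount point as an ordinary directory and reading each sample on demand. A minimal standard-library sketch, assuming the share (NFS, SMB, or a WebDAV mount) is already mounted at a hypothetical path such as `/mnt/datasets/D`; a real pipeline would decode the bytes into images and usually wrap this in its framework's dataset class (for example PyTorch's `Dataset`).

```python
from pathlib import Path
from typing import Iterator, Tuple


class MountedDataset:
    """Index files under a mounted directory once, then read each sample lazily from the share."""

    def __init__(self, root: str, suffixes: Tuple[str, ...] = (".png", ".jpg", ".tif")):
        self.root = Path(root)
        # Only the file list is built up front; no sample data is copied to local disk.
        self.files = sorted(
            p for p in self.root.rglob("*") if p.is_file() and p.suffix.lower() in suffixes
        )

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> bytes:
        # Reads from the mount: over the network unless the client already has it cached.
        return self.files[idx].read_bytes()

    def __iter__(self) -> Iterator[bytes]:
        for i in range(len(self)):
            yield self[i]


# Usage (hypothetical mount point):
# ds = MountedDataset("/mnt/datasets/D")
# for raw in ds:
#     ...  # decode and feed to the training loop
```

On a very large tree, walking the share with `rglob` can itself take a while over the network, so keeping a precomputed file list next to the dataset is a common alternative.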

Did you consider iSCSI?

I think he would run into problems with iSCSI because it sounds like he would need another share on top of it to manage the permissions and allow for multiple client access. NFS might be a good fit. Or maybe even SMB.

You could use Nextcloud for this, but it seems like an awful lot of excess features and complexity just to manage share permissions with LAN systems.

No problem, since he is already running TrueNAS: Adding an iSCSI Share

OP:

Very slow, much overhead…

iSCSI is AFAIK the fastest solution.

I would give it a try!

I have used iSCSI for some other services that require no permissions at all. I think iSCSI sharing is not capable of advanced ACLs, which is one of my requirements.
I used SMB, but without ACLs, because I found configuring those ACLs not that easy. The sharing mechanism in Nextcloud is very much like Google Drive and far more intuitive.
Is there any performance benchmark for WebDAV in Nextcloud?

You can't compare it to the others; there is a database managed in the background, and that takes resources. Using 10% of the network capacity should be easy; reaching 50% or more gets more difficult (set up caching etc. according to your use case).
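To put those percentages in perspective for this setup: a 10 Gbit/s link carries at most 1.25 GB/s before protocol overhead, and the dataset mentioned earlier is about 100 TiB. The sketch below is plain unit conversion, assuming every epoch reads the whole dataset over the network with no client-side caching; the 10% and 50% figures are the ones from this reply.

```python
LINK_GBIT_PER_S = 10   # 10 Gbit/s LAN link
DATASET_TIB = 100      # approximate dataset size from the thread


def hours_per_epoch(utilization: float) -> float:
    """Hours to read the whole dataset once at a given fraction of link capacity."""
    bytes_per_s = LINK_GBIT_PER_S * 1e9 / 8 * utilization  # decimal gigabits -> bytes per second
    dataset_bytes = DATASET_TIB * 2**40                     # TiB is a binary unit
    return dataset_bytes / bytes_per_s / 3600


for util in (0.10, 0.50, 1.00):
    print(f"{util:4.0%} of link: {hours_per_epoch(util):6.1f} h per full pass")
```

That works out to roughly 24 h per full pass at line rate, about 49 h at 50%, and about 244 h (over 10 days) at 10%, so the achievable fraction of the link matters a lot at this dataset size.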

What you can do is use NFS etc. natively when you need high-performance access. You can then also add that storage to Nextcloud as external storage, for users who just want to browse the data, add something, etc.


Except that iSCSI doesn't fit his use case at all, since he would still have to run NFS etc. on top of it to handle folder-based permissive access from multiple users/clients. iSCSI is designed for use by a single device or a server cluster and is very good for that purpose. Not so good for this.


With all due respect, that’s just not true.

Could you show an example of using iSCSI with ACLs?

I'm afraid it is, in order to do what he asked. He specifically said he wants user-level access control of subfolders, and iSCSI offers no such control. You're going to send him on a wild goose chase with this suggestion.

iSCSI is a block level storage protocol, and the filesystem (as well as all the granular access control) is completely in the hands of the initiator. The only access control at the target is basically whether the connection is allowed.
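To make that distinction concrete: a target-side access decision is essentially "may this initiator connect?", typically based on the initiator's IQN and the client network (optionally plus CHAP credentials). The model below is purely illustrative, with hypothetical IQNs and networks, and in this toy version an empty list means "no restriction". It shows that the target never sees files or folders, only whole-LUN access.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import List, Set


@dataclass
class IscsiTargetAcl:
    """Toy model of target-side iSCSI access control: whole-LUN, per-initiator only.

    There is deliberately no "path" argument anywhere: once connected, the initiator
    sees raw blocks, and any file/folder permissions come from the filesystem it puts on them.
    """
    allowed_initiators: Set[str] = field(default_factory=set)  # initiator IQNs
    allowed_networks: List[str] = field(default_factory=list)  # CIDR ranges

    def may_connect(self, initiator_iqn: str, client_ip: str) -> bool:
        iqn_ok = not self.allowed_initiators or initiator_iqn in self.allowed_initiators
        net_ok = not self.allowed_networks or any(
            ip_address(client_ip) in ip_network(net) for net in self.allowed_networks
        )
        return iqn_ok and net_ok
```

Per-user, per-folder rules like "user A may read dataset D" have no place in this model, which is why they would need a filesystem and sharing layer (NFS, SMB, Nextcloud, …) on top.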

Try it, then decide if you need something else.


I'm going to try it. I'm considering using Atlassian Crowd → OpenLDAP → TrueNAS NFS to handle permissions. Will report back here soon.