Terrible performance and reliability

Nextcloud version: 32.0.6
Nextcloud Desktop version: 3.17 and later

We’re running a Nextcloud instance with ~500,000 files (~1TB) using SMB as a storage backend. Nextcloud runs on a VM on the storage host. Networking is done via virtio.

We’ve had consistent issues with the desktop client: we update to a stable version, then have to roll back because of breaking bugs in the stable release on Windows. Since version 3.17, the Linux clients have been completely broken and refuse to download and upload properly.

After configuring a client, it takes at least 40-50 minutes before the client is usable with VFS.

Are these issues we would have in the Enterprise edition?

My assumption, and what I’ve expressed so far, is that SMB (CIFS) is the bottleneck because of IOPS, and that a native filesystem should be used instead. I don’t know how correct I am in saying this, but it has been my experience with other products of a similar nature.
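As a rough back-of-the-envelope check on the IOPS theory, the numbers above can be turned into an average time budget per file. This is only a sketch: it assumes the 40-50 minute setup time is spent on per-file metadata work spread evenly over all ~500,000 files, which has not been measured here, and `per_file_ms` is a made-up helper name.

```python
def per_file_ms(total_minutes: float, file_count: int) -> float:
    """Average time spent per file, in milliseconds.

    Assumes (unverified) that the whole duration is per-file work.
    """
    return total_minutes * 60_000 / file_count


# Figures taken from this post: ~500,000 files, 40-50 minutes until usable.
low = per_file_ms(40, 500_000)
high = per_file_ms(50, 500_000)
print(f"{low:.1f} to {high:.1f} ms per file")
```

A few milliseconds per file would be consistent with every file costing at least one network round trip to the storage backend, but it does not prove that SMB is the cause.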

Coming from another “performance and stability” related thread, do you have APCu and Redis installed? They make a huge difference. I can’t emphasize this enough. Redis solved performance and reliability issues for another poster who had issues that looked like they were client related.

There was a batch of buggy Windows 3.x clients in the last two years, but I run 4.0.6 on Windows and Linux and they’re fine. I ran a full sync recently on both and it completed without any issues and as fast as the network allowed.

I have 1.1 TB but only 120k files though. Also, how many users do you have? I only have two users and my server (32.0.5) is LAMP on bare metal and local native storage on SATA SSD. I kinda doubt it’s a client problem.

Otherwise, my guess would be the SMB storage. Good luck.

@AdamAnon That SMB/CIFS is the bottleneck is my professional opinion too, because of the I/O overhead of the protocol. The behavior is also very similar to that of a system holding 1TB of very small files: the network load on the client side switches on and off at around 100-200KB. I know virtio can go up to 10Gbps if it wants to (that is, between Nextcloud and SMB/CIFS), which would mean the network load could go up to 1Gbps on the client network.

My experience in an unintended benchmark (it’s in the same ballpark, but I can’t remember the numbers for certain):
If you try to change the ownership of, or list, all files, for example 65000K files (1TB) via a 10Gbps connection over SMB/CIFS, it takes about 8-12 hours. Doing the same on a native filesystem takes around 30 seconds to 30 minutes, depending on your hardware.
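For anyone who wants to repeat that kind of comparison on purpose, here is a minimal sketch of a metadata-only benchmark: it walks a directory tree, stats every entry, and reports how long that took. The function name `time_metadata_walk` is made up for illustration, and it only approximates the "list all files" case, not an ownership change.

```python
import os
import time
from typing import Tuple


def time_metadata_walk(root: str) -> Tuple[int, float]:
    """Walk `root`, lstat every file and directory below it,
    and return (entries_seen, elapsed_seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                # One metadata lookup per entry; no file data is read.
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
            count += 1
    return count, time.perf_counter() - start
```

Calling it once on a path on the SMB mount and once on a path on a local filesystem, then dividing entries by seconds, gives a rough entries-per-second figure for each. Caching on either side can skew a second run, so the first run on a cold cache is the more telling one.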

I have around 20 users total and 10 concurrent users on average.

We have Redis and APCu in our environment.

I should note that we use MariaDB, not SQLite, so that potential bottleneck is not the issue either.

4.0.0 was broken, and 4.0.1 and 4.0.2 both had bugs that affected us, just to name a few. Our desktop clients all use VFS, and since multiple people work on the same workstations, a full sync won’t work space-wise, so we need the VFS functionality.

The few Linux clients we have have been broken for this specific Nextcloud instance since 3.17, like I said.

Full sync presents no issues in terms of reliability and stability.

I’m still looking for a second professional opinion, to ensure it is what I think it is.

Note:
I’ve repeatedly asked to use WebDAV instead of sync for this use case, due to its lower overhead and better reliability, but the request has been denied.

That makes sense. I went from 3.17 to 4.0.5, so I skipped those early 4.x releases, because that’s what I normally do, and I’m not surprised that they were problematic. I also use MariaDB, but my setup is simpler overall than yours.

From that quote it seems that SMB storage is indeed the bottleneck. I was warned in the past against mixing SMB with Nextcloud or any other native Linux solutions.

Good luck then as I don’t have anything else to contribute here. Take care!