Nextcloud taking 6 hours to sync 600,000 files over 900 Mbps

What is the file limit for Nextcloud? I have to share folders with more than 600,000 files, and it is quite frustrating to wait up to two hours (with a 900 Mbps upload connection) for the synchronization to finish. I migrated from Google Drive, and the difference on the first synchronization is huge.

My server is on GCP, the storage is S3-compatible, and I use LDAP. Despite having plenty of resources, the load average reaches 80, slowing down the whole service.

This is like comparing apples and bananas.

When you upload or download files, many pieces of infrastructure are involved. One is the web server's handling of the file itself (the resources it takes, whether it is the fastest web server available, and whether it has been tuned correctly), but you also have a database, a cache and (hopefully) file locking services. All of these elements need to play nicely together to handle the files.

Google has thousands of engineers, distributed and load-balanced services, and split infrastructure with the most optimal SAN and file storage servers dedicated to disk operations, all tuned so that each component (load balancers, databases, web servers, etc.) performs optimally. Unless you have a similar setup, it is impossible to compete with Google, AWS, Azure or similar SaaS storage solutions.

If you run everything on a single virtual machine, you will never get performance that comes close to any of the above.

Let’s see:

4 machines with 8 cores (cluster on demand), 300 concurrent users.
SSO or LDAP, and load balancing.
1 multi-region managed PostgreSQL service with dynamic growth (the database is currently about 80 GB).
1 bucket as standard, effectively unlimited dedicated storage, plus x read-only buckets by file type.
1 Redis server for caching.

Everything runs on Google Cloud (GCP).

Google Drive does not do the same type of sync: practically speaking, if you share 1 PB, it appears to the user immediately. The Nextcloud desktop sync client works more like OneDrive, and it is extremely frustrating with many files.

I just wanted to ask whether there are limits, but I see that it is more expensive to run your own implementation of open-source software than to buy licenses from a dedicated vendor.

A folder with 800,000 files takes up to 6 hours to display. I know it is absurd, but I have users who need to consult certain documents from time to time.

No. It is as cheap or as expensive as you decide. However, performance has a price. This is less a Nextcloud issue than a matter of expectations of a piece of software where what you get is bits and bytes, and you have to provide all the computing power and middleware yourself, unless you use the AIO appliances, of course. With any SaaS, everything, including the underlying infrastructure, is part of the product.

That is because they use a lot of in-memory data and many other tricks to deliver content to you immediately. In fact, their service works in a specific way: it “intelligently” chooses which files to upload directly to the cloud and which to keep storing “locally”, but, if you need it, it will fetch that one file between two separate devices. They also use this trick to index files almost immediately and share that index, including metadata, with all connected devices. File transfers in Google Drive use many clever tricks, such as a combination of the p2p method, background transfer, and storing data temporarily in memory before eventually dumping it to disk when the disks are ready. GDrive gradually empties memory but uses some machine learning to keep a constant index, as well as the most accessed files, decrypted in memory for fast access.

Again: thousands of engineers. You do not have, nor can anyone be expected to have, the infrastructure and expertise needed to maintain the middleware and microservices required to run such a setup. It would also introduce so many potential sources of error that operating a Nextcloud instance would become unbearable.

There are Nextcloud enterprise customers with nearly the same feel of performance as OneDrive and Google Drive, but as with everything, those are implementations that have been tweaked and tuned over time.

This is nonsense. If I share 1 PB already uploaded to Nextcloud with anyone, it is immediately available. What can take time is loading the web page that shows you the files on screen, if all of that 1 PB is stored at the same directory level.

It sounds like the features that suit you are more Google Drive than Nextcloud. The features you specifically mention come down to how the clients are designed. Again, read my post: thousands of engineers, and it basically works like a torrent. Nextcloud and Microsoft have focused more on file integrity, whereas Google has focused on speed and simplicity.

Sometimes software is not optimized. If you have 100,000 files in a directory and the software must first read the list of all files, that may be bad. I do not know whether Nextcloud behaves differently with 100 versus 100,000 files. Maybe Google handles it better.
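
To illustrate the difference described above, here is a generic Python sketch (not Nextcloud code) contrasting a listing that reads the whole directory before returning with one that stops after the first page of entries. The helper names are hypothetical. `os.scandir` yields entries as the operating system reads them, so a program can show something early; the full enumeration still costs the same whenever every entry is eventually needed.

```python
import os
import tempfile
from itertools import islice
from typing import List


def first_entries(path: str, limit: int = 100) -> List[str]:
    """Return up to `limit` entry names, stopping as soon as that many are read.

    os.scandir() is an iterator over the directory, so islice() can stop it early.
    The order is whatever the filesystem returns, not alphabetical.
    """
    with os.scandir(path) as it:
        return [entry.name for entry in islice(it, limit)]


def all_entries(path: str) -> List[str]:
    """The 'read the whole list first' approach: every name is loaded before returning."""
    return os.listdir(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for i in range(5000):
            open(os.path.join(d, f"doc{i:05d}.txt"), "w").close()
        page = first_entries(d, limit=50)
        print(len(page), "of", len(all_entries(d)), "entries shown")
```

Running it prints how many entries were shown against the total; which 50 names appear depends on the filesystem.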

You can definitely say that it is not advisable to store that many files at the same level anyway. Even a simple native OS listing of the files from a console would just spew lines onto your screen for a long time. And that is as close to the raw data as you can get.
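
One mechanical way to avoid a single huge level is to spread files across nested subfolders derived from a hash of the file name. The Python below is a hypothetical helper, not a Nextcloud feature. For documents that people browse, a meaningful hierarchy (by year, client or project) is usually friendlier; the goal is only to keep each individual folder small. This shortens each listing, but a client that syncs the whole tree still has to process every file.

```python
import hashlib
import os


def sharded_path(root: str, filename: str, levels: int = 2, width: int = 2) -> str:
    """Place `filename` under `levels` nested folders named after its hash prefix.

    With levels=2 and width=2 hex characters there are 256 * 256 = 65,536 leaf
    folders, so 800,000 files would average roughly 12 files per folder.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


if __name__ == "__main__":
    # Hypothetical file name; the two folder names depend on its SHA-256 hash.
    print(sharded_path("archive", "invoice-2023-0001.pdf"))
```

The mapping is deterministic, so the same name always lands in the same folder and can be found again without a lookup table.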

There are trade-offs in all design choices. Nextcloud is designed one way and Google Drive another. I will be honest, though: I do not use Nextcloud privately, or operate it for a non-profit NGO, because of its performance. I do it because of all the features Nextcloud delivers, including total control of the data.

Okay, thank you very much. I will stick with the product; I still think it is an interesting solution. But I have exhausted every resource trying to make it competitive enough with other storage solutions in my particular case, where I have groups of users with more than 600,000 files.

I wanted to clarify that:

The files are already in Nextcloud. For example:

A user has already saved 800,000 files without problems, and then shares these 800,000 documents with another user.

That second user takes more than 6 hours to sync the 800,000 files.
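
Taking the reported numbers at face value (6 hours for 800,000 files, and assuming the receiving side has a link comparable to the 900 Mbps mentioned at the top), a back-of-envelope calculation gives the average time spent per file and how much data the link alone could carry in that time. The function names are hypothetical and protocol overhead is ignored. If the average file is well below that size, per-file overhead rather than bandwidth is the likely bottleneck.

```python
def seconds_per_file(total_hours: float, file_count: int) -> float:
    """Average wall-clock seconds per file implied by a total sync time."""
    return total_hours * 3600 / file_count


def bytes_at_line_rate(link_mbps: float, seconds: float) -> float:
    """Bytes a link of `link_mbps` megabits per second could carry in `seconds`.

    Assumes the full nominal rate with no protocol overhead.
    """
    return link_mbps * 1_000_000 / 8 * seconds


if __name__ == "__main__":
    per_file = seconds_per_file(6, 800_000)
    budget = bytes_at_line_rate(900, per_file)
    print(f"{per_file * 1000:.1f} ms per file")  # 27.0 ms per file
    print(f"{budget / 1_000_000:.1f} MB per file at line rate")  # 3.0 MB per file at line rate
```

In other words, 6 hours for 800,000 files is about 37 files per second, while a 900 Mbps link could in principle move roughly 3 MB in each file's 27 ms slot.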

Ah. That is indeed expected.

When using the clients, they actually download the files to every device where they are available. This is the default behavior. It can be configured not to store files locally in certain situations, so that a file is only “pulled” when it is accessed. You should read the documentation for the Nextcloud clients and plan and design a setup that fits your needs. It makes sense, though, that it takes that long in that use case unless it is specifically configured not to do so.

What the computer does:
Open file, read it, write it to client, close file, close file on client, update database, write logs, and many more steps in the background, opening closing connections, SYN, ACK, … you name it.

Do that with one 1 GB file and it will happen much faster than with 1,000,000 files of 1 kB each, because 1,000,000 separate operations are necessary.
Many small files are always slower than one big file taking up the same amount of disk space. You can even test and time this on your own computer by simply copying, without any software in between.
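
The copy experiment suggested above can be scripted. This Python sketch, with hypothetical helper names, writes 10,000 files of 1 kB and one file of the same total size into a temporary directory, then times a local `shutil.copytree` of each. It measures only local filesystem overhead (no network, database or web server involved), and OS caching affects the numbers, so treat the result as indicative.

```python
import os
import shutil
import tempfile
import time
from typing import Tuple


def _fill_small(dirpath: str, count: int, size: int) -> None:
    """Write `count` files of `size` bytes each."""
    payload = b"x" * size
    for i in range(count):
        with open(os.path.join(dirpath, f"f{i:06d}.bin"), "wb") as fh:
            fh.write(payload)


def _fill_big(dirpath: str, size: int) -> None:
    """Write a single file of `size` bytes."""
    with open(os.path.join(dirpath, "big.bin"), "wb") as fh:
        fh.write(b"x" * size)


def _timed_copy(src: str, dst: str) -> float:
    """Copy a directory tree and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    shutil.copytree(src, dst)
    return time.perf_counter() - start


def compare_copy(count: int = 10_000, size: int = 1024) -> Tuple[float, float]:
    """Return (seconds for `count` small files, seconds for one file of the same total size)."""
    with tempfile.TemporaryDirectory() as root:
        small = os.path.join(root, "small")
        big = os.path.join(root, "big")
        os.mkdir(small)
        os.mkdir(big)
        _fill_small(small, count, size)
        _fill_big(big, count * size)
        t_small = _timed_copy(small, os.path.join(root, "small_copy"))
        t_big = _timed_copy(big, os.path.join(root, "big_copy"))
        return t_small, t_big


if __name__ == "__main__":
    t_small, t_big = compare_copy()
    print(f"10,000 x 1 kB: {t_small:.3f} s   1 x ~10 MB: {t_big:.3f} s")
```

On most systems the many-small-files copy is noticeably slower, which is the per-file overhead described above; the exact ratio depends on the filesystem and hardware.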

That is clear to me, thank you very much for discussing it. What I mean is that the client is quite slow: it first counts the files and then synchronizes. In fact, I had to roll back more than 1,000 users to version 3.9 twice, because version 3.10 does not work well with S3-compatible storage.

WebDAV is faster for this kind of case, but Microsoft insists on removing it.

For now, the only thing I can do is let users choose what to synchronize, a task that becomes complicated if they do not do it before the first synchronization.

My conclusion is that the Desktop client is a product that has a long way to go in a productive environment.

Apples and oranges, because with WebDAV the files stay on the server and only the files you actually use need to be processed and transmitted. It is like a classic Samba share, where you would never synchronize 800,000 files to your local machine just to edit one of them.
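
As a sketch of what “files stay on the server” means in practice: a WebDAV client can ask for a folder's listing with a `PROPFIND` request and a `Depth: 1` header, which returns metadata for the folder's direct children without transferring any file contents. The Python below assumes Nextcloud's WebDAV endpoint under `/remote.php/dav/files/<user>/`; the server URL, user name, password and helper names are hypothetical placeholders.

```python
import base64
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

# Ask only for a few metadata properties; no file contents are transferred.
PROPFIND_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<d:propfind xmlns:d="DAV:">
  <d:prop>
    <d:resourcetype/>
    <d:getcontentlength/>
    <d:getlastmodified/>
  </d:prop>
</d:propfind>"""


def build_listing_request(base_url: str, user: str, password: str, folder: str) -> urllib.request.Request:
    """Build a PROPFIND request that lists the direct children of `folder`."""
    path = "/remote.php/dav/files/{}/{}".format(
        urllib.parse.quote(user), urllib.parse.quote(folder.strip("/"))
    )
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={
            "Depth": "1",  # one level only, not the whole tree
            "Authorization": f"Basic {token}",
            "Content-Type": "application/xml; charset=utf-8",
        },
    )


def parse_hrefs(multistatus: bytes) -> List[str]:
    """Extract the <d:href> of each <d:response> in a 207 Multi-Status body."""
    root = ET.fromstring(multistatus)
    ns = {"d": "DAV:"}
    return [el.text or "" for el in root.findall("d:response/d:href", ns)]


def list_folder(base_url: str, user: str, password: str, folder: str) -> List[str]:
    """Send the request and return the hrefs of the folder and its direct children."""
    req = build_listing_request(base_url, user, password, folder)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_hrefs(resp.read())
```

For example, `list_folder("https://cloud.example.com", "alice", "app-password", "Projects")` would return the `href` of `Projects/` itself plus one entry per direct child; nothing is downloaded until a file is actually requested.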

In what “productive” environment do users need 800,000 files locally available at all times?

The Google client is faster because it only downloads placeholders of the files. You can achieve the same thing with Nextcloud by using the Virtual Files feature of the Nextcloud desktop client. If you’re using it on Windows, it even uses the exact same technology as the other clients from Google, Dropbox, etc.
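
The placeholder idea can be illustrated with a small Python model: the client builds the whole tree from metadata alone, and a file's content is downloaded only on first access. This is a conceptual sketch with hypothetical names (`Placeholder`, `build_placeholders`, a fake `fetch`), not the Nextcloud client's implementation, which on Windows relies on the operating system's own placeholder support, as noted above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A fetcher takes a remote path and returns the file's bytes (hypothetical).
Fetcher = Callable[[str], bytes]


@dataclass
class Placeholder:
    """Metadata-only stand-in for a remote file; content is fetched on first read."""
    path: str
    size: int
    fetch: Fetcher = field(repr=False)
    _content: Optional[bytes] = field(default=None, init=False, repr=False)

    @property
    def hydrated(self) -> bool:
        """True once the content has been downloaded."""
        return self._content is not None

    def read(self) -> bytes:
        if self._content is None:  # first access: download now
            self._content = self.fetch(self.path)
        return self._content  # later accesses: use the local copy


def build_placeholders(listing: Dict[str, int], fetch: Fetcher) -> Dict[str, Placeholder]:
    """Turn a remote listing {path: size} into placeholders without downloading anything."""
    return {path: Placeholder(path, size, fetch) for path, size in listing.items()}


if __name__ == "__main__":
    downloads: List[str] = []

    def fake_fetch(path: str) -> bytes:  # hypothetical stand-in for a real download
        downloads.append(path)
        return b"%PDF-example"

    tree = build_placeholders({f"docs/{i}.pdf": 1024 for i in range(10_000)}, fake_fetch)
    tree["docs/42.pdf"].read()
    print(len(tree), "files visible,", len(downloads), "downloaded")
```

The sketch prints `10000 files visible, 1 downloaded`: the listing is complete immediately, and only the file that was opened costs a transfer. Building the listing itself still requires the server to enumerate the folder, which is the part reported as slow in this thread.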

Hi. No user would realistically want or try to work with that much data; it is more a cultural matter of getting them used to being selective about what they need. But there are cases where someone must supervise that data from the client.

Obviously we use virtual files, but it does not compare with the Google Drive client. If I share 400,000 files with Google Drive, synchronization takes no more than 30 seconds. With the Nextcloud client, you have to wait for it to count the files and then for them to appear, and this process takes hours.

I would compare it more with the way SharePoint syncs, which is more stable and a bit faster.

I was desperate with so many errors in the desktop client, but I am going through the GitHub issues; version 3.10 has been a disaster for me with S3-compatible storage.

I know there is more to come, and I cannot go back to Google Drive, so I have to wait. Thanks to all.

Ah, OK (I never tested it with that many files). If that is the case, the Google client does indeed seem to handle some things quite differently from the Nextcloud client, as I find it hard to believe that such a huge difference is just due to server performance. Maybe you can search the GitHub issues and open one if none exists yet.

Try filling out the support template and seeing if there are details in your logs and such, which might help move this conversation forward. Here:

You are missing the required support template. Please fill out this form and edit it into your post.
This will give us the technical info and logs needed to help you! Thanks.

For now, for these cases, I will use WebDAV, thanks!

You could also try Samba or NFS mounts. Good luck.

I would make a couple of zip files, so the upload and download would go much faster.
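
A minimal Python sketch of that suggestion, using the standard library's `shutil.make_archive`; the folder and archive names are hypothetical. Bundling turns thousands of per-file operations into a single transfer, at the cost that recipients must download and extract the whole archive to reach one document, and any change means re-uploading the archive. Already-compressed formats such as PDF will not shrink much, but the per-file transfer overhead still goes away.

```python
import os
import shutil
import tempfile
import zipfile


def bundle_folder(folder: str, archive_base: str) -> str:
    """Pack everything under `folder` into `<archive_base>.zip` and return its path.

    Keep `archive_base` outside `folder`, otherwise the archive may try to include itself.
    """
    return shutil.make_archive(archive_base, "zip", root_dir=folder)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "documents")
        os.mkdir(src)
        for i in range(1000):
            with open(os.path.join(src, f"doc{i:04d}.txt"), "w") as fh:
                fh.write("example\n")
        archive = bundle_folder(src, os.path.join(root, "documents-bundle"))
        with zipfile.ZipFile(archive) as zf:
            print(len(zf.namelist()), "entries in", os.path.basename(archive))
```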
