Nextcloud taking 6 hours to sync 600,000 files over 900 Mbps

That is because they keep a lot of data in memory and use a lot of other tricks to deliver content to you immediately. Their service works in a specific way: it “intelligently” chooses which files to upload directly to the cloud and which to keep stored “locally”, and, if you need a file, it will fetch that one file between two separate devices. The same trick lets them index files almost immediately and share that index, including metadata, with all connected devices. File transfer in Google Drive uses a combination of tricks: a p2p-like method, background transfers, and temporary storage in memory with an eventual dump to disk when the disks are ready. Google Drive gradually empties that memory, but uses some machine learning to keep a constant index, as well as the most accessed files decrypted in memory, for fast access.

Again: thousands of engineers. You do not have, nor can anyone be expected to have, the infrastructure and expertise needed to maintain the middleware and microservice stack required to run such a setup. It would also introduce so many potential sources of error that operating a Nextcloud instance would become unbearable.

There are Nextcloud enterprise customers whose installations feel nearly as performant as OneDrive and Google Drive, but as with everything, those are implementations that have been tweaked and tuned over time.

This is nonsense. If I share 1 PB that is already uploaded to Nextcloud with anyone, it is immediately available. What can take time is loading the web page that shows the files on screen, if that whole 1 PB is stored at the same directory level.

It sounds like the feature set that suits you is more Google Drive than Nextcloud. The features you are specifically mentioning come down to how the clients are designed. Again, read my post: thousands of engineers, and it basically works like a torrent. Nextcloud and Microsoft have focused more on file integrity, whereas Google has focused on speed and simplicity.

2 Likes

Sometimes software is simply not optimized. If you have 100,000 files in a directory and the software must first read the list of all of them, that can be slow. I do not know whether Nextcloud behaves differently with 100 versus 100,000 files. Maybe Google handles it better.

1 Like

You can definitely say that storing that many files at the same directory level is not advisable anyway. Even a plain OS-native listing of the files from a console would spew lines onto your screen for a long time, and that is about as close to the raw data as you can get.
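
You can get a feel for this with a few lines of Python (a rough sketch; `big_dir` is a placeholder path, point it at a directory that actually holds on the order of 100,000 files):

```python
import os
import time

# Rough illustration: enumerating a very large flat directory.
# "big_dir" is a placeholder; point it at a directory that really
# contains ~100,000 files to see the effect.
big_dir = "big_dir"

start = time.perf_counter()
names = [entry.name for entry in os.scandir(big_dir)]
elapsed = time.perf_counter() - start

print(f"listed {len(names)} entries in {elapsed:.2f} s")
```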

There are trade-offs with all design choices. Nextcloud is designed one way and Google Drive another. I will be honest, though: I do not use Nextcloud privately, or operate it for a non-profit NGO, because of performance. I do it because of all the features Nextcloud delivers, including total control of the data.

1 Like

Okay, thank you very much. I will keep pushing ahead with the product; I think it is still an interesting solution, but I have exhausted all my options for making it competitive enough with other storage solutions for my particular case, where I have groups of users with more than 600,000 files.

I wanted to clarify that:

The files are already in Nextcloud. For example:

A user has already saved 800,000 files without any problems, and then shares those 800,000 documents with another user.

That second user will take more than 6 hours to sync the 800,000 files.

Ah. That is indeed expected.

When you use the clients, they will actually download the files to every device where they are available; that is the default behavior. They can be configured not to store files locally in certain situations, so that a file is only “pulled” when you access it. You should read the documentation for the Nextcloud clients and plan and design a setup that fits your needs. It does make sense, though, that the sync takes that long in this use case if the client has not been specifically configured not to do it.

3 Likes

What the computer does:
Open the file, read it, write it to the client, close the file, close the file on the client, update the database, write logs, and many more steps in the background: opening and closing connections, SYN, ACK, you name it.

Do that with one 1 GB file and it will finish much faster than with 1,000,000 1 kB files, because those require 1,000,000 sets of operations.
Many small files are always slower than one big file taking up the same amount of disk space. You can even test and time this on your own computer just by copying, without any sync software in between.
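
If you want to time this yourself, here is a minimal sketch in Python (the file counts and sizes are scaled-down placeholders; adjust them to your disk):

```python
import os
import shutil
import time
from pathlib import Path

def timed_copy(src: Path, dst: Path) -> float:
    """Copy a file or a whole directory tree and return the elapsed seconds."""
    start = time.perf_counter()
    if src.is_dir():
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)
    return time.perf_counter() - start

# Same total amount of data in both cases: 1 x 10 MB vs 10,000 x 1 kB
# (scaled down from the 1 GB / 1,000,000-file example so the test stays short).
Path("big.bin").write_bytes(os.urandom(10 * 1024 * 1024))

small_dir = Path("small_src")
small_dir.mkdir(exist_ok=True)
for i in range(10_000):
    (small_dir / f"file_{i:05d}.bin").write_bytes(os.urandom(1024))

print("one big file :", round(timed_copy(Path("big.bin"), Path("big_copy.bin")), 2), "s")
print("many small   :", round(timed_copy(small_dir, Path("small_dst")), 2), "s")
```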

3 Likes

That is clear to me, thank you very much for explaining it. What I mean is that the client is quite slow: it first counts the files and only then synchronizes.
In fact, I have had to roll back twice, with more than 1,000 users, to version 3.9, because version 3.10 does not work well with S3-compatible storage.

WebDAV is faster for these kinds of cases, but Microsoft insists on eliminating it.

For now the only thing I can do is have users choose in the client what to synchronize, a task that becomes complicated for them if they do not do it before synchronizing.

My conclusion is that the desktop client is a product that still has a long way to go in a production environment.

Apples and oranges, because with WebDAV the files stay on the server and only the files you actually use need to be processed/transmitted. It is like a classic Samba share, where you would never synchronize 800,000 files to your local machine just to edit one of them.
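
To illustrate: pulling one file over Nextcloud’s WebDAV endpoint only touches that file, no matter how many siblings it has on the server. A sketch using Python’s `requests`; the server URL, user, app password and path are placeholders, and the URL follows the documented `remote.php/dav/files/<user>/` layout:

```python
import requests

# Placeholder values - substitute your own server, user, app password and path.
BASE = "https://cloud.example.com/remote.php/dav/files/alice"
AUTH = ("alice", "app-password")

# Download one file: only this object is transferred, no matter how many
# hundred thousand other files sit next to it on the server.
resp = requests.get(f"{BASE}/projects/report.odt", auth=AUTH, timeout=30)
resp.raise_for_status()

with open("report.odt", "wb") as fh:
    fh.write(resp.content)
```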

In what “production” environment do users need 800,000 files locally available at all times?

The Google client is faster because it only downloads placeholders of the files. You can achieve the same thing with Nextcloud by using the Virtual Files feature of the Nextcloud desktop client. If you’re using it on Windows, it even uses the exact same technology as the other clients from Google, Dropbox, etc.

5 Likes

Hi. It is unrealistic for a user to want or try to work with that much data; it is more a cultural matter of getting them used to being selective about what they require.
But there are cases where someone has to supervise that data from the client.

Obviously we use virtual files, but it does not compare with the Google Drive client. If I share 400,000 files with Google Drive, the synchronization takes no more than 30 seconds.
With the Nextcloud client you have to wait for it to count the files and then for them to appear, and this process takes hours.


I would compare it more to the way SharePoint syncs, which is more stable and a bit faster.

I was getting desperate with so many errors from the desktop client, but here I am following the GitHub issues; version 3.10 has been a disaster for me with S3 compatibility.

I know there is more to come, and I can’t go back to Google Drive, so I have to wait. Thanks to all.

Ah, OK (I have never tested it with that many files). If that’s the case, the Google client does indeed seem to handle some things quite differently from the Nextcloud client, as I find it hard to believe that such a huge difference is just down to server performance. Maybe you can search the GitHub issues and open one if none exists yet.

1 Like

Try filling out the support template and seeing if there are details in your logs and such, which might help move this conversation forward. Here:

You are missing the required support template. Please fill this form out and edit it into your post.
This will give us the technical info and logs needed to help you! Thanks.

For now, for these cases, I will use WebDAV. Thanks!

1 Like

You could also try Samba or NFS mounts. Good luck.

1 Like

I would make a couple of zip files, so the upload and download would go much faster.
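
Something along these lines (a sketch; the folder and archive names are placeholders) turns the many-small-files problem into one big transfer:

```python
import zipfile
from pathlib import Path

# Placeholder paths - replace with the real share.
source = Path("shared_documents")
archive = Path("shared_documents.zip")

# Bundle the whole tree into one archive, so the transfer is a single large
# object instead of hundreds of thousands of tiny ones.
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for path in sorted(source.rglob("*")):
        if path.is_file():
            zf.write(path, arcname=path.relative_to(source))
```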

1 Like

If I use the Google API and just want to get the index of a few tens of thousands of files in one folder, the request takes a couple of minutes. I am not sure whether they throttle the API, or whether their client app does more work hidden in the background.
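
For reference, listing a big folder through the Drive API has to be paginated, which is where those minutes go. A minimal sketch with google-api-python-client (the service-account file and `FOLDER_ID` are placeholders):

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholder credentials file and folder ID.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/drive.readonly"],
)
drive = build("drive", "v3", credentials=creds)

files, page_token = [], None
while True:
    # Each call returns at most 1000 entries, so listing tens of thousands
    # of files means dozens of sequential round trips.
    resp = drive.files().list(
        q="'FOLDER_ID' in parents and trashed = false",
        pageSize=1000,
        fields="nextPageToken, files(id, name)",
        pageToken=page_token,
    ).execute()
    files.extend(resp.get("files", []))
    page_token = resp.get("nextPageToken")
    if page_token is None:
        break

print(f"{len(files)} files listed")
```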

In the past, you could speed up a lot of things by optimizing the database cache sizes (there are tools for that). However, 600,000 files in 6 hours works out to roughly 28 files/s; that is not too bad, and it would be interesting to know where the bottleneck is.

If there is a business case behind it, an enterprise subscription (or your own development resources) can help improve the client software.

Is sync considered a requirement in your use case? Or are you only using it because Microsoft deprecated their built-in WebDAV client?

The issue here is hundreds or thousands of users. If that is the case, this is already way outside of what we can advise on for community support.

Google offers interoperability; let’s say that everything worked fine until the arrival of the 3.10 client. On the other hand, renaming even a simple folder with content in it is death.

For now I will stay on GCP with the primary storage in a bucket. For the more critical cases, I have added a fixed, scalable XFS disk and published it via External Storage; this allows folders to be renamed much faster.

As for the users, for now I will stop using the Nextcloud client and unfortunately have to use the RaiDrive client; I have not found anything better.