Files:scan is slow, but all other (non-files:scan) activity is normal

Nextcloud version (eg, 20.0.5): 28
Operating system and version (eg, Ubuntu 20.04): Ubuntu
Apache or nginx version (eg, Apache 2.4.25): Apache
PHP version (eg, 7.4): 8.2
Database: MariaDB (latest, fresh install)

The issue you are facing:
I will update this with the exact versions. For now, they are whatever Ubuntu installed 4 days ago, from the latest downloadable Ubuntu ISO at the time.

Setup: AMD 7950, 64 GB RAM, 2 SSDs, 2x 10 TB drives.

Running in a VMware Workstation VM.

Scenario:
Files:scan takes 2.5 seconds per file. With more than 20,000 files, that will take far too long. I'm not interested in changing thumbnail sizes. CPU utilization is only about 5 percent, and most of that is the VMware Workstation process. The VM has 8 cores and 24 GB of memory allocated.
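A quick back-of-the-envelope estimate of what "too long" means, using only the figures above (2.5 s per file, at least 20,000 files) and assuming the scan handles files strictly one after another:

```python
# Back-of-the-envelope estimate using the figures from the post.
SECONDS_PER_FILE = 2.5  # observed files:scan rate
FILE_COUNT = 20_000     # "> 20 thousand files", so this is a lower bound


def scan_hours(file_count: int, seconds_per_file: float) -> float:
    """Wall-clock hours if every file is processed one after another."""
    return file_count * seconds_per_file / 3600


print(f"{scan_hours(FILE_COUNT, SECONDS_PER_FILE):.1f} h")  # prints "13.9 h"
```

Smaller files take less than 2.5 s, so treat this as a rough figure rather than a prediction.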

The RAID drive is mounted at /mnt/pictures and is added in Nextcloud as local storage (External Storage). To rule out SMB (it's installed), I moved and downloaded both lots of small files and several multi-GB files inside the Ubuntu VM. Transfer speeds were 60-110 MB/s, which is what I would expect from a correctly working setup. On the Windows host, Resource Monitor confirmed the transfers in both the network adapter (share drive to VM file moves) and disk read activity.
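The same read test can be scripted from inside the VM so it is repeatable. This is a minimal sketch, not a Nextcloud tool: it reads every file under a directory (defaulting to /mnt/pictures, the path from the post) once, sequentially, and reports the achieved MB/s.

```python
import os
import sys
import time
from pathlib import Path


def read_throughput(root: Path, chunk_size: int = 1 << 20) -> tuple[int, int, float]:
    """Read every regular file under `root` once; return (files, bytes, seconds)."""
    files = 0
    total = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_file():
                continue  # skip sockets, fifos, broken symlinks
            with open(path, "rb") as fh:
                while chunk := fh.read(chunk_size):
                    total += len(chunk)
            files += 1
    return files, total, time.perf_counter() - start


if __name__ == "__main__":
    root = Path(sys.argv[1] if len(sys.argv) > 1 else "/mnt/pictures")
    n, size, secs = read_throughput(root)
    mb_s = size / 1e6 / secs if secs > 0 else 0.0
    print(f"{n} files, {size / 1e6:.1f} MB in {secs:.1f} s -> {mb_s:.1f} MB/s")
```

A second run over the same files may be served largely from the page cache, so the first (cold) run is the meaningful one.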

Next scenario: opened Nextcloud and logged in from Windows to download a large file and lots of small files. Performance (speed) was as expected.

Next scenario: ran occ files:scan. No lock errors or anything, but it takes 2.5 seconds per file (less for smaller files).

During the scan, CPU usage is almost negligible, the NIC is barely used, and there is almost no disk activity.

To test whether serving already-generated thumbnails would be slow, I pre-cached a folder with files:scan and opened it in a web browser on a remote machine. The thumbnails appeared very quickly.

So where is the bottleneck, and how can I check whether resource utilization is nominal? The pictures range from 400 KB to 25 MB, averaging 4 MB.
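Putting the average file size next to the scan rate shows how little disk bandwidth the scan could be using. A quick check, assuming as an upper bound that the scan reads each file in full exactly once, and comparing against the low end of the measured transfer speed:

```python
AVG_FILE_MB = 4.0       # average picture size from the post
SECONDS_PER_FILE = 2.5  # observed files:scan rate
DISK_MB_S_LOW = 60.0    # low end of the measured 60-110 MB/s transfers

# Upper bound on the read bandwidth the scan needs (assumes one full read per file).
effective_mb_s = AVG_FILE_MB / SECONDS_PER_FILE
headroom = DISK_MB_S_LOW / effective_mb_s

print(f"scan needs ~{effective_mb_s:.1f} MB/s; measured disk throughput is >= {headroom:.1f}x that")
```

At roughly 1.6 MB/s against at least 60 MB/s available, the disks and the SMB path are very unlikely to be what limits the scan.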

Nextcloud version (eg, 20.0.5): 28

Which patch/minor version level?

Files:scan takes 2.5 seconds per file.

I assume you’re trying to do this because you’re adding files externally (i.e. from outside of NC) to this mount routinely and they’re not getting picked up automatically[1] already?

raid drive is visible in /mnt/pictures and is added as a local drive in nextcloud.

So /mnt/pictures is located on the 2x10TiB spinning HDDs, correct?

Therefore, where is the bottleneck, how to check if utilization of resources nominal?

Is your database (MariaDB) located in the same VM you're monitoring, or elsewhere? I would expect database activity against the filecache table during the scan run.

Also confirm there aren’t weird things appearing in your Nextcloud log during the scan (well, or generally).

It might be worth adding -vvv (verbose) to the scan, but it will likely just list each file (and may itself slow things down).

What apps do you have installed (occ app:list)? I believe a command-line scan still emits events, so other installed apps could come into play if they've defined listeners for certain file activities.

If you feel like experimenting, it may be worth seeing if there is a difference if you ditch the local External Storage mount and simply mount the underlying OS volume currently located at /mnt/pictures somewhere directly under your NC datadirectory.

[1] References:

I wanted to update and close this thread.

The issue is something that can't be fixed. After running top, I found that the limitation is the CPU and the software: preview generation is single-threaded. I've seen some options to speed things up, such as removing DB locks during the initial scan, but I would rather not do that. One CPU thread was at 100% during the initial scan, which eventually finished 3-4 hours later. There is no programming issue we can correct; the only real fix would be to rewrite the process.
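This also explains the low overall CPU figure from earlier: one thread pegged at 100% on an 8-vCPU guest averages out to only 12.5% inside the VM, and seen from the host it is smaller still (about 3% if the host is a 16-core/32-thread Ryzen 9 7950X, which is an assumption about the exact CPU model). Pressing 1 in top toggles the per-CPU view. The sketch below does the same thing from Linux's /proc/stat, listing each core's busy percentage over a one-second window so a single saturated core stands out from the average.

```python
import os
import time


def read_cpu_times(stat_text: str) -> dict[str, tuple[int, int]]:
    """Parse the per-core lines of /proc/stat into {"cpuN": (busy, total)} jiffies."""
    result: dict[str, tuple[int, int]] = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line; keep cpu0, cpu1, ...
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Columns: user nice system idle iowait irq softirq steal [guest guest_nice]
        values = [int(v) for v in fields[1:]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values[:8])  # guest time is already counted in user/nice
        result[fields[0]] = (total - idle, total)
    return result


def per_core_utilization(before: dict[str, tuple[int, int]],
                         after: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Busy percentage of each core between two samples."""
    usage: dict[str, float] = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        usage[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return usage


def sample(interval: float = 1.0) -> dict[str, float]:
    """Take two /proc/stat snapshots `interval` seconds apart."""
    with open("/proc/stat") as fh:
        first = read_cpu_times(fh.read())
    time.sleep(interval)
    with open("/proc/stat") as fh:
        second = read_cpu_times(fh.read())
    return per_core_utilization(first, second)


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    usage = sample()
    for cpu, pct in sorted(usage.items(), key=lambda kv: -kv[1]):
        print(f"{cpu:>6}: {pct:5.1f}%")
    if usage:
        print(f"average: {sum(usage.values()) / len(usage):.1f}%")
```

Run it inside the VM while files:scan is going; one core near 100% with the rest idle points to a single-threaded bottleneck rather than an I/O one.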

Nextcloud seems to do everything sequentially, both locking/unlocking and file creation. I think these two processes should be decoupled. The initial process should be a multithreaded scan of the directory, followed by multithreaded generation of preview thumbnails. Then the database should have those loaded and checksummed…
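The proposed split can be sketched as a three-phase pipeline. This is not how Nextcloud's scanner works; it is a minimal Python illustration of the idea: list files in parallel, do the per-file work in parallel, then load the results into the database in one batched transaction. Hashing each file is a hypothetical stand-in for preview generation, and the `file_index` SQLite table is hypothetical, not Nextcloud's filecache table.

```python
import hashlib
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def list_files(root: Path) -> list[Path]:
    """Phase 1: enumerate files, fanning out over top-level subdirectories."""
    top = list(root.iterdir())
    files = [p for p in top if p.is_file()]
    subdirs = [p for p in top if p.is_dir()]

    def walk(d: Path) -> list[Path]:
        return [Path(dp) / f for dp, _, fs in os.walk(d) for f in fs]

    with ThreadPoolExecutor() as pool:
        for chunk in pool.map(walk, subdirs):
            files.extend(chunk)
    return files


def process_file(path: Path) -> tuple[str, int, str]:
    """Phase 2 stand-in for preview generation: checksum the file contents.
    CPython's hashlib releases the GIL on large buffers, so threads can overlap."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return str(path), path.stat().st_size, digest.hexdigest()


def index_tree(root: Path, db: sqlite3.Connection, workers: int = 8) -> int:
    """Run all three phases and return the number of files indexed."""
    files = list_files(root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(process_file, files))
    # Phase 3: one batched write in a single transaction instead of per-file commits.
    db.execute("CREATE TABLE IF NOT EXISTS file_index "
               "(path TEXT PRIMARY KEY, size INTEGER, sha256 TEXT)")
    with db:
        db.executemany("INSERT OR REPLACE INTO file_index VALUES (?, ?, ?)", rows)
    return len(rows)
```

The worker count defaults to 8 to match the cores allocated to the VM. Threads suffice here only because hashing releases the GIL; CPU-bound work in pure Python would need a ProcessPoolExecutor instead.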

TL;DR:
Picture preview pregeneration is a single-threaded process, so its speed is capped by what a single CPU core can do.