Glacial import, too many songs?

I’m running the import on the command line, and it’s going glacially slow: about one song an hour. It started much faster, around ten seconds a song.

Investigating the server, I found that 'php occ music:scan --all' has now been running for 62 hours and is pegged at 100% CPU. Everything else looks reasonable.

Any ideas? Is this a music collection size issue? I think 12,000+ songs have been imported so far.

Are your songs maybe stored on some external storage? That kind of setup has proven problematic because (depending on many things) it may require transferring the whole files between your file server and the Nextcloud server in order to index them. Still, the process should not slow down as it progresses.

Which database system are you using? If you were using SQLite, that could explain the problem, as it is not suitable for anything but toy installations.
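If you're unsure which database backend is configured, you can ask occ directly rather than reading config.php by hand. A quick check, assuming Nextcloud lives in /var/www/nextcloud and runs as www-data (adjust both for your setup):

```shell
cd /var/www/nextcloud
# Prints the configured backend, e.g. "mysql", "pgsql" or "sqlite3"
sudo -u www-data php occ config:system:get dbtype
```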

As a reference for the kind of performance you should be getting: my test environment is a Raspberry Pi 3 with Nextcloud's internal storage on an HDD attached via USB 2. It's fair to say this is a low-end server, but I have still managed to scan a 60,000+ track library within a reasonable time. The scanning pace is roughly constant, about 10 songs every 3–4 seconds.

i3-530, ZFS RAIDZ on local SATA drives. So that shouldn't be an issue at this level.

And yep config says:
'dbtype' => 'mysql',

guess I need to find instructions on conversion.

I’m guessing the comments about ‘single user’ for mysql are a little misleading.

Confusing myself: mysql = MariaDB, I think:

root@media:/var/www/nextcloud# ps aux | grep sql
mysql 2270 1.8 2.3 2770220 87068 ? Ssl Jan31 379:13 /usr/sbin/mysqld
root 1045728 0.0 0.0 9044 732 pts/1 S+ 18:00 0:00 grep --color=auto sql
root@media:/var/www/nextcloud# /usr/sbin/mysqld --version
/usr/sbin/mysqld Ver 10.3.25-MariaDB-0ubuntu0.20.04.1 for debian-linux-gnu on x86_64 (Ubuntu 20.04)

So I should be fine?

You might want to dig deeper; you are far away from the pace quoted above (about 10 songs every 3–4 seconds), and that is on a Raspberry Pi 3!

Raspberries are really nice reference systems because all their parameters are known. So please provide more information; SATA implies that you are probably not using a Raspberry Pi.
Also, about the 100% CPU usage: what exactly is using the CPU? Nextcloud's PHP (anything in the logs?), the web server (again, the logs), or the database (setting the cache sizes correctly for your use case can make a huge difference)?

Okay, I see no obvious reason why it would be that slow on your system.

I have now retriggered the scanning on my “large test set” of 60k+ tracks to get some up-to-date results; it has been about two years since I previously scanned that set. The process at least started at a normal pace, but it will still take a whole day to complete, even if the pace does not slow down.

All these details are above.

Yes, it is an i3-530.

The process using 100% CPU is “php occ music:scan --all”

Trying to find relevant log files, the only one I can find is:
root@media:/var/log/mysql# tail error.log
2021-02-14 17:56:03 442919 [Warning] Aborted connection 442919 to db: 'nextcloud' user: 'nextcloud' host: 'localhost' (Got an error reading communication packets)

And that has apparently been generating that single line once a day.

The web server log should be irrelevant when running it on the command line, right?

(this afternoon it has scanned ~40 songs in 4 hours)

I could look at tuning the database, but since it's PHP using the CPU, that shouldn't be relevant, right?

I guess it's not completely impossible that this is caused by the database settings, but PHP configuration might be a better bet. When you view the Nextcloud settings as an admin user, do you see anything interesting under Admin > Overview? Nextcloud reports there if it can identify any kind of configuration issue.

Looking at the cpu usage on my system during the scanning, ps aux reports that the “php occ music:scan” command is taking about 15% while three “php-fpm” instances are taking together 40-60% at any given moment. The “mysqld” command is reported to consume about 2% of cpu.

Meanwhile, the top command shows “mysqld” taking some 60% of the CPU while “php” takes about 14% and “php-fpm” varies between 0 and 60%. But frankly, I don’t know how the percentages reported by top should be interpreted, as they often sum to much more than 100% (though not all the time, even under extreme load).

Nothing I’d expect to matter. I’m not using server-side encryption; it was only enabled for a moment, but the documentation on how to get rid of that warning is horribly out of date.

root@media:/var/www/nextcloud# ps aux | grep php
root      196249  0.0  0.0   8912   664 pts/2    S+   09:28   0:00 grep --color=auto php
root     1061684  0.0  0.0  12348  2592 pts/1    S+   Feb14   0:00 sudo -u www-data php occ music:scan --all
www-data 1061736 97.1  1.8 348312 69336 pts/1    R+   Feb14 899:11 php occ music:scan --all
root@media:/var/www/nextcloud# ps aux | grep mysq
mysql       2270  1.8  1.4 2770220 54868 ?       Ssl  Jan31 390:44 /usr/sbin/mysqld
root      198079  0.0  0.0   9044   724 pts/2    S+   09:29   0:00 grep --color=auto mysq

Do I not have FPM active, and is that what is slowing things down this much? Or is that just the difference between the web interface and the command line?

Hrm, now when I check iotop I have:
TID  PRIO  USER  DISK READ  DISK WRITE  SWAPIN  IO>      COMMAND
1422 be/4  root  30.49 K/s  0.00 B/s    0.00 %  63.19 %  [txg_sync]
I think this could be it; the system is pretty RAM-constrained for ZFS. I'll look at options there.
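If the ZFS ARC is competing with PHP and MariaDB for RAM, one common mitigation is to cap the ARC size. A minimal sketch, assuming Linux with the OpenZFS module loaded; the 2 GiB value here is only an example to adjust for your machine:

```shell
# Check the current ARC size cap (0 means "auto", roughly half of RAM)
cat /sys/module/zfs/parameters/zfs_arc_max

# Cap the ARC at 2 GiB for the running system (takes effect immediately)
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max

# Persist the cap across reboots via the module options
echo "options zfs zfs_arc_max=2147483648" >> /etc/modprobe.d/zfs.conf
```

On Ubuntu you may also need to run `update-initramfs -u` afterwards so the option is picked up at boot.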

You should be seeing php-fpm there also with command-line use, provided that it has been set up correctly. Not having it probably slows things down significantly, although I'm not sure the effect could be this extreme.

Server-side encryption is something I haven't really tested at all. I guess it's plausible that it could cause RAM usage to explode when tens of thousands of files are read within a single run. If this then led to extensive swapping, that could explain the extreme slowness :thinking:.

Hrm, some files appear to be taking a lot less time. I think I need to just be patient.

Yeah, now that we are past that artist, songs are processing every 10 seconds.

Anything I can do to identify why they were so slow? One of the files that had issues:
╰─>$ mp3info -x JHCD0101.mp3
File: JHCD0101.mp3
Title: Track:
Artist:
Album: Year:
Comment: Genre: Other [12]
Media Type: MPEG 1.0 Layer III
Audio: Variable kbps, 44 kHz (stereo)
Emphasis: none
CRC: No
Copyright: No
Original: No
Padding: Yes
Length: 0:00

I’m guessing that the lack of info causes issues?
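To spot files like this in bulk, one option is to try decoding each file and flag any that produce errors. A rough sketch using ffmpeg, if it is installed; the library path below is an example for this setup, so point it at your own Music directory:

```shell
#!/bin/sh
# Decode every MP3 to the null muxer and flag files that produce decode errors.
# The data directory below is an example path; adjust it to your installation.
find /var/www/nextcloud/data -type f -name '*.mp3' | while read -r f; do
    # -v error prints only real errors; non-empty output suggests corruption
    errs=$(ffmpeg -v error -i "$f" -f null - 2>&1)
    [ -n "$errs" ] && echo "possibly corrupt: $f"
done
```

A `Length: 0:00` in the mp3info output, as above, is also a red flag worth grepping for.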

Lack of info causing issues sounds weird, too. And 10 seconds per song is still at least 10 times as long as it should take.

If you want, you can share one of those “slow files” with me and I can test if they cause any slowness on my system. My email is pauli(dot)jarvinen(ät)gmail(dot)com.


I got the sample files, and they are obviously corrupted. In fact, I couldn’t find any player that could open them. I can also confirm that the scanning gets stuck on these files in the Music app: occ music:scan has now been taking 100% CPU for maybe half an hour, and it is still processing the first file.

I also checked these files with the MP3 Diags software on Windows. It, too, took tens of seconds to analyze the files, while it usually does the analysis in the blink of an eye. In the end, it found many corrupt streams and no valid streams in the files.

Yeah, I was about to check whether they were valid, then got distracted and forgot. Now that I think about it, that whole artist directory is full of oddly structured albums, unlike the bulk of the collection, which is well formatted. I wonder how that came to be.

Sounds like a bug, but not an important one.

I have reported this issue onward for the getID3 library, which we use to read the file metadata: Analysis gets stuck with corrupt files · Issue #286 · JamesHeinrich/getID3 · GitHub

James there promised to look at it later. Hopefully he will figure out how to prevent (nearly) hanging on such files, and we may get a fix at some point.
