I have realized that my cron is taking quite some time to finish. After some investigation I found that the job OCA\Files\BackgroundJob\ScanFiles is triggered via the oc_jobs table in my database. The job does not run every time cron is started (which would be every 5 minutes), but it does run at least twice a day.
Why? I mean, why is it scanning all my files for new or changed files? My server hosts around 1.5 TB of data, so this takes forever, and I do not see the point of it.
I would understand it if I had external storage activated (I did previously, btw), or if I uploaded files without using any NC interface (web, client, WebDAV), but this is not the case.
My biggest concern and unanswered question: WHY is OCA\Files\BackgroundJob\ScanFiles even necessary, and why does it scan through all files, even the previews?
Thanks for any help and advice, perhaps it is a leftover from the times I used the external storage app.
The second problem was the scan of my appdata/preview folder. I have many pictures, and I have been using this NC instance for many years. Long story short: over 3 500 000 entries in the files_cache table for the previews alone.
I need to push this again, someone must know if this is normal, if this should be in the database for a standard installation or if this is kept in the database from an uninstalled app.
With around 2 TB of files, this takes ages to finish and results in weird NC behavior.
Running occ files:scan --all finishes within 30 minutes and without any error. Even after I fixed some issues I had with occ files:scan, the oc_jobs table tells me that the cron still takes several hours to finish.
Is there a way to figure out what cron.php is doing for hours, even days?
I also tried to enforce that only one cron job runs at a time, for example using flock, but then the stuck job keeps the lock and no cron runs for days.
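One way around the "lock held for days" problem is to combine flock with a time limit, so that a stuck run is terminated and the lock becomes free again. This is only a minimal sketch: the lock file path, the 55 minute limit, the function name run_cron_once and the location of cron.php are all assumptions, not values from this thread.

```shell
#!/bin/sh
# Hypothetical defaults; adjust to your own installation.
LOCK="${LOCK:-/tmp/nextcloud-cron.lock}"   # lock file shared by all cron runs
MAX_RUNTIME="${MAX_RUNTIME:-55m}"          # upper bound for a single run

# run_cron_once CMD...: run CMD under an exclusive lock with a time limit.
#   flock -n  gives up immediately (exit 1) if another run still holds the lock
#   timeout   sends SIGTERM after MAX_RUNTIME (exit 124), which frees the lock
run_cron_once() {
    flock -n "$LOCK" timeout "$MAX_RUNTIME" "$@"
}

# Equivalent crontab line for www-data (path to cron.php is an assumption):
# */5 * * * * flock -n /tmp/nextcloud-cron.lock timeout 55m php -f /var/www/nextcloud/cron.php
```

This does not explain why the job hangs; it only keeps one runaway run from blocking every later one.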
The big question is, what is cron.php doing while it reserves CPU cycles and starts messing with mysqld?
I’ve been reading your posts from various threads because I’ve had the same issue since upgrading to v19.x.
Twice a day CPU usage becomes excessive, resulting in swapping which, at least once a day, grinds the VPS to a halt and triggers a notification of excessive swapping.
I’m rather surprised this issue hasn’t been resolved by now.
To be honest, I’ve spent far too much time on this cron.php problem with no movement toward fixing it.
Thanks, at least someone else is having the same issue. For now I have made a quite radical workaround: killing all “php” processes every hour, 3 minutes after the full hour. This is not nice, but it is all I could think of to keep my system running. I still do not understand why the ScanFiles job takes so long, and I have no idea why it is even necessary.
This is what I added to the cron of www-data
sudo -u www-data crontab -e
# WorkAround
# used to kill the php job which gets stuck and eats up CPU
3 */1 * * * killall php
I have fixed my issue with the cron, at least for now. Using strace -p PID on the process that never finished, I realised that it was going over my appdata_xxx/preview folder and was never able to finish this.
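For anyone who wants to repeat this check, raw strace output scrolls by too fast to read, so it helps to record it for a short while and count which directories show up most. The sketch below assumes GNU tools; the helper name summarize_strace, the 30 second window and the output file /tmp/cron.strace are made up for illustration.

```shell
#!/bin/sh
# Record file-related syscalls of the stuck process for 30 seconds
# (run as root or as the user owning the process; replace PID yourself):
#   timeout 30 strace -f -e trace=file -o /tmp/cron.strace -p PID

# summarize_strace [N]: read strace output on stdin and print the N
# (default 10) directories that occur most often in quoted absolute paths.
summarize_strace() {
    grep -o '"/[^"]*"' |      # every quoted absolute path
        tr -d '"' |           # drop the quotes
        sed 's|/[^/]*$||' |   # strip the last path component
        sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Usage:
#   summarize_strace 5 < /tmp/cron.strace
```

If one directory such as the preview folder dominates the list, that is a strong hint about where the job spends its time.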
Based on the files_cache database I had more than 3 158 958 entries linking to the preview. I assume I had even more files in there. I followed a “not so recommended” procedure to get rid of the folder and the database entries and it worked (but took forever).
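To compare the database count with what is actually on disk, the files below the preview folder can be counted directly. This is a rough sketch: the helper name count_files and the data directory path are assumptions, and the SQL in the comment assumes the default oc_ table prefix with a path column in the file cache table, so check it against your own schema first.

```shell
#!/bin/sh
# count_files DIR: number of regular files below DIR, recursively.
# (Counts output lines, so file names containing newlines would be miscounted.)
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Assumed location; the data directory and the appdata_xxx name
# are different on every installation:
#   count_files /var/www/nextcloud/data/appdata_xxx/preview

# Assumed query for the database side of the comparison:
#   SELECT COUNT(*) FROM oc_filecache WHERE path LIKE 'appdata_%/preview/%';
```

On a folder with millions of files the find run itself can take a long time.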
Still there are open questions
Why the heck do I have millions of entries for previews in there? I have lots of pictures, but if this is an “overall” bottleneck, it should be solved differently.
Why the heck are all files, including previews, scanned twice a day using the jobs database? Or is it even more often? This is my biggest concern in all of this - WHY? Bigger systems must suffer from this too, I just don’t get it.
How does the jobs database work, what do the entries mean, any documentation appreciated.
And why is nobody jumping in from the DEVs helping out, I thought this is a support forum where even DEVs from the Nextcloud Team help out?
I edited the PRM config files as needed and added one in the …/rules directory for the user that ran the Nextcloud cron job. Thus the file usr123.user contained:
IGNORE=""
MAX_CPU="20"
MAX_MEM="20"
MAX_PROC="50"
# we dont care about the process run time, set value 0 to disable check
MAX_ETIME="0"
IGNORE_ROOT="0"
KILL_TRIG="1"
KILL_WAIT="1"
KILL_PARENT="1"
KILL_SIG="9"
# KILL_RESTART_CMD="service php7.4-fpm restart"
KILL_RESTART_CMD="/sbin/reboot"
The PRM log showed this when the Nextcloud cron job ran away:
I’d tried restarting php7.4 as you see, but the server still ground to a halt. So I resorted to rebooting, which worked.
Monit:
/etc/monit/monitrc
check system $HOST
[..]
if loadavg (5min) > 10 then restart
if swap usage > 50% for 2 cycles then restart
if cpu usage (system) > 90% for 1 cycles then restart
stop program = "/sbin/reboot"
[..]
I haven’t yet gone through the process of clearing out appdata_xxx/preview.