Nextcloud version : 14.0.4
Operating system and version : Debian 9.6
PHP version : 7.2.12
Variant: Docker php-fpm Image
Every time I move a large number of files in Nextcloud, the server becomes unresponsive for the duration of the MOVE.
Further investigation shows that the client issues all MOVE requests at the same time, and every available php-fpm worker is used to handle them, leaving no workers free to handle normal HTTP requests.
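To make the starvation concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: it assumes a strict first-come-first-served queue in front of a fixed pool of workers, and the worker count and per-file durations are made-up numbers, not measurements from my instance.

```python
import heapq

def queue_delay(workers, move_durations):
    """Seconds a normal request waits if it is queued behind all MOVE requests.

    Assumption: every MOVE arrives at t=0, the normal request arrives right
    after them, and the pool serves requests strictly in arrival order.
    """
    free_at = [0.0] * workers          # time at which each worker is free again
    heapq.heapify(free_at)
    for duration in move_durations:
        start = heapq.heappop(free_at)  # earliest available worker takes the MOVE
        heapq.heappush(free_at, start + duration)
    return heapq.heappop(free_at)       # first moment any worker is free

# Hypothetical numbers: 20 workers, 100 files, 3 seconds per cross-filesystem MOVE
print(queue_delay(20, [3.0] * 100))
```

With fewer MOVE requests than workers the delay is zero, which matches the observation that the problem only shows up once the selection is larger than the pool.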
A really neat way to DoS a Nextcloud instance, if you ask me…
Raising the pm.max_children limit does not really help, as I just have to select a larger number of files to DoS the instance.
Is there a way to limit how many php-fpm workers should handle file operations?
Can you open top and check if you have a lot of i/o-waits?
Hey, thanks for the response.
I did another COPY operation and checked top. IO-wait was hovering around 50%, never going above 60%, so I don’t think I’m IO-bound. I also don’t think I’m CPU-bound.
It’s just that all fpm children (currently set to 20) get used for the COPY/MOVE operation, and no other request can be handled until one becomes available again (i.e. is done with its MOVE/COPY request).
I should also mention that when I do a MOVE/COPY operation in Nextcloud, I always COPY/MOVE between filesystems/partitions (due to the underlying FS structure that I use), so the COPY/MOVE commands actually copy the data around (read and then write). This is unlike a MOVE within the same filesystem, which just updates the path and returns immediately.
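For anyone wondering why the filesystem boundary matters, the difference can be shown with Python's standard library (this is only an illustration of the OS behaviour, not Nextcloud's PHP code; `move_file` is a name I made up). A rename inside one filesystem is a metadata update, while across filesystems the rename fails with EXDEV and the data has to be copied and the source deleted.

```python
import errno
import os
import shutil

def move_file(src, dst):
    """Move src to dst and report which strategy was needed."""
    try:
        os.rename(src, dst)            # cheap: only works within one filesystem
        return "rename"
    except OSError as exc:
        if exc.errno != errno.EXDEV:   # anything other than "cross-device" is a real error
            raise
    shutil.copy2(src, dst)             # expensive: reads and rewrites all the data
    os.unlink(src)
    return "copy+delete"
```

In the second case each request keeps its worker busy for as long as the copy takes, which is why the pool fills up so quickly in my setup.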
The same issue is still present in 18.0.3. Copying a large number of files from location A to location B makes the server unresponsive for the duration of the COPY.
I checked: all available php-fpm workers are in use. There are plenty of resources available on the machine, disk I/O is not the issue, and all other services on the machine function just fine.
How can we reserve some workers so that they never handle copy/move requests, or even better, reserve some for regular HTTPS/web interface requests? I think this should be the default.
Just want to say that I don’t have the issue anymore since around version 16 or so.
If I do a COPY or MOVE operation in my current instance (18.0.3), only 5 workers are used at any time for that operation.
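I have not looked at how the cap is actually implemented, so the following is only a guess at the general technique: limiting the number of requests in flight with a semaphore. The sketch is in Python with asyncio; `run_capped` and the sleep that stands in for one MOVE/COPY request are hypothetical.

```python
import asyncio

async def run_capped(n_ops, limit, op_seconds=0.01):
    """Run n_ops fake operations, at most `limit` at a time; return the peak concurrency."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def one_op():
        nonlocal in_flight, peak
        async with sem:                      # wait here if `limit` operations are running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(op_seconds)  # stands in for one MOVE/COPY request
            in_flight -= 1

    await asyncio.gather(*(one_op() for _ in range(n_ops)))
    return peak

# 20 queued operations, but never more than 5 running at once
print(asyncio.run(run_capped(20, 5)))
```

With a cap like this, a pool of 20 workers would always keep some workers free for normal requests, no matter how many files are selected.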
To the best of my memory I did not configure it manually, so I assumed it just got “patched”.