How to debug huge performance issue with the user provisioning API?

amenthes · October 3, 2016, 9:55pm

I have an import script to create a few users via Nextcloud’s User Provisioning API. For some reason, the performance is degrading badly.

A few runs of my script (it is still in development) back, it ran “okay” with a response time of about half a second per request. Nothing to write home about, but also not bad enough to really look into it. Recent runs of that script require 13 seconds per Request. And I need a few hundred requests to set everything up.

I have confirmed that my script itself is not he source of the issue. I enhanced my Apache Logfile to also print %D - the request time in microseconds. The requests really does take about 13 seconds. What irritates me is, that every request takes 13 seconds or slightly longer - almost as if there was some kind of rate-limiting in place.

Here’s a short excerpt from my script output:

  - (13105ms) request /ocs/v1.php/cloud/users/some.name (this is the POST that creates this user)
  - (13278ms) request /ocs/v1.php/cloud/users/some.name {"key":"email","value":"name.changed@example.com"}
  - (13371ms) request /ocs/v1.php/cloud/users/some.name {"key":"quota","value":"10GB"}
  - (13387ms) request /ocs/v1.php/cloud/users/some.name {"key":"display","value":"Name Changed"}

And the corresponding lines in the apache logfile (could also be from another iteration of the script. Anyway.):

x.x.x.x - admin [03/Oct/2016:23:23:01 +0200] "PUT /ocs/v1.php/cloud/users/xxx HTTP/1.1" 200 13/13244011 2358 "-" "-"
x.x.x.x - admin [03/Oct/2016:23:23:01 +0200] "PUT /ocs/v1.php/cloud/users/xxx HTTP/1.1" 200 13/13273034 2494 "-" "-"
x.x.x.x - admin [03/Oct/2016:23:23:01 +0200] "PUT /ocs/v1.php/cloud/users/xxx HTTP/1.1" 200 13/13292479 2500 "-" "-"
                                                                          time in seconds --^ /    ^--- time in microseconds

So these four requests to set up a user take almost a minute.

I run Nextcloud 10.0.0 on an Ubuntu 16.04 machine (a VM with enough resources for the job), MariaDB as database and redis as cache is configured. The web interface is plenty responsive. I do not think that this is a general performance Issue.

what I tried so far

I added the debug output to my script and then to the apache logfile.
CPU is idling at near-zero percent most of the time.
Sysload is about 0.4 - not ideal, but I guess that’s due to the fact that there’s always a HTTP-Request waiting for a response.
The Nextcloud admin interface shows no errors in server settings
The Nextcloud.log has almost no lines in it. There’s a few 405 The resource you tried to create already exists, but I do not believe that they would be responsible for the general performance issue.

I am certain that the time is spent on the webserver. I just don’t know how to track it down further.

amenthes · October 3, 2016, 10:03pm

a little more info about the script itself. It’s a setup for a small classroom situation. You have ~30 students and ~7 teachers.

Each Teacher has one directory they’re (read-only) sharing with the students group and the teachers group. (to distribute hand-outs or slides)
Each Student has one directory for each teacher (7 in total, they use this to hand in assignments).
One Teacher has a share wirtable by all teachers and readable by all students.

The script creates the users, sets additional information (like email, quota …), creates directories and sets up the shares.

This takes ~360 API-Requests to set up. Currently the runtime of the script is unbearable, It’s taking more than an hour to complete.

amenthes · October 3, 2016, 10:29pm

I threw wireshark at it, but it also only confirms what I guessed. Everything that belongs to one request is in this image. At Frame 1215 the connection starts, TLS-Handshake is finished at frame 1240. Then I send my request (1241), which is TCP-ACK’d in Frame 1242. All of this happens in a tiny fraction of a second.

A long pause follows.

The next thing I know (Frame 1243) is 1374 bytes of data from the server followed by encrypted alert (closing the TLS) and a TCP-FIN (closing the socket).

amenthes · October 3, 2016, 10:44pm

I just started a general MySQL log. Oh, well, I think I may have found the problem:

Query     SELECT * FROM `oc_bruteforce_attempts` WHERE (`occurred` > '1475491053') AND (`subnet` = '<my ip here>/32')

Well, apparently there’s brute force detection somewhere in Nextcloud. And since I’m doing a ton of requests for a huge amount of users, it appears to have caught on.

In case anyone wonders how to turn on a general log for MySQL:

-- run via mysql for example
SET GLOBAL log_output = "FILE";
SET GLOBAL general_log_file = "/tmp/sql.log";
SET GLOBAL general_log = 'ON';

-- remember to turn it off again, later!
SET GLOBAL general_log = 'OFF';

take a look at /tmp/sql.log or whatever you specified as filename.

SOLUTION: I ran a truncate oc_bruteforce_attempts; on my database, and now it’s humming along like before