Nextcloud Overload

Hello,

Since deploying my new Nextcloud server I have been experiencing serious overload problems, to the point where the web interface and SSH connections stop working entirely.
The only way to regain control of the VM is to shut it down and reboot it.

Nextcloud version (eg, 20.0.5): 24.0.5
Operating system and version (eg, Ubuntu 20.04): Debian 11
Apache or nginx version (eg, Apache 2.4.25): Apache 2.4.54
PHP version (eg, 7.4): 8.1.11

I already had crashes of this type, so I moved the installation to a much more powerful server.

But now, after 5 or 6 days, the server is completely down, and this is what I see in ESXi:

My hardware

The content of /proc/sys/kernel/hung_task_timeout_secs is 120

If you need more information, just ask.

Thank you in advance for your help

Hey :wave:

are you sure you are meeting the system requirements?
https://docs.nextcloud.com/server/latest/admin_manual/installation/system_requirements.html

Yes, except that I use Debian 11.

This is the tutorial I used to install my Nextcloud server.

Then maybe consider trying a supported Debian version; it can make a big difference.

I would start off by determining exactly what resource is overloaded. Try running htop in Debian to get a look at what’s going on. You can also run esxtop on ESXi to see the VM’s resource usage in various ways. Try to determine if it’s CPU, memory, or disk read/write being overloaded, and by what process.
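As a rough complement to htop, the memory and swap picture can also be read straight from the kernel. The following is only a minimal sketch, assuming a Linux guest with `/proc` mounted; the function names `parse_meminfo` and `memory_summary` are made up for this example and are not part of any Nextcloud tooling.

```python
import os


def parse_meminfo(text):
    """Parse the text of /proc/meminfo into {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])  # most values are in KiB
    return info


def memory_summary(info):
    """Return (memory used %, swap used %) from a parsed meminfo dict."""
    mem_total = info.get("MemTotal", 0)
    mem_avail = info.get("MemAvailable", 0)
    swap_total = info.get("SwapTotal", 0)
    swap_free = info.get("SwapFree", 0)
    mem_pct = 100.0 * (mem_total - mem_avail) / mem_total if mem_total else 0.0
    swap_pct = 100.0 * (swap_total - swap_free) / swap_total if swap_total else 0.0
    return mem_pct, swap_pct


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        mem_pct, swap_pct = memory_summary(parse_meminfo(fh.read()))
    load1, load5, load15 = os.getloadavg()
    print(f"memory used: {mem_pct:.1f}%  swap used: {swap_pct:.1f}%")
    print(f"load average: {load1:.2f} {load5:.2f} {load15:.2f} on {os.cpu_count()} CPUs")
```

Run periodically (for example from cron, appending to a log file), this would show whether memory, swap, or load climbs first in the days before a freeze.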

Is Debian 11 really not supported? That’s news to me.

Seems like it… System requirements — Nextcloud latest Administration Manual latest documentation

…but I’m pretty sure this is a mistake and someone should probably correct it. :wink:

Debian 11 has been the current version for over a year.


Well, as I said, probably a mistake. My guess would be that they just forgot to update the documentation…

Thank you for your replies. I will try to pin down the origin of the problem more precisely and come back with the information.

For the moment everything is fine (I rebooted this morning).

Hello
For information, here are the load graphs of my Nextcloud server this morning, before the reboot, collected by Centreon

Your system is pretty powerful; depending on the user count, there may be some memory shortage. Your graphs show a classic resource leak. Maybe you can share your user/usage metrics so we can better compare them with our experience (a lot of forum users run a Raspberry Pi or a SOHO NAS). I don’t remember others reporting such behavior, but maybe that is because the majority of users are not expert enough to analyze the issue as well as you did.

The crash you reported in the first post shows coolwsd, which is a Collabora component. Do you do a lot of work on office documents, or on huge ones, with many users? It would be great if you could isolate the process eating CPU and memory so we can focus on specific components.

I agree it looks like a leak, in particular because of the way it fills up memory and then fills up swap too. When it starts doing that again, you can use htop to find the responsible process.
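If htop is no longer usable by the time the leak shows up, the same ranking can be collected from `/proc` and logged. This is a minimal sketch under the assumption of a Linux `/proc` filesystem; `parse_status`, `rank_by_memory`, and `read_statuses` are hypothetical helper names chosen for illustration.

```python
import os


def parse_status(text):
    """Return (name, resident KiB, swapped KiB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key] = value.strip()

    def kib(key):
        # Kernel threads have no VmRSS/VmSwap lines; treat those as 0.
        parts = fields.get(key, "0 kB").split()
        return int(parts[0]) if parts and parts[0].isdigit() else 0

    return fields.get("Name", "?"), kib("VmRSS"), kib("VmSwap")


def rank_by_memory(statuses, limit=10):
    """Sort (pid, status text) pairs by RSS + swap, largest first."""
    rows = []
    for pid, text in statuses:
        name, rss, swap = parse_status(text)
        rows.append((rss + swap, pid, name, rss, swap))
    rows.sort(reverse=True)
    return rows[:limit]


def read_statuses(proc_root="/proc"):
    """Yield (pid, status text) for every process currently visible."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                yield int(entry), fh.read()
        except OSError:
            continue  # the process exited while we were scanning


if __name__ == "__main__":
    for total, pid, name, rss, swap in rank_by_memory(read_statuses()):
        print(f"{pid:>7} {name:<16} rss={rss} KiB swap={swap} KiB")
```

Counting swap together with resident memory matters here, because a leaking process that has already been pushed into swap can look small in an RSS-only listing.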

Yes, I have quite a few collaborators (between 10 and 15) who work a lot with Collabora Online, and indeed it was responsible for quite a few server crashes in the old configuration, with “out of memory” alerts.

The other thing I noticed yesterday is that after a docx document was opened, modified, and then closed, Collabora processes remained active for several hours (PIDs 4033 and 4063).
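Lingering processes like these could be listed automatically rather than spotted by hand. The sketch below assumes the procps `ps` shipped with Debian (it relies on the `etimes` column, elapsed seconds since process start) and matches on the substring `cool`, taken from the `coolwsd` name mentioned earlier in this thread; the one-hour threshold and the function names are illustrative assumptions.

```python
import subprocess


def parse_ps(output):
    """Parse the output of `ps -eo pid=,etimes=,rss=,comm=` into dicts."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # blank or malformed line
        pid, etimes, rss, comm = parts
        rows.append({
            "pid": int(pid),
            "age_s": int(etimes),   # seconds since the process started
            "rss_kib": int(rss),    # resident memory in KiB
            "comm": comm.strip(),
        })
    return rows


def lingering(rows, name_part, min_age_s):
    """Keep processes whose command name contains name_part and that are old enough."""
    return [r for r in rows if name_part in r["comm"] and r["age_s"] >= min_age_s]


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["ps", "-eo", "pid=,etimes=,rss=,comm="],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run ps: {exc}")
    else:
        for row in lingering(parse_ps(result.stdout), "cool", 3600):
            print(row)
```

A process that is old is not necessarily leaked, so the output is only a starting point for checking whether any document is still open in Collabora.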

Yesterday I scheduled a restart of Apache for last night, and this morning these processes had disappeared.
This morning I removed the scheduled Apache restart to see how the server behaves over time.

I keep this server under close surveillance and will give you the information I can gather.

Just a curiosity question, do you happen to use the Collabora CODE built in server?

Hello
Sorry to come back so long after opening this post, but I was waiting for the problem to reappear.
It happened again today.

As I suspected, the problem of RAM saturation comes from collabora, here is the proof:

So I ran these commands:

kill 28803
kill 41378

And everything is back to normal, but this is not a viable solution, there is a problem somewhere.

I don’t know if this can be the subject of a bug analysis or the opening of a ticket with a request for a fix, but clearly this is a situation that leads to a saturation of the server and thus a big problem of stability.

What do you think about it?

For information, about 10 minutes after killing both processes, Nextcloud became inaccessible.
I had no other solution than to reboot the server

Best regards.

Hi FSF,

I had a very similar experience with Nextcloud 24.x installed on Ubuntu (Proxmox container). No matter how much RAM I gave it, it would ‘freeze’, and a reboot was the only way to regain SSH/www access. I tested all sorts of things and eventually figured out that it was a Redis or APCu issue.

Try changing config.php cache section to this:

  'memcache.local' => '\\OC\\Memcache\\Redis',
  'memcache.locking' => '\\OC\\Memcache\\Redis',
  'redis' =>
  array (
    'host' => 'localhost',
    'port' => 6379,
  ),

Not sure why, but APCu was to blame. I have had no issues since I changed memcache.local.

Cheerio!
dzidek23

Thank you.

In config.php I already have these directives:

 'redis' =>
  array (
    'host' => '/var/run/redis/redis-server.sock',
    'port' => '0',
    'timeout' => '0.0',
  ),
  'memcache.distributed' => '\\OC\\Memcache\\Redis',
  'memcache.local' => '\\OC\\Memcache\\Redis',
  'memcache.locking' => '\\OC\\Memcache\\Redis',

Do I still need to change the 'redis' array part?

This is all I have for memcache:

  'memcache.local' => '\\OC\\Memcache\\Redis',
  'memcache.locking' => '\\OC\\Memcache\\Redis',
  'redis' => 
  array (
    'host' => 'localhost',
    'port' => 6379,
  ),

I run only one server, so memcache.distributed doesn’t do much, which is why I removed this line from the config:

'memcache.distributed' => '\\OC\\Memcache\\Redis',

My Redis runs on the default config (localhost and port 6379), and I’d suggest you check what’s in your Redis config file. If in doubt, restore the default Redis config and see how it goes; don’t just copy and paste from a guide, even the best one.

If you don’t have too many users, comment out all memcache configuration in the Nextcloud config and see if that resolves the issue. If it does, check the Redis settings, then re-enable it for local caching first and for distributed caching afterwards.

Thank you

I will set up your configuration (after backup of course) to see what the result will be and if it solves the problem.

So I modified the config.php file as you recommended, and I immediately got the following error message:

Internal Server Error

The server encountered an internal error and was unable to complete your request.
Please contact the server administrator if this error reappears multiple times, please include the technical details below in your report.
More details can be found in the server log.

This is the content of /var/log/apache2/nextcloud-error.log:

[Mon Nov 28 16:09:29.281483 2022] [php:notice] [pid 3550] [client 172.16.10.10:58831] {"reqId":"tpHGJzita9vWHI3lUUbS","level":3,"time":"2022-11-28T16:09:29+01:00","remoteAddr":"172.16.10.10","user":"--","app":"index","method":"GET","url":"/","message":"Connection refused","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0","version":"24.0.5.1","exception":{"Exception":"RedisException","Message":"Connection refused","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/RedisFactory.php","line":137,"function":"pconnect","class":"Redis","type":"->"},{"file":"/var/www/nextcloud/lib/private/RedisFactory.php","line":178,"function":"create","class":"OC\\\\RedisFactory","type":"->"},{"file":"/var/www/nextcloud/lib/private/Memcache/Redis.php","line":43,"function":"getInstance","class":"OC\\\\RedisFactory","type":"->"},{"file":"/var/www/nextcloud/lib/private/Memcache/Factory.php","line":118,"function":"__construct","class":"OC\\\\Memcache\\\\Redis","type":"->"},{"file":"/var/www/nextcloud/lib/private/Server.php","line":1106,"function":"createLocking","class":"OC\\\\Memcache\\\\Factory","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":162,"function":"OC\\\\{closure}","class":"OC\\\\Server","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/nextcloud/3rdparty/pimple/pimple/src/Pimple/Container.php","line":122,"function":"OC\\\\AppFramework\\\\Utility\\\\{closure}","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->","args":["*** sensitive parameters replaced 
***"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":129,"function":"offsetGet","class":"Pimple\\\\Container","type":"->"},{"file":"/var/www/nextcloud/lib/private/ServerContainer.php","line":136,"function":"query","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":57,"function":"query","class":"OC\\\\ServerContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/Server.php","line":2082,"function":"get","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/Files/View.php","line":122,"function":"getLockingProvider","class":"OC\\\\Server","type":"->"},{"file":"/var/www/nextcloud/lib/private/Server.php","line":454,"function":"__construct","class":"OC\\\\Files\\\\View","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":162,"function":"OC\\\\{closure}","class":"OC\\\\Server","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/nextcloud/3rdparty/pimple/pimple/src/Pimple/Container.php","line":122,"function":"OC\\\\AppFramework\\\\Utility\\\\{closure}","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->","args":["*** sensitive parameters replaced 
***"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":129,"function":"offsetGet","class":"Pimple\\\\Container","type":"->"},{"file":"/var/www/nextcloud/lib/private/ServerContainer.php","line":136,"function":"query","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":57,"function":"query","class":"OC\\\\ServerContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/Server.php","line":1445,"function":"get","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->"},{"file":"/var/www/nextcloud/lib/base.php","line":602,"function":"boot","class":"OC\\\\Server","type":"->"},{"file":"/var/www/nextcloud/lib/base.php","line":1111,"function":"init","class":"OC","type":"::"},{"file":"/var/www/nextcloud/index.php","line":34,"args":["/var/www/nextcloud/lib/base.php"],"function":"require_once"}],"File":"/var/www/nextcloud/lib/private/RedisFactory.php","Line":137,"CustomMessage":"--"}}
[Mon Nov 28 16:09:29.283059 2022] [php:notice] [pid 3550] [client 172.16.10.10:58831] {"reqId":"tpHGJzita9vWHI3lUUbS","level":3,"time":"2022-11-28T16:09:29+01:00","remoteAddr":"172.16.10.10","user":"--","app":"core","method":"GET","url":"/","message":"Connection refused","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0","version":"24.0.5.1","exception":{"Exception":"RedisException","Message":"Connection refused","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/RedisFactory.php","line":137,"function":"pconnect","class":"Redis","type":"->"},{"file":"/var/www/nextcloud/lib/private/RedisFactory.php","line":178,"function":"create","class":"OC\\\\RedisFactory","type":"->"},{"file":"/var/www/nextcloud/lib/private/Memcache/Redis.php","line":43,"function":"getInstance","class":"OC\\\\RedisFactory","type":"->"},{"file":"/var/www/nextcloud/lib/private/Memcache/Factory.php","line":118,"function":"__construct","class":"OC\\\\Memcache\\\\Redis","type":"->"},{"file":"/var/www/nextcloud/lib/private/Server.php","line":1106,"function":"createLocking","class":"OC\\\\Memcache\\\\Factory","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":162,"function":"OC\\\\{closure}","class":"OC\\\\Server","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/nextcloud/3rdparty/pimple/pimple/src/Pimple/Container.php","line":122,"function":"OC\\\\AppFramework\\\\Utility\\\\{closure}","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->","args":["*** sensitive parameters replaced 
***"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":129,"function":"offsetGet","class":"Pimple\\\\Container","type":"->"},{"file":"/var/www/nextcloud/lib/private/ServerContainer.php","line":136,"function":"query","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":57,"function":"query","class":"OC\\\\ServerContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/Server.php","line":2082,"function":"get","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/Files/View.php","line":122,"function":"getLockingProvider","class":"OC\\\\Server","type":"->"},{"file":"/var/www/nextcloud/lib/private/Server.php","line":454,"function":"__construct","class":"OC\\\\Files\\\\View","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":162,"function":"OC\\\\{closure}","class":"OC\\\\Server","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/nextcloud/3rdparty/pimple/pimple/src/Pimple/Container.php","line":122,"function":"OC\\\\AppFramework\\\\Utility\\\\{closure}","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->","args":["*** sensitive parameters replaced 
***"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":129,"function":"offsetGet","class":"Pimple\\\\Container","type":"->"},{"file":"/var/www/nextcloud/lib/private/ServerContainer.php","line":136,"function":"query","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":57,"function":"query","class":"OC\\\\ServerContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/Server.php","line":1445,"function":"get","class":"OC\\\\AppFramework\\\\Utility\\\\SimpleContainer","type":"->"},{"file":"/var/www/nextcloud/lib/base.php","line":602,"function":"boot","class":"OC\\\\Server","type":"->"},{"file":"/var/www/nextcloud/lib/base.php","line":1111,"function":"init","class":"OC","type":"::"},{"file":"/var/www/nextcloud/index.php","line":34,"args":["/var/www/nextcloud/lib/base.php"],"function":"require_once"}],"File":"/var/www/nextcloud/lib/private/RedisFactory.php","Line":137,"CustomMessage":"--"}}

So, unfortunately, in my case it causes more problems than it solves.

I went back to my original configuration.
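One possible reading of the "Connection refused" trace: the suggested config points Nextcloud at TCP `localhost:6379`, whereas the original config used the Unix socket `/var/run/redis/redis-server.sock`. If Redis on this machine listens only on the socket (for example with `port 0` in redis.conf, which disables the TCP listener), a TCP connection would be refused. That is an assumption, not something confirmed in this thread. A small check like the following sketch could tell the two cases apart; `redis_ping` is a hypothetical helper that speaks just enough of the Redis inline protocol to send `PING`.

```python
import socket


def redis_ping(address, timeout=2.0):
    """Send PING to Redis and return the first reply line, or None if unreachable.

    `address` is either a filesystem path (Unix socket) or a (host, port) tuple.
    """
    family = socket.AF_UNIX if isinstance(address, str) else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(address)
            sock.sendall(b"PING\r\n")
            reply = sock.recv(128)
    except OSError:
        # Covers connection refused, missing socket file, permissions, timeouts.
        return None
    return reply.decode("ascii", "replace").strip() or None


if __name__ == "__main__":
    # Both addresses are the ones quoted in this thread.
    for address in ("/var/run/redis/redis-server.sock", ("localhost", 6379)):
        print(address, "->", redis_ping(address))
```

Any reply, even an authentication error, means Redis is reachable at that address; `None` means the address in config.php will not work as written. Note that the socket file may only be readable by certain users, so the check is most meaningful when run as the web server user.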