NC Linux Desktop Client Sync on Ubuntu 20.04 uses 100% of Server CPU via kswapd0

yupthatguy · December 14, 2021, 8:45am

After hours of testing, I have found that the Nextcloud desktop sync client for Ubuntu 20.04 (both the AppImage and the PPA build) seems to have a bug: when a common Nextcloud file sync error occurs, kswapd0 spikes to 100% of CPU and the swapfile on my Debian 10.5 server fills up completely. (clamscan also climbs from 45% to 100% while kswapd0 is rising to 100% of CPU.) My other sync clients (mobile, Ubuntu's native “online accounts”) do not cause this problem.

top command output

top - 16:08:59 up 22 min,  2 users,  load average: 89.42, 84.04, 55.66
Tasks: 378 total,  12 running, 359 sleeping,   0 stopped,   7 zombie
%Cpu(s):  3.4 us, 57.0 sy,  0.0 ni,  0.1 id, 39.5 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3946.8 total,     90.2 free,   3766.4 used,     90.1 buff/cache
MiB Swap:   6144.0 total,      0.0 free,   6144.0 used.      4.9 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND           
   36 root      30  10       0      0      0 R  98.3   0.0  12:43.68 kswapd0           
 1691 mysql     20   0 1739540   2376      0 S   3.9   0.1   0:34.59 mysqld            
 1300 root      10 -10  116752   3400      0 D   3.3   0.1   0:41.96 AliYunDun         
 1544 root      20   0  806108    640      0 D   2.4   0.0   0:09.45 aliyun-service    
  161 root      20   0    4556   1904   1844 S   0.9   0.0   0:10.60 plymouthd         
 2746 git       20   0 1374728   6020      0 S   0.7   0.1   0:07.23 gitea             
 1114 root      20   0   24312    284      0 S   0.5   0.0   0:03.74 AliYunDunUpdate   
 5805 web2      20   0  292472 215456    920 D   0.4   5.3   0:05.43 clamscan          
  155 root       0 -20       0      0      0 I   0.3   0.0   0:07.11 kworker/0:1H-kbl+ 
  232 root      20   0   70888    284     88 D   0.3   0.0   0:03.74 systemd-journal   
  936 memcache  20   0  408168      0      0 S   0.3   0.0   0:02.19 memcached         
 3492 root      20   0   11380    756    556 R   0.3   0.0   0:03.28 top               
    1 root      20   0  170192   2972      0 D   0.3   0.1   0:11.03 systemd           
 1041 redis     20   0   54244    428      0 D   0.3   0.0   0:03.28 redis-server      
 4029 www-data  20   0  339376   2436     16 D   0.3   0.1   0:00.85 /usr/sbin/apach

I have tried using nice and cpulimit to keep kswapd0 from reaching 100% and consuming all of the swap, but kswapd0 seems to power through both commands, whether run individually or together, and still consumes 100% of CPU and swap. That leaves me no choice but to reboot the server in order to clear the swap and regain use of other services.

I have already reduced swappiness to zero. I have also tried:

To free pagecache:
    echo 1 > /proc/sys/vm/drop_caches
To free reclaimable slab objects (includes dentries and inodes):
    echo 2 > /proc/sys/vm/drop_caches
To free slab objects and pagecache:
    echo 3 > /proc/sys/vm/drop_caches
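Dropping caches does not show what is actually sitting in swap. As a rough diagnostic (not something from the original troubleshooting), the sketch below prints the current swappiness value and the processes holding the most swap, read from the VmSwap field in /proc/<pid>/status. The function name swap_by_process is made up for this example; kernel threads such as kswapd0 have no VmSwap entry and will not appear.

```shell
#!/bin/sh
# Diagnostic sketch: show vm.swappiness and the processes holding the most swap.
# swap_by_process is a hypothetical helper name, not a standard tool.

swap_by_process() {
    # Print "<kB in swap> <process name>" for each process, largest first.
    for status in /proc/[0-9]*/status; do
        # A process may exit between the glob and the read, so ignore errors.
        awk '/^Name:/ {name=$2} /^VmSwap:/ {print $2, name}' "$status" 2>/dev/null
    done | sort -rn | head -n "${1:-10}"
}

# Current swappiness (0 makes the kernel avoid swapping where it can).
if [ -r /proc/sys/vm/swappiness ]; then
    echo "vm.swappiness = $(cat /proc/sys/vm/swappiness)"
fi

# Top 10 swap users.
swap_by_process 10
```

If a single process such as clamscan dominates this list, that points at the real consumer rather than at kswapd0, which only does the reclaim work.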

As I figure Nextcloud file sync errors will be a common thing in the future, can someone suggest how I can mitigate or prevent a simple file sync error from taking down my entire server?

Steps to reproduce:
1. Install the desktop client for Ubuntu & Win10 (dual-boot machine).
2. Sync calendars in Linux TB & Android mobile; everything should work fine.
3. QOwnNotes generates a common sync error when using git tracking…
4. A few hours later, log in to Ubuntu (launching the NC sync client at boot), ssh to the server, and check #top output.

Things that I have tried to troubleshoot the problem:

1.) If I use #killall -9 kswapd0 and stop the apache server and mysqld, I can get enough speed that the terminal is usable. But note that this is temporary: kswapd0 spikes back to 100% 10 to 18 minutes later (or sooner).
2.) I tried clearing the swap by turning swap off and back on. NOTE: #swapon -a && swapoff -a will NOT work and only results in the error swapoff: /swapfile: swapoff failed: Cannot allocate memory. I ultimately had to use these instructions as nothing else worked… (essentially: create a second swap, let it absorb whatever is held in the original swap, then swapoff/swapon the original swap, and finally remove the second swap)
3.) I edited the /etc/sysctl.conf file and set swappiness to 0.
4.) I disabled / removed all sync devices from NC admin/security and re-enabled them 1 by 1… Thunderbird and the mobile sync client seem to have no problem, but when I re-enabled the Linux desktop sync client (AppImage or PPA)… BANG, the problem came back.
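The second-swap workaround described in the list above can be sketched roughly as follows. This is an illustration of the description only, not the linked instructions: the temporary file path, its size, and the run/reset_swap helper names are all assumptions. By default the script only prints the commands; set DRY_RUN=0 and run it as root to execute them.

```shell
#!/bin/sh
# Sketch of the "second swap file" workaround. Paths and size are assumptions.
ORIG_SWAP="${ORIG_SWAP:-/swapfile}"        # original swap file, as named in the swapoff error
TEMP_SWAP="${TEMP_SWAP:-/swapfile.tmp}"    # hypothetical temporary swap file
TEMP_SIZE_MB="${TEMP_SIZE_MB:-6144}"       # assumed: as large as the swap currently in use
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_swap() {
    # 1. Create and enable a temporary swap file.
    run dd if=/dev/zero of="$TEMP_SWAP" bs=1M count="$TEMP_SIZE_MB" &&
    run chmod 600 "$TEMP_SWAP" &&
    run mkswap "$TEMP_SWAP" &&
    run swapon "$TEMP_SWAP" &&
    # 2. Disabling the original swap moves its pages to RAM or the temporary swap.
    run swapoff "$ORIG_SWAP" &&
    run swapon "$ORIG_SWAP" &&
    # 3. Remove the temporary swap again.
    run swapoff "$TEMP_SWAP" &&
    run rm -f "$TEMP_SWAP"
}

reset_swap
```

The last swapoff can fail in the same way as the first if memory pressure has not dropped in the meantime, so this only buys time until the process filling the swap is dealt with.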

Does anyone have any tips on how to resolve this problem?

UPDATE

After some additional testing and reading, it seems that ClamAV is running clamscan on every upload and email, which is spiking CPU usage to 100%. The relation to Nextcloud is that I have antivirus for files activated. Therefore my file sync uploads also start clamscan and overload the server.

The solution seems to be to stop using clamscan and use clamav-daemon instead. I am researching the problem now, but if someone can tell me how to switch from clamscan to clamav-daemon, I would appreciate it.

yupthatguy · December 14, 2021, 3:03pm

The problem I described above was an “illusion” caused by clamscan and nextcloud antivirus… here’s how I solved it:

The problem was twofold: amavis was running clamscan instead of clamd, and Nextcloud antivirus defaulted to clamscan instead of clamd.

Solution:

1.) #dpkg-reconfigure clamav-daemon #set up amavis to use clamd
2.) Change nextcloud’s antivirus default from clamscan to clamav daemon (socket)

This will solve your problems.
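For anyone who prefers the command line, the two steps might look roughly like the sketch below. It is not the exact procedure used here, and it rests on several assumptions: a Debian-style install where clamd listens on /var/run/clamav/clamd.ctl, a Nextcloud checkout at /var/www/nextcloud run as www-data, and the files_antivirus app keys av_mode and av_socket. Adding the clamav user to the amavis group is a commonly recommended step so that clamd can read amavis's temporary files; verify it against your own setup. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: make amavis and Nextcloud talk to clamd instead of spawning clamscan.
# All paths and users below are assumptions about a Debian-style install.
OCC="${OCC:-/var/www/nextcloud/occ}"                        # hypothetical Nextcloud location
CLAMD_SOCKET="${CLAMD_SOCKET:-/var/run/clamav/clamd.ctl}"   # assumed clamd socket path
DRY_RUN="${DRY_RUN:-1}"                                     # 1 = print the commands only

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

use_clamd() {
    # Start clamd now and at every boot.
    run systemctl enable --now clamav-daemon &&
    # Let clamd read the files amavis hands to it (assumed group name: amavis).
    run adduser clamav amavis &&
    run systemctl restart clamav-daemon &&
    # Point Nextcloud's antivirus app at the clamd socket.
    run sudo -u www-data php "$OCC" config:app:set files_antivirus av_mode --value=socket &&
    run sudo -u www-data php "$OCC" config:app:set files_antivirus av_socket --value="$CLAMD_SOCKET"
}

use_clamd
```

The daemon keeps the virus signatures loaded in memory once, whereas every clamscan invocation loads them again, which is consistent with the memory and swap exhaustion seen during sync uploads.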

Something useful but optional for those operating a shared hosting environment on Debian/Ubuntu, which has systemd/cgroups installed by default: I found an excellent tutorial on how to limit a user’s overall CPU usage:

https://www.webhostingtalk.com/showthread.php?t=1832382

With this you can limit a user’s overall CPU usage, so that clients cannot crash the server because of bad application settings.
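As a minimal sketch of one systemd mechanism for this (not necessarily the method in the linked tutorial), a CPU quota can be set on a user's slice with systemctl set-property. The UID 1005, the 50% figure, and the helper names are placeholders. The script prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: cap the total CPU time of one user's systemd slice.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command instead of running it

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# limit_user_cpu UID QUOTA
# CPUQuota=50% means at most half of one CPU core for everything in the slice.
limit_user_cpu() {
    run systemctl set-property "user-$1.slice" "CPUQuota=$2"
}

limit_user_cpu 1005 50%   # 1005 is a hypothetical UID; look it up with: id -u <username>
```

A user slice only covers processes started from that user's login sessions and user services. Work done on a user's behalf by Apache, PHP-FPM or amavis runs in those services' own cgroups, so the quota would have to be set on the corresponding service unit instead.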