Nextcloud VM Crashes When Syncing Large Files

Greetings,

I am running Nextcloud AIO (9.6.0) with Nextcloud v29.0.7 on a Proxmox VE 8.2.7 server. The VM dedicated to Nextcloud has the following specs:

  • CPU: 12 cores
  • RAM: 15 GB
  • OS: Ubuntu Server 24.04 LTS
  • Connection speeds: 990 Mbit/s / 500 Mbit/s

This Nextcloud instance is used by only 4 users. Initially, my entire Proxmox setup was running on a traditional HDD, but I switched to an NVMe disk for reasons I will outline below.

Problem Description:

For over a year now, I have been facing the same issue. Whenever I try to sync a file larger than 1 GB from my PC to the Nextcloud server (for example, a video file), the sync process appears to go smoothly. However, at the final stage of the upload, I see a significant spike in both RAM usage and disk I/O. This causes the entire VM to shut down, and I have to restart it manually.

Once the VM is back online, the file sync does not resume, and it behaves as if the file has already been synchronized. Unfortunately, the file remains on my local device, unsynced to the server. On the server (via CLI), I only see a temporary file with the .part extension in the respective directory.

I’ve tested multiple Nextcloud clients, but they all result in the same server crash when syncing large files. The same issue occurs when I try to upload the file via the web interface in a browser. Syncing files around 1 GB completes without issues.

I’ve checked the Nextcloud logs and haven’t found any errors. However, in the Ubuntu server logs, I see a message indicating that the server was “killed” due to an I/O problem.

Attached are screenshots from Proxmox graphs where I attempted to upload a 2.7 GB Linux Mint 22 ISO file.
proxmox-screenshots.pdf (398,7 KB)

Observations:

  • Uploading large files to the Ubuntu server via SSH (using Midnight Commander) or rsync works without any problems, even for files in the 5-8 GB range.
  • This leads me to believe the issue might be related specifically to Nextcloud.

My Question:

What could be causing this problem, and what would you recommend to resolve it?

However, in the Ubuntu server logs, I see a message indicating that the server was “killed” due to an I/O problem.

What’s the message? I presume it’s a kernel message.
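
If you are not sure where to look, the OOM killer and kernel I/O errors normally show up in the kernel log. Inside the VM, something along these lines should surface it (the -b -1 option looks at the previous boot, which is what you want if the VM was restarted after the crash):

    # kernel messages from the previous boot (needs a persistent journal, which Ubuntu has by default)
    journalctl -k -b -1 | grep -i -E "out of memory|oom|killed process|i/o error"

    # kernel ring buffer, current boot only
    dmesg -T | grep -i -E "out of memory|killed process|i/o error"

dmesg only covers the current boot, so after a restart the journalctl variant is the one that matters.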

It’s highly unlikely that Nextcloud itself is the cause of this, but it does seem to be bringing out something odd about your environment.

Can you post your Nextcloud config (occ config:list system) and installed apps list (occ app:list)?
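
Since you are on AIO, occ has to be run inside the Nextcloud container. Assuming the default container name (nextcloud-aio-nextcloud), the commands should look roughly like this:

    sudo docker exec -it --user www-data nextcloud-aio-nextcloud php occ config:list system
    sudo docker exec -it --user www-data nextcloud-aio-nextcloud php occ app:list

You can then paste the output somewhere like Pastebin.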

@jtr I hope that I did it right. I am not an IT pro.

config - config-nextcloud - Pastebin.com
apps - apps-nextcloud - Pastebin.com

Can you also post the kernel error?

Long shot, but I haven’t looked too closely yet: does disabling the antivirus checking make any difference?
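
Also, since the whole VM goes down rather than just the Nextcloud containers, it would be worth checking whether it is actually the Proxmox host’s OOM killer taking out the KVM process for the guest. On the Proxmox host itself (not inside the VM), something like this should show it; this is only a diagnostic example, not a definitive answer:

    journalctl -k | grep -i -B 2 -A 10 "out of memory"

If there is an OOM report, it lists the process that was killed and how much memory it was holding at the time.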

I am not able to obtain the kernel log now.

I tried disabling the ClamAV container in AIO as well. It did not help. :frowning_face:

I exported a log file from the Proxmox VM (103) where Nextcloud AIO is running.

I can see that the OOM killer killed the whole VM (103), but I do not know the reason.

Today I attempted to synchronize a file named mint02.zip (2.7 GB), which contains a Linux Mint installer.

Process:

The synchronization in the client progressed to 99%, but during finalization, the Nextcloud server crashed again. After restarting, I checked the server and found a file with the .part extension, as seen in the screenshot.

In the meantime, the synchronization resumed, and the ZIP file was successfully uploaded to the server.
In the second screenshot, you can see the uploaded file. However, it’s interesting that the .part file remained.
I expected it to be deleted after a successful synchronization.

I’ve attached screenshots showing CPU, RAM, and disk I/O usage during this time.

All Screenshots in PDF -
mint02-upload-screenshots.pdf (792,4 KB)

I’m not sure how the file upload management works in Nextcloud, but my understanding is that during large file uploads, the client should progressively save the upload state to the .part file (e.g., in chunks of 500 MB). This would prevent a single large data write, which may be causing the server to crash in my case.

This is just a hypothesis, considering that the .part file had a size of about 348 MB at the time of the crash, while the original ZIP file is approximately 2.7 GB in size. That leaves about 2360 MB unaccounted for, which seems to have been lost in the process.
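
As far as I know, the desktop client already uploads large files in chunks (roughly 10 MB by default, growing dynamically), so the data does not arrive as one big write. The spike at the very end is most likely the server assembling those chunks into the final file, plus the antivirus scanning the whole thing when it is enabled, which is a single large read/write on the server side. If you want to experiment, I believe the chunk size used for web uploads can be tuned; the 50 MB value below is only an illustration, and the container name assumes the AIO default:

    sudo docker exec -it --user www-data nextcloud-aio-nextcloud php occ config:app:set files max_chunk_size --value 52428800

The desktop client has a similar maxChunkSize option in its nextcloud.cfg, if I remember correctly, though I would not expect either setting to reduce the load of the final assembly step by much.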