504 Gateway timeout for large file uploads

I’m getting a not-so-uncommon error when uploading large files through the web interface:

Error when assembling chunks, status code 504

I’m a developer and would like to figure out what is actually going on here. What is Nextcloud actually doing in step 4 below?

How common is this for large file uploads? Does this happen to everyone who has timeouts of <= 1 minute and uploads big enough files?

The course of events

No other load is on the server at the time.

  1. Uploading a 5GB file through the web interface
  2. The file is split into 10 MB chunks/parts that are uploaded one at a time
  3. An HTTP MOVE request to the .file endpoint is made once all of the parts have been uploaded (see the request sketch after this list)
  4. The PHP process (PHP-FPM behind nginx) handling the request uses about 100% CPU of one core (which may include IO wait).
    • I can’t see any significant CPU load on the database or anything else during this time.
    • There is IO going on though, but the disks do not seem to be 100% busy.
  5. 504 timeout happens (after 1 minute in my case)
  6. The PHP process/request is stopped at this point without finishing
  7. The uploaded file is not in the destination directory; all the upload segment files are still in the upload directory
  8. Reloading the Nextcloud web interface shows the uploaded file (this takes at most a couple of seconds)
  9. Part files are now gone and the destination file is created
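
For reference, this is roughly what those requests look like at the WebDAV level (a sketch based on my understanding of the chunked upload API; the user name, transfer ID and file names are made up for illustration):

# Create an upload session for this transfer (the ID is chosen by the client)
curl -u user:pass -X MKCOL https://cloud.example.com/remote.php/dav/uploads/user/web-file-upload-123

# Step 2: upload the ~10 MB parts one at a time
curl -u user:pass -T chunk-000001 https://cloud.example.com/remote.php/dav/uploads/user/web-file-upload-123/000001
curl -u user:pass -T chunk-000002 https://cloud.example.com/remote.php/dav/uploads/user/web-file-upload-123/000002
# ... and so on for the remaining parts

# Step 3: ask the server to assemble the parts into the final file
# (this is the request that runs into the 504)
curl -u user:pass -X MOVE \
  -H "Destination: https://cloud.example.com/remote.php/dav/files/user/bigfile.bin" \
  https://cloud.example.com/remote.php/dav/uploads/user/web-file-upload-123/.file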

Questions and guesses

What is Nextcloud doing at step 4 for >1 minute of 100% CPU usage until the request is killed? It doesn’t seem to be writing to the destination file, since that file does not exist when the request is killed.

Whatever it’s doing seems like wasted effort, since a simple refresh at step 8 creates the complete 5 GB file and removes the part files in a few seconds.

My first guess would be that step 4 is just writing the chunks to the destination file. That seems reasonable (at least for a suboptimal architecture that forces multiple FS writes of the uploaded data).
But that shouldn’t cause this much CPU load? (Unless IO wait is included in that metric, I guess? I’m not that familiar with FreeBSD.)
However, the fact that a page reload almost instantly creates the target file (on the file system and in the web UI) seems to contradict the idea that so much time is needed for that step.

Solution by increasing the timeouts

Increasing the timeouts in nginx and FPM solves this for me, but it doesn’t answer the question of what’s actually going on, which is what I would like to find out.

  • Add this to your PHP FPM config: request_terminate_timeout = 600
  • Add this to your nginx config: fastcgi_read_timeout 600;
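
For completeness, this is roughly where those two settings live in my setup (file paths are the usual FreeBSD/nginx locations and may differ on your system):

# PHP-FPM pool config, e.g. /usr/local/etc/php-fpm.d/www.conf
request_terminate_timeout = 600

# nginx: inside the PHP location block of the Nextcloud server {} section
location ~ \.php(?:$|/) {
    # ... existing fastcgi_split_path_info / fastcgi_pass lines ...
    fastcgi_read_timeout 600;
}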

System info

Nextcloud 19.0.1
PHP: 7.4.8 (with 1GB mem limit)
Nginx: 1.18.0
OS: FreeNAS 11.3 U3.2
ZFS on mechanical drives (~100 MB/s+ sequential write speed)

Config:

<?php
$CONFIG = array (
  'apps_paths' =>
  array (
    0 =>
    array (
      'path' => '/usr/local/www/nextcloud/apps',
      'url' => '/apps',
      'writable' => true,
    ),
    1 =>
    array (
      'path' => '/usr/local/www/nextcloud/apps-pkg',
      'url' => '/apps-pkg',
      'writable' => true,
    ),
  ),
  'logfile' => '/var/log/nextcloud/nextcloud.log',
  'memcache.local' => '\\OC\\Memcache\\APCu',
  'passwordsalt' => 'xxx',
  'secret' => 'xxx',
  'trusted_domains' =>
  array (
    0 => 'localhost',
    1 => '192.168.1.12',
    2 => 'xxx',
  ),
  'datadirectory' => '/mnt/data',
  'dbtype' => 'mysql',
  'version' => '19.0.1.1',
  'overwrite.cli.url' => 'http://localhost',
  'dbname' => 'nextcloud',
  'dbhost' => 'localhost',
  'dbport' => '',
  'dbtableprefix' => 'oc_',
  'mysql.utf8mb4' => true,
  'dbuser' => 'oc_ncadmin',
  'dbpassword' => 'xxx',
  'installed' => true,
  'instanceid' => 'xxx',
  'maintenance' => false,
  'theme' => '',
  'loglevel' => 2,
  # Added for testing of this issue (but it didn't seem to have any effect)
  'filelocking.enabled' => false,
  'part_file_in_storage' => false,
);

Change this in php.ini:
upload_max_filesize = 20G
post_max_size = 21G

and restart Apache & PHP-FPM:

sudo systemctl restart apache2.service
sudo systemctl restart php7.4-fpm.service

good luck :wink:


If you had read my post, you would have seen that I already say that increasing the timeouts solves this. (Plus, your commands are not applicable to nginx & FreeNAS.) But that was not my question.

Oh I was a little too fast :slight_smile:


I don’t know much about nginx & FreeNAS; I use apache2 and Ubuntu. Can you change the default timeout?

Yea, there are similar settings in Apache. I don’t know them off-hand, but they should be easy enough to find. Just google for apache php timeout or something similar.

For PHP in CGI mode you have to add this to the Apache config (/etc/httpd/conf/httpd.conf):

Timeout 150

150 seconds is enough for me; I have an SSD and my files are up to 10 GB.
By default there is no such variable in httpd.conf on CentOS 7, and the default timeout is 60 seconds.
As a frontend I use nginx, and you have to add this line to the server section of your SSL vhost config to make nginx wait longer:

proxy_read_timeout 150;

PHP max_execution_time is not critical, and I have it set to 30.
After adding these two lines, restart nginx and httpd. Big file uploads will no longer get stuck or locked with 504/423 errors.
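
Putting both pieces together, the relevant bits look roughly like this (the vhost file name and backend address are placeholders, not my real config):

# /etc/httpd/conf/httpd.conf
Timeout 150

# /etc/nginx/conf.d/cloud.example.com.conf (SSL vhost in front of httpd)
server {
    listen 443 ssl;
    server_name cloud.example.com;
    # ... ssl_certificate and the rest of the vhost ...
    proxy_read_timeout 150;

    location / {
        proxy_pass http://127.0.0.1:8080;  # placeholder for the httpd backend
    }
}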

I am not a developer on this project; however, I can tell you what my tests have come up with. My goal was to allow uploads of 50 GB, so I created a test file of this size.

The upload process (where the chunks are uploaded to the server) works perfectly with Nextcloud default values in the php.ini files. The problem appears to be a result of what we see on the screen during re-assembly. On my server, re-assembly of the 50 GB file takes about 30 minutes, and during this time there is no visual update to the client. This means that when “processing…” comes up on the screen, the HTTP server starts its “Timeout” timer (Apache in my case). If that timer expires before the re-assembly is complete, two bad things happen: Nextcloud reports an error 504, and once this error has occurred it no longer communicates with the client, so it appears that locks are not released, temporary files are left on the server, etc. To make things even more confusing for the user, the re-assembly keeps going until it has completed (despite that message).

In my case, I set the Apache “Timeout 1800” to allow sufficient time to reassemble the file. Now everything works as designed, with the normal post-upload cleanup taking place.

If I were to suggest a design change, it would be to omit the “processing…” dialogue and finish the client-server communication at that point. Perhaps put up a message to say that processing will take a while and to check back later. Or, if you want to be clever, put up a notification when the assembly is complete.


Further to my last post, that observation was for a file dropped into a folder while logged into Nextcloud. This morning I confirmed that the behaviour is totally different if you are uploading a file via a shared link.

There seems to be a maximum limit of 7.8 GB, which doesn’t seem to correspond to any known setting.

When uploading this way, the chunked files are not created on the server, which is puzzling.

The partially uploaded files are left on the server and are almost exactly 7.8 GB in size. The client sees “connection lost with server”, tries again, produces another 7.8 GB partial file on the server, and then quits.

I made a breakthrough this afternoon. The error messages in the log are just a symptom of the fact that the /tmp filesystem is full. I have 7.8 GB in my /tmp filesystem.

Short of installing more RAM, I will just have to wait until Nextcloud has a better way of managing large files.
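
For anyone who wants to confirm this on their own system, a quick check is enough (this assumes a tmpfs-backed /tmp, which is what I have):

# How big is /tmp, and is it mounted as tmpfs?
df -h /tmp

# Watch it fill up while a shared-link upload is in progress
watch -n 5 df -h /tmp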

Now that we are entering an era where 4K 60 fps video files are being generated by most modern phones, there will be an increasing demand for uploading files in excess of 50 GB to cloud storage servers.

Is a swapfile going to help with this issue @berniel99, or can we somehow add more storage to the mountpoint from a local disk?

I have solved the problem and thought I would update this forum, since the solution is built into the system but is not the default behaviour.

If you specify 'tempdirectory' => '<a path on a very fast and larger storage medium, such as an SSD>',
then uploads are only restricted by the size of the tempdirectory location.

Also, it is not made clear that the upload mechanisms are different when dragging and dropping a file within the browser while logged into Nextcloud compared to using a shared link to upload files. Even before I had this trouble, I could upload 60 GB files with no issue: the /tmp filesystem is not used and therefore imposed no restriction. This method of upload chunks the file and stores the chunks.

The shared-link method does not chunk the file at all. It literally streams the file to the Nextcloud temporary storage area, which is by default /tmp. When that runs out of space, the operation crashes.

This manifests as the 504 Gateway error, which is not very helpful.

If developers are looking at this thread, I would suggest that the error feedback to the user be given further thought. Also, when the upload fails or the client closes the connection before the upload is complete, partial files are left on the server and are not deleted, as far as I can tell. I had a lot of them to clean up after looking into this problem.

Thanks a lot @berniel99!
Do you have any idea what the behaviour of the sync clients (Android and Windows) is?
In my case, syncing of files > 150 MB fails, even though /tmp should be much bigger (I have 4 GB of RAM plus a swap partition on an SSD).

And where do you specify the tempdirectory path?

In a default Linux installation, /tmp is a RAM disk. If you run df -h you will see it listed as a tmpfs mount point (i.e. it is not a physical disk). Unless you configure it with a different size, it will be 50% of your physical RAM; in your case, 2 GB. The swap partition is never used by applications for storage.

If you are having issues with files > 150 MB then there is another issue going on, and without some diagnosis on your system it is hard to pinpoint what is failing.

There is a configuration file which NextCloud uses to specify all sorts of special behaviour.

It is located at config/config.php inside your Nextcloud installation directory.

In that file, you just need a line similar to mine (obviously, pick your own location for the temp data)

'tempdirectory' => '/home/nextcloudtemp',

In my case that directory is on an SSD drive. Just make sure that it has read/write permissions for the webserver.
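
If the directory does not exist yet, something like this should set it up (www-data is the web server user on Debian/Ubuntu; on other systems it may be www, apache or nginx):

# Create the temp directory and hand it to the web server user
sudo mkdir -p /home/nextcloudtemp
sudo chown www-data:www-data /home/nextcloudtemp
sudo chmod 750 /home/nextcloudtemp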

Thanks for the explanation!

In my setup this is actually the case already, pointing to /mnt/datadir/tmp (which is an HDD with only the data dir on it). Owner is correct, too.
Not sure where to start debugging this issue, especially since I get a bunch of other errors in the logs now (permission problems even though everything belongs to www-data as before). I’ll get back to this once the rest is working properly again.

If Nextcloud is behind a reverse proxy, it might be necessary to set proxy_read_timeout 600; in the nginx config on that proxy as well.
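
Something along these lines in the proxy’s vhost, for example (server name and backend address are placeholders):

server {
    listen 443 ssl;
    server_name cloud.example.com;
    # ... ssl_certificate and other proxy settings ...

    location / {
        proxy_pass http://192.0.2.10:80;  # placeholder: the Nextcloud backend
        proxy_read_timeout 600;
    }
}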

Am I understanding you correctly that there is a file-size limit of half the RAM, and that if I have 2 GB of RAM and 1 TB of external storage, a user will never be able to sync a large DVD image via the instance?

If you leave the temporary directory at the default, then I suspect the answer is yes. If that is not what you want, make sure you set the 'tempdirectory' variable to somewhere else. This has worked for me, as I have regular uploads and downloads from my clients for files in excess of 40 GB.


Hello,

I’m having the same problem, but I’m using the latest official Docker image on a Raspberry Pi 4 (8 GB) along with jc21/nginx-proxy-manager.
Would anyone be able to help me modify the correct PHP files, etc…? As everything is in a Docker container, I’m unsure of their location.

Thanks,
Seb

Here too, same problem.

In my case the file should be saved to external storage using SFTP.