Unable to log in (504), but instance works well

Nextcloud version (eg, 20.0.5): 22.2.5.1
Operating system and version (eg, Ubuntu 20.04): debian 10.12
Apache or nginx version (eg, Apache 2.4.25): nginx 1.14.2
PHP version (eg, 7.4): PHP 7.3.31-1~deb10u1

The issue you are facing:
When I try to log in to my instance (from my computer), I get a `504 Gateway Time-out` from nginx.
The very strange thing is that my calendar, albums and shared links still work and are perfectly accessible.
It seems that the 504 only occurs on the /login/ page.
Accessing my instance via the android application also works as normal.

Is this the first time you’ve seen this error? (Y/N): Y

Steps to replicate it:

  1. Go to the public URL (leads to the /login page) in a desktop browser (several tested, mainly Firefox)
  2. Enter credentials
  3. Get the 504 error

The output of your Nextcloud log in Admin > Logging:

not accessible

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):

<?php
$CONFIG = array (
  'instanceid' => 'xxx',
  'passwordsalt' => 'xxx',
  'secret' => 'xxx',
  'trusted_domains' =>  
  array (
    0 => 'xxx',
  ),  
  'datadirectory' => '/var/www/cloud/data',
  'dbtype' => 'mysql',
  'version' => '22.2.5.1',
  'overwrite.cli.url' => 'https://xxx',
  'dbname' => 'xxx',
  'dbhost' => 'localhost',
  'dbport' => '',
  'dbtableprefix' => 'oc_',
  'mysql.utf8mb4' => true,
  'dbuser' => 'nextcloud',
  'dbpassword' => 'xxx',
  'installed' => true,
  'maintenance' => false,
  'mail_from_address' => 'cloud',
  'mail_smtpmode' => 'smtp',
  'mail_smtpsecure' => 'tls',
  'mail_sendmailmode' => 'smtp',
  'mail_domain' => 'xxx',
  'mail_smtpauthtype' => 'PLAIN',
  'mail_smtphost' => 'xxx',
  'mail_smtpport' => '587',
  'mail_smtpauth' => 1,
  'mail_smtpname' => 'xxx',
  'mail_smtppassword' => 'xxx',
  'updater.secret' => 'xxx',
  'theme' => '',
  'loglevel' => 2,
  'has_rebuilt_cache' => true,
);

The output of the nginx error log in `/var/log/nginx/error.log`:

2022/03/31 00:00:27 [error] 12799#12799: *1200077 upstream timed out (110: Connection timed out) while reading response header from upstream, client: x.x.x.x, server: xxx    , request: "POST /login HTTP/2.0", upstream: "fastcgi://unix:/var/run/php/php7.3-fpm.sock", host: "xxx"

Partial copy of the site-enabled config in nginx (if needed, I can provide full copy):

        location ~ \.php(?:$|/) {
                fastcgi_split_path_info ^(.+?\.php)(/.*)$;
                set $path_info $fastcgi_path_info;

                try_files $fastcgi_script_name =404;

                include fastcgi_params;
                fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
                fastcgi_param PATH_INFO $path_info;
                fastcgi_param HTTPS on; 

                fastcgi_param modHeadersAvailable true;         # Avoid sending the security headers twice
                fastcgi_param front_controller_active true;     # Enable pretty urls

                fastcgi_intercept_errors on; 
                fastcgi_request_buffering off;

                fastcgi_pass unix:/var/run/php/php7.3-fpm.sock;
                fastcgi_param proxy_read_timeout 3600; # Trying to solve 504 issue
                fastcgi_param proxy_send_timeout 3600; # Trying to solve 504 issue
        }

You get a timeout on the php-fpm socket. Do you get this just by opening the page or after entering your credentials?

You already increased the timeouts to 1h; does it still time out before that? Perhaps something is killing the PHP process (due to resource limits), or it runs into a conflict (in the best case, that should end up in some log file of php-fpm or Nextcloud). Or perhaps you don’t have enough PHP processes available, and the current ones are busy handling your calendar and other requests?

Thanks for the answer.

I’m getting the timeout way before 1h, after I enter my credentials and click “Login”.

It’s a two-user instance that is not heavily used at all.
Is there a way to increase the number of PHP processes, just to be sure?

The general load average on this (dedicated) server is very low.

If you run low on these resources, you should be able to see it in the logs. The number of processes is set in your php-fpm configuration.

There is nothing in the Nextcloud logs either?

If you mean /data/nextcloud.log, there is only this at DEBUG level:

{"reqId":"xxx","level":0,"time":"2022-03-31T21:44:58+00:00","remoteAddr":"x.x.x.x","user":"--","app":"carnet","method":"POST",
"url":"/login","message":"/appinfo/app.php is deprecated, use \\OCP\\AppFramework\\Bootstrap\\IBootstrap on the application class instead.",
"userAgent":"xxx","version":"22.2.5.1"}
{"reqId":"xxx","level":0,"time":"2022-03-31T21:44:58+00:00","remoteAddr":"x.x.x.x","user":"--","app":"files_sharing","method":"POST",
"url":"/login","message":"/appinfo/app.php is deprecated, use \\OCP\\AppFramework\\Bootstrap\\IBootstrap on the application class instead.",
"userAgent":"xxx","version":"22.2.5.1"}
{"reqId":"xxx","level":0,"time":"2022-03-31T21:45:01+00:00","remoteAddr":"x.x.x.x","user":"--","app":"carnet","method":"GET",
"url":"/apps/photos/service-worker.js","message":"/appinfo/app.php is deprecated, use \\OCP\\AppFramework\\Bootstrap\\IBootstrap on the application class instead.",
"userAgent":"xxx","version":"22.2.5.1"}
{"reqId":"xxx","level":0,"time":"2022-03-31T21:45:01+00:00","remoteAddr":"x.x.x.x","user":"--","app":"files_sharing","method":"GET",
"url":"/apps/photos/service-worker.js","message":"/appinfo/app.php is deprecated, use \\OCP\\AppFramework\\Bootstrap\\IBootstrap on the application class instead.",
"userAgent":"xxx","version":"22.2.5.1"}
{"reqId":"xxx","level":0,"time":"2022-03-31T21:45:01+00:00","remoteAddr":"x.x.x.x","user":"--","app":"no app in context","method":"GET",
"url":"/apps/photos/service-worker.js","message":"Current user is not logged in",
"userAgent":"xxx","version":"22.2.5.1","exception":{"Exception":"OC\\AppFramework\\Middleware\\Security\\Exceptions\\NotLoggedInException","Message":"Current user is not logged in","Code":401,"Trace":[{"file":"/var/www/cloud/lib/private/AppFramework/Middleware/MiddlewareDispatcher.php","line":97,"function":"beforeController","class":"OC\\AppFramework\\Middleware\\Security\\SecurityMiddleware","type":"->","args":[{"__class__":"OCA\\Photos\\Controller\\ApiController"},"serviceWorker"]},{"file":"/var/www/cloud/lib/private/AppFramework/Http/Dispatcher.php","line":118,"function":"beforeController","class":"OC\\AppFramework\\Middleware\\MiddlewareDispatcher","type":"->","args":[{"__class__":"OCA\\Photos\\Controller\\ApiController"},"serviceWorker"]},{"file":"/var/www/cloud/lib/private/AppFramework/App.php","line":156,"function":"dispatch","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\Photos\\Controller\\ApiController"},"serviceWorker"]},{"file":"/var/www/cloud/lib/private/Route/Router.php","line":302,"function":"main","class":"OC\\AppFramework\\App","type":"::","args":["OCA\\Photos\\Controller\\ApiController","serviceWorker",{"__class__":"OC\\AppFramework\\DependencyInjection\\DIContainer"},{"_route":"photos.api.serviceWorker"}]},{"file":"/var/www/cloud/lib/base.php","line":1006,"function":"match","class":"OC\\Route\\Router","type":"->","args":["/apps/photos/service-worker.js"]},{"file":"/var/www/cloud/index.php","line":36,"function":"handleRequest","class":"OC","type":"::","args":[]}],"File":"/var/www/cloud/lib/private/AppFramework/Middleware/Security/SecurityMiddleware.php","Line":141,"CustomMessage":"Current user is not logged in"}}

You can also log php-fpm errors:

It’s enough to log the errors; don’t display errors permanently, since that could give hints to possible attackers.

Nothing really interesting here.
I had some:

WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

But not at the time of the timeout. I tried setting pm.max_children to 15, but the behaviour is the same.

What do you mean by that, btw? Should I edit/remove the logs from this post?

Following up on my issue, I’ve updated to Debian 11.3 (Bullseye stable) and PHP 7.4.
I’m still getting the same error, and even with the php-fpm log level set to debug, I have nothing other than the previous error (in the nginx error log file):
2022/04/04 22:58:18 [error] 23686#23686: *75009 upstream timed out (110: Connection timed out) while reading response header from upstream, client: x.x.x.x, server: xxx.xxxx.xxx, request: "POST /login HTTP/2.0", upstream: "fastcgi://unix:/var/run/php/php7.4-fpm.sock", host: "xxx.xxxx.xxx"

I’ve counted roughly 50 seconds between clicking the “Login” button and the timeout.
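
A delay of roughly 50 seconds is close to nginx's default `fastcgi_read_timeout` of 60 seconds, which suggests the two "Trying to solve 504 issue" lines in the posted location block never changed any timeout. `fastcgi_param` only passes a named parameter to PHP, and the `proxy_*_timeout` directives belong to `proxy_pass`, not `fastcgi_pass`. A minimal sketch of what those two lines were probably meant to be, assuming the php7.4 socket from the latest error log and reusing the 3600 value from the thread (not a recommendation):

```nginx
location ~ \.php(?:$|/) {
    # ... existing fastcgi_split_path_info, try_files, include and
    #     fastcgi_param lines stay unchanged ...

    fastcgi_pass unix:/var/run/php/php7.4-fpm.sock;

    # FastCGI timeouts are set with their own directives (default: 60s each).
    # 3600 is simply the value used earlier in this thread.
    fastcgi_read_timeout 3600;
    fastcgi_send_timeout 3600;
}
```

Raising the timeout would only remove the 504 page. It would not explain why `POST /login` takes that long, so the php-fpm and Nextcloud logs are still worth checking once the request is allowed to finish.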