Nextcloud web interface unusable after changing internet gateway

Short description: The front-end website for Nextcloud is extremely slow after changing the internet gateway on the network, taking over a minute to load any new page.

Nextcloud version (eg, 20.0.5): 19.x, 20.x, 21.0.9, and currently 22.2.10. These may be older versions, but I’ve done four updates so far and nothing has changed, so I’m confident the version isn’t the issue (among other reasons; see below)
Operating system and version (eg, Ubuntu 20.04): Ubuntu 20.04
Apache or nginx version (eg, Apache 2.4.25): Apache 2.4.41
PHP version (eg, 7.4): 7.4.3

The issue you are facing:
After changing the internet gateway from a Comcast Business Internet gateway to a Verizon Cellular Business gateway, the Nextcloud website is extremely slow when accessed via the domain name, to the point of being unusable. When accessed by its direct internal IP from the same network, it behaves normally and loads almost instantaneously. Before this change to the network, everything worked perfectly.

Several Nextcloud updates have been applied with no change to this behavior. When the gateway was switched, I set the DMZ host on the new gateway to the router (the setup was internet in → gateway → router → server) so the router could keep handling incoming requests. That worked normally for two days; today, however, it stopped working without anyone touching it, so I instead plugged the server directly into the gateway and copied the port forwarding settings from the router to the gateway. The desktop and mobile clients seem unaffected, as do the other non-Nextcloud pages on my web server; only the Nextcloud site has this issue.
(Edit: the desktop client app also keeps losing its connection on and off. No other service the system runs has these issues.)

Is this the first time you’ve seen this error? (Y/N): Yes

Steps to replicate it:

  1. Go to the Nextcloud instance’s domain name in a browser

The output of your Nextcloud log in Admin > Logging:
Anything potentially useful is buried under a literally endless flood of messages (several per second) like:

Debug - files_sha... /appinfo/.app.php is deprecated, use \OCP\AppFramework\Bootstrap\IBootstrap on the application class instead.

This changed to several other messages after I disabled some apps I don’t use; see the nextcloud.log linked below.

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):

<?php
$CONFIG = array (
  'instanceid' => 'lol',
  'overwrite.cli.url' => 'https://my.domain/nextcloud',
  'htaccess.RewriteBase' => '/nextcloud',
  'passwordsalt' => 'lol',
  'secret' => 'lol',
  'overwriteprotocol' => 'https',
  'trusted_domains' =>
  array (
    0 => '192.168.0.152',
    1 => 'my.domain',
  ),
  'datadirectory' => '/mnt/nextcloud-hdd/data',
  'dbtype' => 'mysql',
  'version' => '22.2.10.2',
  'dbname' => 'NextcloudDB',
  'dbhost' => 'localhost',
  'dbport' => '',
  'dbtableprefix' => 'oc_',
  'mysql.utf8mb4' => true,
  'dbuser' => 'oc_lol1',
  'dbpassword' => 'lol',
  'installed' => true,
  // mail-related entries removed...,
  'maintenance' => false,
  'updater.secret' => 'lol',
  'theme' => '',
  'loglevel' => 0,
);

The output of your Apache/nginx/system log in /var/log/____:
Not many relevant messages in error.log besides these:

[Fri Jun 23 10:58:37.620441 2023] [autoindex:error] [pid 7064] [client 192.168.0.1:49622] AH01276: Cannot serve directory /var/www/nextcloud: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by Options directive
[Fri Jun 23 11:07:45.026648 2023] [autoindex:error] [pid 2028] [client 174.203.134.159:7266] AH01276: Cannot serve directory /var/www/nextcloud: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by Options directive
[Fri Jun 23 11:55:41.941601 2023] [autoindex:error] [pid 7132] [client 127.0.0.1:60282] AH01276: Cannot serve directory /var/www/nextcloud: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by Options directive
[Fri Jun 23 12:05:53.971810 2023] [autoindex:error] [pid 2028] [client 192.168.0.1:60946] AH01276: Cannot serve directory /var/www/nextcloud: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by Options directive
[Fri Jun 23 12:06:02.984619 2023] [php7:error] [pid 9596] [client 192.168.0.1:60968] script '/var/www/nextcloud/apps/files/remote.php' not found or unable to stat
[Fri Jun 23 12:06:26.424744 2023] [php7:error] [pid 1446] [client 192.168.0.1:60984] script '/var/www/nextcloud/apps/files/remote.php' not found or unable to stat
[Fri Jun 23 12:08:28.336643 2023] [php7:error] [pid 2028] [client 192.168.0.1:61027] script '/var/www/nextcloud/apps/files/remote.php' not found or unable to stat
...
[Fri Jun 23 21:53:28.700710 2023] [mpm_prefork:notice] [pid 1419] AH00169: caught SIGTERM, shutting down
[Fri Jun 23 21:53:28.881551 2023] [mpm_prefork:notice] [pid 58381] AH00163: Apache/2.4.41 (Ubuntu) OpenSSL/1.1.1f configured -- resuming normal operations
[Fri Jun 23 21:53:28.881616 2023] [core:notice] [pid 58381] AH00094: Command line: '/usr/sbin/apache2'
[Fri Jun 23 21:54:42.298121 2023] [autoindex:error] [pid 58777] [client 98.219.180.50:56163] AH01276: Cannot serve directory /var/www/nextcloud: No matching DirectoryIndex (none) found, and server-generated directory index forbidden by Options directive

Output errors in nextcloud.log in /var/www/ or as admin user in top right menu, filtering for errors. Use a pastebin service if necessary.
nextcloud.log:

https://drive.google.com/file/d/1OTuY-cJFv0Km3OF853MDyQtqiIQmgWkn/view?usp=sharing

First things first, let’s change your loglevel to something more sane, like 2. Having it at 0 will hurt performance and also makes troubleshooting harder because of all the noise. If we need it at DEBUG level (i.e. 0) for a specific purpose, we can always turn it back on.
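If you want to see what a level 2 log would have kept without waiting for new entries, a small script can filter the existing log. This is only a sketch: it assumes the log is in Nextcloud’s one-JSON-object-per-line format with a numeric `level` field (as in the entries quoted in this thread), and `filter_entries` is a made-up helper name.

```python
import json
import sys

# Nextcloud's numeric log levels, as used by 'loglevel' in config.php
LEVEL_NAMES = {0: "DEBUG", 1: "INFO", 2: "WARN", 3: "ERROR", 4: "FATAL"}


def filter_entries(lines, min_level=2):
    """Yield (level name, app, message) for entries at or above min_level."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue
        level = entry.get("level", 0)
        if level >= min_level:
            yield LEVEL_NAMES.get(level, str(level)), entry.get("app", ""), entry.get("message", "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 filter_log.py /path/to/nextcloud.log
    with open(sys.argv[1], encoding="utf-8") as log:
        for name, app, message in filter_entries(log):
            print(f"[{name}] {app}: {message}")
```

Unless `logfile` is set in config.php, the log should be in the data directory, so with the config above that would be `/mnt/nextcloud-hdd/data/nextcloud.log`.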

It sounds to me like previously either:

  1. Your previous provider’s “gateway” or your existing router (same router, correct?) was doing some hairpin routing (NAT loopback) for the external IP address associated with my.domain.
  2. You had a static DNS entry for my.domain on your caching DNS resolver that pointed at the internal IP address. This effectively overrode the IP address that internal clients used to access NC (with some caveats around local device caching). If this was overlooked during the change, it may have kept working for a bit due to local network and device DNS caching… then it would have slowly dropped off as caches expired (and clients picked up the external IP address instead).

Or, possibly, both.

From the sounds of some of your symptoms, I’d lean a bit towards #2 being what you primarily relied on previously (though hairpin routing may have been in place as a fallback and, from the sounds of it, is in place right now, albeit sub-optimally and now as your primary means of connectivity).

Do you happen to recall whether, prior to your “gateway” change, pinging my.domain from inside your network returned the public or the private IP address?
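To check this going forward from a machine inside the network, you can resolve the name and see whether the answer falls in a private range. A minimal sketch, assuming you pass your real hostname in place of `my.domain`; `classify_address` and `resolve_and_classify` are hypothetical helper names. It uses the machine’s normal resolver, so a stale local DNS cache can skew the answer, same caveat as with ping.

```python
import ipaddress
import socket
import sys


def classify_address(addr):
    """Return 'loopback', 'private' or 'public' for an IPv4/IPv6 address string."""
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private"
    return "public"


def resolve_and_classify(hostname):
    """Resolve hostname with the system resolver and classify each distinct address."""
    results = {}
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(hostname, None):
        addr = sockaddr[0].split("%")[0]  # drop any IPv6 scope suffix like %eth0
        results[addr] = classify_address(addr)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_dns.py my.domain
    for addr, kind in resolve_and_classify(sys.argv[1]).items():
        print(f"{addr}: {kind}")
```

If this prints `private` for my.domain from inside the network, something local (a static DNS entry) is overriding the public record; `public` means internal clients depend on hairpin routing.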

A lot of times in a home or small business setting the router provides a local DNS caching service (which usually includes simple static DNS mapping), but it could have been on your old “gateway” device too. Do you happen to know if you’re running any DNS services on either device?

Just to confirm a few other things… (mostly to make sure I’m understanding your topology accurately)…

Is HTTPS in play under both access scenarios:

https://192.168.0.152/
https://my.domain/
?

And my.domain presumably maps to an external (public) IP address, correct? With this change in your Internet connectivity I presume you also had to deal with modifying the DNS for my.domain, correct?

On consumer and small business routers/gateways hairpin routing is often unreliable (or, at best, finicky to get working reliably). Frankly I wouldn’t rely on hairpin routing (other than as a fallback). I’d focus on working out the DNS side of things.

You may also want to check out https://speed.cloudflare.com/ since you’ve got a new Internet connection. It gives a bit more data about realistic performance under a few different scenarios. It could also be that Nextcloud’s web interface is bringing out some issues in your new connectivity (it has a pretty hefty payload at the beginning).

Can you give an example of the other services on the same site that seem unaffected so far? I presume you’re only referring to ones that have external facing access, correct?


I really appreciate the thorough response!

I did manage to stumble upon a solution I wasn’t expecting…

I changed the loglevel to 2 as suggested, and now the log reads more like this:

{"reqId":"d2tJMTwrOu0ygOI1u4UU","level":3,"time":"2023-06-25T01:39:34+00:00","remoteAddr":"192.168.0.152","user":"jboby93","app":"no app in context","method":"POST","url":"/nextcloud/apps/text/session/sync","message":"Could not boot richdocuments: Could not resolve OCA\\Federation\\TrustedServers! Class OCA\\Federation\\TrustedServers does not exist","userAgent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0","version":"22.2.10.2","exception":{"Exception":"OCP\\AppFramework\\QueryException","Message":"Could not resolve OCA\\Federation\\TrustedServers! Class OCA\\Federation\\TrustedServers does not exist","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","line":131,"function":"resolve","class":"OC\\AppFramework\\Utility\\SimpleContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/ServerContainer.php","line":161,"function":"query","class":"OC\\AppFramework\\Utility\\SimpleContainer","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/DependencyInjection/DIContainer.php","line":435,"function":"query","class":"OC\\ServerContainer","type":"->"},{"file":"/var/www/nextcloud/apps/richdocuments/lib/AppInfo/Application.php","line":228,"function":"query","class":"OC\\AppFramework\\DependencyInjection\\DIContainer","type":"->"},{"file":"/var/www/nextcloud/apps/richdocuments/lib/AppInfo/Application.php","line":156,"function":"updateCSP","class":"OCA\\Richdocuments\\AppInfo\\Application","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Bootstrap/FunctionInjector.php","line":67,"function":"OCA\\Richdocuments\\AppInfo\\{closure}","class":"OCA\\Richdocuments\\AppInfo\\Application","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Bootstrap/BootContext.php","line":51,"function":"injectFn","class":"OC\\AppFramework\\Bootstrap\\FunctionInjector","type":"->"},{"file":"/var/www/nextcloud/apps/richdocuments/lib/AppInfo/Application.php","line":158,"function":"injectFn","class":"OC\\AppFramework\\Bootstrap\\BootContext","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Bootstrap/Coordinator.php","line":178,"function":"boot","class":"OCA\\Richdocuments\\AppInfo\\Application","type":"->"},{"file":"/var/www/nextcloud/lib/private/legacy/OC_App.php","line":205,"function":"bootApp","class":"OC\\AppFramework\\Bootstrap\\Coordinator","type":"->"},{"file":"/var/www/nextcloud/lib/private/legacy/OC_App.php","line":139,"function":"loadApp","class":"OC_App","type":"::"},{"file":"/var/www/nextcloud/lib/base.php","line":988,"function":"loadApps","class":"OC_App","type":"::"},{"file":"/var/www/nextcloud/index.php","line":36,"function":"handleRequest","class":"OC","type":"::"}],"File":"/var/www/nextcloud/lib/private/AppFramework/Utility/SimpleContainer.php","Line":120,"CustomMessage":"Could not boot richdocuments: Could not resolve OCA\\Federation\\TrustedServers! Class OCA\\Federation\\TrustedServers does not exist"}}

and many, many more repeats of this and similar errors relating to richdocuments and Federation. I had disabled Federation as a troubleshooting step, and I guess that broke richdocuments; disabling richdocuments and richdocumentscode seems to have restored normal speed and stability for Nextcloud, both internally and externally, but I’m going to give it some time to make sure all stays well. So far it has, with nothing flooding the logs anymore. Absolutely zero clue why a modem change would cause this sort of issue to surface in the first place, but whatever changed was apparently enough to make the server DDoS itself with errors lol
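For anyone who lands here with a similar flood: counting repeated messages is a quick way to spot the culprit. A rough sketch, again assuming one JSON object per line with `app` and `message` fields like the entry above; `top_messages` is a hypothetical helper, not part of Nextcloud.

```python
import json
import sys
from collections import Counter


def top_messages(lines, n=5):
    """Return the n most frequent (app, message) pairs in a Nextcloud JSON log."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore blank, wrapped or truncated lines
        if not isinstance(entry, dict):
            continue
        counts[(entry.get("app", ""), entry.get("message", ""))] += 1
    return counts.most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 top_messages.py /path/to/nextcloud.log
    with open(sys.argv[1], encoding="utf-8") as log:
        for (app, message), count in top_messages(log):
            print(f"{count:>8}  {app}: {message}")
```

In this thread the top entry would presumably have been the "Could not boot richdocuments" error, which points straight at the app to disable.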

To answer the rest of your questions, however…

As far as I know, neither the old nor the new gateway (I’ll call it the modem from here on, to clear that up) is running any sort of DNS service or caching. The server that runs Nextcloud uses No-IP’s dynamic DNS updater to keep my domain name pointed at the correct public IP. I force-restarted the DDNS service on the server several times to let it resync its settings while I was troubleshooting.

HTTPS is indeed being used for both Nextcloud access scenarios (public domain and private IP). The SSL certificate is through Let’s Encrypt and installed with their CertBot tool. The same SSL certificate is being used for other pages on my web server as well, and there is one active Apache2 site configuration that includes the Alias/Directory directives for /nextcloud.

I do not recall if pinging the domain name from inside the network returned the public IP or the private IP. What I do know is that I was always able, while inside the network, to access the server via its domain name (web server, Nextcloud, SSH, etc).

To clear up the network topology,

The old setup was: internet comes into the Comcast modem, which feeds internet into a Netgear router, which had the port forwarding settings and was the access point for connecting to the network. The server was connected via ethernet to the router and had a static internal IP.

When the Comcast modem was changed out for the Verizon one and I noticed Nextcloud had disconnected on my home computer, I remoted into the server to attempt to fix it. The server is running at my work office, so no change was made to the way everything was physically plugged in besides switching out the modem. I turned on DMZ on the modem, and set the DMZ host to the router, so that it could continue to handle the port forwarding as needed. This worked for a few days, and then inexplicably stopped; instead, my domain name ended up pointing to the Verizon modem’s admin login page… not good.

When I was next in the office, the only way I was able to get everything (minus Nextcloud) to a working state was to bypass the router and plug the server directly into the Verizon modem. DMZ was disabled, and the port forwarding settings were re-entered into the modem. This is the current state of the network.

As for other services running on the server that were unaffected, I am running: a Plex server for media streaming, an SSH server for remote access, a few personal website pages (though nothing as complex as Nextcloud, mostly simple static pages), and on occasion I use it to run Minecraft servers.

As mentioned closer to the beginning of this, I was eventually able to restore normal access speed and stability by disabling the richdocuments and richdocumentscode apps, which had been spewing relentless errors about Federation not being present (because I disabled it while troubleshooting). After about a day, all still seems fine. Still zero idea why something like this would surface from a modem replacement :thinking: