Photo viewing cripples server

@spyro sure thing :slight_smile: maybe not exactly a replacement per se, rather a kind-of-replacement-thing, but I’m good :slight_smile:

Can you share what it is?

Oh, for anyone else scrolling by: there hasn’t been any solution for a week now.

Here is what I’ve found in the meantime:

  1. NC is not capable of running adequately on anything below about an 8-core, 4GHz PC with 16+GiB of RAM.

I’m not kidding. I’ve got an ARM and an x86-64 box with 4 and 8GiB of RAM respectively, and 6 and 4 cores respectively, all around 2GHz. NC cripples them. Completely. Utterly.

  2. NC core infrastructure is woefully bad. Core functionality such as federation and image preview generation may work in very select use-cases, but utterly DOES NOT in mine - the default settings for preview generation will cause both my servers to DoS themselves if you put 20k photos on them and then scroll through the photos app - guaranteed DoS, every time, since the photos app does not rate-limit requests for previews.

  3. If you’re not an enterprise customer, no-one cares. If your problem doesn’t have a 10-second fix, it’ll be ignored.

Here’s what I’ve found to help:

  1. disable preview generation (use the preview-generator app to replace this functionality; a quick occ sketch for it follows the notes below):
    'enable_previews' => false,

  2. configure preview-generator to generate sane preview sizes:
    'preview_max_x' => 256,
    'preview_max_y' => 256,

  3. I don’t know if these help or not, as I don’t know whether they apply to the built-in preview generation or the preview-generator app, and frankly, I can’t be bothered to find out. The whole experience is too disheartening.

Can’t hurt though:

'preview_concurrency_all' => 4,
'preview_concurrency_new' => 2,

NOTE: limiting the preview_max_* values will result in the files app showing blurry images unless you set enable_previews to false.
ANNOYING NOTE: setting enable_previews to false does NOT make the memories app use full-size images for display :frowning:
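For reference, the preview-generator app from item 1 is driven via occ. A rough sketch, assuming occ lives in /var/www/nextcloud and runs as www-data (adjust the path and user to your install):

    # one-off pass to build previews for all existing files
    sudo -u www-data php /var/www/nextcloud/occ preview:generate-all
    # recurring pass for new/changed files, e.g. every 10 minutes from the www-data crontab:
    # */10 * * * * php /var/www/nextcloud/occ preview:pre-generate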

Hope this helps.

Rinkana: It’s not disk speed or access time - iostat shows the disk is practically idle - even on the ARM box, which has the slowest disk, “only” benching about 200MiB/s (i.e. pretty much maxing out its Gen2 PCIe x2 SATA card, which is handling ~600MiB/s to the 3 disks).

Likewise, it’s not the network - there’s sod-all else on my LAN, and I know it’ll easily hit 300Mbit/s even from my craptop.

There’s nothing in the logs at all to indicate why it’s so pathetically slow.

I’ve disabled previews to see what happens, and you can actually watch the JPEGs peel down / across the screen, 1990 Netscape Navigator style, with a 3.1MiB JPEG taking about 8-12 seconds to get downloaded and rendered, which is just utterly tragic.

Please post:

  • occ config:list system (or equivalent; a sketch of running both occ commands follows this list)
  • occ app:list
  • Your actual Nginx config
  • Your actual FPM config
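If it helps, the two occ commands are typically run as the web-server user from the Nextcloud root - a sketch assuming a stock install at /var/www/nextcloud with www-data:

    sudo -u www-data php /var/www/nextcloud/occ config:list system
    sudo -u www-data php /var/www/nextcloud/occ app:list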

For what it’s worth, my daily driver that runs my production Nextcloud instance as well as numerous test ones is this:

Intel(R) Core(TM) i7-8700T CPU @ 2.40GHz

It’s a modest older thin client with 6 cores.

I currently have 18 (!) independent Nextcloud Server deployments (Docker-based) running on this device simultaneously. Most are Apache/mod_php/MariaDB, and a handful are Nginx/FPM/PostgreSQL.

Memory usage, excluding OS buff/cache, is <4 GB total. But of course that can vary depending on what I’m doing / what others are doing at the time, so take it with a grain of salt. I’m not mentioning this as a “You must be doing something wrong!” statement, but merely to provide some context of where I’m coming from.

I also have a Raspberry Pi 4 (8 GB) where I hosted my main Nextcloud instance previously. I did extensive testing there - though it was a while ago now - and the main file-transfer problem on a Pi involving Nextcloud is the lack of AES-NI (which destroys HTTPS performance, but isn’t really the fault of Nextcloud).

A lack of AES-NI shouldn’t be applicable here… since it looks like your CPU supports AES-NI acceleration (though it needs to be enabled in the BIOS - is aes listed under /proc/cpuinfo, and/or is an “AES” message mentioned by the kernel at start-up?).
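A quick way to check both (plain Linux commands, nothing Nextcloud-specific):

    grep -o -m1 '\baes\b' /proc/cpuinfo   # prints "aes" once if the CPU flag is exposed
    sudo dmesg | grep -i aes              # any AES-NI mention from the kernel at boot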

iperf3 is reporting 280Mbit/s over my WiFi to the server, but NC manages an absolute max of ~1.2MB/s.

So if the upload is slow for some other reason, WHY? It’s got gigabit ethernet, and it works FAST for wget / iperf, so it’s not the network. DNS is correct. Certs are correct.

Good question. See above. Let’s see your configs so we know what we’re working with.

My upload speeds (testing against v28.0.2) from a workstation at the far end of my house over so-so wifi are anywhere from 74 Mbit/s to 273 Mbit/s (testing with ~512MiB file sizes).

If you’re only getting 1 MB/s (aka: 8 Mbit/s), let’s focus on that matter first since it’s (a) clearly a problem that doesn’t match up with what one expects; (b) could be causing other problems.

Has the whole world forgotten how to write anything that remotely resembles high-performance software?

Or is it just the usual case of the documentation being too sh*t for anyone but the developers to actually know how to set it up?

Maybe it’s a documentation issue. Maybe something is actually wrong with the code. Maybe it’s something in your environment. Maybe it’s a mixture. Hard to say. We’d need to see your actual configs and figure out what got missed.

I’d also be curious what the following look like during a file transfer:

  • browser console logs (Network tab)
  • your FPM logs (a pool-config sketch for useful FPM logging follows this list)
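If FPM logging isn’t set up yet, the slow-request log is the most useful piece. A sketch for a Debian-style pool file (the path is an assumption - e.g. /etc/php/8.2/fpm/pool.d/www.conf; adjust per distro):

    ; dump a PHP backtrace for any request taking longer than 5s
    slowlog = /var/log/php-fpm/www-slow.log
    request_slowlog_timeout = 5s

Reload FPM afterwards and retry a transfer; anything stalling inside PHP should show up there.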

Ok, so your so-called “modest thin client” is over 6x faster than my Atom C2550.

I still say this /DOES NOT MATTER/ since scp is managing 250Mbit/s (right now, I checked).

occ config:list system results in a (badly) redacted copy of my config.php.

I say badly because bad is worse than unredacted - it looks redacted, but it missed some important values, so I won’t be posting THAT…

The maybe relevant lines from this are:

    "enable_previews": false,
    "preview_concurrency_all": 4,
    "preview_concurrency_new": 2,
    "preview_max_x": 256,
    "preview_max_y": 256,
    "preview_ffmpeg_path": "\/usr\/bin\/ffmpeg",
    "enabledPreviewProviders": [
        "OC\\Preview\\BMP",
        "OC\\Preview\\GIF",
        "OC\\Preview\\JPEG",
        "OC\\Preview\\Krita",
        "OC\\Preview\\MarkDown",
        "OC\\Preview\\MP3",
        "OC\\Preview\\OpenDocument",
        "OC\\Preview\\PNG",
        "OC\\Preview\\TXT",
        "OC\\Preview\\XBitmap",
        "OC\\Preview\\Movie",
        "OC\\Preview\\PDF"
    ]

occ app:list

I mean, you could have just asked if I had installed any apps… and I don’t feel like sharing that list either - just accept that the only apps I have installed since the NC28 install have been memories, preview-generator, and google-integration. They will all be the current versions, and I’ve already given all this info.

nginx config (for the site) is completely stock as per the setup guide.

Dunno what you want for fpm config, so you’ll have to clarify.

Your 18 Nextcloud deployments clearly have either a ton of RAM each, or non-default configurations, or very, very few files on them, because even with a few thousand photos, the default “completely unlimited” photo preview generation will OOM the 8GiB machine, let alone the 4GiB one, if you scroll about in the photo browser. Every time.

Sat idle, the 8GiB machine has >7GiB free/buff/cache and the 4GiB one has 3.5GiB.

aes is available in /proc/cpuinfo on both machines.

You’ll need to be clearer about what you want from the browser console, and I’ve never even seen a log from fpm, so I can’t give you that.

I’ve completely removed the WiFi from the equation and moved my laptop to the wired LAN. scp is (as above) managing ~250Mbit/s, whilst NC28 is managing about 1.07MiB/s according to bmon on both my laptop and the NC server (not simultaneously, obviously).

Hm. Well, I don’t know if I reached the required level of frustration or not, but it wasn’t working, so I took it upstairs to try changing which port it was plugged into, and it behaved bizarrely. Now I can’t make it misbehave, and it’s getting ~6MiB/s on the sh*t wifi, ~12.5MiB/s on the slightly better laptop, and ~25MiB/s on the wired.

All that changed is moving the laptop around (no config changes), so GOK what it’s playing at…

Wild.

Okay, well, I went back and pulled a few things from one of your other posts:

'trusted_domains' =>
  array (
    0 => 'domain.uk', # <----- domain of my NC box
    1 => '1.2.3.4', # <------ WAN IP of my router
    2 => '192.168.200.100', # <---- internal IP of NC box
),

Does domain.uk just resolve directly to your router’s WAN IP (the 1.2.3.4 above)? And does it resolve to the same IP address internally and externally?

'overwriteprotocol' => 'https',
'overwritehost' => 'domain.uk',
'overwritecondaddr' => '^1\.2\.3\.4$',
'overwrite.cli.url' => 'https://domain.uk',

I’m not clear what you’re trying to achieve above[1]. Can you elaborate on your goal(s)?

// IIRC, this fixed cert issues when on the LAN, rather than WAN
'trusted_proxies' =>
  array (
// WAN IP on my router
    0 => '1.2.3.4',
),

Are your internal (LAN) clients all being NAT’d to originate from 1.2.3.4 or something?

You’ll need to be clearer about what you want from the browser console,

The transactions that appear in the Network tab of the console, while attempting an upload, may give some indications of weird behavior (like http->https redirections, anything appearing in red, etc.)

P.S. I’m choosing to focus on file transfers because that seems particularly fundamental (even more so than previews). But since you did now post your preview configuration, I suggest turning off Movie previews (which are not on by default[2]) until you’ve gotten a standard preview configuration to function reasonably (because otherwise you’re just exacerbating the problem). Disable PDF previews too while you’re at it.
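For example, based on the provider list you posted, config.php would look something like this sketch (same entries, with the two heavy ones commented out):

    'enabledPreviewProviders' => [
        'OC\Preview\BMP',
        'OC\Preview\GIF',
        'OC\Preview\JPEG',
        'OC\Preview\Krita',
        'OC\Preview\MarkDown',
        'OC\Preview\MP3',
        'OC\Preview\OpenDocument',
        'OC\Preview\PNG',
        'OC\Preview\TXT',
        'OC\Preview\XBitmap',
        // 'OC\Preview\Movie', // spawns ffmpeg per video - leave off for now
        // 'OC\Preview\PDF',   // also comparatively heavy
    ],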

[1] The above overwrite set-up says: If I’m connecting from REMOTE_ADDR 1.2.3.4 (i.e. a client with that as their source IP address), then I want to overwrite Nextcloud’s internal detection of the host/protocol values and force the visitor to use https://domain.uk as the URL.

If you’re using NAT and your internal clients hit your router (which seems to be 1.2.3.4) then they may appear to be coming from 1.2.3.4 and thus trigger this condition.

Though I’m unclear which URL(s) you are using to access your instance day to day.

Ideally you use the same URL internally and externally (even if they resolve to different IP addresses to, say, bypass your WAN IP/router) because otherwise you’ll encounter weird issues with mismatched URLs in some places, TLS problems, and broken file sharing.
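With that one-URL-everywhere setup, the overwrite block usually shrinks to something like this sketch (note: no overwritecondaddr at all):

    'overwriteprotocol' => 'https',
    'overwritehost' => 'domain.uk',
    'overwrite.cli.url' => 'https://domain.uk',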

[2] Configuration Parameters — Nextcloud Administration Manual

Hi,

domain.uk resolves to the router’s WAN IP.
domain.uk resolves to the NC server’s LAN IP if resolution hits the local DNS server running on the router.
NAT loopback ought to take care of any hairpin-routing issues (for devices that don’t use the DHCP-advertised DNS server).
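A quick way to confirm that split-horizon behaviour from a LAN client (the router DNS address below is a made-up example - substitute your own):

    dig +short domain.uk @192.168.200.1   # router's local DNS - expect 192.168.200.100
    dig +short domain.uk @1.1.1.1         # public resolver - expect the router's WAN IP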

'overwriteprotocol' => 'https',
'overwritehost' => 'domain.uk',
'overwritecondaddr' => '^1\.2\.3\.4$',
'overwrite.cli.url' => 'https://domain.uk',

This bit was required to get the various machines to stop complaining about certificate issues - I appreciate that that’s vague - I stopped fiddling with it once I got all the devices to work without complaining. It may be overkill. I have no idea what it does; it was scraped from somewhere on the internet (possibly here).

LAN clients aren’t NATed to appear from 1.2.3.4 (the WAN IP) unless the originating device actually tried to send the packet to the WAN - this (IIRC) was intended to catch packets sent by devices that started on the WAN and then moved to the LAN.

I’ll have a prod at the network tab (although http->https redirections shouldn’t be happening, because I’m not serving http at all).

Btw - how do you quote posts on here? It’s not obvious, and it’s making replying coherently hard…

Your last paragraph is why that redirect stuff exists. With it set up as you describe, I got endless certificate issues.

The “weird issues” you describe - can they be detected? Where will I find evidence of them? What are they?

update:

I’ve disabled the PHP opcache and JIT (the php.ini toggles involved are sketched below).
I’ve removed the redis settings from config.php
I’ve removed the redirect related lines from config.php
I’ve disabled previews
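For reference, the opcache/JIT part boils down to toggles like these in php.ini (file location is an assumption - e.g. /etc/php/8.2/fpm/php.ini on Debian-ish systems):

    opcache.enable = 0
    opcache.jit = off
    opcache.jit_buffer_size = 0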

The darn thing is still behaving - and the annoying bit is there is no indication of anything before or after in the logs that suggests why.

The configuration is now (other than the basics) identical to the one on the wiki, except disabling previews.

The impact of disabling the PHP opcache and Redis appears to be close to zero, even on this “clunker” of a machine - which I’d expect, given it has at most 3 users (in practice, one right now). It’s maybe a tiny bit slower during login.

It makes no sense to me that several clients (Linux desktop, Apple, Android rooted and unrooted, the phonetrack app, owntracks, and the iPhone tracker whose name escapes me) all had certificate issues without those changes (the result of hours of searching), and yet now, with those changes revoked, they all work fine. I’ve removed and reinstalled the client on my phone in disbelief, but it stubbornly works, despite my best efforts.

I’m waiting for the other shoe to drop now - maybe the devices have cached the certificates and aren’t checking (which seems unlikely to me, as it’d be an obvious flaw)?

I’ll try re-enabling the opcache and Redis, and see if it keeps working…

… and both machines continue to behave all of a sudden. Bizarre.

Maybe I’ll try my luck and see if they federate…

Is it really not possible to get the photos and memories apps to show the original file when clicked on? It’s such a shame - I’d really rather not lose the disk space to full-resolution previews…

Okay, this looks significant…

 ConnectException cURL error 7: Failed to connect to xxxxxx.uk port 80 after 23 ms: Couldn't connect to server (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for xxxxxxx.uk/ocm-provider/
error while discovering ocm provider 

First time that’s shown up in the logs. And for some reason, it’s failing earlier when I try to use the federated share (video). It used to take ~30s; now it’s more like 10.

Hmmm. I seem to remember seeing a bunch of 10s timeouts in the code…
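One way to poke at this is to replay the discovery request by hand (xxxxxx.uk standing in for the real remote, exactly as in the log line above):

    curl -v http://xxxxxx.uk/ocm-provider/    # what NC tried - port 80, which fails here
    curl -v https://xxxxxx.uk/ocm-provider/   # the same endpoint over 443, for comparison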

OK, so there’s a second error going hand in hand with this, in
/var/www/nextcloud/apps/files_sharing/lib/External/Cache.php:

Undefined array key 1 at /mnt/vg0-data/www/nextcloud/apps/files_sharing/lib/External/Cache.php#41

I’ve modified the line to read:
$remote = explode('://', $cloudId->getRemote(), 2);

which “got rid” of the error (I’m not a PHP coder).
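For what it’s worth, the warning makes sense: explode() only produces a second element when the delimiter is actually found, so a stored remote with no scheme has no index 1. A minimal standalone demonstration (my guess at the failure mode, not the app’s actual data):

    <?php
    $withScheme = explode('://', 'https://otherbox.uk');  // ['https', 'otherbox.uk']
    $noScheme   = explode('://', 'otherbox.uk');          // ['otherbox.uk'] - no index 1
    var_dump($noScheme[1] ?? null);                       // NULL - hence "Undefined array key 1"

Note that if the original line pulled index 1 out of that array, assigning the whole array to $remote silences the warning but leaves $remote as an array rather than a host string, so it may just move the breakage elsewhere.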

Now it’s back to the 30-second cURL timeouts:

index
ConnectException cURL error 28: Operation timed out after 30000 milliseconds with 70638784 out of 1309419430 bytes received (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://xxxxxx.uk/public.php/webdav/

My gut feeling is that it’s related to the one above referring to port 80.

I have nothing on port 80.

I assume something, somewhere in nextcloud is generating a bad URL, assuming redirection to be in place?

I’d prefer not to open port 80 unless I have to. Is it required?

Does the URL reflect the one listed for the remote user share (the one that says “remote” for this file/folder under Sharing)?

EDIT: Scratch that… I was thinking of the other end. Is this an old share by chance or one you’re doing right now in near real-time?

It sort of sounds like it has a bogus URL (perhaps it’s an old share from while you were testing, or one connected to the source server via http://some_other_url or something - maybe a while back?). This could happen, I think, if you did any of the involved shares before the trusted_domain and overwrite* parameters and your internal DNS got all settled in.

Unfortunately, the share has been recreated a fair few times, but it’s not working any better. The errors have changed in the last hour, though.

The “memories” indexer is provoking this error, which seems more self-explanatory than the weird cURL error:

Failed to index file /username/files/bni (2).mp4: Failed to get local file: Server error: GET https://xxxxxxxx.uk/public.php/webdav/ resulted in a 502 Bad Gateway response

xxxxx.uk is the remote server, btw.

I have no idea why it might be giving a 502 here… it seems to be up and serving the web UI otherwise.

Also, my web UI does not show “remote” anything under Sharing.

Your last paragraph I understand at a hand-wavy level, but it doesn’t give me enough to dig further - how can I be sure that any old state has been purged?

Edit: on the remote server, it does show something under Sharing:

myuser@https://otherbox.uk (remote)

no trailing slash either

Everything seems to be set up right, but it’s not working…

Imaginary is a small add-on that will help with this. It will resize on the fly, and it is pretty quick.
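If you go that route, the wiring is roughly this in config.php (a sketch - the URL is an assumption for a locally running Imaginary service, which listens on port 9000 by default):

    'preview_imaginary_url' => 'http://127.0.0.1:9000',
    'enabledPreviewProviders' => [
        'OC\Preview\Imaginary', // hands the common image formats to the Imaginary service
    ],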

Previews are used because serving large images causes problems. One is scaling in browsers, one is timeouts on slow connections, and one is that if NC needs to rescale an image, it uses CPU.

Many photo-gallery web apps use preview generation to gain the ability to show things faster. And yes, you will lose some space.

It seems like you have a combination of issues, where some media is federated and takes time, some is local, and previews of many small files eat connections and CPU at the same time.


@SmallOne

Federation is not really working at all. The machines can “see” each other (admin->sharing), but the “lemon thing” is yellow, not green.

I don’t see why 20Mbit/s should be too slow. I don’t mind waiting, but nothing actually happens at all, just a cURL timeout after 30s of nothing.

Imaginary looks like a good concept - separate the functionality out into something else. I’ll look into that once it’s all up and running again.