Creative names for Nextcloud prevent auto-detection

Quix0r · August 13, 2023, 2:02pm

For some software I need to identify which software is installed on servers. Many people leave the name Nextcloud alone and don’t touch anything there. But I have 360 other cases where they have changed it.

My question here is: How can I reliably detect Nextcloud as nextcloud and not as some other fancy name? Please take a look at this as an example: https://cloud.teamtammo.com

To your eye it is Nextcloud, for sure YOU can see it. But how can software detect this as nextcloud?

Edit: There is a /.well-known/x-nodeinfo2 for “auto-detection” of software. Maybe by default always fill it out properly? Older ways are fetching /.well-known/nodeinfo first and then taking the href element your software supports (by protocol version).

Edit2: Possible solution candidate: Add <meta property="og:platform" content="nextcloud" /> to the HTML code.

bb77 · August 13, 2023, 3:25pm

Maybe this is of any help… https://cloud.teamtammo.com/status.php

Quix0r · August 13, 2023, 3:48pm

Yes, that response does contain nextcloud. The thing here is, it isn’t auto-detectable and it doesn’t have the same structure as nodeinfo JSON responses. So I have to add /status.php as a possible path and also productname as a possible key to look for. At first glance it looks good, but at second glance it means the software has to send out yet another request to the server (many have already been sent before, like /nodeinfo/2.1.json, /nodeinfo/2.1 and the same with older versions). A lot of these requests could actually be saved by providing at least /.well-known/nodeinfo or /.well-known/x-nodeinfo2 with a properly formatted JSON response.
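A minimal sketch of the extra handling described above, assuming the /status.php body is a flat JSON object with a productname key. The function name software_from_status and the sample payload are illustrative only and not taken from any real server:

```python
import json
from typing import Optional


def software_from_status(body: str) -> Optional[str]:
    """Return the lower-cased product name from a /status.php body, or None."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all, e.g. an HTML error page from some other software.
        return None
    if not isinstance(data, dict):
        return None
    name = data.get("productname")
    if not isinstance(name, str) or not name.strip():
        return None
    return name.strip().lower()


# Illustrative payload only; a real response carries more fields.
sample = '{"installed": true, "productname": "Nextcloud"}'
print(software_from_status(sample))
```

Returning None for anything unexpected keeps this check from misfiring on other software that happens to serve something at the same path.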

Quix0r · August 13, 2023, 3:51pm

Please take a look here: https://f.haeder.net/.well-known/nodeinfo

This is a very typical reply. When you follow both rel (only a description of the JSON) and href (the actual reply) you can see very common responses. These cost only 2 requests, or even just one to /.well-known/x-nodeinfo2. That is my point here: to keep the number of sent requests low.

bb77 · August 13, 2023, 4:16pm

I’m not an expert on the topic. However, as far as I understand it (I could be wrong though), nodeinfo is about providing metadata about the protocols a server is running so that software knows how to interact with it, and not necessarily about providing information on specific software products running on that server.

Sounds almost like someone is trying to fly under the radar…

Quix0r · August 13, 2023, 4:19pm

I just try to be nice to administrators.

Please take a look here:
https://f.haeder.net/nodeinfo/1.0

There you can see the JSON element software and, inside it, name (my software ignores version). That gives away enough info to know what software is running there.
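The two-step lookup discussed above (discovery document first, then the href it points to) could be sketched roughly like this. It works on already-parsed JSON, so no requests are sent; the function names, the list of supported versions and the example.org URLs are assumptions made for illustration:

```python
from typing import Optional

# Schema versions this sketch accepts, newest first (an assumption;
# adjust to whatever the client actually supports).
SUPPORTED = ("2.1", "2.0", "1.1", "1.0")
SCHEMA_PREFIX = "http://nodeinfo.diaspora.software/ns/schema/"


def pick_nodeinfo_href(discovery: dict) -> Optional[str]:
    """Pick the href of the newest supported schema from /.well-known/nodeinfo."""
    by_version = {}
    for link in discovery.get("links", []):
        if not isinstance(link, dict):
            continue
        rel = link.get("rel") or ""
        href = link.get("href")
        if rel.startswith(SCHEMA_PREFIX) and href:
            by_version[rel[len(SCHEMA_PREFIX):]] = href
    for version in SUPPORTED:
        if version in by_version:
            return by_version[version]
    return None


def software_name(nodeinfo: dict) -> Optional[str]:
    """Return software.name in lower case; software.version is ignored."""
    software = nodeinfo.get("software")
    if isinstance(software, dict):
        name = software.get("name")
        if isinstance(name, str) and name:
            return name.lower()
    return None


# Made-up documents in the shape of a discovery reply and a nodeinfo reply.
discovery = {"links": [
    {"rel": SCHEMA_PREFIX + "1.0", "href": "https://example.org/nodeinfo/1.0"},
    {"rel": SCHEMA_PREFIX + "2.0", "href": "https://example.org/nodeinfo/2.0"},
]}
nodeinfo = {"version": "2.0", "software": {"name": "Friendica", "version": "x"}}
print(pick_nodeinfo_href(discovery), software_name(nodeinfo))
```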

bb77 · August 13, 2023, 4:37pm

As I said, I’m not an expert, but afaik nodeinfo is only provided if an app is running on the server that uses something like ActivityPub, e.g. when the Nextcloud Social app is installed. This would look as follows:

Version	"2.0"
software	
name	"Nextcloud Social"
version	"0.6.1"
protocols	
0	"activitypub"
rootUrl	"https://cloud.domain.tld/apps/social"
usage	[]
openRegistrations	false

Quix0r · August 13, 2023, 4:53pm

Yes, I know. Hmm. So I need to add /status.php as a possible URL for checking the software name?

Okay, other proposal:

People want to change the name ‘Nextcloud’ for whatever reason to their liking.
Let there be two fields: One for showing on website and one for internal use only and that remains nextcloud.

So the internal one is set as <meta property="og:platform" content="nextcloud" /> while the other can be changed and shown to users. Problem solved. My point here is, I need something that can be identified by software.

Another issue I have with /status.php is that it isn’t a standard URL, unlike those under /.well-known/, which try to standardize such meta information (e.g. used software, version number, et cetera). And as I wrote earlier, I need to send another request to the other server. I also need to add some code only for handling the JSON from /status.php, and other software might send different data there.

If there is a standardized way, then let’s do this, e.g. og:platform is absolutely fine with me.

ernolf · August 13, 2023, 5:39pm

Even though .well-known/nodeinfo only applies to fediverse servers and, for example, cloud.teamtammo.com does not provide it, you can use the .well-known mechanism to find out whether it is a Nextcloud server.
On any

.well-known/<any_string_here>

-request, even though the response code is 404, Nextcloud servers will send an

x-nextcloud-well-known: 1

-header. So you do not even have to download one single bit, you only have to perform a header-request:

:~$ curl -sI https://cloud.teamtammo.com/.well-known/you_name_it | grep nextcloud
x-nextcloud-well-known: 1

You can now simply integrate that into your automation.

Hope that helps,
much luck!
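For automation outside the shell, the same header check could look roughly like the following. This is only a sketch under assumptions: the function names are made up, the probe path is arbitrary, and the standard-library urllib is used so that the headers of the expected 404 can be read from the raised error:

```python
import urllib.error
import urllib.request

HEADER = "x-nextcloud-well-known"


def has_nextcloud_header(headers) -> bool:
    """True if a header mapping contains the marker header (case-insensitive)."""
    return any(key.lower() == HEADER for key in headers.keys())


def probe(base_url: str, timeout: float = 10.0) -> bool:
    """Send one HEAD request to an arbitrary /.well-known/ path."""
    url = base_url.rstrip("/") + "/.well-known/you_name_it"
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return has_nextcloud_header(response.headers)
    except urllib.error.HTTPError as error:
        # A 404 is expected here; the response headers are attached to the error.
        return has_nextcloud_header(error.headers)
    except OSError:
        # Unreachable host, timeout, TLS failure and similar.
        return False


# Offline check of the header logic with a hand-written mapping.
print(has_nextcloud_header({"X-Nextcloud-Well-Known": "1"}))
```

Calling probe("https://cloud.teamtammo.com") would perform the actual request; it is left out of the example so the snippet runs without network access.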

bb77 · August 13, 2023, 5:47pm

There isn’t one standardized way in which web applications, or applications in general, would or should identify themselves. This depends on the type of application, the web standards or protocols they are using, and in many cases even on the specific app itself. The nodeinfo example is specific to ActivityPub, but there are tons of application-specific APIs out there, and they don’t necessarily identify themselves with a big “Hello, I’m Nextcloud, please connect to me!”

On top of that, there are all the other internet protocols like IMAP, SMTP, XMPP etc. So, if you want to be 100% sure what is running on a server, you would have to do a full port scan and then, in the worst case, send out hundreds of queries in order to find out what application is running behind the open ports.

Quix0r · August 13, 2023, 5:50pm

My software is freely available: git.mxchange.org Git - fba.git/summary

It collects blocklists from fediverse instances and makes them searchable. Nextcloud isn’t providing these features, e.g. /api/v1/instance/<peers|domain_blocks>, which is my main goal to fetch. Nextcloud is only part of it, as its software name appears in some view. What I try here is to not have so many different names there. og:platform, generator and also og:site_name are already checked if auto-discovery through /.well-known/nodeinfo fails.

Excerpt from the software:

Detection is done in the following order:

AUTO_DISCOVERY: /.well-known/nodeinfo was reachable and software type was found in nodeinfo response

STATIC_CHECK: Node information was found by probing for well-known URLs

PLATFORM: Meta data og:platform was found in HTML code

GENERATOR: Meta data generator was found in HTML code

SITE_NAME: Meta data og:site_name was found in HTML code

None: the instance was not reachable or the used software was not stated

So first, /.well-known/x-nodeinfo2 or /.well-known/nodeinfo and then the provided href URLs are fetched. If that fails, “static checks” on “well-known” paths, e.g. /nodeinfo/2.1.json and so on, are done, then og:platform, then generator from the <meta /> tag, and the last resort is og:site_name. If all of these fail, None is set as the software name.

They are all based on GET requests; adding a HEAD request would add more code, for now only for Nextcloud. Just adding an og:platform tag (and maybe others, but I won’t read and analyze them) is very little effort.
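The fallback order described above can be sketched as a chain of lazy probes, so that later (and more expensive) checks only run when earlier ones found nothing. This is not the actual fba code; the enum, the detect function and the stand-in lambdas are hypothetical and only mirror the mode names from the excerpt:

```python
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class DetectionMode(Enum):
    AUTO_DISCOVERY = "AUTO_DISCOVERY"
    STATIC_CHECK = "STATIC_CHECK"
    PLATFORM = "PLATFORM"
    GENERATOR = "GENERATOR"
    SITE_NAME = "SITE_NAME"


# A probe returns a software name, or None if it found nothing.
Probe = Callable[[], Optional[str]]


def detect(
    probes: Sequence[Tuple[DetectionMode, Probe]],
) -> Tuple[Optional[DetectionMode], Optional[str]]:
    """Run probes in order and stop at the first one that yields a name."""
    for mode, probe in probes:
        name = probe()
        if name and name.strip():
            return mode, name.strip().lower()
    # Instance not reachable or software not stated.
    return None, None


# Stand-in probes; real ones would send GET requests and parse the replies.
probes = [
    (DetectionMode.AUTO_DISCOVERY, lambda: None),
    (DetectionMode.STATIC_CHECK, lambda: None),
    (DetectionMode.PLATFORM, lambda: "Nextcloud"),
    (DetectionMode.GENERATOR, lambda: "not reached"),
    (DetectionMode.SITE_NAME, lambda: "My Fancy Cloud"),
]
mode, name = detect(probes)
print(mode, name)
```

Passing callables instead of precomputed values is what keeps the request count low: once a probe succeeds, the remaining ones are never invoked.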

Quix0r · August 13, 2023, 5:51pm

I only check Fediverse instances and their peers provided in both the /api/v1/instance/peers and domain_blocks JSON APIs. If they fail to be fetched (many Fediverse instances don’t provide them), then I go with other ways, e.g. Lemmy provides /instances, which is an HTML response that I can extract peer names from.

For websites, there is a standardized way: <meta name="generator" content="xxx" /> or the said og:platform.
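Reading such tags from a page takes little code. A rough sketch with Python's built-in html.parser, assuming the value sits in the content attribute and the key in either name or property; the class and function names and the sample HTML are made up for illustration:

```python
from html.parser import HTMLParser
from typing import Dict

WANTED = ("og:platform", "generator", "og:site_name")


class MetaCollector(HTMLParser):
    """Collects the first occurrence of each wanted <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = (attributes.get("property") or attributes.get("name") or "").lower()
        content = attributes.get("content")
        if key in WANTED and content and key not in self.found:
            self.found[key] = content


def collect_meta(html: str) -> Dict[str, str]:
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.found


# Made-up page head for demonstration.
sample_html = (
    '<html><head>'
    '<meta name="generator" content="ExampleSoft" />'
    '<meta property="og:site_name" content="My Fancy Cloud" />'
    '</head><body></body></html>'
)
print(collect_meta(sample_html))
```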

bb77 · August 13, 2023, 6:13pm

Nextcloud isn’t primarily meant to be a Fediverse instance, and unless someone decides to install “Nextcloud Social”, in which case that instance will correctly identify itself as “Nextcloud Social”, it’s none of your software’s business whether that instance is identifying itself properly, or if it identifies itself at all. I would even argue that most Nextcloud users have zero interest in connecting their Nextcloud instances to the Fediverse.

However, if you think that the Nextcloud Social app doesn’t properly follow the ActivityPub standards, I’d suggest you report it here: GitHub - nextcloud/social: 🎉 Social can be used for work, or to connect to the fediverse!

Quix0r · August 13, 2023, 6:18pm

Thank you for the long reply. I am only looking to reduce these fancy names they enter to just one generalized name, e.g. nextcloud. My software is not federating, nor does it follow the ActivityPub protocol.

bb77 · August 13, 2023, 6:35pm

Well, and I think you need to find another way to identify what’s going on on those servers. But maybe there is another unique pattern you could query in order to find any Nextcloud instances, I’m not sure…

Anyways, to me this doesn’t sound like a feature (some might even call it an anti-feature) the average Nextcloud user or even businesses would need, but rather like a very specific requirement on your part.

Quix0r · August 13, 2023, 7:24pm

It is as easy as I described in my starting post: the internal name is nextcloud, the external part is shown e.g. in the footer, as this link shows: https://cloud.teamtammo.com/ There is no “special need” here, just that the used software is properly stated. For my software, I can just easily add a <meta /> tag to my base.html template.

bb77 · August 14, 2023, 4:35pm

I think the question is rather if it is necessary and/or wanted. But feel free to open a feature request on Github…

Quix0r · August 14, 2023, 5:04pm

Yes, true. On one side, Nextcloud itself isn’t federating without Nextcloud Social being installed. That makes sense. On the other side, some “low-level” software propagation, e.g. <meta name="generator" /> or the whole og:* set, isn’t much HTML code to add, and then I don’t have to add extra code only for /status.php and risk that other websites also provide it but with a different intent. That is why /.well-known/ was “invented”: to have generic WWW-wide paths that are indeed well-known. It is similar with ports below 1024: they are well-known ports, and services on them shouldn’t differ from the assumed defaults, e.g. 25 is SMTP, 110 is POP3 and so on.

kesselb · August 14, 2023, 6:13pm

A Nextcloud instance with Social App should provide you /.well-known/nodeinfo: https://github.com/nextcloud/social/blob/b92fb024b909be848b632ca00e9be04a99fb4bc4/lib/WellKnown/WebfingerHandler.php#L187

Otherwise: _oc_config should be in the HTML response for almost every Nextcloud instance.
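As a last-resort heuristic, the _oc_config hint could be checked with a plain substring search on the already fetched HTML. This is only a sketch: the function name is made up, the sample markup is invented, and how exactly the identifier appears in a real page is not assumed here:

```python
def looks_like_nextcloud(html: str) -> bool:
    """Heuristic: True if the page source mentions the _oc_config identifier."""
    return "_oc_config" in html


# Invented markup for demonstration only.
sample_page = "<html><head><script>var _oc_config = {};</script></head></html>"
print(looks_like_nextcloud(sample_page))
```

Since the check reuses the HTML that is fetched for the <meta /> tags anyway, it would not cost an additional request.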

Quix0r · August 14, 2023, 11:43pm

The issue “Propagate software name through meta HTML tags properly” (Issue #39875 · nextcloud/server · GitHub) is ready.