I managed to get one of four other instances to show “green” as a federated trusted server.
Unfortunately, it is not clear which criteria this “assessment” is based on, or whether, for example, IPv4/IPv6 in the server addresses plays a role.
The documentation at Configuring Federation Sharing (Nextcloud Administration Manual) does not say much about this.
Yes, we have already been through all of that (otherwise I would not be asking here).
And what if the instance still does not turn green after that?
federation:sync-addressbooks does not produce any detailed output either, unfortunately.
It always helps to document that kind of information in the question up front. The more details, the better. See also: How to ask forum questions
Some questions in this forum are asked before reading the documentation and some after. When answering, I personally like to start with the simple solutions, based on the information available.
Unless those have already been documented in detail.
So what exactly does occ federation:sync-addressbooks output?
4 [============================] < 1 sec 73.0 MiB
with -vvv
We got one of the federated servers to “green”.
We just cannot determine why, or why it does not work for the others.
Have you already checked this bug?
Issue opened 03:09PM - 17 Aug 23 UTC, closed 09:32AM - 11 Apr 24 UTC. Labels: bug, 2. developing, feature: federation, technical debt, 29-feedback
### ⚠️ This issue respects the following points: ⚠️
- [X] This is a **bug**, … not a question or a configuration/webserver/proxy issue.
- [X] This issue is **not** already reported on [Github](https://github.com/nextcloud/server/issues?q=is%3Aopen+is%3Aissue+label%3Abug) OR [Nextcloud Community Forum](https://help.nextcloud.com/) _(I've searched it)_.
- [X] Nextcloud Server **is** up to date. See [Maintenance and Release Schedule](https://github.com/nextcloud/server/wiki/Maintenance-and-Release-Schedule) for supported versions.
- [X] I agree to follow Nextcloud's [Code of Conduct](https://nextcloud.com/contribute/code-of-conduct/).
### Bug description
In the Federation app, the pairing for Trusted Servers can be flaky.
This is because the way in which `shared_secret` is established is prone to race conditions leading to deadlocks.
There have been numerous reports of people being stuck in the orange state over the years:
- https://help.nextcloud.com/t/federated-trusted-servers-all-on-yellow-not-green/15152
- https://help.nextcloud.com/t/federation-cannot-add-trusted-server/8905
- https://github.com/nextcloud/server/issues/22510
I had the same issue today and investigated a bit. First, I will describe how the pairing flow works. Next, I will describe a situation where it can get stuck in the yellow state, and how to work around it. Finally, I'll suggest possible improvements.
I apologise in advance for the huge information dump. :grimacing:
## How the Pairing Works
The happy path is as follows:
1. Server A adds Server B as a trusted server, and vice versa.
a. This calls [`addServer`](https://github.com/nextcloud/server/blob/daf3b29572921562abcb700052c1de19fdd2fe4e/apps/federation/lib/TrustedServers.php#L86) which (1) generates a random token, and (2) adds a job to execute `RequestSharedSecret`.
b. When people [on the forum recommend](https://help.nextcloud.com/t/federated-trusted-servers-all-on-yellow-not-green/15152/22) to run `php cron.php`, this just causes the background job to be executed immediately, rather than waiting for the next normal cron to run.
At this point, A's admin settings are yellow 🟡 and A's database looks like this (`shared_secret` and `sync_token` are both NULL):
```
MariaDB [nextcloud]> select * from oc_trusted_servers;
+----+-----------------------------+------------------------------------------+------------------+----------------------------------+--------+-----------------------------+
| id | url | url_hash | token | shared_secret | status | sync_token |
+----+-----------------------------+------------------------------------------+------------------+----------------------------------+--------+-----------------------------+
| 17 | https://cloud-b.example.com | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | tttttttttttttttt | NULL | 2 | NULL |
+----+-----------------------------+------------------------------------------+------------------+----------------------------------+--------+-----------------------------+
```
2. Cloud A's `RequestSharedSecret` [does a](https://github.com/nextcloud/server/blob/daf3b29572921562abcb700052c1de19fdd2fe4e/apps/federation/lib/BackgroundJob/RequestSharedSecret.php#L131) `POST https://cloud-b.example.com/ocs/v2.php/apps/federation/api/v1/request-shared-secret` with body `{token: aaa}`. I.e. Cloud A sends its token to Cloud B.
3. Cloud B receives it, and **if** Cloud A's token is larger than B's token (happy path, see more below!!), B [creates a job](https://github.com/nextcloud/server/blob/daf3b29572921562abcb700052c1de19fdd2fe4e/apps/federation/lib/Controller/OCSAuthAPIController.php#L143) to later execute `GetSharedSecret`.
4. When B's cron next runs, it executes [`GetSharedSecret`](https://github.com/nextcloud/server/blob/daf3b29572921562abcb700052c1de19fdd2fe4e/apps/federation/lib/BackgroundJob/GetSharedSecret.php#L128) which pings back A's token to A.
5. A handles `GetSharedSecret` [here](https://github.com/nextcloud/server/blob/daf3b29572921562abcb700052c1de19fdd2fe4e/apps/federation/lib/Controller/OCSAuthAPIController.php#L168).
a. If the token that B sent is the token that A has in its database, A generates a `shared_secret` and returns it in the response to `GetSharedSecret`.
6. Now both A and B have the `shared_secret` that A generated.
At this point, A's admin settings are still yellow 🟡 and A's database looks like this (`shared_secret` is non-NULL, but `sync_token` is NULL):
```
MariaDB [nextcloud]> select * from oc_trusted_servers;
+----+-----------------------------+------------------------------------------+------------------+----------------------------------+--------+-----------------------------+
| id | url | url_hash | token | shared_secret | status | sync_token |
+----+-----------------------------+------------------------------------------+------------------+----------------------------------+--------+-----------------------------+
| 17 | https://cloud-b.example.com | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | tttttttttttttttt | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | 2 | NULL |
+----+-----------------------------+------------------------------------------+------------------+----------------------------------+--------+-----------------------------+
```
7. Now Cloud A needs to execute `php occ federation:sync-addressbooks`.
At this point, A's Admin settings are green 🟢! A's database looks like this (`shared_secret` and `sync_token` are both non-NULL):
```
MariaDB [nextcloud]> select * from oc_trusted_servers;
+----+-----------------------------+------------------------------------------+------------------+----------------------------------+--------+-----------------------------+
| id | url | url_hash | token | shared_secret | status | sync_token |
+----+-----------------------------+------------------------------------------+------------------+----------------------------------+--------+-----------------------------+
| 17 | https://cloud-b.example.com | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | tttttttttttttttt | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | 1 | http://sabre.io/ns/sync/42 |
+----+-----------------------------+------------------------------------------+------------------+----------------------------------+--------+-----------------------------+
```
Now we know how it is supposed to work. Let's look at how it can fail.
## Reasons for being stuck in yellow
### Reason 1: `sync_token` is null
Running `php occ federation:sync-addressbooks` as per [the docs](https://docs.nextcloud.com/server/27/admin_manual/configuration_files/federated_cloud_sharing_configuration.html#configuring-trusted-nextcloud-servers) fixes the `sync_token` being NULL -- IF `shared_secret` is non-NULL.
If `shared_secret` is NULL, then `federation:sync-addressbooks` will silently-ish [fail here](https://github.com/nextcloud/server/blob/daf3b29572921562abcb700052c1de19fdd2fe4e/apps/federation/lib/SyncFederationAddressBooks.php#L66) with this message:
> Shared secret for https://cloud.example.com is null
"Silently", because it is only a DEBUG level log (it should be info or warning imho).
So if `php occ federation:sync-addressbooks` doesn't get you to green and your `shared_secret` is NULL, you first need to get a value for `shared_secret`. Then return to `php occ federation:sync-addressbooks`.
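The check described above can be condensed into a small decision helper. A minimal sketch, assuming you have already read the `shared_secret` and `sync_token` columns of one `oc_trusted_servers` row yourself; the function name `next_step` and its return strings are made up for illustration.

```python
from typing import Optional


def next_step(shared_secret: Optional[str], sync_token: Optional[str]) -> str:
    """Suggest what to do for one oc_trusted_servers row (None stands for NULL)."""
    if shared_secret is None:
        # The sync command cannot succeed yet; the secret exchange has to finish first.
        return "establish shared_secret first"
    if sync_token is None:
        return "run: php occ federation:sync-addressbooks"
    return "nothing to do, should be green"
```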
### Reason 2: `shared_secret` is null
**This is the real issue imho.** This is where people on the forums (and me) seem to get stuck. The way that `shared_secret` is established is flaky and prone to race conditions.
Recall that the server with the higher token "wins" and gets to initiate the `RequestSharedSecret`. The server with the lower token gets to send `GetSharedSecret`. Then the server with the higher token gets to choose `shared_secret`.
One possible race condition that leads to a deadlock is the following (there may be more):
1. Server A adds the trusted server, cron runs, it does `RequestSharedSecret`.
2. Server B has not yet trusted A, so it will reply FORBIDDEN.
3. Server A will log `refused to ask for a shared secret` and will not retry the `RequestSharedSecret` job.
4. Server B adds A as a trusted server, cron runs, it does `RequestSharedSecret`.
5. **B's token is lower than A's.** So A will reply FORBIDDEN, and log `We will initiate the exchange of the shared secret.`. (But A is lying and won't initiate the exchange - it already tried earlier).
6. Now both A's and B's `RequestSharedSecret` jobs have exited and will never be rerun.
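The sequence above can be replayed in a toy model to see why neither side recovers. This is a sketch under simplifying assumptions: the `Server` class and the `request_shared_secret` and `run_cron` functions are hypothetical, each server holds one token, and a job that receives FORBIDDEN is dropped without retry, as described in the steps.

```python
class Server:
    def __init__(self, name, token):
        self.name = name
        self.token = token
        self.trusted = set()        # names of peers this server trusts
        self.pending_jobs = []      # queued background jobs (toy model)

    def add_trusted(self, peer):
        self.trusted.add(peer.name)
        self.pending_jobs.append(("RequestSharedSecret", peer))


def request_shared_secret(sender, receiver):
    """Return the receiver's answer to the sender's RequestSharedSecret call."""
    if sender.name not in receiver.trusted:
        return "FORBIDDEN"          # step 2: receiver does not trust the sender yet
    if sender.token <= receiver.token:
        return "FORBIDDEN"          # step 5: receiver expects to initiate itself
    receiver.pending_jobs.append(("GetSharedSecret", sender))
    return "OK"


def run_cron(server):
    """Run every queued RequestSharedSecret job once; it is never re-queued."""
    jobs, server.pending_jobs = server.pending_jobs, []
    results = []
    for kind, peer in jobs:
        if kind == "RequestSharedSecret":
            results.append(request_shared_secret(server, peer))
        else:
            server.pending_jobs.append((kind, peer))  # left for a later step
    return results


a = Server("A", token="tttt")       # A holds the higher token
b = Server("B", token="bbbb")

a.add_trusted(b)
first = run_cron(a)                 # steps 1-3: B does not trust A yet
b.add_trusted(a)
second = run_cron(b)                # steps 4-5: B's token is lower than A's

print(first, second)
print(a.pending_jobs, b.pending_jobs)
```

Both calls are refused and both job queues end up empty, so nothing is left that could ever trigger `GetSharedSecret`.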
This leads to a deadlock. My recommended workaround is the following:
## Workaround
Informally, we just need to try on Server A until we win and have the higher token.
1. Don't touch Server B (but leave A in the list of trusted servers).
2. while true do on Server A:
a. Remove B as trusted server.
b. Add B as trusted server.
c. Run `php cron.php` (to speed things up).
d. Check your logs at `https://cloud.example.com/settings/admin/logging`.
e. If you see an entry `https://cloud.example.com refused to ask for a shared secret.`: We are in the deadlock again. Go back to Step a.
f. If there is NO such entry: break the loop.
3. On Server A, wait and monitor your database (`select * from oc_trusted_servers;`). We need to wait for Server B to execute `GetSharedSecret`. On Server B, run `php cron.php` to speed things up.
4. On Server A, you should see `shared_secret` change from NULL to a value.
5. Now execute `php occ federation:sync-addressbooks`.
6. Now `sync_token` should also change from NULL to a value. You should now be green. 🟢
Now that you have established the `shared_secret`, you may need to rerun `php occ federation:sync-addressbooks` on Server B so that B becomes green 🟢 as well.
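Step 2 of the workaround is a plain retry loop, which can be sketched as follows. Everything here is an assumption for illustration: the four callables are placeholders you would have to implement yourself (for example by clicking through the admin UI or wrapping `php cron.php`), and `max_attempts` is added so the loop cannot run forever, unlike the `while true` above.

```python
from typing import Callable


def retry_until_no_refusal(
    remove_server: Callable[[], None],
    add_server: Callable[[], None],
    run_cron: Callable[[], None],
    refusal_logged: Callable[[], bool],
    max_attempts: int = 20,
) -> int:
    """Repeat step 2 of the workaround; return the number of attempts needed.

    `refusal_logged` must only look at log entries written since the last
    `run_cron` call, otherwise an old "refused to ask" entry keeps matching.
    """
    for attempt in range(1, max_attempts + 1):
        remove_server()             # 2a
        add_server()                # 2b: generates a fresh random token
        run_cron()                  # 2c
        if not refusal_logged():    # 2e/2f: no refusal entry means we won
            return attempt
    raise RuntimeError(f"still refused after {max_attempts} attempts")
```

Once the function returns, continue with step 3: wait for Server B to execute `GetSharedSecret`.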
## Suggested Improvements
**Short term:**
`RequestSharedSecret` should retain the job if the other server returns FORBIDDEN. Currently it [does not](https://github.com/nextcloud/server/blob/daf3b29572921562abcb700052c1de19fdd2fe4e/apps/federation/lib/BackgroundJob/RequestSharedSecret.php#L163).
**Long-term 1:**
Imho the design with requesting to fetch and then fetching is overly complex, with unnecessary roundtrips. Why not fetch directly? The following design would be simpler and work too:
1. Server A executes `POST /ocs/.../shared-secret` with body `{token: aaa, shared_secret: AAA}`
2. Server B compares A's token with its own.
a. If B's token is larger: B replies with `{token: bbb, shared_secret: BBB}`
b. If A's token is larger: B replies with `{token: aaa, shared_secret: AAA}`
3. Both servers store the chosen shared_secret.
This way, you still have conflict resolution (the higher token wins). And you have only one request, easily avoiding race conditions.
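The conflict resolution in this proposal fits in a few lines. A minimal sketch of the suggested design, not of anything Nextcloud implements; the `Offer` type and the `resolve` function are hypothetical names, and the tie case (equal tokens) is resolved in favour of the local offer purely as an assumption.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    token: str
    shared_secret: str


def resolve(incoming: Offer, local: Offer) -> Offer:
    """Pick the offer with the larger token; both servers store its secret."""
    return incoming if incoming.token > local.token else local
```

Server B would reply with `resolve(offer_from_a, own_offer)` and Server A would store whatever comes back, so both end up with the same `shared_secret` after a single request.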
**Long-term 2:**
With my security/cryptography hat on, this seems like the perfect use case for doing a Diffie-Hellman-style key exchange. Instead of one server choosing the shared secret, both servers should contribute one share to combine into a shared secret.
To not reinvent the wheel, choose one of the handshakes from the [Noise Framework](https://noiseprotocol.org/noise.html), which has been [formally verified](https://dennis-jackson.uk/assets/pdfs/noise.pdf).
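To illustrate what "both servers contribute one share" means, here is a toy finite-field Diffie-Hellman exchange using only the Python standard library. It is not a Noise handshake and not a proposal for the actual implementation: the parameters `P` and `G` are chosen for illustration only and are far too weak for real use, the exchange is unauthenticated, and the function names are made up.

```python
import hashlib
import secrets

# Toy parameters for illustration only: far too small and unvetted for real use.
P = 2**127 - 1   # a Mersenne prime
G = 3


def keypair():
    """Return (private share, public share) for one server."""
    private = secrets.randbelow(P - 3) + 2   # private share in [2, P-2]
    public = pow(G, private, P)
    return private, public


def derive_secret(own_private, peer_public):
    """Combine our private share with the peer's public share."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).hexdigest()


a_private, a_public = keypair()
b_private, b_public = keypair()

# Only the public shares cross the wire; both sides derive the same value.
secret_on_a = derive_secret(a_private, b_public)
secret_on_b = derive_secret(b_private, a_public)
print(secret_on_a == secret_on_b)
```

Neither server picks the secret alone, which is the property the paragraph above asks for. A real deployment would use a vetted library and one of the Noise patterns instead.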
### Steps to reproduce
See above.
### Expected behavior
See above.
### Installation method
Community Docker image
### Nextcloud Server version
27
### Operating system
Debian/Ubuntu
### PHP engine version
None
### Web server
None
### Database engine version
None
### Is this bug present after an update or on a fresh install?
Fresh Nextcloud Server install
### Are you using the Nextcloud Server Encryption module?
None
### What user-backends are you using?
- [X] Default user-backend _(database)_
- [ ] LDAP/ Active Directory
- [ ] SSO - SAML
- [ ] Other
### Additional info
I can fill out the other fields if necessary (but I don't see the need right now).
That definitely helped (deleting and re-adding several times).
But surely it cannot be intended to work like this?
Interesting too: I can now add users of one of the “green” servers as Talk participants, while for the second “green” one I get a (useless) error message. As soon as I have more time, I will dig through the logs.
Cool! Glad to hear it could be solved. The design of the key exchange seems to have room for improvement. But I have been out of that for a long time..
Yes, thanks for the pointer!
It does not seem to be really stable, though.
Well, your question was:
Unfortunately, it is not clear which criteria this “assessment” is based on.
What it should actually look like:
Server A and B add each other as trusted. Server A sends its token to Server B. If A's token is larger, B requests the shared secret. Server A checks the token, generates the secret and sends it back. Now both have the same secret and the synchronisation works.
And thgoebel then summarised, very simplified here, two of the problems that explain why it does not always work:
Race condition: The server with the higher token initiates RequestSharedSecret, but if the other server cannot answer yet (e.g. because it has not yet added the first server as trusted), the process aborts, with no retry mechanism.
Result: Both servers wait for each other, but neither completes the exchange.
“A race condition occurs when two processes compete for the same state and the result depends on the order in which they happen to run.”
Missing error escalation: Logs are only written at DEBUG level (instead of WARNING), which makes diagnosis harder, for you too.
So at least your question is answered.
Otherwise, it seems all we can do is add the server “by hand” until it works. The fix is up to the developers.