NC AIO webif throws ERR_SSL_PROTOCOL_ERROR / letsencrypt cert not renewing

Hi folks,

I upgraded my Nextcloud AIO instance to version 9.0.1. After that I realized that although all containers are started, my webif is unreachable, throwing ERR_SSL_PROTOCOL_ERROR. Digging a bit deeper, it turns out that my Apache container logs recurring errors about failed certificate renewal, as follows (log obfuscated with regard to names and numbers):

{"level":"error","ts":1719125456.8688223,"logger":"tls.issuance.acme.acme_client","msg":"challenge failed","identifier":"my.nc-aio.example","challenge_type":"tls-alpn-01","problem":{"type":"urn:ietf:params:acme:error:connection","title":"","detail":"1.2.3.4: Timeout during connect (likely firewall problem)","instance":"","subproblems":[]}}
{"level":"error","ts":1719125456.8691387,"logger":"tls.issuance.acme.acme_client","msg":"validating authorization","identifier":"my.nc-aio.example","problem":{"type":"urn:ietf:params:acme:error:connection","title":"","detail":"1.2.3.4: Timeout during connect (likely firewall problem)","instance":"","subproblems":[]},"order":"https://acme-staging-v02.api.letsencrypt.org/acme/order/157223335/24684621354","attempt":1,"max_attempts":3}
{"level":"error","ts":1719125456.8692725,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"my.nc-aio.example","issuer":"acme-v02.api.letsencrypt.org-directory","error":"HTTP 400 urn:ietf:params:acme:error:connection - 1.2.3.4: Timeout during connect (likely firewall problem)"}
{"level":"error","ts":1719125456.8696342,"logger":"tls.obtain","msg":"will retry","error":"[my.nc-aio.example] Obtain: [my.nc-aio.example] solving challenge: my.nc-aio.example: [my.nc-aio.example] authorization failed: HTTP 400 urn:ietf:params:acme:error:connection - 1.2.3.4: Timeout during connect (likely firewall problem) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)","attempt":2,"retrying_in":120,"elapsed":84.447442784,"max_duration":2592000}

I’m not behind a Cloudflare tunnel, have no load balancers in front, and no CNAMEs/aliases. I’m using plain port forwarding on the perimeter towards NC AIO for ports 80+443, which in the past worked for retrieving Let’s Encrypt certs. Via a tcpdump on the physical NIC of the Docker host I see requests from outside arriving at port 80 (so my port forwarding obviously works); these are getting redirected to NC AIO’s port 443 (also obfuscated):

> 09:16:36.139034 IP 5.6.7.8.59154 > 1.2.3.4.80: Flags [P.], seq 1:424, ack 1, win 502, options [nop,nop,TS val 3652117345 ecr 315700760], length 423: HTTP: GET / HTTP/1.1
> 09:16:36.139137 IP 1.2.3.4.80 > 5.6.7.8.59154: Flags [.], ack 424, win 506, options [nop,nop,TS val 315700803 ecr 3652117345], length 0
> 09:16:36.165297 IP 1.2.3.4.80 > 5.6.7.8.59154: Flags [P.], seq 1:139, ack 424, win 506, options [nop,nop,TS val 315700829 ecr 3652117345], length 138: HTTP: HTTP/1.1 301 Moved Permanently

Supposedly these redirected requests don’t get answered due to the ERR_SSL_PROTOCOL_ERROR I mentioned at the beginning, so this seems to be a sort of chicken-and-egg problem.
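To tell a firewall/forwarding problem apart from a TLS handshake problem on port 443, a quick probe run from an external host can help. This is just a sketch using Python’s standard library; `my.nc-aio.example` is the placeholder hostname from the logs above, and the function name is my own:

```python
import socket
import ssl


def check_https(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Classify port 443 behaviour: no TCP connection at all (matches the
    ACME 'Timeout during connect' error) vs. a TCP connection whose TLS
    handshake then fails (matches ERR_SSL_PROTOCOL_ERROR in the browser)."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp-unreachable"
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # an expired cert is fine for this probe
    try:
        with ctx.wrap_socket(sock, server_hostname=host):
            return "tls-ok"
    except OSError:  # ssl.SSLError and timeouts are subclasses of OSError
        return "tls-handshake-failed"
    finally:
        sock.close()


# e.g. print(check_https("my.nc-aio.example"))
```

If this reports "tcp-unreachable" when run from outside, the ACME validation server presumably can’t reach port 443 either, which would match the "likely firewall problem" in the Caddy log; "tls-handshake-failed" would instead point at the chicken-and-egg situation described above.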

Btw, I seem to recall that last time my certs were requested via the http-01 challenge - am I misremembering, or was this changed by an update?

I don’t have much experience with Docker containers and have no clue how to debug this further, so any help would be highly appreciated.

Hi, the Apache container uses the tls-alpn-01 challenge, which only needs port 443 open to work correctly. Maybe something is blocking access to that port?
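To see from outside whether port 443 is reachable for this challenge at all, one can offer the `acme-tls/1` protocol via ALPN and check what the server selects. A rough stdlib sketch (hostname and function name are placeholders; during a real validation the server presents a self-signed challenge certificate, hence verification is disabled):

```python
import socket
import ssl


def probe_acme_alpn(host: str, port: int = 443, timeout: float = 5.0):
    """Offer the tls-alpn-01 protocol ("acme-tls/1") and report which ALPN
    protocol the server selected (None if it selected nothing). May raise
    OSError/ssl.SSLError if the port is unreachable or the handshake fails."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # challenge certs are self-signed
    ctx.set_alpn_protocols(["acme-tls/1"])
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()


# e.g. probe_acme_alpn("my.nc-aio.example")
```

Note that a server typically only answers `acme-tls/1` during an active validation, so outside a challenge window the handshake may fail or nothing may be selected - the main point of the probe is simply confirming that port 443 answers from outside at all.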

Hi, my NC’s webif used to work until I upgraded, and I also get the ERR_SSL_PROTOCOL_ERROR when accessing the webif from the public internet. So I suppose this is a logical problem rather than a connectivity issue.
As I said, my cert was already expired before the upgrade, and now it does not renew.
I suppose it’s not expected behaviour that Caddy produces ERR_SSL_PROTOCOL_ERROR while renewing a cert that has already expired - or should that be normal?

Is there a way I can test the challenge mechanism myself from the outside?