Bad Gateway/Connection refused while connecting to upstream via reverse proxy nginx/LE companion

Feynt · March 21, 2021, 1:33pm

Man, where do I begin. I’ve tried setting up Nextcloud a few times, and it seems like every time I have problems. My last mostly successful attempt was a conventional install (local packages plus unpacking the Nextcloud files), and it had a slight issue with upgrading. Unfortunately the NC directory was lost on a corrupted hard drive and I had to look into reinstalling yet again. This time I thought I would do it the “easy way” and use Docker on my homelab. It’s not working, and I’m having a hell of a time sorting out why. Here’s everything so far:

Base OS install is CentOS 8
I’m not using docker-compose, instead opting for individual docker containers so I can monitor stats through portainer.io
I’m no wizard with Docker. I understand how it works, just not how to use it well
I’ve got a qualified domain through Google
Starting httpd via Docker allows me to access the sample page to confirm the page is working
Installing nginx-proxy and letsencrypt-nginx-proxy-companion went smoothly and I can also confirm running httpd behind them works flawlessly, simply by adding VIRTUAL_HOST/LETSENCRYPT_HOST entries
I have properly signed certs for the main domain and the intended subdomain nextcloud will be running on
Running the super bog standard docker run -d -p 8080:80 nextcloud does work, but obviously doesn’t allow for much permanence or customisability, and doesn’t work behind nginx

The problem comes when I attempt to create a container with nextcloud:latest. Inspecting the logs I see Initializing nextcloud 21.0.0.18 ..., but any attempt to access it returns a 502 error. curl -v results below (sanitised, of course):

*   Trying 1.2.3.4:443...
* TCP_NODELAY set
* Connected to nextcloud.addr.mine (1.2.3.4) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=nextcloud.addr.mine
*  start date: Mar 21 10:15:37 2021 GMT
*  expire date: Jun 19 10:15:37 2021 GMT
*  subjectAltName: host "nextcloud.addr.mine" matched cert's "nextcloud.addr.mine"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fffdfe1caa0)
> GET / HTTP/2
> Host: nextcloud.addr.mine
> user-agent: curl/7.68.0
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 502
< server: nginx/1.19.3
< date: Sun, 21 Mar 2021 12:55:38 GMT
< content-type: text/html
< content-length: 157
< strict-transport-security: max-age=31536000
<
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.19.3</center>
</body>
</html>
* Connection #0 to host nextcloud.addr.mine left intact

Looking at the logs from nginx-proxy, I get:

nginx.1    | nextcloud.addr.mine 192.168.123.254 - - [21/Mar/2021:12:52:55 +0000] "GET / HTTP/2.0" 502 559 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36",
nginx.1    | 2021/03/21 12:52:55 [error] 184#184: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 1.2.3.4, server: nextcloud.addr.mine, request: "GET / HTTP/2.0", upstream: "http://172.17.0.6:80/", host: "nextcloud.addr.mine",
nginx.1    | 2021/03/21 12:52:55 [error] 184#184: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 1.2.3.4, server: nextcloud.addr.mine, request: "GET /favicon.ico HTTP/2.0", upstream: "http://172.17.0.6:80/favicon.ico", host: "nextcloud.addr.mine", referrer: "https://nextcloud.addr.mine/"

172.17.0.6 is the docker bridge network IP for the container, so no harm sharing that. 192.168.123.254 is my router. 1.2.3.4 is a sanitised reference to my public IP.
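
One way to narrow down a "connection refused while connecting to upstream" error is to skip the proxy and ask the upstream container directly from the Docker host. The following is a minimal sketch, not a definitive diagnostic: the function names `container_ip` and `probe_upstream` and the container name `nextcloud` are hypothetical, and it assumes the container sits on a single Docker network and that the host can reach bridge addresses (normally true on Linux).

```shell
# Print the IP address Docker assigned to a container.
# Assumes the container is attached to exactly one network.
container_ip() {
  docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

# Request "/" from the container itself and print only the HTTP status code.
# curl prints 000 when nothing accepts the connection, which is the
# situation nginx reports as "(111: Connection refused)".
probe_upstream() {
  ip=$(container_ip "$1") || return 1
  curl -s -o /dev/null -m 5 -w '%{http_code}\n' "http://${ip}:${2:-80}/"
}

# Usage (hypothetical container name, port defaults to 80):
#   probe_upstream nextcloud 80
```

If the probe fails the same way without the proxy in the path, the problem is inside the Nextcloud container rather than in nginx-proxy or the certificates.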

Now, again, with or without nginx-proxy involved, I can run an apache web server and get a test page.

Environment variable wise, I’m passing in the following:

VIRTUAL_HOST=nextcloud.addr.mine
LETSENCRYPT_HOST=nextcloud.addr.mine
TRUSTED_PROXIES=172.17.0.4 (the address of the nginx-proxy, though I’ve tried a number of others including 127.0.0.1)
VIRTUAL_PORT=8282 (testing in vain to export a non-80 port to get this to work, as I had done with httpd successfully)
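
For reference, the variables above could be passed on a single `docker run` roughly as follows. This is a sketch under assumptions, not the exact command used in this thread: the container name, the host path `/mnt/storage/nextcloud`, and the `DOCKER` and `NC_DATA` variables are hypothetical. It sets `VIRTUAL_PORT=80` on the understanding that nginx-proxy treats `VIRTUAL_PORT` as the port the service listens on inside the container, which for the Apache-based `nextcloud` image is 80 (the proxy log above also shows the upstream on port 80).

```shell
# Start Nextcloud so nginx-proxy and the Let's Encrypt companion pick it up.
# Dry run: "DOCKER=echo run_nextcloud" prints the arguments instead of
# starting a container.
run_nextcloud() {
  ${DOCKER:-docker} run -d \
    --name nextcloud \
    -e VIRTUAL_HOST=nextcloud.addr.mine \
    -e LETSENCRYPT_HOST=nextcloud.addr.mine \
    -e VIRTUAL_PORT=80 \
    -e TRUSTED_PROXIES=172.17.0.4 \
    -v "${NC_DATA:-/mnt/storage/nextcloud}:/var/www/html" \
    nextcloud:latest
}

# Usage:
#   NC_DATA=/path/to/nfs/backed/dir run_nextcloud
```

No `-p` mapping is included because the proxy reaches the container over the bridge network rather than through a published host port.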

I’m binding the data volume to a network directory on my homelab that has several terabytes of space and is redundantly backed up, so I have somewhere to upload files and share them with friends/co-workers without worrying about running out of space or about drive failure. When I start up the Docker container it populates that network directory in what appears to be a correct arrangement.

Addendum:

Because I forgot to mention: In my testing with nginx-proxy/LE-companion and httpd, I can confirm it did not work (as expected) when the nginx-proxy container was offline, and started working properly when it was brought online (again, as expected). Likewise running the basic docker command to run nextcloud without settings, I’m able to get it to work without nginx-proxy running, but as soon as I assign it to the bridge instead of the host network it won’t pass through nginx-proxy.

Feynt · March 22, 2021, 4:50am

Doing further testing, I managed to confirm that the basic docker run, with no volume changes, does work through the nginx-proxy link. I believe the issue comes down to the volume that binds to the storage drive. This is odd though, because when I clear out the directory that nextcloud would be using, the directory gets populated again as soon as I start the container. So how would it be able to do that without access to the directory?

Feynt · March 22, 2021, 9:51pm

I was able to “resolve” the issue by making a symlink directory to my storage NFS mount point at the root of my docker volumes. This is not an acceptable solution, but it is the only one I’ve found so far. If anyone else is familiar with docker volume creation and pointing at “remote” drives, please let me know.

For clarification, my storage volumes are under the purview of a file server VM which covers downloads (Transmission, etc.), and other VMs connect through it via NFS mounts if necessary (such as Plex, which is working great). All of the drives are within one workstation homelab running CentOS 8. Docker is running on the main OS image.

Feynt · March 26, 2021, 4:08am

Binding an external directory which is itself bound to the NFS mount seems to have worked after all. Apparently the reason it shows Bad Gateway is that when you’re not using the internal docker volume, the container has to copy all of the Nextcloud files out to the bound directory on first start, and over NFS that copy is very slow. Until it finishes, nothing seems to be answering on port 80, which would explain the connection refused errors from nginx. After half an hour or so of copying it becomes accessible and allows for setup. After that, setting up MariaDB was straightforward.
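
Given the slow first-start copy described above, a small polling loop can tell you when the container finally starts answering, rather than refreshing the browser for half an hour. This is a minimal sketch: `wait_until` and `nextcloud_up` are hypothetical helper names, and the address and timings in the usage line are only examples.

```shell
# Retry a command until it succeeds.
# Usage: wait_until <attempts> <delay_seconds> <command...>
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep between tries, but not after the final failed attempt.
    [ "$i" -ge "$attempts" ] || sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Succeeds once a web server at the given address answers with any HTTP
# status; fails while the connection is still refused.
nextcloud_up() {
  curl -s -o /dev/null -m 5 "http://$1:${2:-80}/"
}

# Usage: poll the container's bridge IP every 30 seconds for up to an hour.
#   wait_until 120 30 nextcloud_up 172.17.0.6 && echo "Nextcloud is answering"
# Following the container output with "docker logs -f <container>" in
# another terminal shows what it is doing in the meantime.
```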