Personal performance enhancements and experiences

Kerasit · July 13, 2023, 8:01am

Hi Community.

The following is a personal experience, hence subjective. It is based on my own trial and error, tests, and successful optimizations of my personal Nextcloud instance. I run two different instances: one for a non-profit organization and the other as my personal family cloud. The first instance uses load balancing with HAProxy, a Galera Cluster, and Syncthing for high availability, and as such it is mission critical for that organization. It has run smoothly, without a single incident, for three years straight, and I do not dare touch a single thing.

The other instance is the one in scope for this post. It is in production, but it also serves as my sandbox. I also have a third instance for development and testing; in terms of infrastructure it is a copy of the second instance and is served by the same services.

Infrastructure
HTTPS: HAProxy → [LXC] Apache2 HTTP2
DB: HAProxy → [LXC] Database engine

Hosts
Host 1:
HAProxy (snap package)
LXC

Nextcloud Production Container
OpenLDAP (prod) Container

Host 2:
LXC

Nextcloud Dev Container
PostgreSQL Container
MariaDB Container
Jitsi Meet Container
OpenLDAP (prod) Container
OpenLDAP (dev) Container

HAProxy
SNI pre-read, so the backend is chosen from the hostname the client sends in its TLS handshake:

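frontend env_ssl_frontend
  bind *:443
  bind :::443
  # TLS passthrough: HAProxy only peeks at the ClientHello, so this stays mode tcp.
  mode tcp
  # No "http-request redirect" here: http-request rules require mode http, and
  # ssl_fc is never true because TLS is not terminated on this proxy.
  # HTTP-to-HTTPS redirects belong in a separate mode http frontend on port 80.
  option tcplog
  # Wait up to 10s for the TLS ClientHello so the SNI can be read.
  tcp-request inspect-delay 10s
  tcp-request content accept if { req.ssl_hello_type 1 }
  # Route by the hostname requested in the SNI extension.
  use_backend bk-https-nc if { req.ssl_sni -i cloud.mydomain.tld }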

TLS is not terminated on the reverse proxy; it is passed through to the Apache2 webserver in the LXC container itself.
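The frontend hands matching connections to `bk-https-nc`, a backend the post does not show. A minimal sketch of what it could look like, assuming the Nextcloud production container is reachable at the placeholder address `10.0.1.10`: since TLS is passed through rather than terminated, the backend also runs in `mode tcp`, and Apache2 in the container presents the certificate itself.

```haproxy
# Hypothetical backend for the SNI-routed HTTPS frontend.
# TLS is passed through untouched, so the backend must be mode tcp too.
backend bk-https-nc
  mode tcp
  # 10.0.1.10 is a placeholder for the Nextcloud production container.
  # "check" enables a plain TCP connect health check against Apache2.
  server nc-prod 10.0.1.10:443 check
```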

For database traffic I also use HAProxy, which gives me the freedom to move containers around. Thanks to local DNS, I can move these containers, and even entire databases, without changing anything but an IP address in the HAProxy config file. Nothing needs to change on any of the consuming services.

# PostgreSQL cluster (current).
frontend db_pgsql
  bind *:5432
  mode tcp
  # Plain TCP forwarding with no payload inspection, so no inspect-delay is needed.
  default_backend db-postgres

# Galera Cluster sandbox.
frontend db_maria
  bind *:3306
  mode tcp
  default_backend db-maria
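The two frontends hand their connections to `db-postgres` and `db-maria`. A sketch of matching backends, with placeholder addresses for the database containers on Host 2, shows the property described above: moving a database means editing one `server` line and reloading HAProxy, while the consuming services keep connecting to the same HAProxy address and port.

```haproxy
# Hypothetical backends for the database frontends.
# The 10.0.2.x addresses are placeholders for containers on Host 2.
backend db-postgres
  mode tcp
  # Moving PostgreSQL elsewhere only means changing this address.
  server pgsql-host2 10.0.2.20:5432 check

backend db-maria
  mode tcp
  server maria-host2 10.0.2.21:3306 check
```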

I use HAProxy for LDAP as well. In the backend, it load-balances between the two OpenLDAP containers, which are replicas, but prefers the one on Host 1:

frontend env_ldaps_frontend
  bind *:636
  mode tcp
  option tcplog
  tcp-request inspect-delay 10s
  tcp-request content accept if { req.ssl_hello_type 1 }
  use_backend bk-ldaps-prod if { req.ssl_sni -i ldap.mydomain.tld }
  use_backend bk-ldaps-dev if { req.ssl_sni -i ldap-dev.mydomain.tld }
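The LDAPS backends are not shown either. One way to express "load-balance between the replicas, but prefer Host 1" is weighted round-robin, sketched here with placeholder addresses. With weights 100 and 10, the Host 1 replica receives roughly ten of every eleven new connections; the Host 2 replica takes the rest, and all of them if Host 1 fails its health check. Replacing `weight 10` with the `backup` keyword would instead turn Host 2 into a pure failover.

```haproxy
# Hypothetical backends for the LDAPS frontend (TLS passthrough).
# Addresses are placeholders: 10.0.1.x on Host 1, 10.0.2.x on Host 2.
backend bk-ldaps-prod
  mode tcp
  balance roundrobin
  # Preferred replica on Host 1: about 100/110 of new connections.
  server ldap-host1 10.0.1.30:636 check weight 100
  # Replica on Host 2: about 10/110, and everything if Host 1 is down.
  server ldap-host2 10.0.2.30:636 check weight 10

backend bk-ldaps-dev
  mode tcp
  server ldap-dev-host2 10.0.2.31:636 check
```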

LXC Host notes
Host 1 has a full OpenZFS pool dedicated to the containers, and backups are done through OpenZFS snapshots.

Containers
Only the important parts:

Nextcloud Production

Apache2 HTTP2
PHP8.2-FPM
This one has plenty of dedicated resources.

Nextcloud Development

Apache2 HTTP2
PHP8.2-FPM
This one has bare minimum resources.

OpenLDAP Prod

OpenLDAP
LDAPS and StartTLS

Performance
At first I was worried that using HAProxy as the hub for all traffic would become a bottleneck. It certainly is one compared to dedicating all of Host 1 to Nextcloud and running every needed component on that same host. However, that only holds if I am willing to give one host all the resources it requires: CPU cores, memory, disk drives, etc. It also comes with the risk of a single point of failure. Besides, I use LDAP for other things, so it was already running as master on Host 2, and I use MariaDB for other projects, so that too was already running on Host 2. Despite already having MariaDB, I initially set up everything except OpenLDAP in one container.
HAProxy was already handling HTTP and HTTPS reverse proxying and load balancing for other services, so even before starting the Nextcloud project I knew I would have to reach Nextcloud through HAProxy on my public IP.

My first Nextcloud setup looked like this:

NCP (NextcloudPi) behind HAProxy
Using LDAP for user and group management
It did not run as smoothly as I had expected, and NCP preferred BTRFS as its file system, so at some point I decided to redo my Nextcloud.

My second Nextcloud setup:

Manual setup according to the Nextcloud documentation
PHP-FPM
MySQL (at that time)
Apache2
Redis (added later)
This ran much more smoothly than NCP for me. MySQL was not as stable as I would have liked and caused several issues with my Nextcloud, so I switched to MariaDB, still on the same host.
After adding Redis, it started to perform better. After adding OPcache, it ran pretty smoothly.

Then I set up my Nextcloud development instance.
I had issues with almost every Nextcloud version upgrade, because the backup step always made the upgrade stall. So I decided to set up a second instance just to play around with improving backup, upgrade, and restore.
This development instance was set up exactly like the one in production. After successfully developing, testing, and verifying my backup, upgrade, and restore routine, I had an instance to play with for other fun projects. I moved its database to my existing MariaDB container on Host 2 and immediately noticed performance improvements. Not only does Host 2 have more cores and memory, the MariaDB container is also optimized to serve MariaDB, while the Nextcloud container could now be tailored to Apache2, PHP-FPM, and Redis and use all of its available resources.
Moving the production database from the Nextcloud container itself to the dedicated MariaDB instance brought similar, noticeable improvements.

Moving to PostgreSQL
I did various optimizations, like using APCu for local memory caching and Redis for file locking, and I stopped using the Group folders app, using shared folders from a dedicated non-personal user instead. That last one alone basically cut loading times in half… o_O
The last optimization I did was setting up PostgreSQL on Host 2 and tuning both the container (using the best possible image for pgsql) and the pgsql config itself.
Switching over to pgsql improved performance further.

Learning points
Logically, serving all of these middleware components through HAProxy and the virtual network layer of LXD, instead of running everything on bare metal over localhost, should only pay off from a maintenance, high-availability, and resilience point of view. In my case, however, it has also brought better performance, simpler management (oddly enough), a solid and easy backup-restore strategy, and easy migration of services. Security-wise, all of this runs unprivileged, which for me is a huge advantage over Docker, and I can even apply dedicated security hardening to each container. That said, some services simply need to be bundled and work best when running on the same host, talking to each other over sockets.

My own personal preference for Nextcloud is based on the above journey.

PHP-fpm over mod_php
pgsql over MariaDB
LDAP over local users and groups
Do not use Groupfolders
Load balancing and reverse proxying are king
Containers over bare metal (I favour LXC over Docker, but that is a matter of personal taste)