Unable to scale out nextcloud on kubernetes

Nextcloud version (eg, 20.0.5): 28.0.1
Operating system and version (eg, Ubuntu 20.04): Kubernetes helm chart version 4.5.10
Apache or nginx version (eg, Apache 2.4.25): if someone can tell me how the hell to find this I'll gladly provide it
PHP version (eg, 7.4): 8.2.13

The issue you are facing:
When I scale the nextcloud helm chart up to more than one replica, I get a whole host of issues that render my instance unusable. Most noticeably, when I browse the web UI, the page says I have absolutely no files even though I do. Scaling back down makes them all appear again.

Now, I know that sounds like I have two instances with different data directories when I scale but I assure you all replicas are mounting the same data directory via a PVC with ReadWriteMany permission.

It seems everything just locks up.

Another symptom appeared during install, when I could barely set up two-factor authentication. It took many tries, and it only went fine once I scaled down to 1 replica. With multiple replicas it just kept failing and forcing a login loop.

On occasion I got the error “too many redirects”, which I believe is where my issue lies, but I can’t figure out how.

Is this the first time you’ve seen this error? (Y/N):
y

Steps to replicate it:

  1. Install the helm chart
  2. Use a PVC that allows multiple mounts
  3. Scale out

The output of your Nextcloud log in Admin > Logging:
Frustratingly this actually won’t load and I only have one replica currently. It just says it can’t load log entries.

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):


<?php
$CONFIG = array (
  'htaccess.RewriteBase' => '/',
  'memcache.local' => '\\OC\\Memcache\\APCu',
  'apps_paths' =>
  array (
    0 =>
    array (
      'path' => '/var/www/html/apps',
      'url' => '/apps',
      'writable' => false,
    ),
    1 =>
    array (
      'path' => '/var/www/html/custom_apps',
      'url' => '/custom_apps',
      'writable' => true,
    ),
  ),
  'overwritehost' => 'my.domain.name',
  'overwriteprotocol' => 'https',
  'overwrite.cli.url' => 'https://my.domain.name',
  'filelocking.enabled' => 'true',
  'loglevel' => '2',
  'enable_previews' => true,
  'trusted_domains' =>
  array (
    0 => 'nextcloud',
    1 => 'my.domain.name',
  ),
  'trusted_proxies' =>
  array (
    0 => '10.0.0.0/8',
  ),
  'default_phone_region' => 'nz',
  'memcache.distributed' => '\\OC\\Memcache\\Redis',
  'memcache.locking' => '\\OC\\Memcache\\Redis',
  'redis' =>
  array (
    'host' => 'redis-service',
    'port' => 6379,
  ),
  'mail_smtpmode' => 'smtp',
  'mail_smtphost' => 'smtp.gmail.com',
  'mail_smtpport' => '465',
  'mail_smtpsecure' => 'ssl',
  'mail_smtpauth' => true,
  'mail_smtpauthtype' => 'LOGIN',
  'mail_smtpname' => 'email',
  'mail_smtppassword' => 'password',
  'mail_from_address' => 'nextcloud',
  'mail_domain' => 'gmail.com',
  'passwordsalt' => 'salt',
  'secret' => 'secret',
  'datadirectory' => '/var/www/html/data',
  'dbtype' => 'pgsql',
  'version' => '28.0.1.1',
  'dbname' => 'postgres',
  'dbhost' => 'postgres-service:5432',
  'dbport' => '',
  'dbtableprefix' => 'oc_',
  'dbuser' => 'oc_ncadmin',
  'dbpassword' => 'password',
  'installed' => true,
  'instanceid' => 'id',
  'twofactor_enforced' => 'true',
  'twofactor_enforced_groups' =>
  array (
  ),
  'twofactor_enforced_excluded_groups' =>
  array (
  ),
);

The output of your Apache/nginx/system log in /var/log/____:

Seems to not want to load?

Output errors in nextcloud.log in /var/www/ or as admin user in top right menu, filtering for errors. Use a pastebin service if necessary.

There are no errors shown here when this happens

I know there’s not much to go on but where do I start?

  • Enable and check your logs[1]
  • What Kubernetes implementation/platform?
  • What changes have you made from the default configuration beyond replicaCount?

[1] helm/charts/nextcloud/README.md at main · nextcloud/helm · GitHub

Logging is enabled but I’ll increase it to debug. The logs just won’t show in the web UI for some reason.
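
If the web UI refuses to show log entries, one option is to send the Nextcloud log to the container's output and read it with kubectl instead. A minimal sketch, assuming the chart's `nextcloud.configs` map (file name to file content) and the Apache image, where PHP's error log ends up on the container's stderr; the file name `logging.config.php` is a hypothetical choice:

```yaml
nextcloud:
  configs:
    # Hypothetical file name; the chart mounts each entry into the Nextcloud config directory
    logging.config.php: |-
      <?php
      $CONFIG = array (
        // Write log entries to PHP's error log instead of nextcloud.log in the data directory
        'log_type' => 'errorlog',
        // 0 = debug, the most verbose level
        'loglevel' => 0,
      );
```

With this in place the entries should show up in `kubectl logs` for each Nextcloud pod, which also makes it easier to tell which replica produced a given error. Debug level is noisy, so it is worth raising `loglevel` again once the problem is found.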

This is on k3s self hosted.

Here is my values.yaml and ingress class

    image:
      repository: nextcloud
      pullPolicy: IfNotPresent
    replicaCount: 2

    ingress:
      enabled: false

    phpClientHttpsFix:
      enabled: false
      protocol: https

    nextcloud:
      host: domain.name
      username: admin
      password: changeme
      ## Use an existing secret
      existingSecret:
        enabled: true
        secretName: nextcloud-secret
        usernameKey: adminusername
        passwordKey: adminpassword
        # tokenKey: serverinfo_token
        smtpUsernameKey: smtp_username
        smtpPasswordKey: smtp_password
        smtpHostKey: smtp_host
      update: 0
      # If web server is not binding default port, you can define it
      containerPort: 80
      datadir: /var/www/html/data
      persistence:
        subPath:
      mail:
        enabled: true
        fromAddress: nextcloud
        domain: gmail.com
        smtp:
          host: smtp.gmail.com
          secure: ssl
          port: 465
          authtype: LOGIN
          # name: user
          # password: pass
      # PHP Configuration files
      # Will be injected in /usr/local/etc/php/conf.d for apache image and in /usr/local/etc/php-fpm.d when nginx.enabled: true
      phpConfigs: {}
      # Default config files
      # IMPORTANT: Will be used only if you put extra configs, otherwise default will come from nextcloud itself
      # Default configurations can be found here: https://github.com/nextcloud/docker/tree/master/16.0/apache/config
      defaultConfigs:
        # To protect /var/www/html/config
        .htaccess: true
        # Redis default configuration
        redis.config.php: false
        # Apache configuration for rewrite urls
        apache-pretty-urls.config.php: true
        # Define APCu as local cache
        apcu.config.php: true
        # Apps directory configs
        apps.config.php: true
        # Used for auto configure database
        autoconfig.php: true
        # SMTP default configuration
        smtp.config.php: true

      configs:

        custom.config.php: |-
          <?php
          $CONFIG = array (
            'overwritehost' => 'domain.name',
            'overwriteprotocol' => 'https',
            'overwrite.cli.url' => 'https://domain.name',
            'filelocking.enabled' => 'true',
            'loglevel' => '0',
            'enable_previews' => true,
            'trusted_domains' =>
              [
                'nextcloud',
                'domain.name'
              ],
            'trusted_proxies' => ['10.0.0.0/8'],
            'default_phone_region' => 'nz',
          );

        redis.config.php: |-
          <?php
          $CONFIG = array (
            'memcache.distributed' => '\OC\Memcache\Redis',
            'memcache.locking' => '\OC\Memcache\Redis',
            'redis' => array(
              'host' => 'redis-service',
              'port' => 6379,
            )
          );


      # Extra mounts for the pods. Example shown is for connecting a legacy NFS volume
      # to NextCloud pods in Kubernetes. This can then be configured in External Storage
      extraVolumes:
       - name: shared
         nfs:
          server: storage
          path: /mnt/Volume01/Shared
      extraVolumeMounts:
       - name: shared
         mountPath: "/shared"

    nginx:
      enabled: false
    
    internalDatabase:
      enabled: false

    externalDatabase:
      enabled: true

      type: postgresql

      ## Database host
      host: postgres-service:5432

      ## Database name
      database: postgres

      ## Use an existing secret
      existingSecret:
        enabled: true
        secretName: nextcloud-secret
        usernameKey: db-username
        passwordKey: db-password

    mariadb:
      enabled: false

    postgresql:
      enabled: false

    redis:
      enabled: false

    cronjob:
      enabled: true
      resources:
        limits:
          # cpu: 50m
          memory: 100Mi
        requests:
         cpu: 50m
         memory: 100Mi

    service:
      type: ClusterIP
      port: 80

    persistence:
      # Nextcloud Data (/var/www/html)
      enabled: true
      storageClass: "rook-cephfs"
      accessMode: ReadWriteMany
      size: 8Gi
      nextcloudData:
        enabled: true
        subPath:
        annotations: {}
        storageClass: "managed-nfs-storage"
        # existingClaim:
        accessMode: ReadWriteMany
        size: 10Gi

    resources: 
      limits:
        # cpu: 500m
        memory: 500Mi
      requests:
       cpu: 500m
       memory: 500Mi

    livenessProbe:
      enabled: true
      initialDelaySeconds: 10
      periodSeconds: 20
      timeoutSeconds: 5
      failureThreshold: 3
      successThreshold: 1
    readinessProbe:
      enabled: true
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
      successThreshold: 1
    startupProbe:
      enabled: true
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 600
      successThreshold: 1

    hpa:
      enabled: false
      cputhreshold: 60
      minPods: 1
      maxPods: 10

    nodeSelector: {}

    tolerations: []

    affinity: 
      nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
      podAntiAffinity:                                 
        requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname     
          labelSelector:                               
            matchLabels:                               
              app.kubernetes.io/name: nextcloud

    metrics:
      enabled: false

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: nextcloud
  namespace: nextcloud
  annotations: 
    kubernetes.io/ingress.class: traefik-external
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`domain.name`)
      kind: Rule
      services:
        - name: nextcloud
          port: 80
      middlewares:
        - name: default-headers
          namespace: default
        - name: nextcloud-redirect
          namespace: nextcloud
        - name: nextcloud-webfinger
          namespace: nextcloud
  tls:
    secretName: tls

I imagine there will potentially be sensitive info in the logs, is there anything I will need to edit before sharing?

Aha, with debug logs I instantly see dozens of this error:

CSRF check failed","userAgent":"Mozilla/5.0 (Android 14; Mobile; rv:121.0) Gecko/121.0 Firefox/121.0","version":"28.0.1.1","exception":{"Exception":"OC\\AppFramework\\Middleware\\Security\\Exceptions\\CrossSiteRequestForgeryException","Message":"CSRF check failed","Code":412,"Trace":[{"file":"/var/www/html/lib/private/AppFramework/Middleware/MiddlewareDispatcher.php","line":96,"function":"beforeController","class":"OC\\AppFramework\\Middleware\\Security\\SecurityMiddleware","type":"->","args":[["OC\\Core\\Controller\\ContactsMenuController"],"index"]},{"file":"/var/www/html/lib/private/AppFramework/Http/Dispatcher.php","line":129,"function":"beforeController","class":"OC\\AppFramework\\Middleware\\MiddlewareDispatcher","type":"->","args":[["OC\\Core\\Controller\\ContactsMenuController"],"index"]},{"file":"/var/www/html/lib/private/AppFramework/App.php","line":184,"function":"dispatch","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[["OC\\Core\\Controller\\ContactsMenuController"],"index"]},{"file":"/var/www/html/lib/private/Route/Router.php","line":315,"function":"main","class":"OC\\AppFramework\\App","type":"::","args":["OC\\Core\\Controller\\ContactsMenuController","index",["OC\\AppFramework\\DependencyInjection\\DIContainer"],["core.contactsMenu.index"]]},{"file":"/var/www/html/lib/base.php","line":1069,"function":"match","class":"OC\\Route\\Router","type":"->","args":["/contactsmenu/contacts"]},{"file":"/var/www/html/index.php","line":39,"function":"handleRequest","class":"OC","type":"::","args":[]}],"File":"/var/www/html/lib/private/AppFramework/Middleware/Security/SecurityMiddleware.php","Line":219,"message":"CSRF check failed","exception":{},"CustomMessage":"CSRF check failed"}}

Now how do I go about fixing that?

Do I need to enforce sticky sessions on traefik? When I log out with 2 replicas and try to log back in I get a login loop, presumably because I’m bouncing between the two sessions.

I thought this was what redis was for? I don’t want to have to enforce sticky sessions because that seems a bit hacky.
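
For reference, sticky sessions in Traefik would be a small change to the IngressRoute. A minimal sketch, assuming the Traefik CRD's `sticky.cookie` option on the service entry; the cookie name `nextcloud_sticky` is hypothetical and the middlewares are left out for brevity:

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: nextcloud
  namespace: nextcloud
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`my.domain.name`)
      kind: Rule
      services:
        - name: nextcloud
          port: 80
          # Traefik sets a cookie and keeps routing that browser to the same pod
          sticky:
            cookie:
              name: nextcloud_sticky   # hypothetical cookie name
              secure: true
              httpOnly: true
  tls:
    secretName: tls
```

This only hides the symptom: each browser stays on one replica, but a pod restart or reschedule still drops the sessions that lived on it.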

Looks like you need session locking in redis enabled. How can I do this in the helm chart though?
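
A possible way to do this from values.yaml is the chart's `nextcloud.phpConfigs` map, which (per the comment in the values file) injects files into /usr/local/etc/php/conf.d for the Apache image. The sketch below assumes the phpredis extension shipped in the image and the `redis-service` host from the config above; the file name `redis-session.ini` is a hypothetical choice, and the lock values are only a starting point:

```yaml
nextcloud:
  phpConfigs:
    # Hypothetical file name; any *.ini in conf.d is read by PHP at startup
    redis-session.ini: |-
      ; Store PHP sessions in Redis so every replica sees the same session
      session.save_handler = redis
      session.save_path = "tcp://redis-service:6379"
      ; Lock a session while a request is using it
      redis.session.locking_enabled = 1
      ; -1 = keep retrying until the lock is acquired
      redis.session.lock_retries = -1
      ; Wait time between retries, in microseconds
      redis.session.lock_wait_time = 10000
```

The `memcache.distributed` and `memcache.locking` entries in config.php only cover Nextcloud's cache and file locking. Where PHP keeps its sessions is a separate PHP setting, so without something like the above each pod would likely keep sessions in its own local files, which would fit the CSRF failures and the login loop.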

@2fst4u Good day to you, friend! Thanks for your question.

I’m also trying to scale out a Nextcloud installation. The moment I run two replicas behind one HAProxy, I get issues throughout the app. For example, in the “Files” app no files are fetched. I’ve found a single HTTP 401 for PROPFIND /remote.php/dav/files/<REDACTED>/ with the following response:

<?xml version="1.0" encoding="utf-8"?>
<d:error xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns">
  <s:exception>Sabre\DAV\Exception\NotAuthenticated</s:exception>
  <s:message>CSRF check not passed.</s:message>
</d:error>

There is also continuous spam in the dev console:

Request to https://<REDACTED>/apps/files/api/v1/stats failed because of a CSRF mismatch. Fetching a new token
New request token zWMs8eiSa<REDACTED>C0IQ= fetched

My configuration:

To scale the installation I’m running two Docker hosts, each with its own Nextcloud app, Nginx and Traefik. I then configure HAProxy’s backends and frontends to use those Docker hosts.

UPD: I configured session locking in Redis, but the aforementioned issues persist :(