Occasional CalDAV hangs [solved by workaround]

The Basics

  • Nextcloud Server version (e.g., 29.x.x):
    • 32.0.8
  • Operating system and version (e.g., Ubuntu 24.04):
    • Debian 13
  • Web server and version (e.g, Apache 2.4.25):
    • Apache 2.4.66-1~deb13u2
  • Reverse proxy and version (e.g., nginx 1.27.2):
    • None
  • PHP version (e.g, 8.3):
    • 8.4 (Apache2 mod_php)
  • Is this the first time you’ve seen this error? (Yes / No):
    • Seen this several times during the last few weeks
  • When did this problem seem to first start?
    • After update from Nextcloud 31 to 32
  • Installation method (e.g. AIO, NCP, Bare Metal/Archive, etc.)
    • Bare Metal/Archive
  • Are you using Cloudflare, mod_security, or similar? (Yes / No)
    • No

Summary of the issue you are facing:

When Thunderbird is subscribed to multiple Nextcloud calendars (other clients seem to be fine), Nextcloud hangs from time to time (after minutes or days). This affects the web interface and CalDAV/CardDAV/WebDAV access for everyone. New incoming requests simply start additional Apache processes, which also hang and consume a lot of CPU, until Apache’s process limit is reached.

There seems to be a deadlock somewhere.

Steps to replicate it (hint: details matter!):

  1. Create around 20 calendars in Nextcloud

  2. In Thunderbird (I’m using Debian’s 140.9ESR) subscribe to those calendars using CalDAV and set the sync interval to e.g. 5 minutes.

  3. Wait

Log entries

Apache’s server-status page shows “W” (Sending Reply) for the hanging CalDAV PROPFIND and GET requests.

Nextcloud

Nothing.

Web server / Reverse Proxy

Nothing.

Configuration

Nextcloud

The output of occ config:list system or similar is best, but, if not possible, the contents of your config.php file from /path/to/nextcloud is fine (make sure to remove any identifiable information!):

{
    "system": {
        "debug": false,
        "installed": true,
        "dbtype": "mysql",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbtableprefix": "oc_",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "forcessl": false,
        "trusted_domains": [
      **REMOVED**
        ],
        "overwrite.cli.url": "**REMOVED**",
        "default_phone_region": "DE",
        "logtimezone": "Europe\/Berlin",
        "log_query": false,
        "log_authfailip": true,
        "log_rotate_size": 10485760,
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpdebug": false,
        "mail_smtpmode": "sendmail",
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpport": "25",
        "mail_smtptimeout": 10,
        "memcache.local": "\\OC\\Memcache\\APCu",
        "filelocking.enabled": true,
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "memcache.distributed": "\\OC\\Memcache\\Redis",
        "redis": {
            "host": "***REMOVED SENSITIVE VALUE***",
            "port": 0,
            "timeout": 0
        },
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "maintenance": false,
        "apps_paths": [
            {
                "path": "\/var\/www\/nextcloud\/apps",
                "url": "\/apps",
                "writable": false
            },
            {
                "path": "\/var\/www\/nextcloud\/extra-apps",
                "url": "\/extra-apps",
                "writable": true
            }
        ],
        "enable_previews": true,
        "enabledPreviewProviders": [
            "OC\\Preview\\BMP",
            "OC\\Preview\\GIF",
            "OC\\Preview\\JPEG",
            "OC\\Preview\\Krita",
            "OC\\Preview\\MarkDown",
            "OC\\Preview\\MP3",
            "OC\\Preview\\OpenDocument",
            "OC\\Preview\\PNG",
            "OC\\Preview\\TXT",
            "OC\\Preview\\XBitmap",
            "OC\\Preview\\Font",
            "OC\\Preview\\HEIC",
            "OC\\Preview\\Illustrator",
            "OC\\Preview\\Movie",
            "OC\\Preview\\MSOffice2003",
            "OC\\Preview\\MSOffice2007",
            "OC\\Preview\\MSOfficeDoc",
            "OC\\Preview\\PDF",
            "OC\\Preview\\Photoshop",
            "OC\\Preview\\Postscript",
            "OC\\Preview\\StarOffice",
            "OC\\Preview\\SVG",
            "OC\\Preview\\TIFF"
        ],
        "preview_max_scale_factor": 10,
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "version": "32.0.8.2",
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "appstore.experimental.enabled": true,
        "updatechecker": false,
        "loglevel": 2,
        "theme": "",
        "integrity.check.disabled": false,
        "asset-pipeline.enabled": true,
        "upgrade.disable-web": "true",
        "logfile": "\/var\/log\/nextcloud\/nextcloud.log",
        "mail_sendmailmode": "smtp",
        "mysql.utf8mb4": true,
        "ldapIgnoreNamingRules": false,
        "ldapProviderFactory": "OCA\\User_LDAP\\LDAPProviderFactory",
        "simpleSignUpLink.shown": false,
        "trashbin_retention_obligation": "30, 90",
        "memories.exiftool": "\/var\/www\/nextcloud\/apps\/memories\/bin-ext\/exiftool-amd64-glibc",
        "memories.vod.path": "\/var\/www\/nextcloud\/apps\/memories\/bin-ext\/go-vod-amd64",
        "memories.vod.ffmpeg": "\/usr\/bin\/ffmpeg",
        "memories.vod.ffprobe": "\/usr\/bin\/ffprobe",
        "memories.gis_type": 1,
        "maintenance_window_start": "1",
        "allow_local_remote_servers": "1",
        "trusted_proxies": "***REMOVED SENSITIVE VALUE***"
    }
}

Apps

The output of occ app:list (if possible).

Enabled:
- activity: 5.0.0
- bookmarks: 16.1.3
- calendar: 6.2.2
- checksum: 2.0.3
- circles: 32.0.0
- cloud_federation_api: 1.16.0
- comments: 1.22.0
- contacts: 8.3.7
- contactsinteraction: 1.13.1
- dav: 1.34.2
- epubviewer: 1.9.2
- event_update_notification: 2.8.0
- federatedfilesharing: 1.22.0
- federation: 1.22.0
- files: 2.4.0
- files_3dmodelviewer: 0.0.16
- files_antivirus: 6.2.0
- files_downloadactivity: 1.18.1
- files_downloadlimit: 5.0.0
- files_external: 1.24.1
- files_fulltextsearch: 32.0.2
- files_fulltextsearch_tesseract: 32.0.0
- files_pdfviewer: 5.0.0
- files_reminders: 1.5.0
- files_sharing: 1.24.1
- files_trashbin: 1.22.0
- files_versions: 1.25.0
- firstrunwizard: 5.0.0
- forms: 5.2.5
- fulltextsearch: 32.0.0
- fulltextsearch_elasticsearch: 32.0.2
- guests: 4.6.0
- integration_github: 3.2.2
- integration_gitlab: 4.0.0
- ldap_write_support: 1.14.1
- lookup_server_connector: 1.20.0
- mail: 5.7.6
- maps: 1.6.0
- music: 3.0.0
- notes: 4.13.1
- notifications: 5.0.0
- notify_push: 1.3.1
- oauth2: 1.20.0
- photos: 5.0.0
- polls: 8.6.3
- previewgenerator: 5.13.0
- privacy: 4.0.0
- profile: 1.1.0
- provisioning_api: 1.22.0
- quota_warning: 1.23.0
- related_resources: 3.0.0
- richdocuments: 9.0.5
- serverinfo: 4.0.0
- settings: 1.15.1
- sharebymail: 1.22.0
- spreed: 22.0.11
- systemtags: 1.22.0
- tasks: 0.17.1
- text: 6.0.1
- theming: 2.7.0
- twofactor_backupcodes: 1.21.0
- user_ldap: 1.23.0
- viewer: 5.0.0
- weather_status: 1.12.0
- webhook_listeners: 1.3.0
- workflowengine: 2.14.0

Tips for increasing the likelihood of a response

  • Use the preformatted text formatting option in the editor for all log entries and configuration output.
  • If screenshots are useful, feel free to include them.
    • If possible, also include key error output in text form so it can be searched for.
  • Try to edit log output only minimally (if at all) so that it can be run through analyzers / formatters by those trying to help you.

So you can still access the server itself? When the hang happens, can you access the disk and, e.g., read a random file? Perhaps the disk goes into sleep mode and has trouble spinning up? I’d also check the other system logs. Or are you running low on memory, so that the system starts swapping or something like that?
Or is there some other pattern, e.g. certain cron jobs?

During those CalDAV hangs, the server itself works fine (except for the extreme CPU load due to Apache/Nextcloud). I don’t see any abnormal I/O activity or anything else suspicious in any logs. Reading data from the disk works perfectly, and so does accessing the MariaDB database.

As soon as I close Thunderbird and restart Apache, the server goes back to normal.

The more Nextcloud calendars are subscribed in Thunderbird, the more likely this happens.

This never happened in Nextcloud 31 or earlier.

Anything in your nextcloud.log? Can you check if at the time of the hang there’s a deadlock situation in MariaDB?

No errors or warnings in nextcloud.log.

How can I check for a deadlock situation in MariaDB?
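One way to look for this is sketched below. It assumes the `mariadb` command-line client is installed and can log in as a privileged user (on Debian, running it as root usually works via socket authentication). `SHOW FULL PROCESSLIST` lists what every connection is doing right now, and the `LATEST DETECTED DEADLOCK` section of `SHOW ENGINE INNODB STATUS` reports the most recent deadlock InnoDB resolved, if there was one. The function name is only illustrative.

```shell
#!/bin/sh
# Sketch: inspect MariaDB for stuck or deadlocked queries during a hang.
# Assumption: the mariadb client can authenticate (e.g. run as root on Debian).
show_mariadb_locks() {
    # What is every connection doing right now, and for how long?
    mariadb -e 'SHOW FULL PROCESSLIST;'
    # InnoDB's own report; look for "LATEST DETECTED DEADLOCK" and long lock waits.
    mariadb -e 'SHOW ENGINE INNODB STATUS\G'
}
```

Call `show_mariadb_locks` while the hang is in progress. If the process list only shows idle (`Sleep`) connections, the database is probably not where the requests are stuck.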

I switched Redis from socket to IP around 3 weeks ago.

Since then I haven’t had any hangs any more.

Hi @OldNobody,

Great that it’s working now, and thank you for coming back to report — most people don’t, and it really helps anyone hitting the same issue later.

As a tech person I’d love to understand exactly why the socket → IP switch fixed it, because it’s not obvious. The redis block in your config.php shows two relevant values, port: 0 and timeout: 0.

port: 0 means Nextcloud was connecting to Redis via Unix socket. timeout: 0 means no timeout — if the socket connection ever stalled, Nextcloud’s PHP process would wait forever. With 20 CalDAV subscriptions syncing concurrently in Thunderbird, any momentary Redis hiccup (brief memory pressure, RDB save, whatever) would cause multiple Apache workers to hang indefinitely, and the cascade you described follows naturally.
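If a socket connection is kept, one possible hedge against that failure mode is to give Nextcloud's Redis client explicit timeouts. This is a minimal sketch, not a tested fix for this case. It assumes the install path /var/www/nextcloud from the posted apps_paths, the web server user www-data (Debian's default), and an illustrative value of 1.5 seconds. `timeout` and `read_timeout` are the keys listed for the redis block in Nextcloud's sample configuration.

```shell
#!/bin/sh
# Sketch: set connect and read timeouts for Nextcloud's Redis connection.
# Assumptions: occ path, www-data user and the 1.5 s value; adjust to your setup.
OCC=/var/www/nextcloud/occ

set_redis_timeouts() {
    sudo -u www-data php "$OCC" config:system:set redis timeout --value=1.5 --type=double
    sudo -u www-data php "$OCC" config:system:set redis read_timeout --value=1.5 --type=double
}
```

With a finite timeout, a stalled Redis connection should surface as an error in nextcloud.log instead of a worker that waits indefinitely.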

There is also a second potential layer: PHP-level session locking via phpredis (redis.session.locking_enabled). If that was enabled on your system, a stalled socket connection during session lock acquisition would have the same effect — the worker hangs waiting for a Redis response that never comes. The grep below will tell us whether that was in play too.

TCP connections have kernel-level keepalive and much better stall detection. That’s almost certainly what made the difference.

To confirm the picture, two things would be very helpful:

1. Your Redis server config — the non-comment, non-empty lines:

grep -v '^\s*#' /etc/redis/redis.conf | grep -v '^\s*$'

2. Your active PHP Redis settings:

grep -R 'redis\.' /etc/php/8.4/ 2>/dev/null | grep -v ':\s*;' | grep -v '^\s*$'

And if you’re happy sharing it: the new redis block from your config.php (host/port/timeout after the change), to see whether Redis moved to a different host or stayed local with just the protocol change.

The reason I ask about the host: if Redis moved to a separate machine, the real cause might have been resource pressure from co-locating Redis with Apache and MariaDB rather than the socket/TCP difference itself.


ernolf

Redis, Apache and MariaDB are still on the same server.

In addition to not having experienced any hangs since switching Nextcloud from socket connection to IP, the last Nextcloud update was the first one which did not break with “Redis server went away”. Seems like Nextcloud has some trouble in general when using Redis via socket.

config.php:
'memcache.local' => '\OC\Memcache\APCu',
'filelocking.enabled' => true,
'memcache.locking' => '\OC\Memcache\Redis',
'memcache.distributed' => '\OC\Memcache\Redis',
'redis' =>
array (
  'host' => '127.0.0.1',
  'port' => '6379',
),

PHP continues to use a socket:
session.save_handler = redis
session.save_path = "unix:///var/run/redis-php-session/redis-server.sock?&database=10"
redis.session.locking_enabled = 1
redis.session.lock_retries = -1
redis.session.lock_wait_time = 10000

The Redis config is Debian’s default one and pretty long, so I do not share it here.
I simply run a separate instance (redis-server@nextcloud.service) for Nextcloud:

include /etc/redis/redis.conf
bind 127.0.0.1 ::1
port 6379
pidfile /var/run/redis-nextcloud/redis-server.pid
loglevel notice
logfile /var/log/redis/redis-server@nextcloud.log

Hi @OldNobody,

Glad to hear things have stabilised — and the detail you’ve shared does help.

  • session.save_path: typo

    unix:///var/run/redis-php-session/redis-server.sock?&database=10
    

    The ?&: in standard URL query string syntax, ? introduces the query string and & separates subsequent parameters. ?&database=10 therefore has an empty first parameter before database=10. phpredis passes the query string directly to PHP’s parse_str() (redis_session.c L659) — and parse_str("&database=10", ...) silently drops the empty leading parameter and correctly extracts database = 10. The typo is functionally harmless, but the correct form is unambiguous and costs nothing to fix.
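The parse_str() behaviour described here can be checked from a shell, assuming the php command-line binary is installed. `parse_query` is a hypothetical helper name; it prints the parsed parameters as JSON so the typo form and the corrected form can be compared directly.

```shell
#!/bin/sh
# Sketch: show how PHP's parse_str() treats a query string with a leading "&".
# parse_query is a hypothetical helper; it needs the php CLI.
parse_query() {
    php -r 'parse_str($argv[1], $out); echo json_encode($out), "\n";' -- "$1"
}

if command -v php >/dev/null 2>&1; then
    parse_query '&database=10'   # the form with the typo
    parse_query 'database=10'    # the corrected form
fi
```

Both calls should print the same single `database` entry, which is why the typo has no functional effect.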


Two independent locking mechanisms — and two independent ways to fail

Before the questions below make sense, it helps to be clear that your setup contains two completely separate Redis-based locking mechanisms. They fail independently and produce different symptoms.

  1. Nextcloud file locking (memcache.locking => Redis) uses the connection defined in config.php — in your case TCP 127.0.0.1:6379. This is Nextcloud’s own transactional locking against race conditions on file operations. When that Redis instance is unreachable or restarts, Nextcloud throws a locking exception. The error “Redis server went away” comes from here.

  2. PHP session storage and locking is controlled by three settings that must all be present and consistent:

    session.save_handler = redis
    session.save_path    = unix:///path/to/redis.sock  (or tcp://host:port)
    redis.session.locking_enabled = 1
    

    redis.session.locking_enabled only has any effect when session.save_handler = redis is active and session.save_path points to a reachable Redis instance. To understand exactly what happens when all three are in place, let’s look at the source: the lock_acquire() function in redis_session.c (L344–L408) shows that phpredis acquires a distributed lock key in Redis via SET key NX PX expiry before serving a PHP session.

    If session.save_path points to a socket that does not exist, phpredis fails fast: PS_READ_FUNC emits E_WARNING: "Redis connection not available" and returns FAILURE, which causes session_start() to fail. There is no hang.

    When a concurrent request holds the session lock, the waiting request retries at redis.session.lock_wait_time intervals. With redis.session.lock_retries = -1 (the retry condition at L397, retries >= 0 && attempt++ >= retries, evaluates false for -1), the retry loop runs until the lock key expires in Redis. Note that redis.session.lock_retries = -1 is explicitly recommended by the Nextcloud documentation to prevent session corruption, so this is not a misconfiguration.

    The lock TTL is controlled by redis.session.lock_expire (default: 0). When redis.session.lock_expire is 0, which it is unless explicitly set, phpredis falls back to PHP’s max_execution_time as the lock TTL. Only if both are zero does the lock key have no expiry at all. In a Nextcloud context this matters: the Nextcloud documentation recommends max_execution_time = 3600 for large file uploads, and virtually every Nextcloud installation follows this. With redis.session.lock_expire = 0 and max_execution_time = 3600, a request waiting for a held session lock can be blocked for up to one hour.

    There is no automatic PHP-level fallback to file-based sessions if the Redis session handler fails. phpredis does offer an opt-in partial fallback via redis.session.lock_failure_readonly = 1, which delivers a read-only session instead of a hard failure when lock acquisition fails, but this is not the default.
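The TTL fallback described in this item can be written out as a small worked example. This is a sketch of the rule as stated here, not phpredis code, and `effective_lock_ttl` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch of the lock-TTL fallback described in the text (not phpredis code).
# Arguments: redis.session.lock_expire, max_execution_time (both in seconds).
# Prints the effective TTL in seconds; 0 means the lock key never expires.
effective_lock_ttl() {
    lock_expire=$1
    max_execution_time=$2
    if [ "$lock_expire" -gt 0 ]; then
        echo "$lock_expire"
    else
        echo "$max_execution_time"
    fi
}

effective_lock_ttl 0 3600    # lock_expire unset, max_execution_time = 3600
effective_lock_ttl 30 3600   # an explicit lock_expire takes precedence
effective_lock_ttl 0 0       # both zero: the lock has no expiry
```

The first case is the one that matters for a typical Nextcloud setup: a waiting request can be held for as long as the lock key lives, which is the full max_execution_time.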

These two mechanisms are completely orthogonal and use separate Redis connections. Switching Nextcloud file locking to TCP eliminates one failure path and does not affect PHP session behaviour in any way. The questions below exist because we do not yet know whether the session socket actually exists and which Redis process serves it.


The diagnostics I asked for earlier still matter — and there are now more open questions than before

I checked the Debian bookworm package (redis-server_7.0.15-1~deb12u7); you are on Debian 13, so the packaged defaults may differ slightly. It ships with port 6379 active and the unixsocket directive commented out. The file itself states: “There is no default, so Redis will not listen on a unix socket when not specified.” In other words, out of the box, the Debian Redis listens on TCP only. That already raises a list of questions that your post leaves open:

  1. Does the socket even exist? The Debian default has the unixsocket directive commented out. If your /etc/redis/redis.conf was not modified to enable it, the file /var/run/redis-php-session/redis-server.sock does not exist — and every PHP session attempt fails with E_WARNING: "Redis connection not available" and session_start() returns FAILURE. No automatic fallback to file-based sessions occurs.

  2. If the socket does not exist: what happens to CalDAV? Looking at the phpredis source (redis_session.c L360–L408), a connection failure causes an immediate FAILURE return — no retry, regardless of redis.session.lock_retries. session_start() fails fast. Nextcloud’s CalDAV auth plugin (Auth.php) uses the PHP session to store and check a DAV_AUTHENTICATED key on every request. If session_start() fails, that check cannot run — CalDAV authentication breaks. The requests do not hang; they fail with errors. Whether those errors produce the “Redis server went away” symptom you described or a different error message is one of the things the diagnostics below would clarify.

  3. If the socket does exist: where does it come from? The Debian default has the unixsocket directive commented out at /run/redis/redis-server.sock. Your path /var/run/redis-php-session/redis-server.sock resolves to the same base (/var/run is a symlink to /run), but the subdirectory redis-php-session is not the package default redis. Your /etc/redis/redis.conf would have to have been modified to both enable the socket and set a custom subdirectory — which contradicts calling it “Debian’s default.” As a side note, FHS 3.0 recommends using /run directly rather than the legacy /var/run symlink.

  4. If /etc/redis/redis.conf was modified to port 0 + unixsocket: are both Redis instances actually running? Your redis-server@nextcloud.service includes that same file and then overrides port 6379. Redis’s initListeners() registers TCP and Unix socket as independent listeners and activates both in the same loop — so yes, a single Redis instance listens on TCP and a Unix socket simultaneously when both are configured. That means the nextcloud instance would accept connections on both TCP 6379 and the inherited socket. The default redis-server.service would also try to start — on the socket only, with port 0. Two processes, one socket file: only one can create it. The other fails silently at startup.

  5. What did the original broken setup look like, and what actually caused “Redis server went away”? Your original config.php had "port": 0. In Nextcloud’s RedisFactory.php L71, port: 0 means Unix socket, and the host value is the socket path. So Nextcloud was previously connecting to Redis via a Unix socket, not TCP. Redis itself does not restart during a Nextcloud update, so “Redis server went away” was not caused by the Nextcloud updater directly. The more likely trigger is a simultaneous apt upgrade updating the redis-server package itself, or a web server reload that drops persistent connections (your PHP runs as mod_php inside Apache, so that would be an Apache reload rather than a PHP-FPM one). With two Redis instances both configured with sockets, one for Nextcloud and one for PHP sessions, a redis-server package update restarting redis-server.service could delete and recreate the socket file, breaking existing connections from redis-server@nextcloud.service that had inherited the same socket via include. That would explain both symptoms at once: “Redis server went away” when the socket briefly disappeared, and CalDAV failures when session_start() could not reach its Redis instance. This remains a hypothesis. Confirming or ruling it out requires the outputs I asked for, specifically the actual active lines of /etc/redis/redis.conf and the systemctl status of both instances. Describing the config as “Debian’s default” while clearly having modified it does not help narrow this down.

The output of:

grep -v '^\s*#' /etc/redis/redis.conf | grep -v '^\s*$'
systemctl status redis-server
systemctl status redis-server@nextcloud

would answer questions 1–3 directly.
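Question 1 can also be checked directly on the filesystem. The sketch below uses the socket path from your session.save_path; `socket_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether a path is a Unix socket, something else, or missing.
# socket_state is a hypothetical helper name.
socket_state() {
    if [ -S "$1" ]; then
        echo "socket"
    elif [ -e "$1" ]; then
        echo "not-a-socket"
    else
        echo "missing"
    fi
}

socket_state /var/run/redis-php-session/redis-server.sock
# If this reports "socket", running `ss -xlp | grep redis` as root shows
# which process is listening on it.
```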

The PHP side is equally important. The command:

grep -R -E 'redis\.|session\.save' /etc/php/8.4/ 2>/dev/null | grep -v ':\s*;' | grep -v '^\s*$'

was intentionally written to show which file each setting lives in, not just the values. PHP configuration is spread across php.ini, conf.d drop-ins, and per-SAPI directories (apache2, cli, fpm), and a directive in the wrong file can be silently ignored.
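To make the filter's effect visible, here is the same pipeline wrapped in a function and run against a throwaway directory. `active_redis_settings` is a hypothetical helper name. On your system the argument would be /etc/php/8.4/, and with mod_php the directives that actually apply to web requests are most likely under the apache2 subdirectory (an assumption based on Debian's usual layout).

```shell
#!/bin/sh
# Sketch: list active (uncommented) Redis/session directives with their files.
# active_redis_settings is a hypothetical helper; pass the PHP config root.
active_redis_settings() {
    grep -R -E 'redis\.|session\.save' "$1" 2>/dev/null \
        | grep -v -E ':[[:space:]]*;'
}

# Demo on a temporary directory: one commented line, two active lines.
demo=$(mktemp -d)
printf '%s\n' \
    ';session.save_handler = files' \
    'session.save_handler = redis' \
    'redis.session.locking_enabled = 1' > "$demo/20-redis.ini"
active_redis_settings "$demo"
rm -rf "$demo"
```

Only the two active lines are printed, each prefixed with the file it came from. The commented line is dropped because grep -R puts `filename:` in front of every match, so the filter has to look for `:` followed by `;`.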


Epiphany: do you actually need Redis as a session handler at all?

The Nextcloud documentation is clear on the purpose: Redis session handling exists for deployments where PHP runs on multiple machines or containers simultaneously — so that all instances share the same session store. Real cases where this is actually needed: the official Nextcloud Helm chart for Kubernetes explicitly supports running multiple Nextcloud pods simultaneously (via replicaCount or Horizontal Pod Autoscaling up to 10 pods behind a load balancer), and its documentation states that Redis is required in that configuration. Docker Swarm with multiple Nextcloud containers is another equivalent scenario. In all these cases, a user’s request may land on any node, and without a shared session store, one node would not recognise a session created by another.

You stated that Redis, Apache and MariaDB all run on the same machine. As long as there is only one PHP runtime (whether bare-metal, VM, or a single container), all PHP workers (Apache with mod_php in your case) share the same filesystem and the standard file-based session handler works perfectly. Redis as a session handler adds complexity with no benefit in that situation.

The straightforward fix is to remove the session-related lines from whichever PHP config file(s) they live in — the grep output above will tell you exactly which files to edit. Specifically, remove:

session.save_handler = redis
session.save_path    = ...
redis.session.locking_enabled = ...
redis.session.lock_retries = ...
redis.session.lock_wait_time = ...

Leave extension=redis.so in place — Nextcloud still needs the Redis PHP extension for memcache.locking. Only the session-related directives go. Important: session.save_path must be removed together with session.save_handler — if session.save_handler reverts to files but session.save_path still contains a Redis URL, the file handler will try to use that URL as a filesystem path, which will fail.

files is the PHP default; Debian’s PHP packages already configure a working session.save_path (typically /var/lib/php/sessions). No explicit setting is needed, because the defaults take over once the Redis overrides are gone.
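A cautious way to do this is to comment the directives out rather than delete them, keeping a backup. The sketch below assumes GNU sed (the Debian default) and that the directives live in a drop-in file you added yourself. `disable_redis_sessions` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: comment out the Redis session directives in one PHP ini file.
# disable_redis_sessions is a hypothetical helper; a .bak copy is kept.
disable_redis_sessions() {
    sed -i.bak -E \
        's/^[[:space:]]*(session\.save_handler|session\.save_path|redis\.session\.[a-z_]+)[[:space:]]*=/;&/' \
        "$1"
}
```

Run it only on the file the grep output pointed to, not on the main php.ini, because it comments out every session.save_handler and session.save_path line it finds. With mod_php the change takes effect after `systemctl restart apache2`.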

If you do want to keep Redis for sessions despite the above, at minimum use TCP to the same instance Nextcloud already uses, and drop the Unix socket entirely:

session.save_path = "tcp://127.0.0.1:6379?database=10"

No obligation to reply — but if you do, the outputs above would let us close the remaining questions properly rather than leaving them as educated guesses.


ernolf

This is nice to read, very long, partly misleading and not especially helpful.

In case this gets read by the human responsible for this:
Please stop stealing people’s time, making my storage incredibly expensive, and accelerating global warming.

Sorry it wasn’t useful. I spent most of a day reading source code and working through your diagnostic output, hoping to find the actual root cause. I did it because I wanted to help, not for any other reason. I’m sorry it landed the way it did.

peace :folded_hands:


ernolf