Upgrade locking, nextcloud-init-sync.lock, "Another process is initializing..." - ephemeral vs persistent lock

I run Nextcloud in a Docker container and use watchtower to monitor for new images and auto-upgrade the container. Today at midnight it found 24.0.1.1 and attempted to upgrade from 24.0.0.12. The upgrade got stuck, so Nextcloud was down. Here’s a log snippet:

app_1            | Conf remoteip disabled.
app_1            | To activate the new configuration, you need to run:
app_1            |   service apache2 reload
app_1            | Configuring Redis as session handler
app_1            | Initializing nextcloud 24.0.1.1 ...
app_1            | Upgrading nextcloud from 24.0.0.12 ...
app_1            | Another process is initializing Nextcloud. Waiting 10 seconds...
app_1            | Another process is initializing Nextcloud. Waiting 20 seconds...
app_1            | Another process is initializing Nextcloud. Waiting 30 seconds...
app_1            | Another process is initializing Nextcloud. Waiting 40 seconds...
app_1            | Another process is initializing Nextcloud. Waiting 50 seconds...
app_1            | Another process is initializing Nextcloud. Waiting 60 seconds...
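The log suggests a polling loop whose delay grows by 10 seconds on each attempt. The Python sketch below reproduces that pattern. It is not the entrypoint’s actual code, which is a shell script. `wait_for_lock` and its `max_attempts` cap are hypothetical names, not known settings. The sketch shows why a leftover lock hangs the container: if nothing ever removes the lock, the loop can only keep waiting or give up.

```python
import time
from typing import Callable


def wait_for_lock(
    lock_is_free: Callable[[], bool],
    max_attempts: int,  # hypothetical cap on retries
    step: int = 10,  # seconds added per attempt, matching the log
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> bool:
    """Poll until the lock is free, waiting step, 2*step, 3*step, ... seconds."""
    for attempt in range(1, max_attempts + 1):
        if lock_is_free():
            return True
        wait = step * attempt
        log(f"Another process is initializing Nextcloud. Waiting {wait} seconds...")
        sleep(wait)
    return False  # gave up: the lock never went away


if __name__ == "__main__":
    slept = []
    # A stale lock: nothing ever releases it, so lock_is_free is always False.
    ok = wait_for_lock(lambda: False, max_attempts=6, sleep=slept.append)
    print(ok, sum(slept))  # False 210
```

Against a stale lock, the six attempts shown in the log already add up to 210 seconds of waiting, and no number of retries can succeed.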

I found a workaround here: [Bug]: Upgrade 23.0.3 to 23.0.4 docker Server does not migrate · Issue #1742 · nextcloud/docker · GitHub

So, yay! Manually deleting nextcloud-init-sync.lock and restarting the container worked: it was able to resume and complete the upgrade.

My question is about the lock itself. nextcloud-init-sync.lock seems like a “persistent lock”, meaning that the mere presence of that file signals that another initialization is in progress. I’m guessing the upgrade somehow stopped halfway (maybe a download timed out? I don’t know) and left the file behind.
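To make the failure mode concrete, here is a minimal Python sketch of a persistent lock. It assumes the lock is a plain file created atomically. Whether the real entrypoint uses a file or a directory, and how it creates it, is not established here. `run_upgrade` and its `killed_midway` flag are hypothetical stand-ins for an upgrade that dies partway through.

```python
import os
import tempfile


def try_acquire_marker(path: str) -> bool:
    """Persistent-style lock: the mere existence of `path` means "busy".

    O_CREAT | O_EXCL makes creation atomic, so only one caller can win.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)  # closing the descriptor does NOT release this kind of lock
    return True


def release_marker(path: str) -> None:
    """Only explicit deletion releases the lock."""
    os.remove(path)


def run_upgrade(lock_path: str, killed_midway: bool) -> str:
    if not try_acquire_marker(lock_path):
        return "busy"
    if killed_midway:
        # Models SIGKILL or a stopped container: no cleanup code gets to run,
        # so the marker file is left behind.
        return "killed"
    # ... copy files, run the upgrade ...
    release_marker(lock_path)
    return "upgraded"


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "nextcloud-init-sync.lock")
    print(run_upgrade(lock, killed_midway=True))   # killed
    print(run_upgrade(lock, killed_midway=False))  # busy (stale lock)
    release_marker(lock)                           # the manual workaround
    print(run_upgrade(lock, killed_midway=False))  # upgraded
```

The third call only succeeds after the manual deletion. That is exactly the workaround from issue #1742, and it is the weakness of tying the lock to a file’s existence rather than to a live process.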

Is it possible to use an “ephemeral lock” instead, such as a call to PHP’s flock() function? Maybe that would be a more robust way to lock during an upgrade.
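For comparison, here is a sketch of an ephemeral lock in Python. Python’s `fcntl.flock`, like PHP’s `flock()` on typical Linux builds, wraps the `flock(2)` system call. The kernel ties the lock to an open file, so it disappears when the holder closes the file or exits for any reason, including a crash. The lock file itself may stay on disk, but its presence no longer means anything. The sketch is Unix-only, and `try_flock` and `simulate_crashed_holder` are hypothetical helper names.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def try_flock(path: str):
    """Try to take an exclusive, non-blocking flock on `path`.

    Returns the open file object on success (the lock lives exactly as long
    as this file stays open) or None if someone else holds the lock.
    """
    f = open(path, "a")  # creates the file if missing; its content is irrelevant
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def simulate_crashed_holder(path: str) -> None:
    """Run a child process that takes the lock and dies without unlocking."""
    code = (
        "import fcntl, os, sys\n"
        "f = open(sys.argv[1], 'a')\n"
        "fcntl.flock(f.fileno(), fcntl.LOCK_EX)\n"
        "os._exit(1)  # crash: no unlock, no cleanup\n"
    )
    subprocess.run([sys.executable, "-c", code, path], check=False)


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "nextcloud-init-sync.lock")

    holder = try_flock(lock_path)
    print("first holder:", holder is not None)               # True
    print("second while held:", try_flock(lock_path) is not None)  # False
    holder.close()  # releasing the lock is just closing the file

    simulate_crashed_holder(lock_path)
    print("file still on disk:", os.path.exists(lock_path))  # True
    again = try_flock(lock_path)
    print("acquired after crash:", again is not None)         # True
    again.close()
```

With this design, a second container waits only while a live process actually holds the lock, so a crashed upgrade cannot wedge later restarts. Because the entrypoint is a shell script, the shell counterpart would presumably be the util-linux `flock(1)` command rather than PHP. That is a suggestion, not something verified against the image. `flock` behaviour on network filesystems can also differ, so this assumes the Nextcloud volume lives on a local filesystem.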

I searched around the nextcloud/server code a bit and couldn’t find the code responsible for creating and checking this file.

Ah, no wonder I couldn’t find it in the server code; this is done in the Docker image’s entrypoint script:
https://github.com/nextcloud/docker/blob/master/docker-entrypoint.sh

Maybe this will prevent the same failure in the future: use ephemeral instead of manual/persistent lock during upgrade · Issue #1756 · nextcloud/docker · GitHub

Note that this is still causing some pain, and there is still interest in using an ephemeral flock. Keep your eyes on this patch: