Integrity checks take an extremely long time

Nextcloud version (eg, 20.0.5): 28.0.4
Operating system and version (eg, Ubuntu 20.04): Ubuntu 22.04.3 LTS/Kubernetes/Official Docker Image
Apache or nginx version (eg, Apache 2.4.25): Nginx 1.25.4
PHP version (eg, 7.4): 8.2.17

The issue you are facing:
Whenever I upgrade Nextcloud, the process completes successfully but takes a long time. For example, when upgrading from 28.0.3.2 to 28.0.4.1 today, the container went through the usual steps: Nextcloud/apps require upgrade, turning on maintenance mode, updating the database, updating a handful of apps. That takes less than a minute. Then it starts the code integrity check, which normally takes between 20 minutes and an hour. While the check runs, CPU usage on the host is minimal, but memory usage ramps up until cache data consumes the entirety of the host’s memory (roughly 40 Gi).

Once the code integrity check completes, the upgrade finishes in a matter of seconds, and everything is happy. The issue is that the code integrity check just runs forever. Once upgraded, the system is slow for the first few minutes, then behaves normally. File uploads and downloads are quick, and performance is what I’d want. This has happened on every upgrade since the server was set up roughly a year ago.

This system is running in Kubernetes using the official Helm chart, and the backend data storage is on CephFS. There’s roughly 110 GB of data stored on the system.

Is this the first time you’ve seen this error? (Y/N): N

Steps to replicate it:

  1. Perform an upgrade of Nextcloud

Upgrade log:

Thu, Apr 4 2024 6:14:45 pm Configuring Redis as session handler
2024-04-04T18:14:46.003314682-05:00 Initializing nextcloud 28.0.4.1 ...
2024-04-04T18:14:46.003334872-05:00 Upgrading nextcloud from 28.0.3.2 ...
Thu, Apr 4 2024 6:16:59 pm => Searching for scripts (*.sh) to run, located in the folder: /docker-entrypoint-hooks.d/pre-upgrade
Thu, Apr 4 2024 6:16:59 pm Nextcloud or one of the apps require upgrade - only a limited number of commands are available
2024-04-04T18:16:59.119831129-05:00 You may use your browser or the occ upgrade command to do the upgrade
Thu, Apr 4 2024 6:16:59 pm Setting log level to debug
Thu, Apr 4 2024 6:16:59 pm Turned on maintenance mode
Thu, Apr 4 2024 6:16:59 pm Updating database schema
Thu, Apr 4 2024 6:16:59 pm Updated database
Thu, Apr 4 2024 6:16:59 pm Updating <circles> ...
Thu, Apr 4 2024 6:16:59 pm Updated <circles> to 28.0.0
Thu, Apr 4 2024 6:16:59 pm Updating <support> ...
Thu, Apr 4 2024 6:16:59 pm Updated <support> to 1.11.1
Thu, Apr 4 2024 6:17:03 pm Starting code integrity check...
Thu, Apr 4 2024 6:35:37 pm Finished code integrity check
Thu, Apr 4 2024 6:35:37 pm Update successful
Thu, Apr 4 2024 6:35:37 pm Turned off maintenance mode
Thu, Apr 4 2024 6:35:37 pm Resetting log level
Thu, Apr 4 2024 6:36:08 pm The following apps have been disabled:
Thu, Apr 4 2024 6:36:08 pm  circles
2024-04-04T18:36:08.685762462-05:00  support
Thu, Apr 4 2024 6:36:08 pm => Searching for scripts (*.sh) to run, located in the folder: /docker-entrypoint-hooks.d/post-upgrade
Thu, Apr 4 2024 6:36:08 pm Initializing finished
Thu, Apr 4 2024 6:36:08 pm => Searching for scripts (*.sh) to run, located in the folder: /docker-entrypoint-hooks.d/before-starting
Thu, Apr 4 2024 6:36:08 pm [04-Apr-2024 23:36:08] NOTICE: fpm is running, pid 1
Thu, Apr 4 2024 6:36:08 pm [04-Apr-2024 23:36:08] NOTICE: ready to handle connections
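
To put a number on it, the timestamps in the log above can be subtracted directly. This is only a small sketch in Python: the four timestamps are copied from the "Turned on maintenance mode", "Starting code integrity check...", "Finished code integrity check", and "Turned off maintenance mode" lines, and the variable names are made up for illustration.

```python
from datetime import datetime

# Format of the human-readable timestamps in the upgrade log above.
FMT = "%a, %b %d %Y %I:%M:%S %p"


def parse(stamp):
    """Parse one log timestamp (local time) into a datetime."""
    return datetime.strptime(stamp, FMT)


# Values copied from the log; nothing here is measured independently.
maintenance_on = parse("Thu, Apr 4 2024 6:16:59 pm")
check_started = parse("Thu, Apr 4 2024 6:17:03 pm")
check_finished = parse("Thu, Apr 4 2024 6:35:37 pm")
maintenance_off = parse("Thu, Apr 4 2024 6:35:37 pm")

check_duration = check_finished - check_started
maintenance_window = maintenance_off - maintenance_on

print("integrity check:   ", check_duration)        # 0:18:34
print("maintenance window:", maintenance_window)    # 0:18:38
print("everything else:   ", maintenance_window - check_duration)  # 0:00:04
```

So for this run the integrity check accounts for all but about 4 seconds of the time spent in maintenance mode.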

Is it similarly slow (or really fast) when you run an integrity check against the core only? You can do so via occ integrity:check-core.

My gut tells me this is the app side of the integrity checks hitting your situation for some reason, not the core checks. Still a problem, but determining this might give us a better clue as to where to look to figure out what is going on.

If my hunch is wrong, that would be useful to learn as well so that we can explore other possible culprits.

Btw, do the integrity checks ever fail?

P.S. If it is faster against core, please share the output of occ app:list.
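
If it helps, the comparison could be scripted roughly like this. It is a sketch under assumptions, not a tested procedure: it assumes it runs from the Nextcloud root inside the container as the web server user, that occ is invoked as "php occ", and that occ app:list --output=json returns an object with an "enabled" mapping of app id to version. The function names time_command and time_integrity_checks are made up for illustration.

```python
import json
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds)."""
    started = time.monotonic()
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, time.monotonic() - started


def time_integrity_checks(occ=("php", "occ")):
    """Time the core check and each enabled app's check, slowest first.

    Assumption: `occ` is how occ is invoked in this environment and the
    current directory is the Nextcloud root.
    """
    occ = list(occ)
    timings = {"core": time_command(occ + ["integrity:check-core"])[1]}

    # Assumption: the JSON output has an "enabled" object keyed by app id.
    listing = subprocess.run(
        occ + ["app:list", "--output=json"],
        capture_output=True, text=True, check=True,
    )
    for app_id in json.loads(listing.stdout)["enabled"]:
        timings[app_id] = time_command(occ + ["integrity:check-app", app_id])[1]

    return sorted(timings.items(), key=lambda item: item[1], reverse=True)
```

Calling time_integrity_checks() and printing the result should show whether core or one particular app dominates the runtime. An app that fails its check will still be timed, since only the elapsed time is kept here.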