OCC scan takes forever

We have migrated 60 TB of data from an overly complicated Windows Server setup to Nextcloud, to be used by about 30 users.

The setup is on TrueNAS SCALE on a server that should handle it. Redis is up and running and everything is hunky dory except occ scanning.

I used this command the first time, when I scanned 9 TB of files, and it didn't take too long (from the shell in Apps):
su
su -m www-data -c "php /var/www/html/occ files:scan --all"

After this I added some more group folders; the second group folder has another 9 TB of files, and I used

su -m www-data -c "php /var/www/html/occ groupfolders:scan -v --all"
where I got the dreaded "scanner is locked" error, which is why I now have Redis.
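
(To double-check that Redis is really being used for file locking, the memcache settings can be read back via occ, for example:)

su -m www-data -c "php /var/www/html/occ config:system:get memcache.locking"
su -m www-data -c "php /var/www/html/occ config:system:get memcache.distributed"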

And I have used su -m www-data -c "php /var/www/html/occ groupfolders:scan -v 8"
where 8 is the __groupfolders/8/ path.
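
For reference, the numeric id can be cross-checked against the list of group folders (assuming your groupfolders version ships the list command):

su -m www-data -c "php /var/www/html/occ groupfolders:list"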

Right now the scanner starts out fast but then slows down to 1-2 files per second, and with a few million files this will take forever.

Is there a way to troubleshoot this, or a way to make it faster? I had the full text search app enabled, could this be a reason? Or is it the type of files?

I have used -v to see if I get a file lock error.

I still have 50 TB of data to sync into the system.

I am running Nextcloud 27.1.3.

(screenshot: scan time)

regards
Tomas

Instead of asking this question here, you could simply try whether deactivating the app for the duration of the scan results in faster processing. It might well help.
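
For example, something along these lines (the app id here is just an example, check app:list for the real one on your instance):

# find the exact app id first
su -m www-data -c "php /var/www/html/occ app:list"
# disable it for the duration of the scan
su -m www-data -c "php /var/www/html/occ app:disable fulltextsearch"
# ... run the scan, then re-enable afterwards
su -m www-data -c "php /var/www/html/occ app:enable fulltextsearch"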

A groupfolders:scan is a load-intensive action that, in addition to occ itself, puts load on the databases (MariaDB and Redis).

You should use standard tools like top, htop, iostat, free, etc. to observe the scan in real time and determine whether the resources are sufficient and, above all, whether MariaDB has enough memory available to work with.
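
For example:

# overall CPU and memory usage per process, refreshed live
htop
# disk utilisation and I/O wait, every 5 seconds
iostat -x 5
# free and cached memory
free -h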

Is your kernel properly tuned to process enough simultaneous asynchronous I/O requests?

Fixing sysctl configuration (/etc/sysctl.conf):

It is a system-wide setting: Linux FS kernel settings

You can check its values via:

$ cat /proc/sys/fs/aio-*
65536
2305

For example, to set the aio-max-nr value, add the following line to the /etc/sysctl.conf file:

fs.aio-max-nr = 1048576

To activate the new setting:

$ sysctl -p /etc/sysctl.conf
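
You can then verify the new limit and compare it with the number of requests currently in flight; the output should look something like this:

$ sysctl fs.aio-max-nr
fs.aio-max-nr = 1048576
$ cat /proc/sys/fs/aio-nr
2305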

There are a number of other possible bottlenecks that can be responsible for the slowdown.
Observe, find the cause.

Much luck,
ernolf

Thank you. I had already disabled Elasticsearch, but you have led me onto one path that might give me an answer.

The server is maybe not the strongest and that might be the issue.

5 x MIRROR | 2 wide | 16.37 TiB, 128 GiB, Intel(R) Atom™ CPU C3758 @ 2.20GHz, 8 cores, TrueNAS-SCALE-22.12.3.3

I started the occ scan from the shell (it's running in a TrueNAS SCALE pod), and just typing top from TrueNAS SCALE's shell I can see that the Postgres server (pod) maxes out when the scanner hits certain files. It will scan a few thousand files in some folders, then stop, and then continue.
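
One thing I plan to check while it stalls is what Postgres is actually busy with, e.g. via pg_stat_activity from inside the database pod (database and user names below are just placeholders):

psql -U nextcloud -d nextcloud -c "SELECT pid, state, wait_event_type, wait_event, left(query, 60) FROM pg_stat_activity WHERE state <> 'idle';"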

So this seems to be an issue with Postgres in the pod. The Postgres pod log is full of errors like these:

2023-11-04T02:53:19.369295109Z 2023-11-04 02:53:19.369 UTC [109616] ERROR:  duplicate key value violates unique constraint "oc_filecache_extended_pkey"
2023-11-04T02:53:19.369387859Z 2023-11-04 02:53:19.369 UTC [109616] DETAIL:  Key (fileid)=(4664648) already exists.
2023-11-04T02:53:19.369412642Z 2023-11-04 02:53:19.369 UTC [109616] STATEMENT:  INSERT INTO "oc_filecache_extended" ("fileid", "upload_time") VALUES($1, $2)
2023-11-04T02:53:19.480384048Z 2023-11-04 02:53:19.480 UTC [175394] ERROR:  duplicate key value violates unique constraint "gf_versions_uniq_index"
2023-11-04T02:53:19.480490312Z 2023-11-04 02:53:19.480 UTC [175394] DETAIL:  Key (file_id, "timestamp")=(4666021, 1544559530) already exists.
2023-11-04T02:53:19.480515150Z 2023-11-04 02:53:19.480 UTC [175394] STATEMENT:  UPDATE "oc_group_folders_versions" SET "timestamp" = $1 WHERE "id" = $2
2023-11-04T02:53:19.509575490Z 2023-11-04 02:53:19.509 UTC [157972] ERROR:  duplicate key value violates unique constraint "gf_versions_uniq_index"
2023-11-04T02:53:19.509656755Z 2023-11-04 02:53:19.509 UTC [157972] DETAIL:  Key (file_id, "timestamp")=(4667617, 1550841542) already exists.
2023-11-04T02:53:19.509681731Z 2023-11-04 02:53:19.509 UTC [157972] STATEMENT:  UPDATE "oc_group_folders_versions" SET "timestamp" = $1 WHERE "id" = $2
2023-11-04T02:53:31.079134628Z 2023-11-04 02:53:31.078 UTC [150162] ERROR:  duplicate key value violates unique constraint "gf_versions_uniq_index"
2023-11-04T02:53:31.079354823Z 2023-11-04 02:53:31.078 UTC [150162] DETAIL:  Key (file_id, "timestamp")=(4667699, 1550841462) already exists.
2023-11-04T02:53:31.079381252Z 2023-11-04 02:53:31.078 UTC [150162] STATEMENT:  UPDATE "oc_group_folders_versions" SET "timestamp" = $1 WHERE "id" = $2
2023-11-04T02:53:31.214466832Z 2023-11-04 02:53:31.214 UTC [117638] ERROR:  duplicate key value violates unique constraint "gf_versions_uniq_index"
2023-11-04T02:53:31.214527424Z 2023-11-04 02:53:31.214 UTC [117638] DETAIL:  Key (file_id, "timestamp")=(4664566, 1699066387) already exists.
2023-11-04T02:53:31.214551760Z 2023-11-04 02:53:31.214 UTC [117638] STATEMENT:  INSERT INTO "oc_group_folders_versions" ("file_id", "timestamp", "size", "mimetype", "metadata") VALUES($1, $2, $3, $4, $5)
2023-11-04T02:53:36.997713232Z 2023-11-04 02:53:36.997 UTC [150365] ERROR:  duplicate key value violates unique constraint "gf_versions_uniq_index"
2023-11-04T02:53:36.997912223Z 2023-11-04 02:53:36.997 UTC [150365] DETAIL:  Key (file_id, "timestamp")=(4667704, 1550841509) already exists.
2023-11-04T02:53:36.997937940Z 2023-11-04 02:53:36.997 UTC [150365] STATEMENT:  UPDATE "oc_group_folders_versions" SET "timestamp" = $1 WHERE "id" = $2
2023-11-04T02:53:37.019759857Z 2023-11-04 02:53:37.019 UTC [111196] ERROR:  duplicate key value violates unique constraint "gf_versions_uniq_index"
2023-11-04T02:53:37.019833049Z 2023-11-04 02:53:37.019 UTC [111196] DETAIL:  Key (file_id, "timestamp")=(4664531, 1532905459) already exists.
2023-11-04T02:53:37.019857494Z 2023-11-04 02:53:37.019 UTC [111196] STATEMENT:  UPDATE "oc_group_folders_versions" SET "timestamp" = $1 WHERE "id" = $2
2023-11-04T02:53:37.065251719Z 2023-11-04 02:53:37.065 UTC [112413] ERROR:  duplicate key value violates unique constraint "gf_versions_uniq_index"
2023-11-04T02:53:37.065333512Z 2023-11-04 02:53:37.065 UTC [112413] DETAIL:  Key (file_id, "timestamp")=(4664579, 1548877199) already exists.
2023-11-04T02:53:37.065358007Z 2023-11-04 02:53:37.065 UTC [112413] STATEMENT:  UPDATE "oc_group_folders_versions" SET "timestamp" = $1 WHERE "id" = $2
2023-11-04T02:53:37.082539951Z 2023-11-04 02:53:37.082 UTC [111196] ERROR:  duplicate key value violates unique constraint "oc_filecache_extended_pkey"
2023-11-04T02:53:37.082618152Z 2023-11-04 02:53:37.082 UTC [111196] DETAIL:  Key (fileid)=(4664596) already exists.
2023-11-04T02:53:37.082643534Z 2023-11-04 02:53:37.082 UTC [111196] STATEMENT:  INSERT INTO "oc_filecache_extended" ("fileid", "upload_time") VALUES($1, $2)
2023-11-04T02:53:37.116418369Z 2023-11-04 02:53:37.116 UTC [112413] ERROR:  duplicate key value violates unique constraint "oc_filecache_extended_pkey"
2023-11-04T02:53:37.116502861Z 2023-11-04 02:53:37.116 UTC [112413] DETAIL:  Key (fileid)=(4664577) already exists.
2023-11-04T02:53:37.116527689Z 2023-11-04 02:53:37.116 UTC [112413] STATEMENT:  INSERT INTO "oc_filecache_extended" ("fileid", "upload_time") VALUES($1, $2)
2023-11-04T02:53:37.121831558Z 2023-11-04 02:53:37.121 UTC [158421] ERROR:  duplicate key value violates unique constraint "gf_versions_uniq_index"
2023-11-04T02:53:37.121901125Z 2023-11-04 02:53:37.121 UTC [158421] DETAIL:  Key (file_id, "timestamp")=(4667721, 1699066387) already exists.
2023-11-04T02:53:37.121925670Z 2023-11-04 02:53:37.121 UTC [158421] STATEMENT:  INSERT INTO "oc_group_folders_versions" ("file_id", "timestamp", "size", "mimetype", "metadata") VALUES($1, $2, $3, $4, $5)
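
To see what these duplicates actually are, the conflicting rows can be looked up directly; the table names, columns and ids below are taken from the log above, the connection details are placeholders:

psql -U nextcloud -d nextcloud -c "SELECT * FROM oc_filecache_extended WHERE fileid = 4664648;"
psql -U nextcloud -d nextcloud -c "SELECT * FROM oc_group_folders_versions WHERE file_id = 4664566 AND \"timestamp\" = 1699066387;"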

It's strange, because the first 9 TB I added went really fast, so I am wondering if there are some file names, file types or folder names (just throwing thoughts out there) that might cause this.

I will continue to look, and if I find a solution I will post it here. I see more people than me have this problem (slow occ file scan).

regards
Tomas