Nextcloud stale oc_filecache records and client errors

Greetings,

I am having some issues with general client (desktop and web) usage. None of the issues can be replicated on demand, and they are very unusual in my experience with both Nextcloud and ownCloud.

Because of these errors I have to clear specific (and always different) oc_filecache entries at least once a week. These entries generate errors on the Nextcloud client and lead to unexpected behavior when using the Nextcloud web interface for the same files. I usually avoid touching the database of web applications that I have not written myself, but these index/cosmetic errors are too much of an eyesore.

The most frequent errors I have gotten on the client are:

  • The file has been deleted from the server (translated from Portuguese)
  • Not permitted, because you do not have permission to add files to this folder (translated from Portuguese)

analysis notes

  • I can conclude that oc_filecache records are not being removed/updated after some files are moved/renamed/deleted. I have found that both the old and new filename exist on oc_filecache after a rename
  • I only remove records that show up as an error in the client and have no corresponding file on the server's filesystem. I have done ~10 removals spread over 3 weeks, maybe a total of 20 records.
  • after removing the bad record, the client shows a green checkmark as it should
  • running the ‘occ’ files:scan and files:cleanup tools has produced no results. After analyzing the code, I understood that these will only create records
    • sometimes, the files:scan tool will add a new record for the final name of a renamed file. In this case, the old record is kept
  • files were never renamed manually on the CLI of the server
  • when this happens for a folder, accessing the folder in the web interface sends you back to the default webpage (owncloud.domain.tld)
  • when this happens for a file, accessing the file in the web interface, a 404 is returned with information: remote address and request ID. I have not been able to ma
  • I have searched extensively for this issue and the most similar results are the following:
    https://github.com/nextcloud/server/issues/4786
    File was deleted from server, but still visible on web-ui (also for desktop-client)
    OCC files:cleanup - Does it delete the db table entries of the missing files?
    Fastest way to remove large number of filecache entries
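
The check described in the analysis notes (a cache record with no corresponding file on the server) can be sketched roughly as follows. This is only an illustration, assuming the paths of one user's home storage have already been exported from oc_filecache (for example with psql); the function name `find_stale_entries` and the sample paths are hypothetical, and the sketch only reports candidates without touching the database.

```python
import os
import tempfile


def find_stale_entries(cache_paths, user_root):
    """Return cache paths that have no matching file or folder under user_root.

    cache_paths: iterable of relative paths as stored for a home storage,
                 e.g. 'files/report.odt' (assumed export from oc_filecache).
    user_root:   the user's directory inside the data directory.
    """
    stale = []
    for rel_path in cache_paths:
        if not rel_path:
            continue  # the empty path is the storage root itself
        if not os.path.lexists(os.path.join(user_root, rel_path)):
            stale.append(rel_path)
    return sorted(stale)


if __name__ == "__main__":
    # Build a throwaway tree that mimics a rename the cache never noticed.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "files"))
        open(os.path.join(root, "files", "new-name.txt"), "w").close()
        exported = ["", "files", "files/old-name.txt", "files/new-name.txt"]
        print(find_stale_entries(exported, root))  # ['files/old-name.txt']
```

Reviewing the reported list by hand before deleting anything keeps this in line with the cautious approach described above.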

Unfortunately, of all the logs (webserver, nginx, Nextcloud, audit and client), the most useful have been those of the client. The others are barren at the times the issue happens. The audit log shows similar information to oc_activity about the troublesome renames and removals, and it always looks nominal compared to other instances.

Nextcloud Server version (eg, 18.0.2): 18.0.2
Nextcloud Desktop version (eg, 18.0.2): 2.4.3
Operating system and version (eg, Ubuntu 20.04): OpenBSD 6.6
Apache or nginx version (eg, Apache 2.4.25): nginx 1.16.1
PHP version (eg, 7.1): php 7.3.16
RDBMS: postgresql 11.7
Storage: local filesystem storage, no additional features
Additional features: LDAP integration for user management and authentication only.

The issue you are facing:
I am facing various client issues with a slightly larger server that I am running. See the top of this post.

Is this the first time you’ve seen this error? (Y/N):
Yes, I started seeing these errors after migrating all the way from ownCloud 9. The first instances of the error only showed up after 3 weeks to a month. Over 2 months after the upgrade, the errors keep coming up. Other instances do not have this error.

Steps to replicate it:
Cannot replicate on demand. Issue is always related to moving or renaming files.

The output of your Nextcloud log in Admin > Logging:
No records at the time of the rename events.

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):

<?php
$CONFIG = array (
  'integrity.check.disabled' => true,
  'instanceid' => 'INSTANCEID',
  'passwordsalt' => 'SALT',
  'secret' => 'SECRET',
  'trusted_domains' =>
  array (
    0 => 'owncloud.domain.TLD',
    1 => 'nextcloud.domain.tld',
    2 => 'cloud.domain.tld',
  ),
  'datadirectory' => ((php_sapi_name () == 'cli')? '/var/www': '') . '/nextcloud/data',
  'overwrite.cli.url' => 'https://owncloud.domain.tld',
  'version' => '18.0.2.2',
  'dbtype' => 'pgsql',
  'dbname' => 'owncloud',
  'dbhost' => '127.0.0.1:5432',
  'dbtableprefix' => 'oc_',
  'dbuser' => 'DBUSER',
  'dbpassword' => 'DBPASSWORD',
  'logtimezone' => 'UTC',
  'installed' => true,
  'loglevel' => 1,
  'ldapIgnoreNamingRules' => false,
  'memcache.local' => '\\OC\\Memcache\\Redis',
  'filelocking.enabled' => 'true',
  'memcache.distributed' => '\\OC\\Memcache\\Redis',
  'memcache.locking' => '\\OC\\Memcache\\Redis',
  'redis' =>
  array (
    'host' => 'localhost',
    'port' => 6379,
    'timeout' => 0,
    'password' => '',
    'dbindex' => 0,
  ),
  'mail_from_address' => 'FROMADDRESS',
  'mail_smtpmode' => 'sendmail',
  'mail_domain' => 'domain.tld',
  'ldapProviderFactory' => '\\OCA\\User_LDAP\\LDAPProviderFactory',
  'maintenance' => false,
  'default_locale' => 'pt',
  'lost_password_link' => 'disabled',
  'mail_sendmailmode' => 'pipe',
);

The output of your Apache/nginx/system log in /var/log/____:
No relevant logs in either HTTP, HTTPS or error logs. While cross-referencing other logs, I can see there are no timeouts.

The output of your audit.log:
Only records showing that the file was in fact moved or deleted.

Conclusion
I bet there is more info I could include in this post, but I cannot see what else might be relevant right now. This is not a new instance, as I have shown, so I can include some details about its past.

The only paths I see towards a solution are:

  • reinstall the server. Not sure how this would help. My resource constraints make it more acceptable to keep fixing these errors as they come up
  • nuke and rebuild oc_filecache. Not sure how this would really help in the long run as some of the emerging problems seem to be about files being renamed
  • reinstall desktop clients. I fail to see how this would avoid these issues in oc_filecache, but it is my best bet considering that we upgraded client in-place (keeping files as documented elsewhere on these forums) between owncloud and nextcloud. This is mostly FUD speaking.

Regards,
mm

Any help at all would be appreciated. I find myself reinstalling Nextcloud clients or clearing oc_filecache records very often on this particular instance. At this point I have developed several procedures for dealing with this user-made spontaneous error generation.

This is essentially what is happening:

Database encoding perhaps? Tried changing from SQL_ASCII to UTF8. The filesystem is UTF8. Found some limitations in the import/export between SQL_ASCII and UTF8 of the database. The only issues were on the oc_filecache table, which I was meaning to clear ever since the 404 issues started. Afterwards, I rebuilt the table with the following command and everything seems (:>) to be in its place.
occ files:scan --all

I see that some desktop clients are being more communicative, but the visibility of this is not great on the server-side.

More on these encoding issues. After a short analysis, it seems that:

  • SQL_ASCII actually means no encoding, which leads me to believe that the database has been piling up multiple encodings.
  • the import into the UTF8 database failed. All share-related information disappeared. Users lost access to shares that they had access to previously

The most apparent error comes from the only error message during importing of the database:
ERROR: invalid byte sequence for encoding "UTF8": 0xa3
CONTEXT: COPY oc_filecache, line 26646
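
Since the import stops at the first bad line of a table, it can help to list every offending line of the dump before trying again. A minimal sketch, assuming a plain-text pg_dump file read in binary mode; the function name `find_invalid_utf8_lines` and the sample bytes are illustrative only.

```python
def find_invalid_utf8_lines(lines):
    """Yield (line_number, byte_offset, offending_byte) for lines that are not valid UTF-8.

    lines: an iterable of bytes objects, e.g. a file opened in binary mode.
    Only the first invalid byte of each line is reported.
    """
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            yield number, err.start, raw[err.start]


if __name__ == "__main__":
    sample = [
        "files/otimização\n".encode("utf-8"),        # valid UTF-8
        b"files/otimiza\xc3\xa7\xc3\xa3o2\xa3o\n",   # stray 0xa3 byte
    ]
    for number, offset, byte in find_invalid_utf8_lines(sample):
        print(f"line {number}, offset {offset}: 0x{byte:02x}")
```

For a real dump, a file object such as `open("owncloud-0420.sql", "rb")` can be passed in directly, because iterating over a binary file yields one bytes object per line.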

Thus, I do not see a solution to the above issue in how the database is exported and imported using only the PostgreSQL CLI tools. Another solution is needed.

The solutions I see at this point are:

  • reinstall a server backed by a UTF8 database. This is very unlikely due to:
    • burden of recreating shares
    • burden of migrating clients and making sure there is no data loss in the process
    • Unplanned work like managing limited desktop disk space
    • loss of all types of historic data
  • partial reinstall by recreating all user accounts, files and shares only
    • I do not see many advantages to this as compared to the previous option
  • using one of the encoding conversion tools out there. Both have documented limitations.
    • iconv. There are a few variations of implementation of this CLI program
      • all of my attempts produced broken filenames ‘ção’ became ‘ção’
    • perl’s Encoding::FixLatin
      • this one seems (:>) to migrate almost everything correctly
      • there are about 500 records that have a similar issue between them. The issue seems to be a bonus string at the suffix of a directory (no more than 20 directories). This might have been a bad client so I am going to disregard or rebuild cache for these. The string is:
        • before converting: ‘otimização2<A3>o’
        • after converting: ‘otimização2£o’
      • do not know if it is important or not, but, before importing the converted dump, I changed the ‘SET client_encoding’ value to ‘UTF8’, which was set to SQL_ASCII
    • recode
      • all of my attempts produced broken filenames ‘ção’ became ‘ção’
    • will the Nextcloud desktop clients even agree with this or will they argue?
    • at this point, between all upgrades, this database has been manipulated by 2 versions of Nextcloud and at least one version of ownCloud
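
One plausible reading of the results above: a whole-file converter treats every byte as the source encoding, so text that is already UTF-8 gets encoded a second time, whereas a mixed-encoding fixer leaves valid UTF-8 sequences alone and only reinterprets the stray bytes. The sketch below illustrates that idea. It is not the Encoding::FixLatin implementation, the function name `repair_mixed` is hypothetical, and Windows-1252 as the fallback encoding is an assumption.

```python
def repair_mixed(raw, fallback="cp1252"):
    """Decode bytes that mix valid UTF-8 with stray single-byte characters.

    Valid UTF-8 sequences are kept as they are; any byte that cannot be part
    of one is decoded with the fallback encoding instead.
    """
    out = []
    pos = 0
    while pos < len(raw):
        try:
            out.append(raw[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice being decoded.
            out.append(raw[pos:pos + err.start].decode("utf-8"))
            bad = raw[pos + err.start:pos + err.end]
            out.append(bad.decode(fallback, errors="replace"))
            pos += err.end
    return "".join(out)


if __name__ == "__main__":
    name = "ção"
    # Converting text that is already UTF-8 as if it were Windows-1252
    # encodes it twice and garbles it.
    print(name.encode("utf-8").decode("cp1252"))  # Ã§Ã£o
    # Mixed input: valid UTF-8 plus one stray 0xa3 byte.
    print(repair_mixed(b"otimiza\xc3\xa7\xc3\xa3o2\xa3o"))  # otimização2£o
```

The second output matches the ‘£o’ suffix seen after the fix_latin conversion: a lone 0xa3 byte is not valid UTF-8 on its own, and in Windows-1252 or Latin-1 it is the pound sign.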

Thus, of the possible solutions, the only one that is not insurmountable is the one that involves converting the encoding. However, currently the task of validating that the encoding has been converted appropriately seems insurmountable.

I see the following alternatives for validating the conversion of the encoding:

  • accept the potential data-loss
  • it is impossible to test all characters currently in use in filenames, because there is a mix of languages being used
    • can test a good-enough collection of characters from the Desktop client and compare pre-conversion and post-conversion
  • creating a conversion procedure that includes:
    • integration testing
      • end-to-end validation of pre-conversion and post-conversion status of Desktop clients
        • many things to watch for: errors, missing shares, logs, if there is any re-synchronization of files
        • there may be existing errors (those in the original post) in the Desktop client and I cannot even imagine how the application will react, since I cannot replicate these
      • end-to-end validation of pre-conversion and post-conversion status of a user’s Nextcloud webpage
        • activity history
    • file statistics
      • maybe checksum-based statistics like the client does?
        • pointless on the server-side
      • file counts per user and share on a client configured pre-conversion and separately post-conversion
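
The file-count comparison from the last bullet could look roughly like this: count regular files under each top-level folder of a synchronized client directory before and after the conversion, then report the folders whose counts differ. The function names and the two-snapshot approach are assumptions made for illustration.

```python
import os
import tempfile
from collections import Counter


def count_files_per_top_folder(sync_root):
    """Count regular files under each top-level entry of sync_root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(sync_root):
        rel = os.path.relpath(dirpath, sync_root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return counts


def changed_folders(before, after):
    """Return {folder: (count_before, count_after)} where the counts differ."""
    folders = set(before) | set(after)
    return {
        name: (before.get(name, 0), after.get(name, 0))
        for name in sorted(folders)
        if before.get(name, 0) != after.get(name, 0)
    }


if __name__ == "__main__":
    # Two throwaway trees stand in for a client folder before and after.
    with tempfile.TemporaryDirectory() as pre, tempfile.TemporaryDirectory() as post:
        for root, names in ((pre, ["a.txt", "b.txt"]), (post, ["a.txt"])):
            os.makedirs(os.path.join(root, "shared"))
            for name in names:
                open(os.path.join(root, "shared", name), "w").close()
        before = count_files_per_top_folder(pre)
        after = count_files_per_top_folder(post)
        print(changed_folders(before, after))  # {'shared': (2, 1)}
```

Counts alone will not catch a file that was replaced by another with a different name, so this is a coarse first check rather than a full validation.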

I would appreciate any feedback. At this point even ‘installgento’ is valid. It does not have FixLatin on the repository, but there is always CPAN.

Edit: another possibility that allowed this mess might have been an upgrade from the Owncloud to the Nextcloud client. However, some Nextcloud-only users were also faced with these issues.

Edit2: I have validated all of the topics under ‘creating a conversion procedure’ with the exception of checksum-based checks, because the other values/behaviors are within what was expected. fix_latin seems (:>) like the way to go. At least things are easier because none of the tables containing binary/bytea types have any data.

I am replying to myself again on this post, probably for the final update and to mark this issue as solved.

I ended up going ahead with the migration and a testing environment, as nobody wanted to chip in. The testing environment was absolutely required, as the database was obliterated multiple times, resulting in instantaneous client resynchronization.

Here is roughly what was done to change the encoding of the database.

=== Overview of procedure ===

This is the overview of the migration procedure. It is well worth testing this in a cloned environment first, because it has the potential to make all shares inaccessible, since they will no longer be listed in the database.

additional problems/pitfalls/observations:

  • fix_latin in this procedure should not be used when blob (bytea in postgresql) fields are in use in the database schema. This is used at least for calendar and such
  • neither the iconv nor the recode CLI tool can replace the fix_latin script. Both of them generate broken Latin character conversions
    • iconv -f windows-1252 -t utf8//ignore owncloud-0420.sql > converted-iconv-f-windows-1252-t-utf8.sql

stop services
do a filesystem backup
postgresql backup

  • pg_dump -U postgres owncloud -f owncloud-0420.sql

stop services
snapshot disks
install cpan Encoding::FixLatin
remove /var/postgresql
follow pkg-readmes to recreate postgresql database
create nextcloud database as per the official documentation for the currently installed release

  • make sure to use the same database name and users

CREATE USER owncloud WITH PASSWORD 'OLDPASSWORD';
CREATE DATABASE owncloud TEMPLATE template0 ENCODING 'UNICODE';
ALTER DATABASE owncloud OWNER TO owncloud;
GRANT ALL PRIVILEGES ON DATABASE owncloud TO owncloud;

edit sql dump encoding line from ‘SQL_ASCII’ to ‘UTF8’
fix the encoding of the file without importing:

  • fix_latin owncloud-0420.sql > converted-fix_latin.sql

diff both files
reimport database using the fixed file and check for errors while importing

fix_latin exported.sql | psql -U postgres owncloud

export database and diff with the previous two database files
snapshots

diff owncloud-0117-converted.sql owncloud-0132.sql | wc -l
42012

analyse the differences and compare against:

  • this issue that came up:
    • 1170 results: diff owncloud-original.sql owncloud-converted.sql | wc -l
    • 42012 results: diff owncloud-converted.sql owncloud-converted-exported.sql | wc -l
      • most of the lines looked the same even if listed in the diff, because these were the ones that were changed by the ‘fix_latin’ script
    • 356 results: psql -U owncloud -c "select count(*) from oc_filecache where path like '%2£o/%';"
    • 374 results: psql -U owncloud -c "select count(*) from oc_filecache where path like '%£o%';"
    • maybe delete the results of the first query because they do not show up in /var/www/nextcloud/data/USERNAME/files
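
A line-based diff also flags rows whose position merely changed between two dumps, so an order-insensitive comparison can help separate that from real content changes. The sketch below treats each dump as a multiset of lines. Whether row reordering contributes to the 42012 lines above is an assumption that was not verified here, and the function name `content_differences` is hypothetical.

```python
from collections import Counter


def content_differences(lines_a, lines_b):
    """Compare two dumps as multisets of lines, ignoring line order.

    Returns (only_in_a, only_in_b) as Counters mapping line -> surplus count.
    """
    a = Counter(line.rstrip("\n") for line in lines_a)
    b = Counter(line.rstrip("\n") for line in lines_b)
    return a - b, b - a


if __name__ == "__main__":
    converted = ["1\tfiles/a\n", "2\tfiles/otimização2£o\n", "3\tfiles/c\n"]
    exported = ["3\tfiles/c\n", "1\tfiles/a\n", "2\tfiles/otimização2£o\n"]
    only_a, only_b = content_differences(converted, exported)
    # Same rows in a different order: nothing is reported on either side.
    print(sum(only_a.values()), sum(only_b.values()))  # 0 0
```

For real dump files, opening them with `encoding="utf-8", errors="surrogateescape"` lets lines with invalid bytes be compared as well instead of raising an error.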

filesystem backup
start services
monitoring

  • CPU performance due to possible resynchronizations
  • network traffic for the previous issue

A broken-looking import of the converted database dump looks like the following. TODO: move this to attachment. Permission denied. A good-looking import looks the same but without these errors. The digits are also higher, because the first error stops a table from being imported

psql -U postgres owncloud < owncloud.sql
...
COPY 0
ERROR: invalid byte sequence for encoding "UTF8": 0xa3
CONTEXT: COPY oc_filecache, line 266755
COPY 45394
COPY 14995
COPY 0
...