database vs. no database - Nextcloud vs. ownCloud fork OpenCloud

In Germany, Heinlein Support GmbH has more or less forked the ownCloud software; the fork is branded as OpenCloud (https://opencloud.eu). Not to be confused with Open Cloud (https://open-cloud.de) from HKN.

database or no database for meta data

Translated from German with DeepL:
Independent of databases
OpenCloud has made a conscious decision not to use relational databases and instead uses files to store metadata. This decision simplifies the system considerably and at the same time helps to improve scalability and system stability.
Source: Funktionen des OpenCloud Workspace (Features of the OpenCloud Workspace) | OpenCloud

I would like to know what is true. Are databases important for performance or not? I use SQLite on test Nextcloud installations and it works fine for one user. What do you think about storing the metadata of all files only in flat files, e.g. for performance and backup/restore, for both small and large installations?

In my opinion, it depends.
For small applications, it can be advantageous. For larger environments or content, I don’t see why it would be advantageous to use files.

For complex structures and larger environments, I would always use databases, purely for administrative reasons.

dunno, look at Atlassian… they’re doing it too with JIRA… more are moving to filebased metastorage?

I don’t administer Jira, I just use it. So I don’t know what it looks like there.

But I find it a nightmare to create queries, etc., because maybe the exact path to the database is missing, or the API is just structured oddly.

Regarding performance, I can’t say whether it’s due to the infrastructure or Jira, but it doesn’t always perform well.

agree :ok_hand: it’s a nightmare…

Problem 1:
In my opinion, the biggest downside of the no-database approach is that you cannot efficiently query across multiple metadata files.
The query “give me the size of file XYZ” is relatively fast (probably still slower than a relational database): it just finds the metadata file for file XYZ, reads it and extracts the queried information.
But for the query “give me all files with a size greater than 5 MB” there is no specific metadata file to read. You would have to read and parse every single metadata file (could be tens of millions on big instances) and filter them within your backend code. Relational databases have a LOT of optimizations for this exact scenario (like indices) and can answer such a query in milliseconds.
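To make the difference concrete, here is a minimal sketch, not OpenCloud's or Nextcloud's actual code. The `*.meta.json` layout, the `files` table and all function names are made up for illustration. It contrasts a point lookup and a "larger than 5 MB" query on one-JSON-file-per-file metadata with the same query on an indexed SQLite table.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

FIVE_MB = 5 * 1024 * 1024


def build_flat_store(root: Path, sizes: dict[str, int]) -> None:
    """Write one hypothetical JSON metadata file per stored file."""
    for name, size in sizes.items():
        meta = {"name": name, "size": size}
        (root / f"{name}.meta.json").write_text(json.dumps(meta))


def flat_size_of(root: Path, name: str) -> int:
    """Point lookup: exactly one metadata file is opened."""
    return json.loads((root / f"{name}.meta.json").read_text())["size"]


def flat_larger_than(root: Path, limit: int) -> list[str]:
    """Range query: every metadata file has to be read and parsed."""
    hits = []
    for meta_path in root.glob("*.meta.json"):
        meta = json.loads(meta_path.read_text())
        if meta["size"] > limit:
            hits.append(meta["name"])
    return sorted(hits)


def build_db(sizes: dict[str, int]) -> sqlite3.Connection:
    """Same data in a table, with an index on the size column."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (name TEXT PRIMARY KEY, size INTEGER)")
    db.execute("CREATE INDEX idx_files_size ON files (size)")
    db.executemany("INSERT INTO files VALUES (?, ?)", sizes.items())
    return db


def db_larger_than(db: sqlite3.Connection, limit: int) -> list[str]:
    """Range query that the database can answer via the size index."""
    rows = db.execute(
        "SELECT name FROM files WHERE size > ? ORDER BY name", (limit,)
    )
    return [name for (name,) in rows]


def demo_queries() -> tuple[int, list[str], list[str]]:
    sizes = {
        "a.txt": 1_000,
        "b.iso": 700 * 1024 * 1024,
        "c.mp4": 6 * 1024 * 1024,
    }
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_flat_store(root, sizes)
        point = flat_size_of(root, "a.txt")
        flat_hits = flat_larger_than(root, FIVE_MB)
    db = build_db(sizes)
    db_hits = db_larger_than(db, FIVE_MB)
    db.close()
    return point, flat_hits, db_hits
```

Both variants return the same answer for three files. The point is that `flat_larger_than` does work proportional to the total number of metadata files, while the indexed query only has to touch the matching rows.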

If they carefully design OpenCloud around these limitations I am sure they still can do a lot with that, but there are going to be features they will just never be able to implement, because there is no way to efficiently query the needed information.

Problem 2:

Locking (which prevents multiple requests from making simultaneous changes that conflict with each other) is also a mess with file-backed, no-database storage: databases can lock individual rows, but you cannot do that with files, you have to lock the whole file.
This may not be a problem for file metadata: there are a lot of files, so the chances of simultaneous access to the same one are not that big. But from reading their code, I think they save all file shares in one single big JSON file :face_with_raised_eyebrow: . I am not sure whether they at least write to it atomically, so that you don’t have to stop everyone from reading it while an update is in progress. That would make it slightly better, but still: if your whole mission is scalability, sorry, but I don’t think that is a good idea.
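For reference, this is roughly what I mean by writing "in an atomic way". It is a generic sketch of the usual write-to-temp-file-then-rename pattern, and I am not claiming OpenCloud does it like this. `write_json_atomically` and the `shares.json` name are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(target: Path, data: dict) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name, suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the rename
        # On POSIX the rename is atomic when both paths are on the same filesystem,
        # so readers see either the old file or the new one, never a half-written one.
        os.replace(tmp_name, target)
    except BaseException:
        # Leave the old file untouched and clean up the partial temp file.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def demo_atomic_write() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        shares = Path(tmp) / "shares.json"
        write_json_atomically(shares, {"shares": []})
        write_json_atomically(shares, {"shares": [{"id": 1, "path": "/docs"}]})
        return json.loads(shares.read_text(encoding="utf-8"))
```

Note that this only protects readers. Two writers doing read-modify-write on the same file at the same time can still overwrite each other's changes (the last rename wins), so you would still need a lock around the whole file for updates.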

(Also good luck with “database” schema changes over tens of millions of files)

So why are they doing it?
From running a rather big instance I can tell you that the oc_filecache table (which stores some metadata about files) can definitely be a pain point of the Nextcloud architecture. That table gets accessed by nearly every request and it requires work to optimize your database server and code to make that table scale to the needed query volume. But it is very much not impossible!

The one upside of the OpenCloud architecture that I can think of is that it is easier to cluster a filesystem (with Ceph, for example) than it is to cluster relational databases in a performant way (which is also because relational databases give higher consistency guarantees).
But I just don’t think this argument is particularly convincing. YouTube scaled to 2.49 billion users with MySQL, that surely took a lot of work, but it is possible. And Nextcloud is starting to develop some of the same optimizations that made that possible, like database sharding.
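In case "database sharding" sounds abstract, here is a toy sketch of the general idea: rows are routed to one of several databases by hashing a key, so no single server has to hold or serve the whole table. This is NOT how Nextcloud implements it. The `filecache` schema, the choice of `storage_id` as the shard key and the class name are all assumptions for illustration.

```python
import hashlib
import sqlite3
from typing import Optional


def shard_for(storage_id: str, shard_count: int) -> int:
    """Map a key to a shard index with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(storage_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedFileCache:
    """Toy file cache spread over several independent SQLite databases."""

    def __init__(self, shard_count: int) -> None:
        # In a real setup each shard would be a separate database server.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute(
                "CREATE TABLE filecache ("
                "storage_id TEXT, path TEXT, size INTEGER, "
                "PRIMARY KEY (storage_id, path))"
            )

    def _db(self, storage_id: str) -> sqlite3.Connection:
        return self.shards[shard_for(storage_id, len(self.shards))]

    def put(self, storage_id: str, path: str, size: int) -> None:
        self._db(storage_id).execute(
            "INSERT OR REPLACE INTO filecache VALUES (?, ?, ?)",
            (storage_id, path, size),
        )

    def size_of(self, storage_id: str, path: str) -> Optional[int]:
        # Only the one shard that owns this storage_id is queried.
        row = self._db(storage_id).execute(
            "SELECT size FROM filecache WHERE storage_id = ? AND path = ?",
            (storage_id, path),
        ).fetchone()
        return row[0] if row else None

    def total_rows(self) -> int:
        # Queries that span all keys have to fan out to every shard.
        return sum(
            db.execute("SELECT COUNT(*) FROM filecache").fetchone()[0]
            for db in self.shards
        )
```

Lookups for a single storage only hit one shard, which is where the scaling comes from. Anything that needs data across all shards (like `total_rows`) has to ask every one of them, which is part of the "a lot of work" mentioned above.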
