Error after system crash: occ does not work - urgent

mbassan · October 29, 2021, 8:16am

It's not meant to be offensive, but our Enterprise customers (at least ours, I don't know about others) don't tolerate system downtime beyond 3-4 hours.
And nobody has said that "community" and "enterprise" software are different; obviously it is the support that differs.
However, although the community is very responsive, there is no debugging tool that tells you, when an error occurs, which file or folder is missing. If I tell a client that a file or folder is missing, that I don't know which one, and that I have to spend paid hours of work on a restore without knowing where the problem lies, they will drop me.
But these are our customers; perhaps we have spoiled them.
Yes, of course I also have several installations without any problems, but customers pay us as professionals when there is a problem, not when everything is fine.

bb77 · October 29, 2021, 8:29am

I actually wanted to stay out of this. But what you write here doesn't sound very "enterprisy" to me. How did this "Linux crash" even happen, and why didn't you have a backup or a snapshot of your VM in place? I really hope that your "Enterprise" customers do better and have redundant storage, backups and UPSs in place. Then something like this most likely wouldn't have happened in the first place, or you could have restored your installation to a previous working state within seconds. To me it sounds like all of this could have been avoided by simply following best practices.

Sorry but maybe these “Enterprises” should consider a support contract with the Nextcloud GmbH then.

mbassan · October 29, 2021, 8:46am

@bb77, that's all well and good, but in the end no version has a debug tool that tells you which files and folders are missing.
And in our opinion (though we may be wrong), that is a problem.

bb77 · October 29, 2021, 9:08am

You could download the zip file and extract it, or do a test installation, and compare it to your production instance.
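One way to script that comparison is the minimal Python sketch below. It walks an extracted release and reports every file or folder that has no counterpart in the installation. It assumes the release was extracted for exactly the same Nextcloud version as the broken instance. The function name `find_missing` and the paths in the usage line are hypothetical. It checks only whether entries exist, not their contents, and it walks only the reference tree, so a (possibly huge) data directory inside the installation is never scanned.

```python
import os
from pathlib import Path


def find_missing(reference: Path, production: Path) -> list[str]:
    """List entries present in the reference tree but absent from production.

    Missing directories are reported once (with a trailing "/") and are not
    descended into, so their contents don't flood the output.
    """
    missing: list[str] = []
    for dirpath, dirnames, filenames in os.walk(reference):
        rel_dir = Path(dirpath).relative_to(reference)
        kept = []
        for d in sorted(dirnames):
            rel = rel_dir / d
            if (production / rel).is_dir():
                kept.append(d)
            else:
                missing.append(rel.as_posix() + "/")
        # Prune in place: os.walk only descends into directories left in dirnames.
        dirnames[:] = kept
        for f in sorted(filenames):
            rel = rel_dir / f
            if not (production / rel).exists():
                missing.append(rel.as_posix())
    return missing
```

Usage, with hypothetical paths: `print("\n".join(find_missing(Path("/tmp/nextcloud"), Path("/var/www/nextcloud"))))`. Extra files in the installation (config, data, third-party apps) are ignored by design, since only missing release files matter here.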

I'm not saying that Nextcloud itself couldn't do certain things better. There's of course always room for improvement. But at the end of the day, you have to know what a certain product can or can't do before you use it in production and put all your data on it. And no matter how good a product is, you need to take precautions to minimise the risk of a worst-case scenario like this one. Or, if it does happen, you need to have processes in place to get back up and running as quickly as possible. If you have to "debug" missing folders on a production instance, more essential things have already gone wrong before that.

I only use Nextcloud at home. But I do the following…

Back up the entire VM every night to a separate physical machine.
Separate backups of important files, which also go offsite.
Take a snapshot of the VM before every Nextcloud update or major change to the Linux system.
In addition to the backups, I take hourly ZFS snapshots of the storage where my VMs live.
I have a separate Nextcloud test instance on which I try out apps and test major upgrades before installing them on my production instance.

As a business or an enterprise, this or something similar is the bare minimum you have to do. And it would most likely already have paid off with the issue you are dealing with now.