What to check after a crash

treesMcGees · January 28, 2023, 3:04pm

Nextcloud version: 25.0.2
Operating system and version: Debian 11
Apache or nginx version: 2.4.54-1~deb11u1
PHP version: 7.4+76

Occasionally, my Nextcloud instance becomes unreachable over HTTPS, SSH, ping, etc., and I would like to investigate what is going on. What should I check after rebooting?

- /var/log/syslog
- Nextcloud log in Admin > Logging
- Apache/nginx logs in /var/log
- nextcloud.log in /var/www/ (or wherever it is saved according to config/config.php)
- PHP log (not sure where this is)
- dmesg

What are other places that might shed information on why the underlying system crashed?
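One way to go through the files listed above is to search them for messages that often show up around a failure. This is only a rough sketch: the pattern list is my own starting point (not exhaustive), and the function names are made up for illustration.

```python
import re

# Substrings commonly seen in Linux kernel/syslog output around a failure.
# This list is an illustrative starting point, not an exhaustive set.
SUSPICIOUS = re.compile(
    r"out of memory|oom-killer|kernel panic|\boops\b|i/o error|"
    r"under-?voltage|blocked for more than",
    re.IGNORECASE,
)


def find_suspicious(lines):
    """Return (line_number, text) for every line matching a known pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SUSPICIOUS.search(line)
    ]


def scan_file(path):
    """Scan one log file; unreadable or missing files yield an empty result."""
    try:
        with open(path, "r", errors="replace") as handle:
            return find_suspicious(handle)
    except OSError:
        return []
```

For example, `scan_file("/var/log/syslog")` returns the matching lines with their line numbers, so you can open the file and read what was logged just before each hit.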

t-cubed · January 29, 2023, 7:43am

Hi @treesMcGees

It’s an obvious point, I know, but “unreachable” != “crashed” (…at least, not necessarily)

Unreachable for who/from where?
- did you try connecting from a range of locations (in network terms)?
Unreachable for how long?
- 5 minutes? An hour? A day…? After what interval did you decide it was “permanently” unreachable?
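
To answer the "from where, and for how long" questions consistently, the same small check could be run from each location. A minimal sketch, assuming SSH on port 22 and HTTPS on port 443; the function names are hypothetical.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable hosts
        return False


def check_services(host, ports=(22, 443)):
    """Map each port (SSH and HTTPS by default) to its reachability."""
    return {port: port_reachable(host, port) for port in ports}
```

Running `check_services("<your server address>")` from the LAN, from the VPN, and from outside shows whether the host is down everywhere or only from some networks.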

You don’t say where Nextcloud is hosted, or how you’re able to reboot if you can’t SSH in. I’m assuming it’s running in the cloud somewhere, and you have access to something like cPanel(?) …so presumably it’s not running on-prem, and you can’t just connect a monitor and log in locally to see what’s going on(?) Does whatever you’re using to reboot provide any metrics for performance monitoring? If so, can you watch what happens to them before it becomes unreachable? …Or can you SSH in and watch (e.g.) htop?
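
If watching htop live isn't practical, a small heartbeat logger can leave a trail of what the machine looked like just before it went quiet. This is a sketch under my own assumptions (Linux with /proc/meminfo, a log file named heartbeat.log, one record per minute); none of it comes from Nextcloud itself.

```python
import os
import time


def read_mem_available_kb(meminfo_text):
    """Extract the MemAvailable value (in kB) from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None


def heartbeat_line(meminfo_text, now=None):
    """Build one record: timestamp, 1-minute load average, available memory."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    load1 = os.getloadavg()[0]
    mem = read_mem_available_kb(meminfo_text)
    return f"{stamp} load1={load1:.2f} mem_available_kb={mem}"


def run(log_path="heartbeat.log", interval=60):
    """Append a record every `interval` seconds and force it to disk."""
    with open(log_path, "a") as log:
        while True:
            with open("/proc/meminfo") as meminfo:
                log.write(heartbeat_line(meminfo.read()) + "\n")
            log.flush()
            os.fsync(log.fileno())  # so the last record survives a hard hang
            time.sleep(interval)
```

After the next incident, the last few records show whether load or memory was climbing before the machine stopped responding, or whether it simply stopped mid-stride.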

It’s difficult to give specific advice without more info about your instance (what kind of resources it has, how many simultaneous clients it’s serving etc). But - and this is just a wild guess - I would check there are no over-zealous security measures blocking access (firewalls, fail2ban, psad etc.)

hth; best of luck!

treesMcGees · January 29, 2023, 12:26pm

You’re right that I don’t know for sure that it crashed, but I strongly suspect it, especially since there is nothing in the logs (not even normal cron activity) after 16:20. I power-cycled it about 7 hours later when I still could not get access.
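
The point where a log goes silent can be located automatically by looking for the longest gap between consecutive timestamps. A minimal sketch, assuming the traditional syslog format ("Jan 28 16:20:01 host ...") and English month names; the year has to be supplied because that format omits it.

```python
from datetime import datetime


def parse_syslog_time(line, year):
    """Parse a leading 'Jan 28 16:20:01' timestamp; return None if absent."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def largest_gap(lines, year):
    """Return (gap, last_line_before, first_line_after) for the longest silence."""
    best = None
    previous_time, previous_line = None, None
    for line in lines:
        stamp = parse_syslog_time(line, year)
        if stamp is None:
            continue  # skip lines without a recognizable timestamp
        if previous_time is not None:
            gap = stamp - previous_time
            if best is None or gap > best[0]:
                best = (gap, previous_line, line)
        previous_time, previous_line = stamp, line
    return best
```

The line returned as `last_line_before` is the last thing the system managed to log. One caveat: a Raspberry Pi has no battery-backed clock, so timestamps written right after boot may be wrong until the time is synchronized.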

It is running headless on my LAN. I tried connecting over HTTPS and SSH from two different computers: from outside the LAN, over VPN, and from within the LAN. I also tried pinging it from the router: Destination host unreachable.

It does not have a monitor, and I don’t have a long enough HDMI cord to plug one in when it becomes unreachable. I might be able to bring over a monitor (and a keyboard) next time and see if it outputs anything.

Right before I restarted it (by power cycle), I could see normal green/yellow lights flashing on the Ethernet port. My assumption is that the Ethernet controller did not crash even if the overall OS did.

Are there any other logs to check for more information? I am assuming it is some kind of intermittent hardware failure based on not finding any traces so far, but I would like to rule out software if I can.

Since this is a Raspberry Pi 4 with 4 GB of RAM, it could be a power supply issue, even though I have a good PSU attached to a UPS (with NUT controlling it). On RaspiOS there is firmware that tracks low-voltage events, but apparently on Debian arm64 that firmware is not available, at least not easily.
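
For reference, on a system where `vcgencmd` is available (for example when booting Raspberry Pi OS temporarily), `vcgencmd get_throttled` prints a value like `throttled=0x50005`. A small sketch for decoding it; the function name is hypothetical and the bit meanings follow the Raspberry Pi documentation as I understand it.

```python
# Bit meanings as documented for `vcgencmd get_throttled` on Raspberry Pi OS.
THROTTLED_FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    3: "soft temperature limit active now",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output):
    """Decode a string such as 'throttled=0x50005' into readable flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in THROTTLED_FLAGS.items() if value & (1 << bit)]
```

The "since boot" bits are cleared by a power cycle, so to catch an event that precedes a hang the value would have to be logged periodically while the system is still running.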

Are there other logs you would look at?

t-cubed · January 29, 2023, 1:37pm

I think plugging in a monitor would be a good idea

However, as soon as you said this:

raspberry pi 4

…I wondered about corrupted flash storage; is the Pi running from a µSD card? Old cards that are dying slowly and silently have been the cause of many weird and unexplained crashes for me.

If it is running from a µSD card, I would:

- Take the existing card out of the Pi, and use another device to check it (e.g. with fsck)
- Get a brand new card, restore from backup / image, and try running from that instead
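
As a complement to fsck (not a replacement), the card can also be read end to end on the other device to see whether any region fails outright. A rough sketch with a hypothetical function name; it only catches hard read errors, not silent corruption, and reading a raw device usually needs root.

```python
def scan_for_read_errors(path, chunk_size=1024 * 1024):
    """Read `path` sequentially; return (bytes_read, offsets_that_failed)."""
    bytes_read = 0
    bad_offsets = []
    with open(path, "rb", buffering=0) as device:
        offset = 0
        while True:
            try:
                chunk = device.read(chunk_size)
            except OSError:
                bad_offsets.append(offset)
                offset += chunk_size
                device.seek(offset)  # skip the unreadable region and continue
                continue
            if not chunk:
                break  # end of file or device
            bytes_read += len(chunk)
            offset += len(chunk)
    return bytes_read, bad_offsets
```

Here `path` would be the card's device node on the checking machine (the exact name depends on the system) or an image file taken from the card. An empty list of failed offsets does not prove the card is healthy, which is why swapping in a new card is still the more telling test.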