Stack dump on NextCloudPi_02-06-18 distribution when uploading various images

I get frequent and repeatable kernel stack dumps like the one below on the NextCloudPi_02-06-18 distribution on a Raspberry Pi 3 B. The Pi seems to be in good condition (brand new), with adequate power and a fast SanDisk card.

The crash always (!) happens when trying to upload an image file. I sincerely hope that Nextcloud does not try to read or even manipulate image files, which it will NEVER be able to do correctly. There are too many (raw) image formats out there.

One repeatable crash happens when I have one file xyz.pdf and one xyz.tif, each < 50KB. The tif crashes. When looking at the file’s folder in the web view, the web page says the file has the name xyz.tif.pdf, and ALL other image files (jpg, tif) get a .pdf extension after their own names. Once the failing xyz.tif file was renamed to xyz-.tif, there were no more crashes on that folder, and the files had their correct names!

Another crash happens when uploading a mix of jpg and (compressed) Adobe dng images. The files are in pairs, one jpg and one dng with the same basename. The jpg files are 10-15MB, the dng files 40MB. The same thing seems to happen here: it crashes when there is a pair of files of different image types but with the same basename.
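For anyone trying to reproduce this, here is a small shell sketch that lists the basenames occurring with more than one extension in a folder, which is the pattern described above. The function name find_basename_pairs and the folder argument are made up for illustration; this is not part of Nextcloud or NextCloudPi.

```shell
# List basenames that occur with more than one extension in a directory.
# Hypothetical helper: the directory argument is whatever folder is being synced.
find_basename_pairs() {
    dir=${1:-.}
    for f in "$dir"/*.*; do
        [ -f "$f" ] || continue        # skip directories and an unmatched glob
        name=${f##*/}                  # strip the directory part
        printf '%s\n' "${name%.*}"     # strip the last extension
    done | sort | uniq -d              # keep only names seen more than once
}
```

Calling it as find_basename_pairs /path/to/folder prints one line per shared basename, so a folder holding xyz.pdf and xyz.tif would list xyz. The comparison is case-sensitive and only looks at the last extension.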

Message from syslogd@nextcloudpi at Feb 20 22:04:31 …
kernel:[ 116.257718] Internal error: Oops: 5 [#1] SMP ARM

Message from syslogd@nextcloudpi at Feb 20 22:04:31 …
kernel:[ 116.280908] Process php-fpm7.0 (pid: 1348, stack limit = 0x9439a210)

Message from syslogd@nextcloudpi at Feb 20 22:04:31 …
kernel:[ 116.283526] Stack: (0x9439bca8 to 0x9439c000)

Message from syslogd@nextcloudpi at Feb 20 22:04:31 …
kernel:[ 116.286063] bca0: 80800038 ba722d44 943468f0 00000000 802e001b 00000001

etc…

BR // WB

First report I see about this.

Could you provide the full oops message from dmesg?

By design nothing userspace does should crash/oops the kernel.
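A minimal sketch of one way to pull the oops out of the kernel log, assuming the log contains the same "Internal error: Oops" line shown in the first post. The function name extract_oops and the number of context lines are illustrative choices, not an NCP tool.

```shell
# Print a few lines of context before the first oops, then everything after it.
# Reads the kernel log on stdin, so it works with dmesg output or a saved file.
# The first argument is the number of context lines to keep (default 5).
extract_oops() {
    before=${1:-5}
    awk -v n="$before" '
        found { print; next }                 # after the oops: print everything
        /Internal error: Oops/ {
            # flush the buffered context lines, oldest first
            for (i = NR - n; i < NR; i++)
                if (i > 0) print buf[i % n]
            print
            found = 1
            next
        }
        { if (n > 0) buf[NR % n] = $0 }       # remember the last n lines
    '
}

# Usage on the Pi (dmesg may need sudo on some systems):
#   dmesg | extract_oops > oops.txt
```

The dmesg buffer lives in RAM, so it is gone after a power cycle. If the machine locks up shortly after the oops, it may help to run dmesg --follow in an ssh session beforehand and save that session's output on the client side.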

Hi,

Sorry for the delay. I have put a few logs and screen dumps from the last run in DropBox:

Will you please bear with me, I’m a Windows guy with only basic Linux skills. Kindly guide me on what more logs I can provide.

The crashes seem to happen on the very same image files (dng and jpg). When removed, we sync a few more files, and then crash again.

The PI is a brand new machine, the SD card is a brand new SanDisk which shows no errors when bit-tested, and the power supply is 2.4A, brand new. The files where we crash are perfectly valid and show no errors when opened in various editors (Photoshop, Lightroom, Faststone Img, etc).

The image for the SD card was extracted from a perfectly valid archive, with no download errors, and written to the SD card with Etcher, which validated the installation without errors.

Awaiting your instructions…
BR // Bo

I don’t see the crash dump in your dmesg. Try to get it after the kernel fails.

What are the shots from pi-crash.pdf?

The pdf contains dumps of the Windows NextCloud app. It shows the situation when the sync repeatably stops on the files mentioned. Every time!
I took the logs just after one of those stops/crashes.
Less than a minute or so after the app crash, but not immediately, the PI becomes unresponsive. The NextCloud web page is dead, the ssh channel and even the direct keyboard and monitor are also dead, and the PI has to be power cycled.

Will it be of any use to you to get the dmesg log directly after the reboot?

Hi,

I seem to have captured two consecutive kernel crashes. Please see logs and a few Windows Nextcloud app screen dumps in Dropbox: https://www.dropbox.com/sh/tm5zelpq7vqft1d/AADYEsHfpstA245H1BKX_30ea?dl=0

First - I made a new, fresh installation from the image. I then first manually formatted the external 750GB USB disk (self powered), then ran the wizard through including a new format of the external drive. No problems whatsoever. I created a new account, and set it up in Windows as well.

On the Windows side, I removed the connection to the default NextCloud folder, and added a connection to a “temp” folder.

After that, I get kernel crashes in the PI for every run! 100% repeatable.

Screen dump #2 is interesting. It reports “forbidden” from the NextCloud server when a file should be written.

Sorry for the Swedish, but I don’t find any way to change language in the Windows app.
“Kunde inte skriva” == “Could not write”
“Fortsätt svartlista” == “Continue blacklist”

BR // Bo

Late update: I reinstalled everything (5th time), and put a fast 128GB SD as external USB disk, just to see if there were any timing errors with the external disk. But nope, we crashed repeatedly again, on the very same files…

In the Dropbox (same as above) https://www.dropbox.com/sh/tm5zelpq7vqft1d/AADYEsHfpstA245H1BKX_30ea?dl=0
there are now three xxx5-ssd.xxx files with dump from the Windows app, and logs from dmesg and ncp-report.

As the micro SD card with the PI image validates OK for every Etching (and it is a reputable fast UHS-I Sandisk card), and both my external USB drives have been used and tested for a long time with my pcs, without any problems whatsoever, I can suspect possible errors in the PI itself, RAM memory or something with the USB ports?

BUT - as we crash on the very same files all the time, regardless of HW config, 100% repeatably, maybe I have found a program bug in either the Windows Nextcloud app or, more likely, the PI server?

The file mentioned in the Windows app screenshot is 160MB. The (alphabetically) previous file with almost the same name is 210MB. But earlier I have seen crashes on much smaller files, and the famous 100% repeatable crash with one PDF and one TIF file (which crashed) with the same basename but different extensions.

Network is wired from both pc and PI, and stable as far as I know.

Can I provide you with better logs? Should I file a real bug report?

Please see all the messages… I have mistakenly left you out…

YES - I have found a reason for the dumps, but to me it is not nice. Network timing error, or something like that.

When switching over to Wifi, sync works beautifully, at least so far on the folders that crash 100% repeatably when running wired.

I have switched cables, bypassed a switch, rocked and cleaned network cable contacts, etc, but to no avail. Running the PI Nextcloud wired on a 1 Gbit Ethernet network simply does not work!

Running on Wifi, no problem!

With your experience, would you point to the network adapter in the PI? Or would you rather blame some crappy and unforgiving software in the Nextcloud network path? I assume that it is Apache or PHP that throws exceptions, but that something is making a mess of handling those exceptions?

BR // Bo

Well, we finally have some kernel logs.

First, I see some BTRFS warnings, so there might be something funky with your hard drive.

Then, the kernel network errors come from the TCP stack, which does not look nice at all.

I will generate a testing image with the default vanilla kernel so you guys can test if this improves the situation (for your case and this one).

If this were an issue though we would be seeing more reports, so I do suspect you have some hardware issue that is causing these kernel bugs to show themselves.

Dear nachoparker!

Thanks for your kind efforts, but I think I’m soon giving up on both NextCloud and the image NextCloudPi_02-06-18. It seems to me to be highly unreliable!

  • I got my PI replaced by the dealer.
  • I have switched power supplies and cables and cross-checked.
  • I have tested running with two totally different SDHC cards.
  • I have run with both an external 3.5" HD and an external USB memory as disk.
  • I have run without an external disk.
  • I have run with and without wired network, and a few times with WiFi.
  • I have downloaded the image again, and unpacking (7z) gives no errors or warnings.

But the image crashes, crashes, crashes. When we started this journey, it crashed in NextCloud, somewhat predictably.

Now, whatever combination of SDHC, power, with or without external disk, or network, it crashes even at boot!

As I can’t even get the damn machine up with this image, you will have to see the boot crash here as a photo.

This was after booting a clean machine, with no attached network or drives, just after flashing the SDHC card with Etcher. The only attachments were a keyboard and HDMI.

AND! When I install and run the latest full Raspbian image on the very same PI, with the very same network, power supply and external drives, there are absolutely no problems whatsoever! Boots perfectly, X working, all apps. No crashes… Steady as a rock!

Back to the drawing board?

BR // Bo

Just to be clear on the SDHC cards. The image has been written with Win32 Disk Imager, Etcher and Rufus on totally emptied cards without file systems. The cards were tested with Rufus, no problems. Absolutely no difference in the crash behaviour.

As I can run Raspbian Full Desktop with network and no problems on all tested HW combinations, I can rule out HW problems.

Why would this only happen to you?

What puzzles me is that it doesn’t happen with plain Raspbian, so maybe it is because of the (small) kernel difference?

Why don’t you try this?

  • flash plain Raspbian
  • run the installation script
# curl -sSL https://raw.githubusercontent.com/nextcloud/nextcloudpi/master/install.sh | bash

This will give you NCP, but without upgrading the kernel.

Dear nachoparker,

Thanks for your efforts, but I have found a solution that works. I followed the instructions in

http://unixetc.co.uk/2017/11/25/automatic-nextcloud-installation-on-raspberry-pi/

and just added

sudo apt-get update
sudo apt-get upgrade

after initial setup. Nothing more.

As of now I have been syncing my image archive during the night to an external 3.5" HD. So far some 30-40GB, with file sizes from a few KB to 500MB Photoshop files, without even the slightest hiccup!

Thanks
Bo