Puzzling minor issues with NextCloud VM installations


I have been using Nextcloud for a little over 6 months now. I love it. I have been using it for myself since last September and I just deployed it for a client of mine as well. In both cases I used the VM resources, although in the client’s case I actually used the VM script from HERE to deploy it on a fresh install of Ubuntu Server 20.04.2 running on an old machine that would otherwise have been retired. It was one of those things that no one said would work that I figured would work, and it did. However, I wouldn’t recommend doing the installation on a physical server this way unless you are planning to dedicate the device to NC, but we were, so it was perfect. For my own, I am using the prebuilt VM on a Windows host. I started with the NC19 VM, then recently migrated my data to a fresh NC21 VM, which was a nice upgrade. (I dumped the data out of the old one to a Windows share via CIFS, then imported it into the new one the same way.)

That is all background information. The issues I have are only annoying, and nothing that interferes with usability, but I would like to understand why they are happening and resolve them. 2 of them I believe are connected.

  1. Connecting insecurely gives the landing page with “Thank you for downloading the Nextcloud VM, you made a good choice! If you see this page, you have run the first setup, and you are now ready to start using Nextcloud on your new server. Congratulations! :)” I would have thought that this would only appear once and then it would simply forward to the secure login page.

  2. Running “sudo -i” to gain root, sometimes because I really need it and sometimes because, yes, I’m too lazy to type sudo repeatedly, triggers the first run script every time I do it. Of course it quits quickly because it realizes it has already been run, but it would be really nice to have “sudo -i” just give me root rather than griefing me with the script every time.

  3. On the most recent deployment, on the physical machine, I attempted to run the automatic backup wizard from menu.sh but it gave me the same error it did when I tried to run it from the setup script, that it can’t be run during the initial setup.

Clearly some things are not getting properly cleared away after the initial setup, and I’d like to know what they are and how to clear them. I could probably dig around for many hours and figure out what they are, but if someone knows and could spare 5 minutes to tell me, I’d be very grateful.

T&M Hansson IT AB are rockstars. The amount of energy they have put into this product and made available to the public free of charge is astonishing. “Pro bono” doesn’t mean “for free,” though that’s how it’s generally used; it means “for good,” or as a contribution to the greater good. These guys have done so much pro bono work that it is a little shocking. Thank you so much!


Thank you very much for your great feedback! :star_struck:
Pinging @enoch85, too.


Concerning your support points:

  1. Should work after you run the Activate TLS script: sudo bash /var/scripts/menu.sh → choose Server Configuration → Activate TLS
    Alternatively, you can use the deSEC script: sudo bash /var/scripts/menu.sh → choose Server Configuration → deSEC → choose to Activate TLS at the end of the script
  2. It seems like you didn’t run the startup script to the end. You can fix this by running:
    sudo rm /var/scripts/nextcloud-startup-script.sh
  3. will be fixed with this command, too.
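
Before deleting anything, it may help to check what is actually left over. The following is only a minimal sketch: `check_leftovers` is a hypothetical helper name, and the list of root shell startup files in the example call is a guess, not something confirmed in this thread. It reports whether the startup script still exists and which of the given files still mention it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the startup script still exists and
# which of the given shell startup files still mention it by name.
# Usage: check_leftovers <script-path> <profile-file>...
check_leftovers() {
    local script="$1"
    shift
    local name
    name="$(basename "$script")"

    if [ -f "$script" ]; then
        echo "script present: $script"
    else
        echo "script absent: $script"
    fi

    local f
    for f in "$@"; do
        # Only report files that exist and contain the script's file name
        if [ -f "$f" ] && grep -qF -- "$name" "$f"; then
            echo "referenced in: $f"
        fi
    done
}

# Example call (run as root; the profile paths are assumptions):
# check_leftovers /var/scripts/nextcloud-startup-script.sh \
#     /root/.bash_profile /root/.profile /root/.bashrc
```

If a file is listed as "referenced in", that is a likely candidate for whatever launches the script on `sudo -i`, but it is worth reading the file before changing it.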

@szaimen is right here, the startup script finished non-clean.

Removing the script doesn’t fix the main issue, but it works around the issue you’re facing.

An even better solution would be to rerun the whole setup again and make sure to follow all the prompts carefully. The last thing that happens is that everything is reset to “normal” mode and the startup script is removed. If that doesn’t happen in a clean way you end up in a kind of a broken state. It works, but I wouldn’t feel comfortable without knowing exactly what went wrong.

Thank you for taking the time to help here. However, your information makes this that much more puzzling. I’m guessing you didn’t read my admittedly very long original post in its entirety, as I have experienced this behavior on 3 different installations, 2 of which are still in service. In all of these, I very carefully ran the startup script all the way to the end. Whatever is failing is failing consistently for me, and redeploying both instances currently in service where I have experienced this is not on the radar right now. As I did run the script all the way to the end and everything is working, I do not believe that anything is actually broken beyond that the script failed to delete itself and the trigger that launches it sometimes.

If it helps to clarify what is happening, normal login as ncadmin does not rerun the startup script; only login as root via sudo -i.

I wonder why I have experienced this 3 times and seemingly no one else is experiencing it at all. Could it be that the machines on which I have deployed this are just slow enough that the reboot happens before cleanup is totally finished? That seems unlikely but I’m puzzled as to why this has been so consistent for me.

Am I understanding correctly that the landing page is something I cannot remove unless I activate via Let’s Encrypt? If so I’ll just have to live with it (no big deal) as neither of my use cases need or support anything other than a self-signed certificate.

Did you always install your instances using the install_production_script? If yes, did you also run the 2nd script successfully? So vm/nextcloud_install_production.sh at master · nextcloud/vm · GitHub and vm/nextcloud-startup-script.sh at master · nextcloud/vm · GitHub?

The first two times I did it I used prebuilt VMs where the install_production script had already been run, and only the second remained to run. In the third case, yes, that was the one I used, and it installed, rebooted, and ran the second script as expected. Nothing apparently went wrong with any of it.

Did it reboot the VMs again after exiting the 2nd script?

I think so. It has been long enough that I don’t remember for certain. If it did not, what would that have broken, and is it something that I could check?

Unfortunately, there is no way to find out what went wrong when you don’t remember any obvious errors or that the script didn’t run to the end.
I can only say that the startup script gets removed at the very end of the startup script. And if it wasn’t removed, you didn’t run the startup script to the end, and hence something must have gone wrong during it or you canceled it by pressing [CTRL] + [c].
Here is the line that removes the script: vm/nextcloud-startup-script.sh at 04ec101f4b85fed16f231af8d75587447c76bc5a · nextcloud/vm · GitHub

Interesting. This is helpful. I guess I never thought about reviewing the script itself. I feel confident that in at least the most recent install I saw line 538, but I don’t remember anything after that. Would there be any harm in creating a truncated version of this script that starts right after that and running it as a cleanup script? Interestingly I don’t believe in any case the set trusted domain script has ever run, as I have always had to set them manually by editing the config. I’m not sure why the script would reliably fail for me right after the update, but it looks like that may be what is happening.

If you are sure that line 538 is the last line that you’ve seen, you should be able to create a truncated version of this script starting at this line. But don’t forget to put this into any truncated script: source /var/scripts/fetch_lib.sh
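
A truncated copy along these lines could be produced as follows. This is only a sketch under assumptions: `make_truncated_script` and the output path in the example are hypothetical names, the start line is whatever line you are sure was reached, and it assumes the remaining lines do not depend on variables set earlier in the original script. Review the result by hand before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy everything from line <start> onward of <src>
# into <out>, prefixed with a shebang and the fetch_lib.sh source line.
# Usage: make_truncated_script <src> <start-line> <out>
make_truncated_script() {
    local src="$1" start="$2" out="$3"
    {
        echo '#!/bin/bash'
        echo 'source /var/scripts/fetch_lib.sh'
        # tail -n +N prints the file starting at line N (1-indexed)
        tail -n "+$start" "$src"
    } > "$out"
}

# Example (output path is an assumption; inspect the file before running it):
# make_truncated_script /var/scripts/nextcloud-startup-script.sh 538 /root/finish-setup.sh
```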

Did you never see this menu and the option to Activate TLS?

I have always gotten that menu and option, yes. I was referring to line 546. Perhaps that is not interactive but merely applies previously chosen options. I didn’t ever get presented with a prompt to add trusted domains in any of my installations.

As an aside, is deSEC preferable to using noIP and adding its updater manually other than being easier to configure?

This is most likely because adding a trusted domain is part of the Activate TLS script.

If you want to use your own domain, you should use the Activate TLS script.
Running the deSEC script will give you your own dedyn.io subdomain and then automatically configure your server with a trusted domain and Let’s Encrypt using that domain. One advantage is that this works without opening any ports to the public internet.

That is slick and very good to know! Thank you.

Based on the discussions we’ve been having it sounds like I should just truncate the setup script and (re)run the end to finish the job. I’ll try it on my actual VM first (where I can easily do a state backup via the files on the host lol) and see how it goes before I try it on the physical appliance.

Thank you so much for helping me work through this.


No, the setup script can only be run once. So as I mentioned before - start over from scratch. :confused:

If he knows that he has seen the line 546, he can definitely create a truncated script from the startup script starting there…


Thanks @szaimen ! I definitely wasn’t going to start from scratch on 2 working instances if I could avoid it. I couldn’t see anything after that line which would be likely to break anything even if it had already run, so you’ve confirmed my suspicions that it should be OK. Now to try it and report how it goes. :slight_smile:

OK, I ran the truncated script to make it finish on my VM appliance, and now everything seems normal. If all seems good over the weekend I’ll do the same thing on the physical appliance.

I’m still a little baffled as to why three different instances had the same failure, but at least now I have what looks like a good solution. I have to wonder how many others have had the same experience but are just living with the problem. I still see no reason why it would have failed to complete the first time though.

Are you sure?

I’d say line 553

If you saw the text from that line you could just remove this manually:

That is the puzzling thing; I don’t remember seeing that message on any of the instances I configured. Of course it’s possible I did and don’t remember, as this kind of FYI for the non-technical is stuff I gloss over, but I think I would have remembered seeing it at least once out of the three times I ran this. I do wonder how that ending looked in the NC19 appliance, as the first one I did was that. There were some changes that I noticed; all nice improvements. The NC19 one may have been the testing one from nextcloud.com, whereas the NC21 was the production one from Hansson IT.

I think I’m going to see if this behavior is reproducible again by importing it one more time and see if it goes all the way to the end on the 4th try or stops short, just for science. I’m satisfied that nothing is broken in my running instances and don’t feel the need to mess with them beyond what I did already.