AIO mastercontainer unreachable after broken backup


Nextcloud version (eg, 20.0.5): 25
Operating system and version (eg, Ubuntu 20.04): RPi 4 Model B r1.1, Bullseye 6.1.21-v8+
Apache or nginx version (eg, Apache 2.4.25): AIO
PHP version (eg, 7.4): AIO

The issue you are facing:

The initial install worked great. I added users and created a backup, also great.
Then I added data and started a manual backup from AIO with the containers stopped. It started fine.

The next day, the mastercontainer is not reachable.
Starting the AIO containers manually shows Nextcloud working nominally. However, the AIO interface is still unreachable (via the direct IP and via the link in Nextcloud).

Restarting the AIO mastercontainer reveals no errors. Portainer reports the mastercontainer as healthy.

INF ts=1684396629.3821237 msg=using provided configuration config_file=/Caddyfile config_adapter=

INF ts=1684396629.3960943 msg=failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details.

grep: write error: Broken pipe

[18-May-2023 07:57:09] NOTICE: fpm is running, pid 116

[18-May-2023 07:57:09] NOTICE: ready to handle connections

Starting mastercontainer update...

(The script might get exited due to that. In order to update all the other containers correctly, you need to run this script with the same settings a second time.)

Deleting duplicate sessions

[18-May-2023 08:00:33] NOTICE: Terminating ...

[18-May-2023 08:00:33] NOTICE: exiting, bye-bye!

WARNING: No memory limit support

WARNING: No swap limit support

Initial startup of Nextcloud All-in-One complete!

You should be able to open the Nextcloud AIO Interface now on port 8080 of this server!

E.g. https://internal.ip.of.this.server:8080

If your server has port 80 and 8443 open and you point a domain to your server, you can get a valid certificate automatically by opening the Nextcloud AIO Interface via:

https://your-domain-that-points-to-this-server.tld:8443

++ head -1 /mnt/docker-aio-config/data/daily_backup_time

+ BACKUP_TIME=02:00

+ export BACKUP_TIME

+ export DAILY_BACKUP=1

+ DAILY_BACKUP=1

++ sed -n 2p /mnt/docker-aio-config/data/daily_backup_time

+ '[' '' '!=' automaticUpdatesAreNotEnabled ']'

+ export AUTOMATIC_UPDATES=1

+ AUTOMATIC_UPDATES=1

+ set +x

Daily backup script has started

[18-May-2023 08:00:40] NOTICE: fpm is running, pid 107

[18-May-2023 08:00:40] NOTICE: ready to handle connections

grep: write error: Broken pipe

INF ts=1684396840.2575064 msg=using provided configuration config_file=/Caddyfile config_adapter=

INF ts=1684396840.2710838 msg=failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details.

Starting mastercontainer update...

(The script might get exited due to that. In order to update all the other containers correctly, you need to run this script with the same settings a second time.)

However, there seem to be no actions in the mastercontainer. No process consumes relevant CPU power.
This is maybe linked: the most active process (1.3% CPU) is

293953 root 20 0 14536 4248 3204 S 1.3 0.1 11:44.03 entry.sh

An update already:
The borgbackup uses a mount point to a NAS, which went down during the backup.
I remounted it and restarted the container.
The backup resumed.
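
Roughly the commands this involved (the mount line is the same one I show further down in this thread; the container restart here uses the mastercontainer as an example):

sudo mount -t nfs4 -o proto=tcp,port=2049 192.168.142.142:/volume1/backups/nc_aio_backup /mnt/backup-nc
sudo docker restart nextcloud-aio-mastercontainer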

log:

Waiting for borgbackup to stop

I will wait until this finishes and provide an update.

As a wish:
a broken borgbackup should not stall the mastercontainer.

Thanks a ton for providing AIO.

Hi, I cannot reproduce this.

Can you post the output of sudo df -h && sudo docker inspect nextcloud-aio-mastercontainer && sudo docker logs nextcloud-aio-mastercontainer?
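
If the output is long, you can redirect it into a file and paste that instead (plain shell redirection; the file name is just an example):

sudo docker logs nextcloud-aio-mastercontainer > mastercontainer.log 2>&1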

Hi,

thanks for the help.

Meanwhile the backup finished after mounting the correct folder. NC came back automatically; so far it is looking good.

Tried to recreate the issue by:

  • uploading new data
  • starting a manual backup
  • removing the mount point during the backup

The AIO master was reachable the whole time.
So far I cannot reproduce the state above.

If it happens again, I will use the commands given above.

thx
Jan

btw: the logs look good (to me):

Initial startup of Nextcloud All-in-One complete!
You should be able to open the Nextcloud AIO Interface now on port 8080 of this server!
E.g. https://internal.ip.of.this.server:8080

If your server has port 80 and 8443 open and you point a domain to your server, you can get a valid certificate automatically by opening the Nextcloud AIO Interface via:
https://your-domain-that-points-to-this-server.tld:8443
++ head -1 /mnt/docker-aio-config/data/daily_backup_time

+ BACKUP_TIME=02:00
+ export BACKUP_TIME
+ export DAILY_BACKUP=1
+ DAILY_BACKUP=1
++ sed -n 2p /mnt/docker-aio-config/data/daily_backup_time
+ '[' '' '!=' automaticUpdatesAreNotEnabled ']'
+ export AUTOMATIC_UPDATES=1
+ AUTOMATIC_UPDATES=1
+ set +x
{"level":"info","ts":1684406484.4402473,"msg":"using provided configuration","config_file":"/Caddyfile","config_adapter":""}
{"level":"info","ts":1684406484.452815,"msg":"failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details."}
[18-May-2023 10:41:24] NOTICE: fpm is running, pid 106
[18-May-2023 10:41:24] NOTICE: ready to handle connections

Alright, so I am stuck again.

  • The mount point for the backup is not reachable.
  • Portainer reports the mastercontainer as healthy; all other AIO containers are not active (including nextcloud-aio-borgbackup).
  • No reply from NC or AIO.
  • The command above reveals nothing (no answer).
  • Portainer has some logs:

#7 /var/www/docker-aio/php/vendor/slim/slim/Slim/Middleware/ErrorMiddleware.php(76): Psr\Http\Server\RequestHandlerInterface@anonymous->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#8 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(121): Slim\Middleware\ErrorMiddleware->process(Object(GuzzleHttp\Psr7\ServerRequest), Object(Psr\Http\Server\RequestHandlerInterface@anonymous))
#9 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(65): Psr\Http\Server\RequestHandlerInterface@anonymous->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#10 /var/www/docker-aio/php/vendor/slim/slim/Slim/App.php(199): Slim\MiddlewareDispatcher->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#11 /var/www/docker-aio/php/vendor/slim/slim/Slim/App.php(183): Slim\App->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#12 /var/www/docker-aio/php/public/index.php(180): Slim\App->run()
#13 {main}
Daily backup script has started
grep: write error: Broken pipe
Starting mastercontainer update…
(The script might get exited due to that. In order to update all the other containers correctly, you need to run this script with the same settings a second time.)
Waiting for watchtower to stop
Creating daily backup…
Waiting for backup container to stop
Waiting for backup container to stop
Waiting for backup container to stop
Waiting for backup container to stop
Waiting for backup container to stop
Waiting for backup container to stop
Waiting for backup container to stop
Waiting for backup container to stop
Waiting for backup container to stop
Waiting for backup container to stop
Waiting for backup container to stop
Starting and updating containers…
Waiting for the Nextcloud container to start
Waiting for the Nextcloud container to start
Waiting for the Nextcloud container to start
Waiting for the Nextcloud container to start
Sending backup notification…
Daily backup script has finished
Total reclaimed space: 0B
++ head -1 /mnt/docker-aio-config/data/daily_backup_time

+ BACKUP_TIME=02:00
+ export BACKUP_TIME
+ export DAILY_BACKUP=1
+ DAILY_BACKUP=1
++ sed -n 2p /mnt/docker-aio-config/data/daily_backup_time
+ '[' '' '!=' automaticUpdatesAreNotEnabled ']'
+ export AUTOMATIC_UPDATES=1
+ AUTOMATIC_UPDATES=1
+ set +x
Deleting duplicate sessions
NOTICE: PHP message: 404 Not Found
Type: Slim\Exception\HttpNotFoundException
Code: 404
Message: Not found.
File: /var/www/docker-aio/php/vendor/slim/slim/Slim/Middleware/RoutingMiddleware.php
Line: 76
Trace: #0 /var/www/docker-aio/php/vendor/slim/slim/Slim/Routing/RouteRunner.php(56): Slim\Middleware\RoutingMiddleware->performRouting(Object(GuzzleHttp\Psr7\ServerRequest))
#1 /var/www/docker-aio/php/vendor/slim/csrf/src/Guard.php(476): Slim\Routing\RouteRunner->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#2 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(168): Slim\Csrf\Guard->process(Object(GuzzleHttp\Psr7\ServerRequest), Object(Slim\Routing\RouteRunner))
#3 /var/www/docker-aio/php/vendor/slim/twig-view/src/TwigMiddleware.php(115): Psr\Http\Server\RequestHandlerInterface@anonymous->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#4 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(121): Slim\Views\TwigMiddleware->process(Object(GuzzleHttp\Psr7\ServerRequest), Object(Psr\Http\Server\RequestHandlerInterface@anonymous))
#5 /var/www/docker-aio/php/src/Middleware/AuthMiddleware.php(38): Psr\Http\Server\RequestHandlerInterface@anonymous->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#6 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(269): AIO\Middleware\AuthMiddleware->__invoke(Object(GuzzleHttp\Psr7\ServerRequest), Object(Psr\Http\Server\RequestHandlerInterface@anonymous))
#7 /var/www/docker-aio/php/vendor/slim/slim/Slim/Middleware/ErrorMiddleware.php(76): Psr\Http\Server\RequestHandlerInterface@anonymous->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#8 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(121): Slim\Middleware\ErrorMiddleware->process(Object(GuzzleHttp\Psr7\ServerRequest), Object(Psr\Http\Server\RequestHandlerInterface@anonymous))
#9 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(65): Psr\Http\Server\RequestHandlerInterface@anonymous->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#10 /var/www/docker-aio/php/vendor/slim/slim/Slim/App.php(199): Slim\MiddlewareDispatcher->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#11 /var/www/docker-aio/php/vendor/slim/slim/Slim/App.php(183): Slim\App->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#12 /var/www/docker-aio/php/public/index.php(180): Slim\App->run()
#13 {main}
Daily backup script has started
grep: write error: Broken pipe
Starting mastercontainer update...
(The script might get exited due to that. In order to update all the other containers correctly, you need to run this script with the same settings a second time.)
Waiting for watchtower to stop
Creating daily backup...

and nothing more for a long time (the whole night).
Other instances on the same machine, like Pi-hole, were working nominally at that time.

After the mount came back to life, the mastercontainer responded immediately, and nextcloud-aio-borgbackup was started automatically. The given command reveals two more lines:

Waiting for backup container to stop
Waiting for backup container to stop

My takeaway:
Do not use mount points as the backup target, although I thought storing the backup on a different machine right away would be a good idea.

To me it looks like the mastercontainer fails to start the backup container and then waits infinitely. Any thoughts?
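
Next time it is stuck, something like this (standard Docker commands) should show whether the backup container was ever created and what it last logged:

sudo docker ps -a --filter name=nextcloud-aio-borgbackup
sudo docker logs nextcloud-aio-borgbackup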

One more wish: every log entry should have a date and time. It is hard to map the timing of the mount point events to the AIO logs.

cheers
Jan

Why is it not reachable? What kind of mount point is this?

My AIO backup is configured to use:
/mnt/backup-nc

This is not on the local drive; it is mapped to a NAS using:

sudo mount -t nfs4 -o proto=tcp,port=2049 192.168.142.142:/volume1/backups/nc_aio_backup /mnt/backup-nc

It works great most of the time, and there is no need to copy the backup to an external system/drive. It also reduces the number of (unnecessary) duplicates.

To save energy, the NAS is awake only around backup time, which usually lines up with the AIO backup time. If not, the backup/AIO stalls.

Are you correctly unmounting and mounting the nfs drive before shutting down and after starting the NAS back up?

So I am trying this in cron:

55 1 * * * sudo umount -f /mnt/backup-nc
56 1 * * * sudo mount -t nfs4 -o proto=tcp,port=2049 192.168.42.32:/volume1/backups/nc_aio_backup /mnt/backup-nc

At 01:30 the NAS starts.

At 02:00 the backup should start.
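
A more defensive variant would be a small wrapper script called from cron instead of the plain mount line. This is only a sketch; the script path and retry values are made up:

#!/bin/bash
# Hypothetical /usr/local/bin/mount-backup-nc.sh:
# keep retrying until the NFS export is really mounted, so the backup
# never starts against an empty local directory.
MOUNTPOINT=/mnt/backup-nc
EXPORT=192.168.42.32:/volume1/backups/nc_aio_backup
for i in $(seq 1 10); do
    mountpoint -q "$MOUNTPOINT" && exit 0
    mount -t nfs4 -o proto=tcp,port=2049 "$EXPORT" "$MOUNTPOINT" && exit 0
    sleep 30
done
echo "backup mount $MOUNTPOINT still not available" >&2
exit 1

Called from cron in place of the mount entry, e.g. 56 1 * * * sudo /usr/local/bin/mount-backup-nc.sh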

Today it happened again. There is one more error on my side: the backup time is in UTC.

Daily backups will be created at **02:00 UTC** which includes a

My server is ready at 02:00 local time, which is 00:00 UTC.

Which brings up another problem:
cron works on local time, which has daylight saving time, so there is no easy way to keep borgbackup and cron in sync all year round.
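
A quick way to see the mismatch with GNU date (assuming a CET/CEST local timezone, which matches the two-hour offset above):

date -d '02:00 UTC'
# with CEST (UTC+2) this prints 04:00 local time, well after the 01:30 NAS wake-up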

I really thank you for your support. So if I could wish (again):

  • In the AIO master: check whether the borgbackup repository is available; if not, skip the backup and raise an alarm/message. Otherwise (as in my case) a NAS failure automatically takes Nextcloud down with it.
  • In the AIO config: allow specifying the backup time in local time.

Thank you
Jan

You could synchronize this by setting the server to UTC as well.
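
On a systemd-based OS such as Raspberry Pi OS that would be something like this (run on the host; existing cron entries will then also be interpreted in UTC):

sudo timedatectl set-timezone Etc/UTC
sudo systemctl restart cron
timedatectl   # should now report "Time zone: Etc/UTC"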

Btw, why are you shutting down the NAS in the first place? Isn’t a NAS intended to run 24/7?

Well,

the server has multiple purposes, and I am not really confident about changing its time settings. Maybe I will.

About the NAS: it is more of a backup storage. It is controlled by an automation system which makes sure it only runs when it is needed. That saves me about 50 € per year. And yes, this has consequences, like the overhead of keeping clients in sync.
I like to save power where I can.

Any chance regarding the wishes?

I’ve had a look at the logic and it looks like the behaviour is caused by the Docker daemon when the NFS drive is not unmounted/mounted correctly before/after shutting down the NAS. So this is nothing we can fix on our side.

This will not be implemented due to problems with the backup solution that will arise when the timezone gets changed.

Ok,

thank you for your help.

Jan