Move existing /data storage to an NFS share (and how to scale properly)

vongolashu · March 6, 2019, 11:22pm

So I have an instance of Nextcloud up and running on Linux with nginx, MariaDB and a Redis cache.
Luckily, I set up my data directory for this instance outside the webroot during the initial install (I did not know about NFS or Global Scale at that time).

Currently, my NC is installed in /home/cloud and my data directory is in /home/data.

Now I want to test NC’s scalability and stress-test it a bit (I am building a proof of concept for a challenge: a proposal to our local government to adopt it as the digital file/document solution for their agencies/systems).

At the very least, I want to scale my frontend to 2 or 3 servers and move my existing data to NFS, since I think that makes much more sense.
I have read the scalability topics and have a general idea of how to set up the frontend server(s) behind HAProxy and then move the database to a separate server as well.

My confusion is about which specific changes need to be made in config.php for this move, and whether moving the database to a separate/remote server would add delay/lag to responses, or whether caching would solve that.
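A rough sketch of the config.php keys that typically change in such a move, written in Python only to render the PHP array entries. `datadirectory`, `dbhost` (which accepts `host:port`) and `trusted_proxies` are standard Nextcloud settings; the IP addresses are hypothetical placeholders. In a multi-frontend setup the whole config.php is usually kept identical on every node.

```python
# Sketch: render the config.php entries that change when the data directory
# moves to a shared mount and MariaDB moves to its own server.
# All addresses are hypothetical placeholders.

def php_value(value):
    """Render a Python str/bool/int/list as a PHP literal."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    if isinstance(value, bool):  # check bool before int (bool is an int subclass)
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, list):
        return "[" + ", ".join(php_value(v) for v in value) + "]"
    raise TypeError(f"unsupported type: {type(value)!r}")


def render_config_entries(settings):
    """Return the lines to merge into the $CONFIG array in config.php."""
    return [f"  '{key}' => {php_value(val)}," for key, val in settings.items()]


changes = {
    # Same path on every frontend; the shared storage is mounted here.
    "datadirectory": "/home/data",
    # MariaDB on its own server (hypothetical address).
    "dbhost": "10.0.0.20:3306",
    # The HAProxy node, so Nextcloud trusts its forwarded client IPs.
    "trusted_proxies": ["10.0.0.10"],
}

for line in render_config_entries(changes):
    print(line)
```

After merging, `occ config:system:get datadirectory` should report the new path.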

Secondly, how do I make the move to NFS, and do you have any tips? It’s my first time trying my hand at NFS.
And is NFS as safe as object storage like Ceph (in terms of replication/data loss)? If my HDD or one of my storage servers crashes, the data should come back online ASAP.

The usage/stress scenario I have in mind: a file or video is shared and, say, 1000-5000 people open it at once to watch the video or download the file.
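For a scenario like this, outgoing bandwidth is usually the first limit to check. A back-of-the-envelope sketch, assuming a hypothetical bitrate of 2.5 Mbit/s per viewer and 1 Gbit/s network interfaces (neither is a measured value):

```python
import math

# Back-of-the-envelope egress estimate for many concurrent viewers.
# Bitrate and NIC speed are assumptions; measure your own videos and links.

def required_egress_gbit(viewers, bitrate_mbit):
    """Aggregate outgoing bandwidth in Gbit/s if all viewers stream at once."""
    return viewers * bitrate_mbit / 1000


def nodes_needed(viewers, bitrate_mbit, nic_gbit_per_node):
    """Minimum number of nodes whose NICs together cover that egress."""
    return math.ceil(required_egress_gbit(viewers, bitrate_mbit) / nic_gbit_per_node)


for viewers in (1000, 5000):
    print(viewers, required_egress_gbit(viewers, 2.5), nodes_needed(viewers, 2.5, 1))
```

Under these assumptions, 5000 viewers need about 12.5 Gbit/s. A single HAProxy node and the storage servers behind the frontends have to carry that same aggregate traffic, not only the application servers.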

Nudelpaj · March 7, 2019, 7:56am

I tested SFTP server-to-server and to Windows; it’s more secure.

Reiner_Nippes · March 7, 2019, 4:45pm

shut down nextcloud (nginx, php, redis, db)
move /home/data to /home/data.local
mount your nfs share at /home/data and move/copy all data from /home/data.local to /home/data. then start nextcloud again.
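The steps above as a dry-run Python sketch that only prints the commands. Assumptions not in the post: systemd service names `nginx`, `php7.2-fpm`, `redis-server` and `mariadb`, a hypothetical NFS export `nfs1:/export/data`, and rsync for the copy (`-a` keeps ownership, permissions and hidden files such as `.ocdata`, which Nextcloud expects in the data directory).

```python
import shlex

# Hypothetical names: adjust to your distribution and NFS server.
SERVICES = ["nginx", "php7.2-fpm", "redis-server", "mariadb"]
NFS_EXPORT = "nfs1:/export/data"
DATA_DIR = "/home/data"
LOCAL_COPY = DATA_DIR + ".local"


def migration_commands():
    """Return the migration steps as argv lists, in execution order."""
    cmds = [["systemctl", "stop", s] for s in SERVICES]
    cmds += [
        ["mv", DATA_DIR, LOCAL_COPY],  # keep the old data as a fallback
        ["mkdir", DATA_DIR],           # empty mount point
        ["mount", "-t", "nfs", NFS_EXPORT, DATA_DIR],
        # Trailing slashes: copy the *contents* of the old directory.
        ["rsync", "-a", LOCAL_COPY + "/", DATA_DIR + "/"],
    ]
    # Start in reverse order: database and cache before PHP and the web server.
    cmds += [["systemctl", "start", s] for s in reversed(SERVICES)]
    return cmds


if __name__ == "__main__":
    for argv in migration_commands():
        # Dry run: print instead of subprocess.run(argv, check=True).
        print(shlex.join(argv))
```

Keeping /home/data.local until Nextcloud runs correctly from the share gives an easy way back; a permanent mount additionally needs an /etc/fstab entry.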

don’t forget the redis server.
in case you need high availability, everything has to be set up as a cluster. that gets tricky and complicated.

on the other hand, I read that accessing the redis and/or db server over the network instead of a linux socket could hit you with a 10-20% performance penalty. but you may notice this only when a lot of people access a lot of individual small files, not with video streaming. so maybe you are better off with a bigger machine.
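For reference, the socket and network variants differ only in the `redis` entry of config.php: a unix socket path with `port` 0 on a single machine, or a host and port (6379 by default) once Redis lives on another server. A small Python sketch building both as dicts; the socket path and IP address are hypothetical and vary by distribution.

```python
# Sketch: the 'redis' entry of config.php for a local socket vs a remote host.
# Paths and addresses are hypothetical examples.

def redis_config(host, port=None):
    """Return Nextcloud's Redis-related settings as a Python dict."""
    if host.startswith("/"):
        # Unix socket: Nextcloud expects port 0 together with a socket path.
        redis = {"host": host, "port": 0}
    else:
        # TCP: required as soon as Redis runs on a separate server.
        redis = {"host": host, "port": port or 6379}
    return {
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "memcache.distributed": "\\OC\\Memcache\\Redis",
        "redis": redis,
    }


local = redis_config("/var/run/redis/redis.sock")
remote = redis_config("10.0.0.20")
print(local["redis"])   # {'host': '/var/run/redis/redis.sock', 'port': 0}
print(remote["redis"])  # {'host': '10.0.0.20', 'port': 6379}
```

With several frontends, file locking should go through one Redis instance that all of them share, so the TCP variant becomes unavoidable there.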

you may find my playbook helpful when you set up a lot of nextclouds for testing.

Reiner_Nippes · March 7, 2019, 9:07pm

that’s not a matter of nfs vs. ceph. it’s a matter of the underlying hardware. if you cluster everything or buy a NetApp filer, you’ll be “safe”. it’s a matter of €/$ and your skills.

for me the problem with ceph/s3 as main storage for nextcloud is that i can’t see how to recover a single file from backup, or how to recover from a database loss. when you set up a system with ceph/s3 as main storage and look at the objects in the bucket, you will know what i mean. (or someone will provide the answer to my question.)
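To illustrate the issue: with S3/Ceph as primary storage, Nextcloud names each object after the file’s numeric id (`urn:oid:<fileid>` by default), and the mapping from id to path lives only in the database (the `oc_filecache` table, assuming the default `oc_` prefix). A toy Python sketch with made-up rows:

```python
# Toy model: object keys in a Nextcloud primary-storage bucket vs the
# filecache rows that map them back to paths. The rows below are made up.

filecache = {  # fileid -> path relative to the user's storage
    1041: "files/Documents/report.odt",
    1042: "files/Videos/intro.mp4",
}


def object_key(fileid, prefix="urn:oid:"):
    """Object name used for a file in the bucket."""
    return f"{prefix}{fileid}"


def path_for_object(key, cache, prefix="urn:oid:"):
    """Recover a file's path from its object key; this needs the database."""
    if not key.startswith(prefix):
        return None
    return cache.get(int(key[len(prefix):]))


bucket = [object_key(fid) for fid in filecache]
print(bucket)                                      # ['urn:oid:1041', 'urn:oid:1042']
print(path_for_object("urn:oid:1042", filecache))  # files/Videos/intro.mp4
print(path_for_object("urn:oid:1042", {}))         # None: without the DB, only ids remain
```

A bucket backup therefore has to be restored together with a matching database backup to find or restore a single named file.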

vongolashu · March 8, 2019, 10:18am

Is shutting down redis also necessary?
And that approach does seem simple and easy to follow; I will do that.

Yes, in time I will distribute everything across multiple servers, but for now I will keep redis and the database on the same server, with one instance of each, since database replication/distribution seems a bit complicated right now.

Also, which network filesystem do you suggest? I was looking at GlusterFS. Do I just set it up on a separate server and mount it as /data on all my application servers?
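Mounting a GlusterFS volume on each application server is typically done with the native FUSE client and an /etc/fstab entry. A Python sketch that builds such a line; the volume name `ncdata` and the server names are hypothetical, and the `backup-volfile-servers` option (a colon-separated list of servers the client can fetch the volume layout from if the first one is down) should be checked against the GlusterFS docs for your version.

```python
# Sketch: build the /etc/fstab line each Nextcloud frontend would use to
# mount a GlusterFS volume with the native (FUSE) client.
# Volume and server names are hypothetical.

def gluster_fstab_line(servers, volume, mountpoint):
    """The first server provides the volfile; the others are fallbacks."""
    primary, *backups = servers
    options = ["defaults", "_netdev"]  # _netdev: mount only once networking is up
    if backups:
        options.append("backup-volfile-servers=" + ":".join(backups))
    return f"{primary}:/{volume} {mountpoint} glusterfs {','.join(options)} 0 0"


print(gluster_fstab_line(["storage1", "storage2"], "ncdata", "/home/data"))
# storage1:/ncdata /home/data glusterfs defaults,_netdev,backup-volfile-servers=storage2 0 0
```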

Thanks again

Yep, I tried object storage once (Swift) when playing with Seafile. It was an interesting experience and I learned something new, but if it doesn’t really offer any advantage over NFS (in terms of replication/data backup), I will continue with NFS.
There is also the fact that I don’t foresee myself using Amazon or any cloud provider’s storage options (S3/block) due to their sheer costs.

Reiner_Nippes · March 8, 2019, 4:19pm

stopping nginx should be sufficient, but stopping everything doesn’t hurt. and if you have only test data and no test users, you probably don’t need to shut down anything.

imho setting up a highly available nfs cluster isn’t any easier.

you can do that, of course. but if that machine is down, your nfs is down and your nextcloud is down.

we should first define your requirements (performance, availability, size) and your budget.

maybe you should look at The Definitive Guide: Ceph Cluster on Raspberry Pi | Bryan Apperson (low budget) or AWS EFS (high budget). it depends on your requirements.

ok. no AWS.

scaleway and digital ocean offer 250/500GB for just $5/5€ per month. maybe consider them for backup.

vongolashu · March 8, 2019, 4:51pm

Yep, I am a huge fan of Scaleway and their pricing (I’m actually a long-time online.net user), but the problem is the outgoing bandwidth cost.
Since I am just setting everything up for a trial run, costs need to be minimal: I will be paying out of pocket for the first few months, until the concerned parties/department adopt it and roll it out (on their own government servers).

I checked the GlusterFS docs and it seems you need at least 2 servers to run it, though the documentation seemed easy enough to follow. I will just do it their way, since it’s better to have replication from the start, just in case.
OR
I could go the Ceph route. It would surely be harder to set up initially, but I keep reading that it’s more reliable and faster than plain NFS/GlusterFS.

What do you think?
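On the two-server minimum: a two-way replica cannot tell which copy is correct once the servers lose contact with each other (split-brain), which is why the GlusterFS documentation recommends replica 3 or an arbiter brick, and Ceph clusters run an odd number of monitors. A small Python sketch of simple majority-quorum arithmetic (a simplified model, not either project’s exact rules):

```python
# Majority quorum: how many members must agree, and how many may fail.

def majority(n):
    """Smallest group that is more than half of n members."""
    return n // 2 + 1


def tolerated_failures(n):
    """Members that can be lost while a majority still remains."""
    return n - majority(n)


for n in (1, 2, 3, 5):
    print(f"{n} nodes: quorum {majority(n)}, survives {tolerated_failures(n)} failure(s)")
# 1 nodes: quorum 1, survives 0 failure(s)
# 2 nodes: quorum 2, survives 0 failure(s)
# 3 nodes: quorum 2, survives 1 failure(s)
# 5 nodes: quorum 3, survives 2 failure(s)
```

Under a strict majority rule, a second server adds a copy of the data but does not keep the volume writable when one server fails; a third node does, even a small arbiter that stores only metadata.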

So this is what I am planning:
1 x small load balancer server (HAProxy)
2 x medium application server (nextcloud)
1 x medium database server (MariaDB, or should I set up Galera from the get-go?)
2 x storage servers (dedicated)

Apart from the storage servers, I plan to run all the other servers as Scaleway cloud nodes, so they are easier to manage and faster to scale for now.

In time, I will add 1 more database slave server and separate redis/cache server.
I am using NC’s Registration add-on, so I don’t think I need an LDAP server (as illustrated in the NC scaling docs).
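The planned layout can be written down as data and checked for single points of failure. A Python sketch; the roles and counts come from the plan above (with Redis on the database server for now), and treating any role with a single instance as a single point of failure is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass
class Role:
    name: str
    count: int
    host_group: str  # roles in the same group share machines


plan = [
    Role("haproxy", 1, "lb"),
    Role("nextcloud", 2, "app"),
    Role("mariadb", 1, "db"),
    Role("redis", 1, "db"),  # kept on the database server for now
    Role("storage", 2, "storage"),
]


def single_points_of_failure(roles):
    """Roles with one instance: losing that machine stops the service."""
    return [r.name for r in roles if r.count < 2]


def roles_sharing_a_host(roles, group):
    """Roles that go down together when that host group's machine fails."""
    return [r.name for r in roles if r.host_group == group]


print(single_points_of_failure(plan))    # ['haproxy', 'mariadb', 'redis']
print(roles_sharing_a_host(plan, "db"))  # ['mariadb', 'redis']
```

The two storage servers only drop off this list if the volume is actually replicated across them; the planned database slave and separate Redis server would shorten it further.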

system · September 23, 2024, 5:14pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.