Nextcloud server sync, again, now in 2019: a real use case

Dear all,

So I've been reading a lot here on the forums, on GitHub and on ownCloud's GitHub about different cases of syncing multiple Nextcloud servers and running them as a kind of cluster. The fanciest approaches, full of "birds language" terms (sic!) like Corosync/Pacemaker clusters and DRBD, were suggested here, in that thread dating back to 2016...

Now it's almost 2019 and I wonder whether anything has changed, as I have a really complicated usage scenario to present and discuss:

A small company operating in multiple countries around the globe (from Singapore to Peru and Brazil, and back to the EU and Russia), with HQ in the UAE.
All on-premises infrastructure is at HQ, but Internet in the UAE is a sad joke and close to non-existent (the prices will also surprise you a bit: for a dedicated 100 Mbps 1:1 Internet line with a static IP, Etisalat asked about 15,000 USD per month, no kidding).

Full details below

Currently the company has two Internet links:
Primary: asymmetric 300/30 Mbps PPPoE link with dynamic IP
Mail-only link: 1:1 5 Mbps link with static IP (also a crazy price, about 1,000 USD/month)
It should be mentioned that, despite the theoretical 300/30 Mbps in the contract, real-life VPN connection speeds are:
North America, Canada, Russia, Germany: 1.5-2.5 Mbps
South America: not tested
India, Singapore: 10-15 Mbps
Also, you can't argue in the UAE: officially ALL VPN-type connections are forbidden by law, so as far as I can tell they are heavily rate-limited. (To give you the full picture: no VoIP services or video/audio messengers work there either.)

Now the company struggles to share and exchange files between countries at acceptable speeds (they normally transfer proprietary scan data files of 2-60 GB each), so with 1.5 Mbps VPN performance you can imagine how long it would take...
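To put the VPN figures in perspective, here is a rough transfer-time estimate. It is a back-of-the-envelope sketch assuming decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bit/s) and a perfectly steady link with no protocol overhead or retries; the file sizes and speeds are the ones quoted above.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes over a link_mbps megabit/s link.

    Assumes decimal units and an ideal, constant-rate link
    (no TCP/VPN overhead, no interruptions).
    """
    bits = size_gb * 10**9 * 8
    seconds = bits / (link_mbps * 10**6)
    return seconds / 3600


if __name__ == "__main__":
    # File sizes and VPN speeds taken from the post.
    for size_gb in (2, 60):
        for mbps in (1.5, 2.5, 15):
            hours = transfer_hours(size_gb, mbps)
            print(f"{size_gb:>3} GB @ {mbps:>4} Mbps: {hours:6.1f} h")
```

Under these assumptions a single 60 GB file at 1.5 Mbps takes about 89 hours (roughly 3.7 days), versus about 9 hours at 15 Mbps, and real links will only be slower.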

What I suggested is to use Amazon AWS + Nextcloud.

Amazon environment config

Frankfurt (a central geographical location):
EC2 instance for Nextcloud
EC2 instance for RODC (Windows read-only domain controller)
RDS DB (cloud MySQL instance with all the fancy cloud options for DBs) for Nextcloud
AWS S3 storage as external storage for Nextcloud (not connected as primary storage, because with primary storage it stores metadata within the S3 bucket, which prevents direct external uploads/updates, etc. Correct me if I'm wrong)
Route 53 for latency-based DNS routing to the Nextcloud instances in different AWS regions (Mumbai, Singapore, Canada, Brazil, Germany), depending on the end user's/client's current location

Frankfurt, as the central location, hosts the S3 bucket used by all instances as shared file storage, an RODC domain controller to speed up AD domain-based user authentication, and the RDS DB for Nextcloud. Everything is in the same AWS VPC (which, in AWS terms, is basically a dedicated local VLAN).

The HQ office has its poor direct VPN connection to AWS in Frankfurt (see the config details above) ONLY to sync the RODC with AD (and possibly for management).

All the other geographically distributed Nextcloud instances in the regions mentioned are to share the same RDS database, the same S3 bucket and the same RODC auth server via VPC peering (another AWS term), so there will be almost no latency between them.

The limitation here is the physical upload speed from the UAE to AWS.
The only usable option in that case is S3 Transfer Acceleration (local CDN edge endpoints for S3), which really works for uploads (tested), but to use it you need a local Nextcloud instance in the UAE, which is already deployed and operational.
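For reference, S3 Transfer Acceleration is reached through a separate, bucket-specific hostname, and it has to be enabled on the bucket first. The sketch below only builds that hostname; the bucket name in the usage line is hypothetical, and whether a particular Nextcloud S3 storage configuration accepts this hostname is not confirmed in this thread. (An SDK such as boto3 can be told to use it via `botocore.config.Config(s3={"use_accelerate_endpoint": True})`.)

```python
def accelerate_endpoint(bucket: str, dualstack: bool = False) -> str:
    """Return the S3 Transfer Acceleration hostname for `bucket`.

    Transfer Acceleration must already be enabled on the bucket, and the
    bucket name must be DNS-compliant with no periods.
    """
    if "." in bucket:
        raise ValueError("Transfer Acceleration does not support bucket names with periods")
    if dualstack:
        suffix = "s3-accelerate.dualstack.amazonaws.com"  # IPv4 + IPv6 variant
    else:
        suffix = "s3-accelerate.amazonaws.com"
    return f"{bucket}.{suffix}"


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    print(accelerate_endpoint("example-nextcloud-data"))
```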

Now the real questions about Nextcloud sync:

  1. Would it work with multiple Nextcloud instances around the world all sharing one RDS DB and the same S3 bucket as external data storage? (The instance configs are identical everywhere and use the same LDAP auth backend connected to the RODC in the Frankfurt region, so all users will be the same as well.)
  2. The UAE Nextcloud instance will not use the AWS RDS DB directly because of connection speed, but will use a separate local MySQL DB, while still using the same S3 bucket as external data storage and the same auth backend with the same AD domain. Will that work?
  3. If option #2 won't work, could I use Nextcloud federation for the UAE instance to achieve seamless connectivity, file sharing, permissions and file upload with all the other geographically distributed Nextcloud instances?

Wow... It turned out very long, sorry about that... If anyone takes the time to dive into it...

Regards,
Vladimir.

Sounds like you need to contact Nextcloud directly and ask about Global Scale!

Yeah, I've seen that, though it's either heavy WIP or has never been used in real life.
Just look at what their support portal offers on it: it's only a generic placeholder...
https://portal.nextcloud.com/article/scaling-across-multiple-machines-20.html
Like blah blah "you have to consider several layers:

Application layer
Database Layer
Storage Layer

And need an IP address..."
Wow, thanks guys! :))))
Also, for custom consultation they require an Enterprise Standard subscription at 3,000 to 6,000 EUR, which I don't have. That's why I thought maybe some of our open-source gurus had similar experience, or anyone at all...

The questions are pretty straightforward for people with more Nextcloud experience than mine. I guess so...

Regards,
Vladimir.

Sure, but beware of the latency when writing anything to the database.

You won't be able to share files among users. You will only "see" the same files in the S3 bucket. For example, if you create a web link to a file, it would be a different link in AWS and in the UAE. And file versioning wouldn't work either.

Your company spends

on the Internet connection but has nothing left for a Nextcloud subscription? Well.

Btw: what's missing in your post is how you create and access the files. That is: what amount of data? How often are files created/changed? Is it OK if the files appear a day later, or do they have to be online everywhere immediately? Where are the files created? Only in the UAE, and read-only around the world? Are the files user-created or the output of some kind of program?

Could you use rclone (rclone.org) to upload files to S3?

Yeah, that's what I tested, and it's not usable due to latency: it takes about 3 minutes to browse a folder, or even just a list of folders. I tried both over VPN and with a direct connection to the RDS DB.

What about option #3 then, federation for the local UAE instance? It would have a local DB connection, so no issues with access speed, but the question is whether it allows smooth collaboration for users in other locations: file editing, uploads and ESPECIALLY assigning share/access rights for federated servers (again, the users will be the same, from the same AD domain).
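One way to read the 3-minute figure: if the application issues its database queries one after another, every query pays a full network round trip, so the total wait is roughly (number of round trips) × RTT. The sketch inverts that to estimate how many sequential round trips a 3-minute listing would imply. The RTT values are hypothetical placeholders, not measurements from this setup, and a real page load also includes S3 calls and server-side work.

```python
def implied_round_trips(total_seconds: float, rtt_ms: float) -> float:
    """How many strictly sequential round trips of rtt_ms fit into total_seconds."""
    return total_seconds / (rtt_ms / 1000)


if __name__ == "__main__":
    observed = 3 * 60  # "about 3 mins to browse a folder", from the post
    for rtt_ms in (1, 50, 150, 300):  # hypothetical RTTs: same VPC ... slow VPN
        trips = implied_round_trips(observed, rtt_ms)
        print(f"RTT {rtt_ms:>3} ms -> ~{trips:,.0f} round trips")
```

Read the other way: a number of sequential queries that takes minutes over a high-RTT link takes on the order of a second when the RTT is around 1 ms, which is why database latency matters so much in a shared-DB design.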

It was only a proposal. As stated in the "details" sub-section, the current speed is 300/30 Mbps for less than 1,000 USD per month. If we could afford such a budget, I wouldn't have asked for a solution... It's ridiculously expensive...

So I expected users to create files either by uploading from a browser or by syncing a specific folder.
The total amount is limited to 10 TB for this bucket. Update frequency: I can't really tell; it's just common corporate department folder data, updated multiple times every day. Files are created in every location. Some delay before files appear is acceptable, but I planned to use a cron task to update the file lists on each server (I also guess this is built in now for external storage; there are specific options for it in the external storage settings).

The question here is how exactly Nextcloud uploads a file. As I stated in the full description, the common scenario is uploading 20-60 GB files at once (I will ask them to cut them into 5 GB parts just in case). So what exactly happens during an upload? Is the file first cached locally on the Nextcloud instance, or uploaded directly to S3 from the client session? I'm asking because I currently only have 40 GB of free space on each instance in every country.

I haven't tried rclone, although I'm aware of it (and of several other solutions such as the AWS File Gateway appliance). We don't have a use case for direct uploads, except possibly at the initial seeding stage...

Regards,
Vladimir.

I think most Global Scale setups have quite good connections between the servers. When you share content via federated sharing, the data remains on the original host and is not cached (as far as I know). If you were to consider some sort of cache and it is mainly that others need read access to the documents, this is much easier than read/write access. With read/write you could perhaps lock files while someone has them open (a lost connection would be a problem here); this could work in the web interface.
However, if you want to introduce some sort of caching, you probably need to get into the code and have some experts on hand, and the best way to get them is probably an enterprise subscription.

If you don't have the money, you can only build something yourself. Federated sharing should work for your locations with good bandwidth. Then you could imagine some sort of exchange and caching of files, e.g. shared folders from other sites that are mirrored (with low-level tools like rsync) to your less well-connected locations. These are just copies of the documents. If there are changes, they have to be sent back to the other locations; perhaps there is some kind of workflow you could implement.

If you are working for education or an NGO, they might have reduced rates.

What I saw: the file gets split into 10 MB chunks (or maybe 1 MB) and put into /upload_tmp (or another configured directory). After the upload is completed, the chunks are copied into one file, which is moved to its final position. This way you can upload large files even if the transfer is interrupted in between. After it is copied/moved, the file appears in the GUI and/or the desktop client starts downloading it.
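The chunked upload described above can be sketched like this. It is a generic illustration of the split, stage, assemble and move pattern, not Nextcloud's actual code: the 10 MiB chunk size, the zero-padded chunk names and the `.part` temporary file are hypothetical choices.

```python
import os
import tempfile

CHUNK_SIZE = 10 * 1024 * 1024  # hypothetical 10 MiB; the post is unsure between 1 MB and 10 MB


def split_into_chunks(src_path: str, staging_dir: str, chunk_size: int = CHUNK_SIZE) -> list:
    """Write src_path into numbered chunk files inside staging_dir."""
    chunk_paths = []
    with open(src_path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            path = os.path.join(staging_dir, f"{index:08d}")  # zero-padded so names sort in order
            with open(path, "wb") as chunk:
                chunk.write(data)
            chunk_paths.append(path)
            index += 1
    return chunk_paths


def assemble(chunk_paths: list, dest_path: str) -> None:
    """Concatenate chunks in order into a temporary file, then move it into place."""
    tmp_path = dest_path + ".part"
    with open(tmp_path, "wb") as out:
        for path in sorted(chunk_paths):
            with open(path, "rb") as chunk:
                out.write(chunk.read())
    os.replace(tmp_path, dest_path)  # rename; atomic when both paths are on one filesystem


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "scan.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(25))
        staging = os.path.join(work, "upload_tmp")
        os.mkdir(staging)
        chunks = split_into_chunks(src, staging, chunk_size=10)
        assemble(chunks, os.path.join(work, "final.bin"))
        print(f"{len(chunks)} chunks assembled")
```

Because each chunk is a separate file, an interrupted transfer only needs to resend the chunks missing from the staging directory. Note that in this sketch the staging area briefly holds at least a full copy of the file before it is moved, so local free space matters for very large uploads.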

Thank you for your answer, I do understand that, but in my case all Nextcloud instances, regardless of location and connection speed (only the UAE has a slow connection; all the others will be AWS EC2 instances, so they will be perfectly interconnected), will use the same S3 bucket in Frankfurt, as I described in the details. So in theory the data will be the same for all users...

So in the federation case I can't really imagine any issues at that stage (not reconfiguring the UAE instance). We'll see. I may also post updates here just in case; maybe someone will find it useful in the future...

Oh, is that so? In that case it's all fine, and my instances will not overload their small local storage! Thanks! For some reason I failed to find that description in the docs... maybe it's just me...

Regards,
Vladimir.

You want to set up multiple Nextcloud servers sharing the same S3 bucket as main storage but with individual databases?

Did you ever look at the directory structure in the data storage folder (in your case the S3 bucket)?

Yes, inside the bucket itself it looks like any other folder, a simple hierarchical structure. Why?
With plain external storage, Nextcloud stores all metadata in the DB, if that's what you meant; S3 only stores metadata within the files' properties when it's set up as primary storage. I can't tell whether all the remote servers will work fine in such a scenario (conflicts, file locking, etc.).
If some server doesn't see updates to the files, we have an occ cron job to update that. Also, as far as I understand, there is exactly that feature in the external storage options, where you can choose to update the file list on every access...

I'm now thinking that federation is the only solution in my case... (all locations just share the same RDS DB as different EC2 instances in different regions, with one UAE instance federated)
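The cron approach mentioned above could look roughly like this: a small wrapper that builds an `occ files:scan` call, either for one user's path with `--path` or for all users with `--all`. The install path `/var/www/nextcloud/occ` and the example path `/alice/files/S3-Shared` are hypothetical, and the script is assumed to run as the web-server user (for example from that user's crontab).

```python
import shlex
import subprocess
from typing import List, Optional

OCC = "/var/www/nextcloud/occ"  # hypothetical install path


def scan_command(path: Optional[str] = None, occ: str = OCC) -> List[str]:
    """Build an `occ files:scan` invocation: one path (/<user>/files/...) or --all."""
    cmd = ["php", occ, "files:scan"]
    cmd += [f"--path={path}"] if path else ["--all"]
    return cmd


def run_scan(path: Optional[str] = None) -> None:
    # Meant to run as the web-server user (e.g. www-data); raises on a non-zero exit.
    subprocess.run(scan_command(path), check=True)


if __name__ == "__main__":
    # Print the command instead of running it; hypothetical user and mount name.
    print(shlex.join(scan_command("/alice/files/S3-Shared")))
```

Scanning a single mount path is usually cheaper than `--all` when only the shared S3 folder changes between servers.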

Regards,
Vladimir

Exactly. I would say: "good luck with this..."

But you have no file locking, right?

But all access to a file would then go to the slow UAE server, right?

The questions at the end of this video are related to your problem, and I'm afraid the answer won't make you happy.

Btw: why do you use Nextcloud? Why not use something like https://mountainduck.io/ to access the S3 storage directly? (OK, it costs money, but it's something you buy, not R&D.)

No, file locking in Nextcloud is managed by the server (or the DB?), so AGAIN, theoretically, with a single cloud DB for all servers it should handle file locking correctly.

Again no: with S3 acceleration on the Nextcloud side, upload speed from the UAE increases 10-100 times, since it uses a local CDN endpoint... That's the whole and ONLY reason I'm dealing with the UAE and all that right now. Otherwise, whatever the solution, uploading just wouldn't be possible. With a normal non-accelerated S3 connection it's the same 1-3 Mbps, and it wouldn't work with large files anyway.
Also, federated server access will be relatively fast (yes, it needs to access S3 for every update, but again that's kind of acceptable given that the DB is local). Not sure how file locking is managed in a federated access scenario, though...

First, I've been using Nextcloud since its very first alphas, back when it was ownCloud! :slight_smile: Second, sadly, no other third-party service suits us: it's a corporate environment, an AD domain, tons of permissions and users... everything integrated... that's the reason. Azure would kind of work, but it really sucks in terms of management, and it's MANY times more expensive than AWS at 10 TB+... it also has the very same issues with permissions...

Regards,
Vladimir.