Originally published at: https://nextcloud.com/blog/nextcloud-announces-global-scale-architecture-as-part-of-nextcloud-12/
Together with the availability of Nextcloud 12, we’re proud to announce a new architecture for scaling Nextcloud several orders of magnitude beyond its current limits, lowering costs for medium to large deployments and increasing the flexibility to spread data out between data centers. Named Global Scale, this new architecture has its first pieces included with Nextcloud 12 and further implementation will become available over the coming months.
What is Global Scale
Nextcloud is used by home users and small businesses as well as large organizations like universities, big companies and government agencies. Nextcloud 11 introduced significant performance improvements which, combined with further work in Nextcloud 12, enable customers to deploy installations of up to tens of thousands of users. Above that scale, however, federation has to be employed, which requires separate instances, each with its own login.
Global Scale was designed to overcome the limitations at large scale, but also to achieve other benefits. The main goals are these:
Scalability
Achieve several orders of magnitude greater scaling. It is hard to scale the standard architecture to instances of over a hundred thousand users. The shared components (load balancers, hosting center uplink, database, storage and cache) will sooner or later become bottlenecks. The database in particular is hard to scale beyond a four-node Galera Cluster, which limits the number of users and files. Nextcloud Global Scale aims to support up to hundreds of millions of users.
Cost efficiency
Several big Nextcloud users raised the issue that scaling the storage becomes exponentially more expensive when dealing with a high number of petabytes. Software-defined storage and object stores are unfortunately not a solution in practice. One of our technical university customers estimated that 60% to 80% of the cost of running a file sync and share service is caused by the storage subsystem alone. It would significantly lower the total cost of ownership if the storage could be distributed over several smaller, affordable storage systems. Similar cost issues exist with other pieces of the infrastructure, like load balancers and databases, where free or cheap solutions are not sufficient at large scale. Nextcloud Global Scale will enable deployment on commodity hardware and software, dramatically decreasing costs for large systems.
Global distribution
A frequent need for large Nextcloud users is distributing data over multiple hosting centers in different countries or even continents. This can be due to legal requirements on where data is stored, to increase performance by bringing data closer to users, or for cost, security or auditing reasons. With the current architecture, the choice is between a data replication solution for storage, which meets few of these goals and is neither fast nor cost effective, and running multiple separate instances, which requires users to log in to different portals. Nextcloud Global Scale will hide the existence of multiple data centers, making its architecture entirely transparent to users.
How does it work
Nextcloud Global Scale works by effectively removing the need for the shared components of the existing architecture: the load balancers, hosting center uplink, database, storage and cache. It uses multiple independent application servers, called nodes, each running on standard, inexpensive commodity hardware. Storage, database and cache run locally on the application servers and no longer have to be kept in sync.
The Node
Nodes can be located in different data centers and be as small or large as any current Nextcloud instance. A sensible scale would be at least 2 machines, providing redundancy in case of hardware failure. The machines run a web server with TLS, Nextcloud, local storage, a local database and a local cache. The nodes use central remote logging and central authentication, for example LDAP. The nodes could be managed using a standard technology like Docker containers to ease deployment and maintenance.
Global Site Selector
The Global Site Selector (GSS) acts as a central instance that the user accesses during the first login, via the Web, WebDAV or REST. The GSS authenticates the user against the central user management, for example LDAP. It then asks the Lookup Server which node the user is located on and redirects the user to the right hostname. Subsequent calls during the same session go directly from the client to the node.
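The login flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual GSS implementation: the class name, the in-memory dictionaries standing in for LDAP and the Lookup Server, and the example hostname are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GlobalSiteSelector:
    """Toy model of the GSS; all names here are hypothetical."""

    # Stand-in for the central user management (e.g. LDAP): user id -> password.
    credentials: Dict[str, str]
    # Stand-in for the Lookup Server: user id -> hostname of the user's node.
    user_nodes: Dict[str, str]

    def login(self, user_id: str, password: str, path: str = "/") -> Optional[str]:
        """Return the URL to redirect the client to, or None if login fails."""
        # Step 1: authenticate against the central user management.
        if self.credentials.get(user_id) != password:
            return None
        # Step 2: look up the node that holds this user's account.
        node = self.user_nodes.get(user_id)
        if node is None:
            return None
        # Step 3: redirect; later calls in the session go straight to the node.
        return f"https://{node}{path}"


gss = GlobalSiteSelector(
    credentials={"alice": "secret"},
    user_nodes={"alice": "node-eu1.example.com"},
)
redirect = gss.login("alice", "secret", "/remote.php/dav")
```

After the redirect the client talks only to the node, so the GSS handles a single request per session rather than all traffic.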
Lookup Server
The Lookup Server stores the physical location of a user. It can be queried using a valid user ID to fetch the federated sharing ID of a user. In some situations, it is important to limit queries to a certain IP space to avoid data leaks. It also keeps track of old federated sharing IDs. The Lookup Server also stores additional user data, for example required QoS metrics such as storage/quota settings, speed class and reliability class.
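As a rough sketch of the data the Lookup Server holds, the following Python model keeps a current federated sharing ID, a history of old IDs and a QoS dictionary per user, and rejects queries from outside an allowed IP range. The class names, field names and the `user@host` ID format used in the example are assumptions made for illustration; the real Lookup Server may look quite different.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserRecord:
    federated_id: str  # current federated sharing ID (format assumed)
    old_federated_ids: List[str] = field(default_factory=list)
    qos: Dict[str, str] = field(default_factory=dict)  # e.g. quota, speed class


class LookupServer:
    """Toy in-memory lookup server; all names are hypothetical."""

    def __init__(self, allowed_network: str) -> None:
        # Only clients inside this IP space may query the server.
        self.allowed = ipaddress.ip_network(allowed_network)
        self.records: Dict[str, UserRecord] = {}

    def register(self, user_id: str, federated_id: str,
                 qos: Optional[Dict[str, str]] = None) -> None:
        """Create a record, or update it while remembering the old ID."""
        record = self.records.get(user_id)
        if record is None:
            self.records[user_id] = UserRecord(federated_id, qos=dict(qos or {}))
            return
        if record.federated_id != federated_id:
            record.old_federated_ids.append(record.federated_id)
            record.federated_id = federated_id
        if qos:
            record.qos.update(qos)

    def lookup(self, user_id: str, client_ip: str) -> Optional[UserRecord]:
        """Return the user's record, refusing clients outside the allowed range."""
        if ipaddress.ip_address(client_ip) not in self.allowed:
            raise PermissionError(f"queries from {client_ip} are not allowed")
        return self.records.get(user_id)


server = LookupServer("10.0.0.0/8")
server.register("alice", "alice@node-us1.example.com", {"speed_class": "standard"})
server.register("alice", "alice@node-eu1.example.com")  # account moved to a new node
record = server.lookup("alice", "10.1.2.3")
```

Keeping the old IDs lets shares that still point at a previous node be resolved to the user's current location.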
Balancer
The Balancer runs on a dedicated machine, monitoring the various nodes and their storage, CPU, RAM and network utilization. It can mark nodes as online or offline and initiate the migration of user accounts to different nodes based on data in the Lookup Server, such as business or legal requirements, QoS settings or user location. If, for example, a user moved from the US to Europe, the Balancer would initiate a migration of their data to an EU data center to improve the quality of service.
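One way to picture the Balancer's decision is the minimal Python sketch below. It assumes a deliberately simple policy that the announcement does not specify: a user should live on an online node in their required region, and among eligible nodes the one with the least storage in use is preferred. The `NodeStatus` fields, the 0.8 storage threshold and the hostnames are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeStatus:
    hostname: str
    region: str          # e.g. "US" or "EU"
    online: bool
    storage_used: float  # fraction of storage in use, between 0.0 and 1.0


def pick_target(nodes: List[NodeStatus], required_region: str,
                max_storage_used: float = 0.8) -> Optional[NodeStatus]:
    """Pick the least-full online node in the required region, if any."""
    candidates = [
        n for n in nodes
        if n.online and n.region == required_region
        and n.storage_used < max_storage_used
    ]
    return min(candidates, key=lambda n: n.storage_used, default=None)


def plan_migration(current: NodeStatus, nodes: List[NodeStatus],
                   required_region: str) -> Optional[str]:
    """Return the hostname to migrate to, or None.

    None means either that the user is already well placed or that no
    suitable target exists; a real balancer would distinguish the two.
    """
    if current.online and current.region == required_region:
        return None
    target = pick_target(nodes, required_region)
    return target.hostname if target else None


nodes = [
    NodeStatus("us1.example.com", "US", True, 0.4),
    NodeStatus("eu1.example.com", "EU", True, 0.7),
    NodeStatus("eu2.example.com", "EU", True, 0.3),
    NodeStatus("eu3.example.com", "EU", False, 0.1),
]
# A user currently on the US node now needs to be hosted in the EU.
target_host = plan_migration(nodes[0], nodes, "EU")
```

A production balancer would also weigh CPU, RAM and network utilization as well as the QoS classes stored in the Lookup Server; this sketch only considers region and storage.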
You can learn more details on our webpage about Global Scale.
Today: Nextcloud 12
Today, Nextcloud 12 is released, and with it come the first components of Global Scale. It introduces federated activities so users can see what happened to their data, even if it is shared to another node. We have an implementation of the (standalone) Lookup Server, and a Global Site Selector app will be available in a few days for people interested in testing it. Work on user migration has started, and we plan to federate comments and release the Balancer over the coming months.
Need to scale? Contact us!
There is much more to Global Scale which our experienced engineering team will continue to develop and implement, working closely with customers, partners and of course our community.
If you are interested in the scaling and cost benefits or would like to explore the legal consequences of being able to decide where data resides, you can get in contact with our sales team.
"Nextcloud continues to deliver impressive innovations. With GS, Nextcloud drives efficiency for cost-effective private cloud deployments to service internal and external customers which can be utilized while extending the service on a large scale."
-- Florian Hausleitner, Senior IT System Engineer Datacenter Services at Raiffeisen Informatik Center Steiermark