Event-driven server

Hi team,

I want to ask about the possibility of rearchitecting Nextcloud to use an event-driven architecture. This is something I’m willing to work on, and I do think it’s possible given the way Nextcloud works. Of course it’ll be a massive breaking change.

Basically, I did tons of profiling in the last few weeks, and most of the “Nextcloud is slow” hit comes from initial server boot and app registration, so I believe this would settle most performance issues once and for all.

My questions are:

  1. Has this been attempted before?
  2. If yes, then why did it fail?
  3. Regardless of the first two, would Nextcloud be willing to accept such a patch? This is the critical piece for me, since otherwise I’d have to fork Nextcloud, where I’d do things differently. I just want to make sure there’s no wasted effort.

My experience with reports of slowness has been that it mostly seems to result from inadequate hardware or improper/non-standard configuration. A lot of them seem to be nginx users too.

In any case, maybe what you could do is make your own version of the docker image where it could be easily tested, perhaps even made swappable with the official docker image.

I believe I’ve done most configuration “tricks” possible (even tried OPCache preloading, which was a lot of work) to make this faster. The bottom line is PHP’s model of bootstrapping everything for each request is just a lot of overhead.
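To illustrate the per-request bootstrap overhead described above, here is a minimal sketch in Python (not PHP, and not Nextcloud code). The names `bootstrap`, `handle`, `serve_per_request` and `serve_long_running` are hypothetical stand-ins; the only point is how often the bootstrap step runs in each model.

```python
def bootstrap(stats):
    """Hypothetical stand-in for booting the server core and registering apps."""
    stats["boots"] += 1
    return {"apps": ("files", "photos", "calendar")}


def handle(state, request):
    """Hypothetical request handler that only needs the already-booted state."""
    return f"{request} served with {len(state['apps'])} apps"


def serve_per_request(requests):
    """Classic PHP request model: bootstrap again for every single request."""
    stats = {"boots": 0}
    responses = [handle(bootstrap(stats), r) for r in requests]
    return responses, stats["boots"]


def serve_long_running(requests):
    """Long-running worker: bootstrap once, then reuse the state for all requests."""
    stats = {"boots": 0}
    state = bootstrap(stats)
    responses = [handle(state, r) for r in requests]
    return responses, stats["boots"]


if __name__ == "__main__":
    previews = [f"preview-{i}" for i in range(100)]
    print("bootstraps, per-request model:", serve_per_request(previews)[1])
    print("bootstraps, long-running model:", serve_long_running(previews)[1])
```

Both functions return the same responses; only the number of bootstrap calls differs, which is the overhead being argued about here.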

Note: I’m not complaining Nextcloud is “slow”. It’s not, in most cases. But there are times where it’s just very inefficient. Try loading a hundred pre-generated previews; that really pushes my machine hard (and that’s a ridiculously fast machine; probably some of the best that money can buy). And loading a hundred previews is not a contrived benchmark scenario at all, it’s a basic requirement e.g. for any photo app.

Didn’t get this. What does nginx have to do with this?

My resources are limited and I’m not willing to work on something that’s gonna stay as a PR forever. As I mentioned, unless Nextcloud is willing to (actively) accept something like this (and actually help fix stuff at times, not just nitpicking code reviews), I’d rather fork and not waste everyone’s time including myself.

Hey @pulsejet

This has been attempted before, I believe. In theory I think this kind of refactoring could be well worth the time. I can’t decide this, though.

Maybe you’re referring to this? That’d be a little bit different, since it provides a way for apps to have better performance, whereas I’m proposing running the whole thing as event-driven. I didn’t want to comment there and revive an ancient thread.

I think this is exactly what’s missing from Nextcloud. No matter how much one loves the ease PHP’s request model gives, it’s just plain wrong for this kind of application.

I’m hoping to reach some consensus here with whoever can decide this (in reasonable time).

BTW, some other points (random thoughts) I missed earlier, and from what I picked up from the aforementioned PR:

  1. It’s mentioned there that bootstrapping is not a big deal. I beg to differ, after a lot of profiling Nextcloud with Xdebug.
  2. We cannot easily slim down the bootstrapping process. Apps might do things we don’t want to stop them from doing.
  3. This is a radical change and we cannot have a “one-click” upgrade with this. Which isn’t a big deal to me, since the potential performance improvements are massive. Let’s not forget this isn’t only about bootstrapping, but async io in general. This also allows more interesting things, like asynchronous hooks for apps, and in-process caching.
  4. This is my answer to ownCloud Infinite Scale. I don’t think switching languages is the right direction: PHP is extremely fast; just need to use it correctly.

My main motivation for creating this thread is #3, and the lack of interest from the team in the linked PR. I unfortunately agree with these comments. I fully understand that Nextcloud might be prioritising ease of migration and convenience for existing (possibly enterprise) setups over performance gains. Or app compatibility. At the same time, others are moving ahead. IMHO, changing the stack (not language) is the correct and only solution.

I believe that unless Nextcloud fixes this major issue, it’s just a matter of time before someone else builds something that does everything Nextcloud does, but is better/faster because it didn’t try to be “convenient” for admins and app developers. Being stuck in the past isn’t helpful.

Hello to everyone here,

I might not have understood the point correctly, but how would this work?

In the current setup, each request from the client boots the NC server core plus bootstraps all apps. This should be rather fast unless some apps do nasty things or you have immense amounts of apps installed.

Switching to an asynchronous model seems like a major difference to me. Are you intending to run one (or more) PHP processes continuously, so that they keep running and serve request after request? Still within PHP and with the help of the Symfony tools?

This would be a serious rewrite of the core code. I am unaware of all the implications here. I fear this forum is not the best location to discuss things, as this is a highly technical question. I would go for the server repo.

A side remark: you wrote that you would create it for yourself. So, you have the resources to handle this yourself? Then, you cannot lose anything, right? Either you use it for yourself or your ideas get accepted and merged into the server core. Either way, I would ask what the requirements are for it to have a chance of being merged.

Just my 50ct
Christian

Depends on your definition of fast. It takes ~10ms for core apps to bootstrap for me (extremely fast machine). I’d consider the latency fairly low, but keep in mind this is 10ms of actual CPU usage. With a lot of requests, bootstrapping is a major chunk of the total CPU usage.
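A rough back-of-the-envelope version of this argument, as a Python sketch. The ~10ms bootstrap cost is the figure quoted above; the 5ms of actual work per request is purely an assumed number for illustration, not a measurement.

```python
def bootstrap_cpu_ms(requests: int, bootstrap_ms: float) -> float:
    """Total CPU time spent on bootstrapping alone, across all requests."""
    return requests * bootstrap_ms


def bootstrap_share(bootstrap_ms: float, work_ms: float) -> float:
    """Fraction of per-request CPU time that goes into bootstrapping."""
    return bootstrap_ms / (bootstrap_ms + work_ms)


if __name__ == "__main__":
    # 100 preview requests at ~10 ms of bootstrap each (figure from the post above).
    print("bootstrap CPU for 100 requests (ms):", bootstrap_cpu_ms(100, 10.0))
    # ASSUMPTION: 5 ms of real work per request, e.g. serving a pre-generated preview.
    print("bootstrap share of CPU time:", round(bootstrap_share(10.0, 5.0), 2))
```

Under that assumed 5ms, roughly two thirds of the CPU time per request would go into bootstrapping; with heavier requests the share shrinks accordingly.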

Correct. Check out Swoole if you’re unfamiliar with this. It’s ridiculously fast, btw.

Indeed. I already mentioned this is a big breaking change.

Not so sure about that. I’m looking for opinions from anyone (who has one) from the core team, not just the devs.

I’m willing to work up most of the basics (might need a while admittedly, but hopefully it’s doable). Getting all apps to work right is where I’ll definitely need more help.

On the other hand, if I know this is never going to get merged, I’d do several things differently. For a start, any kind of backward compatibility goes out of the window then. E.g. any deprecated functionality can be safely discarded. It just gets a lot easier in this case:

Ping @Daphne, can you please forward this discussion to the appropriate locations? I think this should be discussed with the NC core team to have their feedback.

I forwarded the discussion. The responses I received back were careful. Reworking core requires considerable effort and may take multiple years, and might never succeed. More senior colleagues wrote that it might seem doable at first sight, but that everyone who ever tried to rework the core architecture bumped into lots of issues because there are a lot of hidden quirks and hacks. It’s never straightforward. So for now my assumption is that this plan is not realistic.

Thanks for the reply @Daphne

  1. My question was never about feasibility but intention. Whether a plan is realistic or not depends on the goals. E.g. if the goal is to keep supporting every current deployment as-is, then I’d say this is impossible to do (and remain sane). I never said it was easy.
  2. From what I understand from your reply, Nextcloud’s position is that it’s not possible to reasonably fix this issue, and so no changes will be made to the core architecture. Basically you’ve resigned to the notion that this is undoable and consider any attempts to do so as futile. Or maybe you don’t agree this is an issue in the first place, I’m not sure.
  3. I’m very disappointed not a single dev could find 5 minutes to reply even after reaching out through many different channels, even if to just say that they consider this as impossible. I’m not sure what Nextcloud’s position on performance is, but (also e.g.) IMHO it shouldn’t take almost a month to review PRs for simple but effective performance improvements. As such, my view now is that Nextcloud doesn’t value external contributors, and I won’t be making any further contributions (at least) to the server core. Sorry to say, but while I can understand ignoring unresearched support requests, completely ignoring developers proposing (and offering unpaid working hours) to make larger contributions is unprofessional and just rude regardless of how busy everyone is with “a lot of other priorities”.

EDIT: closing this thread because I guess I got my answer.

Hey @pulsejet, I’m happy to continue the conversation with you in a 1:1 thread about how we may improve our response times for PRs.

This text is old but for me it still outlines many of the problems core rewrites may cause:
https://web.archive.org/web/20080215223728/http://chadfowler.com/2006/12/27/the-big-rewrite

I mean… this text is so old I had to look it up on archive.org because the original does not exist anymore. Now I feel like I’m growing old as well. :wink:

@pulsejet

As much as I can relate to your disappointment, and as much as I can really understand it, I’d like to draw your attention to the fact that you really DID get an answer after all (though it wasn’t the one you wanted to get).

So how many developers are working on NC core? Hundreds, I bet. So how could anyone get to know that you are an experienced dev coming up with valid and worthy questions?
Or maybe you haven’t hit the right persons with your direct contact?
Regarding PRs… you seem to be experienced enough to understand that main devs have a roadmap to follow and hence can’t put much of their time into every PR they need to look through? Even more so if those PRs offer breathtaking new developments in the core?

I don’t know.

I think Daphne’s offer to get into a 1:1 talk is a great offer and opportunity. Walking in your shoes I at least would grab the opportunity. I’m sure you won’t regret it.

Nice read, thanks for taking the time to dig this up. However, if >90% code is going to remain untouched (which can be expected in such a change), I won’t call it a rewrite at all. ownCloud is doing a rewrite with oCIS btw (just saying).

I wish. In the last six months, 13 devs had >5 commits on the PHP core. Most of them work for Nextcloud in some capacity, I believe.

root@d9963a268a62:/var/www/html# git shortlog -n -s --since="14 Jun 2022" --no-merges lib/\*\*/\*.php core/\*\*/\*.php
    52  Côme Chilliet
    51  Julius Härtl
    46  Carl Schwan
    38  Robin Appelman
    34  Joas Schilling
    20  Arthur Schiwon
    13  Christoph Wurst
    13  Vincent Petry
    12  Christopher Ng
     9  Louis Chemineau
     8  John Molakvoæ
     8  szaimen
     7  Julien Veyssier

By assuming competence. How many external devs do come up with core questions anyway? I don’t see many. And btw, one shouldn’t need to be a super experienced dev to deserve a reply from another developer. If that’s the attitude then it sounds very toxic to me, and there are bigger problems to address than server core performance :slight_smile:

After posting here, at GitHub and the developer talk channel, I pretty much ran out of polite options. I’d be surprised if it didn’t reach most / all of aforementioned devs in some way.

If I may add my two cents, I hope no one minds.

If the Nextcloud company would spend less time coming up with new features to market in the next version of Nextcloud, and more time making sure that existing features work well, and also to look more at improving what we already have, then I think things would be better. In particular when a lot of the new features are sometimes just left without much maintenance and evolution once they have lost their initial marketing value.

There are so many bugs and issues that need fixing, and other types of maintenance, that time is currently split very unevenly between new fancy marketing fluff and making the current stuff rock solid and working well. I’m sure I don’t have to pinpoint particular issues etc., as this has been an ongoing situation for years. That said, it’s fine to disagree, in case someone would want to do that.

More issues and PRs could then be attended to in a reasonable time (this is truly a problem today, there are endless examples of this), and important things like what @pulsejet is asking about here could be evaluated to a larger extent. I’m not saying Nextcloud should rewrite the core to make it event-driven, but I’m not sure that the thought has really been evaluated enough. Look at it from this perspective: if this is never done, then where will this leave Nextcloud in some years’ time? Efforts of this type are not meant as a short-term gain; they serve a longer-term purpose and perspective.

So, what I mean is simply: if someone who seems very knowledgeable about what they bring up and discuss suggests something that might have a long-term benefit for Nextcloud, then let’s not be afraid of how it may affect the core. Take the opportunity to spend less time on one of the fancy next marketing features and instead seriously consider what would actually be needed to make something like this work, and whether the gain is relevant. I haven’t been involved in the discussions, but it doesn’t come across as if this type of change has been truly looked into closely.

That said, there are of course other aspects such as depending on additional third party software, what’s needed to install to run Nextcloud, and so on. That’s part of what needs to be considered.

Some comments:

More senior colleagues wrote that it might seem doable on first sight but that everyone who ever tried to rework the core architecture bumped into lots of issues because there are a lot of hidden quirks and hacks.

This is arguably a sign that the core of Nextcloud has to be improved. Also worth noting is that whatever means were looked at earlier to do this might be very different from the means involved in the suggestion by @pulsejet. It might be possible to do this in a much more controlled and less intrusive way now than before. Or not.

It’s never straightforward. So for now my assumption is that this plan is not realistic.

I really am not in a position to say how much of the current development effort is put into improving the current core and related parts of Nextcloud, but as I mentioned above, at some point there will be a need for improvement, and if nothing was done on that part in all those years (e.g. because of fear of breaking things), then in the end we’ll have a stale or outdated product. At that point a total rewrite might be more fruitful, but that is arguably a much bigger endeavor than what is being suggested in this thread.


All that said, let’s all be friends. Oh, and yeah I agree, @pulsejet - take the offer of a 1:1 discussion about this! I’m sure it would be both nice and fruitful, it might get everyone much more on the same page than through a forum like this. Please.

@pulsejet

I was only assuming, since I don’t know anything. I don’t know why your PR wasn’t reviewed and I don’t know why they didn’t get back to you at all.

There could be bazillions of reasons and I am no spokesperson for them.

Fact seems to be: they didn’t reply. And now Daphne came up with an answer. Finally.

BTW: she invited you to a 1:1, still. Which is VERY polite, from my POV.

Fully agree :100:. At times I see strange stuff like very old deprecated APIs still being used in the core itself (without any apparent reason), which makes me question either why they were deprecated or whether the core gets much maintenance at all. Of course there might be a good reason I’m unaware of, but by now I expect a similar lack of response if I ask.

Just to clarify, I’m not saying that either. The purpose of this post was to discuss whether something like this could be considered if reasonably implemented. As I mentioned, there will be trade-offs. It’s just that, IMO, these would pay off in the long run, which is why I proposed this.

Good point I missed. With too many quirks and hacks you end up with Internet Explorer. Maybe it’s worth the investment to actually fix these.

As I said it doesn’t answer my question, and isn’t open for further discussion. Btw I disagree with the notion of “this can’t be fixed” in general. I’ve worked with massive legacy projects much larger and older than Nextcloud; everything can be fixed unless you’re too scared to try.

For the record, I already did. I’m open for a technical discussion, but it seems I’m alone.

All contributions and future ideas are valuable in my view. That’s why I think I can understand your frustration at your idea seemingly not being considered. There should be a space where future visions can be discussed and eventually moved forward, I agree.

That being said, what happens to me sometimes is that I get very excited about an improvement I learned about and I want to apply it absolutely everywhere. I’m not saying this is your situation. But this discussion reminds me of that.

I believe you that you know what you’re doing and that implementing an event-driven approach could be a good future improvement to Nextcloud. It’s certainly something that would be a reasonable improvement and the underlying limitations of PHP are well-known (and they can be a superpower, too).

As a community building this software (and Nextcloud as a company plays a huge role also of course, but I cannot speak for them), I believe that we have different goals we would like to address.

And this is where I’m not sure if we can combine all these goals with what you suggest. If we look at the project you link, I can read from its documentation that it’s “installed as a PHP extension” (https://openswoole.com/docs). This excludes a part of the user base from the start, because one of the qualities of Nextcloud I personally make use of is the fact that it can literally run on a $1 shared hosting plan with just MySQL and PHP.

This has always been Nextcloud’s strength in my view, using established technologies for the base and enhancing where needed for larger setups (see Redis cache, Talk backend, office etc.)

Another aspect at Nextcloud’s core is security. Any rewrite or breaking change can naturally open up new security issues that were previously contained by the tried & tested foundations where changes are applied conservatively.

However, we should keep in mind that requirements, the ecosystem, technology and communities are changing all the time. Who knows, maybe some day soon event-driven PHP will be more widespread and we’ll see Nextcloud successfully adopt it. Just because there’s no crowd of developers responding to your ideas, it doesn’t mean that they are invalid or “impossible”. Personally, I’d encourage you to try out a fork and provide some first-hand experience of how Nextcloud can perform using your suggestion, maybe as a proof of concept. But, to answer your starting question, nobody can guarantee you that your changes will ever be merged into the main project. It’s a bet you’re making, one that most other developers currently do not seem interested in making.

Generally, I want to encourage you (as others in this thread) to keep the conversation going. I also like your PR about frontend bundling improvements and I can see how it’s frustrating to have to push a lot to get these obvious (to us!) improvements merged. We always need to take into account that we’re trying to collaborate here, so we naturally need to explain things, iterate on things and align our views before we can reach a conclusion. And we need to assume that others here intend well, the same way we do.

Hi @te-online

Clearly we disagree on a lot of things, so I’ll be brief (well I tried).

This is not something I “learned about and want to apply”. If you read my first post, I did a lot of profiling before arriving at the conclusion that this would be helpful.

I think I mentioned multiple times already that this is a tradeoff. That said, theoretically it’s still possible (and quite easy) to support running with PHP-FPM. Most async code can be run synchronously very easily; it’s just not very efficient to do this (and it can’t be less efficient than it is now). As long as the support is properly implemented, I don’t see why it can’t work in all environments, with Swoole being an optional dependency.
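As an illustration of the claim that async code can also be driven synchronously, here is a minimal sketch using Python’s `asyncio` as an analogy. It is not Swoole or Nextcloud code, and `load_preview`, `load_preview_sync` and `load_many` are hypothetical names.

```python
import asyncio


async def load_preview(file_id: int) -> str:
    """Hypothetical async handler; the sleep stands in for non-blocking I/O."""
    await asyncio.sleep(0)
    return f"preview-{file_id}"


def load_preview_sync(file_id: int) -> str:
    """Synchronous entry point: run the same coroutine to completion, one call at a time."""
    return asyncio.run(load_preview(file_id))


async def load_many(file_ids):
    """Event-driven entry point: overlap many handlers in a single process."""
    return await asyncio.gather(*(load_preview(i) for i in file_ids))


if __name__ == "__main__":
    print(load_preview_sync(7))
    print(asyncio.run(load_many(range(3))))
```

The handler itself is written once; whether it runs under a blocking wrapper or inside an event loop is decided at the entry point, which is the kind of optional dependency described above.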

So what are you suggesting? Never change the core because you’re too afraid that you might introduce new bugs?

What’s widespread is subjective. There are relatively few web applications that even have to care about performance in the first place. A big transition to better technology is not going to happen magically :slight_smile:. And without discussing it in the first place, it’s not going to happen. Meanwhile the world is moving ahead.

Well, I don’t have much of a choice anyway. As I mentioned in a previous post, I’d do things differently in a fork. For instance, your use case of running it on a $1 shared host is precisely the kind of thing I won’t care about. I didn’t want a “guarantee” that the code will be merged (this is obviously impossible), but a general discussion of how this would be merge-worthy if implemented, for example. Or just other pitfalls to be aware of when implementing this.

Finally we agree. That’s why I closed this thread: I got my answer that Nextcloud isn’t interested in discussing this, let alone merge it in the future.

The whole discussion is awkward in the sense that you want to provide significant help/work and do not get the required/assumed support by the company. I can understand both positions in some way.

First, I would urge everyone not to have any hard feelings. In fact, nothing has been implemented yet w.r.t. the core rework, so the time invested so far is bounded, or am I wrong? We are still in the discussion phase, and in an open discussion we must keep an open mind.

I am not so sure about that. At least for me, it sounded not as fixed as you, @pulsejet, imply in your responses. I might be wrong if there was more discussion in private in the 1:1 conversation. I understand it like this:
The NC company sees that there is in fact a benefit in running things asynchronously. There have even been some people who tried to implement such an architecture. They stumbled at various points. If your suggestion was able to solve these points, they would be glad to have a proof of concept. At the very least, this can be used to verify the performance impact. To have a production system, there might be some more work at hand to iron out some wrinkles. I have never read an absolute no to your request.

The current variant with sync PHP execution is working and is a rock-solid solution (I am talking about the infrastructure, not the NC server or even the frontend!). It might not be the fastest one but hey, we are not programming in C/C++/… either. The question is: how much benefit does it bring to the user/customer (both end-user and admin) and the devs (both core + apps) to change something?

  • Adding complexity to the development will increase the risk of bugs. I know it is doable in general and shiny theory. But not all app devs are senior devs.
  • The same holds true for admins: Having to install NC seems to impose quite a burden on some admins right now. Just look at the forum topics to see the average level of problems people face. With the new system (however it works) this will not be any better. I fear it will require quite some additional knowledge from these users.
  • The end-user might be the one who has the most benefit of this change.
    • If the time to fetch the data from the server is reduced by 10% from 100ms to 90ms, this will not make a significant difference to the end-user for a single page. When this happens 20 times in a row (page loading with resources), that takes 2s in total, and only 0.2s is saved. The user will most probably already be impatient and not even register the speed-up mentally.
    • If the change will save like 50 to 70% of the time, the user will have a subjective impression of speed-up.
    • For more savings the user will consider the new system super-fast and kiss your feet/ass.

You said that you had some measurements and the registration process was in the range of 10ms. What was the total processing time? That would be a crucial question to evaluate the potential of such an undertaking. I suspect that there is only 10 to 20% of the time used in the registration that could be saved by the async processing. Please prove me wrong, that could be a real game-changer if I was guessing wrongly.
With that in mind, you might understand the reserved behavior of the company.
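The estimate above can be made concrete with a small calculation. This is a sketch in Python, assuming the bootstrap cost is removed entirely and nothing else about the request changes; the fractions are the guesses and thresholds from this post, not measurements.

```python
def time_after(total_ms: float, bootstrap_fraction: float) -> float:
    """Request time left once the bootstrap part of the request is gone."""
    return total_ms * (1.0 - bootstrap_fraction)


def speedup(bootstrap_fraction: float) -> float:
    """Overall speed-up factor when only the bootstrap fraction is eliminated."""
    return 1.0 / (1.0 - bootstrap_fraction)


if __name__ == "__main__":
    # 10-20% is the guess for registration; 50-70% is the "noticeable" range above.
    for fraction in (0.10, 0.20, 0.50, 0.70):
        remaining = time_after(100.0, fraction)
        print(f"{fraction:.0%} bootstrap: 100 ms -> {remaining:.0f} ms, "
              f"speed-up {speedup(fraction):.2f}x")
```

The overall gain is capped by the share of the request that bootstrapping actually takes, which is why the total processing time matters as much as the 10ms figure.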

On the other side, I see some points in your suggestion:

  • On low-end devices (think raspberry pi) that could lead to an acceptable performance if the impact is sufficient.
  • For the high-end (think Kubernetes and cloud providers), this might reduce the maintenance fees and be a strong selling argument for the company. Thus, this is the strongest argument to sell your idea to the NC company.
  • A general rework of the core might (or might not) be a good idea. At least some housekeeping would be beneficial, I think. Your work might trigger/enforce this as a pre-step.

I want to add one more comment: Please separate the core from the infrastructure. The core is mostly the classes in OC and OCP. With infrastructure I mean the dependencies used to build the core like doctrine and Symfony. We do not want to reinvent these, I suspect.
Apart from Swoole there might be solutions based on Symfony that do not need any additional PHP extensions as far as I know. There, the difference is, that you run dedicated workers as separate processes and have one central dispatcher PHP process. One has to evaluate what infrastructure might be the simplest one for us.


Having said all that I see only a few options to go on from here.

  • You forget about cooperation and do your own fork. This will be the worst case in my opinion, as no synergies can be used between your work and the core team. The ideas will be lost for the big community, and some competition might even arise.
  • Using some more in-depth measurements, you can prove that your idea is fool-proof and will speed up the server by a significant factor. This might raise the interest of the company and other community devs.
  • You build a proof of concept yourself in as little time as possible. Then, you can compare the end-user feeling and eventually use your proof of concept to draw attention to your idea.
  • You start to build small PRs to do this housekeeping as a pre-step. Then, you can piece-by-piece add more PRs to steer the core in that direction you suggested.
    That will require that you have some sort of global architecture in mind. Eventually, @Daphne can bring you in contact with devs or define a way to get the PRs merged in a timely manner. This will ensure as well that the current structure is not broken but extended by the async way of using the core.

Sorry for the long post. I know it tends to get lengthy at times.
Christian
