AI and Photos 2.0 - In-Depth Explanation of Nextcloud Recognize and How It Works


I was very happy and surprised to see the new Photos 2.0 app. It brings functionality that has been missing for a long time. Great to see this improvement!! Makes me very happy again.

AI and face recognition were also mentioned. Frank said that face recognition can run on any system, even a Raspberry Pi. I was surprised by that. I can imagine it works for newly uploaded photos, but what happens with existing photos? I have an archive of thousands of photos that all need to be processed. How does that work?

For example, when I update my instance to 25, when will all existing photos be processed by face recognition? How long does it take? And how do you prevent it from slowing the system down so much that it becomes unusable?

Just a guess, but I assume that existing photos will be analyzed in chunks via a cron job, step by step :man_shrugging:


As I intended to shed some light on this during my conference talk but ended up getting sick, I’ll use this opportunity instead to explain how recognize works:

How did we pull this off?

We’re not reinventing the wheel, but standing on the shoulders of giants. For most of the recognize features we’re using neural network models trained by Google. (A model in this case is a blob of math with lots of numbers, called weights. If you put data in at one end, the math calculates a result for you. That’s it.)
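To make the "blob of math with weights" idea concrete, here is a toy sketch in Python. The categories, weights and bias values are made up for illustration and the `predict` function is hypothetical; a real model such as EfficientNet has millions of learned weights stacked in many layers, but the principle of "data in, arithmetic on weights, result out" is the same.

```python
import math

# Hypothetical toy "model": one weight per input feature for each category,
# plus a bias. These numbers are invented; real models learn them from data.
WEIGHTS = {
    "cat": [0.9, -0.2, 0.1],
    "dog": [-0.3, 0.8, 0.2],
}
BIAS = {"cat": 0.0, "dog": 0.1}


def predict(features):
    """Turn an input feature vector into a probability per category."""
    # Weighted sum of the inputs for each category (the "math" part).
    scores = {
        label: sum(w * x for w, x in zip(ws, features)) + BIAS[label]
        for label, ws in WEIGHTS.items()
    }
    # Softmax: convert raw scores into probabilities that sum to 1.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


if __name__ == "__main__":
    print(predict([1.0, 0.0, 0.0]))
```

The category with the highest probability is what would end up as a tag on the photo.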

These are the models we’re using:

• FaceRecognizerNet by Davis King (dlib)
(99.38% accuracy on the Labeled Faces in the Wild dataset)
• EfficientNet v2 by Google
(83.9% top-1 accuracy on ImageNet; SOTA (state of the art): 90%)
• Landmarks v1 by Google
(94,000 landmarks; accuracy varies greatly)
• MoViNet by Google
(82% top-1 accuracy on Kinetics 600; SOTA: 91%)
• Custom MusiCNN
(based on a paper by Jordi Pons and Xavier Serra)
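"Top-1 accuracy" in the list above is the fraction of test samples for which the model’s single highest-scoring category is the correct one (top-5 would accept the correct answer anywhere among the five best guesses). A small Python sketch of the metric; the function name and the example scores are illustrative only:

```python
def top_k_accuracy(predictions, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    predictions: list of dicts mapping class name -> score
    labels: list of true class names, one per prediction
    """
    if not labels or len(predictions) != len(labels):
        raise ValueError("need one prediction per label and at least one label")
    hits = 0
    for scores, truth in zip(predictions, labels):
        # Rank class names by score, best first.
        ranked = sorted(scores, key=scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Made-up scores for three images.
    preds = [
        {"cat": 0.7, "dog": 0.2, "fox": 0.1},
        {"cat": 0.5, "dog": 0.4, "fox": 0.1},
        {"cat": 0.1, "dog": 0.3, "fox": 0.6},
    ]
    truth = ["cat", "dog", "dog"]
    print(top_k_accuracy(preds, truth, k=1), top_k_accuracy(preds, truth, k=2))
```

With these scores only the first image is right at top-1, while all three are right at top-2, which is why top-1 figures are the stricter numbers to compare.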

How do we run models in Nextcloud?

PHP does not play very nicely with what the machine learning community comes up with, as that is mostly Python code. Luckily there’s TensorFlow.js, a deep learning framework that runs in Node.js, and Node.js is a self-contained JavaScript runtime.

The recognize app ships with Node.js and executes a Node.js script from a PHP background job. That script boots up TensorFlow with the relevant model, preprocesses the files passed to it by the PHP job, runs the model on them and passes the results back. By default the model runs at native speed using libtensorflow, but if the machine doesn’t support that, we can run the model in WASM, which even works on a Raspberry Pi.
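The hand-off between the background job and the external model script can be sketched as "spawn a process with a batch of file paths, read structured results from its stdout". The Python below is only an analogy under stated assumptions: recognize itself does this from PHP with a Node.js script, and the one-JSON-object-per-line output format, `classify_batch` and the stand-in `FAKE_WORKER` script are all hypothetical.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the Node.js/TensorFlow.js script: it just reports
# a placeholder label for every file path it receives as an argument.
FAKE_WORKER = r'''
import json, sys
for path in sys.argv[1:]:
    print(json.dumps({"file": path, "labels": ["placeholder"]}))
'''


def classify_batch(paths, worker_cmd=None, timeout=60):
    """Run an external classifier process on a batch of files and collect results.

    Assumption: the worker takes file paths as arguments and prints one JSON
    object per line with "file" and "labels" keys.
    """
    paths = list(paths)
    if not paths:
        return {}
    if worker_cmd is None:
        worker_cmd = [sys.executable, "-c", FAKE_WORKER]
    # check=True raises CalledProcessError if the worker exits with an error.
    proc = subprocess.run(
        worker_cmd + paths, capture_output=True, text=True,
        timeout=timeout, check=True,
    )
    results = {}
    for line in proc.stdout.splitlines():
        if line.strip():
            item = json.loads(line)
            results[item["file"]] = item["labels"]
    return results


if __name__ == "__main__":
    print(classify_batch(["holiday.jpg", "cat.png"]))
```

Keeping the model in a separate process means the job only has to pass file paths in and parse results out; a native or WASM backend could be swapped inside the worker without changing this contract.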

How to integrate results in Nextcloud?

How do we integrate these results into the Nextcloud UI? Most models output categories, so tags are a natural integration point. For faces, we created a WebDAV endpoint that offers clusters of photos with the same face as DAV collections. The Photos app uses this API.
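Face clustering works on the vectors a face recognition model produces for each detected face: dlib’s FaceRecognizerNet maps a face to a 128-dimensional vector, and dlib’s own examples treat two faces as the same person when their Euclidean distance is below 0.6. The Python below is a deliberately simple greedy clustering sketch to show the idea; it is not the clustering algorithm recognize uses, and `cluster_faces` plus the 2-D example vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_faces(embeddings, threshold=0.6):
    """Greedy clustering: join the nearest cluster centroid closer than threshold.

    embeddings: list of face vectors (e.g. 128-d from a face recognition model)
    Returns a list of clusters, each a list of indices into embeddings.
    """
    clusters = []  # each: {"centroid": [...], "members": [...]}
    for i, emb in enumerate(embeddings):
        best, best_dist = None, threshold
        for cluster in clusters:
            d = euclidean(emb, cluster["centroid"])
            if d < best_dist:
                best, best_dist = cluster, d
        if best is None:
            # No cluster is close enough: this face starts a new "person".
            clusters.append({"centroid": list(emb), "members": [i]})
        else:
            best["members"].append(i)
            n = len(best["members"])
            # Update the running mean so the centroid tracks all members.
            best["centroid"] = [
                (c * (n - 1) + x) / n for c, x in zip(best["centroid"], emb)
            ]
    return [cluster["members"] for cluster in clusters]


if __name__ == "__main__":
    faces = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    print(cluster_faces(faces))
```

In this sketch a larger threshold merges more faces into one cluster and a smaller one splits them apart; each resulting cluster is what would be exposed as one collection ("person").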

How do we make sure it scales?

It needs to scale both up and down. For this we’ve implemented a cascade of background jobs that process both new and existing files, broken up by mount point for more granular processing: the more machines you have executing background jobs, the faster recognition will be.
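A rough way to picture this: the backlog (new and existing files) is grouped by mount point and cut into small batches, each batch becoming one background job, so any number of workers can pick jobs off the queue. The Python below is a hypothetical scheduling sketch, not recognize’s PHP code; `plan_jobs`, `run_jobs` and the batch size are invented, and "rounds" stands in for elapsed time under the assumption that every batch takes about as long as any other.

```python
from collections import defaultdict, deque


def plan_jobs(files, batch_size=50):
    """Group (mount_point, file_id) pairs by mount and split them into batches.

    Returns a queue with one job per batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    by_mount = defaultdict(list)
    for mount, file_id in files:
        by_mount[mount].append(file_id)
    jobs = deque()
    for mount in sorted(by_mount):
        ids = by_mount[mount]
        for start in range(0, len(ids), batch_size):
            jobs.append({"mount": mount, "files": ids[start:start + batch_size]})
    return jobs


def run_jobs(jobs, n_workers, process):
    """Simulate n_workers each taking one job per round; return the round count."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    rounds = 0
    while jobs:
        for _ in range(min(n_workers, len(jobs))):
            process(jobs.popleft())
        rounds += 1
    return rounds


if __name__ == "__main__":
    backlog = [("/alice/files", i) for i in range(5)] + [("/bob/files", i) for i in range(3)]
    for workers in (1, 2, 5):
        print(workers, run_jobs(plan_jobs(backlog, batch_size=2), workers, lambda job: None))
```

Small batches keep each job short, so a single slow machine is never blocked for long, while adding workers shortens the total time roughly in proportion.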


Lastly, let me say a few words about ethics. AI is hip and cool, but it is also a huge responsibility and sometimes scary. AI depends on big data collections, which often infringe on the privacy of service users. AI is also often a black box: the people whose data is processed often don’t know how or why a certain outcome happens. We’re conscious of these issues and think that Nextcloud is in a unique position to let users benefit from AI: Open Source offers technological transparency, so users can learn how the system works and change it if they want, and Nextcloud also offers privacy, as we went out of our way to make sure your data doesn’t leave your server. Nonetheless, we’re always open to criticism, so if you have concerns, don’t hesitate to get in touch.

I hope this gives some more insight into how recognize works.


Thanks @marcelklehr, great to read this and learn how it works. I will definitely check it out!

Just for clarification: so the Photos2 app uses the Recognize app, which in turn uses dlib for face detection? And these results are displayed in the Photos2 app as “Persons”?

I have two Nextcloud instances where we (family) auto-upload pictures from our smartphones. Over the past years we have collected quite a lot of pictures there, including ones taken with professional cameras.

Both instances now run the Nextcloud 25 public release. The bigger one (Intel NUC Core i3, Ubuntu 22.04) runs the facerecognition app, and the Raspberry Pi 4 (Debian Bullseye arm64) runs the Recognize app.

The results are worlds apart! The Photos2 app is mostly unusable (also due to bugs and speed): if you add a new person to an existing one, you have to reload all photos again, otherwise you cannot add another unknown person because the lightbox simply does not appear. I roughly estimate the false rate at 60-70 percent; it even mixes pictures of me, my wife and our kids into a single person.
The facerecognition app makes far fewer errors in recognizing faces and persons and does not hog the whole system, whereas Recognize does, even when I set the number of CPUs it may use to 1.

I really love the approach of having recognition directly in the Photos2 app, but I think it would make sense to let users adjust a few settings, as is possible in Matias Delellis’s facerecognition app: for example cluster confidence, picture size, thresholds and maybe even the models. As it is right now, Recognize seems unusable at least for face recognition, even though it seems to use the same library for this job. On a Raspberry Pi, it consumes a lot of computing power over days and weeks, with results that are far from satisfactory.
Apart from that, Matias Delellis’s app takes a modular approach, where the dlib part can run on a different machine via Docker. This greatly speeds up both my Raspberry Pi and my Intel NUC, because I can offload the work to my editing machine with two Nvidia cards and use its raw processing power (I am still working on a CUDA-compiled dlib). A well-thought-out approach in many ways, and one that actually produces usable output.

Still, I am happy to test the next releases and am really looking forward to seeing usable AI within Nextcloud on both instances. :slight_smile:

PS: Another way to speed things up would be to check whether CUDA is installed along with GPUs that support it, and then use hardware-accelerated decoding/encoding/transcoding for the video parts via the _nvenc or _qsv variants of the H.264/H.265 encoders; most ffmpeg builds on Linux distributions are already compiled with CUDA or Intel QSV support. Maybe also check whether this can be used to speed up the Recognize AI. :slight_smile:

I’m still quite confused by the face recognition and tagging, but I guess I have to be a bit more patient until these features mature.

Did you read the previous post where the developer gave a detailed breakdown of how it works, including photos?

Yes, but some things are still not clear.
For example, on the photos/tags page (Tags) there are tags with people’s names, but these are not the same as the ones on photos/faces (People).
