CE Next Cloud for Small Legal Office

Starfish · September 27, 2018, 5:45am

One thing to remember is that it is all proportional. If you have a million or 10 million files, then yes, you need a beast. But for 60,000+ files, I would think a quad-core machine with 8 GB of memory should suffice for building your indices. The trade-off for spending less money is speed, so if you are willing to wait a bit longer during the initial build, it does not have to be a beast. After that initial build, the machine will sit mostly idle unless you upload 1,000+ files each day.

How it works in Nextcloud, from what I have seen in my PoC environment, is that you build the initial index with Elasticsearch, which can take a long time or not, depending on how many files you have. I had 5,000 files in my PoC, of different types (PDF, doc, odt, Excel, etc.). The indexing took roughly a minute to complete, on an Elasticsearch machine with 4 cores and 8 GB of memory. After that, the normal Nextcloud cron job, which runs every 15 minutes, updates the index with all the files added in the last 15 minutes. See this small thread for more info.
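
As a rough back-of-envelope sketch of that proportionality, here is the PoC figure (5,000 files in about a minute) scaled to other file counts. It assumes indexing time grows linearly with the number of files on similar hardware, which may well not hold: file sizes, file types and any OCR step could change it a lot. The function name and its defaults are made up for illustration.

```python
def estimate_index_minutes(file_count, ref_files=5000, ref_minutes=1.0):
    """Linearly extrapolate initial indexing time from a reference run.

    The defaults are the PoC numbers from this post; the linear scaling
    itself is an assumption, not something that was measured.
    """
    if ref_files <= 0:
        raise ValueError("ref_files must be positive")
    return file_count * ref_minutes / ref_files


if __name__ == "__main__":
    # Print a rough estimate for a few library sizes.
    for n in (5_000, 60_000, 1_000_000):
        print(f"{n:>9,} files: ~{estimate_index_minutes(n):.0f} min")
```

Under that assumption, 60,000 files would come out at around 12 minutes for the initial build, so treat it as an order-of-magnitude guide only.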

I can only presume Tesseract works the same way, but I am not sure.

I hope my explanation above covered this. Please note that Elasticsearch only gives you full-text search, nothing more. So if you want to do more with it, I think you will need to run a third-party (middleware) application outside of Nextcloud. HTH.