XML files indexing in fulltextsearch / elasticsearch



Before explaining what this topic is about, I would like to say that I’ve recently installed the Nextcloud platform to test it and honestly I’ve been impressed! You have done an excellent job! Congrats and thank you for sharing!!

To the point:
I’ve managed to set up fulltextsearch with Elasticsearch as the backend and it works fine for normal text files and PDFs. However, I cannot see any results from .xml files, and I have more than 20K of them.

I know that Elasticsearch and XML are not an ideal combination, but I would expect to see some index failure messages. I’ve enabled debug logging on Elasticsearch:

curl -X PUT "my_user:my_password@localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"transient":{"logger._root":"DEBUG"}}'

and there are no error messages from Elasticsearch. The messages I get when it comes to .xml files look like this:

[2019-02-27T13:43:50,462][INFO ][t.b.r.a.ACL ] [nextcloud] ALLOWED by { name: ‘Accept requests from cloud1 on my_index’, policy: ALLOW, rules: [groups, indices]} req={ ID:1761111980-1231334325#6666, TYP:IndexRequest, CGR:N/A, USR:iec-sva, BRS:false, KDX:null, ACT:indices:data/write/index, OA:, A:, IDX:my_index, MET:PUT, PTH:/my_index/standard/files%3A723, NT:{“share_names”“admin”:“Documents/cispr16-1-3{ed2.0}en_meta_data.xml”},“owner”:“admin”,“users”:[],“groups”:],“circles”:],“links”:],“metatags”:“files_local”],“subtags”:],“tags”:],“hash”:"",“provider”:“files”,“source”:“files_local”,“title”:“Documents/cispr16-1-3{ed2.0}en_meta_data.xml”,“parts”:[],“content”:""}, HDR:{Accept=application/json, Authorization=Basic aWVjLXN2YTppZWNjb3Jl, Content-Length=318, Content-Type=application/json, Host=}, HIS:[Accept requests from cloud1 on my_index->[indices->true, auth_key->true]] }

The request body ends with an empty content field (“content”:""), so I think the content of .xml files is not being indexed at all.
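A quick way to double-check this would be to fetch the same document back from Elasticsearch and see whether its content field is empty. Below is a rough sketch in shell: the helper name has_empty_content is made up for illustration, the document path is the one from the log line above, and the grep is a plain text match rather than a real JSON parser.

```shell
# Succeeds (exit status 0) when the JSON read from stdin contains an
# empty "content" field, i.e. "content":"" with optional whitespace.
# This is a simple text match, not a JSON parser.
has_empty_content() {
  grep -q '"content"[[:space:]]*:[[:space:]]*""'
}

# Hypothetical usage against the document from the log line
# (credentials and host are the same placeholders as in the curl call):
#   curl -s "my_user:my_password@localhost:9200/my_index/standard/files%3A723" \
#     | has_empty_content && echo "document was indexed with empty content"
```

If the check succeeds for an .xml file but fails for a .pdf or .txt file, that would suggest the XML content never reaches Elasticsearch in the first place.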

Is there a way to force the indexing of XML files?

Thanks and Kind Regards,


How to reset everything regarding fulltextsearch

No solution so far. On top of that, I kept running into the usual “Index already running” error, no matter how many times I tried to stop and reset the index with:

sudo -u apache ./nextcloud/occ fulltextsearch:stop
sudo -u apache ./nextcloud/occ fulltextsearch:reset
sudo -u apache ./nextcloud/occ fulltextsearch:index

Uninstalling and reinstalling didn’t really work, so I had to dig into the DB and remove all the related tables and records.

If someone is interested in completely uninstalling fulltextsearch, here is how:

  1. Remove all related packages from the web application manager
  2. Take a backup of your mariadb/mysql database :)
  3. Go to your mariadb/mysql instance

mysql -h <your_hostname_or_ip_of_mariadb> -u <your_user> -P 3306 -p

  4. Switch to your DB

use your_nextcloud_db;

  5. Drop the 2 tables below

drop table oc_fulltextsearch_indexes;
drop table oc_fulltextsearch_ticks;

  6. Clean up the oc_appconfig table

delete from oc_appconfig where appid='fulltextsearch';
delete from oc_appconfig where appid='fulltextsearch_elasticsearch';
delete from oc_appconfig where appid='files_fulltextsearch';
delete from oc_appconfig where appid='files_fulltextsearch_tesseract';

After that, reinstalling all the apps and running

sudo -u apache ./nextcloud/occ fulltextsearch:test
sudo -u apache ./nextcloud/occ fulltextsearch:index

worked fine.
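For anyone who wants to script the database part, here is a minimal sketch of the steps above as a shell helper. The function name fts_cleanup_sql and the backup file name are made up for illustration, the oc_ table prefix is assumed to match your installation, and IF EXISTS is my addition so that a second run does not fail on tables that are already gone.

```shell
# Prints the cleanup statements from the steps above to stdout.
# Nothing is executed here; pipe the output into mysql yourself.
fts_cleanup_sql() {
  cat <<'SQL'
DROP TABLE IF EXISTS oc_fulltextsearch_indexes;
DROP TABLE IF EXISTS oc_fulltextsearch_ticks;
DELETE FROM oc_appconfig WHERE appid IN (
  'fulltextsearch',
  'fulltextsearch_elasticsearch',
  'files_fulltextsearch',
  'files_fulltextsearch_tesseract'
);
SQL
}

# Hypothetical usage (same placeholders as in the steps above):
#   mysqldump -h <your_hostname_or_ip_of_mariadb> -u <your_user> -p your_nextcloud_db > nextcloud_backup.sql
#   fts_cleanup_sql | mysql -h <your_hostname_or_ip_of_mariadb> -u <your_user> -P 3306 -p your_nextcloud_db
```

Printing the SQL first instead of running it directly makes it easy to review the statements before they touch the database.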