XML files indexing in fulltextsearch / elasticsearch

Hello!

Before explaining what this topic is about, I would like to say that I’ve recently installed the Nextcloud platform to test it, and honestly I’ve been impressed! You have done an excellent job! Congrats and thank you for sharing!

To the point:
I’ve managed to set up fulltextsearch with Elasticsearch as the backend, and it works fine for plain text files and PDFs. However, I cannot see any results from .xml files, and I have more than 20K of them.

I know that Elasticsearch and XML are not an ideal combination, but I would expect to see some index failure messages. I’ve enabled debug mode on Elasticsearch:

curl -X PUT "my_user:my_password@localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"transient":{"logger._root":"DEBUG"}}'

and there are no error messages from elasticsearch. The messages I get when it comes to .xml files are like this:

[2019-02-27T13:43:50,462][INFO ][t.b.r.a.ACL ] [nextcloud] ALLOWED by { name: ‘Accept requests from cloud1 on my_index’, policy: ALLOW, rules: [groups, indices]} req={ ID:1761111980-1231334325#6666, TYP:IndexRequest, CGR:N/A, USR:iec-sva, BRS:false, KDX:null, ACT:indices:data/write/index, OA:127.0.0.1, A:127.0.0.1, IDX:my_index, MET:PUT, PTH:/my_index/standard/files%3A723, NT:{“share_names”“admin”:“Documents/cispr16-1-3{ed2.0}en_meta_data.xml”},“owner”:“admin”,“users”:,“groups”:],“circles”:],“links”:],“metatags”:“files_local”],“subtags”:],“tags”:],“hash”:“”,“provider”:“files”,“source”:“files_local”,“title”:“Documents/cispr16-1-3{ed2.0}en_meta_data.xml”,“parts”:,“content”:“”}, HDR:{Accept=application/json, Authorization=Basic aWVjLXN2YTppZWNjb3Jl, Content-Length=318, Content-Type=application/json, Host=127.0.0.1:9200}, HIS:[Accept requests from cloud1 on my_index->[indices->true, auth_key->true]] }

I think that there is no indexing at all on .xml files.
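
In the request logged above, the content field is empty, which would be consistent with no text being extracted from the file before it is sent to Elasticsearch. A minimal sketch for spotting such requests, assuming the log stores the request body as JSON with straight double quotes (the forum rendering shows curly ones); the helper name has_empty_content and the log path in the comment are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when a logged index request
# carries an empty "content" field, i.e. no document text was sent.
# Assumes the body is logged as JSON with straight double quotes.
has_empty_content() {
  printf '%s\n' "$1" | grep -q '"content":""'
}

# Example: count such requests in a log file (the path is an assumption).
# grep -c '"content":""' /var/log/elasticsearch/nextcloud.log
```

If most .xml requests match while .txt and .pdf requests do not, the problem is more likely on the content extraction side than in Elasticsearch itself.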

Is there a way to force the indexing of XML files?

Thanks and Kind Regards,

Stavros

No solution so far. On top of that, I kept hitting the usual “Index already running” error, no matter how many times I tried to stop and reset the index with:

sudo -u apache ./nextcloud/occ fulltextsearch:stop
sudo -u apache ./nextcloud/occ fulltextsearch:reset
sudo -u apache ./nextcloud/occ fulltextsearch:index

Uninstalling and reinstalling didn’t really work, so I had to dig into the DB and remove all the related tables and records.

If someone is interested in completely uninstalling fulltextsearch, here is how:

  1. Remove all related packages from the web application manager
  2. Get a backup of your mariadb/mysql database :)
  3. Go to your mariadb/mysql instance

mysql -h <your_hostname_or_ip_of_mariadb> -u <your_user> -P 3306 -p

  4. Switch to your DB

use your_nextcloud_db;

  5. Drop the 2 tables below

drop table oc_fulltextsearch_indexes;
drop table oc_fulltextsearch_ticks;

  6. Cleanup the oc_appconfig table

delete from oc_appconfig where appid='fulltextsearch';
delete from oc_appconfig where appid='fulltextsearch_elasticsearch';
delete from oc_appconfig where appid='files_fulltextsearch';
delete from oc_appconfig where appid='files_fulltextsearch_tesseract';

After that, installing all the apps again and running

sudo -u apache ./nextcloud/occ fulltextsearch:test
sudo -u apache ./nextcloud/occ fulltextsearch:index

worked fine.
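
The SQL steps above can be sketched as a small generator script. This is only an illustration: the function name fts_cleanup_sql and the configurable table prefix are hypothetical, and IF EXISTS is an addition so that a table that is already gone does not abort the run.

```shell
#!/bin/sh
# Hypothetical helper: prints the cleanup SQL from the steps above for a
# given table prefix (default "oc_"). IF EXISTS is added so a missing
# table produces no error.
fts_cleanup_sql() {
  prefix="${1:-oc_}"
  for table in fulltextsearch_indexes fulltextsearch_ticks; do
    printf 'DROP TABLE IF EXISTS %s%s;\n' "$prefix" "$table"
  done
  for app in fulltextsearch fulltextsearch_elasticsearch \
             files_fulltextsearch files_fulltextsearch_tesseract; do
    printf "DELETE FROM %sappconfig WHERE appid='%s';\n" "$prefix" "$app"
  done
}

# Usage (user and database name are placeholders; take a backup first):
# fts_cleanup_sql oc_ | mysql -u <your_user> -p your_nextcloud_db
```

Printing the statements first makes it possible to review them before piping them into the database client.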

Cheers,
St./


Thanks Sva. Your process allowed me to cleanly remove my old index and recreate it on a new, separate three-node Elasticsearch cluster.

I know this topic is from a long time ago. Thanks for your instructions, but after following all of them and running
sudo -u www-data php ./occ fulltextsearch:test
I still get these errors, as if the tables haven’t been deleted:

In ExceptionConverter.php line 47:

An exception occurred while executing a query: SQLSTATE[42S02]: Base table or view not found: 1146 Table 'nextcloud.oc_fulltextsearch_ticks' doesn't exist

In Exception.php line 26:

SQLSTATE[42S02]: Base table or view not found: 1146 Table 'nextcloud.oc_fulltextsearch_ticks' doesn't exist

In Statement.php line 82:

SQLSTATE[42S02]: Base table or view not found: 1146 Table 'nextcloud.oc_fulltextsearch_ticks' doesn't exist

When I delete the tables and clean up the appconfig table in MySQL, I also get the “table does not exist” error.

Any idea where to look?

@sva
I followed your instructions to do a clean remove/reinstall cycle, but without success. When running fulltextsearch:test I get

Testing search platform. ok  
Locking process fail 
In RunningService.php line 86:
                            
  Index is already running  
                            

fulltextsearch:test [--output [OUTPUT]] [-j|--json] [-d|--platform_delay PLATFORM_DELAY]

Is your instance actually running?

I have nextcloud-fulltext-elasticsearch-worker.service running and by coincidence found it to be the cause of the error. When I stop the service, the test runs successfully. So maybe the error is expected while the service is running. I don’t know. If so, sorry for the excitement.
My apologies
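
For anyone hitting the same “Index is already running” message, a minimal sketch of a pre-check based on the observation above. It assumes a systemd host and the unit name from the post; the function names and the SYSTEMCTL override are hypothetical and only there to keep the sketch easy to try out.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given systemd unit is active.
# SYSTEMCTL can be overridden (e.g. for a dry run); defaults to systemctl.
worker_is_active() {
  "${SYSTEMCTL:-systemctl}" is-active --quiet "$1"
}

# Hypothetical wrapper: prints a hint instead of proceeding while the
# worker service named in the post is still active.
check_before_test() {
  unit="${1:-nextcloud-fulltext-elasticsearch-worker.service}"
  if worker_is_active "$unit"; then
    echo "stop $unit first"
    return 1
  fi
  echo "ok to run fulltextsearch:test"
}
```

Calling check_before_test before occ fulltextsearch:test would then flag the situation described above, where the running worker appears to hold the index lock.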