FullTextSearch using AWS Elasticsearch Service as the endpoint

Hi,

I am trying to get the fulltextsearch app working. It now needs an Elasticsearch instance, and it looks like NextAnt (which used Solr) has been discontinued; the wiki says it is EOL, which I assume means end of life?

I have a Nextcloud 12 instance running on an AWS EC2 Ubuntu machine. I want to add “full text search”, which basically means I need an Elasticsearch endpoint. I created one using the AWS Elasticsearch Service. It appears to include the ingestion add-on as standard, so it should work. In my NC instance I’ve added this Elasticsearch Service endpoint to the configuration as seen here:

I’m not entirely sure how to debug it. There are no errors, but it’s clear that only the standard file name search is working.

Are there limitations, such as having to install the Elasticsearch instance locally? Or should I be able to use an AWS Elasticsearch Service domain as my endpoint?

Any advice would be great.
Regards

Full text search should work with a remote instance of Elasticsearch. Do you get any errors when running your first index?

Thanks @Cult

Appreciate the quick response. I thought it would work also.

I get the following. I know my instance is available, and I have made sure my EC2 machine can see my Elasticsearch Service instance through IP whitelisting!

Thanks again for trying to help me on this. Really appreciate it.
Thanks
Colm

@Cult

Any suggestions for a way to properly test the connection from my NC instance to the Elasticsearch Service, beyond the basic IP testing I have done?

Thanks

@Cult

For your information, when I click through to my Elasticsearch Service endpoint I get this, which looks like it’s active:

At the Kibana link I get this:

Maybe that’ll give you some ideas when helping out.
Thanks

Some additional information.

I have made my remote instance open to ALL while I test this, and I will lock it down once I find out what is wrong.

As such, my config in NC 12 looks like this, without a username and password, as the AWS Elasticsearch Service does not have that sort of authentication. Also, where it says name of index: does this need to match something inside my ES instance? I have just given it the value “search”.

Hope all this helps!

@Cult

I have found this about the AWS Elasticsearch service running on port 80 not over port


To quote this guy:

I see port 9200 in the debug output, but I think the AWS elasticsearch service runs over port 80. Can you explicitly set the port to 80 in your $host’s array? The client defaults to 9200 if you don’t specify a port, since that is the default for Elasticsearch (and most hosted ES services too).

Seems to be throwing the same error as me at least. I’ll keep trying.

In the last hour I spun up a new AWS Elasticsearch service and stepped through all of the configuration and tested against this second instance with no joy.

I’m guessing I’m doing something wrong regarding access to the service, but right now it is set to open to all. Happy to share the actual open endpoint over private message if that helps @Cult ?

Cheers
Colm

I have no idea which port the AWS Elasticsearch Service uses, or whether you need authentication, but you can try a request from the box that hosts your Nextcloud using links or curl:

links http://your-endpoint:9200/

(in case it runs on port 9200)
Note that I am using http in this example.


Also, the port can be 443 in case of https
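
The check above can be repeated with curl for each of the ports mentioned in this thread. This is only a rough sketch: `your-endpoint` is a placeholder, `candidate_urls` and `probe_all` are helper names made up for this example, and which port actually answers depends on how your Elasticsearch service is set up.

```shell
#!/bin/sh
# Placeholder host name (assumption); replace with your own Elasticsearch endpoint.
ES_HOST="${ES_HOST:-your-endpoint}"

# Print the candidate base URLs to try for a given host:
# the Elasticsearch default (9200), plain http (80) and https (443).
candidate_urls() {
    host="$1"
    printf 'http://%s:9200/\n' "$host"
    printf 'http://%s:80/\n' "$host"
    printf 'https://%s:443/\n' "$host"
}

# Request each candidate URL and print the HTTP status code curl received.
# curl reports 000 when it could not connect within the 5 second limit.
probe_all() {
    candidate_urls "$1" | while read -r url; do
        code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")
        printf '%s -> %s\n' "$url" "$code"
    done
}

# Usage, run from the machine that hosts Nextcloud:
#   probe_all "$ES_HOST"
```

A 200 on one of the lines suggests that URL is the one to put in the fulltextsearch settings; 000 on all three usually points at a network or access policy problem rather than at Nextcloud.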


@Cult

Thanks mate.

So basically I seem to have it working now. See screen:

Will it now re-index periodically, triggered from the Nextcloud side? Let me know if there is something I need to do to make sure it re-indexes new files.

The Solution.

For anyone else using the Amazon Elasticsearch Service (https://aws.amazon.com/elasticsearch-service/), here is the solution.

When specifying the Elasticsearch instance endpoint you must use http, not https, in the URL, and you must also specify the port explicitly as 80.

Example screenshot:

Obviously, replace this with your own AWS service endpoint. AWS gives you the https version of the URL; change https to http and append port 80.

Then when you go back into NC server to run your first index it’ll work.
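
As a small illustration of that rewrite, here is a sketch of a shell helper that turns the https endpoint shown in the AWS console into the http form with port 80. The function name `to_http_port80` and the example host name are made up for illustration; only the http and port 80 rule comes from the solution above.

```shell
#!/bin/sh
# Convert an endpoint URL into the http form with an explicit port 80.
to_http_port80() {
    host="${1#https://}"    # drop a leading https:// if present
    host="${host#http://}"  # or a leading http://
    host="${host%%/*}"      # drop any trailing path
    host="${host%%:*}"      # drop any existing port
    printf 'http://%s:80\n' "$host"
}

# Hypothetical endpoint, for illustration only.
to_http_port80 'https://search-example.us-east-1.es.amazonaws.com/'
```

The printed value is what goes into the Elasticsearch address field in the Nextcloud settings. After that, the first index can be started from the Nextcloud directory with `sudo -u www-data ./occ fulltextsearch:index`.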

Caveat:

I am investigating why, for me, .txt files seem to have only their first line of text indexed and nothing further down. It is not an issue with .docx or .pdf files; they seem to index fine. This is a separate issue, so I am marking this topic as resolved.

Thanks @Cult - if you know anything about text file issues that have line breaks let me know mate. Thanks.


Do you use any authentication on your Elasticsearch?

Hi @Cult

I will be restricting access to the NC instance inside my AWS VPC and by IP address. As it is the AWS Elasticsearch Service, I have to use the policy template option to limit users; I can’t do this by “user” as I could if it were Elasticsearch running on my own EC2 server instance.

I will lock down the permissions shortly, when I am done testing. New documents I added last night did not get indexed. Is there something I’m missing in terms of setting up a periodic index?

Thanks

Is your cron configured in the Nextcloud admin settings page?


Hi @Cult

Yes, I think I have it set up correctly, like so:

Is there something I’m missing?

It is definitely not running every 15 minutes, because I created “Cron test.txt” a couple of hours ago and it did not get indexed. When I ran the index manually I could see the file in the output, so the cron isn’t running.

Any tips?

Thanks
C

I would have thought that changing my hourly cron entry to run as the “www-data” user would have been enough, but maybe this is not how I should go about it?

Thanks

So, the cron is running every hour. You should wait at least an hour and then check whether your new files are indexed.

However, there are two things you can test instead of waiting that long.

Try running the live index:

sudo -u www-data ./occ fulltextsearch:live

and see whether, when you edit a file or upload a new one, the changes appear within the next 30 seconds.

If that works, try running cron.php manually from the Nextcloud base directory (after editing or uploading a file):

sudo -u www-data php -f cron.php

Apologies @Cult , I have it working just now.

I made a mistake in my crontab. I have now set one up for the www-data user and it’s running every 15 minutes without issue.

Thanks for the responsiveness. :+1::+1:
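
For anyone setting up the same thing, a crontab entry along these lines is roughly what is needed. This is a sketch rather than a copy of my actual crontab: the install path `/var/www/nextcloud` is an assumption, and `cron_line` is just a helper name for this example.

```shell
#!/bin/sh
# Assumed Nextcloud install path; adjust to where your instance lives.
NC_DIR="${NC_DIR:-/var/www/nextcloud}"

# Print a crontab line that runs Nextcloud's cron.php every 15 minutes.
cron_line() {
    printf '*/15 * * * * php -f %s/cron.php\n' "$1"
}

cron_line "$NC_DIR"
# To install it for the web server user, open that user's crontab:
#   sudo crontab -u www-data -e
# and paste the printed line.
```

The entry has to live in the www-data user's crontab (not root's), and the background jobs option in the Nextcloud admin settings should be set to Cron.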

Cheers

Colm
