Is it possible to add file types to files_fulltextsearch? (specifically .eml)

Hello Everyone,

I’m evaluating search platforms for my files. As I run a Nextcloud instance anyway, using fulltextsearch would be awesome. However, I need to be able to search through .eml email files as well.
Is there a simple way to add file extensions to files_fulltextsearch? If I’m not mistaken, it should work pretty much out of the box, since an .eml file behaves like a plain text file.

I also found this GitHub issue, which says that .eml support was added, but it seems there has been a major refactor of the app since then and .eml support was dropped?

Best Regards

Hello,

Please send me some example .eml files at maxence@nextcloud.com

The issue with the current state of elasticsearch/tika and the fulltextsearch app is that it will index the raw content of the file, which means that the HTML and the headers will mess up the search.

This should be done with an app (like files_fulltextsearch_tesseract) that parses the raw content using mailparser (or another RFC 822 library) and extracts the real content (and, why not, the attached files).

This would be a great improvement. However, I do not have much time to work on this, but if anyone wants to have a look, I am available to answer any questions about how to catch a file before it is indexed.
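To illustrate the parsing step described above, here is a minimal sketch in Python. It is not the Nextcloud app's code: it swaps in Python's standard `email` package in place of mailparser, and the function name `extract_searchable` and the returned field names are hypothetical. It only shows the idea of pulling out the headers of interest and the readable body instead of indexing the raw file.

```python
from email import message_from_bytes, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Note: this also keeps text inside <script>/<style>; a real
        # implementation would probably want to skip those.
        self.chunks.append(data)


def extract_searchable(raw: bytes) -> dict:
    """Hypothetical helper: turn raw .eml bytes into indexable fields."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text part; fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    text = ""
    if body is not None:
        text = body.get_content()
        if body.get_content_type() == "text/html":
            parser = _TextExtractor()
            parser.feed(text)
            parser.close()
            # Collapse the whitespace left behind by the removed markup.
            text = " ".join(" ".join(parser.chunks).split())

    return {
        "subject": str(msg["subject"] or ""),
        "from": str(msg["from"] or ""),
        "to": str(msg["to"] or ""),
        "text": text.strip(),
        # Only file names here; attachment content would need its own parser.
        "attachments": [
            part.get_filename()
            for part in msg.iter_attachments()
            if part.get_filename()
        ],
    }
```

With something like this, only the returned fields would be handed to the indexer, so the MIME headers and HTML tags never reach the search index.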

This would be a decent base. Working on NC15.

Thank you very much for the quick response and for looking into the matter.

I agree that it would be a great improvement; unfortunately, I lack the necessary skills to contribute :frowning:

Don’t worry, there are a lot of ways to contribute to the project: documentation, support, helping other users!

Still needs some improvement, but this is what will be available in Nextcloud 15:

I forked the old NC 15 app and completely rebased it for Nextcloud 25.0.2.

I just got it working again, so I will look into publishing it on the App Store for anyone else who is looking forward to it (it still needs some cleaning up, and there are no tests or anything).

Searching works against the subject, from, to and text.

There are no settings or anything else added yet, if ever … because that’s all I was looking for at the moment.

Maybe searching for specific attachments could be useful though …

I will test it against my collection of over 100k .eml files first though …