I’m evaluating search platforms for my files. Since I run a Nextcloud instance anyway, using fulltextsearch would be awesome. However, I also need to be able to search through .eml email files.
Is there a simple way to add file extensions to files_fulltextsearch? If I’m not mistaken, it should work pretty much out of the box, since an .eml file behaves like a plain text file.
I also found this GitHub issue, which says that .eml support was added, but it seems the app has since gone through a major refactor and .eml support has been dropped?
The issue with the current state of elasticsearch/tika and the fulltextsearch app is that it will index the raw content of the file, which means that the HTML and the headers will pollute the search results.
This should be done with an app (like files_fulltextsearch_tesseract) that parses the raw content using mailparser (or another RFC 822 library) and extracts the real content (and, why not, the attached files).
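To illustrate what "extracting the real content" could look like, here is a minimal sketch in Python. It uses the standard library's email package rather than mailparser, and it is not part of any fulltextsearch app (those are written in PHP); the function names extract_eml_text and strip_html are made up for this example. It only shows the parsing step: pick the readable body, drop the HTML markup, and list the attachment filenames.

```python
from email import message_from_bytes, policy
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def strip_html(html_text: str) -> str:
    """Hypothetical helper: reduce HTML to whitespace-normalized plain text."""
    collector = _TextCollector()
    collector.feed(html_text)
    collector.close()
    return " ".join("".join(collector.chunks).split())


def extract_eml_text(raw: bytes) -> dict:
    """Hypothetical helper: return the searchable parts of a raw .eml file."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text body; fall back to the HTML body if there is none.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = ""
    if body_part is not None:
        body = body_part.get_content()
        if body_part.get_content_subtype() == "html":
            body = strip_html(body)

    # Attachment names only; indexing their contents would be a further step.
    attachments = [
        part.get_filename()
        for part in msg.iter_attachments()
        if part.get_filename()
    ]

    return {
        "subject": str(msg["subject"] or ""),
        "body": body.strip(),
        "attachments": attachments,
    }
```

The resulting dictionary is what one would hand to the indexer instead of the raw file, so that headers and markup no longer end up in the index.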
This would be a great improvement. However, I do not have much time to work on this, but if anyone wants to have a look, I am available to answer any questions about how to catch a file before it is indexed.