I’m evaluating search platforms for my files. As I run a Nextcloud instance anyway, using fulltextsearch would be awesome. However, I need to be able to search through .eml e-mail files as well.
Is there a simple way to add file extensions or such to files_fulltextsearch? It should work pretty much out of the box if I’m not mistaken as it behaves like a plain text file.
I also found this GitHub issue where it says that .eml support was added, but it seems there’s been a major refactor of the app since then and .eml support has been dropped?
Please send me some example .eml files at firstname.lastname@example.org
The issue with the current state of elasticsearch/tika and the fulltextsearch app is that it will index the raw content of the file, which means that the HTML and the headers will pollute the search results.
This should be done with an app (like files_fulltextsearch_tesseract) that parses the raw content using mailparser (or another RFC 822 library) and extracts the real content (and, why not, the attached files).
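To illustrate the kind of extraction meant here, below is a rough sketch in Python. It is not code from the fulltextsearch app (which is written in PHP) and it uses Python's standard `email` package in place of mailparser. The function name `extract_searchable_text` and the sample message are made up for the example, and the HTML stripping is deliberately crude.

```python
import html
import re
from email import policy
from email.parser import BytesParser

# Hypothetical sample message: an HTML-only mail, as often found in .eml files.
SAMPLE = (
    b"From: alice@example.org\r\n"
    b"To: bob@example.org\r\n"
    b"Subject: Quarterly report\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<html><body><p>Numbers look <b>good</b> &amp; stable.</p></body></html>\r\n"
)


def extract_searchable_text(raw_bytes):
    """Return (subject, body_text, attachment_names) for a raw RFC 822 message.

    Hypothetical helper: only the subject, the readable body and the
    attachment file names are kept, so headers and markup never reach the index.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)

    # Prefer the plain-text part; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = ""
    if body_part is not None:
        body = body_part.get_content()
        if body_part.get_content_type() == "text/html":
            # Crude tag stripping, good enough for a sketch only.
            body = html.unescape(re.sub(r"<[^>]+>", " ", body))

    # Attachment names only; the attachments themselves could be handed
    # to the indexer as separate documents.
    attachments = [
        part.get_filename()
        for part in msg.iter_attachments()
        if part.get_filename()
    ]

    subject = str(msg.get("subject", ""))
    # Collapse runs of whitespace left behind by the tag stripping.
    return subject, " ".join(body.split()), attachments


if __name__ == "__main__":
    print(extract_searchable_text(SAMPLE))
```

In an app of this kind, a step like this would run on each .eml file before its content is handed to the indexer, so that only the subject and the cleaned body end up being searchable.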
This would be a great improvement. However, I do not have much time to work on this, but if anyone wants to have a look, I am available to answer any questions about how to catch a file before it is indexed.
This would be a decent base. Working on NC15.
Thank you very much for the quick response and for looking into the matter.
I agree that it would be a great improvement; unfortunately, I lack the necessary skills to contribute.
Don’t worry, there are many ways to contribute to the project: documentation, support, helping other users!
Still needs some improvement, but this is what will be available in Nextcloud 15: