Full text search working for documents added through DAV or occ files:scan

martingwb · August 10, 2023, 11:54am

Hi,
I am adding PDF documents through a scanning app that uploads the files to Nextcloud via DAV, and a workflow_script then OCRs each file.
What needs to be done to make fulltextsearch do the following:

index file content (OCR’ed PDF) when a file is added through DAV
index file content (OCR’ed PDF) when a file is added through occ files:scan

I assume it has to do with adding a provider, but the documentation seems to be incomplete or I just do not understand it. For me, on NC 27.0.1, fulltextsearch/Elasticsearch is working fine (including OCR’ed text from PDFs) for existing files and for files added through the web UI or desktop sync.

Thank you for any help and suggestions!

devnull · August 10, 2023, 12:05pm

Sorry, I cannot really help you.

occ files:scan is not needed if you use Nextcloud with WebDAV.
It is needed if you use something like cp, scp, sftp, rsync, …

I can’t say anything about the subsequent steps regarding OCR.

martingwb · August 10, 2023, 12:35pm

Yes, I know. The two points in my list are meant to be viewed independently (they are separate cases, not cumulative steps).

Cult · August 10, 2023, 2:06pm

Where are those files located?

Is this an issue with files uploaded into any folder, an external filesystem, or even the root of your home directory?

martingwb · August 10, 2023, 2:19pm

The PDF files are uploaded by the app (SwiftScan) to a “regular” folder (I assume this is what you mean by “any”) in my user’s hierarchy. The file is recognized by fulltextsearch:live, but the OCR content is not indexed.
If I do this with a PDF that has no OCR layer, the workflow runs ocrmypdf and then occ files:scan on the new OCR’ed PDF file.
Neither way (DAV or occ files:scan) seems to trigger fulltextsearch to index the PDF’s OCR text for search.
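
To make the second case concrete, here is a rough sketch of the two commands the workflow runs. It only builds the command lines and does not execute anything. The function name, the data directory, the occ location, the user name, and the `_ocr` suffix for the new file are all placeholders I made up for illustration; the real script may differ.

```python
import shlex
from pathlib import PurePosixPath


def build_ocr_workflow(data_dir, user, rel_path, occ="/var/www/nextcloud/occ"):
    """Build (but do not run) the two workflow commands.

    Assumptions (hypothetical, for illustration only):
    - user files live under <data_dir>/<user>/files/
    - the OCR'ed copy is written next to the original with an "_ocr" suffix
    - occ is invoked as "php <occ>"
    """
    src = PurePosixPath(data_dir) / user / "files" / rel_path
    dst = src.with_name(src.stem + "_ocr.pdf")

    # files:scan expects a path relative to the data directory: /<user>/files/...
    rel_dst = PurePosixPath(rel_path).with_name(dst.name)
    scan_path = "/" + user + "/files/" + str(rel_dst)

    return [
        # Step 1: add a text layer to the scanned PDF
        ["ocrmypdf", str(src), str(dst)],
        # Step 2: make Nextcloud aware of the newly written file
        ["php", occ, "files:scan", "--path=" + scan_path],
    ]


if __name__ == "__main__":
    commands = build_ocr_workflow(
        "/var/www/nextcloud/data", "alice", "Scans/invoice.pdf"
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

After step 2 the new file shows up in Nextcloud, but its OCR text does not become searchable.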

Anything I need to configure?