Hi, I am trying to get Tesseract OCR working with fulltextsearch. So far, I think I’m able to index PDF files (since I can search them; lmk if there is a way to check for sure). But I am not able to search for text in images. I ran fulltextsearch:index and also uploaded new photos to check, but I still can’t find them by searching. Does the OCR addon not support image OCR? I thought it did, because Tesseract is able to OCR images just fine through the CLI.
Edit: It doesn’t seem to be working. Apparently, those were the default features of the fulltextsearch docker. After I disabled those and tried it, OCR does not work at all.
If you run the OCC command php occ fulltextsearch:check, you’ll see:
- Content Providers:
Files 28.0.0
{
    "files_local": "1",
    "files_external": "1",
    "files_group_folders": "1",
    "files_encrypted": "0",
    "files_federated": "0",
    "files_size": "102488",
    "files_pdf": "1",
    "files_office": "1",
    "files_image": "0",
    "files_audio": "0",
    "files_chunk_size": "2",
    "files_fulltextsearch_tesseract": {
        "version": "27.0.0",
        "enabled": "1",
        "psm": "",
        "lang": "eng",
        "pdf": "1",
        "pdf_limit": ""
    }
}
While I haven’t tested it, I would think changing the value for “files_image” to 1 may do it.
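If you want to try that, here is a rough sketch of how the value could be flipped from the command line. It uses the standard occ config:app:set subcommand. The app id files_fulltextsearch and the key name files_image are assumptions on my part, inferred from the check output above, and I have not tested this. The OCC variable and the function name are just placeholders for illustration.

```shell
# Sketch only: run from the Nextcloud root as the web server user.
# OCC can be overridden if occ is invoked differently on your install.
OCC="${OCC:-php occ}"

enable_image_indexing() {
    # Assumption: the Files provider stores its settings under the app id
    # "files_fulltextsearch" with the key "files_image".
    $OCC config:app:set files_fulltextsearch files_image --value=1

    # Re-run the check to see whether "files_image" now reports "1".
    $OCC fulltextsearch:check
}

# Usage: enable_image_indexing
```

If the check still shows "files_image": "0" afterwards, the app id or key name is probably different on your version.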
That setting has been intentionally left out of the NC admin settings page, which makes me think that feature currently has bugs/issues.
Certainly not a confirmed solution for you, but hopefully that puts you on the right path.
I’ll try to confirm that the content actually gets indexed, but when I run the php occ fulltextsearch:index command, I do see image files showing up in the documents being scanned (even with the “files_image” value mentioned above set to 0).
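One way to check whether image text really ends up in the index might be to reindex and then search from the command line for a word that only appears inside an image. A minimal sketch, assuming the fulltextsearch:search subcommand is available on your version and takes a user id and a search string (I have not verified this on 28.0.0). The function name, the OCC variable and the example arguments are placeholders.

```shell
# Sketch only: run from the Nextcloud root as the web server user.
OCC="${OCC:-php occ}"

reindex_and_search() {
    # $1 = Nextcloud user id, $2 = a word that appears only inside an image
    $OCC fulltextsearch:index

    # Assumption: fulltextsearch:search <user> <string> exists on this version.
    $OCC fulltextsearch:search "$1" "$2"
}

# Usage (hypothetical user and term): reindex_and_search alice invoice
```

If the search returns nothing for a word that Tesseract reads fine from the CLI, the image was most likely scanned but its text was not indexed.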