Hello community!
I’ve been trying to get through the initial FullTextSearch index for about two weeks now.
Every time I run php occ fulltextsearch:index, it stops/freezes at the same file:
Memory: 49 MB
┌─ Indexing ────
│ Action: fillDocument
│ Provider: Files Account: my_username
│ Document: 3062935
│ Info: application/pdf
│ Title: Shop Manuals/Isuzu Rodeo TF 1988 - 2002/Holden Rodeo TF 6VD1 99-02 SM.pdf
│ Content size:
│ Chunk: 255/1277
│ Progress: 5/37
└──
┌─ Results ────
│ Result: 0/0
│ Index:
│ Status:
│ Message:
│
│
└──
┌─ Errors ────
│ Error: 1/1
│ Index: files:3033570
│ Exception: Elastic\Elasticsearch\Exception\ClientResponseException
│ Message: unknown error
│
│
└──
## x:first result ## c/v:prec/next result ## b:last result
## f:first error ## h/j:prec/next error ## d:delete error ## l:last error
## q:quit ## p:pause
It is an auto shop manual PDF and it’s quite large (about 400 pages). It’s one of a few dozen such manuals I have, most of them similar and just as large. The others seem to index just fine, but this particular file causes the indexing to freeze.
I’ve let it sit for several days to see if it will progress any further and it never does.
Overall, I have about 600 GB of files to index, and the run only gets about one fifth of the way through, stopping at this same file every time.
I do get the one error shown above, but it’s for a different file (files:3033570, not document 3062935). There are no other errors to speak of, but I do get the following notices when I run the index:
openjpeg warning: unspec CS. 1 component so assuming gray.
Dereference of free object 3, next object number as offset failed (code = -18), returning NULL object.
openjpeg warning: unspec CS. 3 components. Assuming data RGB.
Every time I try to run the initial index, those messages change. Sometimes I don’t get any, and sometimes I get a lot. They seem informational rather than fatal errors, though.
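One way to tell whether the freeze happens in PDF processing itself, rather than in Nextcloud or Elasticsearch, would be to run an external PDF tool on just this one file with a time limit. The sketch below is only an illustration under assumptions: the notices above look like Ghostscript output, so it assumes a gs binary is installed and uses its txtwrite device; the helper name run_with_timeout and the 600-second limit are hypothetical. It does not reproduce what the indexer actually does internally.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, stderr_text).

    status is "ok" (exit code 0), "failed" (non-zero exit code),
    or "timeout" (the process was killed after timeout_s seconds).
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,   # extracted text is not needed here
            stderr=subprocess.PIPE,      # keep warnings such as the openjpeg notices
            text=True,
            errors="replace",
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    return ("ok" if proc.returncode == 0 else "failed"), proc.stderr


if __name__ == "__main__" and len(sys.argv) > 1:
    pdf_path = sys.argv[1]  # path to the PDF that stalls the index
    # Assumption: Ghostscript is installed as "gs"; txtwrite extracts plain text.
    cmd = ["gs", "-q", "-dNOPAUSE", "-dBATCH",
           "-sDEVICE=txtwrite", "-sOutputFile=/dev/null", pdf_path]
    status, warnings = run_with_timeout(cmd, timeout_s=600)
    print("result:", status)
    print(warnings)
```

If this reports a timeout for the problem PDF but finishes for the other manuals, that would point at the file rather than at the search platform.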
Every time it gets to this file and freezes, I’ve tried to end the index with php occ fulltextsearch:stop, but that fails to actually stop it: if I then try to run the index again, it errors out saying the index is already running. I have to actually kill and restart PHP to start a new initial index.
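To see which processes are still holding the index before killing anything, a small sketch like the one below could list the candidates. It assumes a Linux host where ps -eo pid,args is available; the function name find_index_pids is hypothetical, and it only reports process IDs, it does not stop anything or reset any state inside Nextcloud.

```python
import shutil
import subprocess


def find_index_pids(ps_output, needle="fulltextsearch:index"):
    """Return the PIDs of processes whose command line contains needle.

    ps_output is the text printed by `ps -eo pid,args` (header line included).
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)  # -> [pid, full command line]
        if len(parts) == 2 and parts[0].isdigit() and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


if __name__ == "__main__" and shutil.which("ps"):
    listing = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid in find_index_pids(listing):
        print(pid)
```

An empty result while the index still reports as running would suggest the "already running" state is stored somewhere other than a live process.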
I’ve considered just changing the file extension on this one file to see if I can get through the initial index, but I’d prefer for it to actually complete successfully without manipulation.
I’ve tried various settings related to PDF indexing in the Nextcloud admin page, but I get the same issue each time. Here’s the current setup:
php occ config:list | less
    "fulltextsearch": {
        "app_navigation": "1",
        "cron_err_reset": "1712500204",
        "enabled": "yes",
        "installed_version": "28.0.1",
        "search_platform": "OCA\\FullTextSearch_Elasticsearch\\Platform\\ElasticSearchPlatform",
        "types": ""
    },
    "fulltextsearch_elasticsearch": {
        "analyzer_tokenizer": "standard",
        "elastic_host": "http:\/\/INTERNAL_IP_ADDRESS:9200",
        "elastic_index": "nc_indexnextcloud",
        "enabled": "yes",
        "installed_version": "28.0.1",
        "types": ""
    },
    "files_fulltextsearch": {
        "enabled": "yes",
        "files_audio": "0",
        "files_encrypted": "0",
        "files_external": "1",
        "files_federated": "0",
        "files_group_folders": "1",
        "files_image": "0",
        "files_local": "1",
        "files_office": "1",
        "files_pdf": "1",
        "files_size": "1024",
        "installed_version": "28.0.0",
        "types": "filesystem"
    },
    "files_fulltextsearch_tesseract": {
        "enabled": "yes",
        "installed_version": "27.0.0",
        "tesseract_enabled": "1",
        "tesseract_lang": "eng",
        "tesseract_pdf": "1",
        "tesseract_pdf_limit": "",
        "tesseract_psm": "",
        "types": ""
    },
php occ fulltextsearch:check
Full text search 28.0.1
{
    "search_platform": "OCA\\FullTextSearch_Elasticsearch\\Platform\\ElasticSearchPlatform",
    "app_navigation": "1",
    "provider_indexed": "",
    "cron_err_reset": "1712500204",
    "tick_ttl": "1800",
    "collection_indexing_list": "50",
    "migration_24": "1",
    "collection_internal": "local"
}
- Search Platform:
Elasticsearch 28.0.1 (Selected)
{
    "elastic_host": [
        "http://INTERNAL_IP_ADDRESS:9200"
    ],
    "elastic_index": "nc_indexnextcloud",
    "fields_limit": "10000",
    "es_ver_below66": "0",
    "elastic_logger_enabled": "1",
    "analyzer_tokenizer": "standard",
    "allow_self_signed_cert": "false"
}
- Content Providers:
Files 28.0.0
{
    "files_local": "1",
    "files_external": "1",
    "files_group_folders": "1",
    "files_encrypted": "0",
    "files_federated": "0",
    "files_size": "1024",
    "files_pdf": "1",
    "files_office": "1",
    "files_image": "0",
    "files_audio": "0",
    "files_chunk_size": "2",
    "files_fulltextsearch_tesseract": {
        "version": "27.0.0",
        "enabled": "1",
        "psm": "",
        "lang": "eng",
        "pdf": "1",
        "pdf_limit": ""
    }
}
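Since the one reported error is an Elasticsearch ClientResponseException, it may also be worth confirming that the search platform itself is healthy. The sketch below is a rough illustration, not part of the setup above: it uses Elasticsearch's standard _cluster/health and _count REST endpoints, takes the elastic_host and elastic_index values as command-line arguments, and assumes no authentication is required. The function names build_urls and fetch_json are hypothetical.

```python
import json
import sys
import urllib.request


def build_urls(host, index):
    """Build the two read-only endpoints queried below."""
    base = host.rstrip("/")
    return {
        "health": f"{base}/_cluster/health",  # overall cluster status
        "count": f"{base}/{index}/_count",    # number of documents in the index
    }


def fetch_json(url, timeout_s=10):
    """GET url and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.load(resp)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python es_check.py <elastic_host> <elastic_index>
    urls = build_urls(sys.argv[1], sys.argv[2])
    print("cluster status:", fetch_json(urls["health"]).get("status"))
    print("documents indexed:", fetch_json(urls["count"]).get("count"))
```

Running it twice a few minutes apart while the index appears frozen would show whether the document count is still changing.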
Thank you in advance for any assistance with this!