Some files are not indexed, content field is empty

Hi,

I’ve migrated my private Seafile cloud to NC 18.0.1 and have also deployed NC for my sports club and my company. All setups run on Ubuntu Server 18.04, and I’m happy so far.

One thing I see on all servers with Elasticsearch: during indexing, a number of files are not indexed correctly. It is not a huge number and I can live with it, but it would be good to know why this happens.

All of these documents produce the following error during indexing:

[2020-02-19T05:10:30,832][DEBUG][o.e.a.b.TransportBulkAction] [EqJD9yI] failed to execute pipeline [attachment] for document [jaycloud/standard/files:108110]
org.elasticsearch.ElasticsearchException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [content] not present as part of path [attachment.content]
	at org.elasticsearch.ingest.CompoundProcessor.newCompoundProcessorException(CompoundProcessor.java:195) ~[elasticsearch-6.8.6.jar:6.8.6]
	at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:134) ~[elasticsearch-6.8.6.jar:6.8.6]
	at org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:100) ~[elasticsearch-6.8.6.jar:6.8.6]
	at org.elasticsearch.ingest.IngestService.innerExecute(IngestService.java:473) ~[elasticsearch-6.8.6.jar:6.8.6]
	at org.elasticsearch.ingest.IngestService.access$100(IngestService.java:68) ~[elasticsearch-6.8.6.jar:6.8.6]
	at org.elasticsearch.ingest.IngestService$4.doRun(IngestService.java:402) [elasticsearch-6.8.6.jar:6.8.6]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.8.6.jar:6.8.6]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.6.jar:6.8.6]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]

When I look at the document index via the occ command, the “content” field of these documents is always empty.

I’m currently re-indexing my private cloud because I’ve added some external storage directories via davfs2. So far, 267 out of 92730 files have this issue.
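
In case it helps anyone reproduce the count, the affected document IDs can be pulled out of the Elasticsearch log with a few lines of Python. This is only a minimal sketch: it assumes the log lines look exactly like the "failed to execute pipeline" line quoted above, and the function name `failed_documents` and the sample input are made up for illustration.

```python
import re

# Matches lines such as:
#   ... failed to execute pipeline [attachment] for document [jaycloud/standard/files:108110]
# Assumption: the document reference always has the form index/type/id.
PATTERN = re.compile(
    r"failed to execute pipeline \[(?P<pipeline>[^\]]+)\] "
    r"for document \[(?P<index>[^/\]]+)/(?P<type>[^/\]]+)/(?P<doc_id>[^\]]+)\]"
)


def failed_documents(lines):
    """Return (pipeline, index, doc_id) tuples for every matching log line."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found.append((match["pipeline"], match["index"], match["doc_id"]))
    return found


if __name__ == "__main__":
    # Hypothetical sample input; in practice, iterate over the real log file.
    sample = [
        "[2020-02-19T05:10:30,832][DEBUG][o.e.a.b.TransportBulkAction] [EqJD9yI] "
        "failed to execute pipeline [attachment] for document "
        "[jaycloud/standard/files:108110]",
        "\tat org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:100)",
    ]
    for pipeline, index, doc_id in failed_documents(sample):
        print(pipeline, index, doc_id)
```

The stack trace lines are skipped because they do not contain the "failed to execute pipeline" text, so only one entry per failing document is reported.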

Is there a known reason why indexing fails for some documents?