Documents with no file extension are not indexed

Environment details

Distro: Debian GNU/Linux 11 (bullseye)
Kernel: 5.10.0-11-arm64 aarch64
Apache: 2.4.52
PHP: 7.4.28
Database: MariaDB 10.5.12
Elasticsearch: 7.17.0
Nextcloud: 23.0.2

The database and the Elasticsearch cluster are on separate servers, all running the same OS and patched to the same level. Storage is provided by a TrueNAS NFS share.

I noticed that the contents of some text-based files were not being returned by my full text search.

After some testing, and monitoring the process with php occ fulltextsearch:live, I found the following.

If I create an empty file with no extension, open it in my text editor, and paste in some dummy text, I see the following.

[Screenshot: 2022-02-24_21-29]

The content size is shown as zero, and if I search for a word in the file I get no result.

[Screenshot: 2022-02-24_21-30]

If I then add a .txt extension to the file name, I get the following.

[Screenshot: 2022-02-24_21-32]

Now the process sees that there is content in the file, and if I search for the same word I get a result.

[Screenshot: 2022-02-24_21-32_1]

I get the same result with a .odt file, for example. My desktop PC can still open the file without a file extension, as I would expect on Linux.

My question is this: is this expected behavior? Do files seen as ‘octet-stream’ just not get ingested / indexed?
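
To illustrate what I suspect is happening (this is only my guess at the mechanism, not Nextcloud's or the full text search app's actual code): if the MIME type is derived from the file name alone, a file without an extension has nothing to map and falls back to the generic binary type, regardless of what it contains. A minimal Python sketch, where guess_mime_by_extension is a hypothetical helper name of my own:

```python
import mimetypes


def guess_mime_by_extension(filename: str) -> str:
    """Guess a MIME type from the file name only; the content is never read.

    Hypothetical helper for illustration, not taken from Nextcloud.
    """
    mime, _encoding = mimetypes.guess_type(filename)
    # No recognisable extension means no mapping, so fall back to the
    # generic binary type.
    return mime or "application/octet-stream"


if __name__ == "__main__":
    for name in ("notes", "notes.txt"):
        print(name, "->", guess_mime_by_extension(name))
```

If the indexer only extracts content for types it recognises as text or documents, that would explain why the same file is picked up as soon as it is renamed with .txt.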