Rebuild Index On Large Instance Fails

Nextcloud version (eg, 20.0.5): 24.0.11
Operating system and version (eg, Ubuntu 20.04): Ubuntu 20.04.6
Apache or nginx version (eg, Apache 2.4.25): nginx 1.18.0
PHP version (eg, 7.4): 8.0.28
Elasticsearch version: 7.17.9

The issue you are facing:

Indexing all files takes a long time, normally two or three days. After moving the Nextcloud instance to a new server, I have to rebuild the index, but that no longer works. The Elasticsearch instance has a JVM heap size of 15g. Is the following error a bug in Nextcloud Fulltextsearch, or should I tune Elasticsearch?

Is this the first time you’ve seen this error? Y

Steps to replicate it:

  1. php occ fulltextsearch:stop
  2. php occ fulltextsearch:reset
  3. php occ fulltextsearch:index
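
For anyone reproducing this, the three steps can be written out as full occ invocations. This is only a sketch: the install path is the one from the stack trace below, and the web server user `www-data` is an assumption (the Ubuntu default), as are the `NC_DIR`, `WEB_USER` and `APPLY` variable names. By default it only prints the commands; `fulltextsearch:reset` may ask for confirmation when actually run.

```shell
#!/bin/sh
# Sketch of the reproduction steps as full occ invocations.
# Assumptions: install path taken from the stack trace; web server user is
# www-data (Ubuntu default). Adjust both if your setup differs.
NC_DIR="${NC_DIR:-/srv/http/cloud.example.com/nextcloud}"
WEB_USER="${WEB_USER:-www-data}"

# Print the command by default; set APPLY=1 to actually run it.
occ() {
    if [ "${APPLY:-0}" = "1" ]; then
        sudo -u "$WEB_USER" php "$NC_DIR/occ" "$@"
    else
        echo "sudo -u $WEB_USER php $NC_DIR/occ $*"
    fi
}

rebuild_index() {
    # 1. stop a running index process
    # 2. drop the existing index (may prompt for confirmation)
    # 3. rebuild the index from scratch
    occ fulltextsearch:stop &&
    occ fulltextsearch:reset &&
    occ fulltextsearch:index
}

rebuild_index
```

Running it with `APPLY=1` executes the commands instead of printing them.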

Error Output

┌─ Indexing  ────
│ Action: compareWithCurrentIndex
│ Provider: Files                Account: user-14987
│ Document: 1712499
│ Info: /path/to/file
│ Title:
│ Content size:
│ Chunk:     63/72
│ Progress:      0/18
└──
┌─ Results ────
│ Result:  75322/75322
│ Index: files:1712499
│ Status: ok
│ Message: {"_index":"nextcloud","_type":"_doc","_id":"files:1712499","_version":1,"result":"created","_shards":{"total":2,"successful"
│ :1,"failed":0},"_seq_no":71327,"_primary_term":1}
│
└──
┌─ Errors ────
│ Error:     13/13
│ Index: files:8878201
│ Exception: Elasticsearch\Common\Exceptions\BadRequest400Exception
│ Message: org.apache.xmlbeans.XmlException: error: The prefix "p" for element "p:ph" is not bound.
│
│
└──
## x:first result ## c/v:prec/next result ## b:last result
## f:first error ## h/j:prec/next error ## d:delete error ## l:last error
## q:quit ## p:pause
An unhandled exception has been thrown:
TypeError: OCA\Files_FullTextSearch\Model\FilesDocument::setMimetype(): Argument #1 ($type) must be of type string, bool given, called in /srv/http/cloud.example.com/nextcloud/apps/files_fulltextsearch/lib/Service/FilesService.php on line 479 and defined in /srv/http/cloud.example.com/nextcloud/apps/files_fulltextsearch/lib/Model/FilesDocument.php:124
Stack trace:
#0 /srv/http/cloud.example.com/nextcloud/apps/files_fulltextsearch/lib/Service/FilesService.php(479): OCA\Files_FullTextSearch\Model\FilesDocument->setMimetype()
#1 /srv/http/cloud.example.com/nextcloud/apps/files_fulltextsearch/lib/Service/FilesService.php(324): OCA\Files_FullTextSearch\Service\FilesService->generateFilesDocumentFromFile()
#2 /srv/http/cloud.example.com/nextcloud/apps/files_fulltextsearch/lib/Provider/FilesProvider.php(269): OCA\Files_FullTextSearch\Service\FilesService->getFilesFromUser()
#3 /srv/http/cloud.example.com/nextcloud/apps/fulltextsearch/lib/Service/IndexService.php(183): OCA\Files_FullTextSearch\Provider\FilesProvider->generateIndexableDocuments()
#4 /srv/http/cloud.example.com/nextcloud/apps/fulltextsearch/lib/Command/Index.php(416): OCA\FullTextSearch\Service\IndexService->indexProviderContentFromUser()
#5 /srv/http/cloud.example.com/nextcloud/apps/fulltextsearch/lib/Command/Index.php(279): OCA\FullTextSearch\Command\Index->indexProvider()
#6 /srv/http/cloud.example.com/nextcloud/3rdparty/symfony/console/Command/Command.php(255): OCA\FullTextSearch\Command\Index->execute()
#7 /srv/http/cloud.example.com/nextcloud/core/Command/Base.php(168): Symfony\Component\Console\Command\Command->run()
#8 /srv/http/cloud.example.com/nextcloud/3rdparty/symfony/console/Application.php(1009): OC\Core\Command\Base->run()
#9 /srv/http/cloud.example.com/nextcloud/3rdparty/symfony/console/Application.php(273): Symfony\Component\Console\Application->doRunCommand()
#10 /srv/http/cloud.example.com/nextcloud/3rdparty/symfony/console/Application.php(149): Symfony\Component\Console\Application->doRun()
#11 /srv/http/cloud.example.com/nextcloud/lib/private/Console/Application.php(211): Symfony\Component\Console\Application->run()
#12 /srv/http/cloud.example.com/nextcloud/console.php(100): OC\Console\Application->run()
#13 /srv/http/cloud.example.com/nextcloud/occ(11): require_once('...')
#14 {main}

The output of your Nextcloud log in Admin > Logging:
Nothing

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):

<?php
$CONFIG = array (
  'passwordsalt' => 'SECRET',
  'secret' => 'SECRET',
  'trusted_domains' =>
  array (
    0 => 'cloud.example.com',
  ),
  'datadirectory' => '/srv/http/cloud.example.com/data',
  'dbtype' => 'pgsql',
  'version' => '24.0.11.1',
  'overwrite.cli.url' => 'https://cloud.example.com',
  'dbname' => 'nextcloud',
  'dbhost' => 'localhost',
  'dbport' => '',
  'dbtableprefix' => '',
  'dbuser' => 'nextcloud',
  'dbpassword' => 'SECRET',
  'installed' => true,
  'instanceid' => 'SECRET',
  'memcache.local' => '\\OC\\Memcache\\APCu',
  'theme' => 'wechangetheme',
  'social_login_auto_redirect' => 'True',
  'default_language' => 'de',
  'default_locale' => 'de',
  'allow_user_to_change_display_name' => 'False',
  'allow_local_remote_servers' => 'True',
  'overwriteprotocol' => 'https',
  'wechange_plattform_root' => 'https://example.com',
  'wechange_nc_root' => 'https://cloud.example.com',
  'wechange_csp_domains' =>
  array (
    0 => 'example.com',
    1 => '*.example.com',
  ),
  'wechange_piwik_enabled' => 'False',
  'wechange_piwik_site_id' => '0',
  'wechange_nc_app_title' => 'WE-Cloud',
  'wechange_nc_company_name' => 'wechange eG',
  'wechange_nc_slogan' => 'die Cloud von wechange',
  'wechange_nc_primary_color' => 'rgb(216, 0, 67)',
  'wechange_firstrun_main_text' => '<p>Die WE-Cloud bietet deinem Team eine flexible Dateiablage mit umfangreichen Office-Funktionen, um gemeinsam an Dokumenten, Tabellen und Präsentationen zu arbeiten. Sie basiert auf Nextcloud und OnlyOffice. So stehen dir zwei Apps zur Verfügung, mit denen du die Cloud auch unterwegs nutzen und Dateien mit deinem Desktop-Computer synchronisieren kannst. </p> <p>Gruppen und Projekten stehen jeweils maximal 1 GB zur Verfügung. Jede*r Nutzer*in hat davon unabhängig 100 MB privaten Speicherplatz - im Startfenster der WE-Cloud könnt ihr euch einfach einen privaten Ordner anlegen. Benötigt ihr mehr Speicherplatz, schreibt an
    <a href="mailto:support@example.com" target="_blank" rel="noreferrer noopener">support@example.com</a>.
</p> <p>Falls ihr eure Dateien mit der Windows-Desktop-App von Nextcloud synchronisieren wollt, bitte beachtet, dass Ordnernamen mit Sonderzeichen (&lt; &gt; : " / \\ | ? *) nicht erkannt werden und zuerst bereinigt werden müssen. </p>
',
  'wechange_firstrun_warning_enabled' => 'True',
  'wechange_firstrun_warning_header' => 'Achtung!',
  'wechange_firstrun_warning_text' => 'Wenn ihr gemeinsam mit OnlyOffice an einer Datei arbeitet, könnt ihr diese jederzeit manuell speichern (Shortcut strg+s), um die Datei zu synchronisieren - so könnt ihr auch später auf vorherige Versionen der Datei zurückgreifen. Ungeachtet dessen speichert OnlyOffice die Dateien regelmäßig automatisch ab.',
  'wechange_firstrun_learn_more_label' => 'Mehr erfahren',
  'loglevel' => 0,
  'maintenance' => false,
  'default_phone_region' => 'DE',
  'mail_smtpmode' => 'smtp',
  'mail_smtphost' => 'mailsdisabled',
  'mail_sendmailmode' => 'smtp',
  'mail_smtpport' => '1',
  'app_install_overwrite' =>
  array (
    0 => 'fulltextsearch_elasticsearch',
    1 => 'onlyoffice',
  ),
  'wechange_firstrun_learn_more_url' => '',
);

The output of your Apache/nginx/system log in /var/log/____:
No logs available

Output errors in nextcloud.log in /var/www/ or as admin user in top right menu, filtering for errors. Use a pastebin service if necessary.

No useful information available

Link to GitHub: nextcloud/fulltextsearch_elasticsearch: Issue 249