Search in external files

Hello, a question: I have enabled the setting that makes Nextant search external files too.
I ran a new index with “occ nextant:index” and can see that it tries to index the new files (there are now 800 more files, and the new files take much longer, maybe because of the SMB share).
In the Solr admin console I can see that 1800 files are now present in the index.
But if I search for words inside a document, nothing is found (only the files on the internal storage are searched).

What can I check?

Can you give me more details about the SMB configuration? Is it a global share (everyone can access it from your cloud) or a private one (only one user)?

It is a global share (SMB/CIFS) with username and password set in the share config. The shared folder name is hidden (with a leading $)
and it is available to 3 users.
The share itself works fine.

Are the files encrypted on the external storage?

No, the files are not encrypted.
I think the index is correct: I have enabled search suggestions, and I am sure that I now get suggestions for words which only exist in the external files.

I’ve got almost the same problem. I use Google Drive as external storage. After indexing, the contents of files in the Google Drive cannot be searched. (The number of files being indexed is correct. No error is found during indexing.) Contents of local files can be searched, however.

@cheukyung
@Andreas_Steibl

There is a new tool in the latest release of Nextant (1.0.1).

Can you please paste the result of this tool on a file in your external storage?

Do you mean this?

root@cloud:/var/www/html# sudo -u www-data php occ nextant:pick 30282
nextant_extracted -> true
id -> files_30282
nextant_path -> /Andreas_Steibl/files/2015_10_Protokoll.txt
nextant_owner -> __global
nextant_mtime -> 1463813868
nextant_source -> files
nextant_share -> Andreas_Steibl
nextant_deleted -> false
nextant_attr_stream_size -> 368
nextant_attr_x_parsed_by -> org.apache.tika.parser.DefaultParser, org.apache.tika.parser.txt.TXTParser
nextant_attr_stream_content_type -> application/octet-stream
nextant_attr_stream_name -> /tmp/oc_tmp_fHKvIE-.txt
nextant_attr_stream_source_info -> content
nextant_attr_content_encoding -> windows-1252
nextant_attr_resourcename -> oc_tmp_fHKvIE-.txt
nextant_attr_content_type -> text/plain; charset=windows-1252
_version_ -> 1554156277588819968
score -> 7.0989265

I tried this:
sudo -u www-data php occ nextant:pick --search Guter 30282
This says “fail”, although the word “Guter” exists in the document.

Now I copied the same file into the cloud (internal storage) and tried it again:

root@cloud:/var/www/html# sudo -u www-data php occ nextant:pick --search Guter 31222
nextant_extracted -> true
id -> files_31222
nextant_path -> /Andreas_Steibl/files/2015_10_Protokoll.txt
nextant_owner -> Andreas_Steibl
nextant_mtime -> 1482397702
nextant_source -> files
nextant_deleted -> false
nextant_attr_stream_size -> 368
nextant_attr_x_parsed_by -> org.apache.tika.parser.DefaultParser, org.apache.tika.parser.txt.TXTParser
nextant_attr_stream_content_type -> application/octet-stream
nextant_attr_stream_name -> /media/data/Andreas_Steibl/files/2015_10_Protokoll.txt
nextant_attr_stream_source_info -> content
nextant_attr_content_encoding -> windows-1252
nextant_attr_resourcename -> 2015_10_Protokoll.txt
nextant_attr_content_type -> text/plain; charset=windows-1252
_version_ -> 1554406684164620288
score -> 7.2274203

* Searching 'Guter' in that document: OK

I think it depends on nextant_attr_stream_name: for the file on the external storage it points to the tmp dir, and that file doesn't exist. Maybe it only existed during the index process, but not anymore.
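
To compare the two outputs more quickly, here is a minimal sketch that pulls nextant_attr_stream_name out of the pick output and says whether it points into /tmp. The function names pick_field and stream_is_temp are made up for illustration, and the sketch assumes the "key -> value" layout shown in the outputs above. It only makes the difference between the two documents visible; it does not prove that the temp path is the cause of the failed search.

```shell
# Print the value of one field from `occ nextant:pick` output read on stdin.
# Assumption: every line has the form "key -> value", as in the outputs above.
pick_field() {
    awk -v key="$1" 'BEGIN { FS = " -> " } $1 == key { print $2 }'
}

# Report whether the recorded stream name points into /tmp.
stream_is_temp() {
    case "$(pick_field nextant_attr_stream_name)" in
        /tmp/*) echo "temp copy" ;;
        *)      echo "regular path" ;;
    esac
}
```

For example: sudo -u www-data php occ nextant:pick 30282 | stream_is_temp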


I’ve uploaded a file fruits.txt (ID: 534) to Google Drive with contents including the word orange.


root@xxxxxxxx:/var/www/html/nextcloud# sudo -u www-data ./occ nextant:pick 534 --search orange
nextant_extracted -> true
id -> files_534
nextant_path -> /admin/files/goog/fruits.txt
nextant_owner -> __global
nextant_mtime -> 1482401022
nextant_source -> files
nextant_share -> admin
nextant_deleted -> false
nextant_attr_stream_size -> 47
nextant_attr_x_parsed_by -> org.apache.tika.parser.DefaultParser, org.apache.tika.parser.txt.TXTParser
nextant_attr_stream_content_type -> application/octet-stream
nextant_attr_stream_name -> /tmp/oc_tmp_y0Ribb-.txt
nextant_attr_stream_source_info -> content
nextant_attr_content_encoding -> windows-1252
nextant_attr_resourcename -> oc_tmp_y0Ribb-.txt
nextant_attr_content_type -> text/plain; charset=windows-1252
_version_ -> 1554410205643538432
score -> 5.1397123

* Searching 'orange' in that document: fail

Any idea on the failure of searching, please?

This should be fixed in the next release.

Fine, thanks :smiley:

Next version of what? Nextant?
… and do I need a full new index?

The next version of Nextant (1.0.2).


When will it be out, so we can test it?

You can already test it:

Replace these 2 files in the folder containing the app (nextant):

lib/Service/SolrAdminService.php
lib/Service/SolrService.php

(Right click; Save as)

Then, fix your Solr schema:

 ./occ nextant:check --fix

(run the command 2 or 3 times, until every line is green)

Re-index your files:

 ./occ nextant:index --force --debug

And it should be working! (I hope)
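
The two commands above could be wrapped in a small script, roughly like this. It is only a sketch under a few assumptions: the OCC variable and the function names fix_schema and reindex are hypothetical, the two PHP files have already been replaced by hand, and the script is run from the Nextcloud folder. It cannot tell whether every line is green, so the output of the check still has to be read by eye.

```shell
# How occ is invoked; an assumption, adjust it to your own server.
OCC="${OCC:-sudo -u www-data ./occ}"

# Run the schema fix several times in a row (default: 3 passes),
# since one pass may not be enough to turn every line green.
fix_schema() {
    passes="${1:-3}"
    i=1
    while [ "$i" -le "$passes" ]; do
        $OCC nextant:check --fix
        i=$((i + 1))
    done
}

# Re-index all files with the flags given in the post above.
reindex() {
    $OCC nextant:index --force --debug
}
```

Run fix_schema 3, look at the result, then run reindex. Afterwards a search with nextant:pick, as in the earlier posts, shows whether the content of an external file is found.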


Yeah, it seems to work :smiley:
I mean, I now find many files from the external storage.

Applied the hotfix. It works. Now contents in Google Drive files can be searched. Thanks.