Indexing shared folders: for each user individually?

I have 119 users, who all use one very large shared folder. Does Elasticsearch have to index the same folder 119 times, once for each user, or is there a shared index for shared folders? I am asking because, when running the initial index, it looked like the folder was being indexed for each user individually.

Yes, all files of your users will be listed as indexable, but if the current version of a file is already indexed in Elasticsearch, it will not be indexed again.

It means that the first user who has access to a shared folder will be used to index the files in that folder. When another user with access to the same shared folder is processed, the files will be detected but not indexed a second time.
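
The behaviour described above can be pictured with a small sketch. This is not the fulltextsearch app's actual code: the `Index` class, the `index_user` function, and the use of a content hash to recognise an unchanged file are assumptions made only to illustrate "detected for every user, indexed once".

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """Hypothetical stand-in for the Elasticsearch index: document id -> content hash."""
    hashes: dict = field(default_factory=dict)
    writes: int = 0  # how many times a document was actually (re)indexed

    def needs_indexing(self, doc_id, content_hash):
        # A document is skipped when its current version is already stored.
        return self.hashes.get(doc_id) != content_hash

    def store(self, doc_id, content_hash):
        self.hashes[doc_id] = content_hash
        self.writes += 1


def index_user(index, files):
    """Walk one user's files and return (detected, indexed) counts."""
    detected = indexed = 0
    for doc_id, content_hash in files.items():
        detected += 1
        if index.needs_indexing(doc_id, content_hash):
            index.store(doc_id, content_hash)
            indexed += 1
    return detected, indexed


# The same shared folder is visible to both users.
shared = {"files:1": "aaa", "files:2": "bbb"}
index = Index()
first = index_user(index, shared)   # first user: every file is new
second = index_user(index, shared)  # second user: files are seen but skipped
print(first, second)
```

In this sketch the folder is still walked once per user, so a run can take a long time even when nothing is written to the index a second time.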

Hi,
I’ve got the situation vasyugan described.
fulltextsearch + Elasticsearch indexes the group-shared files (over 14 TB) for each user all over again.
OS: Ubuntu 18.04
Elasticsearch 6.8.15 (also tried 6.0.1, same result)
Mysql 8
Php 7.4
NC: 21.0.1
Full text search 21.0.1
Full text search - Elasticsearch Platform 21.0.0
Full text search - Files 21.0.1

When I run occ fulltextsearch:index it just indexes the same files for each user. One full scan for one user takes more than two days :(

/var/www/nextcloud/occ fulltextsearch:test

.Testing your current setup:
Creating mocked content provider. ok
Testing mocked provider: get indexable documents. (2 items) ok
Loading search platform. (Elasticsearch) ok
Testing search platform. ok
Locking process ok
Removing test. ok
Pausing 3 seconds 1 2 3 ok
Initializing index mapping. ok
Indexing generated documents. ok
Pausing 3 seconds 1 2 3 ok
Retreiving content from a big index (license). (size: 32386) ok
Comparing document with source. ok
Searching basic keywords:
 - 'test' (result: 1, expected: ["simple"]) ok
 - 'document is a simple test' (result: 2, expected: ["simple","license"]) ok
 - '"document is a test"' (result: 0, expected: []) ok
 - '"document is a simple test"' (result: 1, expected: ["simple"]) ok
 - 'document is a simple -test' (result: 1, expected: ["license"]) ok
 - 'document is a simple +test' (result: 1, expected: ["simple"]) ok
 - '-document is a simple test' (result: 0, expected: []) ok
 - 'document is a simple +test +testing' (result: 1, expected: ["simple"]) ok
 - 'document is a simple +test -testing' (result: 0, expected: []) ok
 - 'document is a +simple -test -testing' (result: 0, expected: []) ok
 - '+document is a simple -test -testing' (result: 1, expected: ["license"]) ok
 - 'document is a +simple -license +testing' (result: 1, expected: ["simple"]) ok
Updating documents access. ok
Pausing 3 seconds 1 2 3 ok
Searching with group access rights:
 - 'license' - [] -  (result: 0, expected: []) ok
 - 'license' - ["group_1"] -  (result: 1, expected: ["license"]) ok
 - 'license' - ["group_1","group_2"] -  (result: 1, expected: ["license"]) ok
 - 'license' - ["group_3","group_2"] -  (result: 1, expected: ["license"]) ok
 - 'license' - ["group_3"] -  (result: 0, expected: []) ok
Searching with share rights:
 - 'license' - notuser -  (result: 0, expected: []) ok
 - 'license' - user2 -  (result: 1, expected: ["license"]) ok
 - 'license' - user3 -  (result: 1, expected: ["license"]) ok
Removing test. ok
Unlocking process ok
php7.4 /var/www/nextcloud/occ fulltextsearch:check
Full text search 21.0.1

- Search Platform:
Elasticsearch 21.0.0 (Selected)
{
    "elastic_host": [
        "http://elastic:********@127.0.0.1:9200"
    ],
    "elastic_index": "nextcloud",
    "fields_limit": "10000",
    "es_ver_below66": "0",
    "analyzer_tokenizer": "standard"
}

- Content Providers:
Files 21.0.1
{
    "files_local": "1",
    "files_external": "0",
    "files_group_folders": "1",
    "files_encrypted": "0",
    "files_federated": "0",
    "files_size": "1024",
    "files_pdf": "1",
    "files_office": "1",
    "files_image": "0",
    "files_audio": "0"
}

Any piece of advice?
Thanks!

I’m noticing the same thing now too. It would be good to have a way to index shared folders once so that the scan doesn’t repeat per user. Even if the index isn’t rewritten each time, at least it would reduce the time needed to complete the indexing run.

Meanwhile, here is a big caveat.

I’ve moved to the dockerised ES8. While trying to rebuild the index I found that only one user of a particular group folder (the first one alphabetically) is given access in the index, so successful search results are shown only to that user, but not to the others!

same here. thanks for sharing!

I don’t know how to overcome that. I’ve put a more detailed description of the incident in issue #269, “Again about reindexing”, of nextcloud/fulltextsearch_elasticsearch on GitHub.

One thought that came to my mind: this may be connected with the renaming of group folders.

What I see in the log:

  "hits" : {
    "total" : {
      "value" : 618,
      "relation" : "eq"
    },
    "max_score" : 6.726201,
    "hits" : [
      {
        "_index" : "elasticsearch",
        "_id" : "files:406102",
        "_score" : 6.726201,
        "_ignored" : [
          "content.keyword"
        ],
        "_source" : {
          "owner" : "user0",
          "groups" : [
            "GroupA"
          ],
          "circles" : [ ],
          "metatags" : [
            "files_group_folders"
          ],
          "source" : "files_group_folders",
          "title" : "Title of some document",
          "users" : [ ],
          "content" : "Content of the document",
          "tags" : [ ],
          "attachment" : {
            "date" : "2023-04-03T09:35:00Z",
            "content_type" : "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "author" : "nailyanag@gmail.com",
            "modifier" : "Vik Kyryliuk",
            "modified" : "2023-05-23T16:39:00Z",
            "language" : "uk",
            "content_length" : 13129
          },
          "provider" : "files",
          "subtags" : [ ],
          "parts" : {
            "comments" : ""
          },
          "links" : [ ],
          "share_names" : {
             "user0": "GroupA/Docs/Text1.docx",
             "user1": "GroupA/Docs/Text1.docx"
          },
          "hash" : "d658249fc4930add52064fdeea9d1ebe"
        }
      },


So, that group “GroupA” was later renamed in Cyrillic. Could that have an effect?
For example,
$ ./occ groupfolders:list
gives

+-----------+-----------------------------+----------------------------------------------------------------+-----------+-----------+----------------------+-----------------------------+
| Folder Id | Name                        | Groups                                                         | Quota     | Size      | Advanced Permissions | Manage advanced permissions |
+-----------+-----------------------------+----------------------------------------------------------------+-----------+-----------+----------------------+-----------------------------+
| 1         | Група А                     | Група А (GroupA): read, write, share, delete                   | Unlimited | 9.9 GB    | Disabled

I have checked how NC treats groups and found that the groupID is not numeric (!!!):

$ ./occ group:info GroupA
  - groupID: GroupA
  - displayName: Група А
  - backends:
    - Database

I see that this is not OK.
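
To make the question about the rename concrete, here is a minimal sketch of a visibility check over the access fields of the hit shown in the log. It is an assumption, not the app's confirmed logic: `can_see` is a hypothetical helper, it ignores `circles`, and it assumes the `groups` field is compared with group IDs (such as `GroupA`) rather than display names (such as `Група А`).

```python
def can_see(source, user_id, group_ids):
    """Hypothetical visibility check over the access fields of one hit's _source."""
    if source.get("owner") == user_id:
        return True
    if user_id in source.get("users", []):
        return True
    # Assumption: groups are compared by ID, not by display name.
    return any(g in source.get("groups", []) for g in group_ids)


# Access-related fields copied from the hit shown in the log.
source = {"owner": "user0", "groups": ["GroupA"], "users": [], "circles": []}

print(can_see(source, "user0", []))           # the owner
print(can_see(source, "user1", ["GroupA"]))   # a member, matched by group ID
print(can_see(source, "user1", ["Група А"]))  # display name only
```

Under that assumption, renaming only the display name would leave the match on `GroupA` intact. Whether the search platform actually receives the group ID or the display name for the other users is exactly what would need to be checked.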