Fulltext indexing

If the Text Search module (Fulltext Indexing and Search) is enabled, all files uploaded to Structr are indexed automatically, with their text extracted by Apache Tika (https://tika.apache.org/). If Tesseract OCR (https://github.com/tesseract-ocr) is installed, Structr also indexes text extracted from images.

Language identification

Structr tries to identify the language of each indexed document and applies a language-specific list of stop words. These words are skipped during indexing, so that only the meaningful content of the document ends up in the index.
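
The effect of stop-word filtering can be illustrated with a minimal Python sketch. This is not Structr's implementation: the stop-word lists and the function name are hypothetical, and the language is passed in explicitly instead of being detected.

```python
# Illustrative only: tiny, hypothetical stop-word lists keyed by language code.
STOP_WORDS = {
    "en": {"a", "an", "and", "is", "of", "the", "this"},
    "de": {"das", "dies", "ein", "eine", "ist", "und"},
}


def indexable_words(text, language):
    """Return the lower-cased words of `text` that are not stop words for `language`."""
    # Unknown languages get an empty stop-word list, so nothing is filtered.
    stop_words = STOP_WORDS.get(language, set())
    # Split on whitespace and strip surrounding punctuation from each word.
    words = (word.strip(".,;:!?\"'()").lower() for word in text.split())
    return [word for word in words if word and word not in stop_words]


print(indexable_words("Dies ist ein Test", "de"))
```

With these toy lists, only the word "test" of the German sentence would be indexed.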

Example

Files can be retrieved by querying the indexedWords field, as in the following example:

GET http://localhost:8082/structr/rest/files/ui?indexedWords=example&loose=1

The result will look roughly like this:

{
    "query_time": "0.001493350",
    "result_count": 1,
    "result": [
        {
            "id": "d7ac4f78e25141f199d3e39eb7ae3676",
            "type": "File",
            "name": "test.txt",
            "contentType": null,
            "size": 18,
            "url": null,
            "owner": {
                "id": "f02e59a47dc9492da3e6cb7fb6b3ac25",
                "type": "User",
                "name": "admin",
                "isUser": true
            },
            "path": "/test.txt",
            "isFile": true,
            "visibleToPublicUsers": false,
            "visibleToAuthenticatedUsers": false
        }
    ],
    "serialization_time": "0.000328395"
}
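
The same query can be sent from a script. The following Python sketch uses only the standard library. The base URL is taken from the example request, while the function names and the headers parameter are hypothetical and only serve the illustration.

```python
import json
import urllib.parse
import urllib.request

# Host, port and path as used in the example request; adjust for your instance.
BASE_URL = "http://localhost:8082/structr/rest"


def build_search_url(word, loose=True, view="ui"):
    """Build the fulltext query URL for the files collection."""
    params = {"indexedWords": word}
    if loose:
        params["loose"] = "1"
    # urlencode joins the parameters with "&" and escapes special characters.
    return f"{BASE_URL}/files/{view}?{urllib.parse.urlencode(params)}"


def search_files(word, headers=None):
    """Send the GET request and return the list stored under the "result" key."""
    request = urllib.request.Request(build_search_url(word), headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]


if __name__ == "__main__":
    print(build_search_url("example"))
```

Authentication is not shown here. Any headers your instance requires can be passed through the headers argument of search_files.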

Search Context

Structr provides a method to retrieve the context of a fulltext search hit, i.e. the paragraph or text block that contains the match. The search context can be retrieved with the following REST call, assuming that d7ac4f78e25141f199d3e39eb7ae3676 is the ID of one of the files returned by the search query above.

POST files/d7ac4f78e25141f199d3e39eb7ae3676/getSearchContext { "searchTerm": "test", "contextLength": 10 }

The result of this call will look like this:

{
    "result_count": 1,
    "result": {
        "context": [
            "Dies ist ein Test"
        ]
    },
    "serialization_time": "0.000040438"
}
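
A hedged Python sketch of the same call follows. It assumes the REST root http://localhost:8082/structr/rest from the search example. The function names are hypothetical, and authentication is left out.

```python
import json
import urllib.request

# Assumed REST root, matching the search example; adjust for your instance.
BASE_URL = "http://localhost:8082/structr/rest"


def build_context_request(file_id, search_term, context_length=10):
    """Prepare the POST request for getSearchContext without sending it."""
    body = json.dumps({"searchTerm": search_term, "contextLength": context_length})
    return urllib.request.Request(
        f"{BASE_URL}/files/{file_id}/getSearchContext",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_search_context(file_id, search_term, context_length=10):
    """Send the request and return the list of context snippets."""
    request = build_context_request(file_id, search_term, context_length)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]["context"]


if __name__ == "__main__":
    req = build_context_request("d7ac4f78e25141f199d3e39eb7ae3676", "test")
    print(req.get_method(), req.full_url)
```

Keeping the request construction separate from the network call makes it possible to inspect the URL and body before anything is sent.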

Actions Section

Name: The name under which the folder is to be mounted into Structr’s virtual file system.
Mount Target: The full path to the directory on Structr’s host operating system.
Do Fulltext Indexing: Whether the files should be fulltext indexed.
Scan Interval(s): The rate at which the mounted directory should be refreshed.
Mount Target Folder Type: The Schema Type under which Folders in the directory are to be mounted.
Mount Target File Type: The Schema Type under which Files in the directory are to be mounted.
Enabled Checksums: List of checksum types which are automatically computed on file creation.
Watch Folder Contents: Bidirectional synchronization of the files in the mounted directory.