Semantic Search

This page discusses Stardog’s semantic search capabilities.

Page Contents
  1. Overview
  2. Service Setup
  3. Enabling Semantic Search
  4. Integration with SPARQL
    1. Searching over Variable Bindings
  5. Customization of Indexing
    1. Data Types to be Indexed
    2. Transactional updates
    3. Force-rebuilding of semantic search index

Overview

Stardog’s semantic search capability resembles full-text search in the sense that it provides a way to find similar natural-language texts. However, it differs from full-text search in several key aspects.

At the heart of semantic search is the notion of word embeddings: words or phrases are mapped into a multi-dimensional vector space, and the proximity of vectors in this space defines the similarity of texts.

Moreover, this process is powered by language models that aim to capture the semantic meaning of texts in addition to their grammatical structure. These models can be generic or domain-specific, so the search can be customized for domain-specific use cases.
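
To make the idea of vector proximity concrete, here is a minimal Python sketch that compares toy embeddings with cosine similarity. This is an illustration only: the three-dimensional vectors are invented, real models produce vectors with hundreds of dimensions, and this page does not state which similarity measure Stardog or the external service actually uses. Cosine similarity is assumed here simply because it is a common choice.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration (not produced by a model).
city = [0.9, 0.1, 0.3]
town = [0.8, 0.2, 0.35]
banana = [0.05, 0.95, 0.1]

if __name__ == "__main__":
    print("city vs town:  ", cosine_similarity(city, town))
    print("city vs banana:", cosine_similarity(city, banana))
```

In this toy setup, "city" and "town" point in nearly the same direction and therefore score higher than "city" and "banana", which is the behavior a semantic search relies on.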

From an integration standpoint, semantic search relies on an external service. Stardog is not responsible for the lifecycle of this service: it must be started separately and be accessible to Stardog. Stardog automatically updates the contents of the vector database as the contents of the Stardog database are updated. However, if the external service is unavailable or inaccessible to Stardog, transactions will fail.

Currently, an implementation based on txtai is supported in beta. Integration with other vector databases and semantic search systems will be provided in the future based on input and feedback from Stardog users.

Service Setup

The txtai service runs as an external process and is not managed by Stardog.

The simplest way to run it is as a web application within the Uvicorn web server.

Installation

pip3 install duckdb 'txtai[api]'

Running

CONFIG=/path/to/config.yml uvicorn txtai.api:app

By default, the server listens on localhost:8000. This can be overridden:

CONFIG=/path/to/config.yml uvicorn --host 127.0.0.1 --port 9000 txtai.api:app
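
Because transactions fail when the external service cannot be reached, it can be useful to check the endpoint before pointing a database at it. The following Python sketch only tests whether a TCP connection to the host and port can be opened. It does not verify that txtai itself is healthy, and the function name endpoint_is_reachable is a hypothetical helper, not part of Stardog or txtai.

```python
import socket
from urllib.parse import urlparse


def endpoint_is_reachable(endpoint, timeout=2.0):
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    parsed = urlparse(endpoint)
    host = parsed.hostname
    if host is None:
        # Not a usable URL (for example, the scheme or host is missing).
        return False
    # Fall back to the scheme's default port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Default address used by the uvicorn command shown on this page.
    print(endpoint_is_reachable("http://localhost:8000"))
```

A check like this fits naturally into a deployment script that runs before the database is created or before bulk loads are started.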

Example config.yml file:

# Index file path
path: /tmp/index

# Allow indexing of documents
writable: True

# Embeddings index
embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2

See the Configuration section of the txtai documentation for more details.

Enabling Semantic Search

From CLI

$ stardog-admin db create -o semantic.search.enabled=true -o semantic.search.api.endpoint=http://localhost:8000 -n myDb

Using Java

Semantic search can be enabled when creating a database programmatically by setting the semantic.search.enabled and semantic.search.api.endpoint options, for example:

    dbms.newDatabase("embeddingsTest")
        .set(EmbeddingsOptions.EMBEDDINGS_SEARCHABLE, true)
        .set(EmbeddingsOptions.EMBEDDINGS_API_ENDPOINT, "http://localhost:8000")
        .create();

Integration with SPARQL

Unlike full-text search, semantic search supports only the service form. Example:

prefix fts: <tag:stardog:api:search:>

SELECT * WHERE {
  service fts:semanticMatch {
      [] fts:query 'city' ;
         fts:threshold 0.4 ;
         fts:result ?result ;
         fts:score ?score ;
         fts:limit 10 ;
  }
}

The semantic search service is identified by the tag:stardog:api:search:semanticMatch URI and takes the following parameters:

Parameter Name   Description
query            String to query over the search index
result           Results received from the search index for the query
score            Calculated score between the query and a hit result
threshold        Minimum score; only results with scores above or equal to it are included
limit            Maximum number of hit results returned
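
When queries are generated by an application, the parameters listed above can be assembled into a query string. The Python sketch below is one way to do that. The function build_semantic_query is a hypothetical helper, not part of any Stardog client library. It also assumes that fts:threshold and fts:limit may be omitted, which is inferred from the variable-binding example on this page rather than stated explicitly.

```python
def build_semantic_query(text, threshold=None, limit=None):
    """Build a SPARQL query string that calls the semantic search service."""
    # Escape characters that would break a single-quoted SPARQL string literal.
    escaped = (
        text.replace("\\", "\\\\")
        .replace("'", "\\'")
        .replace("\n", "\\n")
    )
    lines = [
        "prefix fts: <tag:stardog:api:search:>",
        "",
        "SELECT * WHERE {",
        "  service fts:semanticMatch {",
        "      [] fts:query '%s' ;" % escaped,
        "         fts:result ?result ;",
        "         fts:score ?score ;",
    ]
    # Assumption: threshold and limit are optional, so add them only when given.
    if threshold is not None:
        lines.append("         fts:threshold %s ;" % float(threshold))
    if limit is not None:
        lines.append("         fts:limit %d ;" % int(limit))
    lines += ["  }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_semantic_query("city", threshold=0.4, limit=10))
```

Called with "city", a threshold of 0.4, and a limit of 10, the helper produces a query equivalent to the example above. The resulting string can then be submitted through whichever Stardog client is in use.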

Searching over Variable Bindings

As with full-text search, the fts:query parameter can be specified as a variable so that the input to semantic search comes from other graph patterns in the query:

prefix fts: <tag:stardog:api:search:>

SELECT * WHERE {
  # descriptions of places, bound to ?description variable,
  # will be used as inputs to semantic search
  ?place a :Place; :description ?description .

  service fts:semanticMatch {
      [] fts:query ?description ;
         fts:score ?score ;
         fts:result ?similarDescription ;
  }
}

Customization of Indexing

Data Types to be Indexed

Semantic search uses the same option as full-text search, search.index.datatypes, to control which datatypes are indexed.

From CLI

$ stardog-admin db create -o semantic.search.enabled=true -o semantic.search.api.endpoint=http://localhost:8000 -o search.index.datatypes=urn:String,urn:Date -n myDb

Using Java

// Create a database with semantic search index with specific data types
List<IRI> dataTypeList = Lists.newArrayList(
        Datatype.STRING.iri(),
        Datatype.DATE.iri(),
        Datatype.DATETIME.iri());

dbms.newDatabase("embeddingsTest")
    .set(EmbeddingsOptions.EMBEDDINGS_SEARCHABLE, true)
    .set(EmbeddingsOptions.EMBEDDINGS_API_ENDPOINT, "http://localhost:8000")
    .set(SearchOptions.INDEX_DATATYPES, dataTypeList)
    .create();

Transactional updates

Since semantic search integrates with an external system, that system needs to be kept up to date with Stardog’s internal index.

By default, when semantic search is enabled, literals added during a transaction are also indexed in the semantic search system. This can be disabled by setting semantic.search.reindex.tx to false. Doing so will cause semantic search to return incomplete results until the index is rebuilt explicitly.

From CLI

stardog-admin db create -o semantic.search.enabled=true -o semantic.search.api.endpoint=http://localhost:8000 -o semantic.search.reindex.tx=false -n myDb

Using Java

dbms.newDatabase("embeddingsTest")
    .set(EmbeddingsOptions.EMBEDDINGS_SEARCHABLE, true)
    .set(EmbeddingsOptions.EMBEDDINGS_API_ENDPOINT, "http://localhost:8000")
    .set(EmbeddingsOptions.EMBEDDINGS_REINDEX_TX, false)
    .create();

To rebuild the semantic search index, an optimize command must be issued explicitly:

From CLI

stardog-admin db optimize myDb

Using Java

dbms.optimize("embeddingsTest");

Force-rebuilding of semantic search index

The optimize.search option forces the semantic search index to be rebuilt by the optimize command regardless of the state of the database.

From CLI

stardog-admin metadata set -o optimize.search=true -- myDb
stardog-admin db optimize myDb

Using Java

dbms.optimize("embeddingsTest", Metadata.of(SearchOptions.OPTIMIZE, true));