By combining vector databases with pre-trained large language models, you can build applications that pair the general capabilities of LLMs with the context of your own data.

In this tutorial, I show how to use Weaviate, an open-source vector database, with the thenlper/gte-base embedding model from Alibaba, loaded through Hugging Face's transformers library.

The example project for this blog post demonstrates how to embed texts into vectors, store them in Weaviate, and perform semantic search to find the most contextually similar documents to the input query. All of the code for this blog post can be found on GitHub at the companion code repository.

Setup and Preparing the Embedding Model

Before getting into the application code, we need to have a working Weaviate server running.

Setting Up Weaviate Locally with Docker Compose

Running Weaviate locally for development can be streamlined using Docker Compose.

The following section explains how to use Docker Compose to spin up a Weaviate instance, configure it for our needs, and persist data across restarts by mounting a local directory.

Docker Compose Configuration

We use the following docker-compose.yml, taken directly from the Weaviate Docker Compose docs, to define our service. I suggest copying this file into the root of a new directory for this project:

version: "3.4"
services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - "8080"
      - --scheme
      - http
    image: semitechnologies/weaviate:1.21.2
    ports:
      - 8080:8080
    volumes:
      - ./data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
      DEFAULT_VECTORIZER_MODULE: "none"
      # While these are the default enabled modules from the Weaviate docs,
      # we won't be using these but instead our custom embedding model, GTE-base
      ENABLE_MODULES: "text2vec-cohere,text2vec-huggingface,text2vec-palm,text2vec-openai,generative-openai,generative-cohere,generative-palm,ref2vec-centroid,reranker-cohere,qna-openai"
      CLUSTER_HOSTNAME: "node1"

Running Weaviate with Docker Compose

Once the docker-compose.yml file is set up, navigate to your directory containing this file and run the following command to start the Weaviate server:

docker-compose up weaviate

This command pulls the specified Weaviate Docker image (if not already local), creates the container, and starts it with the specified settings. Your Weaviate instance should now be accessible at http://localhost:8080.
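Before moving on, it can help to confirm from Python that the server is actually answering. The sketch below is not part of the companion repository; it uses only the standard library and polls Weaviate's readiness endpoint (/v1/.well-known/ready). The function name is_weaviate_ready and its defaults are my own choices for illustration.

```python
import urllib.error
import urllib.request


def is_weaviate_ready(base_url: str = "http://localhost:8080",
                      timeout: float = 2.0) -> bool:
    """Return True if the Weaviate readiness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


# Prints True once the container from docker-compose.yml is up, False otherwise.
print("Weaviate ready:", is_weaviate_ready())
```

If this prints False right after starting the container, wait a few seconds and try again, since the server may still be starting.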

With our Weaviate server running, we can move on to the Python code of our application.

embedding_util.py

Let's continue by looking at how we encapsulate our embedding model, the GTE-base text embedding model, in our embedding_util.py Python module.

Importing Necessary Libraries

We need to import the necessary libraries and modules first:

transformers: To use pre-trained models.
torch and torch.nn.functional: For tensor operations and functional API.
os: To manipulate the Python runtime environment.
warnings: To manage warnings during runtime.

The companion code repository for this blog post includes a requirements.txt file for installing these Python dependencies.

from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
from torch import Tensor
import os
import warnings

Handling Warnings and Parallelism

To keep the output free of noise from the transformers library, warnings of category ResourceWarning are ignored. Tokenizers parallelism is also disabled for simplicity: our application is single-threaded, so only one thread ever calls the tokenizer.

# The transformers library raises this warning internally, but it does not
# impact our app. Safe to ignore.
warnings.filterwarnings(action='ignore', category=ResourceWarning)


# We won't have competing threads trying to use our tokenizer in this example app
os.environ["TOKENIZERS_PARALLELISM"] = "false"

Initializing Tokenizer and Model

With our imports out of the way, let's create our tokenizer and model instances:

tokenizer = AutoTokenizer.from_pretrained('thenlper/gte-base')
model = AutoModel.from_pretrained('thenlper/gte-base')

These lines initialize the tokenizer and model using the thenlper/gte-base pre-trained model from Alibaba.

Defining Utility Functions

I've defined two functions that implement the functionality of embedding_util.py:

average_pool: A function to pool the last hidden states of the model, using masking and averaging.
generate_embeddings: A function that tokenizes the input text, generates embeddings using the pre-trained model, and normalizes them.

def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    last_hidden = last_hidden_states.masked_fill(
        ~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


def generate_embeddings(text):
    inputs = tokenizer(text, return_tensors='pt',
                       max_length=512, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)

    attention_mask = inputs['attention_mask']
    embeddings = average_pool(outputs.last_hidden_state, attention_mask)

    # (Optionally) normalize embeddings
    embeddings = F.normalize(embeddings, p=2, dim=1)

    return embeddings.numpy().tolist()[0]

While this code may look complicated, just understand that you pass a single text string to generate_embeddings, and you get a list of floats back - our vector embedding created by the GTE-base model. If you want to dig deeper, these functions are "heavily inspired" by the GTE-base model card.
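For intuition about what average_pool and the normalization step compute, here is a dependency-free sketch of the same arithmetic on plain Python lists. The toy token vectors and the helper name l2_normalize are made up for illustration; the real module operates on torch tensors as shown above.

```python
import math
from typing import List


def average_pool(hidden: List[List[float]], mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; ignore the rest."""
    dim = len(hidden[0])
    kept = sum(mask)
    return [
        sum(vec[d] * m for vec, m in zip(hidden, mask)) / kept
        for d in range(dim)
    ]


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Three "token" vectors; the last one stands in for padding and is masked out.
hidden = [[1.0, 2.0], [5.0, 2.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = average_pool(hidden, mask)
unit = l2_normalize(pooled)
print(pooled)  # [3.0, 2.0]
print(unit)
```

With a single input string and no padding, the attention mask is all ones, so the pooling reduces to a plain mean over tokens. The masking only matters once padded batches are involved.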

Interacting with Weaviate

With our embedding utility module implemented, it's time to move on to the app.py module, the core of our demo project.

app.py

app.py imports

We'll import the weaviate library to create our Weaviate client instance, the json module for creating printable strings from our Python dicts, and our generate_embeddings function for creating embeddings to pass to Weaviate (later):

import weaviate
import json
from embedding_util import generate_embeddings

Setting Up Weaviate Client

A Weaviate client is initialized by providing the endpoint URL, http://localhost:8080 for our local Weaviate server. This client will allow us to interact with Weaviate, perform CRUD operations on data objects, and query the database.

client = weaviate.Client(url="http://localhost:8080")

Health Check

A simple health check confirms that Weaviate is ready and operational. This line shows how to verify the readiness of the Weaviate server from the client:

print('is_ready:', client.is_ready())

Creating a Schema

A schema is defined, creating a custom class named "DocumentSearch". The specific name doesn't matter; it simply acts as an identifier for Weaviate, and we reference it again later when adding and querying data. The vectorizer is set to "none" since the vectorization is done externally using our embedding model.

class_obj = {"class": "DocumentSearch", "vectorizer": "none"}
client.schema.create_class(class_obj)
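The two-key class definition above leaves the rest to Weaviate's defaults. As an optional variation, and not what the companion code does, the class can spell out the property and the distance metric explicitly. The sketch below only builds and prints the dictionary; the choice of the "text" data type and the "cosine" distance is my assumption about what fits this tutorial.

```python
import json

# A more explicit variant of class_obj. It is only constructed and printed
# here; passing it to client.schema.create_class is left to the reader.
explicit_class_obj = {
    "class": "DocumentSearch",
    "vectorizer": "none",  # vectors are supplied by our own embedding model
    "vectorIndexConfig": {"distance": "cosine"},
    "properties": [
        {"name": "source_text", "dataType": ["text"]},
    ],
}

print(json.dumps(explicit_class_obj, indent=2))
```

Declaring source_text up front documents the intended shape of the data, rather than relying on the property being inferred when the first object is added.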

Adding Data to Weaviate

A batch is configured to add multiple data objects to Weaviate simultaneously, setting the batch size equal to the length of the documents list for this tutorial:

# Test source documents
documents = [
    "A group of vibrant parrots chatter loudly, sharing stories of their tropical adventures.",
    "The mathematician found solace in numbers, deciphering the hidden patterns of the universe.",
    "The robot, with its intricate circuitry and precise movements, assembles the devices swiftly.",
    "The chef, with a sprinkle of spices and a dash of love, creates culinary masterpieces.",
    "The ancient tree, with its gnarled branches and deep roots, whispers secrets of the past.",
    "The detective, with keen observation and logical reasoning, unravels the intricate web of clues.",
    "The sunset paints the sky with shades of orange, pink, and purple, reflecting on the calm sea.",
    "In the dense forest, the howl of a lone wolf echoes, blending with the symphony of the night.",
    "The dancer, with graceful moves and expressive gestures, tells a story without uttering a word.",
    "In the quantum realm, particles flicker in and out of existence, dancing to the tunes of probability."]

client.batch.configure(batch_size=len(documents))

In the batch process, for each document:

Embeddings are generated using generate_embeddings(doc).
An object with the original text and its corresponding embedding vector is added to Weaviate.

with client.batch as batch:
    for doc in documents:
        properties = {"source_text": doc}
        vector = generate_embeddings(doc)
        batch.add_data_object(properties, "DocumentSearch", vector=vector)
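To see why the batch size is set to the length of the documents list, it helps to picture how objects are grouped into requests. The sketch below is a plain-Python simulation, not the client's actual implementation: it assumes, as I understand the Python client's batching behavior, that a request is sent whenever the number of queued objects reaches batch_size and that any remainder is sent when the with block exits. The helper name chunked is hypothetical.

```python
from typing import Iterator, List


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Ten placeholder documents, matching the size of the tutorial's list.
sample_documents = [f"document {n}" for n in range(10)]

sizes_full = [len(group) for group in chunked(sample_documents, 10)]
sizes_small = [len(group) for group in chunked(sample_documents, 4)]

print(sizes_full)   # [10]
print(sizes_small)  # [4, 4, 2]
```

Under that assumption, a batch size of 10 sends all ten tutorial documents in one request, while a smaller value would split the import into several requests.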

Querying Weaviate

A query is embedded using the same model to ensure semantic compatibility.

query = "Give me some content about the ocean"
query_vector = generate_embeddings(query)
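Conceptually, a near-vector search ranks the stored vectors by their distance to the query vector. The brute-force sketch below illustrates that idea with made-up three-dimensional vectors. It is for intuition only: Weaviate uses an approximate HNSW index rather than comparing against every vector, and real GTE-base embeddings have 768 dimensions.

```python
import math
from typing import Dict, List, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance: 1 minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query_vec: List[float],
            stored: Dict[str, List[float]],
            limit: int = 2) -> List[Tuple[str, float]]:
    """Return the `limit` stored items closest to query_vec, closest first."""
    scored = [(name, cosine_distance(query_vec, vec))
              for name, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1])[:limit]


# Toy vectors invented for this example; they are not real embeddings.
stored_vectors = {
    "sunset over the sea": [0.9, 0.1, 0.0],
    "ancient tree": [0.6, 0.6, 0.2],
    "quantum particles": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.0, 0.0]

top = nearest(toy_query, stored_vectors)
for name, distance in top:
    print(f"{name}: distance={distance:.3f}")
```

This also shows why the query must be embedded with the same model as the documents: distances are only meaningful between vectors that live in the same embedding space.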

When a query is performed against Weaviate:

We retrieve "DocumentSearch" objects with "source_text" as the selected property.
with_near_vector specifies the vector and a minimum certainty for filtering results.
with_limit(2) restricts the result to the two most similar documents.
with_additional(['certainty', 'distance']) includes additional information in the results: the certainty and the cosine distance (cosine similarity is simply 1 - cosine distance).

result = client.query.get("DocumentSearch", ["source_text"]).with_near_vector({
    "vector": query_vector,
    "certainty": 0.7
}).with_limit(2).with_additional(['certainty', 'distance']).do()

Finally, the result is printed in a pretty JSON format using the json module, presenting the retrieved documents and additional information.

print(json.dumps(result, indent=4))

Your output should look something like this:

{
  "data": {
    "Get": {
      "DocumentSearch": [
        {
          "_additional": {
            "certainty": 0.9004524648189545,
            "distance": 0.19909507
          },
          "source_text": "The sunset paints the sky with shades of orange, pink, and purple, reflecting on the calm sea."
        },
        {
          "_additional": {
            "certainty": 0.8804855942726135,
            "distance": 0.23902881
          },
          "source_text": "The ancient tree, with its gnarled branches and deep roots, whispers secrets of the past."
        }
      ]
    }
  }
}
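The nested result is an ordinary Python dict, so the hits can be unpacked directly. The sketch below hard-codes the sample output shown above instead of querying a server, and it also checks the relationship between the two scores. For cosine distance, certainty appears to be 1 - distance / 2, which matches the numbers in the output; the helper name certainty_from_distance is my own.

```python
# Sample response copied from the output above (no server needed).
result = {
    "data": {
        "Get": {
            "DocumentSearch": [
                {
                    "_additional": {"certainty": 0.9004524648189545,
                                    "distance": 0.19909507},
                    "source_text": "The sunset paints the sky with shades of orange, pink, and purple, reflecting on the calm sea.",
                },
                {
                    "_additional": {"certainty": 0.8804855942726135,
                                    "distance": 0.23902881},
                    "source_text": "The ancient tree, with its gnarled branches and deep roots, whispers secrets of the past.",
                },
            ]
        }
    }
}


def certainty_from_distance(distance: float) -> float:
    """Map a cosine distance in [0, 2] onto a certainty in [0, 1]."""
    return 1.0 - distance / 2.0


hits = result["data"]["Get"]["DocumentSearch"]
for hit in hits:
    extra = hit["_additional"]
    similarity = 1.0 - extra["distance"]  # cosine similarity
    print(f"{similarity:.4f}  {extra['certainty']:.4f}  {hit['source_text'][:40]}...")
```

Read this way, the certainty threshold of 0.7 used in the query corresponds to a cosine distance of at most 0.6, or a cosine similarity of at least 0.4.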

Conclusion

In this tutorial, we walked through how to use a custom embedding model (thenlper/gte-base from Alibaba) with Weaviate to perform semantic search on text data.

The combination of pre-trained language models and vector databases unlocks potent capabilities for building intelligent, language-understanding applications. From building a semantic search engine to developing knowledge graphs, the synergy between embedding models and Weaviate opens up possibilities that were previously out of reach.

Here's the link again to the companion code repository for this blog post, available on GitHub.

Questions or comments? Feel free to contact me or connect on social media!
