💡 What's new in txtai 9.0

David Mezzetti

txtai is an all-in-one AI framework for semantic search, LLM orchestration and language model workflows.

The 9.0 release adds first-class support for sparse vector models (e.g. SPLADE), late interaction models (e.g. ColBERT), fixed dimensional encoding (MUVERA) and reranking pipelines ✨

The embeddings framework was overhauled to seamlessly support both sparse and dense vector models. Previously, sparse vector support was limited to keyword/term indexes. Now learned sparse retrieval models such as SPLADE are supported. These models can help improve the accuracy of retrieval/search operations, which also improves RAG and Agents.

Support for late interaction models, such as ColBERT, was also added to the embeddings framework. Unlike traditional vector models that pool token outputs into a single vector, late interaction models produce multiple vectors per input. These models are paired with the MUVERA algorithm, which transforms the multiple vectors into fixed dimensional single vectors for search.

LLMs are quickly converging on similar outputs for similar inputs and becoming standard commodities. The retrieval or context layer is what makes or breaks projects. In other words, it's all about putting the R in RAG!

Standard upgrade disclaimer below

While everything is backwards compatible, it's prudent to back up production indexes before upgrading and to test before deploying.
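For example, an existing index can be copied to a compressed backup with the save method before upgrading (the paths below are hypothetical):

from txtai import Embeddings

# Load the existing production index (hypothetical path)
embeddings = Embeddings()
embeddings.load("/data/embeddings-index")

# Write a compressed backup copy before upgrading
embeddings.save("/backups/embeddings-index-backup.tar.gz")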

Install dependencies

Install txtai and all dependencies.

pip install txtai[ann,vectors]

Sparse vector indexes

The first major change in this release is support for learned sparse retrieval models (aka sparse vector indexes). This effort was multi-faceted in that it required changes both to how vectors are generated and to how they are stored.

txtai uses approximate nearest neighbor (ANN) search for its vector search operations. The default library is Faiss. There is support for other libraries, but in all cases the existing ANN backends only supported dense (i.e. NumPy) vectors.

There aren't many options out there for sparse ANN search that meet txtai's requirements, so IVFSparse was introduced. IVFSparse is an inverted file (IVF) index with flat vector file storage and sparse array support. There is also support for storing sparse vectors in Postgres via pgvector.
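As a point of reference, a learned sparse vector has one dimension per vocabulary token and only a handful of non-zero weights, which is why dense-only ANN backends are a poor fit. A rough SciPy illustration (the token ids and weights below are made up):

import numpy as np
from scipy.sparse import csr_matrix

# A SPLADE-style vector has one dimension per vocabulary token (~30K for BERT)
# and only a handful of non-zero term weights
indices = np.array([2023, 2406, 4817])   # token ids (illustrative)
weights = np.array([1.7, 0.9, 0.4])      # learned term weights (illustrative)

vector = csr_matrix((weights, (np.zeros(3, dtype=int), indices)), shape=(1, 30522))
print(vector.nnz, "non-zero weights out of", vector.shape[1])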

Let's see it in action.

from txtai import Embeddings

# Works with a list, dataset or generator
data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]

# Create an embeddings instance
embeddings = Embeddings(sparse=True, content=True)
embeddings.index(data)
embeddings.search("North America", 10)
[{'id': '0',
  'text': 'US tops 5 million confirmed virus cases',
  'score': 0.019873601198196412},
 {'id': '1',
  'text': "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  'score': 0.018737798929214476}]

Late interaction models

Late interaction models encode data into multi-vector outputs. In other words, multiple input tokens map to multiple output vectors. Then at search time, the maximum similarity (MaxSim) algorithm is used to find the best matches between a query and the corpus. This approach has achieved excellent results on retrieval benchmarks such as MTEB.

The downside of this approach is that it produces multiple vectors as opposed to a single vector for each input. For example, if a text element tokenizes to many input tokens, there will be many output vectors versus a single one as with standard pooled vector approaches.
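To make the maximum similarity (MaxSim) scoring concrete, here is a minimal NumPy sketch (not the txtai implementation): each query token vector is matched against its best-scoring document token vector and those best scores are summed. Vectors are assumed to be L2-normalized so dot products are cosine similarities.

import numpy as np

def maxsim(query, document):
    # query: (n, d) query token vectors, document: (m, d) document token vectors
    # (n, m) similarity matrix between all query and document tokens
    similarities = query @ document.T

    # Best document token per query token, summed across the query
    return similarities.max(axis=1).sum()

# Toy example with random normalized vectors
rng = np.random.default_rng(42)
query = rng.normal(size=(4, 8)); query /= np.linalg.norm(query, axis=1, keepdims=True)
doc = rng.normal(size=(12, 8)); doc /= np.linalg.norm(doc, axis=1, keepdims=True)
print(maxsim(query, doc))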

Starting with the 9.0 release, late interaction models are supported by embeddings instances. Late interaction vectors are transformed into fixed dimensional vectors using the MUVERA algorithm. See below.

from txtai import Embeddings

# Works with a list, dataset or generator
data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]

# Create an embeddings instance
embeddings = Embeddings(path="colbert-ir/colbertv2.0", content=True)
embeddings.index(data)
embeddings.search("North America", 10)
[{'id': '0',
  'text': 'US tops 5 million confirmed virus cases',
  'score': 0.04216160625219345},
 {'id': '1',
  'text': "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  'score': 0.029944246634840965},
 {'id': '3',
  'text': 'The National Park Service warns against sacrificing slower friends in a bear attack',
  'score': 0.015931561589241028}]
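For intuition on how those multi-vector outputs become single searchable vectors, below is a heavily simplified sketch of the fixed dimensional encoding idea behind MUVERA (not txtai's implementation): token vectors are assigned to buckets using random hyperplanes (SimHash), summed per bucket and concatenated, so a plain dot product between two encodings approximates the MaxSim score. The full algorithm adds repetitions, projections and handling for empty buckets.

import numpy as np

def fde(vectors, hyperplanes):
    # Simplified fixed dimensional encoding: bucket token vectors by their
    # SimHash sign pattern, sum each bucket, then concatenate the bucket sums
    k, d = hyperplanes.shape
    buckets = np.zeros((2 ** k, d))

    for vector in vectors:
        # Sign pattern against k random hyperplanes selects one of 2^k buckets
        bits = (hyperplanes @ vector) > 0
        index = int("".join("1" if b else "0" for b in bits), 2)
        buckets[index] += vector

    return buckets.ravel()

# Toy multi-vector outputs (e.g. per-token embeddings), dimensions are illustrative
rng = np.random.default_rng(0)
hyperplanes = rng.normal(size=(3, 16))       # 2^3 = 8 buckets
query_tokens = rng.normal(size=(5, 16))
document_tokens = rng.normal(size=(40, 16))

# Single fixed dimensional vectors, comparable with a plain dot product
print(fde(query_tokens, hyperplanes) @ fde(document_tokens, hyperplanes))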

Reranking pipeline

Another major new component in this release is the Reranker pipeline. This pipeline takes an embeddings instance and a similarity instance, then uses the similarity instance to rerank the embeddings query results. This is a key component of the MUVERA paper: the standard vector index retrieves candidates, then the late interaction model reranks those outputs.

from txtai import Embeddings
from txtai.pipeline import Reranker, Similarity

# Works with a list, dataset or generator
data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]

# Create an embeddings instance
embeddings = Embeddings(path="colbert-ir/colbertv2.0", content=True)
embeddings.index(data)

similarity = Similarity(path="colbert-ir/colbertv2.0", lateencode=True)

ranker = Reranker(embeddings, similarity)
ranker("North America")
[{'id': '1',
  'text': "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
  'score': 0.3324427008628845},
 {'id': '0',
  'text': 'US tops 5 million confirmed virus cases',
  'score': 0.24423550069332123},
 {'id': '3',
  'text': 'The National Park Service warns against sacrificing slower friends in a bear attack',
  'score': 0.16353240609169006}]

Notice that while the results are the same, the scoring and ordering are different.

Let's try a more interesting example.

from txtai import Embeddings
from txtai.pipeline import Reranker, Similarity

# Load the Wikipedia embeddings index from the Hugging Face Hub
embeddings = Embeddings()
embeddings.load(provider="huggingface-hub", container="neuml/txtai-wikipedia")

similarity = Similarity(path="colbert-ir/colbertv2.0", lateencode=True)

ranker = Reranker(embeddings, similarity)
ranker("Tell me about ChatGPT")
[{'id': 'ChatGPT',
  'text': 'ChatGPT is a generative artificial intelligence chatbot developed by OpenAI and released on November 30, 2022. It uses large language models (LLMs) such as GPT-4o as well as other multimodal models to create human-like responses in text, speech, and images. It has access to features such as searching the web, using apps, and running programs. It is credited with accelerating the AI boom, an ongoing period of rapid investment in and public attention to the field of artificial intelligence (AI). Some observers have raised concern about the potential of ChatGPT and similar programs to displace human intelligence, enable plagiarism, or fuel misinformation.',
  'score': 0.6639302968978882},
 {'id': 'ChatGPT Search',
  'text': 'ChatGPT Search (originally SearchGPT) is a search engine developed by OpenAI. It combines traditional search engine features with generative pretrained transformers (GPT) to generate responses, including citations to external websites.',
  'score': 0.6477508544921875},
 {'id': 'ChatGPT in education',
  'text': 'The usage of ChatGPT in education has sparked considerable debate and exploration. ChatGPT is a chatbot based on large language models (LLMs) that was released by OpenAI in November 2022.',
  'score': 0.5918337106704712}]

Wrapping up

This article gave a quick overview of txtai 9.0. Updated documentation and more examples will be forthcoming. There is much to cover and much to build on!
