VectraFlow

Sahil Bhatt

Summary:

  • Traditionally, what we had in natural language processing was the concept of encoding: converting text into numbers so that it becomes machine-readable. General encoding doesn't care about relationships between words.

  • Embeddings were introduced later: comparatively denser vector representations that capture semantic meaning rather than just syntax. For example, apple and carrot are closer in embedding space than carrot and hammer.

  • Embeddings can be further classified into two types: static and contextual. Static embeddings (e.g. Word2Vec and GloVe) assign one vector per word. This technique has an obvious flaw: apple gets just one vector, but apple is both a fruit and a tech company.

  • So context matters when looking for the vector closest to our query vector.

  • This is where contextual embeddings, as produced by models like BERT and GPT, come into the picture. Transformers changed everything: unlike Word2Vec, a word is not embedded in isolation. Instead, the whole sentence is taken into consideration, and each word is given a contextualised vector. Rather than weighting every word equally, the model focuses on the most relevant words.

  • To determine relevance we have the concept of self-attention. Each word is turned into a Query (Q), a Key (K), and a Value (V) vector, and every word's Q is compared with every other word's K. The dot product of Q and K is taken and passed through a softmax function to keep the scores in the range 0 to 1.

  • An attention score closer to 1 means a word's K is more similar to the Query than one whose score is closer to 0. A weighted average of the Value vectors is then computed, and this becomes the new, context-aware vector for that word. Doing this for all words, and averaging the resulting word vectors, yields a sentence-level embedding.
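The Q/K/V steps above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not a real transformer layer; the random weight matrices and dimensions are placeholders.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Minimal single-head self-attention over a sentence of word vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # each Q dotted with every K
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax: each row sums to 1
    contextual = weights @ V                        # weighted average of Value vectors
    sentence_vec = contextual.mean(axis=0)          # average words -> sentence embedding
    return contextual, sentence_vec

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 "words", 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
ctx, sent = self_attention(X, Wq, Wk, Wv)
```

Each row of `ctx` is the context-aware vector for one word; `sent` is the averaged sentence-level embedding.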

  • For ANN-based search we have FAISS, which works in batches. Instead of comparing a query vector against, say, 100k data points, the 100k points are divided into 1k clusters, each with its own identifier, and only the centroid of each cluster is compared with the query vector. Once the most similar clusters are detected, only those clusters are fully explored (with 1k clusters, each cluster holds roughly 100 points).

  • This drastically reduces the number of comparisons from O(n), where n is 100k. FAISS performs vector-based search: given an input query, it returns the most similar vectors rather than an exact match.
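The cluster-then-probe idea can be sketched with a toy inverted-file (IVF) index in NumPy. This is an illustration of the technique, not FAISS's actual implementation; the cluster counts and function names are made up for the example.

```python
import numpy as np

def build_ivf(data, n_clusters, n_iter=10, seed=0):
    """Toy IVF index: k-means centroids plus a per-cluster inverted list of point ids."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), n_clusters, replace=False)]
    for _ in range(n_iter):
        assign = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            pts = data[assign == c]
            if len(pts):
                centroids[c] = pts.mean(axis=0)
    lists = {c: np.where(assign == c)[0] for c in range(n_clusters)}
    return centroids, lists

def ivf_search(query, data, centroids, lists, nprobe=2):
    """Compare the query only to centroids, then fully scan the nprobe closest clusters."""
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([lists[c] for c in order])
    return cand[np.argmin(((data[cand] - query) ** 2).sum(-1))]
```

With 1k centroids and `nprobe` of a few, the query touches far fewer points than a brute-force scan of all 100k.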

  • Even though this approach is good, as stated earlier it works in batches: we give a query, the search runs, and the results come back after some time rather than instantly, i.e. in real time. VectraFlow tries to change this by working in real time.

  • Let's take an example where a person makes hand gestures over a video call and each video frame is converted into a vector using something like CLIP. With FAISS, the vector matching would be done in batches. What we want is real-time vector matching, where the system receives a frame every ~0.1 seconds and immediately converts it into a vector.

  • Then we have to determine which known hand gesture our frame is closest to, so we compare the incoming vector (our frame) with stored vectors, i.e. vectors in a vector DB (the gestures known to us). FAISS may give precise results, but VectraFlow ensures faster matching, even if its results are initially not quite as precise as FAISS's.
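The per-frame matching step can be sketched as a cosine-similarity lookup against a small gesture database. The gesture names and 3-dimensional vectors below are stand-ins for real CLIP embeddings, which would be much higher-dimensional.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def match_frame(frame_vec, gesture_db):
    """Compare one incoming frame vector against every stored gesture vector."""
    best_name, best_sim = None, -1.0
    for name, vec in gesture_db.items():
        sim = cosine(frame_vec, vec)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name, best_sim

# Hypothetical gesture database (stand-ins for stored CLIP embeddings).
gestures = {"wave": np.array([1.0, 0.0, 0.0]),
            "thumbs_up": np.array([0.0, 1.0, 0.0]),
            "fist": np.array([0.0, 0.0, 1.0])}

frame = np.array([0.9, 0.1, 0.0])   # one frame vector, arriving every ~0.1 s
name, sim = match_frame(frame, gestures)
```

In a streaming setting this lookup runs once per incoming frame, which is exactly where a batch-oriented index becomes a bottleneck.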

  • FAISS has no concept of time, whereas VectraFlow has a time frame within which it must find the matching gesture. In VectraFlow, the base stream could hold vectors from incoming tweets, video frames, etc., saved temporarily or continuously updated. In FAISS, on the other hand, the data changes rarely and is static most of the time. VectraFlow also maintains the event time, i.e. when each event took place.

  • But the data need not be mutable in VectraFlow: it can hold static vectors (storing known examples) while also processing new, live incoming vectors. The live vectors, or input stream, arrive constantly and change over time with the influx of new vectors; they are regularly compared against the stored (static) vectors, or base stream, which are persisted and may be updated over time.

  • In systems like FAISS, adding new vectors means re-indexing: if we cluster with k-means, the clusters must be rebuilt, which takes time. Indexing in a vector DB makes search fast, but updating the index can be slow, which is not fruitful in a real-time environment.

  • Recreating the clusters would be time-consuming, so VectraFlow uses live buffering: a window that keeps track of the most recent incoming vectors instead of keeping account of millions. The buffer is time-bound, so over time it shrinks by discarding old vectors and expands by consuming new ones. Even with a constant influx of new vectors, nothing is re-indexed.

  • A stream in VectraFlow means a constant influx of vectors, i.e. a live incoming feed of vector data. In the earlier gesture example, the influx would be the image embeddings we constantly receive, one per frame of the webcam's video capture. Each stream has its own timestamps (event times), and streams can be filtered, joined, and compared with other streams. It's not that VectraFlow never re-indexes: it does, but not as frequently as FAISS. It only re-indexes when the base stream changes, and as said earlier, that happens rarely.

  • In VectraFlow we use operators similar to those in SQL, but instead of comparing, say, strings or numbers, these operators compare vectors, using cosine similarity or dot product to find similar ones. V-topk() filters the incoming stream and returns the top k stored vectors most similar to an incoming vector. Iv-topk() is the inverse: it returns the top k vectors from the incoming stream that are most similar to a selected base vector.
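The symmetry between the two top-k operators can be sketched as follows. The function names mirror the paper's operator names but the implementations are simplified stand-ins using plain cosine similarity.

```python
import numpy as np

def v_topk(incoming_vec, base_vectors, k):
    """Sketch of v-topk(): top-k base (stored) vectors most similar to one incoming vector."""
    sims = base_vectors @ incoming_vec / (
        np.linalg.norm(base_vectors, axis=1) * np.linalg.norm(incoming_vec))
    return np.argsort(-sims)[:k]        # indices into the base stream, best first

def iv_topk(base_vec, incoming_window, k):
    """Sketch of iv-topk(): top-k vectors in the incoming window most similar to a base vector."""
    # Same computation with the roles of the two sides swapped.
    return v_topk(base_vec, incoming_window, k)
```

v-topk() answers "which stored gestures match this frame?", while iv-topk() answers "which recent frames match this stored gesture?".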

  • VectraFlow also exercises the option of filters, which keep only those vectors similar to our query vector. V-filter() filters the stored vectors in memory (as stated before, some vectors are already stored), while iv-filter() filters incoming vectors, discarding them even before they are processed.

VectraFlow also leverages the power of joins, wherein vectors from two or more streams are combined if they are semantically similar (i.e. convey a similar meaning) and arrived close together in time. V-join() joins the two or more streams.
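A join over two vector streams must satisfy both conditions at once: semantic similarity and temporal proximity. Here is a minimal nested-loop sketch of that idea (real stream joins would use windowed, incremental evaluation); the function name follows the paper's operator but the implementation is illustrative.

```python
import numpy as np

def v_join(stream_a, stream_b, sim_threshold, time_window):
    """Sketch of v-join(): pair vectors from two streams that are semantically
    similar (cosine) AND arrived close together in time."""
    pairs = []
    for ta, va in stream_a:                 # each item is (event_time, vector)
        for tb, vb in stream_b:
            if abs(ta - tb) > time_window:  # temporal condition first (cheap)
                continue
            sim = va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))
            if sim >= sim_threshold:        # then the semantic condition
                pairs.append((ta, tb, float(sim)))
    return pairs
```

Checking the cheap time condition before computing similarity is the usual ordering for such predicates.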

Working:

  • First, the base stream vectors are clustered. The benefit of clustering, as we know, is that it reduces the number of comparisons: we only search inside the closest cluster. Inside those clusters we use something called an ordered posting list (OPList).

We search multiple vectors inside the relevant cluster, but we avoid scanning all of them to keep the overall process fast. VectraFlow searches the points closest to the centroid before those farther away, giving the vectors a kind of priority ranking. With the OPList we get a pre-sorted list of vectors, ordered by how close they are to the centroid; once a good match is found (judged by cosine similarity), the search stops. Note that a cluster's centroid is not a real vector from the dataset but the average of the cluster's vectors, so we never return the centroid itself.
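The centroid-ordered scan with early termination can be sketched like this. The sorting key, threshold value, and function names are illustrative, not VectraFlow's exact implementation.

```python
import numpy as np

def build_oplist(cluster_vectors, centroid):
    """Pre-sort a cluster's vectors by distance to its centroid (the OPList idea)."""
    order = np.argsort(np.linalg.norm(cluster_vectors - centroid, axis=1))
    return cluster_vectors[order]

def oplist_search(query, oplist, good_enough=0.95):
    """Scan centroid-closest vectors first; stop early once a good cosine match appears."""
    best_vec, best_sim = None, -1.0
    for vec in oplist:
        sim = vec @ query / (np.linalg.norm(vec) * np.linalg.norm(query))
        if sim > best_sim:
            best_vec, best_sim = vec, sim
        if best_sim >= good_enough:
            break                       # early termination: good match found
    return best_vec, best_sim
```

The early break is what distinguishes this from a full cluster scan: the closer to the front of the pre-sorted list the match sits, the fewer comparisons are spent.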

Performance Evaluation:

  • VectraFlow’s performance was evaluated using a prototype implementation tested on 500,000 base vectors and 100,000 streaming inputs.

  • Compared to brute force, the Centroid OPList method achieved almost 10× lower latency while maintaining a strong recall of around 0.95.

  • So accuracy wasn’t compromised. Against HNSW, Centroid OPList was also faster and easier to maintain under streaming conditions, making it a strong choice for real-time workloads.

Future Enhancements

• Today the operators just compare vectors with vectors: at the granular level every input is eventually converted into a vector, and an operator like v-topk() compares raw embeddings. A future operator like P-TopK() would work on LLM-generated prompts and their context, i.e. for an incoming prompt, find the most relevant previous prompt or answer.

• Model-driven operators like M-Aggregate() and M-Cluster() leverage the power of LLMs on top of the existing vector concepts. M-Cluster() will cluster incoming data by semantic meaning with the help of LLMs, while M-Aggregate() summarises the input. So we are not just comparing vectors with each other but gaining insights into trends and label predictions.

• An agentic LLM workflow introduces the idea of an LLM constantly monitoring the stream of input data, making decisions based on patterns, and suggesting actions.

Conclusion:

  • VectraFlow revolutionises vector search by introducing new operators and a newer indexing method, the Centroid OPList. Unlike batch-oriented systems like FAISS, it avoids redundant re-indexing and is faster without abating accuracy. Future plans include LLM-based operators and agentic LLM workflows. Overall, VectraFlow lays the foundation for systems that can work on vector data in real time.

For the full paper, see https://vldb.org/cidrdb/papers/2025/p23-lu.pdf
