How Does Search Work?


Crawling/Ingestion
Indexing
The fundamental data structure involved in search is the inverted index.
Retrieval
Keyword Search
Vector Search
There are 2 common approaches for performing vector search in a vector store. Hierarchical Navigable Small World (HNSW) and Inverted File Index (IVF). These approaches allow for quick nearest neighbour search of vectors to identify those that are similar based on cosine similarity or
Re-ranking and Rank Fusion
Personalisation
Collaborative Filtering
Collaborative filtering works on the principle of finding what users like the target user prefer, and recommending those content
Content-based Filtering
Content-based filtering
Evaluation
Search is generally ranked along recall, which is whether the right content is returned in the top K results, and relevance, which has to do with the position of the right content in the top K results.
Recall can be measured via a simple ratio:
$$\frac{\sum \text{Results returned}}{\sum \text{Queries made}} \times 100\%$$
Relevance can be measured with Mean Average Rank (MAR), which looks at the position of relevant results in the returned top K results and calculates a weighted average based on position. This penalises irrelevant results
A better metric is Normalized Discounted Cumulative Gain (NDCG), which accounts for both recall and relevance, and penalises relevant results that are not ranked the top more heavily.
References
- How search works by Google
Subscribe to my newsletter
Read articles from Vishnu S directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
