Why Misspellings Still Work: Fuzzy Search using PostgreSQL


When you type chit gpt and still get results for ChatGPT how’s that even possible ?
Hey all! Welcome to the third blog of my Behind the Search blog series.
Today, we’re going to build a system that does fuzzy/partial matching efficiently—and ranks the results too. This is considered as an Advanced feature in any search system where you get relevant results even when your spelling sucks 💀
Let’s take a sample dataset to understand fuzzy matching.
Artists Data:
Id | Name | Followers |
1 | Travis Scott | 71 M |
2 | The Weekend | 140 M |
3 | Ariana Grande | 90 M |
4 | Blackpink | 100 M |
5 | Ed Sheeran | 110 M |
Implementation
We’re going to use Trigram Similarity, a fuzzy matching technique in PostgreSQL that helps match slightly misspelled or partial words. As the name suggests a trigram is just a group of 3 consecutive characters.
Steps:
Each word is padded with 2 spaces at start and 1 at end.
It’s then sliced into chunks of 3 characters (trigrams).
The same thing is done for the search input.
Then, the system compares trigrams of the input and the database values.
A similarity score is calculated based on the overlap.
Similarity = (number of common trigrams) / (total unique trigrams)
We can apply this on multiple columns too but we need to apply individually and then combine the data in the query itself. For simplicity, we will apply only for Name column here.
Lets assume user searches for ‘Trevis’ so its trigrams will be generated after padding with spaces.
‘Trevis’ → ' T'
, ' Tr'
, 'Tre'
, 'rev'
, 'evi'
, 'vis'
, 'is '
Now for each data of Name, trigrams will be generated of the whole string
Name | Trigrams |
Travis Scott | ' T' , ' Tr' , 'Tra' , 'rav' , 'avi' , 'vis' , 'is ' , 's S' , ' Sc' , Sco' , 'cot' , 'ott' , 'tt ' |
The Weekend | ' T' , ' Th' , 'The' , 'he ' , 'e W' , ' We' , 'Wee' , 'eek' , 'eke' , ken' , 'end' , 'nd ' |
Ariana Grande | ' A' , ' Ar' , 'Ari' , 'ria' , 'ian' , 'ana' , 'na ' , 'a G' , ' Gr' , Gra' , 'ran' , 'and' , 'nde' , 'de ' |
Blackpink | ' B' , ' Bl' , 'Bla' , 'lac' , 'ack' , 'ckp' , 'kpi' , 'pin' , 'ink' , 'nk ' |
Ed Sheeran | ' E' , ' Ed' , 'Ed ' , 'd S' , ' Sh' , 'She' , 'hee' , 'eer' , 'era' , ran' , 'an ' |
Row 1 Similarity :
common = 4, unique = 10
So similarity = 4/10 = 0.4
Now for every row such score is calculated and based on the similarity score filtering and ranking is done. This is the internal implementation of the fuzzy matching.
Code
Create Table
CREATE TABLE artist( id INT PRIMARY KEY, Name varchar, Followers varchar );
Insert Data
INSERT INTO artist (id,Name, Followers) VALUES (1,'Travis Scott','71 M'), (2,'The Weekend','140 M'), (3,'Ariana Grande','90 M'), (4,'Blackpink','100 M'), (5,'Ed Sheeran','110 M');
Enable Extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;
Query :
SELECT *, similarity(Name, 'trevis Scott') AS score FROM artist WHERE Name % 'trevis Scott' ORDER BY score DESC LIMIT 20;
The % operator applies an filter in backend -
If similarity(Name, 'trevis Scott') >= pg_trgm.similarity_threshold then select it
The default value of pg_trgm.similarity_threshold is 0.3. But this can be easily override based on your application use-case.
Lowering it (e.g., to 0.2) will match more results, even if they’re less similar.
Increasing it (e.g., to 0.5) will return fewer but more precise matches.
Setting it too low may lead to irrelevant or noisy results.
SET pg_trgm.similarity_threshold = 0.2;
Optimisation
To optimise the search we can use a GIN index. GIN stands for Generalized Inverted Index, I’ve discussed about this in my first blog already, but in short it’s a inverted map like structure which performs search in O(logN) instead of O(N)!
But gin_trgm_ops
specifically is for text
/varchar
columns only
CREATE INDEX trgm_idx ON search USING gin (Name gin_trgm_ops);
PostgreSQL uses this index automatically when running % or similarity() queries. No need to change the query.
Try this out, and you’ll realize how close you are to building a search experience like Google. 🚀
If this article was worth your time pls like and subscribe to my newsletter.
Thank you ! 💫
Subscribe to my newsletter
Read articles from Vedhas Naik directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Vedhas Naik
Vedhas Naik
I'm Vedhas Naik, a passionate full-stack developer with hands-on experience in the MERN stack, Blockchain technologies, and Data Structures & Algorithms. Currently, I'm interning at DSP Mutual Funds, where I'm working on backend systems using Spring Boot and Microservices architecture.