Build a Search That Tolerates Typos: PostgreSQL Trigram Matching

When you type chit gpt and still get results for ChatGPT how’s that even possible ?
Hey all! Welcome to the third blog of my Behind the Search blog series.

Today, we’re going to build a system that does fuzzy/partial matching efficiently—and ranks the results too. This is considered as an Advanced feature in any search system where you get relevant results even when your spelling sucks 💀

Let’s take a sample dataset to understand fuzzy matching.

Artists Data:

Id	Name	Followers
1	Travis Scott	71 M
2	The Weekend	140 M
3	Ariana Grande	90 M
4	Blackpink	100 M
5	Ed Sheeran	110 M

Implementation

We’re going to use Trigram Similarity, a fuzzy matching technique in PostgreSQL that helps match slightly misspelled or partial words. As the name suggests a trigram is just a group of 3 consecutive characters.

Steps:

Each word is padded with 2 spaces at start and 1 at end.
It’s then sliced into chunks of 3 characters (trigrams).
The same thing is done for the search input.
Then, the system compares trigrams of the input and the database values.
A similarity score is calculated based on the overlap.

Similarity = (number of common trigrams) / (total unique trigrams)

We can apply this on multiple columns too but we need to apply individually and then combine the data in the query itself. For simplicity, we will apply only for Name column here.

Lets assume user searches for ‘Trevis’ so its trigrams will be generated after padding with spaces.

‘Trevis’ → ' T', ' Tr', 'Tre', 'rev', 'evi', 'vis', 'is '

Now for each data of Name, trigrams will be generated of the whole string

Name	Trigrams
Travis Scott	`' T'`, `' Tr'`, `'Tra'`, `'rav'`, `'avi'`, `'vis'`, `'is '`, `'s S'`, `' Sc'`, `Sco'`, `'cot'`, `'ott'`, `'tt '`
The Weekend	`' T'`, `' Th'`, `'The'`, `'he '`, `'e W'`, `' We'`, `'Wee'`, `'eek'`, `'eke'`, `ken'`, `'end'`, `'nd '`
Ariana Grande	`' A'`, `' Ar'`, `'Ari'`, `'ria'`, `'ian'`, `'ana'`, `'na '`, `'a G'`, `' Gr'`, `Gra'`, `'ran'`, `'and'`, `'nde'`, `'de '`
Blackpink	`' B'`, `' Bl'`, `'Bla'`, `'lac'`, `'ack'`, `'ckp'`, `'kpi'`, `'pin'`, `'ink'`, `'nk '`
Ed Sheeran	`' E'`, `' Ed'`, `'Ed '`, `'d S'`, `' Sh'`, `'She'`, `'hee'`, `'eer'`, `'era'`, `ran'`, `'an '`

Row 1 Similarity :

common = 4, unique = 10

So similarity = 4/10 = 0.4

Now for every row such score is calculated and based on the similarity score filtering and ranking is done. This is the internal implementation of the fuzzy matching.

Code

Create Table

 CREATE TABLE artist(
     id INT PRIMARY KEY,
     Name varchar,
     Followers varchar 
 );

Insert Data

 INSERT INTO artist (id,Name, Followers)
 VALUES 
 (1,'Travis Scott','71 M'),
 (2,'The Weekend','140 M'), 
 (3,'Ariana Grande','90 M'), 
 (4,'Blackpink','100 M'), 
 (5,'Ed Sheeran','110 M');

Enable Extension

 CREATE EXTENSION IF NOT EXISTS pg_trgm;

Query :

 SELECT *,
   similarity(Name, 'trevis Scott') AS score
 FROM artist
 WHERE Name % 'trevis Scott'
 ORDER BY score DESC
 LIMIT 20;

The % operator applies an filter in backend -

If similarity(Name, 'trevis Scott') >= pg_trgm.similarity_threshold then select it

The default value of pg_trgm.similarity_threshold is 0.3. But this can be easily override based on your application use-case.

Lowering it (e.g., to 0.2) will match more results, even if they’re less similar.
Increasing it (e.g., to 0.5) will return fewer but more precise matches.

Setting it too low may lead to irrelevant or noisy results.

SET pg_trgm.similarity_threshold = 0.2;

Optimisation

To optimise the search we can use a GIN index. GIN stands for Generalized Inverted Index, I’ve discussed about this in my first blog already, but in short it’s a inverted map like structure which performs search in O(logN) instead of O(N)!

But gin_trgm_ops specifically is for text/varchar columns only

CREATE INDEX trgm_idx ON search USING gin (Name gin_trgm_ops);

PostgreSQL uses this index automatically when running % or similarity() queries. No need to change the query.

Try this out, and you’ll realize how close you are to building a search experience like Google. 🚀

If this article was worth your time pls like and subscribe to my newsletter.

Thank you ! 💫

Why Misspellings Still Work: Fuzzy Search using PostgreSQL

Implementation

Code

Optimisation

Subscribe to my newsletter

Vedhas Naik

Vedhas Naik