Fuzzy Password Recovery: Using Similarity Tokens for a More Forgiving Password System


In today's digital landscape, password security is paramount. However, traditional systems that store only a one-way cryptographic hash of your password come with a drawback. If you forget your exact password, there's no built-in way to “fuzzily” recover your account by guessing a similar password. In this blog post, we’ll explore an innovative approach to address this issue by using similarity tokens and fuzzy hashing.
Motivation
A week ago, while writing this blog post, a friend of mine shared with me the problem, it goes like this.
“Hello Devs! How can a new password and an old password be compared for similarities during password updates? Remember the old password is hashed in the DB“
I asked the same question on the X app, but all the responses were not that great, so I embarked on research, and came up with this.
The Challenge with Traditional Password Hashing
Most modern applications use strong cryptographic algorithms such as bcrypt, Argon2, or PBKDF2 to hash passwords. The benefits are clear:
One-Way Functionality: A hash cannot be reversed to reveal the original password.
Security: Even if attackers obtain the hash, they cannot easily deduce the actual password.
However, this same property introduces a problem: once a password is hashed, you lose any information about its structure or patterns. This makes it impossible to compare two passwords for similarity after they have been hashed.
The Idea: Similarity Tokens
Imagine a system where, in addition to the secure hash, you also compute and store a similarity token derived from the plaintext password. This token would be created using a fuzzy hashing algorithm—such as simhash or TLSH that preserves some information about the password’s structure without exposing it.
How Does It Work?
At Password Setup:
Standard Hashing: You hash the password using a secure algorithm like bcrypt.
Compute a Similarity Token: Before hashing, compute a token that captures the structure of the plaintext password. This token can later be used to determine if a new password guess is “close enough” to the original.
During Recovery:
The user may not remember the exact password but might recall something similar.
The system computes the similarity token of the guess.
By comparing the new token with the stored one using a metric such as Hamming distance, the system can decide if the guess is similar enough to be acceptable for account recovery.
A Closer Look at Fuzzy Hashing
Fuzzy hashing differs from cryptographic hashing in one key aspect: it allows you to compare two inputs for similarity. For example, if two passwords differ by only a small change, their fuzzy hashes (or similarity tokens) will be similar, whereas traditional cryptographic hashes will be entirely different.
Simhash is one example of such an algorithm. It converts the input into a binary fingerprint where similar inputs produce fingerprints with only a small Hamming distance (i.e., a few differing bits).
Implementing the Concept in TypeScript
Below is an example of how you might implement this approach in TypeScript. In the demo, I use:
bcrypt for secure password hashing.
simhash for generating a similarity token.
As well as an express server, nothing serious tho.
Step 1: Installation
First, install the necessary packages:
npm install bcrypt simhash
Step 2: The Code
Here’s a detailed, annotated example:
import bcrypt from 'bcrypt';
// @ts-ignore - Simhash is not a recognized package, but it can be installed via npm.
import SimHash from 'simhash';
// Keep this secret and secure to prevent unauthorized access.
const SALT_ROUNDS = process.env.SALT_ROUNDS ? parseInt(process.env.SALT_ROUNDS) : 10;
const SECRET_KEY = process.env.SECRET_KEY;
// Compute a secure bcrypt hash of the password.
const computePasswordHash = async (password: string): Promise<string> => {
return bcrypt.hash(password, SALT_ROUNDS);
}
/**
* Compute for salted password
* This scrambles the password with a secret key to add extra security.
* Instead of appending the salt to the password, we interleave the characters of the password and salt.
* This method is not recommended for general use, but it demonstrates how to apply a custom salt.
*/
const applySalt = (password: string, salt: string): string => {
let saltedPassword = "";
const maxLength = Math.max(password.length, salt.length);
for (let i = 0; i < maxLength; i++) {
if (i < password.length) {
saltedPassword += password[i]; // Add character from password
}
if (i < salt.length) {
saltedPassword += salt[i]; // Add character from salt
}
}
return saltedPassword;
};
/**
* Compute a similarity token using simhash.
* Here, we "key" the password by appending a secret to add extra security.
*/
const computeSimilarityToken = (password: string): string => {
if (!SECRET_KEY) {
throw new Error("SECRET_KEY is required to compute similarity tokens.");
}
const saltedPassword = applySalt(password, SECRET_KEY);
const simhash = new SimHash();
return simhash.hash(saltedPassword);
}
/**
* Calculate the Hamming distance between two hexadecimal hash strings.
* This distance measures how many bits differ between the two tokens.
*/
const hammingDistance = (hash1: string, hash2: string): number => {
if (hash1.length !== hash2.length) {
throw new Error('Hashes must be of equal length');
}
let distance = 0;
for (let i = 0; i < hash1.length; i++) {
const digit1 = parseInt(hash1[i], 16);
const digit2 = parseInt(hash2[i], 16);
let xor = digit1 ^ digit2;
// Count the number of 1's in the XOR result.
while (xor) {
distance += xor & 1; // Add the least significant bit (LSB) to distance
xor >>= 1; // Right-shift the XOR value to process the next bit
}
}
return distance;
}
// Register a new password by computing both its secure hash and similarity token.
const registerNewPassword = (plaintextPassword: string) => {
const passwordHash = await computePasswordHash(plaintextPassword);
const similarityToken = computeSimilarityToken(plaintextPassword);
// Store both the `passwordHash` and `similarityToken` in your database.
return { passwordHash, similarityToken };
}
/**
* Check if a recovery password guess is similar enough to the original.
* @param storedToken - The similarity token stored from the original password.
* @param recoveryGuess - The password guess provided by the user.
* @param threshold - The maximum Hamming distance allowed.
* @returns true if the guess is similar enough.
*/
const isRecoveryGuessAcceptable = (storedToken: string, recoveryGuess: string, threshold: number = 10): boolean => {
const guessToken = computeSimilarityToken(recoveryGuess);
const distance = hammingDistance(storedToken, guessToken);
console.log('Hamming distance:', distance);
return distance < threshold;
}
/* ---------------- Example Flow ---------------- */
// Simulate a user registering with an initial password.
const exampleRecoveryFlow = async () => {
const originalPassword = 'MyComplexP@ssw0rd!';
const { passwordHash, similarityToken } = await registerNewPassword(originalPassword);
console.log('Stored bcrypt hash:', passwordHash);
console.log('Stored similarity token:', similarityToken);
// Later, when the user tries to recover their account:
const recoveryGuess = 'MyComplexP@ssw0rd1'; // A slight variation of the original
const isAcceptable = isRecoveryGuessAcceptable(similarityToken, recoveryGuess);
if (!isAcceptable) {
console.warn("Recovery guess is not similar enough. Additional verification is required.");
// Deny recovery or require further verification.
return;
}
console.log("Recovery guess is similar enough. Proceed with recovery.");
// Continue with a secure recovery process, such as allowing a password reset.
}
exampleRecoveryFlow().catch(console.error);
Code Walkthrough
Hashing & Token Generation:
TheregisterNewPassword
function generates both a secure bcrypt hash and a similarity token. The similarity token is computed by appending a secret key to the plaintext password, then running it through simhash.Hamming Distance:
ThehammingDistance
function computes the difference between two similarity tokens. A low Hamming distance means the passwords are very similar.Recovery Check:
TheisRecoveryGuessAcceptable
function uses the similarity token of a user's recovery guess and compares it with the stored token. If the difference (Hamming distance) is below a defined threshold, the guess is accepted as similar enough.
Considerations & Caveats
While this approach can be innovative, it also comes with important security trade-offs:
Security Risks:
Storing a similarity token even when derived using a secret key reveals some structural information about the password. An attacker with access to these tokens might narrow down possible password patterns.Threshold Tuning:
Choosing the right Hamming distance threshold is critical. If too lenient, it could allow insecure near-matches; if too strict, genuine users might be unfairly locked out.User Experience:
Traditional password recovery methods (email, SMS, or MFA-based resets) are well understood and secure. A fuzzy recovery mechanism must be thoroughly tested and communicated to users.Implementation Complexity:
Maintaining two representations of the password (secure hash and similarity token) increases system complexity and demands a rigorous security review.
Conclusion
The concept of using similarity tokens for fuzzy password recovery is both fascinating and challenging. It attempts to bridge the gap between strong security practices and user-friendly recovery mechanisms. While the idea is technically feasible with additional computational work (like using simhash and comparing Hamming distances), it comes with significant risks that must be carefully managed.
If you choose to implement this approach, ensure you perform thorough security assessments, fine-tune your similarity thresholds, and consider combining it with traditional recovery methods for a balanced solution.
Happy coding, always keep security at the forefront of your design decisions! Link to the source code here.
Subscribe to my newsletter
Read articles from mahdi directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

mahdi
mahdi
JS & TS