Building Askie, your friendly helper that explains things in layman's terms


Yes, you heard it right.
We are going to build an AI assistant that will answer all kinds of queries in layman's terms.
Got a text query? We've got you covered.
Got an image to analyze? We've got you covered.
Got a PDF to summarize? We've got you covered.
Got a URL to extract data from? We've got you covered. 😉
Motivation behind the project
The core reason behind this project was our internal hackathon, where we as a team competed individually to build something unique across different themes. I picked this theme because it had a lot of new things I could learn along the way, such as vector DBs, text embeddings, and semantic search.
What are we building
We'll be building a platform, or rather an assistant, that can take any kind of input and answer your queries about it. And the most important thing: it will have Context Memory. Yes, you read it right; context awareness is the most important aspect of this assistant.
Tech Stack
Frontend:
React with TypeScript
Tailwind for styling
shadcn/ui for UI elements
Supabase for authentication
Backend:
Node.js runtime with Express and TypeScript
Gemini Models for querying and vector embeddings
Supabase for DB
Firecrawl for scraping URLs
Tesseract.js for OCR
pdf-parse library for reading PDFs
Prerequisites
We won't be covering setup-related things like Supabase project creation, enabling email auth, or frontend/backend project scaffolding. We'll cover the code functionality of the product.
User Journey
Above is the detailed workflow of how an input is processed from the client to the backend to the DB, and how the response then travels back to the client.
Cool, that's a lot of theory; let's get into the tech side of it. We'll start with authentication on the frontend.
Since we're already using Supabase for the DB, we'll use it for auth as well. It's one of the simplest auth setups ever.
Create a reusable Supabase client inside the lib folder; it'll help us avoid recreating the config every time we need a Supabase instance.
import { createClient } from "@supabase/supabase-js";

const supabaseUrl =
  import.meta.env.VITE_SUPABASE_URL || "https://your-url.supabase.co";
const supabaseAnonKey =
  import.meta.env.VITE_SUPABASE_ANON_KEY || "your-anon-key";

export const supabase = createClient(supabaseUrl, supabaseAnonKey);
We'll be using email sign-in, so we'll create only three input fields: Name, Email, and Password, to keep it simple. On receiving the user input, we grab the Supabase instance and call the signUp function.
import { supabase } from "../lib/supabaseClient";

const { data, error } = await supabase!.auth.signUp({
  email,
  password,
  options: {
    data: {
      full_name: name,
    },
  },
});
In response we get two fields: data and error. If signup is successful, we get a data object with the user info; otherwise we get an error object with the error response. Depending on the type of response, we prompt the user to take the respective action.
If signup is successful, Supabase saves the user info in the Authentication table of your Supabase project, along with a flag for whether the entered email is verified. After signup, Supabase sends you a verification link by email; until you verify your email, you can't log in (please use the same browser to verify).
Now let's look at the login flow, assuming you've verified your email.
import { supabase } from "../lib/supabaseClient";

const { error } = await supabase!.auth.signInWithPassword({
  email,
  password,
});

if (error) {
  toast.error(error.message);
} else {
  toast.success("Welcome back!");
}
Yes, that's all. You don't need to do anything else; Supabase handles session management and the rest itself, very gracefully.
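If you ever need to react to session changes on the client (say, to redirect after login or logout), Supabase exposes a listener for that. A minimal sketch:

import { supabase } from "../lib/supabaseClient";

// Fires on sign-in, sign-out, and token refresh.
const {
  data: { subscription },
} = supabase.auth.onAuthStateChange((event, session) => {
  if (event === "SIGNED_IN") {
    console.log("Logged in as", session?.user.email);
  }
  if (event === "SIGNED_OUT") {
    console.log("Logged out");
  }
});

// Call this on unmount to stop listening.
subscription.unsubscribe();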
Frontend UI
The frontend is a simple React application built on top of Vite, with Tailwind styling. We have a landing page and the main chat application, where we show a WhatsApp-like UI to present the chats. It's pretty simple; you can get the code from my repository.
Now, the Backend is where the magic happens.
Another Supabase config: here we use the Supabase service role key in the instance to get admin authorization. Remember, SUPABASE_SERVICE_ROLE_KEY must only be used on the backend.
import { createClient } from "@supabase/supabase-js";

export const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);
The basic idea of the flow is: we take the user input, add a system prompt to it, and pass it to our model. We're processing three kinds of inputs, so let's look at the approach for each one.
Text Input
This is the simplest case: we take the user input and pass it to our backend; the backend calls the Gemini model with the system prompt, gets the answer, and returns it to the client.
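As a rough sketch (assuming an Express app with JSON body parsing, plus the promptEngine and generateWithGemini helpers we'll write later; the route path is illustrative):

// Hypothetical route for plain text queries.
app.post("/api/query/text", async (req, res) => {
  try {
    const { query } = req.body;
    // Wrap the raw query with the system prompt.
    const prompt = promptEngine(query);
    const answer = await generateWithGemini(prompt);
    res.json({ answer });
  } catch (err) {
    console.error("Text query failed:", err);
    res.status(500).json({ error: "Something went wrong, please retry." });
  }
});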
File Input
When we receive an image on the backend, we pass the file to Tesseract.js to perform OCR; it extracts the text written on the image, which we can then simply merge into the main prompt and pass to the model.
If we've received a PDF file, we parse it with the pdf-parse library.
Remember, we need to configure Multer to receive files on the backend, as sketched below.
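A minimal Multer setup, assuming we keep uploads in memory so that file.buffer is available (the route and field names are illustrative):

import multer from "multer";

// Memory storage keeps files as Buffers (file.buffer), which is what
// pdf-parse and Tesseract.js consume below.
const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 10 * 1024 * 1024 }, // e.g. cap uploads at 10 MB
});

// Accept up to 5 files under the "files" field on the query route.
app.post("/api/query", upload.array("files", 5), handleQuery);

With that in place, the handler below can read each file's buffer.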
import Tesseract from "tesseract.js";
import pdfParse from "pdf-parse";

const files = req.files as Express.Multer.File[];
let extractedText = "";

const supportedTypes = [
  "image/png",
  "image/jpeg",
  "image/jpg",
  "application/pdf",
];

const validFiles = files.filter((file) =>
  supportedTypes.includes(file.mimetype)
);

for (const file of validFiles) {
  try {
    if (file.mimetype === "application/pdf") {
      // Extract the raw text from the PDF buffer
      const pdfData = await pdfParse(file.buffer);
      extractedText += `\n[Content from attached PDF]: ${pdfData.text}`;
    } else {
      // Run OCR on the image buffer
      const {
        data: { text },
      } = await Tesseract.recognize(file.buffer, "eng");
      extractedText += `\n[Text from attached Image]: ${text}`;
    }
  } catch (err) {
    console.error("File processing error:", err);
    extractedText += `\n[${file.originalname}]: Failed to extract text.`;
  }
}
We need to save the files in our DB against the respective chat, so let's create a utility function to help us.
// It saves all files to a Supabase storage bucket and gives us the public URL,
// which we can save in the chat message table.
export async function uploadFiles(
  userId: string,
  files: Express.Multer.File[]
): Promise<string[]> {
  const uploadedUrls = await Promise.all(
    files.map(async (file) => {
      const fileExtension = file.mimetype === "application/pdf" ? "pdf" : "jpg";
      const filePath = `chat_uploads/${userId}/${Date.now()}-file.${fileExtension}`;

      const { error } = await supabase.storage
        .from("chat-files")
        .upload(filePath, file.buffer, {
          contentType: file.mimetype,
          upsert: false,
        });

      if (error) {
        throw new Error(`File upload failed: ${error.message}`);
      }

      const { data: publicUrlData } = supabase.storage
        .from("chat-files")
        .getPublicUrl(filePath);

      return publicUrlData.publicUrl;
    })
  );

  return uploadedUrls;
}
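Calling it from the route handler is a one-liner; assuming the validFiles array from earlier:

// Upload the valid files and keep the public URLs for the message record.
const fileUrls = await uploadFiles(user.id, validFiles);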
URL Input
When we receive a URL input, we need to scrape the URL's data. For that we'll use Firecrawl, which gives us the content and metadata of any webpage in an LLM-optimized format.
// Install the "@mendable/firecrawl-js" package first
import FirecrawlApp, { ScrapeResponse } from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });

// Scrape the page as LLM-friendly markdown
const scrapedData = (await firecrawl.scrapeUrl(url, {
  formats: ["markdown"],
})) as ScrapeResponse;

if (scrapedData.error || scrapedData.warning) {
  res.status(500).json("Failed to fetch URL, please try again.");
  return;
}
Semantic Search
Now that we've prepared our inputs in text format, we're ready to pass them to our LLM, but we need context information as well. So we pass some of the previous queries from the chat to the LLM as memory: we pick the messages that are semantically close to our query, along with some recent messages.
To find the semantically close queries, we'll perform a semantic search on this data. First we need to convert our input text to vector embeddings; we'll use Gemini's gemini-embedding-001 model to generate them.
// utils/generateEmbedding.ts
import { GoogleGenerativeAI, TaskType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

export async function generateEmbedding(text: string): Promise<number[]> {
  const model = genAI.getGenerativeModel({ model: "gemini-embedding-001" });
  const result = await model.embedContent({
    content: { role: "user", parts: [{ text }] },
    taskType: TaskType.RETRIEVAL_DOCUMENT,
    title: text.slice(0, 100), // Use the first 100 chars as the title
  });
  return result.embedding.values;
}
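Embedding the incoming query is then a one-liner; the resulting vector is what we hand to the similarity search below:

// Embed the user's query; `vector` is a plain number[] we can pass to Postgres.
const vector = await generateEmbedding(query);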
(Diagrams: semantic search and vector embeddings)
Here's a great article on how Spotify uses semantic search, if you're curious to learn more.
Now that we've generated the text embedding for the received input, we need to run a semantic search between it and the existing embeddings from this chat's history. Below is a Supabase RPC to do this.
-- Supabase RPC to run semantic search on vector embeddings
create or replace function match_chat_messages(
  query_embedding vector,
  chat_id uuid,
  user_id uuid,
  match_count int default 5
)
returns table (
  id uuid,
  message text,
  role text,
  created_at timestamp,
  similarity float
)
language sql stable
as $$
  select
    id,
    message,
    role,
    created_at,
    1 - (vector <=> query_embedding) as similarity
  from chat_messages
  where
    chat_messages.chat_id = match_chat_messages.chat_id
    and chat_messages.user_id = match_chat_messages.user_id
    and created_at >= now() - interval '15 minutes'
  order by query_embedding <=> vector
  limit match_count;
$$;
How It Works
We pass the newly generated vector embedding, with some restrictions to keep the search within the current chat only. In our chat_messages table we store all the vector embeddings; on that existing data we run the similarity search using the <=> operator, which is Postgres pgvector's distance operator for comparing the cosine distance between vectors.
There are more pgvector distance operators:
Euclidean (L2) distance (the <-> operator)
Inner product (the <#> operator, which returns the negative inner product)
L1 distance (the <+> operator)
1 - (vector <=> query_embedding) turns that distance into a similarity score (1 = identical, 0 = no similarity).
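To build intuition for what <=> computes, here's the same cosine math worked out by hand in TypeScript (illustration only; in production pgvector does this inside Postgres):

// Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
// pgvector's <=> returns the cosine *distance*, i.e. 1 minus this value.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0,
    normA = 0,
    normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Same direction => similarity 1 (distance 0); orthogonal => similarity 0 (distance 1).
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0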
We're also restricting the time range to the last 15 minutes, so it returns the closest vectors that were created in that window.
Now that we've got the most similar messages, we also fetch the most recent ones, just to be on the safe side and not miss context. We'll pick the last 5 messages and combine both sets while handling duplicates.
// We call the Supabase RPC we created for semantic search
const { data: similarMessages, error: contextFetchError } =
  await supabase.rpc("match_chat_messages", {
    query_embedding: vector,
    user_id: user.id,
    chat_id: chatId,
    match_count: 5, // Top N similar results
  });

// Recent messages
const { data: recentMessages, error: recentError } = await supabase
  .from("chat_messages")
  .select("*")
  .eq("chat_id", chatId)
  .eq("user_id", user.id)
  .eq("role", "user")
  .order("created_at", { ascending: false })
  .limit(5);

// Filter duplicates
const recentSet = new Set(
  recentMessages?.map((m) => `${m.role}:${m.message}`) || []
);

const combinedContext = [
  ...(recentMessages || []).reverse(),
  ...(similarMessages || []).filter(
    (m: any) =>
      m.role === "user" && !recentSet.has(`${m.role}:${m.message}`)
  ),
];

const context = combinedContext
  .map(
    (msg: any) =>
      `${msg.role}: ${msg.message} ${
        msg.metadata
          ? `and metadata from file or scraped url: ${msg.metadata}`
          : ""
      }`
  )
  .join("\n");
Now we need to build the prompt to pass to the LLM, with the query and some system instructions.
// System instructions
const systemInstruction = (username: string) => `
You're Askie, an AI buddy helping ${username}. Follow these principles:
- Assume the end user is always a 5-year-old kid; explain things in such a manner that a kid can understand.
- Greet warmly on greetings, as someone would do with a kid.
- Don't use jargon; if necessary, explain it afterwards.
- Add stories and examples wherever needed.
- If it's a simple factual question, answer it directly — no fluff, no intros or outros.
- Use emojis only if they help with clarity 🎨
- Be positive and encouraging 😊
- Never make things up. If unsure, just say "Sorry, I don't know."
- If a code snippet is provided, explain what it does in simple terms, focusing on the main functionality. Also provide examples if needed. Make sure you wrap code snippets in triple backticks: start with three backticks, mention the programming language name, put a \n, then the code snippet, and end with three backticks. Keep comments in the code minimal; instead, explain the code in text after the snippet.
- Context input is provided to help you understand the conversation better, but use it only if it adds value to your response. Always prioritize the main query; context should be add-on knowledge. If they don't match, ignore the context.
- If something related to law or the constitution is asked, provide references from the constitution and articles/sections as required.
`;
// Prompt creation function
export function promptEngine(
  query: string,
  extractedText: string = "",
  context: string = "",
  scrapedData: string = "",
  username: string = "Ajeet"
): string {
  const userPrompt = `Help ${username} with this input. Understand what the input is — it could be a question, a code snippet, a document, a legal text, a URL summary, etc. Explain it clearly.`;

  const infoParts = [];
  if (extractedText) {
    infoParts.push(`Here’s some extracted text from a file:\n${extractedText}`);
  }
  if (scrapedData) {
    infoParts.push(`Here’s scraped data from a webpage:\n${scrapedData}`);
  }
  if (context) {
    infoParts.push(`Here’s additional context from earlier:\n${context}`);
  }

  const finalInfo = infoParts.length ? `\n\n${infoParts.join("\n\n")}` : "";

  return `${systemInstruction(
    username
  )}\n\n${userPrompt}\n\nUser Input:\n${query}${finalInfo}`;
}
Now we can call these utility functions in our main API and pass everything to the LLM. Yes, that moment is here. 🫣
const prompt = promptEngine(
  query, // actual text query
  extractedText, // data read from image/pdf (optional)
  context, // context
  scrapedData?.markdown, // scraped data from URL (optional)
  user.user_metadata.full_name // user's name
);

const result = await generateWithGemini(prompt);
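generateWithGemini is a thin wrapper around the same @google/generative-ai SDK we used for embeddings. It isn't shown above, so here's a plausible sketch (the model name is an assumption; swap in whichever Gemini model you use):

// utils/generateWithGemini.ts (hypothetical helper)
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

export async function generateWithGemini(prompt: string): Promise<string> {
  const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
  const result = await model.generateContent(prompt);
  // The SDK exposes the first candidate's text via response.text()
  return result.response.text();
}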
Now we have the response. Add proper error handling and save both the input and the result in the DB. 🎉
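Persisting the turn might look like the sketch below. The chat_messages columns (message, role, vector, metadata) are inferred from the RPC and context code above; we embed each message so future semantic searches can find it:

// Save the user's message and the assistant's reply, each with its embedding.
const { error: saveError } = await supabase.from("chat_messages").insert([
  {
    chat_id: chatId,
    user_id: user.id,
    role: "user",
    message: query,
    vector: await generateEmbedding(query),
    metadata: extractedText || scrapedData?.markdown || null,
  },
  {
    chat_id: chatId,
    user_id: user.id,
    role: "assistant",
    message: result,
    vector: await generateEmbedding(result),
  },
]);

if (saveError) {
  console.error("Failed to save messages:", saveError);
}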
After the backend returns, we receive the response on the client side and render it there.
UI Component
An additional enhancement we can make on the client side is formatting the assistant's response, because it can contain code and semantic elements. To render it properly, we've created this component, which parses the response and builds a well-structured output.
import React, { type JSX } from "react";
import CodeBlock from "./CodeBlock"; // the CodeBlock component shown below

interface LLMResponseRendererProps {
  response: string;
  className?: string;
}

const LLMResponseRenderer: React.FC<LLMResponseRendererProps> = ({
  response,
  className = "",
}) => {
  const parseResponse = (text: string) => {
    const elements: JSX.Element[] = [];
    let currentIndex = 0;
    let elementKey = 0;

    // Regex to match code blocks with optional language
    const codeBlockRegex = /```(\w+)?\n([\s\S]*?)```/g;
    let match;

    while ((match = codeBlockRegex.exec(text)) !== null) {
      const beforeCode = text.slice(currentIndex, match.index);

      // Process text before code block
      if (beforeCode.trim()) {
        elements.push(
          <div key={elementKey++} className="prose">
            {parseTextContent(beforeCode)}
          </div>
        );
      }

      // Add code block
      const language = match[1] || "";
      const code = match[2].trim();
      elements.push(
        <CodeBlock key={elementKey++} code={code} language={language} />
      );

      currentIndex = match.index + match[0].length;
    }

    // Process remaining text after last code block
    const remainingText = text.slice(currentIndex);
    if (remainingText.trim()) {
      elements.push(
        <div key={elementKey++} className="prose">
          {parseTextContent(remainingText)}
        </div>
      );
    }

    return elements;
  };

  const parseTextContent = (text: string) => {
    // Split by double newlines for paragraphs, but preserve single newlines within paragraphs
    const paragraphs = text.split(/\n\s*\n/);
    const elements: JSX.Element[] = [];
    let elementKey = 0;

    paragraphs.forEach((paragraph, paragraphIndex) => {
      const lines = paragraph.split("\n");

      lines.forEach((line) => {
        const trimmedLine = line.trim();
        if (trimmedLine === "") {
          // Skip empty lines within paragraphs
          return;
        }

        // Check for headings
        if (trimmedLine.startsWith("### ")) {
          elements.push(
            <h3
              key={elementKey++}
              className="text-lg font-bold mt-4 mb-2 text-gray-800"
            >
              {parseTextWithInlineCode(trimmedLine.substring(4))}
            </h3>
          );
        } else if (trimmedLine.startsWith("## ")) {
          elements.push(
            <h2
              key={elementKey++}
              className="text-xl font-bold mt-6 mb-3 text-gray-800"
            >
              {parseTextWithInlineCode(trimmedLine.substring(3))}
            </h2>
          );
        } else if (trimmedLine.startsWith("# ")) {
          elements.push(
            <h1
              key={elementKey++}
              className="text-2xl font-bold mt-6 mb-4 text-gray-800"
            >
              {parseTextWithInlineCode(trimmedLine.substring(2))}
            </h1>
          );
        } else if (trimmedLine.match(/^\d+\.\s/)) {
          // Handle numbered lists
          elements.push(
            <p
              key={elementKey++}
              className="mb-2 text-gray-700 leading-relaxed ml-4"
            >
              {parseTextWithInlineCode(trimmedLine)}
            </p>
          );
        } else {
          // Regular text - preserve line breaks within paragraphs
          elements.push(
            <p
              key={elementKey++}
              className="mb-3 text-gray-700 leading-relaxed"
            >
              {parseTextWithInlineCode(line)}
            </p>
          );
        }
      });

      // Add spacing between paragraphs (except for the last one)
      if (paragraphIndex < paragraphs.length - 1) {
        elements.push(<div key={elementKey++} className="mb-4" />);
      }
    });

    return elements;
  };

  const parseTextWithInlineCode = (text: string) => {
    // Handle inline code first, then bold text
    const codeRegex = /(`[^`]+`)/g;
    const parts = text.split(codeRegex);

    return parts.map((part, index) => {
      if (part.startsWith("`") && part.endsWith("`")) {
        // Inline code
        return (
          <code
            key={index}
            className="px-1.5 py-0.5 bg-gray-100 text-red-600 rounded text-sm font-mono border"
          >
            {part.slice(1, -1)}
          </code>
        );
      } else {
        // Handle bold text in non-code parts
        return parseBoldText(part, index);
      }
    });
  };

  const parseBoldText = (text: string, baseKey: number = 0) => {
    const parts = text.split(/(\*\*.*?\*\*)/g);
    return parts.map((part, index) => {
      if (part.startsWith("**") && part.endsWith("**")) {
        return (
          <strong key={`${baseKey}-${index}`} className="font-semibold">
            {part.slice(2, -2)}
          </strong>
        );
      }
      return part;
    });
  };

  return (
    <div className={`max-w-none ${className}`}>{parseResponse(response)}</div>
  );
};

export default LLMResponseRenderer;
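Using it from the chat message list is then a single line (the message.content field here is illustrative):

<LLMResponseRenderer response={message.content} className="mt-2" />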
Code Blocks UI Component
import React from "react";
import { Copy, Check } from "lucide-react";

interface CodeBlockProps {
  code: string;
  language: string;
}

const CodeBlock: React.FC<CodeBlockProps> = ({ code, language }) => {
  const [copied, setCopied] = React.useState(false);

  const handleCopy = async () => {
    await navigator.clipboard.writeText(code);
    setCopied(true);
    setTimeout(() => setCopied(false), 2000);
  };

  const highlightCode = (code: string) => {
    const lines = code.split("\n");
    return lines.map((line, lineIndex) => {
      const tokens = tokenizeLine(line);
      return (
        <div key={lineIndex} className="leading-relaxed">
          {tokens.map((token, tokenIndex) => (
            <span key={tokenIndex} className={getTokenClass(token.type)}>
              {token.value}
            </span>
          ))}
        </div>
      );
    });
  };

  const tokenizeLine = (line: string) => {
    const tokens: Array<{
      type: string;
      value: string;
      start: number;
      end: number;
    }> = [];

    // Define regex patterns in order of precedence
    const patterns = [
      { type: "comment", regex: /\/\/.*$|\/\*[\s\S]*?\*\//g },
      { type: "string", regex: /(['"`])((?:(?!\1)[^\\]|\\.)*)(\1)/g },
      { type: "inlineCode", regex: /`[^`]+`/g },
      {
        type: "keyword",
        regex:
          /\b(const|let|var|function|class|interface|type|import|export|from|if|else|for|while|return|async|await|try|catch|finally|throw|new|this|super|extends|implements|public|private|protected|static|readonly)\b/g,
      },
      { type: "number", regex: /\b\d+\.?\d*\b/g },
      { type: "operator", regex: /[+\-*/%=<>!&|^~?:;,.]/g },
      { type: "bracket", regex: /[(){}[\]]/g },
    ];

    // Find all matches with their positions
    const allMatches: Array<{
      type: string;
      value: string;
      start: number;
      end: number;
    }> = [];

    patterns.forEach((pattern) => {
      const regex = new RegExp(pattern.regex.source, pattern.regex.flags);
      let match;
      while ((match = regex.exec(line)) !== null) {
        allMatches.push({
          type: pattern.type,
          value: match[0],
          start: match.index,
          end: match.index + match[0].length,
        });
      }
    });

    // Sort matches by position
    allMatches.sort((a, b) => a.start - b.start);

    // Remove overlapping matches (keep the first one)
    const nonOverlappingMatches = [];
    let lastEnd = 0;
    for (const match of allMatches) {
      if (match.start >= lastEnd) {
        nonOverlappingMatches.push(match);
        lastEnd = match.end;
      }
    }

    // Build tokens from non-overlapping matches
    let currentIndex = 0;
    nonOverlappingMatches.forEach((match) => {
      // Add text before match
      if (match.start > currentIndex) {
        const textValue = line.slice(currentIndex, match.start);
        if (textValue) {
          tokens.push({
            type: "text",
            value: textValue,
            start: currentIndex,
            end: match.start,
          });
        }
      }
      // Add the matched token
      tokens.push(match);
      currentIndex = match.end;
    });

    // Add remaining text
    if (currentIndex < line.length) {
      const remainingText = line.slice(currentIndex);
      if (remainingText) {
        tokens.push({
          type: "text",
          value: remainingText,
          start: currentIndex,
          end: line.length,
        });
      }
    }

    return tokens;
  };

  const getTokenClass = (type: string) => {
    switch (type) {
      case "comment":
        return "text-green-400 italic";
      case "string":
        return "text-blue-400";
      case "inlineCode":
        return "text-yellow-300 bg-gray-800 px-1 rounded";
      case "keyword":
        return "text-purple-400 font-medium";
      case "number":
        return "text-orange-400";
      case "operator":
        return "text-gray-400";
      case "bracket":
        return "text-gray-300 font-medium";
      default:
        return "text-gray-200";
    }
  };

  return (
    <div className="my-4 rounded-lg border border-gray-300 bg-gray-900 overflow-hidden shadow-sm">
      <div className="flex items-center justify-between px-4 py-2 bg-gray-800 border-b border-gray-700">
        <span className="text-sm font-medium text-gray-300 capitalize">
          {language || "code"}
        </span>
        <button
          onClick={handleCopy}
          className="flex items-center gap-1 px-2 py-1 text-sm text-gray-400 hover:text-white hover:bg-gray-700 rounded transition-colors"
        >
          {copied ? <Check size={14} /> : <Copy size={14} />}
          {copied ? "Copied!" : "Copy"}
        </button>
      </div>
      <pre className="p-4 overflow-x-auto bg-gray-900">
        <code className="text-sm font-mono text-gray-200 whitespace-pre block">
          {highlightCode(code)}
        </code>
      </pre>
    </div>
  );
};

export default CodeBlock;
Here's a UI sneak peek of what this custom component gives us after parsing the LLM response.
The rest is the basic stuff: setting up the routes, APIs for chat CRUD, message components, and context setup for the client application, which you can build as per your UI choices.
Thank you for reading!!
Love you all <3000 ❤️
Future Enhancements
Introduce a team workspace, where multiple people can join a chat and ask/read together to brainstorm ideas or solve problems.
Image response
Video Input/Output
Chat Sharing