Building Askie, your friendly helper that explains things in layman's terms


Yes, you heard it right.
We are going to build an AI assistant that will answer all kinds of queries in layman's terms.
Got a text query? We've got you covered.
Got an image to analyze? We've got you covered.
Got a PDF to summarize? We've got you covered.
Got a URL to extract data from? We've got you covered. 😉
Motivation behind the project
The core reason behind this project was our internal hackathon, where we as a team competed individually to build something unique across different themes. I picked this theme because it had a lot of new things I could learn along the way, such as vector DBs, text embeddings, and semantic search.
What are we building
We'll be building a platform, or rather an assistant, that can take any kind of input and answer your queries about it. And the most important thing: it will have Context Memory. Yes, you read it right; context awareness is the most important aspect of this assistant.
Tech Stack
Frontend:
React with TypeScript
Tailwind for styling
shadcn/ui for UI elements
Supabase for authentication
Backend:
Node.js runtime with Express and TypeScript
Gemini Models for querying and vector embeddings
Supabase for DB
Firecrawl for scraping URLs
Tesseract.js for OCR
pdf-parse library for reading PDFs
Prerequisites
We won't be covering setup-related things like Supabase project creation, enabling email auth, or frontend/backend project scaffolding. We'll cover the code functionality of the product.
User Journey
Above is the detailed workflow of how an input is processed from the client to the backend to the DB, and how the response then travels back to the client.
Cool, that's a lot of theory; let's get into the tech side of it. We'll start with authentication on the frontend.
Since we're already using Supabase for the DB, we'll use it for auth as well. It's one of the simplest auth setups ever.
Create a reusable Supabase client inside the lib folder; it'll help us avoid recreating the config every time we need a Supabase instance.
import { createClient } from "@supabase/supabase-js";

const supabaseUrl =
  import.meta.env.VITE_SUPABASE_URL || "https://your-url.supabase.co";
const supabaseAnonKey =
  import.meta.env.VITE_SUPABASE_ANON_KEY || "your-anon-key";

export const supabase = createClient(supabaseUrl, supabaseAnonKey);
We'll be using email sign-in, so we'll create only three input fields: Name, Email, and Password, to keep it simple. On receiving the user input, we grab the Supabase instance and call the signUp function.
import { supabase } from "../lib/supabaseClient";

const { data, error } = await supabase!.auth.signUp({
  email,
  password,
  options: {
    data: {
      full_name: name,
    },
  },
});
In response we get two fields: data and error. If signup is successful, we get a data object with the user info; otherwise we get an error object with the error response. Depending on the type of response, we prompt the user to take the respective action.
If signup is successful, Supabase saves the user info in the Authentication table of your Supabase project, along with a flag for whether the entered email is verified. After signup, Supabase sends you a verification link by email; until you verify your email, you can't log in (please use the same browser to verify).
Now let's look at the login flow, assuming you've verified your email.
import { supabase } from "../lib/supabaseClient";

const { error } = await supabase!.auth.signInWithPassword({
  email,
  password,
});

if (error) {
  toast.error(error.message);
} else {
  toast.success("Welcome back!");
}
Yes, that's all. You don't need to do anything else; Supabase handles session management and the rest itself, very gracefully.
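If you ever need to react to session changes on the client (say, to redirect after login or logout), Supabase exposes a listener for that. A minimal sketch:

import { supabase } from "../lib/supabaseClient";

// Fires on sign-in, sign-out, and token refresh.
const {
  data: { subscription },
} = supabase.auth.onAuthStateChange((event, session) => {
  if (event === "SIGNED_IN") {
    console.log("Logged in as", session?.user.email);
  }
  if (event === "SIGNED_OUT") {
    console.log("Logged out");
  }
});

// Call this on unmount to stop listening.
subscription.unsubscribe();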
Frontend UI
The frontend is a simple React application built on top of Vite, with Tailwind styling. We have a landing page and the main chat application, where we show a WhatsApp-like UI to present the chats. It's pretty simple; you can get the code from my repository.
Now, the Backend is where the magic happens.
Another Supabase config: here we use the Supabase service role key in the instance to get admin authorization. Remember, SUPABASE_SERVICE_ROLE_KEY must only be used on the backend.
import { createClient } from "@supabase/supabase-js";

export const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);
The basic idea of the flow is: we take the user input, add a system prompt to it, and pass it to our model. We're processing three kinds of inputs, so let's look at the approach for each one.
Text Input
This is the simplest case: we take the user input and pass it to our backend; the backend calls the Gemini model with the system prompt, gets the answer, and returns it to the client.
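As a rough sketch (assuming an Express app with JSON body parsing, plus the promptEngine and generateWithGemini helpers we'll write later; the route path is illustrative):

// Hypothetical route for plain text queries.
app.post("/api/query/text", async (req, res) => {
  try {
    const { query } = req.body;
    // Wrap the raw query with the system prompt.
    const prompt = promptEngine(query);
    const answer = await generateWithGemini(prompt);
    res.json({ answer });
  } catch (err) {
    console.error("Text query failed:", err);
    res.status(500).json({ error: "Something went wrong, please retry." });
  }
});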
File Input
When we receive an image on the backend, we pass the file to Tesseract.js to perform OCR; it extracts the text written on the image, which we can then simply merge into the main prompt and pass to the model.
If we've received a PDF file, we parse it with the pdf-parse library.
Remember, we need to configure Multer to receive files on the backend, as sketched below.
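A minimal Multer setup, assuming we keep uploads in memory so that file.buffer is available (the route and field names are illustrative):

import multer from "multer";

// Memory storage keeps files as Buffers (file.buffer), which is what
// pdf-parse and Tesseract.js consume below.
const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 10 * 1024 * 1024 }, // e.g. cap uploads at 10 MB
});

// Accept up to 5 files under the "files" field on the query route.
app.post("/api/query", upload.array("files", 5), handleQuery);

With that in place, the handler below can read each file's buffer.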
import Tesseract from "tesseract.js";
import pdfParse from "pdf-parse";

const files = req.files as Express.Multer.File[];
let extractedText = "";

const supportedTypes = [
  "image/png",
  "image/jpeg",
  "image/jpg",
  "application/pdf",
];

const validFiles = files.filter((file) =>
  supportedTypes.includes(file.mimetype)
);

for (const file of validFiles) {
  try {
    if (file.mimetype === "application/pdf") {
      // Extract the raw text from the PDF buffer
      const pdfData = await pdfParse(file.buffer);
      extractedText += `\n[Content from attached PDF]: ${pdfData.text}`;
    } else {
      // Run OCR on the image buffer
      const {
        data: { text },
      } = await Tesseract.recognize(file.buffer, "eng");
      extractedText += `\n[Text from attached Image]: ${text}`;
    }
  } catch (err) {
    console.error("File processing error:", err);
    extractedText += `\n[${file.originalname}]: Failed to extract text.`;
  }
}
We need to save the files in our DB against the respective chat, so let's create a utility function to help us.
// It saves all files to a Supabase storage bucket and gives us the public URL,
// which we can save in the chat message table.
export async function uploadFiles(
  userId: string,
  files: Express.Multer.File[]
): Promise<string[]> {
  const uploadedUrls = await Promise.all(
    files.map(async (file) => {
      const fileExtension = file.mimetype === "application/pdf" ? "pdf" : "jpg";
      const filePath = `chat_uploads/${userId}/${Date.now()}-file.${fileExtension}`;

      const { error } = await supabase.storage
        .from("chat-files")
        .upload(filePath, file.buffer, {
          contentType: file.mimetype,
          upsert: false,
        });

      if (error) {
        throw new Error(`File upload failed: ${error.message}`);
      }

      const { data: publicUrlData } = supabase.storage
        .from("chat-files")
        .getPublicUrl(filePath);

      return publicUrlData.publicUrl;
    })
  );

  return uploadedUrls;
}
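Calling it from the route handler is a one-liner; assuming the validFiles array from earlier:

// Upload the valid files and keep the public URLs for the message record.
const fileUrls = await uploadFiles(user.id, validFiles);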
URL Input
When we receive a URL input, we need to scrape the URL's data. For that we'll use Firecrawl, which gives us the content and metadata of any webpage in an LLM-optimized format.
// Install the "@mendable/firecrawl-js" package first
import FirecrawlApp, { ScrapeResponse } from "@mendable/firecrawl-js";

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });

// Scrape the page as LLM-friendly markdown
const scrapedData = (await firecrawl.scrapeUrl(url, {
  formats: ["markdown"],
})) as ScrapeResponse;

if (scrapedData.error || scrapedData.warning) {
  res.status(500).json("Failed to fetch URL, please try again.");
  return;
}
Semantic Search
Now that we've prepared our inputs in text format, we're ready to pass them to our LLM, but we need context information as well. So we pass some of the previous queries from the chat to the LLM as memory: we pick the messages that are semantically close to our query, along with some recent messages.
To find the semantically close queries, we'll perform a semantic search on this data. First we need to convert our input text to vector embeddings; we'll use Gemini's gemini-embedding-001 model to generate them.
// utils/generateEmbedding.ts
import { GoogleGenerativeAI, TaskType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

export async function generateEmbedding(text: string): Promise<number[]> {
  const model = genAI.getGenerativeModel({ model: "gemini-embedding-001" });
  const result = await model.embedContent({
    content: { role: "user", parts: [{ text }] },
    taskType: TaskType.RETRIEVAL_DOCUMENT,
    title: text.slice(0, 100), // Use the first 100 chars as the title
  });
  return result.embedding.values;
}
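Embedding the incoming query is then a one-liner; the resulting vector is what we hand to the similarity search below:

// Embed the user's query; `vector` is a plain number[] we can pass to Postgres.
const vector = await generateEmbedding(query);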
(Diagrams: semantic search and vector embeddings)
Here's a great article on how Spotify uses semantic search, if you're curious to learn more.
Now that we've generated the text embedding for the received input, we need to run a semantic search between it and the existing embeddings from this chat's history. Below is a Supabase RPC to do this.
-- Supabase RPC to run semantic search on vector embeddings
create or replace function match_chat_messages(
  query_embedding vector,
  chat_id uuid,
  user_id uuid,
  match_count int default 5
)
returns table (
  id uuid,
  message text,
  role text,
  created_at timestamp,
  similarity float
)
language sql stable
as $$
  select
    id,
    message,
    role,
    created_at,
    1 - (vector <=> query_embedding) as similarity
  from chat_messages
  where
    chat_messages.chat_id = match_chat_messages.chat_id
    and chat_messages.user_id = match_chat_messages.user_id
    and created_at >= now() - interval '15 minutes'
  order by query_embedding <=> vector
  limit match_count;
$$;
How It Works
We pass the newly generated vector embedding, with some restrictions to keep the search within the current chat only. In our chat_messages table we store all the vector embeddings; on that existing data we run the similarity search using the <=> operator, which is Postgres pgvector's distance operator for comparing the cosine distance between vectors.
There are more pgvector distance operators:
Euclidean (L2) distance (the <-> operator)
Inner product (the <#> operator, which returns the negative inner product)
L1 distance (the <+> operator)
1 - (vector <=> query_embedding) turns that distance into a similarity score (1 = identical, 0 = no similarity).
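To build intuition for what <=> computes, here's the same cosine math worked out by hand in TypeScript (illustration only; in production pgvector does this inside Postgres):

// Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
// pgvector's <=> returns the cosine *distance*, i.e. 1 minus this value.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0,
    normA = 0,
    normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Same direction => similarity 1 (distance 0); orthogonal => similarity 0 (distance 1).
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0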
We're also restricting the time range to the last 15 minutes, so it returns the closest vectors that were created in that window.
Now that we've got the most similar messages, we also fetch the most recent ones, just to be on the safe side and not miss context. We'll pick the last 5 messages and combine both sets while handling duplicates.
// We call the Supabase RPC we created for semantic search
const { data: similarMessages, error: contextFetchError } =
  await supabase.rpc("match_chat_messages", {
    query_embedding: vector,
    user_id: user.id,
    chat_id: chatId,
    match_count: 5, // Top N similar results
  });

// Recent messages
const { data: recentMessages, error: recentError } = await supabase
  .from("chat_messages")
  .select("*")
  .eq("chat_id", chatId)
  .eq("user_id", user.id)
  .eq("role", "user")
  .order("created_at", { ascending: false })
  .limit(5);

// Filter duplicates
const recentSet = new Set(
  recentMessages?.map((m) => `${m.role}:${m.message}`) || []
);

const combinedContext = [
  ...(recentMessages || []).reverse(),
  ...(similarMessages || []).filter(
    (m: any) =>
      m.role === "user" && !recentSet.has(`${m.role}:${m.message}`)
  ),
];

const context = combinedContext
  .map(
    (msg: any) =>
      `${msg.role}: ${msg.message} ${
        msg.metadata
          ? `and metadata from file or scraped url: ${msg.metadata}`
          : ""
      }`
  )
  .join("\n");
Now we need to build the prompt to pass to the LLM, with the query and some system instructions.
// System instructions
const systemInstruction = (username: string) => `
You're Askie, an AI buddy helping ${username}. Follow these principles:
- Assume the end user is always a 5-year-old kid; explain things in such a manner that a kid can understand.
- Greet warmly on greetings, as someone would do with a kid.
- Don't use jargon; if necessary, explain it afterwards.
- Add stories and examples wherever needed.
- If it's a simple factual question, answer it directly — no fluff, no intros or outros.
- Use emojis only if they help with clarity 🎨
- Be positive and encouraging 😊
- Never make things up. If unsure, just say "Sorry, I don't know."
- If a code snippet is provided, explain what it does in simple terms, focusing on the main functionality. Also provide examples if needed. Make sure you wrap code snippets in triple backticks: start with three backticks, mention the programming language name, put a \n, then the code snippet, and end with three backticks. Keep comments in the code minimal; instead, explain the code in text after the snippet.
- Context input is provided to help you understand the conversation better, but use it only if it adds value to your response. Always prioritize the main query; context should be add-on knowledge. If they don't match, ignore the context.
- If something related to law or the constitution is asked, provide references from the constitution and articles/sections as required.
`;
// Prompt creation function
export function promptEngine(
  query: string,
  extractedText: string = "",
  context: string = "",
  scrapedData: string = "",
  username: string = "Ajeet"
): string {
  const userPrompt = `Help ${username} with this input. Understand what the input is — it could be a question, a code snippet, a document, a legal text, a URL summary, etc. Explain it clearly.`;

  const infoParts = [];
  if (extractedText) {
    infoParts.push(`Here’s some extracted text from a file:\n${extractedText}`);
  }
  if (scrapedData) {
    infoParts.push(`Here’s scraped data from a webpage:\n${scrapedData}`);
  }
  if (context) {
    infoParts.push(`Here’s additional context from earlier:\n${context}`);
  }

  const finalInfo = infoParts.length ? `\n\n${infoParts.join("\n\n")}` : "";

  return `${systemInstruction(
    username
  )}\n\n${userPrompt}\n\nUser Input:\n${query}${finalInfo}`;
}
Now we can call these utility functions in our main API and pass everything to the LLM. Yes, that moment is here. 🫣
const prompt = promptEngine(
  query, // actual text query
  extractedText, // data read from image/pdf (optional)
  context, // context
  scrapedData?.markdown, // scraped data from URL (optional)
  user.user_metadata.full_name // user's name
);

const result = await generateWithGemini(prompt);
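generateWithGemini is a thin wrapper around the same @google/generative-ai SDK we used for embeddings. It isn't shown above, so here's a plausible sketch (the model name is an assumption; swap in whichever Gemini model you use):

// utils/generateWithGemini.ts (hypothetical helper)
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

export async function generateWithGemini(prompt: string): Promise<string> {
  const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });
  const result = await model.generateContent(prompt);
  // The SDK exposes the first candidate's text via response.text()
  return result.response.text();
}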
Now we have the response. Add proper error handling and save both the input and the result in the DB. 🎉
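Persisting the turn might look like the sketch below. The chat_messages columns (message, role, vector, metadata) are inferred from the RPC and context code above; we embed each message so future semantic searches can find it:

// Save the user's message and the assistant's reply, each with its embedding.
const { error: saveError } = await supabase.from("chat_messages").insert([
  {
    chat_id: chatId,
    user_id: user.id,
    role: "user",
    message: query,
    vector: await generateEmbedding(query),
    metadata: extractedText || scrapedData?.markdown || null,
  },
  {
    chat_id: chatId,
    user_id: user.id,
    role: "assistant",
    message: result,
    vector: await generateEmbedding(result),
  },
]);

if (saveError) {
  console.error("Failed to save messages:", saveError);
}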
After the backend returns, we receive the response on the client side and render it there.
UI Component
An additional enhancement we can make on the client side is formatting the assistant's response, because it can contain code and semantic elements. To render it properly, we've created this component, which parses the response and builds a well-structured output.
import React, { type JSX } from "react";
import CodeBlock from "./CodeBlock"; // the CodeBlock component shown below

interface LLMResponseRendererProps {
  response: string;
  className?: string;
}

const LLMResponseRenderer: React.FC<LLMResponseRendererProps> = ({
  response,
  className = "",
}) => {
  const parseResponse = (text: string) => {
    const elements: JSX.Element[] = [];
    let currentIndex = 0;
    let elementKey = 0;

    // Regex to match code blocks with optional language
    const codeBlockRegex = /```(\w+)?\n([\s\S]*?)```/g;
    let match;

    while ((match = codeBlockRegex.exec(text)) !== null) {
      const beforeCode = text.slice(currentIndex, match.index);

      // Process text before code block
      if (beforeCode.trim()) {
        elements.push(
          <div key={elementKey++} className="prose">
            {parseTextContent(beforeCode)}
          </div>
        );
      }

      // Add code block
      const language = match[1] || "";
      const code = match[2].trim();
      elements.push(
        <CodeBlock key={elementKey++} code={code} language={language} />
      );

      currentIndex = match.index + match[0].length;
    }

    // Process remaining text after last code block
    const remainingText = text.slice(currentIndex);
    if (remainingText.trim()) {
      elements.push(
        <div key={elementKey++} className="prose">
          {parseTextContent(remainingText)}
        </div>
      );
    }

    return elements;
  };

  const parseTextContent = (text: string) => {
    // Split by double newlines for paragraphs, but preserve single newlines within paragraphs
    const paragraphs = text.split(/\n\s*\n/);
    const elements: JSX.Element[] = [];
    let elementKey = 0;

    paragraphs.forEach((paragraph, paragraphIndex) => {
      const lines = paragraph.split("\n");

      lines.forEach((line) => {
        const trimmedLine = line.trim();
        if (trimmedLine === "") {
          // Skip empty lines within paragraphs
          return;
        }

        // Check for headings
        if (trimmedLine.startsWith("### ")) {
          elements.push(
            <h3
              key={elementKey++}
              className="text-lg font-bold mt-4 mb-2 text-gray-800"
            >
              {parseTextWithInlineCode(trimmedLine.substring(4))}
            </h3>
          );
        } else if (trimmedLine.startsWith("## ")) {
          elements.push(
            <h2
              key={elementKey++}
              className="text-xl font-bold mt-6 mb-3 text-gray-800"
            >
              {parseTextWithInlineCode(trimmedLine.substring(3))}
            </h2>
          );
        } else if (trimmedLine.startsWith("# ")) {
          elements.push(
            <h1
              key={elementKey++}
              className="text-2xl font-bold mt-6 mb-4 text-gray-800"
            >
              {parseTextWithInlineCode(trimmedLine.substring(2))}
            </h1>
          );
        } else if (trimmedLine.match(/^\d+\.\s/)) {
          // Handle numbered lists
          elements.push(
            <p
              key={elementKey++}
              className="mb-2 text-gray-700 leading-relaxed ml-4"
            >
              {parseTextWithInlineCode(trimmedLine)}
            </p>
          );
        } else {
          // Regular text - preserve line breaks within paragraphs
          elements.push(
            <p
              key={elementKey++}
              className="mb-3 text-gray-700 leading-relaxed"
            >
              {parseTextWithInlineCode(line)}
            </p>
          );
        }
      });

      // Add spacing between paragraphs (except for the last one)
      if (paragraphIndex < paragraphs.length - 1) {
        elements.push(<div key={elementKey++} className="mb-4" />);
      }
    });

    return elements;
  };

  const parseTextWithInlineCode = (text: string) => {
    // Handle inline code first, then bold text
    const codeRegex = /(`[^`]+`)/g;
    const parts = text.split(codeRegex);

    return parts.map((part, index) => {
      if (part.startsWith("`") && part.endsWith("`")) {
        // Inline code
        return (
          <code
            key={index}
            className="px-1.5 py-0.5 bg-gray-100 text-red-600 rounded text-sm font-mono border"
          >
            {part.slice(1, -1)}
          </code>
        );
      } else {
        // Handle bold text in non-code parts
        return parseBoldText(part, index);
      }
    });
  };

  const parseBoldText = (text: string, baseKey: number = 0) => {
    const parts = text.split(/(\*\*.*?\*\*)/g);
    return parts.map((part, index) => {
      if (part.startsWith("**") && part.endsWith("**")) {
        return (
          <strong key={`${baseKey}-${index}`} className="font-semibold">
            {part.slice(2, -2)}
          </strong>
        );
      }
      return part;
    });
  };

  return (
    <div className={`max-w-none ${className}`}>{parseResponse(response)}</div>
  );
};

export default LLMResponseRenderer;
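Using it from the chat message list is then a single line (the message.content field here is illustrative):

<LLMResponseRenderer response={message.content} className="mt-2" />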
Code Blocks UI Component
import React from "react";
import { Copy, Check } from "lucide-react";

interface CodeBlockProps {
  code: string;
  language: string;
}

const CodeBlock: React.FC<CodeBlockProps> = ({ code, language }) => {
  const [copied, setCopied] = React.useState(false);

  const handleCopy = async () => {
    await navigator.clipboard.writeText(code);
    setCopied(true);
    setTimeout(() => setCopied(false), 2000);
  };

  const highlightCode = (code: string) => {
    const lines = code.split("\n");
    return lines.map((line, lineIndex) => {
      const tokens = tokenizeLine(line);
      return (
        <div key={lineIndex} className="leading-relaxed">
          {tokens.map((token, tokenIndex) => (
            <span key={tokenIndex} className={getTokenClass(token.type)}>
              {token.value}
            </span>
          ))}
        </div>
      );
    });
  };

  const tokenizeLine = (line: string) => {
    const tokens: Array<{
      type: string;
      value: string;
      start: number;
      end: number;
    }> = [];

    // Define regex patterns in order of precedence
    const patterns = [
      { type: "comment", regex: /\/\/.*$|\/\*[\s\S]*?\*\//g },
      { type: "string", regex: /(['"`])((?:(?!\1)[^\\]|\\.)*)(\1)/g },
      { type: "inlineCode", regex: /`[^`]+`/g },
      {
        type: "keyword",
        regex:
          /\b(const|let|var|function|class|interface|type|import|export|from|if|else|for|while|return|async|await|try|catch|finally|throw|new|this|super|extends|implements|public|private|protected|static|readonly)\b/g,
      },
      { type: "number", regex: /\b\d+\.?\d*\b/g },
      { type: "operator", regex: /[+\-*/%=<>!&|^~?:;,.]/g },
      { type: "bracket", regex: /[(){}[\]]/g },
    ];

    // Find all matches with their positions
    const allMatches: Array<{
      type: string;
      value: string;
      start: number;
      end: number;
    }> = [];

    patterns.forEach((pattern) => {
      const regex = new RegExp(pattern.regex.source, pattern.regex.flags);
      let match;
      while ((match = regex.exec(line)) !== null) {
        allMatches.push({
          type: pattern.type,
          value: match[0],
          start: match.index,
          end: match.index + match[0].length,
        });
      }
    });

    // Sort matches by position
    allMatches.sort((a, b) => a.start - b.start);

    // Remove overlapping matches (keep the first one)
    const nonOverlappingMatches = [];
    let lastEnd = 0;
    for (const match of allMatches) {
      if (match.start >= lastEnd) {
        nonOverlappingMatches.push(match);
        lastEnd = match.end;
      }
    }

    // Build tokens from non-overlapping matches
    let currentIndex = 0;
    nonOverlappingMatches.forEach((match) => {
      // Add text before match
      if (match.start > currentIndex) {
        const textValue = line.slice(currentIndex, match.start);
        if (textValue) {
          tokens.push({
            type: "text",
            value: textValue,
            start: currentIndex,
            end: match.start,
          });
        }
      }
      // Add the matched token
      tokens.push(match);
      currentIndex = match.end;
    });

    // Add remaining text
    if (currentIndex < line.length) {
      const remainingText = line.slice(currentIndex);
      if (remainingText) {
        tokens.push({
          type: "text",
          value: remainingText,
          start: currentIndex,
          end: line.length,
        });
      }
    }

    return tokens;
  };

  const getTokenClass = (type: string) => {
    switch (type) {
      case "comment":
        return "text-green-400 italic";
      case "string":
        return "text-blue-400";
      case "inlineCode":
        return "text-yellow-300 bg-gray-800 px-1 rounded";
      case "keyword":
        return "text-purple-400 font-medium";
      case "number":
        return "text-orange-400";
      case "operator":
        return "text-gray-400";
      case "bracket":
        return "text-gray-300 font-medium";
      default:
        return "text-gray-200";
    }
  };

  return (
    <div className="my-4 rounded-lg border border-gray-300 bg-gray-900 overflow-hidden shadow-sm">
      <div className="flex items-center justify-between px-4 py-2 bg-gray-800 border-b border-gray-700">
        <span className="text-sm font-medium text-gray-300 capitalize">
          {language || "code"}
        </span>
        <button
          onClick={handleCopy}
          className="flex items-center gap-1 px-2 py-1 text-sm text-gray-400 hover:text-white hover:bg-gray-700 rounded transition-colors"
        >
          {copied ? <Check size={14} /> : <Copy size={14} />}
          {copied ? "Copied!" : "Copy"}
        </button>
      </div>
      <pre className="p-4 overflow-x-auto bg-gray-900">
        <code className="text-sm font-mono text-gray-200 whitespace-pre block">
          {highlightCode(code)}
        </code>
      </pre>
    </div>
  );
};

export default CodeBlock;
Here's a UI sneak peek of what this custom component gives us after parsing the LLM response.
The rest is the basic stuff: setting up the routes, APIs for chat CRUD, message components, and context setup for the client application, which you can build as per your UI choices.
Thank you for reading!!
Love you all <3000 ❤️
Future Enhancements
Introduce a team workspace, where multiple people can join a chat and ask/read together to brainstorm ideas or solve problems.
Image response
Video Input/Output
Chat Sharing