Zero‑to‑Hero: Build a RAG App with Node.js + Express + Gemini + Postgres (pgvector)

What we’re building

A small REST API that can answer questions about your own local docs by using RAG (Retrieval‑Augmented Generation).

Flow:

/docs/*.md|.txt|.html  --(ingest)--> Postgres (text + embeddings)
                                     ^
                              Gemini embeddings (768‑d)

POST /query  -> embed query -> vector search (pgvector) -> top K chunks ->
                                                Gemini answers only from context

Why this is cool: you can get the model to answer questions about private facts it has never seen, because you feed it those facts at query time.
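In miniature, the query-time flow is just four steps. Here's a sketch using the helpers we'll write below (askGemini is a hypothetical stand-in for the Gemini call we build in src/server.js):

// RAG in four lines (sketch; embedQuery and searchByEmbedding come later)
const qVec = await embedQuery(question);                 // 1. embed the question
const chunks = await searchByEmbedding(qVec, 6);         // 2. vector-search for similar chunks
const context = chunks.map(c => c.content).join("\n\n"); // 3. assemble the context
const answer = await askGemini(context, question);       // 4. answer only from that context (hypothetical helper)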


Prerequisites

  • Node.js 18+

  • PostgreSQL locally (or remote) and the pgvector extension installed

  • A Gemini API key (from Google AI Studio)

New to Postgres? You can use pgAdmin or psql. Make sure you can connect and run SQL in the target database.
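For example, with psql (substitute your own connection string):

psql "postgres://USER:PASS@localhost:5432/ragdemo" -c "SELECT version();"

If that prints a PostgreSQL version string, you're ready.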


0) Make an empty project

mkdir rag-local-gemini && cd rag-local-gemini
npm init -y

Install dependencies:

npm i express @google/genai pg dotenv html-to-text glob
npm i -D nodemon

package.json (add scripts + use ESM):

{
  "name": "rag-local-gemini",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "dev": "nodemon src/server.js",
    "ingest": "node src/ingest.js",
    "start": "node src/server.js"
  }
}

Create folders:

mkdir -p src docs

Create .env:

PORT=3000
DATABASE_URL=postgres://USER:PASS@localhost:5432/ragdemo
GEMINI_API_KEY=AIzaSy...your_key...
DOCS_GLOB=docs/**/*.{md,txt,html}
EMBED_DIM=768

Replace USER:PASS and database name with your actual values.
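If the ragdemo database doesn't exist yet, create it first, e.g. with the createdb tool that ships with Postgres (the name here matches the .env above; adjust to yours):

createdb ragdemo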


1) Prepare Postgres (tables + indexes)

Connect to the DB in your .env and run once:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS local_chunks (
  id           BIGSERIAL PRIMARY KEY,
  doc_path     TEXT NOT NULL,
  title        TEXT,
  url          TEXT,
  chunk_index  INT  NOT NULL,
  content      TEXT NOT NULL,
  embedding    vector(768),
  updated_at   TIMESTAMPTZ DEFAULT now()
);

-- Prevent duplicate chunks for the same file
CREATE UNIQUE INDEX IF NOT EXISTS local_chunks_unique
  ON local_chunks (doc_path, chunk_index);

-- Fast approximate nearest‑neighbor search (cosine)
CREATE INDEX IF NOT EXISTS local_chunks_embed_hnsw
  ON local_chunks USING hnsw (embedding vector_cosine_ops);
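Note: HNSW indexes require pgvector 0.5.0 or newer. On an older pgvector, an IVFFlat index is a workable fallback (a sketch; lists = 100 is a starting point, not a tuned value, and IVFFlat gives its best recall when built after the table already has data):

-- Fallback for pgvector < 0.5.0 (build after ingesting for better recall)
CREATE INDEX IF NOT EXISTS local_chunks_embed_ivf
  ON local_chunks USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);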

2) Add example docs (private knowledge)

Create docs/intro.md:

# Profile: Muhammad Ilyas

This document captures key, internal details about **Muhammad Ilyas** to be used by our RAG demo.

Create docs/faq.md:

# Muhammad Ilyas – FAQ
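
Fill the FAQ with a few private facts the base model can't possibly know. The sample queries in step 4 assume facts about roles, leads, and mentors, so placeholder entries like these work (replace the answers with your own):

**Q: Who leads Muhammad Ilyas currently, and who mentors him?**
A: <your answer here>

**Q: What was his previous role, and who led that team?**
A: <your answer here>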

3) Code the app (copy these files into src/)

src/util.js

import { htmlToText } from "html-to-text";
import fs from "fs";

export function htmlToPlain(html) {
  return htmlToText(html, {
    wordwrap: false,
    selectors: [{ selector: "a", options: { ignoreHref: true } }],
  }).replace(/\n{3,}/g, "\n\n").trim();
}

export function readFileSmart(path) {
  const raw = fs.readFileSync(path, "utf8");
  if (path.endsWith(".html")) return htmlToPlain(raw);
  return raw; // md/txt are already plain
}

// Simple character-based chunking
export function chunk(text, maxChars = 1500, overlap = 150) {
  const chunks = [];
  let i = 0;
  while (i < text.length) {
    const end = Math.min(i + maxChars, text.length);
    chunks.push(text.slice(i, end));
    if (end === text.length) break;
    i = end - overlap;
  }
  return chunks;
}
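A quick sanity check of the overlap math (a scratch snippet, not part of the app): with the defaults, a 3,000-character string splits into chunks starting at 0, 1350, and 2700.

const pieces = chunk("x".repeat(3000)); // maxChars 1500, overlap 150
console.log(pieces.length);             // 3
console.log(pieces.map(p => p.length)); // [ 1500, 1500, 300 ]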

src/embed.js

import { GoogleGenAI } from "@google/genai";
import "dotenv/config";

const genai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const DIM = Number(process.env.EMBED_DIM || 768);

export async function embedTexts(texts) {
  const res = await genai.models.embedContent({
    model: "gemini-embedding-001",
    contents: texts,
    config: { outputDimensionality: DIM, taskType: "RETRIEVAL_DOCUMENT" }
  });
  return res.embeddings.map(e => e.values);
}

export async function embedQuery(text) {
  const res = await genai.models.embedContent({
    model: "gemini-embedding-001",
    contents: text,
    config: { outputDimensionality: DIM, taskType: "RETRIEVAL_QUERY" }
  });
  return res.embeddings[0].values;
}
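One caveat: a single embedContent call caps how many texts it will accept, so ingesting a very large file in one shot can fail. Here's a minimal batching wrapper you could add to src/embed.js (the batch size of 100 is an assumption; check the current API limits for your account):

// Hypothetical helper: embed in batches to stay under per-request limits
export async function embedTextsBatched(texts, batchSize = 100) {
  const all = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    all.push(...(await embedTexts(texts.slice(i, i + batchSize))));
  }
  return all;
}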

src/db.js

import { Pool } from "pg";
import "dotenv/config";

export const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function initDb() {
  await pool.query(`CREATE EXTENSION IF NOT EXISTS vector`);
}

export async function insertChunks(rows) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    for (const r of rows) {
      const embeddingLiteral = `[${r.embedding.join(",")}]`;
      await client.query(
        `
        INSERT INTO local_chunks
          (doc_path, title, url, chunk_index, content, embedding, updated_at)
        VALUES ($1, $2, $3, $4, $5, CAST($6 AS vector), now())
        ON CONFLICT (doc_path, chunk_index)
        DO UPDATE SET
          title = EXCLUDED.title,
          url = EXCLUDED.url,
          content = EXCLUDED.content,
          embedding = EXCLUDED.embedding,
          updated_at = now()
        `,
        [r.docPath, r.title || null, r.url || null, r.chunkIndex, r.content, embeddingLiteral]
      );
    }
    await client.query("COMMIT");
  } catch (e) {
    await client.query("ROLLBACK");
    throw e;
  } finally {
    client.release();
  }
}

export async function searchByEmbedding(embedding, k = 6) {
  const embeddingLiteral = `[${embedding.join(",")}]`;
  const { rows } = await pool.query(
    `
    SELECT id, doc_path, title, url, content, chunk_index
    FROM local_chunks
    -- <=> is cosine distance, matching the vector_cosine_ops HNSW index above
    ORDER BY embedding <=> CAST($1 AS vector)
    LIMIT $2
    `,
    [embeddingLiteral, k]
  );
  return rows;
}
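If you also want a relevance score per hit, select the distance too. In pgvector, <=> returns cosine distance, which is 1 minus cosine similarity, so a variant like this (a sketch, not used by the app below) adds a similarity column:

export async function searchWithScores(embedding, k = 6) {
  const embeddingLiteral = `[${embedding.join(",")}]`;
  const { rows } = await pool.query(
    `
    SELECT id, doc_path, title, url, content, chunk_index,
           1 - (embedding <=> CAST($1 AS vector)) AS similarity
    FROM local_chunks
    ORDER BY embedding <=> CAST($1 AS vector)
    LIMIT $2
    `,
    [embeddingLiteral, k]
  );
  return rows;
}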

src/ingest.js

import "dotenv/config";
import path from "path";
import { glob } from "glob";
import { readFileSmart, chunk } from "./util.js";
import { embedTexts } from "./embed.js";
import { initDb, insertChunks } from "./db.js";

async function main() {
  await initDb();

  const pattern = process.env.DOCS_GLOB || "docs/**/*.{md,txt,html}";
  const files = await glob(pattern, { nodir: true });
  console.log(`Found ${files.length} files`);

  for (const file of files) {
    const text = readFileSmart(file).trim();
    if (!text) continue;

    const title = path.basename(file);
    const pieces = chunk(text, 1500, 150);
    const embeddings = await embedTexts(pieces);

    const rows = pieces.map((content, i) => ({
      docPath: file,
      title,
      url: null,
      chunkIndex: i,
      content,
      embedding: embeddings[i],
    }));

    await insertChunks(rows);
    console.log(`Ingested ${title} (${pieces.length} chunks)`);
  }
}

main().catch(e => { console.error(e); process.exit(1); });
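One gap in this ingest script: chunks for files you've since deleted stay in the table forever. Here's a small prune step you could call at the end of main() (a sketch; it assumes doc_path values match exactly what the glob returned, as above):

import { pool } from "./db.js";

// Optional cleanup: drop chunks whose source file no longer exists
async function pruneDeleted(files) {
  if (files.length === 0) return; // never wipe the table on an empty glob
  const { rowCount } = await pool.query(
    `DELETE FROM local_chunks WHERE NOT (doc_path = ANY($1::text[]))`,
    [files]
  );
  if (rowCount) console.log(`Pruned ${rowCount} stale chunks`);
}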

src/server.js

import "dotenv/config";
import express from "express";
import { initDb, searchByEmbedding } from "./db.js";
import { embedQuery } from "./embed.js";
import { GoogleGenAI } from "@google/genai";

const app = express();
app.use(express.json());

const genai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const SYSTEM_PROMPT = `
You are a helpful assistant that must answer using ONLY the provided context.
- If info is missing, say: "I don't find this in the knowledge base."
- Cite sources at the end as [Title](Path).
- Keep answers concise and factual.
`;

app.post("/query", async (req, res) => {
  try {
    const q = req.body?.q?.toString?.().trim();
    if (!q) return res.status(400).json({ error: "Missing q" });

    const qEmbed = await embedQuery(q);
    const topK = await searchByEmbedding(qEmbed, 6);

    const context = topK.map((r, i) => `### Doc ${i+1}: ${r.title}\nPath: ${r.doc_path}\n---\n${r.content}`).join("\n\n");

    const prompt = `${SYSTEM_PROMPT}\n\n# Question\n${q}\n\n# Context\n${context}\n\nAnswer:`;

    const result = await genai.models.generateContent({
      model: "gemini-1.5-pro", // or "gemini-1.5-flash" for speed
      contents: prompt,
    });

    const answer = (result.text ?? "").trim();
    return res.json({
      answer,
      sources: topK.map(r => ({ title: r.title, path: r.doc_path }))
    });
  } catch (e) {
    console.error(e);
    res.status(500).json({ error: e.message || "Server error" });
  }
});

const port = Number(process.env.PORT || 3000);
await initDb(); // runs once at startup (top-level await works in ESM)
app.listen(port, () => console.log(`RAG server on http://localhost:${port}`));

4) Run it

  1. Ingest your docs (creates/updates chunks):
npm run ingest

Expected:

Found 2 files
Ingested intro.md (X chunks)
Ingested faq.md (Y chunks)

  2. Start the API:
npm run dev   # or npm start

Console:

RAG server on http://localhost:3000

  3. Ask questions:
curl -s http://localhost:3000/query -H "Content-Type: application/json" \
  -d '{"q":"Who leads Muhammad Ilyas currently and who mentors him?"}' | jq

curl -s http://localhost:3000/query -H "Content-Type: application/json" \
  -d '{"q":"What was his previous role and who led that team?"}' | jq

You should see an answer plus a sources array pointing at docs/intro.md or docs/faq.md.

Change a fact in the docs, re‑run npm run ingest, and ask again: the answer updates immediately. That's the freshness RAG gives you.


Zero‑to‑Hero checklist ✅

  • Node + deps installed

  • .env filled (DB + Gemini key)

  • Postgres table + pgvector index created

  • Docs added

  • npm run ingest OK

  • npm run dev OK

  • Queries return answers + sources

As a former 3D Animator with more than 12 years of experience, I have always been fascinated by the intersection of technology and creativity. That's why I recently shifted my career towards MERN stack development and software engineering, where I have been serving since 2021. With my background in 3D animation, I bring a unique perspective to software development, combining creativity and technical expertise to build innovative and visually engaging applications. I have a passion for learning and staying up-to-date with the latest technologies and best practices, and I enjoy collaborating with cross-functional teams to solve complex problems and create seamless user experiences. In my current role as a MERN stack developer, I have been responsible for developing and implementing web applications using MongoDB, Express, React, and Node.js. I have also gained experience in Agile development methodologies, version control with Git, and cloud-based deployment using platforms like Heroku and AWS. I am committed to delivering high-quality work that meets the needs of both clients and end-users, and I am always seeking new challenges and opportunities to grow both personally and professionally.