How We Rebuilt the ChatGPT & YouTube Summary Extensions

Kei
by Koki Nagai

Introduction

At Glasp, we offer several services as Chrome extensions. Our Glasp Web Highlighter allows users to highlight and save content from web pages and PDFs with the help of AI, making it easy to share with others. We also provide ChatGPT & YouTube Summary by Glasp, a tool that summarizes YouTube videos, and YouTube Summary with ChatGPT & Claude, which expands beyond YouTube videos to summarize web pages and PDFs as well.

Recently, we migrated the implementation of ChatGPT & YouTube Summary from VanillaJS to Vite with React/TypeScript.

In this article, I explain the reasons behind the decision to rewrite it, as well as the technical choices we made during the implementation process.

You can understand what ChatGPT & YouTube Summary is by watching the following YouTube video.

How Chrome extensions work

Before going further, here is a brief explanation of what Chrome extensions are and how they work. Chrome extensions let you modify the behavior or appearance of web pages displayed in Chrome. They are built from static files such as HTML, CSS, and JavaScript.

For example, one of the most well-known Chrome extensions in web frontend development is React Developer Tools. This extension detects pages that use React and helps users visually inspect the component tree’s state and highlight rendering, making it easier to understand how React is functioning on the page.

In the ChatGPT & YouTube Summary extension, a panel is inserted into the upper right corner of the YouTube video viewing page, as shown in the image below. This panel provides a transcript and summary of the video currently being watched.

ChatGPT & YouTube Summary

There are several concepts and technologies necessary for building Chrome extensions.

manifest.json

The manifest.json is the only required file when building a Chrome extension. It defines paths to the scripts to be injected into pages, sets permissions, and configures details such as the extension's name, version, and icon. This file must be placed in the root of the project and named "manifest.json."
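As a rough illustration, a Manifest V3 definition that wires up the pieces described in the following sections might look like the object below. It is written as a TypeScript literal rather than raw JSON so the fields can carry comments. Every name, path, and match pattern here is a placeholder, not the extension's actual configuration.

```typescript
// Hypothetical Manifest V3 definition; all names and paths are placeholders.
const manifest = {
  manifest_version: 3,
  name: "Example Summary Extension",
  version: "1.0.0",
  icons: { "128": "icon128.png" },
  // Service worker that reacts to extension events (the background script).
  background: { service_worker: "background.js" },
  // Scripts injected into pages whose URL matches the pattern (content scripts).
  content_scripts: [
    { matches: ["https://www.youtube.com/*"], js: ["content.js"] },
  ],
  // Standalone settings page and toolbar popup.
  options_page: "options.html",
  action: { default_popup: "popup.html" },
  // Only request the permissions the extension actually needs.
  permissions: ["storage"],
};
```

Serializing this object with JSON.stringify would produce the content of a manifest.json file with the same fields.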

background script

The background script listens for events such as tabs opening and closing or messages arriving from content scripts (discussed later), and performs actions in response to those events.

For example, in the ChatGPT & YouTube Summary extension, the background script detects when a user installs the extension and redirects them to an onboarding page.

Since Manifest Version 3, background scripts run as service workers, which are event-driven: they load only when a relevant event occurs rather than running constantly. Because a service worker runs outside of any web page, a background script cannot directly reference the DOM.
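The install-time redirect described above could be sketched roughly as follows. This is not the extension's actual code: the onboarding URL and the registerOnboarding helper are hypothetical, and the small ExtensionApi interface only declares the slice of chrome.runtime.onInstalled and chrome.tabs.create that the sketch uses, so that it type-checks without extra type packages.

```typescript
// Minimal slice of the chrome.* API used below, declared locally for the sketch.
interface InstalledDetails {
  reason: string;
}
interface ExtensionApi {
  runtime: {
    onInstalled: { addListener(cb: (details: InstalledDetails) => void): void };
  };
  tabs: { create(props: { url: string }): unknown };
}

// Hypothetical onboarding URL; the real one is not given in this article.
const ONBOARDING_URL = "https://example.com/onboarding";

// Registers a listener that opens the onboarding page on first install only.
function registerOnboarding(api: ExtensionApi): void {
  api.runtime.onInstalled.addListener((details) => {
    // onInstalled also fires on updates, so check the reason explicitly.
    if (details.reason === "install") {
      api.tabs.create({ url: ONBOARDING_URL });
    }
  });
}
```

In a real service worker, the call would be registerOnboarding(chrome) at the top level of the script, so the listener is registered every time the worker starts.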

Content scripts

In a content script, you can execute JavaScript on the page that the user is viewing, allowing you to modify the DOM or insert elements. As mentioned earlier, in the ChatGPT & YouTube Summary extension, a content script is used to insert a panel in the top-right corner of the YouTube page. Based on user interactions with this panel, the script sends requests to Glasp’s API to retrieve data and runs scripts to summarize the video content.
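Since content scripts and the background script communicate through messages, it can help to give those messages explicit types. The following is only a sketch under assumed names: the message types, the response shape, and the handleMessage function are all hypothetical and do not reflect the extension's real protocol.

```typescript
// Hypothetical messages a content script might send to the background script.
type ExtensionMessage =
  | { type: "OPEN_OPTIONS" }
  | { type: "FETCH_SUMMARY"; videoId: string };

type ExtensionResponse =
  | { ok: true; action: string }
  | { ok: false; error: string };

// Runtime check, because messages arrive as untyped values.
function isExtensionMessage(value: unknown): value is ExtensionMessage {
  if (typeof value !== "object" || value === null) return false;
  const msg = value as { type?: unknown; videoId?: unknown };
  if (msg.type === "OPEN_OPTIONS") return true;
  return msg.type === "FETCH_SUMMARY" && typeof msg.videoId === "string";
}

// Pure handler: decides what to do with a message without touching chrome.* APIs.
function handleMessage(raw: unknown): ExtensionResponse {
  if (!isExtensionMessage(raw)) return { ok: false, error: "unknown message" };
  switch (raw.type) {
    case "OPEN_OPTIONS":
      return { ok: true, action: "open-options" };
    case "FETCH_SUMMARY":
      return { ok: true, action: `summarize:${raw.videoId}` };
  }
}
```

A content script would send such a message with chrome.runtime.sendMessage, and the background script would pass incoming messages to handleMessage from a chrome.runtime.onMessage listener. Keeping the handler pure makes it easy to test outside the browser.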

Options page

On the Options page, you can display a standalone static web page. In the ChatGPT & YouTube Summary extension, users can customize settings such as dark mode, AI models, and prompts, allowing them to personalize the extension to their preferences.

Options page of ChatGPT & YouTube Summary
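Settings like these are typically read back with defaults applied, so that a missing or partially saved value never breaks the UI. A minimal sketch of that idea follows; the Settings fields, the default values, and the resolveSettings helper are assumptions for illustration, not the extension's real schema.

```typescript
// Hypothetical settings shape; field names and defaults are illustrative only.
interface Settings {
  darkMode: boolean;
  model: string;
  prompt: string;
}

const DEFAULT_SETTINGS: Settings = {
  darkMode: false,
  model: "example-model",
  prompt: "Summarize the following transcript:",
};

// Lay whatever was stored (possibly nothing, possibly only some keys) over the defaults.
function resolveSettings(stored: Partial<Settings> | undefined): Settings {
  return { ...DEFAULT_SETTINGS, ...stored };
}
```

In an extension, the stored argument would usually come from something like chrome.storage.sync.get, and both the options page and the content script could call resolveSettings on the result.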

Popup

The Popup is a pop-up element that appears when you click the icon of the extension in the Chrome toolbar. It can display various elements related to the extension, providing users with quick access to information or controls.

Why we decided to rewrite

The ChatGPT & YouTube Summary extension was released around November 2022, shortly after OpenAI launched ChatGPT. It took only about a week from ChatGPT’s release to launch this extension. During its development, our top priority was to release it as quickly as possible. Since we had previously developed another extension using VanillaJS, we decided to use that as a foundation for building ChatGPT & YouTube Summary as well.

After the release, the user base grew steadily, and we continued to expand the features of ChatGPT & YouTube Summary. However, several development issues began to surface due to the speed-first approach we had taken. These included technical debt, the complexity of DOM manipulation and state management due to the use of VanillaJS, and the lack of type safety, which led to unintended bugs.

To continue growing ChatGPT & YouTube Summary while maintaining a fast development pace, we needed to add and refine various features without slowing down. This is why we decided to rewrite ChatGPT & YouTube Summary.

The criteria for choosing the new technology for the rewrite were as follows:

  1. Technologies that would remain maintainable as the service expands.

  2. Technologies with accumulated knowledge and expertise within the team.

We are also developing and managing a service called glasp.co, which was initially built with HTML and VanillaJS. However, facing similar challenges, we had previously rewritten that service using Next.js + TypeScript. Through this process, we gained a deep understanding of the advantages of component-based development for building UIs and the improved bug detection and readability provided by TypeScript. Our team has gathered substantial knowledge in these areas. Moreover, since we are a small team of 2–3 members, minimizing the cost of context-switching between languages was important to facilitate development across different services. Based on these considerations, we decided to rewrite ChatGPT & YouTube Summary using React + TypeScript.

How the refactoring was done

We used the following technologies for the rewrite:

  • React

  • TypeScript

  • Vite + CRXJS Vite Plugin

  • Tailwind CSS

React + TypeScript

By using React, we shifted from an imperative approach of building the DOM with VanillaJS to a declarative way of constructing the UI, making it easier for developers to understand the implemented UI. Additionally, ChatGPT & YouTube Summary processes data like summaries using YouTube’s transcript data. With TypeScript, we can add types to the data received from YouTube, making the implementation more robust. This also allows other developers to understand which data from YouTube is being used by referring to the type definitions.
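To make the point about typing external data concrete, here is a small sketch. The TranscriptSegment shape is an assumption, since the actual format of YouTube's transcript data is not shown in this article, and the helper names are hypothetical.

```typescript
// Hypothetical shape of one transcript segment.
interface TranscriptSegment {
  startSeconds: number;
  text: string;
}

// Type guard: data from an external page is untyped, so validate it at the boundary.
function isTranscriptSegment(value: unknown): value is TranscriptSegment {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v["startSeconds"] === "number" && typeof v["text"] === "string";
}

// Keep only well-formed segments and join their text for a summarization step.
function toPlainText(raw: unknown[]): string {
  return raw
    .filter(isTranscriptSegment)
    .map((segment) => segment.text.trim())
    .join(" ");
}
```

Once the data passes the guard, the rest of the code can rely on the TranscriptSegment type, and the interface itself documents which fields the extension depends on.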

We use React in two main areas: the options page and content scripts. Since the options page is a simple static page, we can prepare an HTML structure like the following and implement a process to mount React to the target element with a specified ID:

<!DOCTYPE html>
<html lang="en">
  <head></head>
  <body>
    <div id="root"></div>
    <script type="module" src="./index.tsx"></script>
  </body>
</html>

// index.tsx
import React from "react";
import ReactDOM from "react-dom/client";

ReactDOM.createRoot(document.getElementById("root") as HTMLElement).render(
  <React.StrictMode>
    <div>Sample</div>
  </React.StrictMode>
);

Content scripts, as mentioned earlier, are scripts that execute JavaScript on a target web page. This means that if you want to mount a React component using content scripts, you can identify the desired DOM element on the target web page, just like in the options page, and then execute createRoot on that DOM element to build the UI with React. Here’s an example:

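import React from "react";
import ReactDOM from "react-dom/client";

const targetElement = document.getElementById("target-element");
// getElementById returns null when the element is missing, so guard before mounting.
if (targetElement) {
  ReactDOM.createRoot(targetElement).render(
    <React.StrictMode>
      <div>Sample</div>
    </React.StrictMode>
  );
}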

In this example, we are simply mounting a React component. However, YouTube is a SPA (Single Page Application), and since the mounted React component is separate from YouTube’s component lifecycle, the same component remains even when the YouTube page changes. As a result, the component displaying the previous summary will persist on the next page.

To prevent this, we use MutationObserver to detect page changes and remount the React component each time the page changes. Here’s the implementation:

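import React from "react";
import ReactDOM from "react-dom/client";
import type { Root } from "react-dom/client";

// Keep the current root so the old component can be unmounted before remounting.
let root: Root | null = null;

const insertElement = () => {
  const targetElement = document.getElementById("target-element");
  if (!targetElement) return;
  root?.unmount();
  root = ReactDOM.createRoot(targetElement);
  root.render(
    <React.StrictMode>
      <div>Sample</div>
    </React.StrictMode>
  );
};

// Closure that remembers the last URL seen.
const initPrevUrl = () => {
  let prevUrl = "";
  return (url?: string) => {
    if (url === undefined) return prevUrl;
    prevUrl = url;
    return prevUrl;
  };
};

// Create the closure once, outside the observer callback. Creating it inside
// the callback would reset prevUrl to "" and remount on every DOM mutation.
const prevUrl = initPrevUrl();

const observer = new MutationObserver(() => {
  if (prevUrl() !== document.location.href) {
    prevUrl(document.location.href);
    insertElement();
  }
});

observer.observe(document.body, { childList: true, subtree: true });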

In this implementation, the MutationObserver detects changes in the child elements of the body during YouTube page transitions. If the URL has changed compared to the previous one, it remounts the React component. This approach ensures that a new React component is created and updated with new data after a page change.
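The URL comparison at the heart of this approach can also be isolated into a small helper that is created once and reused across observer callbacks, which makes it testable without a browser. This is a sketch, and createUrlChangeDetector is a hypothetical name rather than something from our codebase.

```typescript
// Returns a function that reports whether the given URL differs from the one
// it saw on the previous call, and remembers the new URL when it does.
function createUrlChangeDetector(initialUrl = ""): (url: string) => boolean {
  let prevUrl = initialUrl;
  return (url) => {
    if (url === prevUrl) return false;
    prevUrl = url;
    return true;
  };
}

// Usage inside a content script (not executed here):
//   const hasUrlChanged = createUrlChangeDetector();
//   new MutationObserver(() => {
//     if (hasUrlChanged(document.location.href)) insertElement();
//   }).observe(document.body, { childList: true, subtree: true });
```

Because the detector holds its state in a closure created outside the callback, repeated DOM mutations on the same page do not trigger a remount.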

Vite + CRXJS Vite Plugin

The CRXJS Vite Plugin is a tool that assists in developing Chrome extensions. It supports Hot Module Replacement (HMR), so changes to content scripts and background scripts are reflected without needing to reload the page, and it lets you write manifest.json in TypeScript with type safety. Integrating the CRXJS Vite Plugin with Vite is straightforward: add the crx function to the plugins in vite.config.ts as shown below, and the setup is complete.

import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import { crx, defineManifest } from "@crxjs/vite-plugin";

export const manifest = defineManifest({
  // ...
});

export default defineConfig({
  plugins: [react(), crx({ manifest })],
  // ...
});

Directory

In the code written with VanillaJS, DOM manipulation, API communication, and logic were often bundled into a single file, resulting in poorly separated responsibilities. This made it difficult for developers other than the primary maintainer to add new features without risking regressions.

In this rewrite, we adopted a structure that organizes features into separate directories, as shown below. For example, we split the functionality for options and YouTube summaries into their respective directories within the features directory, ensuring that each responsibility is clearly defined.

src
├── background-scripts ## background scripts
├── chrome-extension-api ## chrome api functions like storage
├── components ## shared components like button
├── configs ## configuration files like firebase
├── content-scripts ## content scripts' entry
├── core ## domain logic like youtube summary transcript
├── features
│   ├── options
│   │   ├── hooks
│   │   └── components
│   └── youtube-summary
├── hooks ## shared hooks
├── options ## options page's entry
├── providers ## shared react context providers
├── services ## api request functions
└── utils ## shared functions

This approach separates functionality by area of concern, but in the context of Chrome extensions, it can sometimes be clearer to organize files by technical concern, such as content scripts, options pages, or background scripts. Therefore, we created the content-scripts and background-scripts directories for entry files, which import the React components from the features directory.

Additionally, for hooks, UI components, and logic that are used across multiple features, we create hooks, components, and utils directories directly under src. These hold common files that can be imported into the various feature modules.

By customizing the structure for Chrome extensions while maintaining a feature-based directory approach, we believe that this separation of responsibilities will make it easier to add new features and implement future improvements.

Conclusion

In this article, I shared the story of rewriting ChatGPT & YouTube Summary using React and TypeScript. While this rewrite doesn’t directly impact the user experience, it has significantly improved development efficiency internally, allowing us to deliver new value to users at a faster pace.

The rewrite is not the end goal, but rather a step towards further growth of the service. By building on this foundation, we hope to continue delivering even more value to our users.

We are hiring software engineers to help drive the growth of Glasp. If you’re interested, please check out the job posting below.

Written by

Kei

Co-founder of Glasp