Building a Web Scraper in Next.js

Gaurish Naik

Web scraping means extracting data from a website's pages rather than from an API.
Some websites or services do not offer a public API that gives developers direct access to their data. In such cases, web scraping can be used as an alternative. The main drawback is that, unlike an API, a scraper needs more maintenance, because websites change their structure and content over time.

In this example, we'll get the schedule for the Formula 1 2023 season from the official Formula 1 website: https://www.formula1.com/en/racing/2023.html

First, create the Next.js app: npx create-next-app

Then install axios and jsdom: npm i axios jsdom

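// pages/api/schedule.ts
import axios from "axios";
import { JSDOM } from "jsdom";

const BASE_URL = `https://www.formula1.com/en/racing/2023.html`;

// Note: `await` must run inside an async function, such as the API route handler.
// Fetch the page HTML, sending browser-like headers.
const { data } = await axios.get(BASE_URL, {
  headers: {
    Accept: `< get the headers from network tab >`,
    Host: `www.formula1.com`,
    // Header names containing a hyphen must be quoted to be valid object keys.
    "User-Agent": `< get the headers from network tab >`,
  },
});
// Parse the HTML string into a queryable DOM.
const dom = new JSDOM(data);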

Axios makes an HTTP GET request to the specified URL and fetches the HTML content of the web page. That HTML is then passed to the JSDOM constructor, which builds a DOM (Document Object Model) from it.
With this DOM we can retrieve specific elements, modify them, or add new ones.

Next, open the page in the browser's Inspect (Elements) tab and find the class or id of the element that holds the data we want to retrieve.

The div with class name "race-card" had the info I needed.

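import axios from "axios";
import { JSDOM } from "jsdom";

const BASE_URL = `https://www.formula1.com/en/racing/2023.html`;

// Note: `await` must run inside an async function, such as the API route handler.
const { data } = await axios.get(BASE_URL, {
  headers: {
    Accept: `< get the headers from network tab >`,
    Host: `www.formula1.com`,
    // Header names containing a hyphen must be quoted to be valid object keys.
    "User-Agent": `< get the headers from network tab >`,
  },
});
const dom = new JSDOM(data);
// Select every element on the page that has the "race-card" class.
const raceCards = dom.window.document.querySelectorAll(".race-card");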

The raceCards variable will hold a NodeList of all the elements that match the specified CSS selector.
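
Because querySelectorAll returns an empty NodeList instead of throwing when nothing matches, a renamed class on the site would quietly produce an empty schedule. A minimal sketch of a guard follows; the ensureMatches helper is hypothetical (it is not part of jsdom or of the original code), and plain arrays stand in for the NodeList so the snippet runs on its own.

```typescript
// Hypothetical helper (not part of jsdom): fail loudly when a selector
// matches nothing, which usually means the page markup has changed.
function ensureMatches<T>(items: ArrayLike<T>, selector: string): ArrayLike<T> {
  if (items.length === 0) {
    throw new Error(
      `No elements matched "${selector}"; the page structure may have changed.`
    );
  }
  return items;
}

// Usage with a plain array standing in for a NodeList (invented sample data).
const found = ensureMatches([{ textContent: "sample" }], ".race-card");
console.log(found.length);
```

In the route above, the same call could wrap the query result, for example ensureMatches(raceCards, ".race-card"), since a NodeList is also array-like.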

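// Convert the NodeList to an array, mapping each card to { date, venue }.
const schedule = Array.from(raceCards, (raceCard) => {
  // textContent can be null, so fall back to an empty string.
  const raceInfo = raceCard.textContent ?? "";
  const raceInfoArr = raceInfo.split(" ");
  // The first two space-separated tokens form the date.
  const date = raceInfoArr[0] + " " + raceInfoArr[1];
  // Whatever follows the date is treated as the venue.
  const venue = raceInfo.replace(date + " ", "");
  return {
    date,
    venue,
  };
});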

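The Array.from method creates a new array from the NodeList of elements that match the ".race-card" selector. Its second argument is a mapping function, which here turns each element into an object with date and venue fields.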
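
To see the mapping step in isolation, here is a small sketch that runs without jsdom. It passes Array.from an array-like object standing in for the NodeList. The parseRaceCard helper name is hypothetical, and the sample strings are invented for illustration; they are not real output from the Formula 1 site. As a variation on the code above, it splits on runs of whitespace so that repeated spaces or line breaks do not produce empty tokens.

```typescript
interface Race {
  date: string;
  venue: string;
}

// Hypothetical helper: treat the first two words as the date and the rest
// as the venue, collapsing runs of whitespace first.
function parseRaceCard(text: string | null): Race {
  const words = (text ?? "").trim().split(/\s+/);
  return {
    date: words.slice(0, 2).join(" "),
    venue: words.slice(2).join(" "),
  };
}

// An array-like (indexed entries plus length) standing in for a NodeList.
// The text values are invented sample data.
const fakeCards: ArrayLike<{ textContent: string | null }> = {
  length: 2,
  0: { textContent: "03-05 Mar  Sample Venue One" },
  1: { textContent: "17-19 Mar Sample Venue Two" },
};

// Array.from accepts any array-like; the second argument maps each item.
const schedule = Array.from(fakeCards, (card) => parseRaceCard(card.textContent));
console.log(schedule);
```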

From here, you can transform and use the data as your application requires.
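
As one illustration of working with the result, the sketch below filters a schedule by a venue keyword. The findByVenue name and the sample entries are hypothetical; only the { date, venue } shape comes from the code above.

```typescript
interface Race {
  date: string;
  venue: string;
}

// Hypothetical helper: case-insensitive search over the venue text.
// An empty keyword returns the schedule unchanged.
function findByVenue(schedule: Race[], keyword: string): Race[] {
  const needle = keyword.trim().toLowerCase();
  if (needle === "") return schedule;
  return schedule.filter((race) => race.venue.toLowerCase().includes(needle));
}

// Invented sample data in the same shape as the scraped schedule.
const sample: Race[] = [
  { date: "03-05 Mar", venue: "Sample Venue One" },
  { date: "17-19 Mar", venue: "Sample Venue Two" },
];

console.log(findByVenue(sample, "two"));
```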

Demo link: https://f1calender.vercel.app/
GitHub repo: https://github.com/gaurishxjfk/web-scrapping-next-js
