Web Scraping With Node.js and Cheerio 🔥

In this post, I will show you how to create a web scraper using Node.js, Axios, and Cheerio. I use Axios to make an HTTP request to a website, then I use Cheerio to parse the HTML and extract the data I need.

Let's get to it. 🚀

Create a Web Scraper with Node.js, Axios, and Cheerio

In this example, I will scrape the Hacker News website (https://news.ycombinator.com/) and extract the list of news items on it.

First, create a project folder, then run npm init or yarn init. You'll get a new package.json file.
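
If you are starting from scratch, this is just a couple of terminal commands (the folder name hn-scraper below is only an example):

      mkdir hn-scraper
      cd hn-scraper
      yarn init -y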

Then follow these steps:

  • Install Axios and Cheerio

      yarn add axios cheerio
    

  • Create a file called webScrapper.js

  • Import Axios and Cheerio

      const axios = require('axios')
      const cheerio = require('cheerio')
    
  • Create a function called scrape()

  • Make an HTTP call to the Hacker News website. You'll get back a string containing the page's HTML. To see what the data looks like, add a console.log inside the then block of the Axios call. Don't forget to call the function afterwards by adding scrape() at the end of the file.

      const scrape = () => {
        axios.get('https://news.ycombinator.com/')
          .then(({ data: page }) => {
            // Axios puts the response body on `data`; here it is the raw HTML string
            console.log(page)
          })
          .catch(error => {
            console.error(error)
          })
      }
    
      scrape()
    
  • If you want to see the result, run the code with Node.js

      node webScrapper.js
    
  • You'll see the raw HTML of the page printed to the terminal.

  • That is the same content you see on the website.

  • All right, let's extract the content of the website. I want to get the news title and the URL, so I inspect the page elements to find them. The title and URL live inside a span with the class titleline. So I will grab the anchor tag (a) inside it, then take its text for the news title and its href attribute for the news URL.

  • I create a function to extract data from the page

      const extractData = (page) => {
        // Parse the HTML string into a Cheerio object so we can query it
        const $ = cheerio.load(page)
        const $newsList = $('.titleline > a')

        const result = []
        for (const $news of $newsList) {
          // Each $news is a raw DOM node, so read its text and href directly
          const title = $news.children[0].data
          const url = $news.attribs.href

          result.push({ title, url })
        }

        return result
      }
    

    If you don't understand what the code above does, read this:

    • cheerio.load parses the HTML string into a Cheerio object, so you can traverse it

    • $('.titleline > a') selects the anchor elements inside elements that have the class titleline

    • Loop over $newsList, take the news title and URL from each anchor, then create an object and push it to the result array

    • Return the result array
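
    If you prefer Cheerio's own helpers to a plain loop, the same extraction could also be written with .map(), .text(), and .attr(). This is just an equivalent sketch, not a change to the code above:

      const extractData = (page) => {
        const $ = cheerio.load(page)

        // Map each matched anchor to { title, url }, then unwrap the
        // Cheerio wrapper into a plain array with .get()
        return $('.titleline > a')
          .map((_, el) => ({
            title: $(el).text(),
            url: $(el).attr('href')
          }))
          .get()
      }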

  • Ok, the extract function is ready. Let's call it inside the scrape function

      const scrape = () => {
        axios.get('https://news.ycombinator.com/')
          .then(({ data: page }) => {
            const result = extractData(page)
            console.log(result)
          })
          .catch(error => {
            console.error(error)
          })
      }
    
  • Run the scraper again to see the result

      node webScrapper.js
    
  • Now, I have the news list data I wanted as an array of objects 🔥🔥🔥
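
    Each item in that array has the shape below (placeholder values, not real Hacker News entries):

      [
        { title: '<news title>', url: '<news URL>' },
        { title: '<news title>', url: '<news URL>' }
      ]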

🌟 Here is the full final code 🌟
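
(Assembled from the snippets above; save it as webScrapper.js.)

      const axios = require('axios')
      const cheerio = require('cheerio')

      const extractData = (page) => {
        // Parse the HTML string into a Cheerio object so we can query it
        const $ = cheerio.load(page)
        const $newsList = $('.titleline > a')

        const result = []
        for (const $news of $newsList) {
          // Each $news is a raw DOM node, so read its text and href directly
          const title = $news.children[0].data
          const url = $news.attribs.href

          result.push({ title, url })
        }

        return result
      }

      const scrape = () => {
        axios.get('https://news.ycombinator.com/')
          .then(({ data: page }) => {
            const result = extractData(page)
            console.log(result)
          })
          .catch(error => {
            console.error(error)
          })
      }

      scrape()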

The result of Web Scraping using Node.js and Cheerio

That's an example of how to create a web scraper with Node.js, Axios, and Cheerio 😎. Now that you have the extracted data, you can save it to a CSV file, a Google Spreadsheet, or a database, or post it anywhere else you want. If you want to see how to do that, write a comment below and I will write an article about it.

Please do web scraping wisely. Thank you. 😉

Written by Bagus Budi Cahyono