My Second Web Scraping Project Today: Scraping All Quotes from Quotes.toscrape.com with Pagination


Hey everyone! 👋

I’ve been diving into web scraping recently, and today I completed my second scraping project — scraping quotes from the awesome site quotes.toscrape.com. This project was a great way to practice handling pagination and extracting multiple fields like quotes, authors, and tags.


What I Wanted to Build

The goal was to:

  • Scrape all quotes from every page on the site (there are 10 pages)

  • Extract the quote text, author, and tags

  • Save everything neatly into a CSV file for further analysis or fun


Tools I Used

  • requests — to fetch the HTML content

  • BeautifulSoup — to parse the HTML and extract data

  • csv — to save the data into a CSV file

  • urljoin from urllib.parse — to handle the pagination URLs
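
requests and BeautifulSoup are third-party packages (csv and urljoin ship with Python's standard library), so if you're following along you'll probably need to install them first:

pip install requests beautifulsoup4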


Here’s the Full Code I Used

import requests
from bs4 import BeautifulSoup
import csv
from urllib.parse import urljoin

base_url = "https://quotes.toscrape.com"
current_url = base_url

with open("scrapped.csv", 'w', newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Quote", "Author", "Tags"])

    while current_url:
        response = requests.get(current_url)
        soup = BeautifulSoup(response.text, "html.parser")

        quotes = soup.find_all("div", class_="quote")

        for quote in quotes:
            text = quote.find("span", class_="text").get_text()
            author = quote.find("small", class_="author").get_text()
            tag_elements = quote.find("div", class_="tags").find_all("a", class_="tag")
            tags = [tag.text for tag in tag_elements]
            tag_string = ", ".join(tags)

            writer.writerow([text, author, tag_string])

        print(f"Scraped: {current_url}")

        next_btn = soup.find("li", class_="next")

        if next_btn:
            next_btn_url = next_btn.a["href"]
            current_url = urljoin(current_url, next_btn_url)
        else:
            print("No next page found. Scraping complete.")
            current_url = None
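
When I ran it, the console output looked roughly like this (one "Scraped:" line per page, and the site has 10 pages):

Scraped: https://quotes.toscrape.com
Scraped: https://quotes.toscrape.com/page/2/
...
Scraped: https://quotes.toscrape.com/page/10/
No next page found. Scraping complete.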

How This Works

  • Start scraping from the homepage (base_url)

  • For each page, find all quote blocks and extract the quote, author, and tags (there's a small extraction sketch after this list)

  • Write each quote to a CSV file

  • Look for the “Next” button — if it exists, update the URL and continue scraping

  • Stop when there are no more pages
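
To make the extraction step concrete, here's a minimal sketch that runs the same find/find_all calls against a hard-coded snippet shaped like the site's quote blocks (the HTML below is a simplified stand-in I wrote for illustration, not the site's exact markup):

from bs4 import BeautifulSoup

# A simplified stand-in for one quote block on the page
html = """
<div class="quote">
    <span class="text">"An example quote."</span>
    <small class="author">Example Author</small>
    <div class="tags">
        <a class="tag" href="/tag/example/">example</a>
        <a class="tag" href="/tag/demo/">demo</a>
    </div>
</div>
"""

quote = BeautifulSoup(html, "html.parser").find("div", class_="quote")

text = quote.find("span", class_="text").get_text()       # the quote text
author = quote.find("small", class_="author").get_text()  # "Example Author"
tags = [a.text for a in quote.find("div", class_="tags").find_all("a", class_="tag")]

print(text, author, tags)  # tags == ['example', 'demo']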


What I Learned

  • How to handle pagination properly by updating URLs

  • Extracting multiple data points from nested HTML elements

  • Using urljoin to safely join relative URLs with the base URL (see the quick demo after this list)

  • Writing to CSVs cleanly to preserve all data
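
To see why urljoin matters: the "Next" button's href is relative (something like /page/2/), so it has to be resolved against the page you're currently on. A quick sketch of the behavior:

from urllib.parse import urljoin

print(urljoin("https://quotes.toscrape.com", "/page/2/"))
# https://quotes.toscrape.com/page/2/
print(urljoin("https://quotes.toscrape.com/page/2/", "/page/3/"))
# https://quotes.toscrape.com/page/3/

And since the whole point of the CSV is further analysis, here's one way you might read it back with csv.DictReader (assuming the scraped.csv written by the script above):

import csv

with open("scraped.csv", newline="", encoding="utf-8") as file:
    for row in csv.DictReader(file):  # keys come from the header row
        print(row["Author"], "->", row["Tags"])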



Final Thoughts

This was a fun and practical project to deepen my web scraping skills. If you’re new to scraping, definitely give this a try — the site is perfect for beginners!

Happy scraping! 🚀
