What I Learned Today – Scraping Multiple Pages Using Python (Pagination)

Hey friends! 👋
Today, I took my web scraping journey one step further — and I’m super excited to share what I’ve learned.
In my previous post (if you missed it), I learned how to scrape book data (title, price, rating, and availability) from books.toscrape.com. But that only scraped one page.
Turns out, there are 50 pages of books on that site... and I wasn’t going to copy-paste my code 50 times 😅. So, I learned about pagination and how to make my scraper automatically go through all the pages. Let me walk you through it!
🧩 Problem: One Page Isn’t Enough
When scraping the first page, I used:
url = "https://books.toscrape.com/"
But after that, there are more pages like:
https://books.toscrape.com/catalogue/page-2.html
https://books.toscrape.com/catalogue/page-3.html
...
The HTML had a helpful navigation link:
<li class="next">
    <a href="catalogue/page-2.html">next</a>
</li>
So the goal became clear:
Follow the "next" link until there is no next link.
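Checking for that link with BeautifulSoup turns out to be simple. Here's a minimal sketch of just the detection step, run on the first page:

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://books.toscrape.com/").text, "html.parser")

# The "next" button lives inside an <li class="next"> element.
next_btn = soup.find("li", class_="next")
if next_btn:
    print("Next page:", next_btn.a["href"])  # e.g. catalogue/page-2.html
else:
    print("Last page reached.")

If find() returns None, there's no next page, and that's my signal to stop.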
🛠️ My Approach
Here’s what I changed in my scraper:
✅ Step 1: Start from a base URL
base_url = "https://books.toscrape.com/"
current_url = base_url
✅ Step 2: Use a loop to keep scraping pages
while current_url:
    # get page content
    # scrape books
    # check for next page and update current_url
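Fleshed out a little, the loop skeleton looks like this (just a sketch: the raise_for_status() call is an extra safety net I like, and the urljoin bit is explained in Step 3):

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

current_url = "https://books.toscrape.com/"

while current_url:
    response = requests.get(current_url)
    response.raise_for_status()  # fail fast if a page doesn't load
    soup = BeautifulSoup(response.text, "html.parser")

    # ... scrape the books on this page (same as the single-page version) ...

    # Follow the "next" link if there is one, otherwise end the loop.
    next_btn = soup.find("li", class_="next")
    current_url = urljoin(current_url, next_btn.a["href"]) if next_btn else None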
✅ Step 3: Combine relative links using urljoin
The next page was shown as a relative URL, like catalogue/page-2.html. So I used:
from urllib.parse import urljoin
current_url = urljoin(current_url, next_page_url)
This converted it into a full link like:
https://books.toscrape.com/catalogue/page-2.html
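The nice part is that urljoin resolves the link relative to whichever page I'm currently on, which matters because the href changes shape as you go: on later pages it's just page-3.html, with no catalogue/ prefix. A quick demo in the REPL:

from urllib.parse import urljoin

# From the home page, the next link is "catalogue/page-2.html"
print(urljoin("https://books.toscrape.com/", "catalogue/page-2.html"))
# -> https://books.toscrape.com/catalogue/page-2.html

# From page 2 onwards, the next link is just "page-3.html"
print(urljoin("https://books.toscrape.com/catalogue/page-2.html", "page-3.html"))
# -> https://books.toscrape.com/catalogue/page-3.html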
🧾 What the Final Output Looks Like
Just like before, I'm still writing all the scraped data to a CSV:
Title, Price, Availability, Rating
"A Light in the ...", £51.77, In stock, 3
"Starving Hearts", £13.99, In stock, 2
...
This time, the file contains ALL books — from every page.
🧑‍💻 Full Code: Scraping All Pages with Pagination
import requests
from bs4 import BeautifulSoup
import csv
from urllib.parse import urljoin

base_url = "https://books.toscrape.com/"
current_url = base_url

# Map the star-rating class name (e.g. "Three") to a number.
rating_map = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

with open("scraped.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Title", "Price", "Availability", "Rating"])

    while current_url:
        response = requests.get(current_url)
        soup = BeautifulSoup(response.text, "html.parser")

        # Each book on the page is an <article class="product_pod">.
        books = soup.find_all("article", class_="product_pod")
        for book in books:
            price = book.find("p", class_="price_color").get_text()
            title = book.h3.a["title"]
            availability = book.find("p", class_="instock availability").get_text(strip=True)

            # The rating is stored as the second class, e.g. <p class="star-rating Three">.
            rating_word = book.find("p", class_="star-rating")["class"][1]
            rating = rating_map.get(rating_word, 0)

            writer.writerow([title, price, availability, rating])

        print("Scraped:", current_url)

        # Follow the "next" link if there is one; otherwise stop.
        next_btn = soup.find("li", class_="next")
        if next_btn:
            next_page_url = next_btn.a["href"]
            current_url = urljoin(current_url, next_page_url)
        else:
            print("No next page found. Scraping complete.")
            current_url = None
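As a quick sanity check, I read the file back and counted the rows. The site shows 1000 results in total (50 pages of 20 books), so that's the number to expect:

import csv

with open("scraped.csv", newline="", encoding="utf-8") as file:
    rows = list(csv.DictReader(file))

print(len(rows))         # expect 1000: 50 pages x 20 books per page
print(rows[0]["Title"])  # the first book on page 1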
💬 Final Thoughts
Pagination seemed scary at first — but once I understood that it’s just a loop that updates the URL using the “next” button, it all clicked.
If you're learning web scraping, I highly recommend starting with books.toscrape.com — it's beginner-friendly and fun.
🔮 Up Next...
In my next post, I might:
Add a progress bar to show scraping progress (there's a tiny preview sketch after this list)
Export data to JSON or SQLite
Try scraping another website
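If you want a sneak peek at the progress bar idea: the tqdm library (pip install tqdm) makes it almost a one-liner. This is just a rough sketch, taking advantage of the fact that this particular site has a known 50 pages:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from tqdm import tqdm

current_url = "https://books.toscrape.com/"
progress = tqdm(total=50, desc="Pages scraped")  # books.toscrape.com has 50 pages

while current_url:
    soup = BeautifulSoup(requests.get(current_url).text, "html.parser")
    # ... scrape the books on this page, same as before ...
    progress.update(1)  # tick the bar forward one page
    next_btn = soup.find("li", class_="next")
    current_url = urljoin(current_url, next_btn.a["href"]) if next_btn else None

progress.close()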
Thanks for reading!!