What I Learned Today – Scraping Multiple Pages Using Python (Pagination)

Hey friends! 👋
Today, I took my web scraping journey one step further — and I’m super excited to share what I’ve learned.
In my previous post (if you missed it), I learned how to scrape book data (title, price, rating, and availability) from books.toscrape.com. But that only scraped one page.
Turns out, there are 50 pages of books on that site... and I wasn’t going to copy-paste my code 50 times 😅. So, I learned about pagination and how to make my scraper automatically go through all the pages. Let me walk you through it!
🧩 Problem: One Page Isn’t Enough
When scraping the first page, I used:
url = "https://books.toscrape.com/"
But after that, there are more pages like:
https://books.toscrape.com/catalogue/page-2.html
https://books.toscrape.com/catalogue/page-3.html
...
The HTML had a helpful navigation link:
<li class="next">
    <a href="catalogue/page-2.html">next</a>
</li>
So the goal became clear:
Follow the "next" link until there is no next link.
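Checking for that link with BeautifulSoup turns out to be simple. Here's a minimal sketch of just the detection step, run on the first page:

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://books.toscrape.com/").text, "html.parser")

# The "next" button lives inside an <li class="next"> element.
next_btn = soup.find("li", class_="next")
if next_btn:
    print("Next page:", next_btn.a["href"])  # e.g. catalogue/page-2.html
else:
    print("Last page reached.")

If find() returns None, there's no next page, and that's my signal to stop.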
🛠️ My Approach
Here’s what I changed in my scraper:
✅ Step 1: Start from a base URL
base_url = "https://books.toscrape.com/"
current_url = base_url
✅ Step 2: Use a loop to keep scraping pages
while current_url:
    # get page content
    # scrape books
    # check for next page and update current_url
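Fleshed out a little, the loop skeleton looks like this (just a sketch: the raise_for_status() call is an extra safety net I like, and the urljoin bit is explained in Step 3):

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

current_url = "https://books.toscrape.com/"

while current_url:
    response = requests.get(current_url)
    response.raise_for_status()  # fail fast if a page doesn't load
    soup = BeautifulSoup(response.text, "html.parser")

    # ... scrape the books on this page (same as the single-page version) ...

    # Follow the "next" link if there is one, otherwise end the loop.
    next_btn = soup.find("li", class_="next")
    current_url = urljoin(current_url, next_btn.a["href"]) if next_btn else None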
✅ Step 3: Combine relative links using urljoin
The next page was shown as a relative URL, like catalogue/page-2.html. So I used:
from urllib.parse import urljoin
current_url = urljoin(current_url, next_page_url)
This converted it into a full link like:
https://books.toscrape.com/catalogue/page-2.html
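The nice part is that urljoin resolves the link relative to whichever page I'm currently on, which matters because the href changes shape as you go: on later pages it's just page-3.html, with no catalogue/ prefix. A quick demo in the REPL:

from urllib.parse import urljoin

# From the home page, the next link is "catalogue/page-2.html"
print(urljoin("https://books.toscrape.com/", "catalogue/page-2.html"))
# -> https://books.toscrape.com/catalogue/page-2.html

# From page 2 onwards, the next link is just "page-3.html"
print(urljoin("https://books.toscrape.com/catalogue/page-2.html", "page-3.html"))
# -> https://books.toscrape.com/catalogue/page-3.html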
🧾 What the Final Output Looks Like
Just like before, I'm still writing all the scraped data to a CSV:
Title, Price, Availability, Rating
"A Light in the ...", £51.77, In stock, 3
"Starving Hearts", £13.99, In stock, 2
...
This time, the file contains ALL books — from every page.
🧑‍💻 Full Code: Scraping All Pages with Pagination
import requests
from bs4 import BeautifulSoup
import csv
from urllib.parse import urljoin

base_url = "https://books.toscrape.com/"
current_url = base_url

# Map the star-rating class name (e.g. "Three") to a number.
rating_map = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

with open("scraped.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Title", "Price", "Availability", "Rating"])

    while current_url:
        response = requests.get(current_url)
        soup = BeautifulSoup(response.text, "html.parser")

        # Each book on the page is an <article class="product_pod">.
        books = soup.find_all("article", class_="product_pod")
        for book in books:
            price = book.find("p", class_="price_color").get_text()
            title = book.h3.a["title"]
            availability = book.find("p", class_="instock availability").get_text(strip=True)

            # The rating is stored as the second class, e.g. <p class="star-rating Three">.
            rating_word = book.find("p", class_="star-rating")["class"][1]
            rating = rating_map.get(rating_word, 0)

            writer.writerow([title, price, availability, rating])

        print("Scraped:", current_url)

        # Follow the "next" link if there is one; otherwise stop.
        next_btn = soup.find("li", class_="next")
        if next_btn:
            next_page_url = next_btn.a["href"]
            current_url = urljoin(current_url, next_page_url)
        else:
            print("No next page found. Scraping complete.")
            current_url = None
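As a quick sanity check, I read the file back and counted the rows. The site shows 1000 results in total (50 pages of 20 books), so that's the number to expect:

import csv

with open("scraped.csv", newline="", encoding="utf-8") as file:
    rows = list(csv.DictReader(file))

print(len(rows))         # expect 1000: 50 pages x 20 books per page
print(rows[0]["Title"])  # the first book on page 1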
💬 Final Thoughts
Pagination seemed scary at first — but once I understood that it’s just a loop that updates the URL using the “next” button, it all clicked.
If you're learning web scraping, I highly recommend starting with books.toscrape.com — it's beginner-friendly and fun.
🔮 Up Next...
In my next post, I might:
Add a progress bar to show scraping progress (there's a tiny preview sketch after this list)
Export data to JSON or SQLite
Try scraping another website
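If you want a sneak peek at the progress bar idea: the tqdm library (pip install tqdm) makes it almost a one-liner. This is just a rough sketch, taking advantage of the fact that this particular site has a known 50 pages:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from tqdm import tqdm

current_url = "https://books.toscrape.com/"
progress = tqdm(total=50, desc="Pages scraped")  # books.toscrape.com has 50 pages

while current_url:
    soup = BeautifulSoup(requests.get(current_url).text, "html.parser")
    # ... scrape the books on this page, same as before ...
    progress.update(1)  # tick the bar forward one page
    next_btn = soup.find("li", class_="next")
    current_url = urljoin(current_url, next_btn.a["href"]) if next_btn else None

progress.close()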
Thanks for reading!!