Day 14 - Fetching Data Using Web Scraping
Web scraping is the process of automatically extracting data from websites. Instead of clicking through pages and manually copying information, web scraping allows you to gather large amounts of data quickly and efficiently. This is especially handy for things like market research, data analysis, or even just feeding your curiosity.
Step 1: Set Up Your Environment
First things first, make sure you have Python installed. Once you’re all set up, you can install the libraries you’ll need:
pip install requests beautifulsoup4
Step 2: Fetching a Web Page
Now, let’s get our hands dirty. We’ll start by fetching a web page using the requests library:
import requests
url = 'https://example.com'
response = requests.get(url)
if response.status_code == 200:
    print("Successfully fetched the page!")
else:
    print("Oops! Something went wrong.")
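In practice, a request can also hang or fail before any status code comes back, so it is worth adding a timeout and catching network errors. Here is a minimal sketch; the fetch_page helper is my own name, not part of requests, and the .invalid domain is used only because it can never resolve:

```python
import requests

def fetch_page(url):
    """Fetch a URL, returning the response text or None on failure."""
    try:
        # A timeout avoids hanging forever on an unresponsive server
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # raises for 4xx/5xx status codes
        return response.text
    except requests.exceptions.RequestException as error:
        print(f"Oops! Something went wrong: {error}")
        return None

# Domains under .invalid never resolve, so this safely returns None
result = fetch_page('http://nonexistent.invalid/')
```

raise_for_status turns HTTP error responses into exceptions, so one except clause covers connection failures, timeouts, and bad status codes alike.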
Step 3: Parsing the HTML
After fetching the page, we can use Beautiful Soup to make sense of the HTML content:
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
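Before working with a live page, it can help to see what the parser gives you on a small snippet. This standalone sketch uses an inline HTML string and its own variable names so it doesn’t touch the soup object above:

```python
from bs4 import BeautifulSoup

html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Welcome</h1>
    <p class="intro">A short paragraph.</p>
  </body>
</html>
"""

demo = BeautifulSoup(html, 'html.parser')

# Tag names become attributes; .string/.text give the inner content
print(demo.title.string)                    # Sample Page
print(demo.h1.text)                         # Welcome
print(demo.find('p', class_='intro').text)  # A short paragraph.
```

The class_ keyword (with the trailing underscore) is how Beautiful Soup filters by CSS class, since class is a reserved word in Python.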
Step 4: Extracting the Data You Need
Now for the fun part—extracting data! Let’s say you want to grab all the top-level headings (the h1 tags) on the page:
headings = soup.find_all('h1')
for heading in headings:
    print(heading.text.strip())
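Headings aren’t the only thing you can pull out—the same find_all pattern works for links. A small sketch on an inline snippet (the variable names are my own):

```python
from bs4 import BeautifulSoup

snippet = """
<ul>
  <li><a href="/docs">Docs</a></li>
  <li><a href="/blog">Blog</a></li>
</ul>
"""

link_soup = BeautifulSoup(snippet, 'html.parser')

# Each <a> tag exposes its attributes like a dictionary
links = [(a.text, a['href']) for a in link_soup.find_all('a')]
print(links)  # [('Docs', '/docs'), ('Blog', '/blog')]
```

On a real page you’d often use a.get('href') instead of a['href'], since anchors without an href attribute would otherwise raise a KeyError.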
Step 5: Saving Your Data
After you’ve extracted the information, you might want to keep it for later. Here’s how you can save those headings to a CSV file:
import csv
with open('headings.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Headings'])
    for heading in headings:
        writer.writerow([heading.text.strip()])
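To double-check the file, you can read it straight back with the same csv module. This sketch writes a small hard-coded list (standing in for the scraped headings) so it runs on its own:

```python
import csv

sample_headings = ['First Heading', 'Second Heading']

# Write a header row followed by one row per heading
with open('headings.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Headings'])
    for text in sample_headings:
        writer.writerow([text])

# Read everything back as a list of rows
with open('headings.csv', newline='') as file:
    rows = list(csv.reader(file))

print(rows)  # [['Headings'], ['First Heading'], ['Second Heading']]
```

The newline='' argument matters on both write and read: it lets the csv module handle line endings itself, avoiding blank rows on Windows.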