Day 14 - Fetching Data Using Web Scraping
Web scraping is the process of automatically extracting data from websites. Instead of clicking through pages and manually copying information, web scraping allows you to gather large amounts of data quickly and efficiently. This is especially handy for things like market research, data analysis, or even just feeding your curiosity.
Step 1: Set Up Your Environment
First things first, make sure you have Python installed. Once you’re all set up, you can install the libraries you’ll need:
pip install requests beautifulsoup4
Step 2: Fetching a Web Page
Now, let’s get our hands dirty. We’ll start by fetching a web page using the requests library:
import requests
url = 'https://example.com'
response = requests.get(url)
if response.status_code == 200:
    print("Successfully fetched the page!")
else:
    print("Oops! Something went wrong.")
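In practice, a request can also hang or fail before any status code comes back, so it is worth adding a timeout and catching network errors. Here is a minimal sketch; the fetch_page helper is my own name, not part of requests, and the .invalid domain is used only because it can never resolve:

```python
import requests

def fetch_page(url):
    """Fetch a URL, returning the response text or None on failure."""
    try:
        # A timeout avoids hanging forever on an unresponsive server
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # raises for 4xx/5xx status codes
        return response.text
    except requests.exceptions.RequestException as error:
        print(f"Oops! Something went wrong: {error}")
        return None

# Domains under .invalid never resolve, so this safely returns None
result = fetch_page('http://nonexistent.invalid/')
```

raise_for_status turns HTTP error responses into exceptions, so one except clause covers connection failures, timeouts, and bad status codes alike.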
Step 3: Parsing the HTML
After fetching the page, we can use Beautiful Soup to make sense of the HTML content:
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
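Before working with a live page, it can help to see what the parser gives you on a small snippet. This standalone sketch uses an inline HTML string and its own variable names so it doesn’t touch the soup object above:

```python
from bs4 import BeautifulSoup

html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Welcome</h1>
    <p class="intro">A short paragraph.</p>
  </body>
</html>
"""

demo = BeautifulSoup(html, 'html.parser')

# Tag names become attributes; .string/.text give the inner content
print(demo.title.string)                    # Sample Page
print(demo.h1.text)                         # Welcome
print(demo.find('p', class_='intro').text)  # A short paragraph.
```

The class_ keyword (with the trailing underscore) is how Beautiful Soup filters by CSS class, since class is a reserved word in Python.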
Step 4: Extracting the Data You Need
Now for the fun part—extracting data! Let’s say you want to grab all the top-level headings (the h1 tags) on the page:
headings = soup.find_all('h1')
for heading in headings:
    print(heading.text.strip())
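Headings aren’t the only thing you can pull out—the same find_all pattern works for links. A small sketch on an inline snippet (the variable names are my own):

```python
from bs4 import BeautifulSoup

snippet = """
<ul>
  <li><a href="/docs">Docs</a></li>
  <li><a href="/blog">Blog</a></li>
</ul>
"""

link_soup = BeautifulSoup(snippet, 'html.parser')

# Each <a> tag exposes its attributes like a dictionary
links = [(a.text, a['href']) for a in link_soup.find_all('a')]
print(links)  # [('Docs', '/docs'), ('Blog', '/blog')]
```

On a real page you’d often use a.get('href') instead of a['href'], since anchors without an href attribute would otherwise raise a KeyError.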
Step 5: Saving Your Data
After you’ve extracted the information, you might want to keep it for later. Here’s how you can save those headings to a CSV file:
import csv
with open('headings.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Headings'])
    for heading in headings:
        writer.writerow([heading.text.strip()])
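To double-check the file, you can read it straight back with the same csv module. This sketch writes a small hard-coded list (standing in for the scraped headings) so it runs on its own:

```python
import csv

sample_headings = ['First Heading', 'Second Heading']

# Write a header row followed by one row per heading
with open('headings.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Headings'])
    for text in sample_headings:
        writer.writerow([text])

# Read everything back as a list of rows
with open('headings.csv', newline='') as file:
    rows = list(csv.reader(file))

print(rows)  # [['Headings'], ['First Heading'], ['Second Heading']]
```

The newline='' argument matters on both write and read: it lets the csv module handle line endings itself, avoiding blank rows on Windows.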