Web Scraping with Python - Beautiful Soup


The Day Our Professor Dropped a Bomb
In our last machine learning class, the professor made an unexpected announcement:
"All of you will need to complete a machine learning project... but with no prepared data!"
My first thought: "OMG! ❣️ Does this mean I need to find CSV files somewhere? What's the plan?"
When he saw our confused faces, he laughed and said: "You'll need to collect it yourselves through web scraping." Then the bell rang.
I couldn't wait for the next class to learn more!
Like any typical coding session, I started with tea in hand, staring at my screen. I kept thinking: "There must be a better way than manual copying and pasting 🤔."
That's when I discovered Beautiful Soup: a Python library that magically turns websites into structured data with just a few lines of code.
Without hesitation, I rushed to freeCodeCamp (my lifesaver!) and searched for web scraping tutorials.
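(Quick setup note if you're following along: everything below leans on three libraries, and their standard pip package names are:)
pip install beautifulsoup4 lxml requests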
My Two-Part Learning Journey
1. Baby Steps: Scraping a Local HTML File
(Okay, I'll admit: I borrowed the HTML structure from the tutorial 🤣)
Created this test file:
<div class="card">
  <h5 class="card-title">Python for beginners</h5>
  <a href="#" class="btn btn-primary">Start for 20$</a>
</div>
<!-- ... plus two more course cards -->
The Magic Code:
from bs4 import BeautifulSoup

# Load the local test file (assuming it was saved as courses.html)
with open('courses.html', encoding='utf-8') as f:
    html_content = f.read()

soup = BeautifulSoup(html_content, 'lxml')
courses = soup.find_all('div', class_='card')
for course in courses:
    name = course.h5.text              # text of the <h5> course title
    price = course.a.text.split()[-1]  # last word of the link text, e.g. "20$"
    print(f"{name} costs {price}")
Output:
Python for beginners costs 20$
Python Web Development costs 50$
Python Machine Learning costs 100$
My Reaction:
"IT WORKED ON THE FIRST TRY! ๐"
2. The Real Challenge: Scraping Live Job Listings
I targeted TimesJobs for Python jobs.
Reality Check:
Real websites use dynamic content
Their HTML has weird class names (seriously, who names these?!)
They fight back against scrapers (see the polite-request sketch below)
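That last point deserves a note. Many sites reject requests that don't look like they come from a browser, so a common trick is to send a browser-style User-Agent header and pause between requests. A minimal sketch (the URL placeholder, header string, and delay are illustrative choices, not something TimesJobs specifically requires):
import time
import requests

url = "https://example.com/jobs"  # placeholder; swap in the real listings page

headers = {
    # present the request as a regular browser; the exact string is illustrative
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}
response = requests.get(url, headers=headers, timeout=10)
time.sleep(2)  # small pause between requests, out of politeness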
My Detective Work:
Used Ctrl+Shift+I to inspect the page
Discovered:
Jobs were wrapped in <li> tags
Skills hid in <div class="srp-keyskills">
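It's worth sanity-checking selectors like these before writing the full scraper. A quick sketch, assuming the listings page is already parsed into soup (exactly as the function below does):
first_job = soup.find('li')  # grab the first job card
if first_job:
    # should print the skills block if the selector is right
    print(first_job.find('div', class_='srp-keyskills'))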
The Breakthrough Code:
import requests
from bs4 import BeautifulSoup

def find_jobs(skill):
    url = f"https://m.timesjobs.com/...?txtKeywords={skill}"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    jobs = []
    for job in soup.find_all('li'):
        skills = job.find('div', class_='srp-keyskills')
        # keep only listings whose skill tags mention the search term
        if skills and skill.lower() in skills.text.lower():
            jobs.append({
                'title': job.find('h3').text.strip(),
                'company': job.find('span', class_='srp-comp-name').text.strip(),
                'skills': [s.text.strip() for s in skills.find_all('a')],
            })
    return jobs
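And a quick test drive (the slicing and print format are just my way of peeking at the results):
python_jobs = find_jobs('python')
for job in python_jobs[:3]:  # peek at the first three matches
    print(job['title'], '@', job['company'])
    print('  skills:', ', '.join(job['skills']))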
"When it works, you won't stopโI played with the data for over an hour! It's like magic โจ"
Making It Useful: CSV Export
import csv

def save_jobs(jobs, filename="jobs.csv"):
    with open(filename, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=['title', 'company', 'skills'])
        writer.writeheader()
        writer.writerows(jobs)
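One wrinkle: since 'skills' is a list, DictWriter writes it out as a raw list literal like "['django', 'flask']". A small sketch that flattens it into a readable cell first (reusing find_jobs and save_jobs from above):
jobs = find_jobs('python')
for job in jobs:
    job['skills'] = ', '.join(job['skills'])  # one readable cell instead of a list literal
save_jobs(jobs)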
Now I could analyze jobs in Excel or Python!
Final Thoughts
Web scraping feels like a superpower 🔥. With a few lines of code, you can harvest insights from the entire internet.
"While others wait for the next lecture, I'm already feeling like a coding wizard! 🧙‍♂️"
What will YOU scrape first? Let me know in the comments!
Full Code: GitHub Repo
🐦 Follow Me: @BoussanniEl
#Python #WebScraping #Automation #LearningToCode