Web Scraping with Python - Beautiful Soup


The Day Our Professor Dropped a Bomb
In our last machine learning class, the professor made an unexpected announcement:
"All of you will need to complete a machine learning project... but with no prepared data!"
My first thought: "OMG! ❣️ Does this mean I need to find CSV files somewhere? What's the plan?"
When he saw our confused faces, he laughed and said: "You'll need to collect it yourselves through web scraping." Then the bell rang.
I couldn't wait for the next class to learn more!
Like any typical coding session, I started with tea in hand, staring at my screen. I kept thinking: "There must be a better way than manual copying and pasting 🤔."
That's when I discovered Beautiful Soup: a Python library that magically turns websites into structured data with just a few lines of code.
Without hesitation, I rushed to freeCodeCamp (my lifesaver!) and searched for web scraping tutorials.
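(Quick setup note if you're following along: everything below leans on three libraries, and their standard pip package names are:)
pip install beautifulsoup4 lxml requests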
My Two-Part Learning Journey
1. Baby Steps: Scraping a Local HTML File
(Okay, I'll admit: I borrowed the HTML structure from the tutorial 🤣)
Created this test file:
<div class="card">
  <h5 class="card-title">Python for beginners</h5>
  <a href="#" class="btn btn-primary">Start for 20$</a>
</div>
<!-- ... plus two more course cards -->
The Magic Code:
from bs4 import BeautifulSoup

# Load the local test file (assuming it was saved as courses.html)
with open('courses.html', encoding='utf-8') as f:
    html_content = f.read()

soup = BeautifulSoup(html_content, 'lxml')
courses = soup.find_all('div', class_='card')
for course in courses:
    name = course.h5.text              # text of the <h5> course title
    price = course.a.text.split()[-1]  # last word of the link text, e.g. "20$"
    print(f"{name} costs {price}")
Output:
Python for beginners costs 20$
Python Web Development costs 50$
Python Machine Learning costs 100$
My Reaction:
"IT WORKED ON THE FIRST TRY! ๐"
2. The Real Challenge: Scraping Live Job Listings
I targeted TimesJobs for Python jobs.
Reality Check:
Real websites use dynamic content
Their HTML has weird class names (seriously, who names these?!)
They fight back against scrapers (see the polite-request sketch below)
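That last point deserves a note. Many sites reject requests that don't look like they come from a browser, so a common trick is to send a browser-style User-Agent header and pause between requests. A minimal sketch (the URL placeholder, header string, and delay are illustrative choices, not something TimesJobs specifically requires):
import time
import requests

url = "https://example.com/jobs"  # placeholder; swap in the real listings page

headers = {
    # present the request as a regular browser; the exact string is illustrative
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}
response = requests.get(url, headers=headers, timeout=10)
time.sleep(2)  # small pause between requests, out of politeness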
My Detective Work:
Used Ctrl+Shift+I to inspect the page
Discovered:
Jobs were wrapped in <li> tags
Skills hid in <div class="srp-keyskills">
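It's worth sanity-checking selectors like these before writing the full scraper. A quick sketch, assuming the listings page is already parsed into soup (exactly as the function below does):
first_job = soup.find('li')  # grab the first job card
if first_job:
    # should print the skills block if the selector is right
    print(first_job.find('div', class_='srp-keyskills'))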
The Breakthrough Code:
import requests
from bs4 import BeautifulSoup

def find_jobs(skill):
    url = f"https://m.timesjobs.com/...?txtKeywords={skill}"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    jobs = []
    for job in soup.find_all('li'):
        skills = job.find('div', class_='srp-keyskills')
        # keep only listings whose skill tags mention the search term
        if skills and skill.lower() in skills.text.lower():
            jobs.append({
                'title': job.find('h3').text.strip(),
                'company': job.find('span', class_='srp-comp-name').text.strip(),
                'skills': [s.text.strip() for s in skills.find_all('a')],
            })
    return jobs
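And a quick test drive (the slicing and print format are just my way of peeking at the results):
python_jobs = find_jobs('python')
for job in python_jobs[:3]:  # peek at the first three matches
    print(job['title'], '@', job['company'])
    print('  skills:', ', '.join(job['skills']))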
"When it works, you won't stopโI played with the data for over an hour! It's like magic โจ"
Making It Useful: CSV Export
import csv

def save_jobs(jobs, filename="jobs.csv"):
    with open(filename, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=['title', 'company', 'skills'])
        writer.writeheader()
        writer.writerows(jobs)
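One wrinkle: since 'skills' is a list, DictWriter writes it out as a raw list literal like "['django', 'flask']". A small sketch that flattens it into a readable cell first (reusing find_jobs and save_jobs from above):
jobs = find_jobs('python')
for job in jobs:
    job['skills'] = ', '.join(job['skills'])  # one readable cell instead of a list literal
save_jobs(jobs)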
Now I could analyze jobs in Excel or Python!
Final Thoughts
Web scraping feels like a superpower 🔥. With a few lines of code, you can harvest insights from the entire internet.
"While others wait for the next lecture, I'm already feeling like a coding wizard! 🧙‍♂️"
What will YOU scrape first? Let me know in the comments!
Full Code: GitHub Repo
🐦 Follow Me: @BoussanniEl
#Python #WebScraping #Automation #LearningToCode