Web Scraping with Python - Beautiful Soup

The Day Our Professor Dropped a Bomb

In our last machine learning class, the professor made an unexpected announcement:
"All of you will need to complete a machine learning project... but with no prepared data!"

My first thought: "OMG! ☣️ Does this mean I need to find CSV files somewhere? What's the plan?"
When he saw our confused faces, he laughed and said: "You'll need to collect it yourselves through web scraping." Then the bell rang.

I couldn't wait for the next class to learn more!

Like any typical coding session, I started with tea in hand, staring at my screen. I kept thinking: "There must be a better way than manual copying and pasting 🤔."

That's when I discovered Beautiful Soup, a Python library that magically turns websites into structured data with just a few lines of code.

Without hesitation, I rushed to freeCodeCamp (my lifesaver!) and searched for web scraping tutorials.
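
To follow along, all you need is Beautiful Soup plus a parser and an HTTP client; assuming you already have Python and pip, the setup is one command:

pip install beautifulsoup4 lxml requests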


My Two-Part Learning Journey

1. Baby Steps: Scraping a Local HTML File

(Okay, I'll admit—I borrowed the HTML structure from the tutorial 🤣)

Created this test file:

<div class="card">
  <h5 class="card-title">Python for beginners</h5>
  <a href="#" class="btn btn-primary">Start for 20$</a>
</div>
<!-- ... plus two more course cards -->

The Magic Code:

from bs4 import BeautifulSoup

# read the local test file (the filename here is just an example)
with open('courses.html', encoding='utf-8') as file:
    html_content = file.read()

soup = BeautifulSoup(html_content, 'lxml')
courses = soup.find_all('div', class_='card')

for course in courses:
    name = course.h5.text
    price = course.a.text.split()[-1]
    print(f"{name} costs {price}")

Output:

Python for beginners costs 20$
Python Web Development costs 50$
Python Machine Learning costs 100$

My Reaction:

[Mind-blown GIF]

"IT WORKED ON THE FIRST TRY! ๐ŸŽ‰"


2. The Real Challenge: Scraping Live Job Listings

I targeted TimesJobs for Python jobs.

Reality Check:

  • Real websites use dynamic content

  • Their HTML has weird class names (seriously, who names these?!)

  • They fight back against scrapers (see the small sketch just below)
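
On that last point, one trick that often helps is sending a browser-like User-Agent header with the request, so the site treats the scraper more like a normal visitor. This is only a minimal sketch; the header value and URL are just examples:

import requests

# many sites ignore or block requests that arrive with no User-Agent,
# so pretend to be a regular browser (this header string is only an example)
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get("https://m.timesjobs.com/", headers=headers, timeout=10)
print(response.status_code)  # 200 means the page came back fine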

My Detective Work:

  1. Used Ctrl+Shift+I to inspect the page

  2. Discovered:

    • Jobs were wrapped in <li> tags

    • Skills hid in <div class="srp-keyskills">

The Breakthrough Code:

import requests
from bs4 import BeautifulSoup

def find_jobs(skill):
    url = f"https://m.timesjobs.com/...?txtKeywords={skill}"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')

    jobs = []
    for job in soup.find_all('li'):
        # each job card sits in an <li>; the skills live in their own div
        skills = job.find('div', class_='srp-keyskills')
        if skills and skill.lower() in skills.text.lower():
            jobs.append({
                'title': job.find('h3').text.strip(),
                'company': job.find('span', class_='srp-comp-name').text.strip(),
                'skills': [tag.text.strip() for tag in skills.find_all('a')]
            })
    return jobs
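
And here's roughly how I called it; a minimal sketch, and remember the URL above is truncated, so the real TimesJobs search URL has to go in first:

# quick sanity check of the scraper (the slice just keeps the output short)
python_jobs = find_jobs('Python')
for job in python_jobs[:5]:
    print(job['title'], '-', job['company'])
    print('  skills:', ', '.join(job['skills']))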

"When it works, you won't stopโ€”I played with the data for over an hour! It's like magic โœจ"


Making It Useful: CSV Export

import csv

def save_jobs(jobs, filename="jobs.csv"):
    with open(filename, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=['title', 'company', 'skills'])
        writer.writeheader()
        for job in jobs:
            # join the skills list into one readable cell instead of dumping the Python list
            writer.writerow({**job, 'skills': ', '.join(job['skills'])})

Now I could analyze jobs in Excel or Python!
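
For the Python side, loading the file back is a one-liner with pandas (assuming pandas is installed; "jobs.csv" is just the default filename from save_jobs):

import pandas as pd

# load the exported jobs and take a quick look
df = pd.read_csv("jobs.csv")
print(df.head())
print(df['company'].value_counts().head())  # which companies post the most Python jobs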


Final Thoughts

Web scraping feels like a superpower 💥. With a few lines of code, you can harvest insights from the entire internet.

"While others wait for the next lecture, I'm already feeling like a coding wizard! ๐Ÿง™โ™‚๏ธ"

What will YOU scrape first? Let me know in the comments!

🔗 Full Code: GitHub Repo
🐦 Follow Me: @BoussanniEl

#Python #WebScraping #Automation #LearningToCode

