📰 Scraping and Summarizing Kenyan Political News with Python

kasumbi phil

In this blog, I’ll walk you through how I built a Python script that scrapes political news from Standard Media Kenya, summarizes each article using NLP, and saves the results to a .txt file.


πŸ› οΈ What I Built

I wanted a simple tool to help me stay up to date with Kenyan politics without manually reading every article.

So I built a Python script that:

  • Scrapes article links from the Politics section

  • Downloads and parses the full article using newspaper3k

  • Summarizes each article into 3 sentences using Sumy (LexRank)

  • Saves the title, URL, summary, and full text to a .txt file


🧠 Tools and Libraries Used

  • requests + BeautifulSoup: for scraping

  • newspaper3k: for extracting full article text

  • sumy: for summarizing using LexRank

  • nltk: for tokenization

  • time: for basic delays (not currently used for auto-scheduling)
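
To make the summarization step concrete, here is a minimal sketch of how a 3-sentence LexRank summary can be produced with sumy. The function name `summarize` and its parameters are my own illustration, not necessarily what the repo uses. It also assumes sumy is installed and that the NLTK sentence tokenizer data (for example via `nltk.download("punkt")`) is already available.

```python
def summarize(text, sentence_count=3, language="english"):
    """Return up to `sentence_count` summary sentences as plain strings.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not text or not text.strip():
        # Nothing to summarize, so skip loading sumy entirely.
        return []

    # Imported lazily so this module still loads where sumy is missing.
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.lex_rank import LexRankSummarizer

    # The tokenizer relies on NLTK sentence data for the given language.
    parser = PlaintextParser.from_string(text, Tokenizer(language))
    summarizer = LexRankSummarizer()

    # Calling the summarizer returns the top-ranked sentences in document order.
    return [str(sentence) for sentence in summarizer(parser.document, sentence_count)]
```

Returning a list of strings rather than one joined string keeps the formatting decision (bullets, line breaks) in the code that writes the file.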


⚠️ What I Struggled With

🔗 Problem: Many URLs were either incomplete or unusable
✅ Fix: I checked and cleaned URLs early before passing to Article().
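
A rough sketch of what this kind of URL cleanup can look like, using only the standard library. The base URL and the helper names `clean_url` and `clean_urls` are assumptions for illustration; the exact rules in the real script may differ.

```python
from urllib.parse import urljoin, urlparse

# Assumed base URL for the site; adjust to the page actually being scraped.
BASE_URL = "https://www.standardmedia.co.ke/"


def clean_url(href, base_url=BASE_URL):
    """Return an absolute http(s) URL, or None if the href is unusable."""
    if not href:
        return None
    href = href.strip()
    # Skip empty links, in-page anchors, and non-web schemes.
    if not href or href.lower().startswith(("#", "javascript:", "mailto:")):
        return None

    # Resolve relative paths such as "/politics/..." against the base URL.
    parsed = urlparse(urljoin(base_url, href))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None

    # Drop the fragment so "#comments" variants count as the same article.
    return parsed._replace(fragment="").geturl()


def clean_urls(hrefs, base_url=BASE_URL):
    """Clean a list of hrefs, dropping unusable ones and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        url = clean_url(href, base_url)
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Only the URLs that survive `clean_urls` would then be handed to `Article()`.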

📄 Problem: Some articles didn’t load or were empty
✅ Fix: Wrapped downloads in a try-except block to skip broken links safely.
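
Here is a minimal sketch of that try-except pattern around newspaper3k's `Article`. The function name `fetch_article` and the `article_factory` parameter are hypothetical; the parameter exists only so the function can be exercised without network access.

```python
def fetch_article(url, article_factory=None):
    """Download and parse one article; return a dict, or None on failure.

    Hypothetical helper: `article_factory` defaults to newspaper's Article.
    """
    if article_factory is None:
        # Imported lazily so this module loads without newspaper3k installed.
        from newspaper import Article
        article_factory = Article

    try:
        article = article_factory(url)
        article.download()
        article.parse()
    except Exception as exc:
        # Any download or parse failure skips this link instead of crashing.
        print(f"Skipping {url}: {exc}")
        return None

    text = (article.text or "").strip()
    if not text:
        # The page loaded but no article body was extracted.
        print(f"Skipping {url}: empty article text")
        return None

    return {"title": article.title, "url": url, "text": text}
```

Because failures come back as `None`, the main loop can simply `continue` to the next link when a fetch fails.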

πŸ“ Problem: Summaries were hard to read in the output
βœ… Fix: Clean formatting and line breaks made the .txt readable.


πŸ“ Output

The .txt file contains:

  • ✅ Title of each article

  • ✅ Link to the article

  • ✅ 3-sentence summary

Each article is neatly separated with clear headings and lines.
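
As an illustration, the layout described above could be produced with something like the sketch below. The separator style, the labels, and the names `format_entry` and `save_report` are assumptions of mine; the actual file produced by the script may look different.

```python
SEPARATOR = "=" * 60  # Assumed divider between articles.


def format_entry(index, title, url, summary_sentences):
    """Build the text block for one article."""
    lines = [
        SEPARATOR,
        f"ARTICLE {index}",
        SEPARATOR,
        f"Title: {title}",
        f"URL: {url}",
        "",
        "Summary:",
    ]
    # One summary sentence per line keeps the summary easy to scan.
    lines.extend(f"  - {sentence}" for sentence in summary_sentences)
    lines.append("")
    return "\n".join(lines) + "\n"


def save_report(entries, path):
    """Write all entries to a UTF-8 text file.

    Each entry is a dict with "title", "url", and "summary" (list of str).
    """
    with open(path, "w", encoding="utf-8") as handle:
        for index, entry in enumerate(entries, start=1):
            handle.write(
                format_entry(index, entry["title"], entry["url"], entry["summary"])
            )
```

Writing with an explicit `encoding="utf-8"` avoids garbled characters when article titles contain curly quotes or other non-ASCII text.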


📦 GitHub Repo

🔗 View the full project on GitHub


💡 Final Thoughts

This small but powerful project helped me improve my scraping and NLP skills. If you’re looking to automate news consumption or build a simple NLP tool, this is a great place to start.
