Deep Dive into LLM Engineering #01: Simplifying Web Summaries with AI


1. Introduction
Lately, I’ve decided to dive into the fascinating world of artificial intelligence. Coming from a broad technological background in application development, cloud architecture, and more, I thought: “Why not explore this field?”
For the past few months, I’ve been working on my own AI project, and now I’ve decided to deepen my understanding of certain topics with the goal of transforming myself into an LLM (Large Language Model) engineer. Let’s see where this takes me, haha!
My idea is to share my learning journey with you. Of course, I’ll try to start from scratch so everyone can follow along. In this post, I aim to show you a fairly simple program using OpenAI. If you want to join in, I recommend getting an API key from their official website. Add $10 to your account for practice, and believe me: $10 will be more than enough for local work. I repeat, local work! Haha.
Repo: https://github.com/arodriguezp2003/web-content-summarizer
2. The Goal: Building a Web Content Summarizer
The goal of this project is to create a program that takes a URL as input, processes its content, and generates a concise summary using OpenAI’s models. While this task might seem straightforward, it requires carefully combining multiple components:
Extracting relevant content from a website.
Designing effective prompts to guide the AI.
Managing API interactions and error handling.
3. Core Components of the Program
a) Installation
Open a terminal and create a folder for the project.
Create a Python virtual environment and activate it (macOS):
    python3 -m venv llm
    source llm/bin/activate
Install the dependencies:
    pip install requests beautifulsoup4 python-dotenv openai
Create a .env file:
    OPENAI_API_KEY="your api key"
b) Website Content Extraction (Website Class)
The Website class is responsible for downloading, cleaning, and preparing the content of a webpage. Using BeautifulSoup, we extract the page’s title and main text while filtering out unnecessary elements like scripts and styles.
Here’s how the Website class works:
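# core/website.py
import requests
from bs4 import BeautifulSoup


class Website:
    def __init__(self, url):
        # A timeout keeps the script from hanging on an unresponsive server
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        # soup.title.string can be None (empty or nested <title>), so check both
        has_title = soup.title and soup.title.string
        self.title = soup.title.string if has_title else 'No title found'
        # Remove elements that carry no readable text
        for tag in soup(['script', 'style', 'img', 'input']):
            tag.decompose()
        self.text = ' '.join(soup.body.stripped_strings) if soup.body else ''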
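To see what the cleaning step does without making a network request, here is a minimal sketch that applies the same idea (skip script and style contents, join the remaining text) to an inline HTML string. It deliberately uses Python’s built-in html.parser instead of BeautifulSoup so it runs with no extra dependencies; TextExtractor and visible_text are hypothetical names for this illustration and are not part of the repo.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside a skipped element
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = """
<body>
  <h1>Hello</h1>
  <script>alert('x');</script>
  <style>p { color: red; }</style>
  <p>Readable text.</p>
</body>
"""
print(visible_text(sample))
```

The script and style contents never reach the output, which is the same effect the decompose() loop has in the Website class.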
c) Prompt Management (Prompts Class)
The Prompts class organizes interactions with OpenAI, crafting tailored messages based on the content of a website. It includes a system_prompt method to guide the AI, a user_prompt_for method that embeds the page’s title and text, and a messages_for method to structure the conversation.
Here’s an example snippet:
# core/prompts.py
class Prompts:
    def system_prompt(self):
        return (
            "You are an assistant that analyzes the contents of a website "
            "and provides a short summary, ignoring text that might be navigation related. "
            "Respond in markdown."
        )

    def user_prompt_for(self, website):
        user_prompt = f"You are looking at a website titled {website.title}"
        user_prompt += (
            "\nThe contents of this website are as follows; "
            "please provide a short summary of this website in markdown. "
            "If it includes news or announcements, then summarize these too.\n\n"
        )
        user_prompt += website.text
        return user_prompt

    def messages_for(self, website):
        # Chat models expect a list of role/content dictionaries
        return [
            {"role": "system", "content": self.system_prompt()},
            {"role": "user", "content": self.user_prompt_for(website)}
        ]
d) Summarizing with OpenAI
The summarize function ties everything together. It extracts content using the Website class, prepares the prompt via Prompts, and sends a request to OpenAI’s API to generate the summary.
Here’s how it looks:
# main.py
import os

from dotenv import load_dotenv
from openai import OpenAI

from core.website import Website
from core.prompts import Prompts

# Load OPENAI_API_KEY from the .env file into the environment
load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
    raise ValueError('API key is missing')

client = OpenAI(api_key=api_key)
prompts = Prompts()


def summarize(url):
    website = Website(url)
    messages = prompts.messages_for(website)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(summarize("https://alerodriguez.dev"))
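Before spending API credits, it can help to look at what would actually be sent to the model. The sketch below builds the same system/user message layout from a stand-in object instead of a real Website, and trims very long page text first. The names truncate, build_messages, and MAX_CHARS are hypothetical, the prompt wording is shortened, and the 8000-character budget is only an assumption for illustration, not a limit taken from OpenAI’s documentation.

```python
from types import SimpleNamespace

MAX_CHARS = 8000  # assumed budget for this sketch, not an official limit


def truncate(text, max_chars=MAX_CHARS):
    """Cut text at the last space before max_chars if it is too long."""
    if len(text) <= max_chars:
        return text
    cut = text.rfind(" ", 0, max_chars)
    if cut <= 0:
        cut = max_chars  # no usable space found: fall back to a hard cut
    return text[:cut].rstrip() + " [truncated]"


def build_messages(website, max_chars=MAX_CHARS):
    """Mirror the system/user layout used in this post, with trimmed text."""
    system = "You are an assistant that summarizes a website. Respond in markdown."
    user = (
        f"You are looking at a website titled {website.title}\n"
        "Please provide a short summary of its contents.\n\n"
        + truncate(website.text, max_chars)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


# Stand-in for a Website instance, so nothing is downloaded
page = SimpleNamespace(title="Demo page", text="word " * 50)
messages = build_messages(page, max_chars=40)
for message in messages:
    print(message["role"], "->", len(message["content"]), "characters")
```

Counting characters is a rough proxy; a token-based limit would be more precise, but this is enough to keep a very large page from producing an oversized request.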
4. Results
Running this program produces a concise summary of any webpage’s content. For example, summarizing https://alerodriguez.dev might yield something like:
# Summary of Alejandro Rodríguez's Team Blog
Alejandro Rodríguez's Blog features insightful posts that document lessons learned from practical experiences in cloud computing, particularly with Google Cloud services.
## Recent Posts
- **Lesson Learned #02: Docker Images for Google Cloud Run on Mac M1 (ARM)**
*Date: Nov 14, 2024*
A brief reflection on the challenges faced when applying theoretical knowledge to practice, specifically relating to Docker images on Mac M1.
- **Lesson Learned #01: Cloud Run with Cloud Pub/Sub**
*Date: Nov 13, 2024*
An exploration of the author's experience integrating Google Cloud Pub/Sub with Cloud Run within a microservices architecture. This post discusses the project's context and its operational complexities.
Overall, the blog serves as a resource for those interested in cloud technologies, offering practical insights and lessons from real-world applications.
5. Lessons Learned
Managing Content Complexity: Extracting clean, relevant content from web pages is key to effective AI interactions.
Crafting Effective Prompts: Well-structured prompts significantly impact the quality of AI-generated responses.
API Handling: Robust error handling and API key management are essential for smooth operation.
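As one possible way to act on the API handling lesson, here is a minimal sketch of retrying a call with exponential backoff. It is not part of the repo: with_retries and flaky_summarize are hypothetical names, and the fake function stands in for a real network call so the example runs offline. In the real program you might pass lambda: summarize(url) and limit retry_on to network or rate-limit errors from requests and the OpenAI library.

```python
import time


def with_retries(call, attempts=3, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Run call(); on a listed exception wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)


# Demo with a fake call that fails twice before succeeding
state = {"calls": 0}


def flaky_summarize():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary network problem")
    return "summary text"


delays = []  # record the waits instead of actually sleeping
result = with_retries(flaky_summarize, retry_on=(ConnectionError,),
                      sleep=delays.append)
print(result, delays)
```

Passing sleep as a parameter keeps the demo instant and makes the helper easy to test; by default it uses time.sleep.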
6. What’s Next?
This introduction lays the foundation for more advanced use cases of LLMs. In upcoming posts, I plan to:
Explore embedding generation and semantic search.
Dive into fine-tuning prompts for specific applications.
Experiment with multi-layered NLP pipelines.
Stay tuned for the next post in the Deep Dive into LLM Engineering series! If you’re building something similar or have ideas to share, drop a comment. Let’s learn together! 🚀
Written by

Alejandro Rodríguez
I’m Alejandro, a tech leader with a solid track record of transforming millions of lives through innovative digital solutions. As a Tech Lead at Banco Falabella, I spearheaded the implementation of Apple Pay in Chile, positioning the bank as a pioneer in mobile payments. I also led the launch of Transfiya in Colombia, a platform now enabling over 15 million users to make instant transfers, representing 44% of the country’s banked population. My education in Digital Transformation at MIT strengthens my commitment to continuous learning and my ability to lead at the intersection of technology and social impact. Currently based in Malta, I’m improving my English and expanding my professional horizons to explore new global perspectives.
With a blend of advanced technical skills and management expertise, I lead high-performing teams toward innovation, fostering a collaborative culture where each team member contributes and grows. My projects not only improve operational efficiency but also create a tangible impact on millions of lives. My vision is clear: to break technological barriers and build a more connected and efficient future.