Page Zen: The Open-Source Article Cleaning API You've Been Waiting For

In today's information-rich world, we're constantly bombarded with cluttered web articles filled with ads, popups, navigation menus, and other distractions. What if you could extract just the essential content from any article with a simple API call? Meet Page Zen - an open-source, self-hostable solution that transforms messy web articles into clean, readable content.
What is Page Zen?
Page Zen is a powerful Go-based API service that takes any article URL and returns clean, distraction-free content in multiple formats. Whether you're building a reading app, content aggregator, or just want to save articles without the clutter, Page Zen has you covered.
Key Features
Clean Article Extraction - Removes ads, navigation, social widgets, and other noise
Multiple Output Formats - Get content as clean text or markdown
Open Graph Metadata - Extract rich social media metadata
Medium Article Support - Works perfectly with Medium and other popular platforms
Self-Hostable - Complete control over your data and infrastructure
Open Source - MIT licensed, community-driven development
Why Choose Page Zen?
1. Open Source & Self-Hostable
Unlike proprietary services that lock you into their ecosystem, Page Zen is completely open source. You can:
Host it on your own infrastructure
Customize it for your specific needs
Never worry about API rate limits or service shutdowns
Maintain complete control over your data
2. Works with Any Article Platform
Page Zen intelligently handles content from various sources:
Medium articles
News websites
Blog posts
Technical documentation
And virtually any web article!
3. Rich Metadata Extraction
Beyond just cleaning content, Page Zen extracts comprehensive Open Graph metadata:
Article title and description
Author information
Publication dates
Social media images
Twitter Card data
And much more!
Easy to Deploy
Getting started with Page Zen is incredibly simple. The project includes Docker support for easy deployment:
# Clone the repository and enter it
git clone https://github.com/rohithgilla12/page-zen.git
cd page-zen
# Run with Docker Compose
docker-compose up -d
That's it! Your article cleaning API is now running locally. The examples below assume it is listening on port 8080.
API Usage Examples
Extract Article Content
curl -X POST http://localhost:8080/extract \
-H "Content-Type: application/json" \
-d '{"url": "https://itnext.io/essential-cli-tui-tools-for-developers-7e78f0cd27db", "include_markdown":true}'
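The same request can be made from application code. The following is a minimal sketch using only Python's standard library. It assumes the service is reachable at http://localhost:8080, as in the curl example; the helper names `build_extract_request` and `extract_article` are illustrative and not part of Page Zen. The shape of the `/extract` response is not shown in this article, so the function simply returns the decoded JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: same address as the curl example


def build_extract_request(article_url, include_markdown=True, base_url=BASE_URL):
    """Build the POST request for /extract without sending it."""
    payload = {"url": article_url, "include_markdown": include_markdown}
    return urllib.request.Request(
        f"{base_url}/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_article(article_url, include_markdown=True, timeout=30):
    """Send the request and return the decoded JSON body as a dict."""
    request = build_extract_request(article_url, include_markdown)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a running instance, `extract_article("https://itnext.io/essential-cli-tui-tools-for-developers-7e78f0cd27db")` should return the same payload as the curl call. Splitting request construction from sending keeps the first half easy to unit test without a server.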
Extract Open Graph Data Only
curl -X POST http://localhost:8080/opengraph \
-H "Content-Type: application/json" \
-d '{"url": "https://dev.to/gillarohith/develop-url-shortener-application-with-redwood-js-3cf7"}'
Response:
{
  "url": "https://dev.to/gillarohith/develop-url-shortener-application-with-redwood-js-3cf7",
  "open_graph": {
    "title": "Develop URL shortener application with Redwood JS.",
    "description": "Develop URL shortener application with RedwoodJS Introduction What is...",
    "image": "https://media2.dev.to/dynamic/image/width=1000,height=500,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77phvxr1c3i00fvv0jly.png",
    "url": "https://dev.to/gillarohith/develop-url-shortener-application-with-redwood-js-3cf7",
    "type": "article",
    "site_name": "DEV Community",
    "twitter_card": "summary_large_image",
    "twitter_site": "@thepracticaldev",
    "twitter_creator": "@gillarohith",
    "twitter_title": "Develop URL shortener application with Redwood JS.",
    "twitter_description": "Develop URL shortener application with RedwoodJS Introduction What is..."
  },
  "success": true
}
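A response like this maps naturally onto a link preview card. The sketch below is one possible way to consume it in Python, assuming the field names match the sample response. The `link_preview` helper is hypothetical, and treating a false `success` value as an error is an assumption about how failures are reported.

```python
import json


def link_preview(response):
    """Reduce an /opengraph response to the fields a link card needs."""
    # Assumption: a falsy "success" flag means extraction failed.
    if not response.get("success"):
        raise ValueError("Open Graph extraction did not succeed")
    og = response.get("open_graph", {})
    return {
        # Prefer Open Graph values, fall back to the Twitter Card equivalents.
        "title": og.get("title") or og.get("twitter_title"),
        "description": og.get("description") or og.get("twitter_description"),
        "image": og.get("image"),
        "site": og.get("site_name"),
        "url": og.get("url") or response.get("url"),
    }


# Trimmed copy of the sample response, with some fields left out on purpose
# to exercise the fallbacks.
sample = json.loads("""
{
  "url": "https://dev.to/gillarohith/develop-url-shortener-application-with-redwood-js-3cf7",
  "open_graph": {
    "title": "Develop URL shortener application with Redwood JS.",
    "site_name": "DEV Community",
    "twitter_card": "summary_large_image",
    "twitter_description": "Develop URL shortener application with RedwoodJS Introduction What is..."
  },
  "success": true
}
""")

preview = link_preview(sample)
print(preview["title"])
print(preview["description"])  # taken from twitter_description here
```

Missing fields come back as `None`, so the caller can decide whether to render a card without an image or skip it.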
Perfect Use Cases
Content Aggregators: Build clean RSS feeds or news aggregators
Reading Apps: Create distraction-free reading experiences
Research Tools: Extract clean content for analysis
Social Media Tools: Get rich preview data for link sharing
Documentation: Convert web articles to clean markdown
Advanced Features
Page Zen goes beyond basic article extraction:
Image Processing: Converts complex picture elements to simple img tags
URL Resolution: Handles relative URLs and converts them to absolute paths
Smart Content Detection: Uses Mozilla's Readability algorithm for accurate content extraction
Configurable Cleaning: Remove specific elements based on your needs
Comprehensive Logging: Built-in structured logging for debugging and monitoring
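To make the URL resolution item concrete, here is a small illustration using Python's standard `urllib.parse.urljoin`. It is not Page Zen's implementation (the service itself is written in Go); the `absolutize` helper, the base URL, and the paths are made up for the example.

```python
from urllib.parse import urljoin


def absolutize(base_url, links):
    """Resolve each link against the URL of the page it was found on."""
    return [urljoin(base_url, link) for link in links]


base = "https://example.com/blog/2024/post.html"  # hypothetical article URL
links = [
    "../images/cover.png",           # relative to the parent directory
    "/about",                        # relative to the site root
    "notes.html",                    # relative to the current directory
    "https://cdn.example.net/a.js",  # already absolute, left unchanged
]

for original, resolved in zip(links, absolutize(base, links)):
    print(f"{original} -> {resolved}")
```

Resolving links this way matters for extracted content because the cleaned article is read outside its original page, where relative image and link paths would otherwise break.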
Join the Community
Page Zen is more than just a tool - it's a community-driven project that welcomes contributions:
Report bugs and suggest features
Contribute code and improvements
Improve documentation
Star the repo to show your support
Get Started Today
Ready to clean up the web? Here's how to get started:
Try it locally: Clone the repo and run with Docker
Deploy to production: Use the included Dockerfile for easy deployment
Integrate: Start making API calls from your application
Customize: Fork the project and adapt it to your needs
Links:
GitHub Repository: https://github.com/rohithgilla12/page-zen
API Documentation
Page Zen - Because the web deserves to be readable.
Written by Rohith Gilla
