Building an Automated Web Scraper Using Supabase Edge Functions

Table of contents
- The Problem to Solve
- The Solution: Supabase Edge Functions + PostgreSQL
- Step 1: Setting Up the Database Schema
- Step 2: Creating the Edge Function
- Step 3: Local Development and Testing
- Step 4: Deploying the Edge Function
- Step 5: Testing the Deployed Function
- Step 6: Setting Up Automatic Scheduling
- Key Benefits of This Approach
- Challenges and Solutions
- Conclusion
- Next Steps

Web scraping is a common task in many applications, whether you're collecting pricing data, monitoring content updates, or aggregating information from multiple sources. While there are many ways to build a scraper, creating one that runs automatically on a schedule without managing your own server infrastructure can be challenging.
In this post, I'll walk through how I built a fully automated web scraping solution using Supabase's free tier. This solution periodically extracts data from a website and stores it in a database, all without managing any infrastructure.
The Problem to Solve
I needed to monitor a specific website for daily content updates. The content included images and text that changed daily, and I wanted to archive this information automatically. Specifically, I needed to:
- Extract the image URL and title text for each day's content
- Store this data in a structured database
- Run the scraper automatically every few hours
- Do all of this without managing servers
The Solution: Supabase Edge Functions + PostgreSQL
My solution combines several technologies:
- Supabase Edge Functions - serverless functions that run our scraping code
- Supabase PostgreSQL - database to store the scraped data
- pg_cron - PostgreSQL extension for scheduling tasks
- Cheerio - library for parsing HTML (an npm package, used via Deno's npm compatibility)
- Axios - HTTP client for making web requests
Step 1: Setting Up the Database Schema
First, I created a table in Supabase to store the scraped data:
```sql
create table daily_images (
  id serial primary key,
  date date unique not null,
  -- left nullable so the Edge Function's fallback path can store null here
  -- and rely on fallback_url instead
  image_url text,
  title text,
  created_at timestamp default now()
);

-- Add a fallback URL column for cases where scraping fails
alter table daily_images add column fallback_url text;
```
This schema allows me to:

- Store one entry per day (using the unique constraint on date)
- Save both the image URL and a descriptive title
- Track when each entry was created
- Keep a fallback image URL for days when the scraper fails to find an image (a read-side sketch of how this is consumed follows below)
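To show how this table is meant to be consumed, here is a small read-side sketch using supabase-js. It is my own illustration rather than part of the scraper: the URL and key are placeholders, and it assumes the key you use is allowed to read the table.

```typescript
// Hypothetical read-side usage of the daily_images table (not part of the scraper).
// Assumes the key used here is permitted to select from the table.
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'

const supabase = createClient('https://your-project-ref.supabase.co', 'your-anon-key')

async function getTodaysImage() {
  const today = new Date().toISOString().slice(0, 10) // 'yyyy-MM-dd', matching the date column

  const { data, error } = await supabase
    .from('daily_images')
    .select('image_url, fallback_url, title')
    .eq('date', today)
    .maybeSingle()

  if (error || !data) return { url: null, title: null }

  // Prefer the scraped image; fall back to fallback_url when scraping failed.
  return { url: data.image_url ?? data.fallback_url ?? null, title: data.title }
}
```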
Step 2: Creating the Edge Function
Next, I wrote a Supabase Edge Function to perform the actual scraping. Edge Functions run on Deno, which is different from Node.js, so the import syntax is slightly different:
```typescript
import { serve } from 'https://deno.land/std@0.177.0/http/server.ts'
import axios from 'npm:axios'
import * as cheerio from 'npm:cheerio'
import { format } from 'npm:date-fns'
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'

const supabaseUrl = Deno.env.get('SUPABASE_URL') as string
const supabaseServiceKey = Deno.env.get('SUPABASE_SERVICE_ROLE_KEY') as string
const supabase = createClient(supabaseUrl, supabaseServiceKey)

async function scrapeContent() {
  try {
    console.log('Starting scraping process...')

    // Fetch the target webpage
    const response = await axios.get('https://example.com/daily-content')
    console.log('Successfully fetched the webpage')

    const $ = cheerio.load(response.data)
    console.log('Loaded HTML content with cheerio')

    // Format today's date to match how it appears on the site
    const targetDate = format(new Date(), 'MMMM d')
    console.log('Looking for date:', targetDate)

    // Find the heading containing today's date
    const targetHeading = $(`h2.wp-block-heading:contains('${targetDate}')`)
    console.log('Found matching headings:', targetHeading.length)

    if (!targetHeading.length) {
      console.log('No heading found for today\'s date')
      return { imageUrl: null, title: null }
    }

    // Extract the heading text to use as the title
    const title = targetHeading.text().trim()

    // Find the gallery element that follows the heading
    const gallery = targetHeading.next('figure.wp-block-gallery')
    console.log('Found gallery:', gallery.length > 0 ? 'yes' : 'no')

    // Get the first image from the gallery
    const firstImage = gallery.find('img').first().attr('src')
    console.log('Image URL found:', firstImage || 'none')

    return { imageUrl: firstImage || null, title }
  } catch (error) {
    console.error('Scraping failed with error:', error.message)
    return { imageUrl: null, title: null }
  }
}

serve(async (req) => {
  try {
    // Get the image URL and title
    const { imageUrl, title } = await scrapeContent()

    if (imageUrl) {
      // Get today's date formatted for the database
      const today = format(new Date(), 'yyyy-MM-dd')

      // Upsert the new image URL and title for today
      const { data, error } = await supabase
        .from('daily_images')
        .upsert(
          {
            date: today,
            image_url: imageUrl,
            title: title || `Daily Content - ${today}`,
            created_at: new Date().toISOString()
          },
          { onConflict: 'date' }
        )

      if (error) throw error

      return new Response(JSON.stringify({
        success: true,
        imageUrl,
        title
      }), {
        headers: { 'Content-Type': 'application/json' },
        status: 200
      })
    } else {
      // If no image is found, store a fallback URL instead
      const fallbackUrl = 'https://example.com/default-image.jpg'
      const today = format(new Date(), 'yyyy-MM-dd')

      // Upsert the fallback information
      const { data, error } = await supabase
        .from('daily_images')
        .upsert(
          {
            date: today,
            image_url: null,
            fallback_url: fallbackUrl,
            title: `Daily Content (Fallback) - ${today}`,
            created_at: new Date().toISOString()
          },
          { onConflict: 'date' }
        )

      if (error) throw error

      return new Response(JSON.stringify({
        success: false,
        message: 'No image found, fallback used',
        fallbackUrl
      }), {
        headers: { 'Content-Type': 'application/json' },
        status: 404
      })
    }
  } catch (error) {
    return new Response(JSON.stringify({ success: false, error: error.message }), {
      headers: { 'Content-Type': 'application/json' },
      status: 500
    })
  }
})
```
This function:

- Searches the webpage for content related to today's date
- Extracts the image URL and title text
- Upserts that information into the Supabase database
- Handles failures gracefully by falling back to a default URL

The console.log statements are there deliberately: the logs of a deployed Edge Function can be viewed in the Supabase dashboard, which makes remote debugging much easier.
Step 3: Local Development and Testing
Before deploying, I needed to set up a local development environment to test my Edge Function. Supabase Edge Functions run on Deno (not Node.js), so the local setup needs to reflect this.
Setting Up the Local Environment
First, I installed Deno:
```bash
# On macOS or Linux
curl -fsSL https://deno.land/x/install/install.sh | sh

# On Windows (using PowerShell)
iwr https://deno.land/x/install/install.ps1 -useb | iex

# Or using Homebrew
brew install deno
```
Next, I needed to create a local Supabase setup for testing. The Supabase CLI uses Docker to run a local development environment:
```bash
# Install Docker if you don't have it
# Instructions vary by OS: https://docs.docker.com/get-docker/

# Install the Supabase CLI (global npm installs are not supported;
# use a package manager such as Homebrew, or add it as a dev dependency)
brew install supabase/tap/supabase

# Initialize a new Supabase project
mkdir my-scraper && cd my-scraper
supabase init

# Start the local Supabase stack
supabase start
```
This command spins up Docker containers with PostgreSQL, PostgREST, GoTrue, and other Supabase services locally.
Creating and Testing the Edge Function Locally
With the local environment running, I created and tested my Edge Function:
```bash
# Create a new Edge Function
supabase functions new scrape-daily-image

# Add the scraper code to the generated file at:
# supabase/functions/scrape-daily-image/index.ts

# Serve the function locally.
# SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are injected automatically when
# serving against the local stack (the local keys are printed by `supabase start`
# and `supabase status`), so no manual secret setup is needed here.
supabase functions serve --no-verify-jwt
```
With the function running locally, I tested it with a separate terminal:
```bash
curl -X POST http://localhost:54321/functions/v1/scrape-daily-image \
  -H "Content-Type: application/json"
```
Unit Testing Edge Functions
For more robust testing, Supabase Edge Functions can be unit tested with Deno's built-in testing framework.
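Here is a minimal sketch of what that can look like, assuming the cheerio parsing step is factored out into a small helper so it can run against a static HTML fixture without network access. The `extractImage` helper and the fixture below are hypothetical rather than part of the function above; the file runs with `deno test`.

```typescript
// supabase/functions/scrape-daily-image/extract_test.ts (hypothetical test file)
import { assertEquals } from 'https://deno.land/std@0.177.0/testing/asserts.ts'
import * as cheerio from 'npm:cheerio'

// Hypothetical helper: the same cheerio logic as the Edge Function, factored out
// so it can be exercised against a static fixture instead of the live site.
function extractImage(html: string, targetDate: string) {
  const $ = cheerio.load(html)
  const heading = $(`h2.wp-block-heading:contains('${targetDate}')`)
  if (!heading.length) return { imageUrl: null, title: null }
  const title = heading.text().trim()
  const imageUrl = heading.next('figure.wp-block-gallery').find('img').first().attr('src') ?? null
  return { imageUrl, title }
}

Deno.test('extracts the image URL and title for a matching date', () => {
  const fixture = `
    <h2 class="wp-block-heading">May 1</h2>
    <figure class="wp-block-gallery"><img src="https://example.com/may-1.jpg"></figure>`
  const result = extractImage(fixture, 'May 1')
  assertEquals(result.imageUrl, 'https://example.com/may-1.jpg')
  assertEquals(result.title, 'May 1')
})

Deno.test('returns nulls when no heading matches today', () => {
  assertEquals(extractImage('<p>nothing here</p>', 'May 1'), { imageUrl: null, title: null })
})
```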
For more detail on testing Edge Functions, see Supabase's official documentation on unit testing with Deno Test.
Step 4: Deploying the Edge Function
With local testing complete, I was ready to deploy the function to Supabase:
```bash
# Log in to Supabase
supabase login

# Deploy the function
supabase functions deploy scrape-daily-image --no-verify-jwt
```
The `--no-verify-jwt` flag allows the function to be called without authentication. For production use with sensitive operations, you might want to require authentication by omitting this flag.

After deployment, the function is available at https://your-project-ref.functions.supabase.co/scrape-daily-image.
Environment Variables for Production
Deployed Edge Functions automatically receive SUPABASE_URL, SUPABASE_ANON_KEY, and SUPABASE_SERVICE_ROLE_KEY, so the function needed no extra configuration to connect to the production database. Any additional configuration the function might need can be pushed with `supabase secrets set KEY="value" --project-ref your-project-ref` (names starting with SUPABASE_ are reserved).
Step 5: Testing the Deployed Function
Once deployed, I tested the function by invoking it directly:
```bash
curl -X POST https://your-project-ref.functions.supabase.co/scrape-daily-image \
  -H "Authorization: Bearer your-anon-key" \
  -H "Content-Type: application/json"
```
This ensured that my function could:

- Successfully connect to the target website
- Parse the HTML and extract the needed information
- Save the data to my Supabase database

The same call can also be made from application code, as sketched below.
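Besides curl, the deployed function can be invoked from supabase-js. This is an illustrative sketch with placeholder credentials, not something the scraper itself requires:

```typescript
// Hypothetical client-side invocation of the deployed Edge Function.
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'

const supabase = createClient('https://your-project-ref.supabase.co', 'your-anon-key')

const { data, error } = await supabase.functions.invoke('scrape-daily-image', { body: {} })

if (error) {
  console.error('Invocation failed:', error.message)
} else {
  console.log('Scrape result:', data) // e.g. { success: true, imageUrl: '...', title: '...' }
}
```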
Step 6: Setting Up Automatic Scheduling
The final piece was setting up automatic scheduling. For this, I used Supabase's PostgreSQL database with the pg_cron extension:
```sql
-- Enable the required extensions
CREATE EXTENSION IF NOT EXISTS pg_cron;
CREATE EXTENSION IF NOT EXISTS pg_net;

-- Create a function that calls the Edge Function via HTTP
CREATE OR REPLACE FUNCTION call_scrape_edge_function()
RETURNS void AS $$
BEGIN
  PERFORM net.http_post(
    url := 'https://your-project-ref.functions.supabase.co/scrape-daily-image',
    headers := '{"Authorization": "Bearer your-anon-key", "Content-Type": "application/json"}',
    body := '{}'
  );
END;
$$ LANGUAGE plpgsql;

-- Schedule the function to run every 3 hours
SELECT cron.schedule('scrape-daily-image', '0 */3 * * *', 'SELECT call_scrape_edge_function()');
```
After running this SQL, I verified that the job was correctly scheduled by checking the cron job table:
```sql
SELECT * FROM cron.job;
```

This returned:

| jobid | schedule | command | nodename | nodeport | database | username | active | jobname |
| ----- | ----------- | ---------------------------------- | --------- | -------- | -------- | -------- | ------ | ------------------ |
| 1 | 0 */3 * * * | SELECT call_scrape_edge_function() | localhost | 5432 | postgres | postgres | true | scrape-daily-image |
The scraper was now fully automated and would run every 3 hours!
Key Benefits of This Approach
Using Supabase for this project provided several advantages:
- Zero infrastructure management - no servers to maintain or monitor
- Cost-effective - runs on Supabase's free tier
- Automatic scheduling - built-in PostgreSQL scheduling with pg_cron
- Persistent storage - data is stored directly in a relational database
- Easily scalable - can handle increased load as needed
- Simple deployment - straightforward CLI-based deployment process
- Modern development experience - TypeScript support and the Deno runtime
Challenges and Solutions
Like any project, I encountered a few challenges:
- Deno vs Node.js - Supabase Edge Functions use Deno, not Node.js, requiring slightly different import syntax and environment setup.
  - Solution: Used Deno-compatible imports and tested locally with Deno before deployment.
  - Example: `import axios from 'npm:axios'` instead of `import axios from 'axios'`.
- Missing pg_net extension - Initially I got an error about the "net" schema not existing.
  - Error message: `ERROR: 3F000: schema "net" does not exist`
  - Solution: Added `CREATE EXTENSION IF NOT EXISTS pg_net;` to enable HTTP requests from PostgreSQL.
- Error handling - Needed to handle cases where the expected content wasn't found.
  - Solution: Added fallback logic and proper error handling so the process doesn't fail silently (see the retry sketch after this list for one way to push this further).
- Local development environment - Setting up a local testing environment with Deno requires some additional steps.
  - Solution: Used Docker with the Supabase CLI's `supabase start` command to create a fully functional local development environment.
- Debugging deployed functions - Once deployed, debugging can be challenging.
  - Solution: Added extensive `console.log()` statements and checked the function's logs in the Supabase dashboard.
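On the error-handling point above, here is the kind of retry wrapper I would consider for a future iteration. It is not part of the deployed function, just a sketch of how the axios.get call could be hardened against transient failures:

```typescript
// Hypothetical retry helper (not in the deployed function): retries a request
// a few times with a fixed delay before giving up.
import axios from 'npm:axios'

async function fetchWithRetry(url: string, attempts = 3, delayMs = 2000): Promise<string> {
  let lastError: unknown
  for (let i = 1; i <= attempts; i++) {
    try {
      const response = await axios.get(url)
      return response.data
    } catch (error) {
      lastError = error
      console.error(`Attempt ${i} of ${attempts} failed:`, (error as Error).message)
      if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
  throw lastError
}

// Usage inside scrapeContent(), replacing the direct axios.get call:
// const html = await fetchWithRetry('https://example.com/daily-content')
// const $ = cheerio.load(html)
```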
Conclusion
This project demonstrates how Supabase can be used to create a fully automated web scraping solution without managing any infrastructure. By combining Edge Functions, PostgreSQL, and pg_cron, I was able to build a reliable system that:
- Regularly scrapes a website for daily content
- Stores that content in a structured database
- Runs automatically on a schedule
- Handles failures gracefully
All of this was achieved using Supabase's free tier, making it a cost-effective solution for many small to medium projects.
What's particularly impressive is how this solution leverages the power of PostgreSQL not just as a database, but as a scheduler (via pg_cron) and as an HTTP client (via pg_net). This creates a truly serverless architecture where all components, from scheduling to execution to storage, are handled without managing any infrastructure.
Next Steps
To extend this project further, I could:
- Add notifications when new content is found (via email or webhooks)
- Create a simple front-end to display the archived content
- Implement more sophisticated error handling and retries
- Add authentication to protect the scraped data
- Set up monitoring to alert me if the scraper fails consistently
What scraping projects would you build with this approach? Let me know in the comments!