The Ultimate Guide to Generating PDFs from HTML with Node.js and Puppeteer


Ever needed to generate a PDF invoice, a report, or an e-ticket from your web application? It's a common requirement, but turning dynamic HTML into a pixel-perfect PDF can be surprisingly tricky.
In this guide, we'll walk you through the entire process of building a robust PDF generation script using Node.js and Puppeteer, a powerful headless browser tool. We'll cover the basics and also touch on the real-world challenges you might face in a production environment.
Prerequisites
Before we begin, make sure you have Node.js (version 18 or higher) and npm installed on your machine.
Step 1: Setting Up Your Project
First, let's create a new project directory and initialize it with a package.json file.
mkdir pdf-generator
cd pdf-generator
npm init -y
Next, we need to install Puppeteer. Installing the package also downloads a compatible browser build (Chrome for Testing in recent Puppeteer releases) that we can control programmatically.
npm install puppeteer
Step 2: The Core PDF Generation Logic
Now for the fun part. Create a file named generate.js and add the following code. This script defines a function that takes a string of HTML, launches a headless browser, and saves the rendered content as a PDF.
// generate.js
import puppeteer from 'puppeteer';
import fs from 'fs';
// Example HTML content for an invoice
const invoiceHtml = `
<html>
<head>
<style>
body { font-family: Arial, sans-serif; margin: 40px; }
.invoice-header { text-align: center; margin-bottom: 40px; }
.invoice-header h1 { margin: 0; }
.item-table { width: 100%; border-collapse: collapse; }
.item-table th, .item-table td { border: 1px solid #ddd; padding: 8px; }
.item-table th { background-color: #f2f2f2; }
.total { text-align: right; margin-top: 20px; font-weight: bold; }
</style>
</head>
<body>
<div class="invoice-header">
<h1>Invoice #123</h1>
<p>Issued: August 20, 2025</p>
</div>
<table class="item-table">
<thead>
<tr><th>Item</th><th>Quantity</th><th>Price</th></tr>
</thead>
<tbody>
<tr><td>Web Development Services</td><td>10</td><td>$150.00</td></tr>
<tr><td>API Consulting</td><td>5</td><td>$200.00</td></tr>
</tbody>
</table>
<div class="total">
Total: $2500.00
</div>
</body>
</html>
`;
async function generatePdfFromHtml(htmlContent) {
let browser;
try {
console.log('Launching browser...');
browser = await puppeteer.launch();
console.log('Opening new page...');
const page = await browser.newPage();
console.log('Setting page content...');
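// Wait until the network is idle so assets such as fonts and images have loaded
await page.setContent(htmlContent, { waitUntil: 'networkidle0' });
// Render with print CSS (@media print); page.pdf() uses print media by default, so this just makes the choice explicit
await page.emulateMediaType('print');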
console.log('Generating PDF...');
const pdfBuffer = await page.pdf({
format: 'A4',
printBackground: true
});
console.log('Saving PDF...');
fs.writeFileSync('invoice.pdf', pdfBuffer);
console.log('PDF generated successfully: invoice.pdf');
} catch (error) {
console.error('Error generating PDF:', error);
} finally {
if (browser) {
console.log('Closing browser...');
await browser.close();
}
}
}
generatePdfFromHtml(invoiceHtml);
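Because generate.js uses ES module import syntax, Node.js needs to treat it as a module: add "type": "module" to your package.json or rename the file to generate.mjs. Then you can run the script from your terminal: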
node generate.js
After a few moments, you'll have a beautifully rendered invoice.pdf in your project folder!
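The page.pdf() call in generate.js sets only format and printBackground. Invoices often also need margins and a page-numbered footer. Here is a minimal sketch of how that could look. The helper names buildPdfOptions and escapeHtml and the title parameter are hypothetical; the option keys (margin, displayHeaderFooter, headerTemplate, footerTemplate) and the pageNumber and totalPages classes are part of Puppeteer's PDF options.

```javascript
// Hypothetical helpers for illustration; not part of Puppeteer itself.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Builds an options object that can be passed to page.pdf().
function buildPdfOptions(title) {
  return {
    format: 'A4',
    printBackground: true,
    // Header and footer are drawn inside the page margins,
    // so the margins must be large enough to hold them.
    margin: { top: '25mm', right: '15mm', bottom: '20mm', left: '15mm' },
    displayHeaderFooter: true,
    // Set a font size explicitly; the template default is typically very small.
    headerTemplate:
      `<div style="font-size:9px; width:100%; text-align:center;">${escapeHtml(title)}</div>`,
    // Elements with these class names are filled in at print time.
    footerTemplate:
      '<div style="font-size:9px; width:100%; text-align:center;">' +
      'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
  };
}

// Usage inside generatePdfFromHtml:
//   const pdfBuffer = await page.pdf(buildPdfOptions('Invoice #123'));
console.log(buildPdfOptions('Invoice #123').margin.top); // prints "25mm"
```

Keeping the options in one function makes it easy to reuse the same layout in both the script and the server shown later.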
Step 3: Optimizing for Performance: From Cold to Warm Starts
Your generate.js script is a great starting point, but it has a major performance bottleneck: it launches a new browser instance for every single PDF. This "cold start" can take several seconds and is very inefficient for a real application like a web server.
Let's refactor our code to use a "warm," singleton browser instance that is launched once and reused for all subsequent requests.
First, let's turn our logic into an Express server. If you haven't already, install Express:
npm install express
Now, create a server.js file:
// server.js
import express from 'express';
import puppeteer from 'puppeteer';
const app = express();
app.use(express.json({ limit: '5mb' })); // Parse JSON bodies; the default 100kb limit is small for HTML documents
let browser; // We'll hold the browser instance here
// This is an async IIFE (Immediately Invoked Function Expression)
// to launch the browser at the start.
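(async () => {
console.log("Launching a warm browser instance...");
browser = await puppeteer.launch({
// Recent Puppeteer releases use the new headless mode when this is true
headless: true,
// Often needed when Chromium runs as root in Docker. These flags disable
// the sandbox, so only use them with HTML you trust.
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
})().catch((error) => {
console.error('Failed to launch browser:', error);
process.exit(1);
});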
app.post('/convert', async (req, res) => {
if (!browser) {
return res.status(503).send('Browser is not ready.');
}
const { html } = req.body;
if (!html) {
return res.status(400).send('HTML content is required.');
}
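let page;
try {
// Open the page inside the try block so a failure here is also caught
page = await browser.newPage();
await page.setContent(html, { waitUntil: 'networkidle0' });
await page.emulateMediaType('print');
const pdfBuffer = await page.pdf({
format: 'A4',
printBackground: true
});
res.contentType('application/pdf');
// Recent Puppeteer versions return a Uint8Array; wrap it in a Buffer
// so Express sends raw bytes rather than serializing it as JSON
res.send(Buffer.from(pdfBuffer));
} catch (error) {
console.error('Error generating PDF:', error);
res.status(500).send('Failed to generate PDF.');
} finally {
// IMPORTANT: Only close the page, not the shared browser
if (page) {
await page.close();
}
}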
});
const PORT = 3000;
app.listen(PORT, () => {
console.log(`PDF generation server listening on port ${PORT}`);
});
By moving puppeteer.launch() outside the request handler, we've eliminated the cold start. Each call to /convert now only pays for opening a page and rendering it, which is significantly faster.
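One weakness of the server above is that the browser is stored in a module-level variable that is set once at startup. If Chromium crashes later, that variable still points at a dead instance and every request fails until the process restarts. A possible refinement is to launch lazily and relaunch after a disconnect. The sketch below is one way to do it; createBrowserProvider and getBrowser are hypothetical names, and the launch function is injected (for example `() => puppeteer.launch()`). It relies on the 'disconnected' event that Puppeteer's Browser object emits.

```javascript
// Hypothetical helper: lazily launches one shared browser and relaunches
// it after a crash. `launch` is injected, e.g. () => puppeteer.launch().
function createBrowserProvider(launch) {
  let browserPromise = null;

  return function getBrowser() {
    if (!browserPromise) {
      const pending = launch().then((browser) => {
        // Fired when the browser process exits or the connection is lost.
        browser.on('disconnected', () => {
          if (browserPromise === pending) browserPromise = null;
        });
        return browser;
      });
      // A failed launch should not stay cached; the next call retries.
      pending.catch(() => {
        if (browserPromise === pending) browserPromise = null;
      });
      browserPromise = pending;
    }
    // Concurrent callers share the same in-flight launch.
    return browserPromise;
  };
}

// Usage in the /convert handler (assumption: replaces the global `browser`):
//   const getBrowser = createBrowserProvider(() => puppeteer.launch());
//   const browser = await getBrowser();
```

Caching the promise rather than the browser object means two requests arriving during startup wait for the same launch instead of starting two browsers.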
Step 4: Advanced Challenges in Production
While our server is now much faster, running Puppeteer in a production environment introduces a new set of challenges:
Server Dependencies: Your server needs all the correct system libraries to run headless Chromium. This can be complex to manage, especially in Docker containers.
Handling Dynamic Content and Authentication: What if the page you need to PDF requires a user to be logged in, or what if it needs to fetch data from an API before rendering? You can't just pass static HTML. You need to manage browser sessions and wait for asynchronous operations.
For example, to handle a page that's behind a login, you would need to:
// A sketch to illustrate the complexity; the URLs and selectors are placeholders
const page = await browser.newPage();
// 1. Navigate to the login page
await page.goto('https://myapp.com/login');
// 2. Programmatically fill in username and password
await page.type('#username', 'user');
await page.type('#password', 'pass');
// 3. Click and wait for the post-login redirect together. Starting the wait
// only after the click has resolved can miss a fast navigation.
await Promise.all([
  page.waitForNavigation(),
  page.click('#login-button'),
]);
// 4. NOW navigate to the page you want to PDF
await page.goto('https://myapp.com/my-report');
// 5. Wait for the data to load on the page (e.g., a chart)
await page.waitForSelector('#chart-container.loaded');
const pdf = await page.pdf();
This adds significant complexity around authentication, cookies, and knowing exactly what to wait for on the page.
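If your application can authenticate with a token instead of a login form, you may be able to skip the form steps entirely. The following is a minimal sketch under that assumption: pdfFromProtectedPage, its parameters, and the idea that the app accepts a bearer token and renders a "ready" selector once data has loaded are all hypothetical. The Puppeteer calls used (setExtraHTTPHeaders, goto, waitForSelector, pdf) are existing page methods. Note that the header is sent with every request the page makes, so this only suits pages that load resources from origins you trust.

```javascript
// Hypothetical helper. Assumes the app accepts a bearer token in the
// Authorization header and renders `readySelector` once its data has loaded.
async function pdfFromProtectedPage(browser, options) {
  const { url, token, readySelector, timeoutMs = 15000 } = options;
  const page = await browser.newPage();
  try {
    // Attached to every request this page initiates, including API calls.
    await page.setExtraHTTPHeaders({ Authorization: `Bearer ${token}` });
    await page.goto(url, { waitUntil: 'networkidle0', timeout: timeoutMs });
    // Fail with a timeout error instead of hanging if the data never arrives.
    await page.waitForSelector(readySelector, { timeout: timeoutMs });
    return await page.pdf({ format: 'A4', printBackground: true });
  } finally {
    // Always release the page, even when navigation or rendering fails.
    await page.close();
  }
}

// Usage (placeholder URL, token, and selector):
//   const pdf = await pdfFromProtectedPage(browser, {
//     url: 'https://myapp.com/my-report',
//     token: process.env.REPORT_TOKEN,
//     readySelector: '#chart-container.loaded',
//   });
```

The explicit timeouts matter in a server: a page that never becomes ready would otherwise hold a browser tab open until the default timeout expires.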
Performance at Scale: While our singleton pattern solves the "cold start" issue, what happens when you get 100 simultaneous requests? A single browser instance can become a bottleneck. You have to build a pooling mechanism to manage multiple browser instances, distribute the load, and handle queueing.
Maintenance: You're now responsible for keeping the server, Node.js, and Puppeteer updated to patch security vulnerabilities.
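To make the "Performance at Scale" point concrete, here is a minimal sketch of the queueing part: a limiter that runs at most a fixed number of render jobs at once and makes the rest wait their turn. The names createLimiter and renderPdf are hypothetical, and this is only the simplest building block; a full pool would also spread work across several browser instances.

```javascript
// Hypothetical helper: runs at most `maxConcurrent` tasks at a time
// and queues the rest in first-in, first-out order.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  function next() {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    // Starting from a resolved promise also captures synchronous throws.
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // Start the next queued task, if any.
      });
  }

  // Returns a promise that settles with the task's own result or error.
  return function run(task) {
    return new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
  };
}

// Usage in the /convert handler (renderPdf stands in for the page logic):
//   const limit = createLimiter(4);
//   const pdfBuffer = await limit(() => renderPdf(browser, html));
```

With a limit in place, a burst of 100 requests opens only a handful of pages at once; the right number depends on the memory and CPU available to the browser.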
The Simpler Way: Using a Dedicated API
Building and maintaining this infrastructure is a lot of work. For most projects, a dedicated API is a more efficient and reliable solution.
That's why I built PageFlow. It's a simple REST API that handles all the browser management, scaling, and maintenance for you.
Here’s how you would accomplish the exact same task using the PageFlow API:
// pageflow-example.js
// Node.js 18+ provides a global fetch, so no extra package is needed
import fs from 'fs';
const API_KEY = 'YOUR_API_KEY'; // Get yours from app.pageflow.dev
const invoiceHtml = `...`; // The same HTML string from before
async function generateWithPageFlow() {
try {
const response = await fetch('https://api.pageflow.dev/convert', {
method: 'POST',
headers: {
'Authorization': `Bearer ${API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({ html: invoiceHtml })
});
if (!response.ok) {
throw new Error(`API Error: ${response.statusText}`);
}
// The global fetch has no response.buffer(); convert the ArrayBuffer instead
const pdfBuffer = Buffer.from(await response.arrayBuffer());
fs.writeFileSync('invoice-api.pdf', pdfBuffer);
console.log('PDF generated with PageFlow: invoice-api.pdf');
} catch (error) {
console.error(error);
}
}
generateWithPageFlow();
The result is the same high-quality PDF, but your code is simpler, and you have zero infrastructure to maintain.
Conclusion
Building your own PDF generator with Puppeteer is a great way to understand the underlying mechanics. But for a production application where speed, reliability, and your own time are critical, a dedicated API like PageFlow can be a game-changer.
Ready to give it a try? You can get your free API key from the PageFlow Dashboard and generate your first PDF in under a minute.
My name is Mohamed Fadlalla, and I'm a Software Engineer at Google who is passionate about building simple and powerful tools for developers. I created PageFlow to solve the exact challenges described in this article. My goal is to make PDF generation a solved problem so you can focus on your core product. You can follow my journey building PageFlow on Twitter/X and Indie Hackers.
Written by

Mohamed Fadlalla
Software Engineer at Google who loves finding elegant solutions to tricky problems. I believe the best way to learn is by building, and the best way to solidify knowledge is by teaching. My articles offer practical, in-depth tutorials on backend development and the tools of the trade. I'm also the creator of PageFlow (pageflow.dev), a developer-first API for PDF generation, and I often use challenges from my own project as inspiration for my posts.