Parallel Documents Rendering: A Performance Comparison of VM, Kubernetes and FaaS in AWS


TL;DR
This blog post compares the performance of parallel PDF document rendering across three AWS architectures: a virtual machine (EC2) running Docker behind Nginx, Kubernetes (EKS), and Function-as-a-Service (FaaS) using AWS Lambda. The goal is to evaluate scalability, cost-effectiveness, and operational complexity when rendering PDFs with Puppeteer under different cloud-based setups. I’ve written more details on the subject, which you can check out here.
Introduction
Cloud computing has dramatically transformed application development and deployment, particularly for compute-intensive workloads. Document generation, a common task across industries like finance and reporting, is an ideal scenario for evaluating the performance of different cloud architectures.
In this comparison, I evaluate the performance of rendering PDFs using Puppeteer across three different AWS services:
EC2 (Docker and Nginx)
EKS (Kubernetes)
Lambda (FaaS)
Performance is measured in terms of response time, render time, and total processing time for request batches of various sizes. The insights from this comparison can help developers choose the right cloud architecture for their specific use case.
Architecture
Local Container (Docker)
For this setup, I developed a Node.js application with an Express endpoint to handle PDF rendering. The application was containerized using Docker and tested on a local machine, serving as a baseline for standalone deployments.
EC2 (Docker and Nginx)
In the EC2 setup, I provisioned an EC2 instance, installed Docker and Nginx, and deployed the application container. Nginx served as a reverse proxy, handling HTTP requests and forwarding them to the Docker container, ensuring efficient resource usage for parallel processing.
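A minimal sketch of the reverse-proxy part of this setup (the port, timeout, and server block are illustrative assumptions; the actual configuration depends on how the container is published):

```nginx
# Sketch: forward HTTP traffic to the Dockerized Express app.
# Assumes the container publishes port 3000 on the host.
server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # PDF rendering can be slow under parallel load,
        # so allow a generous upstream read timeout.
        proxy_read_timeout 120s;
    }
}
```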
EKS (Kubernetes)
For Kubernetes, I used Amazon Elastic Kubernetes Service (EKS) with a 3-node cluster. The Docker container was pushed to Amazon Elastic Container Registry (ECR) and deployed through Kubernetes. A NodePort service exposed the application to external traffic, allowing for testing under different load conditions.
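The deployment and NodePort exposure described above could look roughly like the following manifest (the names, image URI, ports, and replica count are illustrative assumptions, not the exact configuration used):

```yaml
# Sketch only: image URI, ports, and replica count are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pdf-renderer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pdf-renderer
  template:
    metadata:
      labels:
        app: pdf-renderer
    spec:
      containers:
        - name: pdf-renderer
          # Image pulled from ECR, as described above
          image: <account>.dkr.ecr.<region>.amazonaws.com/pdf-renderer:latest
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: pdf-renderer
spec:
  type: NodePort          # exposes the app on every node's IP
  selector:
    app: pdf-renderer
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 30080
```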
Lambda (FaaS)
The serverless approach utilized AWS Lambda to handle PDF rendering as a function. The Lambda function was triggered through an API Gateway, enabling dynamic scaling without the need to manage the underlying infrastructure.
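As a rough illustration, the API Gateway trigger and memory configuration could be declared with an AWS SAM fragment like this (resource names, runtime, and timeout are assumptions; the 2048 MB memory size matches the value used in the cost analysis below):

```yaml
# Illustrative AWS SAM fragment; names and limits are assumptions.
Resources:
  RenderPdfFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs18.x
      MemorySize: 2048        # matches the memory used in the cost analysis
      Timeout: 60
      Events:
        RenderApi:
          Type: Api           # API Gateway POST endpoint triggering the function
          Properties:
            Path: /render
            Method: post
```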
Testing Code Structure
I implemented a Node.js script that sent parallel HTTP POST requests to each service, passing EJS templates and JSON data as payloads. The testing was conducted for batch sizes of 1, 10, 30, 50, and 100 documents. Each service was evaluated based on average response time, render time, and total processing time.
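The core of such a test script can be sketched in a few lines of Node.js. This is a simplified reconstruction, not the exact script used: it assumes Node 18+ (built-in `fetch`) and that each service responds with a JSON body containing a hypothetical `renderMs` field reporting server-side render time.

```javascript
// Sketch: fire a batch of parallel POST requests and aggregate timings.
// Assumes Node 18+ (global fetch) and a JSON response with a renderMs field.
async function renderBatch(url, payload, batchSize) {
  const start = Date.now();
  const results = await Promise.all(
    Array.from({ length: batchSize }, async () => {
      const t0 = Date.now();
      const res = await fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload), // EJS template + JSON data
      });
      const body = await res.json();
      return { responseMs: Date.now() - t0, renderMs: body.renderMs };
    })
  );
  return { totalMs: Date.now() - start, results };
}

// Average per-document response and render times for one batch.
function summarize(results) {
  const avg = (key) =>
    results.reduce((sum, r) => sum + r[key], 0) / results.length;
  return { avgResponseMs: avg("responseMs"), avgRenderMs: avg("renderMs") };
}
```

Running `renderBatch` for each batch size (1, 10, 30, 50, 100) against each endpoint and feeding the results to `summarize` yields the per-document averages reported below.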
Performance and Results Analysis
Average Times Per Document
The table below summarizes the average response and generation times in seconds per document for 100 parallel requests.
Service | Avg Response Time (s) | Avg Generation Time (s)
--- | --- | ---
Local Container | 0.79 | 0.11
Lambda | 0.70 | 0.22
EC2 | 0.89 | 0.27
EKS | 0.91 | 0.28
Total Processing Times for Batches
The following table shows the total processing times in seconds for varying batch sizes across the different architectures, with a graph representing the data.
Num. Documents | Lambda | Local | EC2 | EKS
--- | --- | --- | --- | ---
1 | 1.90 | 0.96 | 1.17 | 1.28
10 | 2.82 | 5.42 | 6.12 | 3.05
30 | 7.51 | 15.54 | 17.51 | 9.64
50 | 9.30 | 30.43 | 33.22 | 17.56
100 | 14.06 | 63.41 | 68.23 | 35.93
Cost Analysis of Document Rendering Services
When comparing the costs of document rendering across different AWS services, it’s crucial to understand execution times and the associated charges. Based on the results of rendering 100 documents (as detailed in Table II), here’s a breakdown of cost estimates for each service:
1. AWS Lambda
Lambda functions charge based on execution time and allocated memory. With 2048 MB memory, the cost is $0.0000000333 per millisecond. For 100 documents, with an average of 14.06 seconds (14,060 ms), the cost comes to approximately $0.00047. While this may seem highly cost-effective, Lambda’s pricing is best suited for short, on-demand tasks rather than long-running or high-frequency jobs.
2. Amazon EC2
A t4g.small EC2 instance costs $0.0168 per hour, and processing 100 documents took 68.23 seconds (0.01896 hours). This results in a cost of approximately $0.00032 for rendering 100 documents. However, it’s important to note that EC2 instances remain active and incur costs even when not processing documents, making the actual cost much higher for continuous or low-usage workloads.
3. Amazon EKS (Kubernetes)
The EKS setup used c6g.large nodes, which cost $0.02628 per hour per node. With an execution time of 35.93 seconds (0.00998 hours), a single node’s compute comes to approximately $0.00026, or roughly $0.00079 across the 3-node cluster. Additional EKS costs must also be considered, including control plane charges ($0.10 per hour) and potential load balancer usage. Like EC2, EKS clusters incur costs for all running nodes, even when not actively processing tasks, which significantly increases the true cost for workloads requiring persistent resources.
4. Local Container on Dedicated Server
For this simulation, running the Local Container on a dedicated server was assumed to be free. In a real-world scenario, however, a dedicated server typically costs between $0.05 and $0.10 per hour. With an average processing time of 63.41 seconds (0.01761 hours), the estimated cost for rendering 100 documents would be approximately $0.001 to $0.002, depending on server pricing. While this approach offers predictable costs, it requires maintaining dedicated infrastructure, which could become inefficient for variable workloads.
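The four estimates above can be sanity-checked with a few lines of Node.js. Note that multiplying the hourly c6g.large price by the execution time gives the per-node EKS figure; the cluster total is three times that, since all three nodes run concurrently.

```javascript
// Back-of-the-envelope reproduction of the cost estimates above.
const lambdaCost = 14060 * 0.0000000333;     // ms * $/ms at 2048 MB
const ec2Cost    = (68.23 / 3600) * 0.0168;  // hours * $/hour (t4g.small)
const eksPerNode = (35.93 / 3600) * 0.02628; // hours * $/hour (c6g.large)
const eksCluster = eksPerNode * 3;           // all three nodes run concurrently
const localLow   = (63.41 / 3600) * 0.05;    // dedicated server, low estimate
const localHigh  = (63.41 / 3600) * 0.10;    // dedicated server, high estimate

console.log({ lambdaCost, ec2Cost, eksPerNode, eksCluster, localLow, localHigh });
```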
Discussion
The results clearly highlight the strengths and weaknesses of each architecture:
Local Container:
Benefits: Suitable for isolated or small-scale deployments.
Limitations: Fully self-managed, with significant operational overhead and fixed infrastructure costs regardless of load.
EC2:
Benefits: Full control over the environment, flexibility, and resource allocation.
Limitations: Requires manual scaling and management of resources.
Kubernetes (EKS):
Benefits: Automated scaling, resource management, large workload handling.
Limitations: Complexity in setup and management, higher cost due to infrastructure overhead.
Lambda (FaaS):
Benefits: Fully serverless, automatic scaling with no infrastructure management.
Limitations: Cold start latency and limitations on execution time and resource allocation.
Conclusion
Choosing the right infrastructure for your workload depends on several factors, including cost, scalability, and performance. AWS Lambda offers serverless scalability for small-scale, cost-effective solutions but may struggle with high concurrency. EC2, while providing higher resource limits and simplicity, comes with continuous infrastructure costs. EKS shines for large-scale, continuous workloads, harnessing Kubernetes’ power for orchestration, but its complexity and cost can hinder low-utilization tasks. A Local Container setup is ideal for development or small deployments but lacks the scalability for more demanding applications.
Ultimately, assessing the specific needs of your workload—whether it’s predictable or variable, small or large-scale—along with budget considerations will help you make the most informed choice. A hybrid approach combining these options often provides the optimal balance of cost, performance, and flexibility.
Recommendations
My personal take on choosing the right infrastructure for this specific use case, rendering PDF documents, is:
Online solution - If our goal is to generate documents, whether for sale or personal use, I would go with Lambda (FaaS). The pay-as-you-go pricing model is ideal because you only pay when documents are generated. If we are selling them, the cost of document generation essentially “pays for itself.” For personal use, there are no charges while idle, since no infrastructure is running in the background.
Offline solution - Usually, this use case means that a mid-to-large company generates documents on-premise. If this is for personal use, I would suggest just using Docker and starting a container locally. But, if this is for a large company where we need to handle big workloads and offer scalability, reliability, and elasticity, I would go for a local Kubernetes orchestration.
Check out the full paper here for more detailed information on cloud architecture choices and performance comparisons.
Written by Nikola Dinevski
I'm a Full-Stack Software Engineer with expertise in Node.js, React and AWS. Dabbling in many more technologies. Born in Bitola. Based in Skopje.