1.5 Years of Going All-In on Serverless: What Worked, What Didn't.

Onkar Sabale
5 min read

Over the past 1.5 years, I have been all-in on serverless architecture. From lightweight background jobs to full-blown APIs, every service we deploy runs serverless — primarily using AWS Lambda, API Gateway, DynamoDB, RDS, ElastiCache, S3, and SQS. This experience has taught me a lot — not just about what serverless makes easy, but also where the rough edges are.

This post captures the key lessons, optimizations, and bottlenecks we learned while building and scaling real-world products using serverless end-to-end.

✅ What Serverless Gets Right

1. Great for Lightweight, Asynchronous Operations

Serverless functions shine for async jobs — sending emails, cleaning up data, or triggering webhooks. You don’t need to worry about servers sitting idle. It just works and scales automatically.

In our case, most of our event-based microservices were built as simple Lambdas triggered by queues or scheduled events.
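To make the pattern concrete, here's a minimal sketch of a queue-triggered worker, assuming a Node.js runtime and the aws-lambda type definitions (the job shape and processing step are illustrative):

```typescript
// handler.ts — a minimal SQS-triggered Lambda (job shape is illustrative)
import type { SQSEvent, SQSBatchResponse } from "aws-lambda";

export const handler = async (event: SQSEvent): Promise<SQSBatchResponse> => {
  const failures: { itemIdentifier: string }[] = [];

  for (const record of event.Records) {
    try {
      const job = JSON.parse(record.body);
      // ... do the async work: send an email, clean up data, call a webhook
      console.log("processed job", job.id);
    } catch {
      // report partial batch failures so only failed messages are retried
      // (requires ReportBatchItemFailures on the event source mapping)
      failures.push({ itemIdentifier: record.messageId });
    }
  }

  return { batchItemFailures: failures };
};
```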

2. Works Beautifully with Event-Driven Architectures

We’ve fully embraced event-driven design — S3 triggers, SNS → Lambda chains. It’s easy to decouple services and build scalable, reactive systems without overcomplicating things.

This pattern helped us split monoliths into focused, independent units that communicate cleanly via events.
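As an example of that decoupling, here's a rough sketch of an S3-triggered function — again assuming the aws-lambda types, with the actual processing left as a placeholder:

```typescript
// on-upload.ts — reacting to an S3 "object created" event
import type { S3Event } from "aws-lambda";

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // S3 keys arrive URL-encoded; decode before use
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    console.log(`new object: s3://${bucket}/${key}`);
    // ... emit a domain event, resize an image, index a document, etc.
  }
};
```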

3. CI/CD Pipelines Become Simpler

Deployments are now super streamlined. With tools like the Serverless Framework and GitHub Actions, each merge to main can auto-deploy a service with zero downtime (in our case, we trigger the workflow manually). No managing EC2s, ECS clusters, or Docker build pipelines.

4. Layers Reduce Code Duplication

We use AWS Lambda Layers to package shared logic — like logging, OpenTelemetry instrumentation, and Prisma clients — once, and reuse them across services. This reduces deployment size and cold start time, while making code more maintainable.
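For Node.js functions, a layer's contents are mounted under /opt and added to the module resolution path, so shared code can be imported like any normal dependency. A rough sketch — the shared-logging package name is hypothetical:

```typescript
// In the layer zip, shared code lives under nodejs/node_modules/,
// which Lambda mounts at /opt and adds to the Node module path:
//
//   layer.zip
//   └── nodejs/
//       └── node_modules/
//           └── shared-logging/   <- hypothetical shared package
//               └── index.js
//
// Any function with the layer attached can then import it like a
// regular dependency — no copy-paste between services:
import { logger } from "shared-logging"; // hypothetical package name

export const handler = async () => {
  logger.info("handler invoked");
};
```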


⚠️ The Trade-offs and Bottlenecks

5. Cold Starts Can Be Painful

Especially with VPC-connected functions or large packages, cold starts can take hundreds of milliseconds — or more. This adds latency spikes that are hard to predict.

Provisioned concurrency can reduce cold starts, but:

  • It costs extra

  • DNS and TCP connections aren’t always cached, so even with provisioned concurrency, calls to third-party services can still add latency
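Provisioned concurrency aside, the cheapest mitigation is to keep expensive initialization at module scope so it runs once per container rather than once per invocation. A minimal sketch, assuming the AWS SDK v3 DynamoDB client and a TABLE_NAME environment variable:

```typescript
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

// Module scope runs once per container (at cold start), so put
// expensive setup here instead of inside the handler.
const dynamo = new DynamoDBClient({});

export const handler = async (event: { id: string }) => {
  // Warm invocations reuse the client (and its open connections).
  const res = await dynamo.send(
    new GetItemCommand({
      TableName: process.env.TABLE_NAME, // assumed env var
      Key: { pk: { S: event.id } },
    })
  );
  return res.Item ?? null;
};
```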

6. Package Size Constraints

Lambda has a hard limit of 50 MB zipped (250 MB unzipped, including layers). If you’re using heavy libraries (like Puppeteer or large ML models), you’ll need to trim aggressively or switch to container images, which allow up to 10 GB.

Use esbuild (via serverless-esbuild) to bundle and tree-shake so only the code you actually import ships.
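serverless-esbuild drives esbuild from serverless.yml; the equivalent raw esbuild call looks roughly like this (entry point, target, and output path are illustrative):

```typescript
// build.ts — roughly what serverless-esbuild does under the hood
// (run as an ESM script, e.g. with npx tsx build.ts)
import { build } from "esbuild";

await build({
  entryPoints: ["src/handler.ts"],
  bundle: true, // tree-shake: only code you actually import ships
  minify: true,
  platform: "node",
  target: "node20",
  outfile: "dist/handler.js",
  // the AWS SDK v3 is already in the Node.js Lambda runtime, so leave it out
  external: ["@aws-sdk/*"],
});
```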

7. Database Connection Limitations

Traditional relational DBs (PostgreSQL, MySQL) struggle with connection pooling in serverless. Since Lambdas can spin up hundreds of instances in parallel, they can easily exhaust DB limits.

What can help:

  • Using RDS Proxy

  • Switching to serverless-friendly databases like DynamoDB or Aurora Serverless

  • Implementing lazy DB connection logic per invocation (sketched below)
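Here's what the lazy-connection pattern looks like in practice — a minimal sketch assuming node-postgres (pg) and a DB_HOST environment variable pointing at an RDS Proxy endpoint:

```typescript
import { Client } from "pg";

let client: Client | undefined;

// Connect lazily and memoize: the first invocation in a container pays
// the connection cost; warm invocations reuse the same connection.
async function getDb(): Promise<Client> {
  if (!client) {
    client = new Client({
      host: process.env.DB_HOST, // ideally the RDS Proxy endpoint
      // ...user/password/database from env vars or Secrets Manager
    });
    await client.connect();
  }
  return client;
}

export const handler = async (event: { id: string }) => {
  const db = await getDb();
  const { rows } = await db.query("SELECT * FROM users WHERE id = $1", [
    event.id,
  ]);
  return rows[0] ?? null;
};
```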

8. DNS Caching Issues

Each Lambda container may not reuse DNS resolution between invocations. That means on a cold start (or in a new container), your function re-resolves DNS, slowing down outbound requests — especially to external APIs or databases. Most of us never think about this, but it’s a real issue.

Workarounds:

  • Use a long-lived DNS TTL

  • Configure custom DNS resolvers (for VPC-hosted Lambdas)

  • Use HTTP keep-alive agents (agentkeepalive or native) — see the sketch below
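One way to get an in-container DNS cache is the cacheable-lookup package installed onto a keep-alive agent — a sketch under the assumption that all outbound requests go through this agent:

```typescript
// dns-cache.ts — cache DNS lookups inside a warm container
// (assumes the cacheable-lookup package; the TTL cap is illustrative)
import https from "node:https";
import CacheableLookup from "cacheable-lookup";

const cacheable = new CacheableLookup({ maxTtl: 60 }); // cap entries at 60s
export const agent = new https.Agent({ keepAlive: true });
cacheable.install(agent); // lookups through this agent are now cached
```

Pass the exported agent to your HTTP client (axios, got, etc.) so every request benefits from both the cached lookups and the reused sockets.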

9. TCP Connections Aren’t Always Reused

Even if DNS is cached, TCP sessions may not persist. This causes repeated SSL/TLS handshakes, increasing latency.

Using persistent agents (http.Agent, https.Agent, or libraries like undici) helped us maintain TCP connections within warm containers, reducing overhead.
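A minimal sketch of both options — Node's built-in agent and undici's global dispatcher (the timeout and socket count are illustrative):

```typescript
import https from "node:https";
import { Agent, setGlobalDispatcher } from "undici";

// Node's built-in agent: reuse sockets across requests in a warm container
export const keepAliveAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 50,
});

// Or, with undici (which powers Node's global fetch), set a keep-alive
// dispatcher once at module scope; subsequent fetch() calls reuse
// connections instead of repeating the TLS handshake.
setGlobalDispatcher(new Agent({ keepAliveTimeout: 10_000 }));
```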

10. Performance Is Not Predictable

Despite autoscaling, performance can vary due to:

  • Cold starts

  • Network retries

  • External service latency

  • Limits in concurrent executions

We use Sentry (essentially a wrapper on top of OpenTelemetry) to monitor p95 latency, function duration, and error rate over time. This helped us spot hidden slowdowns.
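For reference, wiring Sentry into a handler is a one-time wrapper — a sketch using the @sentry/serverless package (the DSN source and sample rate are illustrative):

```typescript
// monitoring.ts — wrapping a handler with Sentry
import * as Sentry from "@sentry/serverless";

Sentry.AWSLambda.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 0.1, // sample 10% of invocations for performance traces
});

// Errors and timing data are reported automatically from the wrapper.
export const handler = Sentry.AWSLambda.wrapHandler(async (event) => {
  return { statusCode: 200, body: "ok" };
});
```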

11. Vendor Lock-In Is Real

The more AWS-native tools you use (EventBridge, Step Functions, DynamoDB, etc.), the harder it becomes to switch clouds or even replatform to containers.

To minimize lock-in:

  • Use abstractions like the Serverless Framework or Terraform

  • Avoid tightly coupling business logic to AWS-specific APIs

  • Keep handler code platform-agnostic where possible (see the sketch below)
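The last point is mostly about structure: keep the business logic free of AWS imports and push all Lambda-specific types into a thin adapter. A rough sketch — the UserStore interface and registerUser function are illustrative, and the store here is a stand-in for a real DynamoDB implementation:

```typescript
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";

// --- Pure core: no AWS types or SDK imports, portable anywhere ---
interface UserStore {
  save(user: { email: string }): Promise<void>;
}

async function registerUser(store: UserStore, email: string) {
  if (!email.includes("@")) throw new Error("invalid email");
  await store.save({ email });
  return { email };
}

// --- Thin Lambda adapter: the only code that knows about API Gateway.
// Swapping clouds (or moving to containers) means rewriting only this part.
const store: UserStore = { save: async () => {} }; // stand-in for a DynamoDB impl

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  const { email } = JSON.parse(event.body ?? "{}");
  const user = await registerUser(store, email);
  return { statusCode: 201, body: JSON.stringify(user) };
};
```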

12. Observability Is Non-Negotiable

With no persistent server or container logs, traditional debugging won’t help. X-Ray tracing does help, but it’s expensive.

What can be used:

  • OpenTelemetry for distributed tracing

  • Grafana + Tempo + Loki for unified observability

  • Correlation IDs to track user flow across async Lambdas

💡 Serverless without tracing is like debugging blindfolded.
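Correlation IDs are the simplest of the three to roll yourself: read the ID off the incoming message (or mint one), log it on every line, and forward it to the next hop. A sketch assuming SQS-to-SQS hops and a NEXT_QUEUE_URL environment variable:

```typescript
import { randomUUID } from "node:crypto";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";
import type { SQSEvent } from "aws-lambda";

const sqs = new SQSClient({});

export const handler = async (event: SQSEvent) => {
  for (const record of event.Records) {
    // Reuse the incoming correlation ID, or start a new trace
    const correlationId =
      record.messageAttributes?.correlationId?.stringValue ?? randomUUID();

    console.log(JSON.stringify({ correlationId, msg: "processing" }));

    // Propagate it to the next hop so the whole flow can be stitched together
    await sqs.send(
      new SendMessageCommand({
        QueueUrl: process.env.NEXT_QUEUE_URL, // assumed env var
        MessageBody: record.body,
        MessageAttributes: {
          correlationId: { DataType: "String", StringValue: correlationId },
        },
      })
    );
  }
};
```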

13. VPC-Connected Lambdas Are Slow to Start

Functions inside a VPC (needed for RDS, ElastiCache) suffer huge cold start penalties — due to ENI attachment.

Mitigations:

  • Use RDS Proxy

  • Move to Aurora Serverless v2 with the Data API enabled (sketched below)

  • Keep Lambdas outside VPC if only external access is needed
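The Data API option is what lets a function talk to Aurora over plain HTTPS and skip the VPC entirely. A minimal sketch using the AWS SDK v3 RDS Data client — the ARN env vars, database name, and query are illustrative:

```typescript
// data-api.ts — querying Aurora over HTTPS via the Data API, so the
// function doesn't need to live inside a VPC
import {
  RDSDataClient,
  ExecuteStatementCommand,
} from "@aws-sdk/client-rds-data";

const rds = new RDSDataClient({});

export const handler = async (event: { id: string }) => {
  const result = await rds.send(
    new ExecuteStatementCommand({
      resourceArn: process.env.CLUSTER_ARN, // Aurora cluster ARN
      secretArn: process.env.DB_SECRET_ARN, // Secrets Manager credentials
      database: "app",
      sql: "SELECT * FROM users WHERE id = :id",
      parameters: [{ name: "id", value: { stringValue: event.id } }],
    })
  );
  return result.records ?? [];
};
```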

14. Local Testing Is Limited

Testing Lambdas locally isn’t perfect. You can’t fully emulate:

  • IAM behavior

  • VPC networking

  • DNS resolution

We use:

  • serverless-offline for simple REST simulation

  • LocalStack for AWS mocks

  • Isolation: write core logic in testable pure functions

Still, the only true test is in staging or prod.
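That said, the pure-function isolation point above still catches a lot before staging — the core runs under plain node:test with no AWS emulation at all. A tiny sketch with a hypothetical function:

```typescript
// core.test.ts — pure core logic needs no Lambda emulation to test
import { test } from "node:test";
import assert from "node:assert/strict";

// hypothetical pure function extracted from a handler
function calculateDiscount(total: number): number {
  return total > 100 ? total * 0.9 : total;
}

test("applies 10% discount over 100", () => {
  assert.equal(calculateDiscount(200), 180);
});

test("no discount at or under 100", () => {
  assert.equal(calculateDiscount(100), 100);
});
```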


💡 Final Thoughts

Going serverless everywhere has been a massive productivity boost for our team. It enabled rapid iteration, fine-grained scaling, and reduced operational overhead. But it also demanded deep awareness of the platform’s internals — especially around cold starts, networking, and database connections.

Would I recommend serverless for everything? Not necessarily. But if your app fits an event-driven or async-first model, and you’re okay embracing a few cloud-specific trade-offs — it can be incredibly efficient.


Let me know what your experience with serverless has been like. Hit me up with your insights, or things you’ve optimized!
