How We Built an AI Team Roaster and Roasted Ourselves! (Featuring Amazon Bedrock)


Introduction:
This Christmas, we tried something new for our team ceremony: a roasting party with an AI twist! We called it "Chef Roaster," in keeping with our DataChef theme. The concept was simple: provide the AI with some information and roasting ideas, then let it serve up spicy jokes in return. But we didn’t stop there! We also added features like roast-image generation and having the AI read the roasts out loud, so we could roast each other in every medium—text, images, and audio!
Visit the roasting website at Chefroaster.ai! Here is the final video of our ceremony:
The Interface and User Flow:
We wanted to create a simple UI where users can input a description of the teammate they want to roast, and then the AI identifies who it is and roasts them. This is the UI that appears when the user starts:
Then the roasting happens behind the scenes. It involves several steps: identifying the teammate (or choosing randomly if not provided), gathering information about them, generating a roast text, using text-to-speech to read it aloud, and utilizing GenAI to generate a roast image.
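The steps above can be sketched as a single orchestration function. This is an illustrative outline, not the project's real code: the step functions are injected as callables, and all of the names are hypothetical.

```python
# Sketch of the roast pipeline described above. Every step name is
# illustrative; the injected callables stand in for the real LLM,
# text-to-speech, and image-generation calls.

import random

def run_roast_pipeline(description, team, identify, gather, roast, tts, image):
    """Run the end-to-end roast flow with injected step callables."""
    # 1. Identify the teammate, or pick one at random if no match.
    target = identify(description) or random.choice(team)
    # 2. Gather background facts to feed the roast prompt.
    facts = gather(target)
    # 3. Generate the roast text with an LLM.
    text = roast(target, facts)
    # 4. Read it aloud and render a roast image (the slow step).
    return {
        "target": target,
        "text": text,
        "audio_url": tts(text),
        "image_url": image(text),
    }
```

Keeping the steps as injected callables makes each stage easy to swap out (e.g., trying a different image model) without touching the flow.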
These steps take some time, especially the image generation. To avoid any awkward pauses in the user experience, we implemented a trick by showing a pre-roast while the image is being generated. Here’s what the final roast looks like:
One last feature we added, in keeping with the spirit of roasting, is the ability to share the roast via a shareable link (displayed at the top). This way, users can roast someone and share it with the "victim" 🙂.
The Architecture:
So, what happens under the hood? We wanted a fully serverless system that sits between the user and the LLM providers’ APIs. To achieve this, we chose a single monolithic AWS Lambda function (a "Lambdalith"), which made sense given the project's small scope. You can read more about when to use a Lambda monolith (Lambdalith!) here.
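A Lambdalith boils down to one handler that dispatches on the request path. Here is a minimal sketch assuming API Gateway proxy events; the routes and response shapes are illustrative, not the project's actual endpoints.

```python
# Minimal "Lambdalith" sketch: one Lambda handler routing every API
# path. Assumes API Gateway proxy-style events; routes are illustrative.

import json

ROUTES = {}

def route(path):
    """Register a function as the handler for an API path."""
    def wrap(fn):
        ROUTES[path] = fn
        return fn
    return wrap

@route("/roast")
def create_roast(body):
    return {"status": "roasting", "description": body.get("description")}

@route("/share")
def share_roast(body):
    return {"share_url": f"https://example.com/r/{body.get('roast_id')}"}

def handler(event, context=None):
    """Single Lambda entry point dispatching on the request path."""
    fn = ROUTES.get(event.get("path"))
    if fn is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    body = json.loads(event.get("body") or "{}")
    return {"statusCode": 200, "body": json.dumps(fn(body))}
```

With so few routes, this in-function router is simpler to deploy and reason about than one Lambda per endpoint.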
This Lambda function acts as a bridge between the user and the services we need, such as AWS Secrets Manager for securely storing API keys, OpenAI API for LLMs and image generation, and Amazon Polly for text-to-speech. For serving static content like images and shareable roasts, we used signed URLs from S3.
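Serving a shareable roast from S3 can be done with a presigned URL, which grants time-limited access without making the bucket public. A sketch assuming standard boto3 usage (bucket and key naming are placeholders):

```python
# Sketch of serving a shareable roast via an S3 presigned URL.
# Bucket and key names are placeholders; assumes standard boto3 usage.

def roast_key(roast_id):
    """Deterministic object key for a stored roast page (illustrative)."""
    return f"roasts/{roast_id}.html"

def get_share_url(bucket, key, expires=3600):
    """Return a time-limited GET link; no public bucket access needed."""
    import boto3  # imported lazily so the sketch reads without AWS deps
    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )
```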
The diagram below summarizes the architecture:
Lessons Learned:
In this section, we’ll briefly share what we learned while building this AI product, which we hope will help save you time and money!
- Amazon Nova Underwhelms!
Initially, we aimed to use Amazon's newly released AI models, specifically the Nova Pro model for generating roasts and Nova Canvas for image generation. These models were launched during the 2024 re:Invent, and there was a lot of hype surrounding them. However, when we tried them, they turned out to be quite disappointing! The Nova Pro model (currently the most capable LLM in the series) could barely generate anything remotely funny and struggled to follow instructions. The image generation was even worse, with nearly a minute of latency, making it unsuitable for real-time applications. At this point, their main advantage seems to be their affordability, which isn't surprising given the performance!
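For reference, invoking Nova Pro on Amazon Bedrock looks roughly like this with boto3's Converse API. The model ID, region, and inference settings are assumptions; check your account's model access before relying on them.

```python
# Sketch of calling Nova Pro via the Bedrock Converse API.
# Model ID, region, and inference settings are assumptions.

def generate_roast(prompt, region="us-east-1"):
    import boto3  # imported lazily so the sketch reads without AWS deps
    client = boto3.client("bedrock-runtime", region_name=region)
    resp = client.converse(
        modelId="amazon.nova-pro-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 300, "temperature": 0.9},
    )
    # The Converse API returns the reply under output -> message -> content.
    return resp["output"]["message"]["content"][0]["text"]
```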
- Amazon Polly is Pretty Fast!
While we considered using OpenAI’s text-to-speech service, staying within the AWS ecosystem was preferable for us, and Amazon Polly performed well enough to keep us happy!
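Polly's API is a single call that returns an audio stream. A sketch assuming a neural voice (the `VoiceId` is our assumption; in practice the MP3 bytes would be uploaded to S3 and served via a signed URL):

```python
# Sketch of Amazon Polly text-to-speech. VoiceId is an assumption;
# the returned MP3 bytes would be stored in S3 in a real deployment.

def speak(text, voice="Joanna"):
    import boto3  # imported lazily so the sketch reads without AWS deps
    polly = boto3.client("polly")
    resp = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice,
        Engine="neural",  # neural voices sound noticeably better than standard
    )
    return resp["AudioStream"].read()  # raw MP3 bytes
```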
- Keeping Users Engaged
Although we chose OpenAI’s DALL-E over Amazon Nova Canvas, we still experienced a 10-15 second latency in image generation, which disrupted the user flow. To address this, we implemented a few strategies to keep users engaged while the image was being created.
First, we added a typing animation for the roast text, even though we didn’t stream the result. This small delay helped stretch out the experience a bit longer. To make it more engaging, we incorporated a text-to-speech feature. While this kept users occupied, there were still a few seconds of delay. Since the text generation was noticeably faster than the image creation, we introduced a “Pre-Roast” step, offering hints about who the roast would be about.
In this way, we were able to fill the gap during image generation with something interesting and engaging for the users!
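The pattern boils down to a two-phase response: return the fast results (roast text, pre-roast) immediately, render the image in the background, and let the client poll for it. A sketch with an in-memory store and hypothetical job IDs (the real app would persist to S3 or a database):

```python
# Sketch of the "pre-roast" trick: respond with fast results right away
# and let the client poll for the slow image. The in-memory store and
# job IDs are illustrative; a real app would use S3/DynamoDB or similar.

import threading
import uuid

IMAGES = {}  # job_id -> image URL once rendering finishes

def start_roast(make_text, make_image):
    """Return the roast text immediately; render the image in the background."""
    job_id = str(uuid.uuid4())
    text = make_text()  # fast: shown (with a typing animation) right away
    def render():
        IMAGES[job_id] = make_image(text)  # slow: 10-15 s with DALL-E
    threading.Thread(target=render).start()
    return {"job_id": job_id, "text": text}

def poll_image(job_id):
    """Client polls until the image URL appears (None while pending)."""
    return IMAGES.get(job_id)
```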
- AI Models Have Biases, and Default Guardrails May Not Be Enough
While companies like OpenAI work to implement strict guardrails on their models, biases still persist due to the datasets used for training. This issue became even more apparent when we used the output of large language models (LLMs) for image generation. Seemingly harmless outputs from LLMs, such as mentioning a person's name or a city, could lead to strange behaviors during the image generation process.
Image generation models are particularly prone to biases and harder to steer than text-based models. For instance, mentioning a Middle Eastern city or name could result in exaggerated, stereotypical imagery. Similarly, if you try to anonymize a name by replacing it with a general term like "female name," the generated image might overly emphasize feminine traits. This behavior likely stems from limited training data in these areas and a lack of steerability in the models.
For now, our best approach has been to ensure that the input text from the LLM doesn't include sensitive entities such as names, cities, countries, or references to gender and politics.
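A simple version of that scrubbing step can be done with substitution over known team data. This placeholder-based sketch is an assumption about one workable approach (a real app might use an NER model instead); the example names are purely illustrative.

```python
# Sketch of stripping sensitive entities from LLM output before it is
# fed to the image model. Placeholder-based; a real app might use NER.

import re

def sanitize_for_image(text, names=(), places=()):
    """Replace teammate names and locations with neutral placeholders."""
    for name in names:
        text = re.sub(rf"\b{re.escape(name)}\b", "a teammate", text)
    for place in places:
        text = re.sub(rf"\b{re.escape(place)}\b", "a city", text)
    return text
```

Using neutral placeholders like "a teammate" (rather than gendered terms) also avoids the over-emphasis problem described above.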
Your Turn:
Now it’s your turn! We’ve made this project open-source and easy to configure, so you can use it to roast your teammates :) Check out the GitHub repo and follow the instructions in the README to set up your own Roasting party!
Written by Ali Yazdizadeh