Building an AI Script to Generate Mock Data with Realistic Images


Generating mock data for an app can be very time-consuming, even if you use AI chatbots. You have to write out prompts, copy the results, and then somehow import everything into your database. It’s manageable for a few items, but what if you need dozens or even hundreds? That’s the issue I faced while working on a new project. I needed lots of mock items, and doing it manually simply wasn’t practical. So, I decided to automate the entire process by writing a script that uses OpenAI’s API. Here’s how I did it.
Features of the Script
⭐ JSON Structured Output: Uses OpenAI's GPT-4o-mini model to generate structured JSON output.
⭐ Configurable Batch Sizes: You can control how many items are generated per request and how many images are created concurrently.
⭐ Image Generation: Generates realistic images for items using DALL·E 3.
⭐ Image Optimization: Compresses images into WebP format using Sharp for better performance.
⭐ Image Upload: Stores the optimized images in Edge Store for easy access.
⭐ Database Integration: Seamlessly inserts the generated data into your database.
⭐ Reusable Design: Can be easily adapted for different kinds of mock data.
Running Standalone TypeScript Scripts
The first thing I needed was a way to run scripts in my app, with access to all necessary environment variables. I used tsx and dotenv-cli for this.
Install them using:
npm i -D tsx dotenv-cli
Now, we can create a simple script:
(async () => {
  console.log("TEST");
})();
And run it with:
npx dotenv -e .env tsx ./src/scripts/test.ts
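Scripts like these depend on environment variables being loaded correctly (the OpenAI SDK, for example, reads OPENAI_API_KEY by default), so it can help to fail fast when one is missing. Here's a small guard I'd sketch for that; it's my own addition, not part of the original script, and the variable name in the usage example is just that, an example:

```typescript
// Fail fast when a required environment variable is missing
function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example usage at the top of a script:
// const apiKey = requireEnv("OPENAI_API_KEY");
```

Calling this at startup turns a confusing mid-run API failure into an immediate, clearly labeled error.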
Generating Mock Data
I wanted to generate a list of recipes with specific fields in JSON format to add to my database. When I first ran multiple generations in parallel, I got some duplicate recipes. To avoid that, I switched to generating them sequentially and included the existing recipes in the prompt to minimize duplication. There are definitely ways to implement seeding logic for parallel generation without duplicates, but for this purpose, I was fine with generating them in series.
import { db } from "@/server/db";
import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const openai = new OpenAI();

// Schema for the recipe data returned by OpenAI
const recipeSchema = z.object({
  recipes: z.array(
    z.object({
      name: z.string(),
      description: z.string(),
      ingredients: z.array(z.string()),
      steps: z.array(
        z.object({
          instruction: z.string(),
        })
      ),
      duration: z.number(),
      servings: z.number(),
    })
  ),
});

type Recipe = z.infer<typeof recipeSchema>["recipes"][number] & {
  image?: string; // This will be populated later
};

async function generateRecipes(count: number, allRecipeNamesStr: string) {
  const prompt = `Generate ${count} unique and diverse recipes that are different from the following recipes: ${allRecipeNamesStr}.`;
  // Generate a list of recipes in JSON format using OpenAI's API
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
    response_format: zodResponseFormat(recipeSchema, "recipes"),
  });
  const responseContent = completion.choices[0]?.message?.content ?? "";
  return recipeSchema.parse(JSON.parse(responseContent)).recipes;
}
Generating Images
To generate images, I used the dall-e-3 model. The cost is $0.04 per image. You could opt for dall-e-2 at half the cost, but the images aren't as good.
async function generateRecipeImage(recipe: Recipe) {
  console.log(`Generating image for ${recipe.name}`);
  const imagePrompt = `An appetizing photo of the dish: ${recipe.name}`;
  const res = await openai.images.generate({
    model: "dall-e-3",
    prompt: imagePrompt,
    size: "1024x1024",
  });
  const url = res.data[0]?.url;
  if (!url) {
    throw new Error("No image url found");
  }
  return url;
}
Optimizing the Image
The images generated by DALL-E were quite large, so I used Sharp to convert them to WebP. This reduced the size from about 1.7MB to 170KB, without noticeable loss of quality.
import sharp from "sharp";

async function optimizeImage(imageUrl: string) {
  const response = await fetch(imageUrl);
  if (!response.ok) {
    throw new Error(`Failed to fetch image: ${response.statusText}`);
  }
  const arrayBuffer = await response.arrayBuffer();
  const optimizedBuffer = await sharp(Buffer.from(arrayBuffer))
    .webp() // Default quality is 80
    .toBuffer();
  return new Blob([optimizedBuffer], { type: "image/webp" });
}
Uploading the Image
The images from DALL-E are hosted on URLs that expire, so I needed to upload them to my storage. Naturally, I used Edge Store for this, which made storing the images easy and free.
- Set up the Edge Store project keys in your environment variables:
EDGE_STORE_ACCESS_KEY=xxx
EDGE_STORE_SECRET_KEY=xxx
- Configure the bucket for the images:
import { initEdgeStore } from "@edgestore/server";
import { initEdgeStoreClient } from "@edgestore/server/core";

const es = initEdgeStore.create();

const edgeStoreRouter = es.router({
  img: es.imageBucket(), // A simple public image bucket
});

export const backendClient = initEdgeStoreClient({
  router: edgeStoreRouter,
});
- Upload the image blob:
async function uploadImage(blob: Blob) {
  const res = await backendClient.img.upload({
    content: {
      blob,
      extension: "webp",
    },
  });
  return res.url;
}
P.S. I'm the creator of Edge Store, so I'm obviously biased towards it. 😇
Inserting Data into the Database
Once I had all the data and images ready, I inserted them into my database. I used Drizzle with Postgres on Supabase, but this approach works with any database technology you prefer.
async function insertRecipesIntoDB(recipes: Recipe[]) {
  await db
    .insert(tRecipe)
    .values(
      recipes.map((recipe) => ({
        name: recipe.name,
        description: recipe.description,
        ingredients: recipe.ingredients,
        steps: recipe.steps,
        duration: recipe.duration,
        servings: recipe.servings,
        image: recipe.image,
      }))
    )
    .execute();
}
Cost Considerations
The cost of text generation is negligible compared to image generation, which costs $0.04 per image ($0.02 if using dall-e-2). For example, generating images for 30 items costs about $1.20.
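For budgeting bigger runs, this arithmetic is easy to wrap in a helper. The prices below are hard-coded from the figures in this article and will drift over time, so treat this as a sketch rather than a pricing reference:

```typescript
// Per-image prices (USD) as quoted in this article; check current OpenAI pricing
const IMAGE_PRICE_USD: Record<string, number> = {
  "dall-e-3": 0.04,
  "dall-e-2": 0.02,
};

function estimateImageCost(count: number, model: string): number {
  const pricePerImage = IMAGE_PRICE_USD[model] ?? 0;
  // Round to cents to avoid floating-point noise
  return Math.round(count * pricePerImage * 100) / 100;
}

console.log(estimateImageCost(30, "dall-e-3")); // 1.2, matching the example above
```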
Full Code
Here's the complete code that brings everything together.
import { db } from "@/server/db";
import { tRecipe } from "@/server/db/schema";
import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";
import { backendClient } from "@/lib/edgestore";
import sharp from "sharp";

const openai = new OpenAI();

// Schema for the recipe data returned by OpenAI
const recipeSchema = z.object({
  recipes: z.array(
    z.object({
      name: z.string(),
      description: z.string(),
      ingredients: z.array(z.string()),
      steps: z.array(
        z.object({
          instruction: z.string(),
        })
      ),
      duration: z.number(),
      servings: z.number(),
    })
  ),
});

type Recipe = z.infer<typeof recipeSchema>["recipes"][number] & {
  image?: string; // Add an image field that will be populated on a future prompt
};

(async () => {
  const totalRecipes = 20; // Total number of recipes to generate
  const recipesPerBatch = 10; // Number of recipes to generate per batch
  const totalBatches = Math.ceil(totalRecipes / recipesPerBatch); // Calculate the total number of batches

  // Fetch existing recipe names from the database to avoid duplicates
  const allRecipeNames = (await db.query.tRecipe.findMany().execute()).map(
    (recipe) => recipe.name
  );

  for (let batchNumber = 1; batchNumber <= totalBatches; batchNumber++) {
    console.log(`Generating recipes batch ${batchNumber}/${totalBatches}`);
    // Generate a batch of new recipes, ensuring they are unique
    const recipes = await generateRecipes(
      recipesPerBatch,
      allRecipeNames.join(", ")
    );
    // Generate images for each recipe in the batch
    await generateImagesForRecipes(recipes);
    // Insert the new recipes into the database
    await insertRecipesIntoDB(recipes);
    // Update the list of all recipe names with the newly added recipes
    allRecipeNames.push(...recipes.map((recipe) => recipe.name));
  }
})();

async function generateRecipes(count: number, allRecipeNamesStr: string) {
  const prompt = `Generate ${count} unique and diverse recipes that are different from the following recipes: ${allRecipeNamesStr}.`;
  // Generate a list of recipes in JSON format using OpenAI's API
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
    response_format: zodResponseFormat(recipeSchema, "recipes"),
  });
  // Extract the content from the AI's response
  const responseContent = completion.choices[0]?.message?.content ?? "";
  // Parse and validate the response using the defined schema
  const generatedRecipes = recipeSchema.parse(
    JSON.parse(responseContent)
  ).recipes;
  return generatedRecipes;
}

async function generateImagesForRecipes(recipes: Recipe[]) {
  const batchSize = 5; // Number of images to generate concurrently
  for (let i = 0; i < recipes.length; i += batchSize) {
    const batch = recipes.slice(i, i + batchSize);
    // Generate images for the current batch concurrently
    await Promise.all(
      batch.map(async (recipe) => {
        try {
          const imageUrl = await generateRecipeImage(recipe);
          recipe.image = imageUrl; // Add the image URL to the recipe
        } catch (error) {
          console.error(`Failed to generate image for ${recipe.name}:`, error);
        }
      })
    );
    console.log(`Waiting before the next batch...`);
    // The tier1 rate limit is 5 requests per minute 😢
    // You might be able to remove this delay if you have a higher tier
    await new Promise((resolve) => setTimeout(resolve, 50000)); // Wait for 50 seconds
  }
}

async function generateRecipeImage(recipe: Recipe) {
  console.log(`Generating image for ${recipe.name}`);
  const imagePrompt = `An appetizing photo of the dish: ${recipe.name}`;
  // Call OpenAI's image generation API with the prompt
  const res = await openai.images.generate({
    model: "dall-e-3",
    prompt: imagePrompt,
    size: "1024x1024",
  });
  // Extract the image URL from the response
  const url = res.data[0]?.url;
  if (!url) {
    throw new Error("No image url found");
  }
  // Optimize the image before uploading
  const blob = await optimizeImage(url);
  // Upload the optimized image to edgestore
  const esRes = await backendClient.img.upload({
    content: {
      blob,
      extension: "webp",
    },
  });
  return esRes.url;
}

async function optimizeImage(imageUrl: string) {
  // Fetch the image from the provided URL
  const response = await fetch(imageUrl);
  if (!response.ok) {
    throw new Error(`Failed to fetch image: ${response.statusText}`);
  }
  // Convert the response to an ArrayBuffer for processing
  const arrayBuffer = await response.arrayBuffer();
  // Use sharp to convert the image to WebP format
  const optimizedBuffer = await sharp(Buffer.from(arrayBuffer))
    .webp() // Default quality is 80
    .toBuffer();
  // Create a Blob from the optimized buffer
  return new Blob([optimizedBuffer], { type: "image/webp" });
}

async function insertRecipesIntoDB(recipes: Recipe[]) {
  await db
    .insert(tRecipe)
    .values(
      recipes.map((recipe) => ({
        name: recipe.name,
        description: recipe.description,
        ingredients: recipe.ingredients,
        steps: recipe.steps,
        duration: recipe.duration,
        servings: recipe.servings,
        image: recipe.image,
      }))
    )
    .execute();
}
Conclusion
Automating the creation of mock data with AI saved me a lot of time, especially when compared to manually generating and managing data. This approach scales well, whether you need a handful of items or hundreds. Plus, having realistic data adds to the overall presentation of your app, making it feel more complete. Although it requires some time investment initially, having the logic ready now means that I'll be able to do it much faster in future projects. If you’re in need of lots of mock data, you might want to try this approach!
Thanks for reading!
Written by Ravi
A Brazilian developer in Japan. Working full time in Japanese companies since 2017. Mainly with web, but sometimes mobile too. Love to try and learn new technologies and find better ways to do things. Trying to find my way into entrepreneurship and build my own thing.