Haixu Case Study #2 - Volcano - a visual guide

Anand (RC)Anand (RC)
4 min read

This is the second deep dive case study of a haixu visual guide.

This refers to the visual guide called Volcano, find it here.

See the first visual guide (Glowing Life, on Bioluminescence) case study for broad context (what is haixu, how it works, what to expect, etc.). I will assume you have read the common concepts there.

Overview

There are two visual guides in this product - the one called v2 in the url has this cover and table of contents:

The one called v1 has this cover and table of contents:

Topic Suitability

A topic like Volcanoes lends itself very well to creating visual guides using AI:

  • it has a lot of potential for good visual imagery that explains concepts

  • it does not have any characters who appear in different panels

  • it is colorful and bright

On the other hand, we need to engineer the images and text carefully to avoid too graphic depiction of the unavoidable devastation that volcanoes bring to the towns and people near it.

Review

This is my own review of the visual guides (v2 and v1):

The images and text are informative, beautiful and colorful, notice how the bottom panel (from v2, page 2) explains the layers of the earth's crust and how volcanoes work)

Insight: there may be errors in image accuracy to the content, which are hard to identify without domain expertise, but text seems to be fairly accurate based on the source content

One challenge with creating educational visual guides from a source using AI is that we cannot verify the accuracy of the text or images unless we know the topic well. Doing this at scale is difficult, and there might be wrong depictions. I have engineered to verify/curate/etc., but some errors can still creep through, especially in the images.

For example, In the next page top panel shown below, we can see different types of volcanoes being described, and it does show the two kinds (gentle slope and cone shaped ones) that are mentioned in the text.

Insight: AI Image generators can mix up different elements in the image prompt, leading to sometimes cool effects and sometimes errors

For example, see below panel (v2, page 4) - the panel image prompt is derived from the panel image text. Some relevant parts of the image prompt are:

Comic book style panoramic view of the solar system, focusing on volcanic activity across different celestial bodies.

Earth's volcanoes, Mars' Olympus Mons, Venus' lava plains, and Io's sulfur volcanoes visible.

Cosmic background with stars and nebulae.

We have different elements being referenced here, including volcanoes, planets, stars, nebulae. The prompt specifies volcanoes on planets and their moons - this kind of relative reference in image prompts will perform poorly and mix things up as we can see in the generated image - there are planets coming out of the land, and rings of planets across the land. The background is reasonable. The image does not show nebulae, but luckily that is not critical to the panel. I think the image is cool, but the planets on the surface are an error that should be avoided.

If I were to manually generate the image prompt (as opposed to using AI), I would have composed and worded it differently for the given panel text. And iterated over it till I got a good image without these errors.

One of the biggest challenges with generating good aligned images with full automation is to balance image quality with the cost of curation/verification and iteration using AI agents.

Here's another page that has a pretty good pair of panels - images and text are good and seem well aligned (to the extent of my knowledge of the subject).

You can get this visual guide (with two different variants) here.

About me

I am a software/AI/product engineer/scientist conducting cutting edge research in modern generative AI - please consider buying my AI generated visual guides at the store to support my work. It will also help me immensely if you spread the word, comment, like and share my posts about haixu. Thank you!

0
Subscribe to my newsletter

Read articles from Anand (RC) directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Anand (RC)
Anand (RC)

I am currently working on generative AI, currently text+image experiences, educational AI generated visual guides in various mediums (comic, video, etc.). Before that, I was building llm apps with chat models, evaluating GPTs and Assistants API. Before that worked in conversational AI. Prior to that, have worked on many things product, software, AI, ML.