Translating Semantic Grid Maps into Visual Scenes with Stable Diffusion

Sprited Dev

Let’s think creatively: is there anything I can do to make the Machi environment stand out from all the other Minecraft clones? I want to really imbue intelligence into the project.

Sentient Tiles: Can I make the tiles themselves intelligent? I could make the tiles talk, or give them emotions like happiness and keep a log of their self-talk.

Tile Diffusion: Can I run Stable Diffusion over these tiles, using the tile configurations as input (sketched below)? Perhaps I can use a readily available model.

Let me time-box the second one and see what I can achieve.
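To make the idea concrete, here is a minimal sketch of what "tile configurations as input" could mean: render the semantic grid into a flat, color-coded image that an img2img model can then restyle. The tile names, palette, and tile size below are hypothetical placeholders, not Machi's actual tile schema.

```python
# Sketch: render a semantic tile grid into a flat color-coded RGB image
# that can serve as the init image for img2img. Tile names and colors are
# placeholder assumptions, not the real Machi schema.
from PIL import Image

TILE_SIZE = 16  # pixels per tile in the guide image (assumption)

# Rough semantic palette: each tile type gets one distinctive flat color.
PALETTE = {
    "grass": (96, 160, 64),
    "water": (48, 96, 192),
    "tree": (32, 96, 32),
    "house": (160, 96, 64),
    "path": (192, 176, 128),
}

def render_grid(grid: list[list[str]]) -> Image.Image:
    """Turn a 2D grid of tile names into a flat-colored RGB image."""
    height, width = len(grid), len(grid[0])
    img = Image.new("RGB", (width * TILE_SIZE, height * TILE_SIZE))
    for y, row in enumerate(grid):
        for x, tile in enumerate(row):
            color = PALETTE.get(tile, (0, 0, 0))
            # Paste a solid color block into the tile's region.
            img.paste(color, (x * TILE_SIZE, y * TILE_SIZE,
                              (x + 1) * TILE_SIZE, (y + 1) * TILE_SIZE))
    return img

if __name__ == "__main__":
    grid = [
        ["grass", "grass", "tree", "water"],
        ["path",  "house", "grass", "water"],
        ["path",  "path",  "grass", "grass"],
    ]
    render_grid(grid).save("tile_map_guide.png")
```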

MidJourney

Tried MidJourney to see if I could produce a rendered map consistent with the design above, but I wasn’t able to get a consistent generation.

ChatGPT

PixelLab

Tried out PixelLab, and its smaller model is able to do it.

It also automatically applied transparency, which was great. However, the image size was limited to 64 by 64. There is a model that generates larger tiles, but that wasn’t giving me anything, at least in the web UI I tried. Still, there is some hope.

RetroDiffusion

Tried using the img2img feature. It provided the most control. However, I’m not sure the quality matches my expectations; I think a lot of tuning will be needed to produce something really good.

At the very least, though, it does produce something that resembles the tile map with the right positioning of elements. I will probably need to tune the different numbers and find optimal values. Under the hood it uses Stable Diffusion, so perhaps I should focus on the Stable Diffusion web-ui instead.
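For reference, this is roughly what the same img2img idea looks like with Hugging Face diffusers rather than RetroDiffusion's own pipeline. The checkpoint is the stock SD 1.5 model (not a pixel-art fine-tune), and the strength, guidance, and step values are assumptions to experiment with, not settings I've validated.

```python
# Sketch: generic Stable Diffusion img2img via Hugging Face diffusers,
# starting from the flat color-coded map rendered from the tile grid.
# Checkpoint and parameter values are assumptions, not tuned settings.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# The semantic guide image produced earlier, resized to the model's native size.
init_image = Image.open("tile_map_guide.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="top-down pixel art game map, grass, water, trees, houses",
    image=init_image,
    strength=0.55,        # lower keeps the layout, higher adds detail but drifts
    guidance_scale=7.5,   # how strongly to follow the prompt
    num_inference_steps=30,
).images[0]

result.save("rendered_map.png")
```

The main knob to tune is strength: too low and the output stays a flat color map, too high and the generated scene stops respecting the tile layout.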

Stable Diffusion Web-UI Img2Img

TODO: I haven’t had a chance to do this yet.

TODO: https://youtu.be/FIOXGWCQgAI
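When I get to it, the AUTOMATIC1111 web-ui can be launched with --api, which exposes an img2img endpoint that should make it possible to script the tuning loop instead of clicking through the UI. A rough sketch, assuming the default local port and with parameter values that are guesses rather than known-good settings:

```python
# Sketch: drive the Stable Diffusion web-ui (AUTOMATIC1111) img2img endpoint
# from a script. Assumes the UI was started with --api on the default port.
# Parameter values are placeholders to tune, not validated settings.
import base64
import requests

URL = "http://127.0.0.1:7860/sdapi/v1/img2img"

with open("tile_map_guide.png", "rb") as f:
    init_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "init_images": [init_b64],
    "prompt": "top-down pixel art game map, grass, water, trees, houses",
    "negative_prompt": "blurry, photo, 3d render",
    "denoising_strength": 0.55,  # same role as 'strength' in diffusers
    "steps": 30,
    "cfg_scale": 7.5,
}

resp = requests.post(URL, json=payload, timeout=300)
resp.raise_for_status()

# The API returns generated images as base64-encoded strings.
out_b64 = resp.json()["images"][0]
with open("rendered_map_webui.png", "wb") as f:
    f.write(base64.b64decode(out_b64))
```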
