Updated Plan for SpriteDX

Sprited Dev

The goal of SpriteDX is to build a tool that generates consistent sprite assets for target games.

  1. Consistency: Able to generate consistent pixel art assets.

  2. Controllability: The user should be able to control scale and style easily.

  3. Animation: Able to generate image sequences for a character.

  4. Quality: Able to generate near-MidJourney-level pixel art fidelity.

  5. Serviceability: Able to run in a hosting environment at affordable cost.

So far, we’ve tried:

  1. SpriteDX - Project Kick Off

  2. Using MidJourney to generate pixel art

  3. Testing out MidJourney Video for Pixel Art animations

  4. Testing out various image-to-image pipelines for high fidelity pixel art generation

  5. Testing out off-the-shelf LoRAs and their limitations

  6. Testing out Flux.1-fill-dev

  7. Testing out Flux Control Net to create sprite sheets

  8. Fine-tuning the Flux.1-dev model for pixel alignment

The takeaway is that current state-of-the-art methods are not yet able to meet the goals stated above. However, there are ways to tweak state-of-the-art models to behave the way I want.

First, I’m pretty much set on using either the Flux.1-dev or Flux.1-schnell model as the base diffusion model. From previous experiments, I noticed that we would want to LoRA fine-tune the model so it better aligns pixels to the super-pixel grid. Then we will need off-the-shelf control nets to tile multiple frames of a character into one generation. Finally, we will need to train a custom control net that exposes dials like n_heads_tall, has_outline, n_colors, frame_index, and hue. That will give us a tool that can generate NPC characters with minimal effort and easy tweaking.
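Those dials could be packed into a small conditioning structure that gets normalized into a vector for the custom control net. A minimal sketch, where the normalization ranges are my assumptions rather than a finalized spec:

```python
from dataclasses import dataclass

@dataclass
class SpriteDials:
    """Hypothetical conditioning dials for a custom control net."""
    n_heads_tall: float  # character height in "heads", e.g. 2.0 for chibi
    has_outline: bool    # whether sprites get a 1px outline
    n_colors: int        # palette size cap
    frame_index: int     # which animation frame to render
    hue: float           # base hue in degrees, [0, 360)

    def to_vector(self) -> list[float]:
        # Normalize each dial to roughly [0, 1] so the control net
        # sees comparable scales (the ranges here are assumptions).
        return [
            self.n_heads_tall / 8.0,
            1.0 if self.has_outline else 0.0,
            self.n_colors / 64.0,
            self.frame_index / 16.0,
            (self.hue % 360.0) / 360.0,
        ]

dials = SpriteDials(n_heads_tall=2.0, has_outline=True,
                    n_colors=16, frame_index=3, hue=200.0)
vec = dials.to_vector()
```

Keeping the dials in one dataclass makes it easy to add or drop a dial later without touching the rest of the pipeline.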

Animation generation is still going to be quite challenging because Flux was never designed to generate multiple frames, and we haven’t found any off-the-shelf model that does this satisfactorily. The current best-known approach is to use a pose control net; however, I’m not super hopeful.

I also tried pattern matching by giving the Flux.1-fill-dev model a template image sequence plus the first frame of the target sequence, but the model wasn’t able to fill in the rest. It mostly just copy-pasted the first frame into all the other frames.
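For reference, the setup for that experiment boils down to a sprite-sheet canvas plus a binary mask marking which frame cells the fill model should inpaint. A sketch of the mask layout (the frame size and grid dimensions are assumptions):

```python
import numpy as np

FRAME = 32          # assumed frame size in pixels
COLS, ROWS = 4, 2   # row 0: template sequence; row 1: sequence to fill

def build_fill_mask(known_cells: set[tuple[int, int]]) -> np.ndarray:
    """Return an (H, W) mask: 1 where the model should fill, 0 where pixels are given."""
    mask = np.ones((ROWS * FRAME, COLS * FRAME), dtype=np.uint8)
    for r, c in known_cells:
        mask[r * FRAME:(r + 1) * FRAME, c * FRAME:(c + 1) * FRAME] = 0
    return mask

# Template row fully known, plus the first frame of the new sequence;
# the remaining three cells of row 1 are left for the model to fill.
known = {(0, c) for c in range(COLS)} | {(1, 0)}
mask = build_fill_mask(known)
```

The mask and composited canvas would then be handed to the fill pipeline; as noted above, the model mostly copied the known frame instead of continuing the motion.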

An alternative approach is to keep a high-resolution reference, animate that, and have the model translate it into a pixelated version.

It was promising to see that translating non-pixel art into pixel art works quite well. And with the help of a control net to freeze some of the pixels, we can get consistent characters in different poses. However, we would still need to construct the high-res reference animation first, which is kinda time-consuming.
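Outside the diffusion model, the high-res-to-pixel translation can be approximated with block downsampling plus a crude palette cap, which is handy for sanity-checking outputs. A minimal numpy sketch (block size and level count are assumptions):

```python
import numpy as np

def pixelate(img: np.ndarray, block: int = 8, levels: int = 4) -> np.ndarray:
    """Downsample an (H, W, 3) uint8 image by block-averaging, then
    quantize each channel to `levels` values (a crude palette cap)."""
    h, w, c = img.shape
    assert h % block == 0 and w % block == 0
    # Block-average: each block x block patch becomes one super-pixel.
    small = img.reshape(h // block, block, w // block, block, c).mean(axis=(1, 3))
    # Snap channels to the midpoints of `levels` equal-width bins.
    step = 256 / levels
    return (np.floor(small / step) * step + step / 2).astype(np.uint8)

rng = np.random.default_rng(0)
hi_res = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
px = pixelate(hi_res)  # one super-pixel per 8x8 block, few channel values
```

A real pipeline would use proper palette quantization (e.g. median-cut) instead of per-channel binning, but the structure is the same.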

A few more things I want to confirm:

  1. Can off-the-shelf VAEs reconstruct 1024×1024 sprite sheets with 100% accuracy?

  2. Can we use Multi-Diffusion to generate multiple frames of characters with a consistent design? I’m not hopeful, though, since there is no cross-attention between non-overlapping patches and each patch is diffused separately.

  3. Investigate Tune-A-Video to see whether it can be used for generating frames.

  4. Ultimately, I want a prototype that will generate highly detailed pixel art sheets with animated frames.
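The VAE question in (1) can be settled with a simple exact-reconstruction metric: encode, decode, and count how many pixels survive unchanged. The sketch below stands in for the real `vae.encode`/`vae.decode` roundtrip with an assumed lossy quantization, just to show the measurement:

```python
import numpy as np

def exact_pixel_rate(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Fraction of pixels whose uint8 values match exactly across all channels."""
    matches = np.all(original == reconstructed, axis=-1)
    return float(matches.mean())

# Stand-in roundtrip: a real test would run the sheet through the chosen
# model's VAE. Here we simulate mild lossy compression by dropping the
# lowest bit of each channel.
rng = np.random.default_rng(1)
sheet = rng.integers(0, 256, size=(1024, 1024, 3), dtype=np.uint8)
recon = ((sheet >> 1) << 1).astype(np.uint8)
rate = exact_pixel_rate(sheet, recon)
```

For the 100%-accuracy goal in (1), `rate` would need to come out at exactly 1.0 on real sprite sheets; anything less means the VAE itself caps fidelity before the diffusion model even runs.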

—Sprited Dev
