Plan for SpriteX Prototype


Date: 6/24/2025 (Day 26 of Sprited)
Goals: Build a one-click game-sprite generation tool.
Scope: Define proof-of-concept criteria for a sprite-generation AI tool.
Anti-Scope: Unity plugin. Sleek UI/UX.
Requirements:
[ ] Define inputs to the model.
[ ] Define generation pipeline.
[ ] Provide Evaluation criteria for picking API services.
[ ] Provide Evaluation table for scoring API services.
[ ] Pick one API provider with justification.
[ ] Create 5 template sprite sheets.
[ ] AI can re-skin the template sprite sheets according to the user prompt.
Inputs:
- Template sprite sheet + User Prompt
Outputs:
- Reskinned sprite sheet
Sub Steps:
Provide a template image and user prompt.
Mask template image.
Send them to the API provider of choice (using an API key).
Export the resulting reskinned image.
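The sub-steps above can be sketched as a thin orchestration layer. Everything here is a placeholder: `reskin_pipeline`, the `mask_fn` hook, and the `api_call` hook are hypothetical stand-ins for whichever provider we end up picking.

```python
from typing import Callable

def reskin_pipeline(
    template: bytes,
    prompt: str,
    mask_fn: Callable[[bytes], bytes],
    api_call: Callable[[bytes, bytes, str], bytes],
) -> bytes:
    """Mask the template, send it to the chosen provider, return the result."""
    mask = mask_fn(template)                      # step 2: mask the template image
    reskinned = api_call(template, mask, prompt)  # step 3: call the provider API
    return reskinned                              # step 4: caller exports the image

# Wiring demo with trivial fakes standing in for real image ops and a real API:
fake_mask = lambda img: b"MASK"
fake_api = lambda img, mask, prompt: img + b"|" + mask + b"|" + prompt.encode()
out = reskin_pipeline(b"TEMPLATE", "goblin skin", fake_mask, fake_api)
```

The point of the injected callables is that the masking step and the provider call can be swapped independently while we evaluate services.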
Evaluation Criteria:
Skin: Can re-skin (1 pt)
Mask: Can mask regions (1 pt)
Prompt: Can follow the user prompt (1 pt)
Size: Sizing consistency (1 pt)
Anim: Animation is continuous (1 pt)
Pixel: Produces pixel art (1 pt)
BG: Backgrounds are easy to clean up (1 pt)
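Since every criterion is weighted equally at 1 point, scoring a provider is just summing binary checks. A minimal tally (the criterion keys are the abbreviations from the list above):

```python
CRITERIA = ["skin", "mask", "prompt", "size", "anim", "pixel", "bg"]

def score(results: dict) -> int:
    """Sum 1 point per criterion the provider passes (max 7)."""
    return sum(1 for c in CRITERIA if results.get(c, False))

# Example: a provider that re-skins and follows prompts but fails the rest.
print(score({"skin": True, "prompt": True}))  # → 2
```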
Evaluation Table:
| Provider | Results |
| --- | --- |
| Retro-Diffusion | … |
| Pixel Lab | … |
| MidJourney | Does not provide API access. |
Retro Diffusion
Input:
Prompt: character sheet a lousy dad with tooth brush on his mouth and underwear
Parameters: Default
Output:
Retro Diffusion’s image-to-image is pretty strong: it largely keeps the same form and generates a similar image, and the developer APIs are publicly available. However, the prompt doesn’t seem to have much of an impact. It does to a certain extent, but the effect is very weak.
Some alternatives generated on RetroDiffusion website.
As a bonus feature, RetroDiffusion has some animation support for 8-direction movement animation. However, it only supports 48x48, and the style does not transfer correctly. It was not ready for prime time.
We also tried feeding in a sprite sheet of a walk cycle, and the result was not great. The generated sprite sheet does NOT necessarily form a coherent animation.
Adjusting the change-strength parameter has a noticeable impact on the smoothness of the animation. However, when we lower it, the images end up looking too similar to the original image. Overall, there seems to be a genuine limitation in the sprite generation engines out there today. We will have to work around it.
One possible workaround would be rejection sampling: we generate an image, then ask another model to describe it. If the description does not match, we can dial down the change-strength automatically and retry.
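A minimal sketch of that rejection-sampling loop, assuming hypothetical `generate` and `describe` model hooks and a naive keyword-overlap check standing in for "the description matches":

```python
def prompt_matches(description: str, prompt: str, threshold: float = 0.5) -> bool:
    """Naive check: enough of the prompt's words appear in the description."""
    words = set(prompt.lower().split())
    hits = sum(1 for w in words if w in description.lower())
    return hits / max(len(words), 1) >= threshold

def reskin_with_rejection(template, prompt, generate, describe,
                          strength=0.8, step=0.1, min_strength=0.2):
    """Generate, have a second model describe the output, and dial
    change-strength down until the description matches (or we bottom out)."""
    img = None
    while strength >= min_strength:
        img = generate(template, prompt, strength)
        if prompt_matches(describe(img), prompt):
            return img, strength
        strength -= step  # dial down change-strength and retry
    return img, strength
```

In practice the matcher would be an LLM or CLIP-style similarity score rather than word overlap; the loop structure stays the same.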
There is also no color variation: the output sprites often follow the exact same color blotches, so there isn’t really any re-skinning into a different overall palette.
If you think about sprite sheets, each one is unique: there is no repeating layout pattern shared across games; each game uses whatever layout it prefers. So there is no easy way to generalize sprite-sheet generation. I feel there is a near-tarpit-idea warning here: these limitations will require a lot of fine-tuning on our end, plus a lot of data preprocessing to align the data to particular layout patterns.
PixelLab
PixelLab’s image-to-image blurs the images a bit. However, it shows a similar ability to stay on track with the input image.
There is probably some post-processing we can run to sharpen them.
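For that sharpening pass, a standard 3×3 sharpen kernel is one option. Here is a dependency-free sketch on a grayscale pixel grid (in practice we would likely reach for Pillow’s `ImageFilter.UnsharpMask` or similar instead):

```python
def sharpen(img):
    """Apply a 3x3 sharpen kernel (center 5, cross -1) with clamp-to-edge
    padding, clipping results to the 0-255 range."""
    h, w = len(img), len(img[0])
    get = lambda y, x: img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            v = (5 * get(y, x) - get(y - 1, x) - get(y + 1, x)
                 - get(y, x - 1) - get(y, x + 1))
            out[y][x] = min(max(v, 0), 255)
    return out
```

The kernel exaggerates any pixel that differs from its neighbors, which is exactly what blurred pixel-art edges need; flat regions pass through unchanged.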
We also tried out the legacy animation-generation logic. It didn’t go as expected.
The inpainting feature was also out of context. Maybe I’m not using it correctly? The UI is rather confusing.
So far we’ve seen:
Midjourney can create initial sprite sheets that are “artistic.”
Using MidJourney output as a starting point is proving useful.
Both engines, RetroDiffusion and PixelLab, do a decent job at image-to-image without losing contours, with RetroDiffusion staying ahead thanks to crisper, cleaner edges and finishes.
Animation capabilities are still very limited.
Let’s circle back onto this after a break.
Scenario
Scenario’s “Edit with Prompts” is pretty strong out of the box. It is able to capture the semantics rather than simply replacing blotches of color. The pricing seems exorbitant, though…
Flux Retro Aesthetics can produce very consistent images. The quality seems higher and more refined than Retro Diffusion’s: RetroDiffusion tends to produce duller-looking colors, while FRA produces more vibrant colors with a more cartoony look. Sorry about the cropping; there seems to be a limitation on how wide the image can be.
Pixel Art XL is a little hard to describe, but the quality of its output was not as good as the other models so far: the composition seemed broken and the anatomy wasn’t always consistent.
So far, Flux Retro Aesthetics is the best model I could find by dabbling on Scenario myself.
Flux in a Nutshell
Pixel: Think of Flux as the unofficial “SDXL-PLUS-ULTRA” that left the Stability playground, bulked up on rectified-flow steroids, and now bench-presses 12 billion parameters for fun. It’s a family of text-to-image transformer models from Black Forest Labs (BFL) built to beat MidJourney-level quality while still shipping open weights for tinkering.
Where did it come from?
Genealogy: Many of the original SD / SDXL researchers spun up BFL and launched Flux 1 on Aug 1 2024 as their flagship suite.
Mission: Ship state-of-the-art generative models that are actually open (well… at least the dev & schnell checkpoints).
Model Line-Up
Flux 1 Pro (12B) - Commercial API only
Flux 1 Dev (12B) - Non-commercial
Flux 1 Schnell (3B) - Apache-2.0
Flux 1 Fill (12B) - Same license as Dev - Canvas extension, in-painting
Why is it different?
Rectified Flow Transformer - Replaces diffusion’s noise step-ladder with a straight-line flow from noise → image, cutting inference steps by ~30%.
Hybrid Text-Image Blocks - Separate weights for text & image tokens that cross-talk at each layer, which helps it nail captions, signage, and multi-object layouts.
Dual-CLIP + T5-XXL Conditioning - Two CLIP encoders (G/14 & L/14) plus a language model for long, natural prompts—no “prompt salad” or negative prompt gymnastics needed.
Guidance Distillation - Dev/Schnell are distilled from Pro, so you get Pro’s brains without Pro’s bulk.
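The “straight-line flow” idea in toy 1-D form: rectified flow trains the model to predict a (nearly) constant velocity from noise x0 to image x1, so a plain Euler integration needs far fewer steps than a curved diffusion trajectory. This is a pedagogical sketch, not BFL’s actual sampler:

```python
def euler_sample(x0, velocity_fn, steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += velocity_fn(x, i * dt) * dt
    return x

# For a perfectly rectified (straight) flow the velocity is the constant
# (x1 - x0), so even very few steps land exactly on the target:
x0, x1 = 0.0, 1.0
print(euler_sample(x0, lambda x, t: x1 - x0, steps=4))  # → 1.0
```

A curved trajectory accumulates Euler error at every step, which is why diffusion samplers need a long noise step-ladder; a straight one doesn’t, which is where the claimed step savings come from.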
Performance Cheat-Sheet
Quality & prompt following: BFL’s own benchmark shows Flux 1 Pro/Dev edging out MidJourney v6 and DALL·E 3 on human preference tests for composition, text accuracy, and style diversity.
Speed: On a 24GB GPU, 1024² images land in ~3s at 40 steps; Schnell hits near real-time on a 16GB card at 512². Community quantizations run on 8GB cards.
Pixel & Retro strengths: Dozens of community LoRAs (e.g. Retro-Pixel-Flux) ride on top, making it a go-to for sprite work and old-school palettes.
Licensing Gotchas
Dev & Fill: Non-commercial. You can publish personal projects, research papers, and open-source tools, but no selling the outputs or running paid APIs.
Schnell: Apache-2.0—green light for commercial use.
Pro: Pay-per-token / enterprise deal, hosted only.
Ecosystem & Tooling
Offline Pipelines: FluxPipeline is merged into diffusers; ComfyUI has native node.
LoRAs & Merges: >300 LoRAs on HuggingFace (anime, retro, linocut, photoreal, you name it).
ControlNets / IP-Adapters: Early ports exist; compatibility improving each month.
Third-Party APIs: Replicate, fal.ai, mystic.ai all host the Pro checkpoint if you’d rather not lug 12B params around.
Quick “Hello Flux“ (Diffusers)
import torch
from diffusers import FluxPipeline

# Load the 12B Dev checkpoint (non-commercial license).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle layers to fit consumer GPUs

img = pipe(
    prompt="32x32 pixel art knight, NES palette, walking animation frame",
    height=512,
    width=512,
    guidance_scale=3.5,
    num_inference_steps=40,
).images[0]
img.save("flux_knight.png")
# Swap in prithivMLmods/Retro-Pixel-Flux-LoRA for extra 8-bit goodness.
When should you care?
AAA concept art / photoreal - Flux Pro is competitive with MJ v6 / SD3-Ultra
Sprite sheets & pixel art - Dev + Retro LoRA is chef’s kiss—perfect for Sprited’s one-click generator.
Fast local prototyping - Schnell beats SDXL-Turbo in detail at similar speeds.
Truly free for-profit rights - Stick to Schnell or negotiate Pro; Dev is off-limits for commercial use.
Bottom Line
Flux is what happens when the brains behind SDXL ditch the diffusion training wheels and cook up a transformer that flows straight to the good stuff. For your Sprited pipeline, pair Flux Dev + Retro LoRA for local R&D, graduate to Flux Schnell (or the Pro API) once money enters the chat, and enjoy fewer “why won’t this prompt work?” moments along the way.
end of text from Pixel