Create Images from Text with Hugging Face

🌎 Introduction

Continuing the Hugging Face series, we're moving from text-based tasks to something visual: generating images from natural language prompts!

Ever imagined a unicorn surfing a rainbow or a futuristic city floating in the sky? Now you can bring those ideas to life 💡 using the Diffusers library by Hugging Face. Using just a few lines of Python code and the powerful Gradio interface, you can deploy your own AI image generation app with ease. Let’s dive in!

🌀 What is a Diffusion Model?

Diffusion models are a class of generative models that create data such as images by gradually removing noise from a random signal, guided by a trained neural network. Imagine starting with visual static and slowly sculpting a clear image from it, step by step.
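
To make the "start from static and refine" idea concrete, here is a toy sketch in plain Python. It is not how Stable Diffusion is implemented: the function name toy_denoise and its parameters are made up for illustration, and the "trained neural network" is replaced by a stand-in that already knows the target. A real model has to learn its noise estimate from data and is steered by the text prompt.

```python
import random


def toy_denoise(target, steps=50, step_size=0.2, seed=0):
    """Start from random noise and move toward `target` a little at each step."""
    rng = random.Random(seed)
    # Begin with pure noise, one random value per "pixel".
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for _ in range(steps):
        # Stand-in for the trained network (an assumption for this toy):
        # the "predicted noise" is simply the gap between sample and target.
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # Remove a fraction of the predicted noise.
        x = [xi - step_size * ni for xi, ni in zip(x, predicted_noise)]
    return x


target = [0.0, 0.5, 1.0, 0.5]
result = toy_denoise(target)
print([round(value, 3) for value in result])
```

Each pass removes only part of the estimated noise, which is why the image emerges gradually rather than in a single jump.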

In the context of Hugging Face, the diffusers library provides tools to run state-of-the-art models like Stable Diffusion, which can generate striking images from simple text prompts. These models are widely used for creative applications like concept art, storytelling, and more. Here, we'll use the Stable Diffusion v1.4 model (CompVis/stable-diffusion-v1-4), hosted on the Hugging Face Model Hub, to turn textual descriptions into images.

🔗 Prerequisites

Before you begin, make sure you have the following:

A Hugging Face account (with write access to create Spaces)
Installed packages: diffusers, torch, gradio
Access to a GPU (locally or via Hugging Face Spaces, for better performance)
Note: GPU Spaces are billed by the hour. The free CPU configuration also works, but generation is much slower.
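
If you want a quick way to confirm your setup before running the app, a small check like the one below can help. This is only a sketch: the helper names missing_packages and gpu_available are made up for this post, and the package list simply mirrors the prerequisites above.

```python
import importlib.util

REQUIRED = ["diffusers", "torch", "gradio"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def gpu_available():
    """Report whether PyTorch can see a CUDA GPU; False if torch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
print("CUDA GPU available:", gpu_available())
```

Anything reported as missing can be installed with pip before you continue.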

Now, let's look at the code.

👨‍💻 The Complete Code

from diffusers import StableDiffusionPipeline
import torch
import gradio as gr

# Load the Stable Diffusion model
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
# Use the GPU if available; otherwise run on the CPU
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# Define the generation function
def generate_image(prompt):
    image = pipe(prompt).images[0]
    return image

# Launch with Gradio
gr.Interface(fn=generate_image, inputs="text", outputs="image").launch()

💡 Code Explanation

StableDiffusionPipeline.from_pretrained(...) loads the image generation model.
pipe.to(...) ensures the model runs on GPU if available, falling back to CPU.
The generate_image function takes a text prompt and returns a generated image.
gr.Interface wraps everything into a simple web app with input and output.
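
Once the basic app works, you may want more control over each generation. The sketch below passes a few commonly tuned arguments to the pipeline call. The wrapper name generate_image_with_options and the default values are assumptions chosen for illustration; num_inference_steps, guidance_scale, and negative_prompt are arguments accepted by the Stable Diffusion pipeline call in diffusers. The pipeline is passed in as an argument, so the function does not load any model on its own.

```python
def generate_image_with_options(pipe, prompt, steps=30, guidance_scale=7.5,
                                negative_prompt=None):
    """Call a text-to-image pipeline with a few commonly tuned settings."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("Prompt must not be empty")
    result = pipe(
        prompt,
        num_inference_steps=steps,        # fewer steps run faster but give rougher images
        guidance_scale=guidance_scale,    # higher values follow the prompt more closely
        negative_prompt=negative_prompt,  # things you would like the image to avoid
    )
    return result.images[0]
```

To use it with Gradio, you could wrap it so that only the prompt is exposed, for example fn=lambda prompt: generate_image_with_options(pipe, prompt).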

🔹 Sample Prompts to Try

Here are some sample prompts that might excite you.

"A cat playing chess in outer space"
"Cyberpunk Tokyo street at night"
"A magical forest with glowing trees"
"A robot painting a self-portrait"
"Mouse riding a tricycle"

⚠️ Limitations

Performance: Image generation on the CPU is very slow. Use GPU-enabled environments like Hugging Face Spaces for practical speed.
Memory: Models like Stable Diffusion require a lot of memory. A GPU with 8 GB or more of VRAM is recommended if you run the code locally.
Prompt: Start with simple prompts while experimenting. Complex prompts often need several attempts to get good results, which is only practical on a GPU.
Colab: When you run the code on Colab, select a GPU runtime so that generation finishes in a reasonable time.
Safety: Be aware of content filtering and ethical use when generating images.
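
If memory is tight, two common options are loading the weights in half precision on the GPU and enabling attention slicing. The sketch below shows one way to do that, under a few assumptions: the helper names choose_device_and_precision and load_pipeline are made up for this post, half precision is used only when a CUDA GPU is present, and the torch_dtype argument may be spelled differently in other diffusers versions.

```python
def choose_device_and_precision(cuda_available):
    """Pick a device and precision; half precision is only used on the GPU."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


def load_pipeline(model_id="CompVis/stable-diffusion-v1-4"):
    """Load the pipeline with memory-saving options.

    torch and diffusers are imported here, not at the top of the file,
    so the helper above can be used without them installed.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    device, precision = choose_device_and_precision(torch.cuda.is_available())
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=getattr(torch, precision)
    )
    pipe = pipe.to(device)
    # Attention slicing trades a little speed for lower peak memory.
    pipe.enable_attention_slicing()
    return pipe
```

Calling pipe = load_pipeline() would then stand in for the two loading lines in the complete code above.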

🌐 Deploying to Hugging Face Spaces

Create a new Space:
- Choose Blank as the template
- Set Gradio as the SDK
- Enable a GPU runtime if you are willing to pay per use, or choose the free-tier CPU configuration

Upload the following files:

app.py (the above code)

requirements.txt:

  torch
  gradio
  diffusers
  transformers
  accelerate
  safetensors

Commit and Deploy

You will find your application running under the App section. To try it out, open the sample interface at:

https://huggingface.co/spaces/divivetri/prompt_to_image
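
If you prefer to script the upload instead of using the web interface, the huggingface_hub library can create a Space and push a folder to it. The following is a rough sketch, not the method used for the Space above: the helper names write_requirements and upload_space are made up, the repo id in the usage note is a placeholder, and it assumes huggingface_hub is installed and you are already logged in with a write token.

```python
from pathlib import Path

PACKAGES = ["torch", "gradio", "diffusers", "transformers", "accelerate", "safetensors"]


def write_requirements(folder, packages=PACKAGES):
    """Write requirements.txt with one package per line and return its path."""
    path = Path(folder) / "requirements.txt"
    path.write_text("\n".join(packages) + "\n", encoding="utf-8")
    return path


def upload_space(folder, repo_id):
    """Create the Space if it does not exist yet, then upload the folder contents."""
    # Imported here so write_requirements works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)
    api.upload_folder(folder_path=str(folder), repo_id=repo_id, repo_type="space")
```

For example, with app.py saved in a folder named my_space, you could call write_requirements("my_space") and then upload_space("my_space", "your-username/prompt-to-image"), replacing the placeholder repo id with your own.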

🎉 Wrap-up

You're now equipped to transform your creative thoughts into images with just a few lines of simple code. Whether you're experimenting for fun or building a creative app, this project is a rewarding first step into generative models.

Try your own prompts, play around, and most importantly, share what you build! 🚀
