Creating an App that Uses a Diffusion Model to Generate Images

Generative Modelling
Generative modelling is about teaching a model to generate new data that resembles its training data. Numerous open-access models, pre-trained on large datasets, are available for use. These pre-trained models can generate new data.
Diffusion
Diffusion models are a type of generative model used to create images and audio.
Stable Diffusion is a diffusion model capable of generating images from text prompts.
Hugging Face is one of the repositories that host open-source models.
Creating an App that Uses a Diffusion Model to Generate Images
The app does the following:
- Get a prompt from the user
- Generate an image based on the prompt
- Derive the following ControlNet condition images from the generated image:
  - Canny
  - Depth
  - Openpose
This will be implemented using Python, the Stable Diffusion model, and Gradio for the user interface.
Pre-Requisites
This is being implemented on macOS; on Windows or Linux, different commands have to be used.
- Check if Python is installed
python3 --version
- Install PyTorch, torchvision and torchaudio (PyTorch does the background work of converting the prompt to an image; the models are also implemented in PyTorch)
pip install torch torchvision torchaudio
- Install diffusers and transformers (the code imports StableDiffusionPipeline from diffusers, which relies on transformers for the text encoder)
pip install diffusers transformers
- Install Gradio (this helps to create the user interface)
pip install gradio
- Install controlnet_aux (this provides the detectors that derive depth, Canny and OpenPose images from the generated image)
pip install controlnet_aux
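Before running the app, it can help to confirm that the packages from the steps above are visible to the Python interpreter in use. The following is a minimal sketch, not part of the app itself; the list of module names is an assumption based on the imports used in this article (transformers is included because the Stable Diffusion pipeline depends on it), and `find_missing` is a hypothetical helper name.

```python
import importlib.util

# Import names (not pip names) assumed to be needed by the app in this article.
REQUIRED_MODULES = ["torch", "diffusers", "transformers", "gradio", "controlnet_aux"]


def find_missing(modules):
    """Return the names in `modules` that this interpreter cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If anything is reported as missing, install it with pip in the same environment that runs the script.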
Code
import torch
from diffusers import StableDiffusionPipeline
from controlnet_aux import CannyDetector, OpenposeDetector, MidasDetector
import gradio as gr

# Load the Stable Diffusion pipeline in half precision
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16)

# Pick the hardware: Apple GPU (MPS), NVIDIA GPU (CUDA), or CPU
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
pipe.to(device)

# Detectors that derive condition images from a generated image
canny_detector = CannyDetector()
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
midas_detector = MidasDetector.from_pretrained("lllyasviel/ControlNet")


def generate_pics(prompt):
    # Generate the image, then derive the Canny, pose and depth images from it
    image = pipe(prompt).images[0]
    canny_image = canny_detector(image)
    pose_image = pose_detector(image)
    depth_image = midas_detector(image)
    return image, canny_image, pose_image, depth_image


# One text input, four image outputs (in the order returned by generate_pics)
gr.Interface(
    fn=generate_pics,
    inputs=gr.Textbox(lines=2, label="Enter your prompt"),
    outputs=[
        gr.Image(label="Generated Image"),
        gr.Image(label="Canny"),
        gr.Image(label="OpenPose"),
        gr.Image(label="Depth"),
    ],
    title="Image Generation using Stable Diffusion",
    description="Enter a prompt to generate an image using Stable Diffusion"
).launch()
Explanation for the Code
- Import Required Libraries
import torch
from diffusers import StableDiffusionPipeline
from controlnet_aux import CannyDetector, OpenposeDetector, MidasDetector
import gradio as gr
- Load the image generation pipeline using the Hugging Face diffusers library
StableDiffusionPipeline contains the models and logic needed for image generation. This part of the code loads the stable-diffusion-v1-5 model. torch_dtype=torch.float16 sets the precision with which the model is loaded. float16 uses half the memory of float32 and is usually faster on a GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16)
When this part alone is executed in a Jupyter notebook, this is the result: [screenshot of the notebook output]
- Define the hardware on which the model runs. This is being run on a Mac, so it will use MPS (Metal Performance Shaders), which lets PyTorch run image generation faster on Apple's GPU. On Windows, CUDA has to be installed. On Google Colab, the runtime settings have to be changed to use a GPU.
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
pipe.to(device)
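The device check above can also drive the choice of precision. The sketch below is one possible arrangement, not the code used in the app: `choose_device` and `choose_dtype_name` are hypothetical helper names, and the rule "float16 on a GPU backend, float32 on CPU" is an assumption of this sketch, made because half precision on CPU is often unsupported or very slow.

```python
def choose_device(mps_available, cuda_available):
    """Mirror the if/elif/else above: prefer MPS, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def choose_dtype_name(device):
    """Assumption of this sketch: half precision on GPU backends, full precision on CPU."""
    return "float16" if device in ("mps", "cuda") else "float32"


try:
    import torch

    device = choose_device(torch.backends.mps.is_available(),
                           torch.cuda.is_available())
    dtype = getattr(torch, choose_dtype_name(device))  # e.g. torch.float16
except ImportError:
    # torch is not installed; fall back to placeholders so the sketch still runs
    device, dtype = "cpu", None

print(device, dtype)
```

With this arrangement the device is chosen before the pipeline is loaded, so `dtype` can be passed as `torch_dtype` to `StableDiffusionPipeline.from_pretrained`, followed by `pipe.to(device)`.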
- Creating the User Interface
The user provides one input (the prompt), and four output images are generated.
The Gradio part of the code calls the function generate_pics with the user's prompt. The function generates the images, and the outputs are displayed.
gr.Interface(
    fn=generate_pics,
    inputs=gr.Textbox(lines=2, label="Enter your prompt"),
    outputs=[
        gr.Image(label="Generated Image"),
        gr.Image(label="Canny"),
        gr.Image(label="OpenPose"),
        gr.Image(label="Depth"),
    ],
    title="Image Generation using Stable Diffusion",
    description="Enter a prompt to generate an image using Stable Diffusion"
).launch()
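Gradio maps the values returned by the function to the output components by position, so the number of returned values has to match the number of outputs. The sketch below shows one way to keep the two in sync by building the outputs from a single list of labels. `OUTPUT_LABELS`, `check_outputs` and `build_demo` are hypothetical names introduced for illustration; gradio is imported inside `build_demo` so that the rest of the sketch runs without it.

```python
# Labels of the four output images, in the order generate_pics returns them.
OUTPUT_LABELS = ["Generated Image", "Canny", "OpenPose", "Depth"]


def check_outputs(result):
    """Raise if the number of returned values differs from the number of outputs."""
    if len(result) != len(OUTPUT_LABELS):
        raise ValueError(
            f"expected {len(OUTPUT_LABELS)} outputs, got {len(result)}")
    return result


def build_demo(fn):
    """Build the same interface as above around any prompt -> 4 images function."""
    import gradio as gr  # imported lazily; only needed when the UI is built

    def checked_fn(prompt):
        return check_outputs(fn(prompt))

    return gr.Interface(
        fn=checked_fn,
        inputs=gr.Textbox(lines=2, label="Enter your prompt"),
        outputs=[gr.Image(label=label) for label in OUTPUT_LABELS],
        title="Image Generation using Stable Diffusion",
        description="Enter a prompt to generate an image using Stable Diffusion",
    )
```

Calling `build_demo(generate_pics).launch()` would then start the app in the same way as the code above.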
- Generating Images - Initialize the Canny, OpenPose and depth detectors from the controlnet_aux library. The pose and depth detectors need an argument: the pretrained model repository (lllyasviel/ControlNet). If no argument is provided, an error is thrown. The Canny detector does not require an argument, because Canny edge detection is a classical algorithm with no learned weights.
canny_detector = CannyDetector()
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
midas_detector = MidasDetector.from_pretrained("lllyasviel/ControlNet")
Generating Images - The image-generating function, generate_pics, does the following:
- Creates an image based on the prompt and stores it in image. This calls the Stable Diffusion pipeline with the given prompt.
    image = pipe(prompt).images[0]
- Uses the created image to generate three derived images (Canny, pose, depth)
- Returns all four images, which are displayed on the output screen
    canny_image = canny_detector(image)
    pose_image = pose_detector(image)
    depth_image = midas_detector(image)
    return image, canny_image, pose_image, depth_image
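The flow of the function (one generated image, three derived images, four return values) can be checked without downloading any model by passing in stand-in callables. This is a sketch for illustration only: `make_generate_pics` is a hypothetical factory name, and the fake pipeline and detectors below return strings instead of images.

```python
from types import SimpleNamespace


def make_generate_pics(pipe, canny_detector, pose_detector, midas_detector):
    """Return a generate_pics function wired to the given pipeline and detectors."""
    def generate_pics(prompt):
        image = pipe(prompt).images[0]          # step 1: prompt -> image
        return (image,
                canny_detector(image),          # step 2: derived images
                pose_detector(image),
                midas_detector(image))
    return generate_pics


# Stand-ins (not real models): the fake pipeline mimics the `.images` attribute.
def fake_pipe(prompt):
    return SimpleNamespace(images=[f"image<{prompt}>"])


fake_generate = make_generate_pics(
    fake_pipe,
    lambda img: f"canny({img})",
    lambda img: f"pose({img})",
    lambda img: f"depth({img})",
)

print(fake_generate("a red bicycle"))
# ('image<a red bicycle>', 'canny(image<a red bicycle>)',
#  'pose(image<a red bicycle>)', 'depth(image<a red bicycle>)')
```

In the real app the same factory could be given `pipe`, `canny_detector`, `pose_detector` and `midas_detector` from the code above.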
Hugging Face App link
https://huggingface.co/spaces/Velayutham/diffusiondemo
References:
Using Stable Diffusion with Python - Andrew Zhu
Hands-On Generative AI with Transformers and Diffusion Models
Written by Velayutham
