Creating an App that uses Diffusion Model to Generate Images

Velayutham

Generative Modelling

Generative modelling is about teaching a model to generate new data that resembles its training data. Numerous open-access models that have been trained on large datasets are available for use. These pre-trained models can generate new data without being trained again.
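To make the idea concrete, here is a toy sketch of "learn from training data, then generate new data". It fits a simple Gaussian to a handful of numbers and samples from it. The data, function names, and the choice of a Gaussian are assumptions made for illustration; real image models are far more complex.

```python
import random
import statistics

def fit_gaussian(samples):
    """'Train' the model: estimate mean and standard deviation from the data."""
    return statistics.mean(samples), statistics.stdev(samples)

def generate(mean, std, n, seed=0):
    """Generate n new values from the fitted distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]

# Hypothetical training data: heights in centimetres.
training_data = [162.0, 170.5, 168.2, 175.1, 159.8, 181.3, 166.7, 172.4]
mean, std = fit_gaussian(training_data)
new_data = generate(mean, std, n=5)
print(new_data)  # five newly sampled values that resemble the training data
```

The generated values are not copies of the training data, but they follow the same overall distribution, which is the core idea behind generative models.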

Diffusion

Diffusion is a type of generative model used to create images and audio.

Stable Diffusion is a diffusion model capable of generating images.
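As a rough illustration of what "diffusion" refers to: during training, noise is added to clean data step by step, and the model learns to reverse that process. The sketch below shows only the noising step, assuming a DDPM-style linear noise schedule. The schedule values, function names, and the tiny list of "pixels" are assumptions for illustration and are not taken from Stable Diffusion's code.

```python
import math
import random

def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta) up to step t for a linear beta schedule."""
    prod = 1.0
    for i in range(t):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(x0, t, rng):
    """Forward diffusion: blend the clean signal with Gaussian noise at step t."""
    a = alpha_bar(t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
pixels = [0.9, 0.1, -0.5, 0.3]  # hypothetical normalized pixel values
slightly_noisy = add_noise(pixels, t=10, rng=rng)    # still close to the original
mostly_noise = add_noise(pixels, t=1000, rng=rng)    # almost pure noise
```

Generation runs this in the opposite direction: it starts from noise and removes a little of it at each step, guided by the prompt.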

Hugging Face is one of the repositories that host open-source models.

Creating an App that uses Diffusion Model to Generate Images

The goal is to create an app that does the following:

  1. Get a prompt from the user

  2. Generate an image based on the prompt

  3. Derive the following ControlNet conditioning images from the generated image

    Canny
    Depth
    Openpose

This will be implemented using Python, the Stable Diffusion model, and Gradio for the user interface.

Pre-Requisites

This is being implemented on macOS; on Windows or Linux, different commands have to be used.

  1. Check if Python is installed
python3 --version
  2. Install PyTorch, torchvision and torchaudio (PyTorch is required because it does the background processing that converts the prompt to an image; the models are also implemented in PyTorch)
pip install torch torchvision torchaudio
  3. Install diffusers and transformers (the code imports StableDiffusionPipeline from diffusers, and the pipeline's text encoder is loaded through transformers)
pip install diffusers transformers
  4. Install Gradio (this will help to create a user interface)
pip install gradio
  5. Install controlnet_aux (this provides the detectors that derive canny, depth and openpose images from the generated image)
pip install controlnet_aux
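After installing, it can help to confirm that everything is visible to the Python interpreter that will run the app. The following is a minimal sketch using only the standard library. The package list is taken from the imports and install steps in this article, and the helper name is hypothetical.

```python
import importlib.util

# Import names used by the app, plus the extra PyTorch packages from the install step.
REQUIRED = ["torch", "torchvision", "torchaudio", "diffusers", "gradio", "controlnet_aux"]

def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If a package is reported as missing even though it was installed, pip and python3 are probably pointing at different environments.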

Code


import torch
from diffusers import StableDiffusionPipeline
from controlnet_aux import CannyDetector, OpenposeDetector, MidasDetector
import gradio as gr

# Load the pre-trained Stable Diffusion v1.5 pipeline in half precision.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16)

# Prefer the Apple GPU (MPS), then an NVIDIA GPU (CUDA), then fall back to the CPU.
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

pipe.to(device)

# Detectors that turn an image into ControlNet conditioning images.
canny_detector = CannyDetector()
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
midas_detector = MidasDetector.from_pretrained("lllyasviel/ControlNet")


def generate_pics(prompt):
    # Generate one image from the text prompt.
    image = pipe(prompt).images[0]

    # Derive the edge, pose and depth images from the generated image.
    canny_image = canny_detector(image)
    pose_image = pose_detector(image)
    depth_image = midas_detector(image)

    return image, canny_image, pose_image, depth_image


# One text input, four image outputs (same order as the function's return values).
gr.Interface(
    fn=generate_pics,
    inputs=gr.Textbox(lines=2, label="Enter your prompt"),
    outputs=[
        gr.Image(label="Generated Image"),
        gr.Image(label="Canny"),
        gr.Image(label="OpenPose"),
        gr.Image(label="Depth"),
    ],
    title="Image Generation using Stable Diffusion",
    description="Enter a prompt to generate an image using Stable Diffusion"
).launch()

Explanation for the Code

  1. Import Required Libraries
import torch
from diffusers import StableDiffusionPipeline
from controlnet_aux import CannyDetector, OpenposeDetector, MidasDetector
import gradio as gr
  2. Load the image generation pipeline using the Hugging Face diffusers library

StableDiffusionPipeline contains the models and logic needed for image generation. This part of the code loads the stable-diffusion-v1-5 model. The argument torch_dtype=torch.float16 sets the precision in which the model is loaded. float16 uses half the memory of float32 and is usually faster on a GPU.

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16)

When this part alone is executed in a Jupyter notebook, this is the result:

  3. Define the hardware on which the model has to run. This is being run on a Mac, so it will use MPS (Metal Performance Shaders), which lets PyTorch run image generation faster on Apple's GPU. On Windows, CUDA has to be installed. In Google Colab, the runtime settings have to be changed to use a GPU.
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

pipe.to(device)
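The code above always loads the model in float16, which is intended for GPU backends. On a CPU-only machine, half precision may be slow or unsupported for some operations. A minimal sketch of choosing the precision from the device is shown below. The helper names are hypothetical, and the selection logic is written as plain functions so it can be tried without installing PyTorch.

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Same priority as the app: Apple GPU, then NVIDIA GPU, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"

def pick_dtype_name(device: str) -> str:
    """Half precision on GPU backends, full precision on CPU."""
    return "float32" if device == "cpu" else "float16"

# Usage with the real libraries (commented out so the sketch runs without them):
# import torch
# from diffusers import StableDiffusionPipeline
# device = pick_device(torch.backends.mps.is_available(), torch.cuda.is_available())
# pipe = StableDiffusionPipeline.from_pretrained(
#     "runwayml/stable-diffusion-v1-5",
#     torch_dtype=getattr(torch, pick_dtype_name(device)))
# pipe.to(device)
```

Note that this variant picks the device before loading the pipeline, so that the precision can depend on it.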
  4. Creating the User Interface

The user provides the input (the prompt) and 4 output images are generated.

The Gradio part of the code calls the function generate_pics with the user's prompt. The function generates the images and the output is displayed.

gr.Interface(
    fn=generate_pics,
    inputs=gr.Textbox(lines=2, label="Enter your prompt"),
    outputs=[
        gr.Image(label="Generated Image"),
        gr.Image(label="Canny"),
        gr.Image(label="OpenPose"),
        gr.Image(label="Depth"),
    ],
    title="Image Generation using Stable Diffusion",
    description="Enter a prompt to generate an image using Stable Diffusion"
).launch()
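The interface passes whatever the user types straight to generate_pics, including an empty textbox. One possible guard is sketched below. The function name and the 300-character limit are assumptions chosen for illustration and are not part of the original app.

```python
def clean_prompt(prompt: str, max_chars: int = 300) -> str:
    """Collapse whitespace and reject prompts that are empty or too long."""
    cleaned = " ".join(prompt.split())
    if not cleaned:
        raise ValueError("Please enter a prompt.")
    if len(cleaned) > max_chars:
        raise ValueError(f"Prompt is longer than {max_chars} characters.")
    return cleaned
```

It could be called as the first line of generate_pics, for example prompt = clean_prompt(prompt). In a Gradio app, raising gr.Error instead of ValueError should show the message to the user in the interface.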
  5. Generating Images - Initialize the canny, openpose and depth detectors from the controlnet_aux library. The pose and depth detectors need an argument: the repository that holds the pretrained models (lllyasviel/ControlNet). If no argument is provided, an error is thrown. The canny detector does not require an argument.
canny_detector = CannyDetector()
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
midas_detector = MidasDetector.from_pretrained("lllyasviel/ControlNet")
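To give a feel for what an edge image such as the canny output contains, here is a heavily simplified sketch in plain Python. It is not the Canny algorithm and not what controlnet_aux runs internally. It only marks pixels where brightness changes sharply, and the function name, threshold, and tiny test image are assumptions.

```python
def edge_map(gray, threshold=50):
    """Mark a pixel 255 when brightness changes sharply to its right or below."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            right = gray[y][x + 1] if x + 1 < width else gray[y][x]
            below = gray[y + 1][x] if y + 1 < height else gray[y][x]
            gradient = abs(right - gray[y][x]) + abs(below - gray[y][x])
            if gradient >= threshold:
                edges[y][x] = 255
    return edges

# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [[10, 10, 200, 200] for _ in range(4)]
edges = edge_map(image)
print(edges)  # each row is [0, 255, 0, 0]: a vertical edge between the two halves
```

The pose and depth detectors work differently, since they rely on pretrained neural networks, which is why they need a model repository while the canny detector does not.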
  6. Generating Images - The image-generating function does the following:

    1. Creates an image based on the prompt and stores it in image. It calls the Stable Diffusion pipeline with the given prompt and takes the first generated image.

        image = pipe(prompt).images[0]

    2. Uses the created image to generate 3 different types of image (canny, pose, depth).

    3. Returns all 4 images.

    4. Gradio displays the 4 images on the output screen.

           canny_image = canny_detector(image)
           pose_image = pose_detector(image)
           depth_image = midas_detector(image)

           return image, canny_image, pose_image, depth_image
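Because the real function needs the model weights to be downloaded, it can be useful to check the data flow separately. The sketch below rebuilds the same function with its dependencies passed in as parameters, so it can be exercised with stand-in callables. The factory name and the stand-ins are hypothetical and are not part of the original app.

```python
from types import SimpleNamespace

def make_generate_pics(pipe, canny, pose, depth):
    """Build generate_pics from any callables with the same shape as the real ones."""
    def generate_pics(prompt):
        image = pipe(prompt).images[0]
        return image, canny(image), pose(image), depth(image)
    return generate_pics

# Stand-ins that return labelled strings instead of images.
fake_pipe = lambda prompt: SimpleNamespace(images=[f"image({prompt})"])
fake_generate = make_generate_pics(
    fake_pipe,
    canny=lambda img: f"canny[{img}]",
    pose=lambda img: f"pose[{img}]",
    depth=lambda img: f"depth[{img}]",
)
result = fake_generate("a red bicycle")
print(result)
```

With the real objects, the call would be make_generate_pics(pipe, canny_detector, pose_detector, midas_detector), and the returned function can be passed to gr.Interface as fn.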
      

https://huggingface.co/spaces/Velayutham/diffusiondemo

