Generative Modelling

Generative modelling is about teaching a model to generate new data that resembles its training data. Many open-access models, pre-trained on large datasets, are available, and they can be used directly to generate new data.
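As a toy illustration of the idea (not how image models are built), the sketch below "trains" on a handful of numbers by estimating their mean and standard deviation, then generates new values from the fitted distribution. The data and the function names are made up for illustration.

```python
import random
import statistics

def fit_gaussian(samples):
    """'Training': estimate the mean and standard deviation of the data."""
    return statistics.mean(samples), statistics.stdev(samples)

def generate(mean, std, count, seed=0):
    """'Generation': draw new values from the fitted distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(count)]

# Hypothetical training data: heights in centimetres.
training_data = [158.0, 162.5, 171.0, 165.2, 169.8, 174.3, 160.1]
mean, std = fit_gaussian(training_data)
new_samples = generate(mean, std, count=5)
print(new_samples)
```

The generated values are new, but they follow the same overall pattern as the training data, which is the core idea behind generative models of any size.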

Diffusion

Diffusion models are a type of generative model used to create images and audio.
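Diffusion models are commonly trained by gradually adding noise to data and learning to reverse that process. The sketch below shows only the noising half on a tiny list of numbers, assuming a linear noise schedule as in the DDPM formulation. The schedule values and function names are illustrative assumptions, not Stable Diffusion's actual configuration.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Noisy sample x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alpha_bars[t]
    return [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
            for v in x0]

alpha_bars = make_alpha_bars()
rng = random.Random(0)
pixels = [0.9, -0.3, 0.5, 0.1]  # stand-in for image pixel values
slightly_noisy = add_noise(pixels, 10, alpha_bars, rng)
mostly_noise = add_noise(pixels, 999, alpha_bars, rng)
print(slightly_noisy)
print(mostly_noise)
```

At an early step the values stay close to the original; at the last step almost nothing of the original remains. Generation runs in the opposite direction, starting from noise and removing it step by step.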

Stable Diffusion is a diffusion model capable of generating images.

Hugging Face is one of the repositories of open-source models.

Creating an App that uses Diffusion Model to Generate Images

The app does the following:

1. Gets a prompt from the user
2. Generates an image based on the prompt
3. Derives variants of the generated image for the ControlNet conditions below

Canny
Depth
Openpose

This will be implemented using Python, the Stable Diffusion model, and Gradio for the user interface.

Pre-Requisites

This is implemented on macOS; on Windows or Linux, different commands have to be used.

Check if Python is installed

python3 --version

Install PyTorch, torchvision, and torchaudio (PyTorch does the background work of converting the prompt to an image; the models are also implemented in PyTorch)

pip install torch torchvision torchaudio

Install Gradio (this helps create a user interface)

pip install gradio

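Install controlnet_aux (this provides the detectors that derive depth, Canny, and OpenPose images from the generated image)

pip install controlnet_aux

Install diffusers and transformers (the code below imports StableDiffusionPipeline from diffusers, which relies on transformers for the text encoder)

pip install diffusers transformers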
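Before running the app, it can help to confirm that the packages are visible to the Python interpreter you are using. A minimal sketch using only the standard library; the module list mirrors the imports in this walkthrough, and `find_missing` is a hypothetical helper name.

```python
import importlib.util
import sys

# Import names of the packages used in this walkthrough.
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio",
                    "diffusers", "gradio", "controlnet_aux"]

def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

print("Python version:", sys.version.split()[0])
missing = find_missing(REQUIRED_MODULES)
print("Missing packages:", missing if missing else "none")
```

Any name reported as missing can be installed with pip before moving on.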

Code


import torch
from diffusers import StableDiffusionPipeline
from controlnet_aux import CannyDetector, OpenposeDetector, MidasDetector
import gradio as gr

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16)

if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

pipe.to(device)

canny_detector = CannyDetector()
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
midas_detector = MidasDetector.from_pretrained("lllyasviel/ControlNet")


def generate_pics(prompt):
    image = pipe(prompt).images[0]

    canny_image = canny_detector(image)
    pose_image = pose_detector(image)
    depth_image = midas_detector(image)

    return image, canny_image, pose_image, depth_image


gr.Interface(
    fn=generate_pics,
    inputs=gr.Textbox(lines=2, label="Enter your prompt"),
    outputs=[
        gr.Image(label="Generated Image"),
        gr.Image(label="Canny"),
        gr.Image(label="OpenPose"),
        gr.Image(label="Depth"),
    ],
    title="Image Generation using Stable Diffusion",
    description="Enter a prompt to generate an image using Stable Diffusion"
).launch()

Explanation for the Code

Import Required Libraries

import torch
from diffusers import StableDiffusionPipeline
from controlnet_aux import CannyDetector, OpenposeDetector, MidasDetector
import gradio as gr

Load the image generation pipeline using the Hugging Face diffusers library

StableDiffusionPipeline contains the models and logic needed for image generation. This part of the code loads the stable-diffusion-v1-5 model. torch_dtype=torch.float16 sets the precision with which the model is loaded; float16 uses half the memory of float32 and is typically faster on a GPU.

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16)

[Screenshot: result of executing the pipeline-loading code alone in a Jupyter notebook]
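Half precision is mainly useful on a GPU, so the precision choice can be tied to the device. A minimal sketch, assuming you want to fall back to float32 on CPU; `choose_dtype_name` and `load_pipeline` are hypothetical helper names, while `from_pretrained` and `torch_dtype` are the same calls used above.

```python
def choose_dtype_name(device: str) -> str:
    """Pick half precision for GPU backends, full precision otherwise."""
    return "float16" if device in ("cuda", "mps") else "float32"

def load_pipeline(device: str, model_id: str = "runwayml/stable-diffusion-v1-5"):
    """Load the pipeline with a dtype suited to the device (downloads weights)."""
    # Imported lazily so the helper above works without these installed.
    import torch
    from diffusers import StableDiffusionPipeline

    dtype = getattr(torch, choose_dtype_name(device))  # e.g. torch.float16
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    return pipe.to(device)

print(choose_dtype_name("cpu"))
print(choose_dtype_name("mps"))
```

Calling `load_pipeline(device)` would replace the separate load and `pipe.to(device)` steps, at the cost of having to pick the device first.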

Define the hardware on which the model has to run. This is being run on a Mac, so it will use MPS (Metal Performance Shaders), which allows PyTorch to run image generation faster using Apple's GPU. On Windows, CUDA has to be installed. On Google Colab, the runtime settings have to be changed to use a GPU.

if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

pipe.to(device)
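On machines with limited memory, diffusers pipelines offer `enable_attention_slicing()`, which trades some speed for lower memory use. A hedged sketch: `apply_memory_tweaks` is a hypothetical helper, and enabling slicing only on `mps` and `cpu` is an assumption rather than a requirement.

```python
def apply_memory_tweaks(pipe, device: str) -> bool:
    """Enable attention slicing on memory-constrained backends.

    Returns True when slicing was enabled.
    """
    if device in ("mps", "cpu"):
        pipe.enable_attention_slicing()  # method available on diffusers pipelines
        return True
    return False

# Stand-in object so the helper can be tried without loading a model.
class FakePipe:
    def __init__(self):
        self.sliced = False

    def enable_attention_slicing(self):
        self.sliced = True

fake = FakePipe()
print(apply_memory_tweaks(fake, "mps"), fake.sliced)
```

In the app, the call would be `apply_memory_tweaks(pipe, device)` right after `pipe.to(device)`.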

Creating the User Interface

The user provides an input (the prompt), and 4 output images are generated.

The Gradio part of the code calls the function generate_pics with the user's prompt. The function generates the images, and the output is displayed.

gr.Interface(
    fn=generate_pics,
    inputs=gr.Textbox(lines=2, label="Enter your prompt"),
    outputs=[
        gr.Image(label="Generated Image"),
        gr.Image(label="Canny"),
        gr.Image(label="OpenPose"),
        gr.Image(label="Depth"),
    ],
    title="Image Generation using Stable Diffusion",
    description="Enter a prompt to generate an image using Stable Diffusion"
).launch()
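Gradio passes the textbox value to the function as its argument and maps the returned values, in order, onto the output components, so the function must return one value per output. A sketch of that contract, using a placeholder function instead of the real model; `OUTPUT_LABELS`, `fake_generate`, and `build_demo` are illustrative names.

```python
OUTPUT_LABELS = ["Generated Image", "Canny", "OpenPose", "Depth"]

def fake_generate(prompt: str):
    """Placeholder for generate_pics: returns one value per output."""
    return tuple(f"{label} for '{prompt}'" for label in OUTPUT_LABELS)

def build_demo(fn, labels=OUTPUT_LABELS):
    """Build the interface with one gr.Image per label (needs gradio)."""
    import gradio as gr  # imported lazily; only needed when building the UI

    return gr.Interface(
        fn=fn,
        inputs=gr.Textbox(lines=2, label="Enter your prompt"),
        outputs=[gr.Image(label=label) for label in labels],
    )

results = fake_generate("a red bicycle")
print(len(results) == len(OUTPUT_LABELS))
```

In the real app, `build_demo(generate_pics).launch()` would be called with the actual function, since the placeholder returns strings rather than images.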

Generating Images - Initialize the Canny, OpenPose, and depth detectors from the controlnet_aux library. The pose and depth detectors need an argument: the pre-trained model repository (lllyasviel/ControlNet). If no argument is provided, they throw an error. Canny does not require an argument.

canny_detector = CannyDetector()
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
midas_detector = MidasDetector.from_pretrained("lllyasviel/ControlNet")
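To give a feel for what an edge map such as the Canny output contains, the sketch below marks pixels where brightness changes sharply in a tiny grayscale grid. It is a simplified gradient threshold, not the full Canny algorithm that `CannyDetector` applies, and the grid and threshold are made-up values.

```python
def edge_map(gray, threshold=50):
    """Mark pixels whose brightness differs sharply from the right/lower neighbour."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # At the image border, compare the pixel with itself (no gradient).
            right = gray[y][x + 1] if x + 1 < width else gray[y][x]
            below = gray[y + 1][x] if y + 1 < height else gray[y][x]
            gradient = abs(right - gray[y][x]) + abs(below - gray[y][x])
            edges[y][x] = 255 if gradient > threshold else 0
    return edges

# Tiny image: dark left half, bright right half.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
for row in edge_map(image):
    print(row)
```

Only the column where dark meets bright is marked, which is the kind of outline a ControlNet model can use as a structural guide.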

Generating Images - The image generating function does the following
1. Calls the Stable Diffusion pipeline with the given prompt and stores the generated image in image.
```
    image = pipe(prompt).images[0]
```
2. Uses the generated image to create 3 different types of image (canny, pose, depth).
3. Returns all 4 images.
4. The 4 images are displayed on the output screen.
```
    canny_image = canny_detector(image)
    pose_image = pose_detector(image)
    depth_image = midas_detector(image)

    return image, canny_image, pose_image, depth_image
```
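Because the function only chains callables, its data flow can be exercised without a GPU by passing in stand-ins. The sketch below restructures the same steps with the pipeline and detectors as parameters; `make_generate_pics` and the fake helpers are hypothetical and exist only to show the flow.

```python
from types import SimpleNamespace

def make_generate_pics(pipe, canny, pose, depth):
    """Return a function that runs the pipeline, then each detector on its output."""
    def generate_pics(prompt):
        image = pipe(prompt).images[0]  # first image from the pipeline result
        return image, canny(image), pose(image), depth(image)
    return generate_pics

# Stand-ins that mimic the call shapes used above, without loading any model.
def fake_pipe(prompt):
    return SimpleNamespace(images=[f"image({prompt})"])

def fake_detector(name):
    return lambda image: f"{name}({image})"

generate = make_generate_pics(
    fake_pipe, fake_detector("canny"), fake_detector("pose"), fake_detector("depth")
)
print(generate("a cat"))
```

With the real objects, `make_generate_pics(pipe, canny_detector, pose_detector, midas_detector)` would behave like the function in the app.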

Hugging Face App link

https://huggingface.co/spaces/Velayutham/diffusiondemo

References:

Using Stable Diffusion with Python - Andrew Zhu
Hands-On Generative AI with Transformers and Diffusion Models

Velayutham