Holistically-nested edge detection (HED) is a deep learning model that uses fully convolutional neural networks and deeply-supervised nets to perform image-to-image prediction. HED automatically learns rich hierarchical representations (guided by deep supervision on its side outputs) that are important for resolving ambiguity in edge and object boundary detection.
In this tutorial we will look at how to load a pre-trained HED detector from Hugging Face and use it, alongside a pretrained Stable Diffusion model and a ControlNet for this specific task. Let's get going: the tutorial takes us through building a class to perform this modelling, loading in an image, and then generating new images conditioned on our image and prompt pair.
Building our class
We will build a class that combines HED detection with our ControlNet, giving us a way to alter an image, based on the image's boundaries (edges), with a generation from a pretrained Stable Diffusion model.
Importing the required packages
Firstly, as with all projects, we need to import the relevant packages:
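A minimal set of imports for this sketch might look like the following; I am assuming the `controlnet_aux` package for the HED detector and `diffusers` for the ControlNet and Stable Diffusion classes:

```python
import torch
from PIL import Image

# controlnet_aux provides the pretrained HED edge detector
from controlnet_aux import HEDdetector

# diffusers provides the ControlNet model, the Stable Diffusion
# ControlNet pipeline and the noise scheduler
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
```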
We call our class `ScribbleControlNet` and then define our `__init__()` method to take in the parameters: `image_path` (the path to where the image is stored, i.e. the source image to use); `hed_detector` (the pretrained HED detector model; we will use the ControlNet implementation of this); `pt_controlnet` (here we use a pretrained Stable Diffusion scribble ControlNet); and `pt_stablediffusion` (the Stable Diffusion model to use).
We set instance variables for the block so they can be referenced by our class methods later on in the code: `self.image_path`, `self.hed` and `self.controlnet`.
For the HED detector, `ControlNetModel` and Stable Diffusion models, we use the `from_pretrained` method to load the pretrained weights. Importantly, we also cast our PyTorch tensors to a lower (half) precision; this speeds up computation over the various weight tensors, but we lose a little precision from this.
We then load a noise scheduler for the diffusion process, enable `xformers` so that memory-efficient attention is used in the transformer architecture, and finally we allow model offloading onto the CPU with `enable_model_cpu_offload()`. These steps help speed up the computation of the model.
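Putting these steps together, a minimal sketch of the class and its `__init__()` might look like the following. The default model identifiers (`lllyasviel/Annotators`, `lllyasviel/sd-controlnet-scribble` and `runwayml/stable-diffusion-v1-5`) are my assumptions for sensible pretrained checkpoints, not necessarily the exact ones used here:

```python
class ScribbleControlNet:
    def __init__(
        self,
        image_path,
        hed_detector="lllyasviel/Annotators",  # assumed HED checkpoint
        pt_controlnet="lllyasviel/sd-controlnet-scribble",  # assumed scribble ControlNet
        pt_stablediffusion="runwayml/stable-diffusion-v1-5",  # assumed SD checkpoint
    ):
        self.image_path = image_path
        # Load the pretrained HED detector (ControlNet auxiliary implementation)
        self.hed = HEDdetector.from_pretrained(hed_detector)
        # Load the scribble ControlNet in half precision to speed up computation
        self.controlnet = ControlNetModel.from_pretrained(
            pt_controlnet, torch_dtype=torch.float16
        )
        # Wrap the Stable Diffusion model and the ControlNet in one pipeline
        self.pipe = StableDiffusionControlNetPipeline.from_pretrained(
            pt_stablediffusion,
            controlnet=self.controlnet,
            torch_dtype=torch.float16,
        )
        # Swap in a faster noise scheduler for the diffusion process
        self.pipe.scheduler = UniPCMultistepScheduler.from_config(
            self.pipe.scheduler.config
        )
        # Memory-efficient attention plus CPU offloading to cut memory usage
        self.pipe.enable_xformers_memory_efficient_attention()
        self.pipe.enable_model_cpu_offload()
```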
Next, we use an `assert` statement to check that the image path passed into the class upon instantiation (creating a new instance of the class) is of a `str` (string) data type. If this passes, the image is then opened using `Image.open()` from the PIL (Pillow) image manipulation library.
The next step, once the image is loaded, is to run it through the pretrained HED model for edge and boundary detection. Here, it will be using my LinkedIn profile picture to do some wacky antics with later on.
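Continuing the `__init__()`, those two steps might look like this; the attribute names `self.image` and `self.scribble` and the `scribble=True` flag are my assumptions:

```python
        # Check the image path is a string before trying to open it
        assert isinstance(self.image_path, str), "image_path must be a string"
        # Open the source image with Pillow
        self.image = Image.open(self.image_path)
        # Run HED edge/boundary detection to produce a scribble map
        self.scribble = self.hed(self.image, scribble=True)
```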
Next we define a `generate_scribble` method. We set the instance variables `self.prompt`, `self.save_path` and `self.num_steps` from the parameters passed into the method's signature, e.g.:

```python
def generate_scribble(self, prompt, save_path=None, num_inf_steps=20):
```
We then use the `pipe` instance variable we declared in the `__init__` block to generate an image conditioned on the prompt and image pairing. To get a more realistic photo, increase the inference steps for better-quality outputs; however, there is a performance cost to doing this, as it will generally be slower and require more memory (both GPU and CPU). We select the first generation and return it by indexing with `images[0]`.
The last steps are to check whether something was passed into the `save_path` parameter, as this defaults to `None`; if there is, the generation is saved to the path you desire. Finally, we return the image to be used downstream.
One additional thing we do is use a dunder method to customise what the `print()` command shows, so when we print an instance of the class we get a custom statement saying "Image loaded from {whatever image path you pass to the class}".
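That dunder method is `__str__`, which `print()` calls on the instance. A short sketch, followed by hypothetical usage of the class (the file names and prompt are placeholders):

```python
    def __str__(self):
        # Custom message shown when the instance is printed
        return f"Image loaded from {self.image_path}"


# Hypothetical usage
scribbler = ScribbleControlNet("profile_photo.jpg")
print(scribbler)  # -> Image loaded from profile_photo.jpg
result = scribbler.generate_scribble(
    "a cyberpunk portrait, highly detailed",
    save_path="output.png",
    num_inf_steps=30,
)
```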
Image generation has come a long way since the days of the first Generative Adversarial Networks (Goodfellow et al.) and it is still developing at a groundbreaking pace. It truly is the era of generative AI in image and text (OpenAI GPT-3.5 and 4), and what a time to be alive, as someone who loves creating models of all shapes and sizes.