Blending in Stable Diffusion

A sustainable building in a park, Model: sd_xl_base_1.0 + Refiner

This article uses "blending" as an umbrella term for several Stable Diffusion prompting techniques that combine keywords, subjects, artists, celebrities, or styles to create complex and nuanced images. By assigning different weights to keywords, using parentheses and brackets for emphasis, and iterating through prompt variations, you can fine-tune your prompts to generate highly customized and detailed images. The sections below explore some aspects of these techniques, with a focus on syntax and keyword weighting.

Simple blending

The simplest form of blending is to join two words with a colon:

A portrait of a woman:cat wearing a hat

Example for the prompt 'oil painting, woman:cat' in mdjrny-v4

Keyword weighting

One way to control a scene is to assign different weights to its elements or keywords. The weight controls how much influence each keyword has on the final image. For example, `a dog:2, a cat:0.5` gives the cat less influence than the dog, so the cat appears less prominently. This technique lets you fine-tune the balance between elements.

The order of the elements also matters: keywords that appear earlier in the prompt have a higher priority. Without any weights, the dog is already more important than the cat, because the cat comes second. Increasing the weight of the dog may also generate more dogs. Keyword weighting can be applied to any element of the prompt, even lighting. If your scene must contain a particular element, start the prompt with that keyword.

You can add weight with parentheses, a colon, or a double colon. The double colon also acts as a hard break that splits the prompt into separate parts.

Example hallucinations:

A sustainable green city with a (spectacular sky:1.5) showcasing (breathtaking hues of sunrise:2), greenery, and eco-friendly architecture (by Zaha Hadid:4), HDR, UHD
A sustainable green city with a spectacular sky:1.5 ...
A sustainable green city with a spectacular sky::1.5 ...

A sustainable green city, with a spectacular sky:1.5 showcasing breathtaking hues of sunrise:2, greenery, and eco-friendly architecture by Zaha Hadid:4, HDR, UHD
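To make the parenthesized form concrete, here is a minimal Python sketch that reads `(text:weight)` groups out of a prompt and gives every other fragment a default weight of 1.0. The function name `parse_weights` and the default weight are assumptions made for illustration; this is not the parser of any particular Stable Diffusion front end.

```python
import re

# Matches "(some text:1.5)". The text part may not contain parentheses or colons.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_weights(prompt, default=1.0):
    """Split a prompt into (text, weight) pairs (illustrative helper)."""
    parts = []
    pos = 0
    for match in WEIGHTED.finditer(prompt):
        # Unweighted text before this group gets the default weight.
        plain = prompt[pos:match.start()].strip(" ,")
        if plain:
            parts.append((plain, default))
        parts.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, default))
    return parts

pairs = parse_weights(
    "A green city with a (spectacular sky:1.5), (by Zaha Hadid:4), HDR"
)
for text, weight in pairs:
    print(f"{weight:>4}  {text}")
```

Listing the pairs this way makes it easy to spot a keyword whose weight dominates the rest of the prompt.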

Syntax and Emphasis

Using the right syntax is crucial for effective prompting. Numerical weights are one way to set emphasis; parentheses `()` and square brackets `[]` are another. For instance, `(keyword1)` increases the emphasis on a keyword, while `[keyword1]` decreases it. Nesting the parentheses, as in `((keyword1))`, increases the emphasis further. This flexibility helps you highlight or diminish specific aspects of the prompt and gives you more control over the image generation process.

Example hallucination:

A sustainable green city with a (spectacular sky) showcasing ((breathtaking hues of sunrise)), greenery, and eco-friendly architecture (((by Zaha Hadid))), HDR, UHD
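The effect of nesting can be sketched as a multiplier. The Python example below assumes the convention used by the AUTOMATIC1111 web UI, where each pair of parentheses multiplies the attention by 1.1 and each pair of square brackets divides it by 1.1. The factor 1.1 and the helper name `emphasis_multiplier` are assumptions here; other front ends may use different values.

```python
def emphasis_multiplier(token, step=1.1):
    """Strip wrapping () and [] from a single group and return (text, multiplier).

    Assumes `token` is one fully wrapped group such as "((sky))" or "[fog]",
    not a longer prompt that merely starts and ends with brackets.
    """
    depth = 0
    while len(token) >= 2:
        if token[0] == "(" and token[-1] == ")":
            depth += 1   # one more level of emphasis
        elif token[0] == "[" and token[-1] == "]":
            depth -= 1   # one more level of de-emphasis
        else:
            break
        token = token[1:-1]
    return token, step ** depth

for group in ["(spectacular sky)", "((breathtaking hues of sunrise))",
              "(((by Zaha Hadid)))", "[keyword1]"]:
    text, mult = emphasis_multiplier(group)
    print(f"{text}: {mult:.3f}")
```

Under this assumption, three levels of parentheses correspond to a weight of roughly 1.33, which is far milder than an explicit weight such as `:4`.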

Examples of imaginary collaborations of two architects:

A sustainable building in a park with a (spectacular sky:1.5), center, showcasing breathtaking hues of (sunset:2), greenery, and eco-friendly architecture by Toyo Ito:4, by Zaha Hadid:4, HDR, UHD

A sustainable green building in a park with a (spectacular sky:1.5), center, showcasing breathtaking hues of (sunset:2), greenery, and eco-friendly architecture by Toyo Ito:4, by Zaha Hadid:4, HDR, UHD, Model: sd_xl_base_1.0, Refiner: sd_xl_refiner_1.0

Blending faces

Consistent face or 'stereotyping'

If you want a consistent face [1], you can use multiple celebrity names with weights before the prompt:

(Emma Watson:0.5), (Scarlett Johansson:0.9), (Angelina Jolie:1.2), photo of young woman, perfect eyes, highlight hair,...
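When you experiment with several name and weight combinations, it can help to build the prefix programmatically. The following Python sketch assembles a prompt in the same `(name:weight)` form as the example above. The helper `blend_names` is hypothetical and only formats text.

```python
def blend_names(weights, subject):
    """Format {name: weight} as "(name:weight)" groups placed before the subject."""
    groups = [f"({name}:{weight})" for name, weight in weights.items()]
    return ", ".join(groups + [subject])

prompt = blend_names(
    {"Emma Watson": 0.5, "Scarlett Johansson": 0.9, "Angelina Jolie": 1.2},
    "photo of young woman",
)
print(prompt)
```

Changing a single weight in the dictionary and regenerating with the same seed shows how much each name contributes to the blended face.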

There are more methods for consistent faces, such as the ReActor extension, Dreambooth, LoRA, and the ControlNet IP-Adapter face model. If you need to create a consistent character from different viewing angles, you can follow this tutorial.

When working with faces, using either GFPGAN or CodeFormer can improve the quality of the outcome.

(Emma Watson:0.5), (Scarlett Johansson:0.9), (Angelina Jolie:1.2), photo of young woman, Model: sd_xl_base_1.0, Refiner: sd_xl_refiner_1.0

(Robert Downey Jr:0.5), (Benedict Cumberbatch:0.5), (Brad Pitt:1.2), Model: sd_xl_base_1.0, Refiner: sd_xl_refiner_1.0

Keyword blending

To make a transition between two subjects, you can use keyword blending to mix two keywords, a technique also known as prompt scheduling. The syntax is `[keyword1 : keyword2 : factor]`. The factor, a number between 0 and 1, controls at which point in the denoising process keyword1 is switched to keyword2. With a factor of 0.5, the first half of the steps uses keyword1 and the second half uses keyword2:

[Robert Downey Jr : Brad Pitt : 0.5]
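The switch point for `[keyword1 : keyword2 : factor]` can be illustrated with a short Python sketch that lists which keyword is active at each denoising step. The rounding rule (`int(factor * steps)`) and the function name `scheduled_keywords` are assumptions for illustration; an actual front end may round differently.

```python
def scheduled_keywords(first, second, factor, steps):
    """Return the keyword that is active at each of `steps` denoising steps."""
    if not 0 <= factor <= 1:
        raise ValueError("factor must be between 0 and 1")
    switch_at = int(factor * steps)  # number of steps that use `first`
    return [first if step < switch_at else second for step in range(steps)]

schedule = scheduled_keywords("Robert Downey Jr", "Brad Pitt", 0.5, 20)
for step, keyword in enumerate(schedule, start=1):
    print(step, keyword)
```

Because the early steps determine the overall composition and the later steps add detail, a low factor tends to favor the second keyword's features, while a high factor keeps the image closer to the first.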

Mixing Artist Styles

Building a prompt with artist names can refine the style of your image, and you are not limited to two artists: use as many as you want. Dead artists can collaborate in the AI world. Try different artists and style references at the same time and notice how they combine.

A portrait of a girl wearing a hat and sunglasses::4 coquelicot color::5 art by leonardo da vinci::2 banksy::1 jackson pollock::1, Model: sd_xl_base_1.0 — A portrait of a girl wearing a hat and sunglasses::4, art by leonardo da vinci::2 banksy::1 jackson pollock::1, Model: sd_xl_base_1.0

A portrait of a girl wearing a hat and sunglasses, black dress::4 coquelicot color::5 art by salvador dali::1 leonardo da vinci::1 roy lichtenstein::2, mdjrny-v4 — A portrait of a girl wearing a hat and sunglasses, by salvador dali::1 leonardo da vinci::1 roy lichtenstein::2, mdjrny-v4

A portrait of a girl wearing a hat and sunglasses::4 coquelicot color::5 art by banksy::1 jackson pollock::1 leonardo da vinci::2, mdjrny-v4 — A portrait of a girl wearing a hat and sunglasses::4, art by banksy::1 jackson pollock::1 leonardo da vinci::2, mdjrny-v4
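To compare how the double-colon weights in these prompts relate to each other, the following Python sketch splits a prompt into `text::weight` segments and computes each segment's share of the total. Whether and how a given front end interprets `::` varies, so this only analyzes the text; the helper names and the normalization into shares are assumptions for illustration.

```python
import re

# Lazily capture text up to "::<number>", then skip an optional comma and spaces.
SEGMENT = re.compile(r"(.+?)::(\d+(?:\.\d+)?)\s*,?\s*")

def split_segments(prompt):
    """Return (text, weight) pairs; trailing text without a weight is ignored."""
    return [(m.group(1).strip(), float(m.group(2)))
            for m in SEGMENT.finditer(prompt)]

def relative_shares(segments):
    """Express each weight as a fraction of the sum of all weights."""
    total = sum(weight for _, weight in segments)
    return {text: weight / total for text, weight in segments}

segments = split_segments(
    "A portrait of a girl wearing a hat and sunglasses::4 coquelicot color::5 "
    "art by leonardo da vinci::2 banksy::1 jackson pollock::1"
)
shares = relative_shares(segments)
for text, share in shares.items():
    print(f"{share:.0%}  {text}")
```

Seen this way, the color keyword carries more relative weight than any single artist, which is worth keeping in mind when one style seems to disappear from the result.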

Blending models

The SD checkpoint merger is a function that lets users refine AI images by merging up to three models, including custom-trained ones. When merging, adjust the multiplier (M) to set the models' relative weights; a multiplier of 0.5 merges two models equally. Stability AI does not officially support third-party interfaces for model merging, but AUTOMATIC1111 is a popular choice. Merging different SD models will be a great next experiment.
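The role of the multiplier can be sketched as a weighted sum over matching parameters. The Python example below uses toy dictionaries of lists in place of real checkpoints and assumes the interpolation `(1 - M) * A + M * B`, which is consistent with M = 0.5 giving an equal merge. The function name `weighted_sum` and the toy parameter names are hypothetical; real checkpoints hold tensors and are merged with the tooling of your interface.

```python
def weighted_sum(model_a, model_b, m):
    """Interpolate two toy models: result = (1 - m) * A + m * B per parameter."""
    if not 0 <= m <= 1:
        raise ValueError("m must be between 0 and 1")
    if model_a.keys() != model_b.keys():
        raise ValueError("both models need the same parameter names")
    return {
        name: [(1 - m) * a + m * b for a, b in zip(model_a[name], model_b[name])]
        for name in model_a
    }

# Toy stand-ins for two checkpoints (hypothetical parameter names and values).
model_a = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
model_b = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}

merged = weighted_sum(model_a, model_b, 0.5)
print(merged)
```

With M = 0 the result equals the first model and with M = 1 it equals the second, so values in between slide gradually from one model's behavior toward the other's.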

Early Conclusions

The Iterative Process

Generating images through keyword blending is often an iterative process. Users may need to adjust their prompts based on the initial outputs to better align with their vision. By refining the balance and interaction between different elements in the prompt, users can progressively enhance the quality and relevance of the generated images. Iteration helps in achieving the desired outcome, making prompt engineering a crucial part of the process.
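Iterating over weights by hand gets tedious quickly, so one option is to generate the prompt variations up front. The Python sketch below fills a prompt template with every combination of candidate weights. The template, the placeholder names, and the helper `prompt_variations` are assumptions for illustration; each generated prompt would still need to be rendered and judged by eye.

```python
from itertools import product

def prompt_variations(template, options):
    """Yield the template filled with every combination of the given values."""
    keys = list(options)
    for values in product(*(options[key] for key in keys)):
        yield template.format(**dict(zip(keys, values)))

variants = list(prompt_variations(
    "A sustainable green city with a (spectacular sky:{sky}), "
    "eco-friendly architecture (by Zaha Hadid:{artist}), HDR, UHD",
    {"sky": [1.0, 1.5], "artist": [1.2, 1.5, 2.0]},
))
for variant in variants:
    print(variant)
```

Rendering all variants with a fixed seed makes the comparison fair, because only the weights differ between the images.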

Blending keywords in Stable Diffusion prompts opens up a world of creative possibilities. By understanding and applying techniques such as keyword weighting, emphasis adjustment, and iterative prompt engineering, users can create intricate and personalized images. Whether you're looking to blend different artistic styles, combine various subjects, or explore new scenarios, mastering keyword blending will help you unlock the full potential of Stable Diffusion. Dive in and start experimenting with your prompts to see what unique and captivating images you can create.