Diffusion models have emerged as a powerful tool for creative expression. These models, such as Midjourney, DALL-E, and SDXL, harness the principles of diffusion processes to generate stunningly realistic images based on textual prompts. However, the key to unlocking their full potential lies in the art of prompt design/engineering. In this article, we'll explore design techniques, references and experiments with AI-generated images, and provide examples of tailored prompts for different creative domains. By the end, you'll gain an advanced understanding of prompt design, enabling you to unlock the immense potential of synthetic image generation.
Understanding Prompts
At its core, prompt design involves crafting concise and specific textual inputs that guide Diffusion models in generating desired outputs. These prompts provide the AI model with essential information to produce visually compelling results. By carefully crafting prompts, creators can influence the style, content, and overall aesthetic of the generated images. Prompt engineering takes prompt design a step further by fine-tuning the textual inputs to achieve specific objectives through modifiers, weights and parameters. This process requires an understanding of the capabilities and limitations of the underlying model as well as creative intuition. Through iterative experimentation and refinement, prompts can be optimized to produce high-quality and visually coherent outputs.
According to the Stable Diffusion Prompt Book, Stable Diffusion was trained on images from the LAION-5B dataset and was developed by CompVis, Stability AI, and RunwayML. The book helps you get started quickly, teaches the essential building blocks, and touches on the techniques needed to master Stable Diffusion. This article summarizes the book and complements it with other helpful information I came across in my experiments.
The secret to generating good images has two parts: first, a well-written prompt consisting of modifiers and a good sentence structure, and second, well-adjusted parameters. You can use the defaults, but fine-tuned parameters can sometimes generate much better results. "Prompt engineering is the process of structuring words that can be interpreted and understood by a text-to-image model. Think of it as the language you need to speak in order to tell an AI model what to draw."[1]
The recommendation is to start by asking questions:
1. What medium do you want? a photo, painting, sketch, illustration, ...
2. What’s the subject?
3. What details do you want?
- Special Lighting. Soft, ambient, ring light, neon
- Environment. Indoor, outdoor, underwater, in space
- Color Scheme. Vibrant, dark, pastel, neon
- Point of view. Front, Overhead, Side, Aerial view, Isometric view
- Background. Solid color, nebula, forest, landscape
4. In a specific art style? 3D render, Low Poly, Pixar Renders
5. A specific photo type? Macro, telephoto, Polaroid
"This is not an all-inclusive list, but will help you get great results when you start your prompt crafting journey. Don’t be afraid to experiment the more you try different prompts the better you will become."[1]
With the answers to these questions, you can write a complete sentence:
"A painting of a cute goldendoodle wearing a suit, natural light, in the sky, with bright colors, by Studio Ghibli."[1]
The earlier a word is in the sentence, the more importance it will be given. "The order and presentation of our desired output is almost as an important aspect as the vocabulary itself. It is recommended to list your concepts explicitly and separately than trying to cramp it into one simple sentence."[1]
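As a minimal sketch of this step, the answers to the questions can be assembled into the example sentence in code. The function name `build_prompt` and its parameters are hypothetical and not part of any tool mentioned in this article. The medium and subject are placed first, in line with the note on word order.

```python
def build_prompt(medium, subject, details=(), style=None):
    """Join the answers to the questions into one prompt sentence."""
    parts = [f"A {medium} of {subject}"]   # medium and subject come first
    parts.extend(details)                  # lighting, environment, colors, ...
    if style:
        parts.append(f"by {style}")        # artist or studio reference goes last
    return ", ".join(parts) + "."


prompt = build_prompt(
    medium="painting",
    subject="a cute goldendoodle wearing a suit",
    details=["natural light", "in the sky", "with bright colors"],
    style="Studio Ghibli",
)
print(prompt)
```

Changing a single argument, for example the medium, yields a new prompt while everything else stays fixed, which makes comparisons easier.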
Midjourney recommends being clear about any context or details and thinking about:
Subject: person, animal, character, location, object
Medium: photo, painting, illustration, sculpture, doodle, tapestry
Environment: indoors, outdoors, on the moon, underwater, in the city
Lighting: soft, ambient, overcast, neon, studio lights
Color: vibrant, muted, bright, monochromatic, colorful, black and white, pastel
Mood: sedate, calm, raucous, energetic
Composition: portrait, headshot, closeup, birds-eye view
Prompt structure to get started:
Medium+shot type - artist/reference style - subject - description - environment - colors - light - camera - mood
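The structure above can be treated as an ordered set of slots. The following sketch is only an illustration: the class `PromptParts` and its field names are assumptions made for this example. It keeps the slots in the listed order and skips the ones left empty.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptParts:
    # Field order mirrors the structure above; all names are illustrative.
    medium_and_shot: str = ""
    style_reference: str = ""
    subject: str = ""
    description: str = ""
    environment: str = ""
    colors: str = ""
    light: str = ""
    camera: str = ""
    mood: str = ""

    def render(self) -> str:
        """Join the non-empty slots with commas, in declaration order."""
        values = (getattr(self, f.name).strip() for f in fields(self))
        return ", ".join(v for v in values if v)


parts = PromptParts(
    medium_and_shot="Close-up photo",
    subject="grey cat",
    environment="in the park",
    light="soft light",
    camera="50mm",
    mood="calm",
)
print(parts.render())
```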
After experimenting with these tips and different Diffusion models, you will notice that the beginning and the end of a prompt carry more weight, that the middle part is sometimes ignored, and that adjectives may apply to the whole image rather than only to the subject. You will also realize the importance of modifiers, weights and parameters for generating great outcomes. Look up each model and version to learn about its special modifiers or magic words.
Modifiers
In the Stable Diffusion Prompt Book, modifiers "are words that can change the style, format, or perspective of the image. There are certain magic words or phrases that are proven to boost the quality of the image." There are many types of modifiers. The prompt book provides a list, to which I have added some tips of my own.
Photography
- Shot type: Close-up, Extreme Close-up, POV, Medium shot, Long shot
- Style: polaroid, Monochrome, Long exposure, Color splash, Tilt-shift
- Lighting: Soft, Ambient, Ring, Sun, Cinematic, Spotlight, Rim lighting, Sunlight, Backlight, Studio lighting, Volumetric, Crepuscular rays, dimly lit
- Context: Indoor, Outdoor, At night, In the park, Studio
- Lens: Wide-angle (16mm to 24mm), Telephoto, 24mm, EF 70mm, Bokeh, 100mm, 35mm, 50mm, 800mm
- Device: iPhone X, CCTV, Nikon Z FX, Canon, Gopro, Hasselblad, Leica
Tip: Increase the weight of the keyword if you don’t see the effect.
Example prompts from the Stable Diffusion Prompt Book with SDXL Base without refiner:
- POV, Long exposure, Grey cat, Ring lighting, At night, 24mm, Nikon Z FX
- Medium shot, Color splash, Bunny, Sun, In the park, EF 70mm, Canon
- Long-shot, Tilt-shift (is the rotation of the lens plane relative to the image plane, called tilt, and movement of the lens parallel to the image plane, called shift), Ferrari, Cinematic, studio, Bokeh, Gopro
Photography Styles
- Polaroid: Still photo of a child sitting in the middle of a wide empty city street, his back to the camera, symmetrical, polaroid photography, highly detailed, crisp quality
- Tilt-Shift: Photo of construction site, workers, tilt shift effect, bokeh, Nikon
- Product Shot: Product shot of Nike shoes, with soft vibrant colors, 3D blender render, modular constructivism, blue background, physically based rendering, centered
- Long Exposure: An aerial view of a city at night, long exposure, Instagram contest
- Portrait: Portrait photo of a stormtrooper with his beautiful wife on his wedding day
- Color-Splash: Color splash wide photo of red phone booth in the middle of an empty street, detailed, mist, soft vignette
- Monochrome: Photo of a staircase in an abandoned building, symmetrical, monochrome photography, highly detailed, crisp quality and light reflections, 100mm lens
- Satellite: Google Earth satellite image of New York City, detailed buildings and streets
Cameras
- GoPro: Monkey swimming, GoPro footage
- CCTV: Darth Vader at a convenience store, pushing shopping cart, CCTV still, high-angle security camera feed
- Drone: Drone photo of Tokyo, city center
- Thermal: Thermal camera footage from a helicopter, war scene
- Hasselblad 500C or CM: by Man Ray, waves of the ocean, clouds in the sky, smooth texture, cinematic, spotlight, Hasselblad 500CM
- Leica M3: by Man Ray, empty beach in front of the ocean, smooth texture, spotlight lighting, shoot by leica (Leica M3, Leica 50mm)
Lenses
- Telephoto: Alligator emerging from water, telephoto lens
- Fish-eye: Night club, people dancing, Fish-eye lens
- 800mm: Photo of a hummingbird, 800mm lens
- Macro: Photo of a ladybug-bee hybrid standing on a tulip, macro lens
Lighting
- Nostalgic: Fallout concept art school interior render grim, nostalgic lighting, Unreal Engine 5
- Purple Neon: Fallout concept art school interior render grim, realistic purple neon lighting, Unreal Engine 5
- Sun Rays: Fallout concept art school interior render grim, sun rays coming through window, Unreal Engine 5
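The three lighting prompts above share the same base and differ only in the lighting modifier. A small sketch of that pattern follows. The helper `lighting_variants` is a hypothetical name, and the strings are taken from the examples above.

```python
BASE = "Fallout concept art school interior render grim"
SUFFIX = "Unreal Engine 5"
LIGHTING = [
    "nostalgic lighting",
    "realistic purple neon lighting",
    "sun rays coming through window",
]


def lighting_variants(base, lighting_options, suffix):
    """Return one prompt per lighting modifier, keeping everything else fixed."""
    return [f"{base}, {light}, {suffix}" for light in lighting_options]


variants = lighting_variants(BASE, LIGHTING, SUFFIX)
for prompt in variants:
    print(prompt)
```

Generating the variants with a fixed seed makes it easier to attribute differences in the images to the modifier alone.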
Art Mediums
- Chalk: A sidewalk chalk painting of a beautiful landscape
- Graffiti: Wall graffiti art of astronaut holding a super soaker
- Airbrush painting: Airbrush painting of a tiger
- Water Colors: Watercolor painting of sunset behind mountains, detailed, vaporwave aesthetic
- Oil Painting: Oil painting of human Rick Sanchez, contest winner
- Clay: Clay model of a city, studio lighting
- Fabric: Crochet doll of Spiderman, studio lighting
- Pencil Drawing: Pencil painting of the throne from Game of Thrones
- Wood: Modern spiral-shaped table design, made of wood, studio lighting
- Movie still: Movie still of a beautiful man
- Tattoo art: Tattoo art of a beautiful flower
- Pixel art: Pixel art of a cat
Artists
- Portrait: Derek Gores, Miles Aldridge, Jean-Baptiste Carpeaux, Anne-Louis Girodet
- Landscape: Alejandro Burdisio, Jacques-Laurent Agasse, Andreas Achenbach, Cuno Amiet
- Horror: H.R. Giger, Tim Burton, Andy Fairhurst, Zdzislaw Beksinski
- Anime: Makoto Shinkai, Katsuhiro Otomo, Masashi Kishimoto, Kentaro Miura
- Sci-fi: Chesley Bonestell, Karel Thole, Jim Burns, Enki Bilal
- Photography: Ansel Adams, Ray Eames, Peter Kemp, Ruth Bernhard, Man Ray
- Concept artists (video game): Emerson Tung, Shaddy Safadi, Kentaro Miura
Portrait Artists: Using an artist known for doing portraits can be helpful in creating a specific style. Some artists' styles have a very profound effect, while others have only a subtle one.
Landscape Artists: When making a landscape, it's smart to specify the time of day (morning, noon, or night) and to set the season.
Horror Artists: Horror artists are known for creating chilling images, but they can be used to make pleasing images when mixed with other artists.
Anime Artists: It’s important when using anime artists to keep in mind the style they focus on and what time period they are from.
Sci-fi Artists: These tend to have very distinctive styles. Remember that you can not only use traditional art mediums but also artists from films.
Photography Artists: You can use noted photographers in your prompts. Try to use landscape or portrait photographers depending on what you are focusing on.
Concept Artists (Video Games): When it comes to concept artists, some will make scenes better while others will make better character designs.
Advanced Technique - Mixing Artist Styles: Building a prompt with several artists can refine your image further. You are not limited to two artists; use as many as you want. Experiment and notice the subtle ways that the artist styles combine.
You can also apply art styles, such as:
- Academicism painting
- Pop-art
- Surrealism painting
- Art deco illustration
- Avant-garde painting
- Classicism painting
- Op Art
External Links to Artist Reference Materials
- SDXL-artists-browser
- Google Arts and Culture
- Stable Art Artist Guide
- SD Artist Collection
- The AI Art - Modifiers Guide
Illustration
3D illustrations
Stable Diffusion can be used to create any 3D scene or object you can imagine!
- Cute panda, origami art
- Needle felted scene from the Simpsons, highly detailed, tilt shift, highly textured, action
- Isometric assets: Tiny cute isometric kitchen in a cutaway box, soft smooth lighting, soft colors, 100mm lens, 3d blender render
- Low Poly: kawaii low poly squirrel character, 3d isometric render, white background, ambient occlusion, unity engine
- Pixar Renders: 3d fluffy Lion, closeup cute and adorable, cute big circular reflective eyes, long fuzzy fur, Pixar render, unreal engine cinematic smooth, intricate detail, cinematic
- 3D Item Render: Tiny isometric Alarm Clock, soft smooth lighting, soft colors, 3d blender render, trending on polycount, modular constructivism, physically based rendering
More illustrations
- Children’s book: Elephant-turtle hybrid, in Children’s book illustration style
- Vector: Vector illustration of Living Room in Flat Style, pastel color palette
- Scientific Illustration: Anatomy of Pikachu, dissection Scientific illustration from a biology book
- Comic: Retro comic style artwork, highly detailed batman, comic book cover, symmetrical, vibrant
- Caricature: Caricature art of spiderman sitting on a bed having a nervous breakdown
- Propaganda Poster: USSR propaganda poster. Eat Oreo!
- Movie Poster: Adventurous trash can, movie poster
- Psychedelic Art: Hypnotic illustration of a deer face, hypnotic psychedelic art by Dan Mumford, pop surrealism, dark glow neon paint, mystical, Behance
- Splash Art: Splash art of an armored mage channeling arcane magicks, mana shooting from his hands, mystical energy in the air, action shot, heroic fantasy art, special effects, hd octane render
- Ukiyo-e: Peppa pig, in Ukiyo-e style
- Stickers: Die-cut sticker, Cute kawaii Goldendoodle character sticker, white background, illustration minimalism, vector, pastel colors
- Fantasy Maps: DnD map with roads, for printing, highly detailed, with many towns
- Pop up paper card: pop up paper card of a beautiful city
Character design
When creating a character, start with a broad description such as "male orc", then add more to it, such as "metallic armor". After that, build up the details while generating the images, and make sure to add artists that fit the character.
Emotions
Simple feelings modifiers can set the atmosphere of the scene!
Positive emotions
- Cosy: Cosy vintage bedroom, octane render by weta digital, exotic colorful pastel, ray traced lighting and reflections
- Romantic: Photo of a couple shopping, romantic lighting
- Joyful: Joyful photo of a husky puppy splashing water at the beach, canon eos r3
- Energetic: Energetic waves of the ocean
- Hope: Woman, filled with hope, in a beautiful dress on the beach
- Lust: Painting of a couple, filled with lust, by mike mignola
- Peaceful: A peaceful Japanese city street, dreamy, soft colors, studio ghibli style
- Satisfaction: Old man looking at the camera, filled with satisfaction, Canon EOS 5D Mark IV
Negative emotions
- Depressing: Depressing photo, futuristic park
- Loneliness: Girl sitting in window, reading a book, loneliness
- Grim: Grim painting of a lake with ducks
- Regret: Painting of a man looking at photo album, filled with regret
- Suffering: Digital painting showing the suffering of a woman, sitting on a bench in the forest, by goro fujita
- Hopelessness: Man, hopelessness, black and white, looking into the camera, sketch, intricate details
- Fear: Child running towards the camera, in fear, by atey ghailan and mike mignola
- Disgust: Photo of a child looking at his food with disgust
Aesthetics
Vibrant
- Weirdcore: Weirdcore image of a zoo
- Dreamcore: Photo of neighborhood, Dreamcore style
- Vaporwave: Vaporwave pool
Gloomy
- Liminal Space: Flooded, liminal space, underground city carpark, lighting with lensflares, photorealistic 8 k, eerie
- After Hours: After hours, stairs to the park
- Brutalism: Abandoned building, brutalism architecture, flowers growing
- Post-Apocalyptic: Photo in a Post-Apocalyptic town, with houses and cars
Historic
- Baroque: Painting of Danny DeVito, in Baroque cloth and style
- Sovietwave: People walking in the street, Sovietwave
- Wild West: Photo of a car driving in a town, Wild West
- Film Noir: Chandler and monica, detailed faces, Film Noir style
SDXL has 77 predefined styles you can apply:
3D Model, Abstract, Advertising, Alien, Analog Film, Anime, Architectural, Cinematic, Collage, Comic Book, Craft Clay, Cubist, Digital Art, Disco, Dreamscape, Dystopian, Enhance, Fairy Tale, Fantasy Art, Fighting Game, Film Noir, Flat Papercut, Food Photography, GTA, Gothic, Graffiti, Grunge, HDR, Horror, Hyperrealism, Impressionist, Isometric Style, Kirigami, Legend of Zelda, Line Art, Long Exposure, Lowpoly, Minecraft, Minimalist, Monochrome, Nautical, Neon Noir, Neon Punk, Origami, Paper Mache, Paper Quilling, Papercut Collage, Papercut Shadow Box, Photographic, Pixel Art, Pointillism, Pokémon, Pop Art, Psychedelic, RPG Fantasy Game, Real Estate, Renaissance, Retro Arcade, Retro Game, Silhouette, Space, Stacked Papercut, Stained Glass, Steampunk, Strategy Game, Street Fighter, Super Mario, Surrealist, Techwear Fashion, Texture, Thick Layered Papercut, Tilt-Shift, Tribal, Typography, Watercolor, Zentangle, base
Magic words
HDR, UHD, 64K: Quality words like HDR, UHD, 4K, 8k, and 64K can make a dramatic difference.
- A Landscape
- A landscape, HDR, UHD, 64K
Highly detailed: Quality words like highly detailed can make a dramatic difference.
- Joan of Arc portrayed by Jennifer Lawrence, highly detailed, concept
Studio lighting: Studio lighting could really add some nice texture to the image
- A cinematic film still of Morgan Freeman starring as 50 Cent, portrait, 40mm lens, shallow depth of field, close up, studio lighting
Professional: Adding "professional" can greatly improve the color contrast and details in the image
- Empty temple, professional photograph
Trending on artstation
- Portrait photo of a beautiful female cyborg from 1920, trending on artstation
Unreal engine
- Hyper realistic 4 d model, unreal engine
Vivid Colors: Adding "vivid colors" brings life to your images
- Photo from a city street in the 1970s, vivid colors
Bokeh: Bokeh blurs the background and highlights the subject. It’s like iPhone portrait mode.
- A cute totoro in a yard, bokeh
High resolution scan: Want a historic looking photo? Add "High resolution scan"
- Aerial view of New York City, 1930, High resolution scan
Advanced Prompting
Commas
In Stable Diffusion prompts commas aren't strictly necessary but can enhance clarity by defining distinct attributes or concepts. Proper prompt structuring, including the use of commas, can influence the AI's interpretation and the quality of the generated image. Commas improve readability and help specify a list of attributes, styles or elements.
Use commas to separate the subject, setting, additional elements and the reference style. A comma-structured prompt also allows you to define weights for individual segments. However, the importance of commas may vary depending on the model's training and its interpretation of prompt structure.
Weights
Prompt weights allow you to assign varying levels of importance to different elements of your prompt, which gives you more control over the output. In Midjourney, a prompt weight is indicated by a double colon ("::") followed by a number, such as "::2". This number specifies the influence the preceding prompt element should have on the generated image. The default weight is 1. Weights can be negative to exclude elements.
Relative scaling of weights is crucial: a weight of 0.5 compared to 2 impacts the image similarly to a weight of 1 compared to 4, because both have the same 1:4 ratio. Beyond a certain range, weight scaling has diminishing returns, so weights over 10 are usually unnecessary.
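To make the point about relative scaling concrete, here is a minimal sketch that splits a prompt written with the "::" syntax into weighted segments and converts the weights into shares of the total. The function names are hypothetical, the parser is deliberately simplified, and the rule that the weights must add up to a positive total is an assumption of this sketch.

```python
import math
import re

# A weight is a number directly after "::", followed by whitespace or the end.
_WEIGHT = re.compile(r"(-?\d*\.?\d+)(?=\s|$)(.*)", re.S)


def parse_weighted_prompt(prompt):
    """Split 'a::2 b::0.5 c' into [('a', 2.0), ('b', 0.5), ('c', 1.0)]."""
    pieces = prompt.split("::")
    segments, text = [], pieces[0]
    for piece in pieces[1:]:
        match = _WEIGHT.match(piece)
        if match:
            weight, rest = float(match.group(1)), match.group(2)
        else:
            weight, rest = 1.0, piece  # no number given: default weight 1
        segments.append((text.strip(), weight))
        text = rest
    if text.strip():
        segments.append((text.strip(), 1.0))  # trailing text without "::"
    return segments


def relative_shares(segments):
    """Express each weight as a fraction of the total weight."""
    total = sum(weight for _, weight in segments)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")  # assumption
    return [weight / total for _, weight in segments]


a = relative_shares(parse_weighted_prompt("sky::0.5 sea::2"))
b = relative_shares(parse_weighted_prompt("sky::1 sea::4"))
print(all(math.isclose(x, y) for x, y in zip(a, b)))
```

Both prompts produce the same shares, which is why only the ratio between weights matters in this simplified view.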
A comparable way to adjust keyword strength, used in Stable Diffusion interfaces such as AUTOMATIC1111, is to use prompt parentheses () and prompt brackets [].
Parentheses can increase or decrease the strength; brackets can only decrease it.
(keyword) increases the strength of the keyword by a factor of 1.1 and is the same as (keyword:1.1).
[keyword] decreases the strength by a factor of 0.9 and is the same as (keyword:0.9).
[[keyword]] is equivalent to (keyword:0.81).
[[[keyword]]] is equivalent to (keyword:0.73).
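The nesting rules above amount to repeated multiplication. The sketch below computes the effective weight of a single keyword, using the factors 1.1 and 0.9 as stated in this article. Individual interfaces may use slightly different factors, and the helper `emphasis_weight` is a hypothetical name that handles one keyword at a time, not a full prompt.

```python
import re

PAREN_FACTOR = 1.1    # factor per pair of parentheses, as stated above
BRACKET_FACTOR = 0.9  # factor per pair of brackets, as stated above

# Matches the explicit form "(keyword:1.3)".
_EXPLICIT = re.compile(r"\(([^()\[\]:]+):\s*([0-9]*\.?[0-9]+)\s*\)")


def emphasis_weight(token):
    """Return (keyword, weight) for forms like ((kw)), [[kw]] or (kw:1.3)."""
    keyword, weight = token.strip(), 1.0
    while True:
        explicit = _EXPLICIT.fullmatch(keyword)
        if explicit:
            return explicit.group(1).strip(), weight * float(explicit.group(2))
        if keyword.startswith("(") and keyword.endswith(")"):
            weight *= PAREN_FACTOR
        elif keyword.startswith("[") and keyword.endswith("]"):
            weight *= BRACKET_FACTOR
        else:
            return keyword, weight
        keyword = keyword[1:-1].strip()  # peel off one layer and repeat


for token in ["(keyword)", "[keyword]", "[[keyword]]", "[[[keyword]]]"]:
    name, weight = emphasis_weight(token)
    print(token, round(weight, 2))
```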
Keyword blending: You can mix two keywords; the proper term is prompt scheduling. The syntax is [keyword1 : keyword2 : factor]. The factor, a number between 0 and 1, controls at which step keyword1 is switched to keyword2. For more information read the article Blending in Stable Diffusion.
Negative prompts: You can use the negative prompt to tell Stable Diffusion what not to include in the image. This is especially useful when you reuse the same seed for the new generation, so that the rest of the image stays as close as possible to the previous result.
In summary, start with a clear description of the desired image or concept. Break down the prompt into key components and assign weights accordingly. Highlight essential elements with higher weights and prioritize their placement in your prompt. You can also use negative weights or negative prompts to exclude elements. Continuously experiment and adjust weights for optimal results.
In Midjourney, you can use the '--style raw' parameter. "Images made with --style raw have less automatic beautification applied, which can result in a more accurate match when prompting for specific styles."[2] It reduces interpretation and gives you more control over the generated image.
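As a rough illustration of how the factor maps to sampling steps, the sketch below returns the keyword that is active at each step. The function name is hypothetical, and the rounding rule (truncating factor times total steps) is an assumption. Real interfaces may round differently.

```python
def scheduled_keyword(step, total_steps, keyword1, keyword2, factor):
    """Return the keyword active at a given 0-based sampling step."""
    if not 0 <= factor <= 1:
        raise ValueError("factor must be between 0 and 1")
    switch_step = int(factor * total_steps)  # first step that uses keyword2
    return keyword1 if step < switch_step else keyword2


# With 30 steps and a factor of 0.5, the first half uses keyword1.
timeline = [scheduled_keyword(s, 30, "cat", "dog", 0.5) for s in range(30)]
print(timeline.count("cat"), timeline.count("dog"))
```

A small factor switches early, so keyword2 dominates the result. A factor close to 1 lets keyword1 set the overall composition before keyword2 adds late details.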
Stable Diffusion Parameters
Resolution
Choose the width and height of the generated images. Higher resolutions increase the required VRAM and the time needed to generate. It is important to know the resolution at which the model was trained: most models were trained at 512x512 pixels, while SDXL was trained at 1024x1024, and these dimensions generally provide the best quality and composition. Even with all other parameters fixed, changing the resolution will completely change the generated image, although it may keep similar colors and composition. If you want bigger images, use an upscaler.
Classifier Free Guidance (CFG) – default is 7
You can see this parameter as a “Creativity vs. Prompt” scale. Lower numbers give the AI more freedom to be creative, while higher numbers force it to stick to the prompt.
Prompt: a red bird drinking water from a lake, children's book painting
- CFG 0: Completely ignores the prompt
- CFG 4: Missing the red color
- CFG 7: Good balance
- CFG 15: Too high, starts creating artifacts
This parameter does not affect the VRAM needed, or the generation time.
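Classifier-free guidance is commonly written as a blend of two noise predictions, one made without the prompt and one made with it. The toy sketch below uses plain Python lists in place of image tensors to show why a scale of 0 ignores the prompt and why higher values push the result further toward it. It describes the general formula under that assumption, not the internals of any specific interface, and some tools may treat very low values differently.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Blend unconditional and prompt-conditioned noise predictions."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy prediction without the prompt
cond = [1.0, 3.0]    # toy prediction with the prompt

print(apply_cfg(uncond, cond, 0))  # equals uncond: the prompt has no effect
print(apply_cfg(uncond, cond, 1))  # equals cond: plain prompt conditioning
print(apply_cfg(uncond, cond, 7))  # exaggerates the prompt's direction
```

Very high scales amplify the difference between the two predictions so strongly that artifacts become likely, which matches the CFG 15 observation above.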
Step count
This parameter does not affect the VRAM needed, but generation time increases in direct proportion to it. Stable Diffusion creates an image by starting with a canvas full of noise and denoising it gradually to reach the final output; this parameter controls the number of denoising steps. Higher is usually better, but only up to a point. Beginners are advised to stick with the default or a lower count, such as 30.
Seed – default is “random”
Seed is a number that controls the initial noise. The seed is the reason that you get a different image each time you generate when all the parameters are fixed. By default, on most implementations of Stable Diffusion, the seed automatically changes every time you generate an image. You can get the same result back if you keep the prompt, the seed and all other parameters the same.
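The effect of the seed can be illustrated with Python's standard random number generator standing in for the image-sized noise tensor. This is a toy sketch: `initial_noise` is a hypothetical helper, and real implementations use their own generators.

```python
import random


def initial_noise(seed, size=4):
    """Draw a small list of Gaussian noise values from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(1234)
same_b = initial_noise(1234)
other = initial_noise(4321)

print(same_a == same_b)  # same seed, same starting noise
print(same_a == other)   # different seed, different starting noise
```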
Sampler
Diffusion samplers are the methods used to denoise the image during generation. Because they differ in how they calculate the next step, they need different amounts of time and different numbers of steps to reach a usable image. Beginners may want to start with DDIM, since it is fast and can usually generate good images in only 10 steps, which makes it easy to experiment and improve quickly. Euler-a is also fast and worth a try.
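The parameters in this section can be collected in one place so that a generation is easy to repeat. The sketch below is an assumption-laden illustration: `GenerationSettings` is a hypothetical class, and the keyword names returned by `to_kwargs` mirror those commonly used by Hugging Face diffusers pipelines, so check them against the tool you actually use. No image is generated here.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationSettings:
    prompt: str
    negative_prompt: str = ""
    width: int = 1024            # SDXL resolution mentioned above
    height: int = 1024
    cfg_scale: float = 7.0       # default CFG mentioned above
    steps: int = 30
    seed: Optional[int] = None   # None stands for "random"

    def resolved_seed(self) -> int:
        """Pick a random seed when none was fixed, so it can be logged."""
        return self.seed if self.seed is not None else random.randrange(2**32)

    def to_kwargs(self) -> dict:
        """Map the settings to call arguments (names are an assumption)."""
        # The seed is left out: many libraries take it via a separate
        # random generator object rather than a plain keyword argument.
        return {
            "prompt": self.prompt,
            "negative_prompt": self.negative_prompt,
            "width": self.width,
            "height": self.height,
            "guidance_scale": self.cfg_scale,
            "num_inference_steps": self.steps,
        }


settings = GenerationSettings(
    prompt="a red bird drinking water from a lake, children's book painting",
    seed=42,
)
kwargs = settings.to_kwargs()
print(kwargs["guidance_scale"], kwargs["num_inference_steps"], settings.resolved_seed())
```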
Important tips
When to use what CFG value?
- CFG 2 - 6: Creative, but might not follow the prompt
- CFG 7 - 10: Recommended for most prompts. Good balance between creativity and guided generation
- CFG 10 - 15: When you’re sure that your prompt is good/specific enough.
- CFG 16 - 20: Not generally recommended unless the prompt is well detailed. Might affect coherence and quality
In prompts with multiple subjects, it’s a good idea to increase the CFG scale.
The power of seeds
Some seeds are just better, so try to save a good seed and slightly tweak the prompt to get what you’re looking for while keeping the same composition. This can also be used to test the effect of different modifiers.
Token efficiency
Your prompt is limited to 75 tokens. If you are working with a long prompt try to be efficient with words. A typical example is when using an artist as a modifier to get a particular style. Here are a few prompts and their token counts.
- A horse in the style of Vincent Van Gogh (11)
- A horse by Vincent Van Gogh (7)
- A horse by Van Gogh (6)
- Horse by Van Gogh (6)
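Choosing the shortest phrasing can be scripted once a token counter is available. The sketch below accepts any counting function. Its default, `naive_token_count`, only counts whitespace-separated words, so its numbers are lower than the token counts listed above and serve purely as a stand-in for the model's real tokenizer. All names are hypothetical.

```python
TOKEN_LIMIT = 75  # limit stated in this article


def naive_token_count(prompt):
    """Very rough stand-in: counts words, not the model's real tokens."""
    return len(prompt.split())


def shortest_within_budget(candidates, count_tokens=naive_token_count,
                           limit=TOKEN_LIMIT):
    """Return the candidate with the fewest tokens that fits the limit."""
    counted = [(count_tokens(c), c) for c in candidates]
    fitting = [pair for pair in counted if pair[0] <= limit]
    if not fitting:
        return None
    return min(fitting, key=lambda pair: pair[0])[1]


candidates = [
    "A horse in the style of Vincent Van Gogh",
    "A horse by Vincent Van Gogh",
    "A horse by Van Gogh",
    "Horse by Van Gogh",
]
print(shortest_within_budget(candidates))
```

Passing a real tokenizer-based counter as `count_tokens` would make the comparison reflect the counts the model actually sees.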
The order of words can be as important as the words themselves. This trick is especially useful when trying to make unusual creations.
- Pink ice cream truck with machine gun mounted on it, technical
- Machine gun mounted on top of a pink ice cream truck, technical
In the above example, the machine gun doesn’t appear unless you put it at the start of the prompt.
Other features: Img2Img, inpainting and outpainting
Sketch to professional art: With img2img you can turn a simple sketch into beautiful art using a text description.
Img2Img Variation: Img2Img is useful for creating a variation of an image and getting similar images. If you create an image and something isn't quite right, you can use Img2Img to remake it.
Inpainting: This technique can be used to fix a part of an image, by completely removing or changing the subject in an image, or just fixing a small detail.
Outpainting/uncropping: You can use this technique to expand real or generated images. It can be very useful, since Stable Diffusion likes to crop images.
OpenArt Showcase prompts in Stable Diffusion XL
My recommended reading list to start with Diffusion prompting
Stable Diffusion prompt: a definitive guide
The art of prompting: An introduction to Midjourney
The Art of AI Prompt Crafting: A Comprehensive Guide for Enthusiasts
MidJourney: Exploring the World of Analog Photography
Other prompt examples
20+ Incredible MidJourney Prompts That Will Blow Your Mind ( Part 1 )
How to Master Midjourney Prompts
Midjourney Cheat Sheet 2024: Copy-Paste Prompt for any Style
Best Midjourney prompts for product photography
Midjourney Prompt Weight Mastery
10 Midjourney AI Image Prompts for Futuristic Cities
Prompt Guide for Stable Diffusion XL
Top 40 useful prompts for Stable Diffusion XL
80+ Best Stable Diffusion Styles