Infopost | 2022.11.19

Last time I got Stable Diffusion's Hello World going (with some video card and Python IDE side quests). This time: prompts and parameters.

Strength

Stable diffusion Valentino Rossi Ducati Laguna Seca

The strength parameter is a 0.0-1.0 value that determines how far img2img may stray from the input image: low values stay close to the original, high values let the model (and the prompt) get creative. Here's the above photo of Vale with a descriptive prompt demonstrating strength values from 0.1-0.9:

As the machine learning model is given more freedom to redraw the scene, Vale winds up back on a Yamaha, but on a bike that doesn't look quite right. Of course, this is just one run at each strength; another strength 0.9 run might give you something entirely different (but still derived from the input image and generally following the prompt).
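For a feel of what the number does under the hood, here's a minimal sketch. As I understand the CompVis img2img sample script, strength is multiplied by the DDIM step count to decide how many denoising steps the input image goes through, so 0.1 barely touches the photo and 0.9 nearly starts over. The flag names below are the ones I believe scripts/img2img.py accepts; the prompt and the vale.jpg filename are placeholders, not what I actually ran.

```python
import shlex


def encode_steps(strength: float, ddim_steps: int = 50) -> int:
    """Denoising steps applied to the input image for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    return int(strength * ddim_steps)


def sweep_commands(prompt, init_img, strengths, ddim_steps=50):
    """Build one img2img command line per strength value."""
    commands = []
    for s in strengths:
        args = [
            "python", "scripts/img2img.py",
            "--prompt", prompt,
            "--init-img", init_img,
            "--strength", f"{s:.1f}",
            "--ddim_steps", str(ddim_steps),
        ]
        commands.append(shlex.join(args))
    return commands


# Placeholder prompt and input file, for illustration only.
strengths = [i / 10 for i in range(1, 10)]  # 0.1 .. 0.9
for s, cmd in zip(strengths, sweep_commands("motorcycle racer", "vale.jpg", strengths)):
    print(encode_steps(s), cmd)
```

The commands are only printed, not executed, so the sweep can be eyeballed before burning GPU time on it.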

Prompts

Both txt2img and img2img use textual prompts to re/create images. Lexica.art has a wealth of examples that help provide empirical guidance on prompt writing.

Artist

Stable diffusion reddit meme Will Smith Greg Rutkowski

Source.

The Andy's Blog post from last time mentioned that he saw significantly better results when supplying one or more specific artists in the prompt. Either because his name yields good results or maybe because it's a meme, a large chunk of the prompts on Lexica include Greg Rutkowski. I'd never heard of Greg Rutkowski until trying out Stable Diffusion, but sure enough he's quite prolific on Artstation, which I think was one of Stable Diffusion's sources of tagged imagery.

Format

Source.

I've read a variety of things about comma-separated tokens, natural language, and ordering by importance. The near-nondeterminism of deep learning makes it difficult to draw any conclusions on this.

The joy of AI painting using txt2img

With the basics in mind, I tried a bunch of different prompts, mostly producing 512x512 images that I've downscaled a little bit for this site. Compared to Style Transfer and Dall-e mini, the takeaway from the image dimensions is that the outputs are significantly larger than 256x256 but still need additional work to reach postcard size.

An abandoned factory

Starting simple...

ink illustration of an abandoned factory.  retrofuturistic, photorealistic.

One looks like a traditional black and white photo, the other looks like an architectural-ish drawing. Both have signs of abandonment. One looks modern, the other looks brutalist.

A yellow starfighter

Next I juiced it up a bit, specifying subject color and a background. I pulled a random artist name from Lexica and whatever cgsociety is. And I went rectangular, probably along the wrong axis (starfighters tend to be horizontal, not that it matters in space).

python scripts/txt2img.py \
   --prompt "A highly detailed painting of a futuristic starfighter.  Yellow body, glowing cockpit.  Concept art by ian mcque, cgsociety.  Sci-fi.  Stars and nebulas can be seen in the background." \
   --n_samples 8 \
   --H 640 --W 320

Some of these look like 70s sci-fi novel cover art. Cool.
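On the wrong-axis thing, here's a quick sanity check. It's a sketch that assumes the sample script's defaults as I understand them: 4 latent channels and a downsampling factor of 8, so the model works on a tensor one eighth the size of the requested image in each dimension. The function names are mine.

```python
def latent_shape(height, width, channels=4, factor=8):
    """Latent tensor shape (C, H/f, W/f) for a requested output size."""
    if height % factor or width % factor:
        raise ValueError(f"H and W should be multiples of {factor}")
    return (channels, height // factor, width // factor)


def orientation(height, width):
    """Name the aspect of an H x W image."""
    if height > width:
        return "portrait"
    if width > height:
        return "landscape"
    return "square"


print(orientation(640, 320), latent_shape(640, 320))  # portrait (4, 80, 40)
print(orientation(320, 640), latent_shape(320, 640))  # landscape (4, 40, 80)
```

So --H 640 --W 320 is the tall version; swapping the two values would have given the starfighter room to stretch out.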

CGI tools

Some of the prompts specified a rendering platform (Unreal Engine, Maya, etc.). I think rendering and postprocessing tools are among the tags on Artstation.

octane rendering of a colorful wormhole.  soft lighting, cyberpunk, 
hyper realistic.

It even has a watermark.

Vocabulary

Six months ago the PUBG crew was wowed that Dall-e knew about battle royales. Stable Diffusion does too:

Skaggy style

Likewise, SD knew about Claptrap and Borderlands art style. I used another random Artstation creator from Lexica.

unreal engine rendering of claptrap from borderlands in a field.  
grainy, solarpunk, smooth render. in the style of andrei riabovitchev.

Looking at Andrei's work, the Borderlands elements seem to have dominated the style input.

Twist of the wrist

Trying some more subject/scene interplay:

source filmmaker cgi rendering of an aprilia motorcycle with blue accents
in a postapocalyptic waterfall.  cyberpunk, photorealistic.

Stable diffusion motorcycle waterfall cyberpunk

It doesn't look like a Prilly, but SD nailed the postapocalyptic waterfall. Other than some subtle details, the bike looks really good. Another:

cross-processed photograph of a ducati in a town square.  dark, futuristic,
clean composition.

Go with what it knows

SD (as I understand it) is trained on a boatload of real images with human-applied tags. If you've seen the Silicon Valley episode where Erlich enlists Big Head's college class to tag datasets for him, you know it's unrewarding work. The neat thing about using Artstation (and others?) as a data source is that the imagery there is already tagged and ready for deep learning consumption.

So while Stable Diffusion might have to guess what a banana with a goatee looks like, it can recreate things that look much the same in every image with considerably more certainty. Likewise, if every instance of something looks very different ("draw me a vehicle"), an SD prompt might yield a mashup of those things. But a more refined command like "draw me a Jaguar XJ220" will give it more specific criteria to generate from.

Further, let's consider a Ferrari. Here's some mild hyperbole that probably dictates how well a deep learning algorithm can understand one:

There are lots of photos/drawings of them.
Exotic car photos usually have just a few canonical angles.
They come in like three colors.
They aren't typically modified.
The company has a signature style that is visible in most of their vehicles of a given generation.

So 'car' is bad, 'Bugatti' is good, 'Toyota' is okay. Your mileage may vary.

Monochrome photograph of a Lamborghini parked in the rain.  Contrasty.
Highly detailed but smooth.

The joy of AI repainting using img2img

Switching gears (lmao) to the other side of Stable Diffusion, prompts are just as important to img2img.

Drifter

I grabbed a photo from the archive and asked Stable Diffusion to recreate it in something contextually-similar: Gran Turismo. Writing a specific prompt (not sure if this is good or bad) was a lot easier since I just had to describe what's in the image.

cgi rendering in the gran turismo engine.  nissan 350z drifter on a
racetrack. tires smoking, front wheels at opposite lock.  stickers 
cover the scuffed bodywork.

It's definitely Gran Turismo-like. That last car's spoiler seems to be glitching though.

Archangel

Since Stable Diffusion knows about Borderlands, I thought I'd try a screencap of everyone's favorite Turian police sniper. Just to throw it a curveball, I asked for a plate of nachos.

detailed 35mm photograph of garrus vakarian from mass effect.  standing in
front of colorful artwork, holding a plate of nachos.

Photo -> cgi worked pretty well with the 350Z, cgi -> photo wasn't so good. 33% nacho rate though. Not bad.

AI green screen

Back to contextually-similar, recognizable things, I asked Stable Diffusion to reimagine a paintball photo as a highly specific war photograph.

world war 1 photograph of troops crossing no man's land wearing gas masks.
smoke billows in the background.  mortars explode nearby.

Faces and hands

Source.

Things frequently go awry when Stable Diffusion tries to do faces and hands, with the exception of passport-headshot-style photos that are probably well-represented in its dataset. Generated words resemble text but aren't coherent (see above where Valentino Rossi is riding an "OIHNJOD").

Photo -> CGI: good
CGI -> photo: meh
Photo -> painting: ???

I wanted to see how a photo would look as a painting.

digital painting of a surfer sitting in the lineup.  wearing a shark tooth
necklace because he's a kook.  very detailed, art by greg rutkowski.  
pastel colors.  green water, blue sky.

Stable Diffusion was pretty shy about redrawing the subject's face. Other things turned out rather incoherent. I said Greg Rutkowski, not GWB.

Iterating img2img

You guessed it, img2img outputs can be pruned and fed back into the algo with an identical or modified prompt. Here are some (not the same prompt):

A generative pipeline

Now for an end-to-end run: txt2img followed by numerous passes of img2img with refined prompts.

cgi rendering of a gray lamborghini in the rain.  contrasty, epic,
hyper realistic.

Draw me some lambos.

Oh yeah, AI can't draw wheels very well either. Let's go with the low-angle one in the rain.

Time to fire up img2img. Monochrome is nice and all, but these cars deserve bright colors so let's change the prompt a little:

Colorful cgi rendering of a bright green Lamborghini in the rain.
Contrasty. Highly detailed but smooth.

Those aren't really all that colorful and one of the photos looks like a fireplace. Hmmm. Maybe the one with the neat reflection. Let's keep the bright green and instead put the lambo in its natural environment: the garage. Just kidding, the track.

Colorful cgi rendering of a bright green Lamborghini at Suzuka Raceway
in Japan.  Contrasty.  Highly detailed but smooth.

I see some grandstands and some safety barriers. The one with the short depth of field is pretty awesome, though the car's front right wheel well may have disappeared. On the plus side, the car is truly bright green with some neat orange accents on the grille. Let's try a different color scheme and maybe see if Stable Diffusion will put a muscle car air cleaner on a Lamborghini frunk (golf clubs need high-psi ventilation too).

Colorful cgi rendering of a Lamborghini in the orange and blue Gulf Oil
racing lervery.  On track at Suzuka Raceway in Japan.  Hood as a large
air cleaner hood scoop for its supercharger.  Contrasty.  Highly detailed
but smooth.

We're still pretty green and not looking at all like a GT40. Either 'Gulf Oil livery' isn't well represented in the dataset or we left the strength value too low in trying to preserve our work so far. Stable Diffusion did get creative with a Bugatti-ish redesign.

But what if the car was not on a racetrack in Japan but actually in the Swiss Alps? Well then it'd probably need to be a Lamborghini trophy truck.

Colorful cgi rendering of a lifted Lamborghini trophy truck racing in the
snow in the Swiss Alps.  Contrasty.  Highly detailed but smooth.

We got a couple of low-poly trophy trucks, a near-redraw, and a neat version on a snowy city street. And this is how you can meander for hours without touching a single paintbrush or OpenGL API call.
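The meander above boils down to a chain of img2img passes where each pass takes a hand-picked output of the previous one. Here's a rough sketch of that chain using three of the prompts from this run. The strength values, the directory layout, and the pick.png filename are all made up for illustration (I chose images by eye, not by script), and the flags are the ones I believe scripts/img2img.py accepts.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Pass:
    prompt: str
    strength: float  # assumed values, not the ones used for the images above


PASSES = [
    Pass("Colorful cgi rendering of a bright green Lamborghini in the rain. "
         "Contrasty. Highly detailed but smooth.", 0.6),
    Pass("Colorful cgi rendering of a bright green Lamborghini at Suzuka Raceway "
         "in Japan. Contrasty. Highly detailed but smooth.", 0.6),
    Pass("Colorful cgi rendering of a lifted Lamborghini trophy truck racing in the "
         "snow in the Swiss Alps. Contrasty. Highly detailed but smooth.", 0.7),
]


def img2img_command(init_img, p, outdir):
    """Build the command line for a single img2img pass."""
    return shlex.join([
        "python", "scripts/img2img.py",
        "--prompt", p.prompt,
        "--init-img", init_img,
        "--strength", f"{p.strength:.2f}",
        "--outdir", outdir,
    ])


def plan(first_input, passes):
    """Yield one command per pass, feeding each pass the previous pick."""
    current = first_input
    for i, p in enumerate(passes, start=1):
        outdir = f"outputs/pass{i}"
        yield img2img_command(current, p, outdir)
        # Hypothetical: the human copies their favorite output to pick.png.
        current = f"{outdir}/pick.png"


for cmd in plan("outputs/txt2img/lambo.png", PASSES):
    print(cmd)
```

The manual pick between passes is the important part; the script only keeps track of which prompt and strength go with which step.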

Additional checkpoints

Stable Diffusion Nissan 350Z drifter elden ring postapocalypse

So this nitrosocke guy created a checkpoint (a saved set of trained model weights) using Elden Ring images. I envision this as being like transfer learning, wherein the final model is a generalized network (like a trained VGG-19) that has been adapted to a specific dataset.

The Elden Ring style checkpoint is cool and worth dedicating some post to, but it's even more than that. The Elden Ring outputs are really good. That is, the images are more distinct, coherent, and stylistically-accurate than stuff generated by the base model. Of course it would make sense that a specialized network is better at its job.

There's one caveat that I'll mention after some Elden Ring txt2img:

Caveat: Elden Ring characters often have armor covering their faces and hands, so the style sidesteps SD's weakness.

Trading Torrent out for a motorcycle

(Torrent is your horse in Elden Ring.) Back to transfer learning, since the Elden Ring checkpoint derives from a more generalized base model, it knows what a motorcycle is.

Source. Kind of like the Thomas mod for Skyrim.

Elden Ring-Mass Effect crossover

Armor makes the jump pretty easily from sci-fi to fantasy.

Caelid Highway

There were some interesting takes on the Z-car photo, including one that had Gran Turismo in the prompt.

Paintball with scythes

The paintball image was easy for SD to redraw in Elden Ring style.

Code

The canned examples are pretty good for one-off runs, but consistency isn't great. A successful generative process requires a lot of runs and a lot of variation. The Stable Diffusion sample scripts are small enough to modify easily, but do require some work (e.g. nested withs that have you starting lines at column 60). I made the following mods:

Parameterized repeat runs (not using batching) as well as iterating over a full directory of inputs.
Parameter randomization (strength, ddim, etc.).
A prompt generator and mutator, e.g. "photo/photograph/35mm photograph of red/green/blue lambo/ferrari/citroen on a street/track/dirt road by greg rutkowski/greg rutkowski/greg rutkowski".
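Here's a minimal sketch of the last two mods, not my actual scripts. The slot lists come from the example prompt above; the function names, the parameter ranges, and the idea of storing a prompt as a dict of slots are assumptions made for illustration.

```python
import random

# Slot values taken from the example prompt; greg rutkowski is the only artist.
SLOTS = {
    "medium": ["photo", "photograph", "35mm photograph"],
    "color": ["red", "green", "blue"],
    "subject": ["lambo", "ferrari", "citroen"],
    "place": ["street", "track", "dirt road"],
    "artist": ["greg rutkowski"],
}
TEMPLATE = "{medium} of {color} {subject} on a {place} by {artist}"


def random_choices(rng):
    """Pick one value per slot."""
    return {slot: rng.choice(values) for slot, values in SLOTS.items()}


def render(choices):
    """Turn a dict of slot choices into a prompt string."""
    return TEMPLATE.format(**choices)


def mutate(choices, rng):
    """Return a copy with exactly one slot swapped for a different value."""
    mutable = [slot for slot, values in SLOTS.items() if len(values) > 1]
    slot = rng.choice(mutable)
    alternatives = [v for v in SLOTS[slot] if v != choices[slot]]
    return {**choices, slot: rng.choice(alternatives)}


def random_params(rng):
    """Randomize run parameters; the ranges here are arbitrary guesses."""
    return {
        "strength": round(rng.uniform(0.3, 0.8), 2),
        "ddim_steps": rng.choice([30, 50, 80]),
        "seed": rng.randrange(2**32),
    }


rng = random.Random(2022)  # fixed seed so a batch can be reproduced
base = random_choices(rng)
for choices in (base, mutate(base, rng), mutate(base, rng)):
    print(render(choices), random_params(rng))
```

Keeping the prompt as slots rather than a flat string makes the mutator trivial: change one slot, re-render, and the rest of the prompt stays put so you can see what that one change did.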

