We tried Stable Diffusion 3, and… well…

Stable Diffusion 3 was introduced as a revolutionary update for the renowned image generator, promising significant improvements in image quality, generation speed, and the ability to better understand textual prompts. The initial demos circulating online were enticing, and expectations were high. However, upon testing and comparing it with Midjourney v6, a different picture emerges, with results that are, in our opinion, disappointing.

The promises included the elimination of deformed hands and faces, improved rendering of text, and a better understanding of descriptions to truly deliver what was being asked for. The goal was to address the common issues users had encountered in previous versions, such as unnatural-looking hands and facial features, and to enhance the system’s ability to accurately interpret and generate images based on detailed textual prompts.

In this article, we will compare the results of various prompts in both image generation engines.

The prompt battle begins

To test the new image generator, we wrote this prompt: “closeup of a baker making a dough with his hands, backlight”. Here is Midjourney v6's result:

As you can see, everything is in its right place. The backlight is fully achieved, and the hands are perfect. Let’s see how SD3 performs:

Stable Diffusion 3 continues to have serious issues with hands. In this case, the baker has six fingers on one hand and seven on the other. This image is totally unusable.

Let’s try something else: “grilled meat skewers and grilled greens”. Midjourney v6 rendered this:

SD3, instead, did this:

Why does the skewer in SD3 have a double stick? Who would prepare a dish like that?

The results obtained so far are certainly not great, but it’s probably also a matter of bad luck. So we switched to a different genre of images, doctors arranged in a circle, with this prompt: “Diverse team of surgeons and medical professionals in surgical masks and scrubs, standing in a circle, preparing for an operation in a hospital’s operating room”.

Midjourney v6 created this:

On the other hand, SD3 did not understand the full meaning of the prompt, generating this:

This image is terrible: the faces and eyes are not natural, and the gazes seem bewildered and distorted. It is disturbing.

The frustration is increasing, but another test is necessary. Here’s the new prompt: “Cheerful couple in bathrobes toasting champagne glasses in a hotel room”.
By now we knew which of the two engines would generate the better image, but we weren’t prepared for this.
Midjourney v6 did this:

SD3 displayed this:

The hands are deformed, and the facial expression of the man is SCARY. He also has 6 fingers. The woman should not smile at this.

Finally, one last test. This might be disturbing for some of you, so… WARNING, HORROR INCOMING.
We asked for “a woman receiving a back massage in a spa”.
Midjourney v6 gave us this excellent result:

The hand is perfect, the texture of the skin, the droplets… Excellent. Now, this is where things get messy, but REALLY messy for SD3.

I… I don’t really know what to say. The image speaks for itself.

In conclusion, we can say that SD3 is a vastly inferior product to Midjourney v6, which was released several months before it. In our opinion, it’s a huge disappointment that doesn’t represent even a slight step forward compared to Stable Diffusion XL, its predecessor.