Stable Diffusion is a text-to-image deep learning model developed by Stability AI and first released to the public in August 2022. The model generates an image from a text prompt provided by the user. Stable Diffusion was trained on images from the LAION-5B dataset. Stability AI has stated that it intends to open source all of its research. The initial public release of Stable Diffusion was made under the CreativeML OpenRAIL-M license. In July 2023, Stability AI released a major update to the model, called Stable Diffusion XL (SDXL), under the CreativeML OpenRAIL++-M license.
Stable Diffusion belongs to a class of deep learning models called diffusion models, which use mathematics similar to the concept of diffusion in physics. Diffusion models are trained to remove Gaussian noise that has been added to training images. To generate an image from the ground up, the model starts with pure noise and, through iterative refinement, progressively denoises it until it reaches a sharp, clear result that aligns with the user's prompt. At each iteration, a neural network estimates the noise present in the current image, and part of that estimate is removed.
This learned denoising differs from classical diffusion-based image filtering, which shares the name. In classical filtering, the algorithm computes a diffusion coefficient at each iteration from local image characteristics, such as gradients and edges. This coefficient determines the strength and direction of the diffusion, allowing the algorithm to adjust the smoothing effect across different regions of the image. Pixel values are redistributed based on local information: noise is reduced by diffusing pixel values in smooth regions while sharp transitions and edges are preserved. This selective smoothing helps maintain image details and prevents blurring or loss of important features.
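The noising and iterative denoising described above can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not Stable Diffusion's actual implementation: the linear schedule values, the function names, and the "oracle" noise predictor (which simply returns the true noise in place of a trained neural network) are all hypothetical choices made for illustration.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.2):
    """Linear noise schedule (illustrative values).
    Returns alpha_bar[t], the fraction of signal variance left at step t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, noise, alpha_bar_t):
    """Forward process: blend a clean image with Gaussian noise."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

def estimate_clean(x_t, predicted_noise, alpha_bar_t):
    """Invert add_noise, given an estimate of the noise in x_t."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * predicted_noise) / np.sqrt(alpha_bar_t)

def denoise(x_T, alpha_bar, predict_noise):
    """Deterministic reverse loop: walk from the noisiest step back to step 0."""
    x = x_T
    for t in range(len(alpha_bar) - 1, -1, -1):
        eps = predict_noise(x, t)                 # a trained network in a real model
        x0_hat = estimate_clean(x, eps, alpha_bar[t])
        # Re-noise the clean estimate to the next, less noisy level.
        x = x0_hat if t == 0 else add_noise(x0_hat, eps, alpha_bar[t - 1])
    return x

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))          # stand-in for a clean image
noise = rng.standard_normal((8, 8))
alpha_bar = make_schedule()
x_T = add_noise(x0, noise, alpha_bar[-1])         # almost pure noise

# Assumption: an oracle that knows the true noise replaces the learned predictor.
recovered = denoise(x_T, alpha_bar, lambda x, t: noise)
```

With a perfect noise estimate the loop recovers the original image; a real model only approximates the noise, which is why the result is a plausible new image rather than a copy of a training example.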
SDXL 1.0 is built on a new architecture composed of a 3.5B parameter base model and a refiner model, which together form a 6.6B parameter ensemble pipeline. The base model generates noisy latents, which are then processed by the refiner, a model specialized for the final denoising steps. According to Stability AI, this two-stage architecture produces robust images without compromising on speed or requiring excessive compute resources. Stability AI states that SDXL 1.0 works effectively on consumer GPUs with 8 GB of VRAM.
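The handoff between the two stages can be sketched as a split of the denoising schedule. This is a minimal illustration, assuming the base model handles the early, noisier steps and the refiner handles the remaining ones; the function names and the 0.8 handoff fraction are hypothetical and are not taken from the source.

```python
def split_steps(num_steps, handoff=0.8):
    """Split a descending timestep list into (base_steps, refiner_steps).
    `handoff` is the fraction of steps given to the base model (assumed value)."""
    if not 0.0 <= handoff <= 1.0:
        raise ValueError("handoff must be between 0 and 1")
    timesteps = list(range(num_steps - 1, -1, -1))   # noisiest step first
    cut = int(round(num_steps * handoff))
    return timesteps[:cut], timesteps[cut:]

def run_two_stage(latents, num_steps, base_step, refiner_step, handoff=0.8):
    """Run the base denoiser first, then pass its still-noisy latents to the refiner."""
    base_steps, refiner_steps = split_steps(num_steps, handoff)
    for t in base_steps:
        latents = base_step(latents, t)
    for t in refiner_steps:
        latents = refiner_step(latents, t)
    return latents

# Stand-in denoisers that only record which stage handled which step.
trace = run_two_stage(
    [],
    num_steps=10,
    base_step=lambda z, t: z + [("base", t)],
    refiner_step=lambda z, t: z + [("refiner", t)],
)
```

In this toy run the base stage handles steps 9 through 2 and the refiner handles steps 1 and 0, mirroring the idea that the refiner only sees latents that are already mostly denoised.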
As an open-source model, Stable Diffusion can be fine-tuned on a user's own data to align with the types of images that user wants to create. Stable Diffusion can be downloaded and run locally, accessed via an API, or accessed through DreamStudio, Stability AI's official web app. In addition to creating images from a prompt, SDXL can modify existing images in a number of ways:
- Inpainting: making edits inside the image
- Outpainting: extending the image beyond its original borders
- Image-to-image: using a source image, together with a prompt, to guide the model in creating a new image
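The three editing modes listed above differ mainly in which pixels the model is allowed to change and how much of the source image is kept. The NumPy sketch below illustrates that bookkeeping only; it does not call any real Stable Diffusion API. The function names, the mask convention (1 means "repaint"), and the `strength` parameter are assumptions made for illustration.

```python
import numpy as np

def inpaint_mask(height, width, box):
    """Inpainting: 1 marks pixels inside `box` that may be repainted, 0 marks pixels to keep."""
    top, left, bottom, right = box
    mask = np.zeros((height, width), dtype=np.float32)
    mask[top:bottom, left:right] = 1.0
    return mask

def outpaint_canvas(image, pad):
    """Outpainting: pad the image and mark only the new border as repaintable."""
    h, w = image.shape
    canvas = np.pad(image, ((pad, pad), (pad, pad)), mode="constant")
    mask = np.ones(canvas.shape, dtype=np.float32)
    mask[pad:pad + h, pad:pad + w] = 0.0          # keep the original pixels
    return canvas, mask

def composite(original, generated, mask):
    """Blend generated pixels into the original wherever the mask is 1."""
    return mask * generated + (1.0 - mask) * original

def img2img_steps(num_steps, strength):
    """Image-to-image: number of denoising steps to run on the noised source image.
    strength near 0 keeps the source almost unchanged; 1.0 runs the full schedule."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(round(num_steps * strength)), num_steps)

image = np.arange(16, dtype=np.float32).reshape(4, 4)   # toy grayscale image
generated = np.full((4, 4), -1.0, dtype=np.float32)     # stand-in for model output

mask = inpaint_mask(4, 4, box=(1, 1, 3, 3))
edited = composite(image, generated, mask)              # only the 2x2 centre changes

canvas, border_mask = outpaint_canvas(image, pad=2)     # 8x8 canvas, border to fill
steps = img2img_steps(num_steps=50, strength=0.5)
```

In a real pipeline the `generated` array would come from the diffusion model, conditioned on the prompt and on the unmasked pixels.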
Stable Diffusion has been the subject of controversy over how its training data was sourced and over Stability AI's failure to control how the model is used. A number of tutorials show how Stability AI's tools (including DreamStudio) can be used to create deepfakes, including by fine-tuning the base Stable Diffusion models to generate pornography. Stability AI is the subject of lawsuits from several artists and from the stock photo company Getty Images, which allege that the company used their work to train its generative AI models, including Stable Diffusion.