Stable Diffusion Download [HOT]

Yuk Walke

Jan 21, 2024, 5:10:38 PM
to beschdirfhocont

Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network. Its code and model weights have been open-sourced,[8] and it can run on most consumer hardware equipped with a modest GPU with at least 4 GB of VRAM. This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney, which were accessible only via cloud services.[9][10]

stable diffusion download

DOWNLOAD: https://t.co/W6WhHfSimq



The development of Stable Diffusion was funded and shaped by the start-up company Stability AI.[11][10][12][13] The technical license for the model was released by the CompVis group at Ludwig Maximilian University of Munich.[10] Development was led by Patrick Esser of Runway and Robin Rombach of CompVis, who were among the researchers who had earlier invented the latent diffusion model architecture used by Stable Diffusion.[7] Stability AI also credited EleutherAI and LAION (a German nonprofit which assembled the dataset on which Stable Diffusion was trained) as supporters of the project.[7]

Stable Diffusion uses a kind of diffusion model (DM) called a latent diffusion model (LDM), developed by the CompVis group at LMU Munich.[15][8] Introduced in 2015, diffusion models are trained with the objective of removing successive applications of Gaussian noise from training images, and can be thought of as a sequence of denoising autoencoders. Stable Diffusion consists of three parts: the variational autoencoder (VAE), the U-Net, and an optional text encoder.[16] The VAE encoder compresses the image from pixel space to a lower-dimensional latent space, capturing a more fundamental semantic meaning of the image.[15] Gaussian noise is iteratively applied to the compressed latent representation during forward diffusion.[16] The U-Net block, composed of a ResNet backbone, denoises the output of forward diffusion step by step to recover a latent representation. Finally, the VAE decoder generates the final image by converting the representation back into pixel space.[16]
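To make the data flow concrete, here is a minimal, structure-only sketch in Python/PyTorch. The ToyVAE and toy_unet below are made-up stand-ins for the real pretrained networks (the real U-Net predicts the noise rather than returning zeros); only the four stages and the rough tensor shapes are meant to be illustrative.

```python
import torch
import torch.nn.functional as F

class ToyVAE:
    def encode(self, image):                      # pixel space -> latent space (8x smaller)
        return F.avg_pool2d(image, kernel_size=8)
    def decode(self, latent):                     # latent space -> pixel space
        return F.interpolate(latent, scale_factor=8)

def toy_unet(latent, t, text_embedding):
    # Stand-in: the real U-Net predicts the noise present in `latent`
    # at timestep t, conditioned on the text embedding via cross-attention.
    return torch.zeros_like(latent)

vae = ToyVAE()
text_embedding = torch.randn(1, 77, 768)          # shape of CLIP ViT-L/14 token embeddings
image = torch.rand(1, 3, 512, 512)                # stand-in input image

latent = vae.encode(image)                        # 1) VAE encoder compresses to latents
for t in range(50):                               # 2) forward diffusion adds Gaussian noise
    latent = 0.99**0.5 * latent + 0.01**0.5 * torch.randn_like(latent)
for t in reversed(range(50)):                     # 3) U-Net denoises step by step
    latent = latent - 0.02 * toy_unet(latent, t, text_embedding)
out = vae.decode(latent)                          # 4) VAE decoder returns to pixel space
print(out.shape)                                  # torch.Size([1, 3, 512, 512])
```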

With 860 million parameters in the U-Net and 123 million in the text encoder, Stable Diffusion is considered relatively lightweight by 2022 standards, and unlike other diffusion models, it can run on consumer GPUs.[17]
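If you want to check those counts yourself, one quick way is to load the weights with the Hugging Face diffusers library and sum the parameters (this assumes diffusers is installed and the v1.5 weights are reachable under the commonly used model id below):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

unet_params = sum(p.numel() for p in pipe.unet.parameters())
text_params = sum(p.numel() for p in pipe.text_encoder.parameters())
print(f"U-Net:        {unet_params / 1e6:.0f}M parameters")   # ~860M
print(f"Text encoder: {text_params / 1e6:.0f}M parameters")   # ~123M
```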

Accessibility for individual developers can also be a problem. In order to customize the model for new use cases that are not covered by the dataset, such as generating anime characters ("waifu diffusion"),[33] new data and further training are required. Fine-tuned adaptations of Stable Diffusion created through additional retraining have been used for a variety of use cases, from medical imaging[34] to algorithmically generated music.[35] However, this fine-tuning process is sensitive to the quality of the new data; low-resolution images, or images whose resolution differs from the original data, can not only fail to teach the model the new task but also degrade its overall performance. Even when the model is additionally trained on high-quality images, it is difficult for individuals to run models on consumer electronics. For example, the training process for waifu-diffusion requires a minimum of 30 GB of VRAM,[36] which exceeds the usual amount provided in consumer GPUs such as Nvidia's GeForce 30 series, which has only about 12 GB.[37]

The Stable Diffusion model can generate new images from scratch through the use of a text prompt describing elements to be included or omitted from the output.[8] Existing images can also be re-drawn by the model to incorporate new elements described by a text prompt (a process known as "guided image synthesis"[42]) through its diffusion-denoising mechanism.[8] In addition, the model allows the use of prompts to partially alter existing images via inpainting and outpainting, when used with an appropriate user interface that supports such features, of which numerous open source implementations exist.[43]
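As one concrete illustration, the first two modes described above (text-to-image and guided image synthesis) look roughly like this with the Hugging Face diffusers library. This is a hedged sketch of one common implementation, not the only interface, and it assumes a CUDA GPU:

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model_id = "runwayml/stable-diffusion-v1-5"

# Text-to-image: the prompt describes what to include,
# the negative prompt what to omit.
txt2img = StableDiffusionPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16).to("cuda")
image = txt2img("a lighthouse at dawn, oil painting",
                negative_prompt="blurry, low quality").images[0]

# Guided image synthesis (img2img): re-draw an existing image;
# `strength` controls how far from the original the model may drift.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16).to("cuda")
redrawn = img2img(prompt="the same scene at night",
                  image=image, strength=0.6).images[0]
redrawn.save("redrawn.png")
```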

ControlNet[49] is a neural network architecture designed to manage diffusion models by incorporating additional conditions. It duplicates the weights of neural network blocks into a "locked" copy and a "trainable" copy. The "trainable" copy learns the desired condition, while the "locked" copy preserves the original model. This approach ensures that training with small datasets of image pairs does not compromise the integrity of production-ready diffusion models. The "zero convolution" is a 1×1 convolution with both weight and bias initialized to zero. Before training, all zero convolutions produce zero output, preventing any distortion caused by ControlNet. No layer is trained from scratch; the process is still fine-tuning, keeping the original model secure. This method enables training on small-scale or even personal devices.
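A minimal sketch of that zero-convolution idea in PyTorch (the channel count and tensor shape below are illustrative, not ControlNet's actual configuration):

```python
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution with weight and bias initialized to zero,
    # so the trainable branch contributes nothing before training starts.
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

block_output = torch.randn(1, 320, 64, 64)        # output of some locked U-Net block
print(zero_conv(320)(block_output).abs().max())   # tensor(0.) -- no distortion yet
```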

It sounds like they use this: GitHub - AUTOMATIC1111/stable-diffusion-webui: Stable Diffusion web UI
which attempts to automate all the steps; that makes it more difficult to know what actually fails.

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows them to be applied directly to image modification tasks such as inpainting without retraining. However, since these models typically operate directly in pixel space, optimizing powerful DMs often consumes hundreds of GPU days, and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows, for the first time, reaching a near-optimal point between complexity reduction and spatial downsampling, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes, and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve highly competitive performance on various tasks, including unconditional image generation, inpainting, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.
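The cross-attention conditioning mentioned in that abstract can be sketched in a few lines: queries come from the image latents and keys/values from the text embedding, so the text steers the denoising. The shapes below are an illustrative single-head version of what the real model does with multiple heads:

```python
import torch
import torch.nn.functional as F

latents = torch.randn(1, 4096, 320)   # flattened 64x64 latent features, 320 channels
text    = torch.randn(1, 77, 768)     # text-encoder token embeddings

to_q = torch.nn.Linear(320, 320, bias=False)   # queries from image latents
to_k = torch.nn.Linear(768, 320, bias=False)   # keys from text tokens
to_v = torch.nn.Linear(768, 320, bias=False)   # values from text tokens

q, k, v = to_q(latents), to_k(text), to_v(text)
attn = F.softmax(q @ k.transpose(1, 2) / 320 ** 0.5, dim=-1)   # (1, 4096, 77)
out = attn @ v                                                 # (1, 4096, 320)
print(out.shape)
```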

Stable Diffusion has 'models' or 'checkpoints' holding the weights trained on the dataset; these are often very large. The out-of-the-box v1.5 model was trained on 2.3 billion images and is around 4 GB. Most other models are trained with this model as their base; then you'll get merges of multiple models together and other things like that, but ultimately the files will all still be gigabytes in size. Dreambooth is most commonly used to create these models; it can be a destructive process each time a model is altered.
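The simplest kind of checkpoint "merge" mentioned above is just a weighted average of two checkpoints' weights. A rough sketch (the file names are hypothetical, and this assumes SD-1.x-style .ckpt files that store their weights under a "state_dict" key):

```python
import torch

a = torch.load("modelA.ckpt", map_location="cpu")["state_dict"]
b = torch.load("modelB.ckpt", map_location="cpu")["state_dict"]

alpha = 0.5   # interpolation weight between the two models
merged = {key: alpha * a[key] + (1 - alpha) * b[key]
          for key in a if key in b and a[key].shape == b[key].shape}

torch.save({"state_dict": merged}, "merged.ckpt")
```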

The embedding files (.pt) sit in 'stable-diffusion-webui\embeddings'. The great thing about this method is that they are tiny (often around 5-50 kB). To use one, you just write the filename into your prompt (minus the .pt).
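If you are curious why they are so small, you can peek inside one: a textual-inversion embedding is just a handful of vectors, not a full model. The filename below is hypothetical, and the exact keys vary between tools:

```python
import torch

data = torch.load("my-style.pt", map_location="cpu")   # hypothetical embedding file
for key, value in data.items():
    if torch.is_tensor(value):
        print(key, tuple(value.shape))   # small tensors, hence the tiny file size
    else:
        print(key, type(value).__name__)
```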

For some fun, here is the logo for this project generated with the prompt: "a logo for swift diffusion, with a carton animal and a text diffusion underneath". It seems to have trouble understanding exactly what "diffusion" is, though.

There is no comparison. swift-diffusion in its current form only supports Linux + CUDA. CPU support will come later, and at that point it can run on Mac / iOS. But to run it efficiently, some ops need to leverage the hardware, either as Metal compute kernels or via the neural engine: tinygrad/accel/ane at master · geohot/tinygrad · GitHub. DiffusionBee currently uses the MPS backend implemented in PyTorch to run on M1 efficiently. I haven't looked too deeply into how the MPS backend is implemented, but I would imagine some Metal kernels plus ANE there.

My Mac Mini is an Intel one, so this took about half an hour to finish. Your mileage may vary. Also, because my Mac Mini is Intel, it doesn't support Float16; if your machine does, you can switch to Float16 by changing this line: swift-diffusion/main.swift at main · liuliu/swift-diffusion · GitHub

Getting back to this thread. I launched an app in the App Store based on work in swift-diffusion, and plan to port the features in the app over and make swift-diffusion a complete CLI tool (the app is easier, as I only need to deal with Apple platforms, while the CLI requires CUDA as well).

Hey there! I am struggling to get Stable Diffusion to work here on NixOS. It is no problem on Garuda Linux on the same PC, so it should work in theory (at least I think so). I have had this issue for months now and I cannot find an answer for it, so I am just stuck at this point. The project I want to use is Stable Diffusion Webui by AUTOMATIC1111: GitHub - AUTOMATIC1111/stable-diffusion-webui: Stable Diffusion web UI

Just for completeness' sake in this discussion: I was trying to get things up and running with a flake adapted from the same GitHub repo mentioned at the beginning here: GitHub - virchau13/automatic1111-webui-nix: AUTOMATIC1111/stable-diffusion-webui for CUDA and ROCm on NixOS

Stable Diffusion uses a particular type of generative AI called a "diffusion model," named for the process of diffusion it uses to generate new content. Diffusion is a natural phenomenon you've likely experienced before. A good example of diffusion happens if you drop some food coloring into a glass of water: no matter where that food coloring starts, eventually it will spread throughout the entire glass and color the water uniformly. In the case of computer pixels, random motion of those pixels will always lead to "TV static," the image equivalent of food coloring uniformly coloring a glass of water. A machine-learning diffusion model works by, oddly enough, destroying its training data by successively adding "TV static," and then learning to reverse this process to generate something new. Such models are capable of generating high-quality images with fine details and realistic textures.
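That "destroying data with static" step is easy to demonstrate: blend an image toward Gaussian noise a little at a time and it converges to pure static. A toy sketch (real diffusion models use a closed-form noise schedule rather than a Python loop):

```python
import torch

beta = 0.02                           # per-step noise variance
x = torch.rand(3, 64, 64)             # stand-in "training image" with values in [0, 1]

for t in range(1000):                 # forward diffusion: image -> "TV static"
    x = (1 - beta) ** 0.5 * x + beta ** 0.5 * torch.randn_like(x)

print(round(x.mean().item(), 2), round(x.std().item(), 2))   # ~0.0, ~1.0: pure noise
```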
