Base models are versatile AI models that are capable of generating a wide range of styles, characters, objects, and other types of content. Stable Diffusion is a popular base model that has been used to train other models in different styles or to improve overall model performance. These models often have their own VAE (Variational Autoencoder) that can be used interchangeably with other models to produce slightly different outputs. Unlike Dreambooth models, base models do not require an activator prompt and can be used in a more flexible way.
The latest version of the Stable Diffusion model is available through the StabilityAI website, a paid platform that helps support the continued development of the model. You can try out Stability AI's website here.
Stable Diffusion 2.1 was released on December 8, 2022. In response to the controversial release of 2.0, Stability AI has improved upon their base model and fine-tuned it with a weaker NSFW filter applied to their dataset. This should address many of the criticisms of the previous version and result in more accurate generation of human bodies, celebrities, and other pop culture images. As this is a fine-tuned model, there are no major changes to its functionality, and the main purpose is to correct the mistakes of 2.0.
If these fixes are successful, 2.1 will be an excellent model with higher detail and quality in its outputs, as well as a stronger ability to be trained on specific themes, styles, and objects using techniques such as Dreambooth, Textual Inversion, and Hypernetworks. You can download the 2.1 Stable Diffusion model here (requires a free account).
NOTE: In order to use the 2.1 version you will need to include a .yaml file and rename it either v2-1_512-ema-pruned.yaml or v2-1_768-ema-pruned.yaml for its respective model. You will then simply add this file to the same models folder your .ckpt file is in. Without this file your model will not load.
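If you prefer to script that copy-and-rename step, here is a minimal Python sketch. It assumes the AUTOMATIC1111 folder layout and uses the v2-inference-v.yaml config from Stability AI's repository; both paths are assumptions you should adjust to your own setup.

```python
import shutil
from pathlib import Path

# Hypothetical paths -- adjust to wherever your files actually live.
models_dir = Path("stable-diffusion-webui/models/Stable-diffusion")
ckpt = models_dir / "v2-1_768-ema-pruned.ckpt"
config = Path("v2-inference-v.yaml")  # config file from Stability AI's repo

# The .yaml must sit in the same folder as the .ckpt and share its base
# name, e.g. v2-1_768-ema-pruned.ckpt -> v2-1_768-ema-pruned.yaml.
shutil.copy(config, ckpt.with_suffix(".yaml"))
print(f"Wrote {ckpt.with_suffix('.yaml')}")
```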
2.0 has been trained from scratch, meaning it has no relation to previous Stable Diffusion models, and it incorporates new technology: the OpenCLIP text encoder and the LAION-5B dataset with NSFW images filtered out. To most people's surprise, version 2.0 actually performs relatively worse in general tests of generating images, particularly with art styles, celebrities, and NSFW images. This was a conscious decision by the Stability AI team for a few reasons, and in my opinion it is related to legality issues that have arisen from the growing popularity of AI generation.
There are multiple models available with 2.0, each with a different purpose. The most interesting new model is the depth model, which is to be used with img2img: it can detect depth information within an image and manipulate the image while retaining that depth information. Depth-to-Image cannot be used with txt2img. It can be incredibly useful for editing your image without changing or adding/removing elements that aren't consistent with the original image.
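As an illustration of how Depth-to-Image can be used outside a Web UI, here is a minimal sketch using Hugging Face's diffusers library with the stabilityai/stable-diffusion-2-depth weights; the input image path and prompt are placeholders.

```python
# Minimal Depth-to-Image sketch (assumes `pip install diffusers
# transformers accelerate` and a CUDA GPU).
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from PIL import Image

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("room.png")  # hypothetical input image

# The pipeline estimates a depth map from init_image and uses it to keep
# the scene's geometry consistent while restyling the content.
result = pipe(
    prompt="a cozy cabin interior, warm lighting",
    image=init_image,
    strength=0.7,
).images[0]
result.save("room_edited.png")
```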
One big improvement is the ability to generate images at 512x512 and 768x768. This means you can generate higher-quality images natively with Stable Diffusion without the need for upscaling or using something like the "high-res fix" in the AUTOMATIC1111 WebGUI. Be sure to include the .yaml files that correspond to each model. You will need to have the .yaml file in the same models folder as the model's .ckpt file, named the same, for the model to work correctly in the WebGUI.
At the time of writing, these new models are not compatible with most UI programs, as the core mechanics of the model have changed compared to previous models. But it should only be a matter of time before UIs are updated to support this model.
One drawback of this new model is that it will not work as well for NSFW images, as Stability AI has purposefully tried to filter out NSFW imagery. This shouldn't be a major issue for most people, and for those who do want NSFW images, it will simply require others to train the model on those images for it to improve at them.
StabilityAI themselves have stated that this model is meant to be used as a base for other models to be trained on. So while the results of version 2.0 are not as amazing as people had hoped for, it opens the possibility of better Dreambooth models, fine-tunes, textual inversions, and other training methods producing greater results.
This model is basically a fork (a branching version) of the Stable Diffusion model that is based on version 1.5 of SD. It has been trained for 440k more steps than the original 1.5 to specialise in inpainting, improving the ability to remove, add, and replace objects in a scene. The current version is 1.5, but it should not be confused with the original Stable Diffusion 1.5.
If you're only looking to generate images, make sure to download the v1-5-pruned-emaonly.ckpt, as it is a smaller file, meaning it will use less VRAM. If you plan on training or fine-tuning the model, then you'll need the full v1-5-pruned.ckpt file.
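For reference, here is a hedged sketch of loading the pruned checkpoint directly with the diffusers library (from_single_file is only available in newer diffusers releases, and the file path assumes the checkpoint sits in your working directory):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the EMA-only pruned checkpoint straight from the .ckpt file.
pipe = StableDiffusionPipeline.from_single_file(
    "v1-5-pruned-emaonly.ckpt",
    torch_dtype=torch.float16,  # half precision further reduces VRAM use
).to("cuda")

image = pipe("a watercolor landscape at dusk").images[0]
image.save("landscape.png")
```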
Waifu Diffusion is a model trained on over 50,000 anime-related images and is continually being trained, with improvements released regularly. It is currently the best model for creating anime characters, but is much weaker at realistic imagery and landscapes.
Waifu Diffusion 1.4 is still in very early stages, but the repo has already been created, so if you want to keep up to date on the latest WD version, you can check this repo and download the beta models as they become available.
As of 03/11/22 there is a ckpt model available on this repository named wd-1-3-penultimate-ucg-cont.ckpt. I have not tested it thoroughly yet, but I believe it is an extension of the 1.3 model. Thanks to a comment by u/Ok-Power1447, I have tried CLIP Skip with this model, and my preliminary results do show that setting CLIP Skip to 2 improves hands on characters.
Version 1.3.5 looks to be an experimental alpha version that hakurei has released while they work on version 1.4. As with the beta 1.4 models, it requires CLIP Skip for better results from what I have seen in my own testing. I would recommend CLIP Skip 2 or 3.
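For those using the diffusers library rather than a Web UI, recent diffusers releases expose a clip_skip argument on the pipeline call; a minimal sketch follows. Note that diffusers counts layers skipped, which may be offset by one from the Web UI's CLIP Skip slider, so treat the exact value as something to experiment with.

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumes the WD checkpoint has been downloaded locally.
pipe = StableDiffusionPipeline.from_single_file(
    "wd-1-3-penultimate-ucg-cont.ckpt",
    torch_dtype=torch.float16,
).to("cuda")

# diffusers counts skipped layers, which may be offset by one from the
# Web UI's "CLIP Skip" slider (Web UI 2 is roughly diffusers 1).
image = pipe("1girl, looking at viewer, detailed hands", clip_skip=1).images[0]
image.save("wd_clip_skip_test.png")
```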
Trinart Stable Diffusion is another anime-based model. Its results are currently less cohesive compared to Waifu Diffusion, but it can still generate excellent results. It could also give you a unique art style compared to Waifu Diffusion, because it was trained on a different dataset.
Eimis's Anime model is based on highly detailed anime images and produces images at the same quality as models like NovelAI's or Anything v3. It has a different style to those models, and one I personally prefer. If you're going for the classic anime look, this may not be for you, but if you want detailed Artstation-like anime, this is perfect.
There are two Eimis models: one for a strong anime style and one for a more realistic anime style. Both produce high-quality images in an artistic style but will not be able to generate photorealistic images.
Anything 3.0 is a model similar to Waifu Diffusion, but with a more specific anime style. It has gained popularity for its ability to consistently produce high-quality artworks, some of which are on par with the closed-source NovelAI model. However, the distinct style of Anything 3.0 can be limiting, as it only allows for the generation of images in this specific art style, which can become repetitive over time.
Honey Diffusion is a realistic anime-style model with a distinctive style characterized by a more desaturated look and limited facial features. It is known for producing consistently good-quality images with less deformity than other models, making it difficult to create a poor image. The model produces good 3D/2D images, has good anatomy, and produces good reflections in glass and mirrors. It is suitable for both SFW and NSFW images, although there is limited information available on how the model was created.
Myne Factory is an anime model that sets out to differentiate itself from other models. The team behind it aims to overcome the shortcomings of other models, particularly the repetitive style and easy-to-spot look of some anime models. When using Myne Factory, it is recommended that you use Booru-style tags when prompting for best results. According to the information page, shorter prompts work better than longer ones.
Version 1.0 is based on the Waifu Diffusion 1.4 model and has many similarities with that model in terms of usage and outputs. However, there are plans for the next major version to be based on Stable Diffusion. It's important to note that you should set CLIP Skip to at least 1 and a maximum of 4 for best results. The base render resolution is higher than normal: it can render at 768x768, as well as larger landscape or portrait sizes. This model is also meant to be used as a base model, suited for further training on specific character models or styles.
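Putting those recommendations together, here is a hedged diffusers sketch rendering at the 768x768 base resolution with a CLIP Skip value in the suggested range; the checkpoint filename is hypothetical.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "myne-factory-v1-0.ckpt",  # hypothetical local filename
    torch_dtype=torch.float16,
).to("cuda")

# Render at the model's higher native base resolution, with a CLIP Skip
# value in the recommended 1-4 range.
image = pipe(
    "1girl, short hair, outdoors",  # Booru-style tags, kept short
    height=768,
    width=768,
    clip_skip=2,
).images[0]
image.save("myne_sample.png")
```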
Seek.art MEGA is a model that has been fine-tuned on Stable Diffusion v1.5 with the goal of improving the overall quality of images while maintaining the flexibility of Stable Diffusion. It was trained on 10,000 high-quality public-domain digital artworks, which is beneficial given the current copyright and legal issues facing AI generation. It is recommended to generate images at a size above 640px for optimal results.
F222 is a machine learning model based on SD 1.5 by Zeipher AI. It has been trained on a collection of NSFW (not safe for work) photography and other photographic images, which makes it particularly good at generating nude or semi-nude persons. However, it can also generate clothed individuals with fewer deformities. At the time of writing, I have not been able to find much information on this model, and the creator's website is currently offline. Overall, it seems that F222 is simply an improved version of SD 1.5 with better support for NSFW images.
A model based on Waifu Diffusion 1.2 and trained on 150k images from R34 and Gelbooru. As the name suggests, the focus is on hentai-related images and on improving hands, obscure poses, and the general consistency of the model.