Revision as of 18:27, 11 December 2024
Frequently asked questions
What are AI-generated images?
In case you missed it, a new type of AI called diffusion models can generate high-quality images. They learn concepts by training on billions of images of nearly everything. You can have the models generate images entirely on their own from a text prompt, but there are many other ways to control their output to varying degrees. For example, they can be used as render engines for CAD software.
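Under the hood, they share a key building block with large language models such as ChatGPT: the attention mechanism. Some newer diffusion models are transformers outright, while Stable Diffusion 1.5 and SDXL use a U-Net with attention layers.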
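To make the attention mechanism a little more concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is a toy under stated assumptions, not how any particular model implements it: real models compute queries, keys, and values with learned weight matrices, run many attention heads in parallel on a GPU, and in text-to-image models use cross-attention so that image features can attend to the prompt. The function names and the tiny 2-dimensional vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Each query is compared with every key; the resulting weights decide
    how much of each value vector flows into that query's output.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity between this query and each key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Hypothetical toy vectors: one query, two keys, two values.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
print(attention(queries, keys, values))
```

The single query is most similar to the first key, so its output is a blend that leans toward the first value vector. Dividing the scores by √d keeps them from growing with vector length, which would otherwise push the softmax toward all-or-nothing weights.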
What are AI-assisted images?
"AI-assisted" describes images that combine human work with AI output. Some examples:
- A drawing, photograph or painting created by a person, then run through a filter using AI
- A 3D work created by a person, where textures and lighting are applied by AI as a filter
- An AI-generated image that's then reworked or heavily modified by a person
Is generative AI theft, as some say?
No, neither metaphorically nor literally. The difference is easy to see once reality is separated from the misinformation: the models don't deprive the original creators of their works, and they don't store any portion of those works in any way. They can't be used to recreate the works they learned from unless someone deliberately and maliciously crafts them to do so. The only works they have a real chance of reproducing are short cultural cornerstones used constantly across our media, like the phrase "in God we trust" or the lyrics to "Happy Birthday".
Another popular claim that falls apart under scrutiny is that these models violate copyright by using copyrighted material without permission. In actuality, no copyright violation happens: copyrighted material shouldn't be reproduced, in whole or in part, by anyone but the copyright holder, and diffusion models and LLMs don't reproduce any of it.
This needs emphasizing because of false claims that it has already been "proven" that diffusion models store pieces of source images and stitch them together into new images. It's easy to prove to oneself that this isn't possible: download a 6 GB model from CivitAI NSFW⚠️, preferably a good finetune of SDXL NSFW⚠️, install a Stable Diffusion client such as Automatic1111, and use an 8 GB GPU to generate vastly more unique images in unique styles than could ever be stored as reusable pieces in 6 GB, even with JPEG compression.
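The storage argument can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes, purely for illustration, a 6 GB checkpoint (decimal gigabytes) and a training set of 2 billion images; the text above only says "billions", and the real count varies by model. It asks how much of the file each training image could occupy even if every byte were devoted to storing images.

```python
# Back-of-the-envelope check: how much model storage could each training
# image possibly get? Both inputs below are illustrative assumptions.
MODEL_SIZE_BYTES = 6 * 10**9   # assumed: a 6 GB checkpoint (decimal GB)
TRAINING_IMAGES = 2 * 10**9    # assumed: 2 billion training images
BYTES_PER_RGB_PIXEL = 3        # one uncompressed 8-bit RGB pixel


def bytes_per_image(model_bytes, n_images):
    """Average bytes available per training image if the whole file held image data."""
    return model_bytes / n_images


budget = bytes_per_image(MODEL_SIZE_BYTES, TRAINING_IMAGES)
pixels = budget / BYTES_PER_RGB_PIXEL
print(f"{budget:.1f} bytes per training image, about {pixels:.1f} uncompressed RGB pixel(s)")
```

Under these assumptions each training image gets about one pixel's worth of space, far too little to hold reusable pieces of it. Halving the assumed dataset size only doubles that budget, which doesn't change the conclusion for any training set in the billions.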
It should also be noted that a vocal minority directs a hefty amount of toxic misinformation and behavior at people who use AI tools to make images.
That said, while the hate directed at AI images is unwarranted, one concern is legitimate: styles being ripped off shamelessly. Inspiration is one thing, but precisely imitating everything distinctive about another person's style, say by including "in the style of a painting by John Doe" in a prompt, is extremely shady even if it isn't theft. More reputable models, including the one I use, obscure or forbid the use of specific artists' styles.
For the curious, here's a good introductory video on how attention-based models work by respected mathematics educator Grant Sanderson of 3Blue1Brown. His channel also has lengthy deep dives into the technology.
Why is there such venom in the backlash?
The companies that first created these models were scummy enough to suggest replacing artists with them. Since the models can only produce anything of value in the hands of artists, and are tools for artists, marketing them as replacements for artists was about the worst sales pitch those companies could have conceived.
Unfortunately, that anger gets misdirected at people making legitimate use of these tools as part of a larger creative process, and detractors feel justified in using any tactic against them, including fabricating misinformation and behaving as toxically as they like. The fact that it's a disruptive technology to begin with only compounds this.
Do you support the companies in question?
No, none of them gets a red cent from me. I use a free model running locally on a PC with a beefy GPU. I fine-tune it myself, developing a style based on an aesthetic I wanted that no one else was doing. I'm not releasing any of my own tunings, nor the techniques I developed to bring things to life.
Will lost jobs become a problem?
No. A permanent decrease in the total number of jobs would be a problem, but the idea that this actually happens rests on the lump of labour fallacy. If that idea were true, only a small percentage of us would be employed today, because past inventions like the printing press would have taken all the jobs. The labor market is an actual market, and it behaves like one.
There will be a shift, but it will look like the many similar shifts the labor market has absorbed before. It also won't be as pronounced as some fear, since the world is discovering that getting good output from diffusion models takes nearly as much time as creating by other means.
Is it artistic?
The use cases form a spectrum. Text prompting alone is obviously less artistic than manually editing an image afterward, or than creating an untextured 3D scene in Blender and using a diffusion model for the final render. My current workflow does the latter.
Where is the deadpan?
Tucked away for the moment, since to some, this is serious business.
Resources
Before generating your own images, it's important to know that the base models produce awful results compared to the better community finetunes, such as those found on CivitAI below.
Stable Diffusion clients
- Automatic1111, a WebUI-based client
  - ✅ Most popular
  - ✅ User friendly
  - ❌ Limited control
  - ❌ Not updated much anymore
- Forge, a fork of Automatic1111
  - ✅ Improved performance
  - ✅ Fixes some Automatic1111 issues
  - ❌ Breaks some Automatic1111 extensions
- ComfyUI, a rich but complex client
  - ✅ Modular workflows for flexibility
  - ✅ Extremely configurable
  - ❌ Challenging to learn and use
Diffusion models
- CivitAI, the most popular diffusion model repository
  - ⚠️ NSFW
- Hugging Face, the most popular machine learning resource site
Would you like to know more?