FAQ: Difference between revisions

Revision as of 17:35, 11 December 2024

Frequently asked questions

What are AI-generated images?

In case you haven't heard, a new type of AI called a diffusion model is capable of generating high-quality images. These models learn concepts by observing billions of images covering a huge range of subjects. While you can have a model generate an image entirely from a text prompt, there are many other ways to control the output to varying degrees. They can be used as render engines for CAD software, for example.

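Under the hood, they share a core ingredient with the large language models behind products such as ChatGPT: the attention mechanism. Newer diffusion models are built on transformers outright, while older ones such as Stable Diffusion and SDXL use attention layers inside a U-Net.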
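
For readers who want to see what the attention mechanism amounts to, here is a toy sketch of scaled dot-product attention in plain Python. It is only an illustration of the general technique, not how any particular diffusion model or LLM implements it; the function names and the tiny example vectors are made up for this page.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors.

    Each query is compared with every key; the resulting weights decide
    how much of each value vector ends up in that query's output.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    # Hypothetical toy data: one query, two identical keys, two values.
    print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0], [4.0]]))
```

Because both keys match the query equally, the two values get equal weight and the output is their plain average. Real models do the same thing with learned projections, many heads, and far larger vectors.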

What are AI-assisted images?

"AI-assisted" describes images that combine human work and AI output. Some examples:

A drawing, photograph or painting created by a person, then run through a filter using AI
A 3D work created by a person, where textures and lighting are applied by AI as a filter
An AI-generated image that's then reworked or heavily modified by a person

Is generative AI theft, as some say?

No, not metaphorically and not literally. The difference is easy to see once reality is separated from the misinformation: diffusion models not only don't deprive original creators of their works, they don't store any portion of those works. They can't be used to recreate works they learned from unless someone deliberately sets them up to do so. The only works they have a chance of reproducing are cultural cornerstones that are short and repeated heavily throughout our media, like the phrase "in God we trust" or the lyrics to Happy Birthday.

Another popular claim that falls apart under scrutiny is the idea that copyrighted material is used without permission when, in actuality, no copyright violation happens. Copyrighted material shouldn't be reproduced in whole or part by anyone but the copyright holder, and when it comes to diffusion models and LLMs, none of it is.

This needs to be emphasized because there are false claims that it has already been "proven" that diffusion models store parts of source images and then stitch them together to create new images. It's in fact easy to prove to oneself that this isn't possible. Download a 6 GB model from CivitAI, preferably a good finetune of SDXL, and a WebUI client like Automatic1111. With an 8 GB GPU, one can quickly generate more unique images in unique styles than could possibly be stored in 6 GB, even using JPEG compression.
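
The storage argument can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic only: the 6 GB figure comes from the paragraph above, while the average JPEG size (100 KB) and the training-set size (2 billion, standing in for "billions of images") are assumptions chosen for illustration, not measurements.

```python
# Back-of-the-envelope check of the storage argument.
# Only MODEL_SIZE_BYTES comes from the text; the other two
# constants are assumed figures for illustration.

MODEL_SIZE_BYTES = 6 * 10**9          # the "6 GB model" from the text
ASSUMED_JPEG_BYTES = 100 * 10**3      # assumption: ~100 KB per JPEG
ASSUMED_TRAINING_IMAGES = 2 * 10**9   # assumption: "billions" taken as 2 billion


def images_that_fit(container_bytes: int, bytes_per_image: int) -> int:
    """Number of whole images of a given size that fit in a container."""
    return container_bytes // bytes_per_image


def bytes_per_training_image(model_bytes: int, training_images: int) -> float:
    """Model size divided evenly across every training image."""
    return model_bytes / training_images


if __name__ == "__main__":
    print(images_that_fit(MODEL_SIZE_BYTES, ASSUMED_JPEG_BYTES))
    print(bytes_per_training_image(MODEL_SIZE_BYTES, ASSUMED_TRAINING_IMAGES))
```

Under these assumptions, a 6 GB file could hold about 60,000 JPEGs, and the model has roughly 3 bytes of capacity per training image. Different assumed sizes shift the numbers, but not the orders of magnitude.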

It should also be noted that there is a massive amount of toxic misinformation and behavior from a vocal minority being directed at people using AI tools to make images.

That said, despite most of the hate directed at AI art being unwarranted, one legitimate concern is styles being ripped off to a shameless degree. While inspiration is one thing, precisely imitating everything specific about another person's style, say by including "in the style of a painting by john doe" in a prompt, is extremely shady even if it isn't theft. More reputable models, including the one I use, obscure or forbid the use of specific artists' styles.

For the curious, here's a good introductory video by respected mathematics educator Grant Sanderson of 3Blue1Brown going over how attention-based models work. Lengthy deep dives into the technology are also available on his channel.

Why is there such venom in the backlash?

The companies that first created diffusion models were scummy enough to suggest replacing artists with them. Given that these models are tools for artists and only produce things of value in an artist's hands, marketing them as replacements for artists is about the worst sales pitch those companies could have conceived.

Unfortunately, anger about that gets misdirected at people making legitimate use of these tools as part of a larger creative process, and detractors feel justified in using any tactics to fight the technology. This includes fabricating misinformation and behaving as toxically as they see fit. It's compounded by the fact that this is a disruptive technology to begin with.

Do you support the companies in question?

No, none of them get a red cent from me. I use a free model run locally on a PC with a beefy GPU. I fine-tune it myself, developing a style based on an aesthetic I wanted that no one was doing. I'm not releasing any of my own tunings, nor the techniques I developed to bring things to life.

Will lost jobs become a problem?

No. While a permanent decrease in the total number of jobs would be a problem, the idea that new technology causes one is based on the lump of labor fallacy. If it were true, only a small percentage of us would be employed today, because past inventions like the printing press would have taken all the jobs. The labor market is an actual market, and it behaves like one. There will be a shift, but it will look like the many other shifts of the same kind that have hit the labor market.

Is it artistic?

It depends on the use case, and there's a spectrum. Text prompting is obviously less artistic than manually editing an image afterward, or creating a textureless 3D scene in Blender and using a diffusion model for the final render. In my current workflow, I'm doing the latter.

Where is the deadpan?

Tucked away for the moment, as to some, this is serious business.

Resources

Stable Diffusion clients

Automatic1111, a WebUI-based client
- ✅ Most popular
- ✅ User friendly
- ❌ Limited control
- ❌ Not updated much anymore
Forge, a fork of Automatic1111
- ✅ Improved performance
- ✅ Fixes some Automatic1111 issues
- ❌ Breaks some Automatic1111 extensions
ComfyUI, a rich but complex client
- ✅ Modular workflows for flexibility
- ✅ Extremely configurable
- ❌ Challenging to learn and use

Diffusion models

CivitAI, the most popular diffusion model repository
- ⚠️ NSFW
HuggingFace, the most popular machine learning resource site

Would you like to know more?
